Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update
At 02/03/2017 01:08 PM, Christoph Anton Mitterer wrote: Hey Qu. I'm sending this off-list for privacy reasons of the contained file- names / hash sums. It doesn't contained anything particularly secret, nevertheless, please try to avoid spreading it too far around and delete it once you don't need it anymore :) On Thu, 2017-02-02 at 10:12 +0800, Qu Wenruo wrote: Would you please to check the latest lowmem_tests branch? https://github.com/adam900710/btrfs-progs/tree/lowmem_tests Invocation was as usual: # btrfs check --mode=lowmem /dev/nbd0 ; echo $? stdout/err output in the attachment Great thanks for that! Although I made a mistake in the debug patch, the output info is still good enough for me to catch the bug in inline extent check. I also added missing error message output for other places I found, and updated the branch, the name remains as "lowmem_tests" Please try it. Thanks, Qu Sorry for the extra trouble. No worries :) Cheers, Chris. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] Btrfs: create helper for processing bits on contiguous pages
This introduces a new helper which can be used to process pages bits. Signed-off-by: Liu Bo --- fs/btrfs/extent_io.c | 37 ++--- 1 file changed, 22 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index d5f3edb..2ef2d05 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1726,28 +1726,22 @@ STATIC u64 find_lock_delalloc_range(struct inode *inode, return found; } -void extent_clear_unlock_delalloc(struct inode *inode, u64 start, u64 end, -u64 delalloc_end, struct page *locked_page, -unsigned clear_bits, -unsigned long page_ops) +static void __process_pages_contig(struct address_space *mapping, + struct page *locked_page, + pgoff_t start_index, pgoff_t end_index, + unsigned long page_ops) { - struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; - int ret; + unsigned long nr_pages = end_index - start_index + 1; + pgoff_t index = start_index; struct page *pages[16]; - unsigned long index = start >> PAGE_SHIFT; - unsigned long end_index = end >> PAGE_SHIFT; - unsigned long nr_pages = end_index - index + 1; + unsigned ret; int i; - clear_extent_bit(tree, start, end, clear_bits, 1, 0, NULL, GFP_NOFS); - if (page_ops == 0) - return; - if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0) - mapping_set_error(inode->i_mapping, -EIO); + mapping_set_error(mapping, -EIO); while (nr_pages > 0) { - ret = find_get_pages_contig(inode->i_mapping, index, + ret = find_get_pages_contig(mapping, index, min_t(unsigned long, nr_pages, ARRAY_SIZE(pages)), pages); for (i = 0; i < ret; i++) { @@ -1777,6 +1771,19 @@ void extent_clear_unlock_delalloc(struct inode *inode, u64 start, u64 end, } } +void extent_clear_unlock_delalloc(struct inode *inode, u64 start, u64 end, +u64 delalloc_end, struct page *locked_page, +unsigned clear_bits, +unsigned long page_ops) +{ + clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end, clear_bits, 1, 0, +NULL, GFP_NOFS); + + __process_pages_contig(inode->i_mapping, locked_page, + start >> PAGE_SHIFT, end >> PAGE_SHIFT, + page_ops); +} + /* * count the number of bytes in the tree that have a given bit(s) * set. This can be fairly slow, except for EXTENT_DIRTY which is -- 2.5.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] Btrfs: use helper to simplify lock/unlock pages
Since we have a helper to set page bits, let lock_delalloc_pages and __unlock_for_delalloc use it. Signed-off-by: Liu Bo --- fs/btrfs/extent_io.c | 122 ++- fs/btrfs/extent_io.h | 3 +- 2 files changed, 54 insertions(+), 71 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 2ef2d05..7e9639f 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1549,33 +1549,24 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree, return found; } +static int __process_pages_contig(struct address_space *mapping, + struct page *locked_page, + pgoff_t start_index, pgoff_t end_index, + unsigned long page_ops, pgoff_t *index_ret); + static noinline void __unlock_for_delalloc(struct inode *inode, struct page *locked_page, u64 start, u64 end) { - int ret; - struct page *pages[16]; unsigned long index = start >> PAGE_SHIFT; unsigned long end_index = end >> PAGE_SHIFT; - unsigned long nr_pages = end_index - index + 1; - int i; + ASSERT(locked_page); if (index == locked_page->index && end_index == index) return; - while (nr_pages > 0) { - ret = find_get_pages_contig(inode->i_mapping, index, -min_t(unsigned long, nr_pages, -ARRAY_SIZE(pages)), pages); - for (i = 0; i < ret; i++) { - if (pages[i] != locked_page) - unlock_page(pages[i]); - put_page(pages[i]); - } - nr_pages -= ret; - index += ret; - cond_resched(); - } + __process_pages_contig(inode->i_mapping, locked_page, index, end_index, + PAGE_UNLOCK, NULL); } static noinline int lock_delalloc_pages(struct inode *inode, @@ -1584,59 +1575,19 @@ static noinline int lock_delalloc_pages(struct inode *inode, u64 delalloc_end) { unsigned long index = delalloc_start >> PAGE_SHIFT; - unsigned long start_index = index; + unsigned long index_ret = index; unsigned long end_index = delalloc_end >> PAGE_SHIFT; - unsigned long pages_locked = 0; - struct page *pages[16]; - unsigned long nrpages; int ret; - int i; - /* the caller is responsible for locking the start index */ + ASSERT(locked_page); if (index == locked_page->index && index == end_index) return 0; - /* skip the page at the start index */ - nrpages = end_index - index + 1; - while (nrpages > 0) { - ret = find_get_pages_contig(inode->i_mapping, index, -min_t(unsigned long, -nrpages, ARRAY_SIZE(pages)), pages); - if (ret == 0) { - ret = -EAGAIN; - goto done; - } - /* now we have an array of pages, lock them all */ - for (i = 0; i < ret; i++) { - /* -* the caller is taking responsibility for -* locked_page -*/ - if (pages[i] != locked_page) { - lock_page(pages[i]); - if (!PageDirty(pages[i]) || - pages[i]->mapping != inode->i_mapping) { - ret = -EAGAIN; - unlock_page(pages[i]); - put_page(pages[i]); - goto done; - } - } - put_page(pages[i]); - pages_locked++; - } - nrpages -= ret; - index += ret; - cond_resched(); - } - ret = 0; -done: - if (ret && pages_locked) { - __unlock_for_delalloc(inode, locked_page, - delalloc_start, - ((u64)(start_index + pages_locked - 1)) << - PAGE_SHIFT); - } + ret = __process_pages_contig(inode->i_mapping, locked_page, index, +end_index, PAGE_LOCK, &index_ret); + if (ret == -EAGAIN) + __unlock_for_delalloc(inode, locked_page, delalloc_start, + (u64)index_ret << PAGE_SHIFT); return ret; } @@ -1726,17 +1677,24 @@ STATIC u64 find_lock_delalloc_range(struct inode *inode, return found; }
Re: [REVERT][v4.10][V2] btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl
On 02/02/2017 03:57 PM, Greg KH wrote: > On Thu, Feb 02, 2017 at 02:58:16PM -0500, Joseph Salisbury wrote: >> On 02/02/2017 01:23 PM, Greg KH wrote: >>> On Thu, Feb 02, 2017 at 12:38:33PM -0500, Joseph Salisbury wrote: Hello, Please consider reverting commit 4c63c2454eff996c5e27991221106eb511f7db38 in the next v4.x.y release. >>> What release can I remove it from? >>> >>> It isn't in 4.4.y, and 4.9.y doesn't make much sense, unless it's >>> reverted in Linus's tree already? >>> >>> totally confused. >>> >>> greg k-h >> Sorry, commit 4c63c2454eff996c5e27991221106eb511f7db38 was introduced in >> mainline in v4.7-rc1. The request should be only for kernels newer than >> 4.7-rc1, so 4.9 and 4.10. >> >> I assume it will come down to 4.9 once reverted in mainline. > Let me know when it is reverted, and what that revert commit is please. > > thanks, > > greg k-h Will do. Thanks again, Joe -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REVERT][v4.10][V2] btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl
On Thu, Feb 02, 2017 at 02:58:16PM -0500, Joseph Salisbury wrote: > On 02/02/2017 01:23 PM, Greg KH wrote: > > On Thu, Feb 02, 2017 at 12:38:33PM -0500, Joseph Salisbury wrote: > >> Hello, > >> > >> Please consider reverting commit > >> 4c63c2454eff996c5e27991221106eb511f7db38 in the next v4.x.y release. > > What release can I remove it from? > > > > It isn't in 4.4.y, and 4.9.y doesn't make much sense, unless it's > > reverted in Linus's tree already? > > > > totally confused. > > > > greg k-h > > Sorry, commit 4c63c2454eff996c5e27991221106eb511f7db38 was introduced in > mainline in v4.7-rc1. The request should be only for kernels newer than > 4.7-rc1, so 4.9 and 4.10. > > I assume it will come down to 4.9 once reverted in mainline. Let me know when it is reverted, and what that revert commit is please. thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[REVERT][v4.10][V2] btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl
On 02/02/2017 01:23 PM, Greg KH wrote: > On Thu, Feb 02, 2017 at 12:38:33PM -0500, Joseph Salisbury wrote: >> Hello, >> >> Please consider reverting commit >> 4c63c2454eff996c5e27991221106eb511f7db38 in the next v4.x.y release. > What release can I remove it from? > > It isn't in 4.4.y, and 4.9.y doesn't make much sense, unless it's > reverted in Linus's tree already? > > totally confused. > > greg k-h Sorry, commit 4c63c2454eff996c5e27991221106eb511f7db38 was introduced in mainline in v4.7-rc1. The request should be only for kernels newer than 4.7-rc1, so 4.9 and 4.10. I assume it will come down to 4.9 once reverted in mainline. I updated the subject. Thanks, Joe -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: remove duplicated find_get_pages_contig
On Thu, Feb 02, 2017 at 11:11:15AM -0800, Liu Bo wrote: > On Thu, Feb 02, 2017 at 07:42:43PM +0100, David Sterba wrote: > > On Mon, Jan 30, 2017 at 12:26:30PM -0800, Liu Bo wrote: > > > This creates a helper to manipulate page bits to avoid duplicate uses. > > > > This seems too short for what the patch does. It adds a new page ops bit > > that would deserve a separate patch. Please try to split it to smaller > > parts. > > Hmm..the newly added 'PAGE_LOCK' is only used in lock_delalloc_pages in > order to make the helper lock pages. > > I could firstly introduce the helper from extent_clear_unlock_delalloc() > , then introduce the new 'PAGE_LOCK' and make lock_delalloc_pages and > __unlock_for_delalloc to use the helper, does it sound better to review > and bisect? This sounds better, thanks. The reason is mostly from the review side. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/24] fs: Provide infrastructure for dynamic BDIs in filesystems
Hi, On Thu, Feb 02, 2017 at 06:34:02PM +0100, Jan Kara wrote: > Provide helper functions for setting up dynamically allocated > backing_dev_info structures for filesystems and cleaning them up on > superblock destruction. Just one concern, will this cause problems for multiple superblock cases like nfs with nosharecache? Thanks, -liubo > > CC: linux-...@lists.infradead.org > CC: linux-...@vger.kernel.org > CC: Petr Vandrovec > CC: linux-ni...@vger.kernel.org > CC: cluster-de...@redhat.com > CC: osd-...@open-osd.org > CC: codal...@coda.cs.cmu.edu > CC: linux-...@lists.infradead.org > CC: ecryp...@vger.kernel.org > CC: linux-c...@vger.kernel.org > CC: ceph-de...@vger.kernel.org > CC: linux-btrfs@vger.kernel.org > CC: v9fs-develo...@lists.sourceforge.net > CC: lustre-de...@lists.lustre.org > Signed-off-by: Jan Kara > --- > fs/super.c | 49 > > include/linux/backing-dev-defs.h | 2 +- > include/linux/fs.h | 6 + > 3 files changed, 56 insertions(+), 1 deletion(-) > > diff --git a/fs/super.c b/fs/super.c > index ea662b0e5e78..31dc4c6450ef 100644 > --- a/fs/super.c > +++ b/fs/super.c > @@ -446,6 +446,11 @@ void generic_shutdown_super(struct super_block *sb) > hlist_del_init(&sb->s_instances); > spin_unlock(&sb_lock); > up_write(&sb->s_umount); > + if (sb->s_iflags & SB_I_DYNBDI) { > + bdi_put(sb->s_bdi); > + sb->s_bdi = &noop_backing_dev_info; > + sb->s_iflags &= ~SB_I_DYNBDI; > + } > } > > EXPORT_SYMBOL(generic_shutdown_super); > @@ -1249,6 +1254,50 @@ mount_fs(struct file_system_type *type, int flags, > const char *name, void *data) > } > > /* > + * Setup private BDI for given superblock. I gets automatically cleaned up > + * in generic_shutdown_super(). > + */ > +int super_setup_bdi_name(struct super_block *sb, char *fmt, ...) > +{ > + struct backing_dev_info *bdi; > + int err; > + va_list args; > + > + bdi = bdi_alloc(GFP_KERNEL); > + if (!bdi) > + return -ENOMEM; > + > + bdi->name = sb->s_type->name; > + > + va_start(args, fmt); > + err = bdi_register_va(bdi, NULL, fmt, args); > + va_end(args); > + if (err) { > + bdi_put(bdi); > + return err; > + } > + WARN_ON(sb->s_bdi != &noop_backing_dev_info); > + sb->s_bdi = bdi; > + sb->s_iflags |= SB_I_DYNBDI; > + > + return 0; > +} > +EXPORT_SYMBOL(super_setup_bdi_name); > + > +/* > + * Setup private BDI for given superblock. I gets automatically cleaned up > + * in generic_shutdown_super(). > + */ > +int super_setup_bdi(struct super_block *sb) > +{ > + static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0); > + > + return super_setup_bdi_name(sb, "%.28s-%ld", sb->s_type->name, > + atomic_long_inc_return(&bdi_seq)); > +} > +EXPORT_SYMBOL(super_setup_bdi); > + > +/* > * This is an internal function, please use sb_end_{write,pagefault,intwrite} > * instead. > */ > diff --git a/include/linux/backing-dev-defs.h > b/include/linux/backing-dev-defs.h > index 2ecafc8a2d06..70080b4217f4 100644 > --- a/include/linux/backing-dev-defs.h > +++ b/include/linux/backing-dev-defs.h > @@ -143,7 +143,7 @@ struct backing_dev_info { > congested_fn *congested_fn; /* Function pointer if device is md/dm */ > void *congested_data; /* Pointer to aux data for congested func */ > > - char *name; > + const char *name; > > struct kref refcnt; /* Reference counter for the structure */ > unsigned int registered:1; /* Is bdi registered? */ > diff --git a/include/linux/fs.h b/include/linux/fs.h > index c930cbc19342..8ed8b6d1bc54 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1267,6 +1267,9 @@ struct mm_struct; > /* sb->s_iflags to limit user namespace mounts */ > #define SB_I_USERNS_VISIBLE 0x0010 /* fstype already mounted */ > > +/* Temporary flag until all filesystems are converted to dynamic bdis */ > +#define SB_I_DYNBDI 0x0100 > + > /* Possible states of 'frozen' field */ > enum { > SB_UNFROZEN = 0,/* FS is unfrozen */ > @@ -2103,6 +2106,9 @@ extern int vfs_ustat(dev_t, struct kstatfs *); > extern int freeze_super(struct super_block *super); > extern int thaw_super(struct super_block *super); > extern bool our_mnt(struct vfsmount *mnt); > +extern __printf(2, 3) > +int super_setup_bdi_name(struct super_block *sb, char *fmt, ...); > +extern int super_setup_bdi(struct super_block *sb); > > extern int current_umask(void); > > -- > 2.10.2 > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: remove duplicated find_get_pages_contig
On Thu, Feb 02, 2017 at 07:42:43PM +0100, David Sterba wrote: > On Mon, Jan 30, 2017 at 12:26:30PM -0800, Liu Bo wrote: > > This creates a helper to manipulate page bits to avoid duplicate uses. > > This seems too short for what the patch does. It adds a new page ops bit > that would deserve a separate patch. Please try to split it to smaller > parts. Hmm..the newly added 'PAGE_LOCK' is only used in lock_delalloc_pages in order to make the helper lock pages. I could firstly introduce the helper from extent_clear_unlock_delalloc() , then introduce the new 'PAGE_LOCK' and make lock_delalloc_pages and __unlock_for_delalloc to use the helper, does it sound better to review and bisect? > > > +static noinline void __unlock_for_delalloc(struct inode *inode, > > + struct page *locked_page, > > + u64 start, u64 end) > > +{ > > + unsigned long page_ops = PAGE_UNLOCK; > > Used only once, not necessary IMO. OK. > > > + > > + ASSERT(locked_page); > > + __process_pages_contig(inode->i_mapping, locked_page, > > + start >> PAGE_SHIFT, end >> PAGE_SHIFT, > > + page_ops, NULL); > > +} > > + > > +static noinline int lock_delalloc_pages(struct inode *inode, > > + struct page *locked_page, > > + u64 delalloc_start, > > + u64 delalloc_end) > > +{ > > + pgoff_t index = delalloc_start >> PAGE_SHIFT; > > + pgoff_t start_index = index; > > + pgoff_t end_index = delalloc_end >> PAGE_SHIFT; > > + unsigned long page_ops = PAGE_LOCK; > > Same here. > OK. Thanks, -liubo > > + int ret = 0; > > + > > + ASSERT(locked_page); > > + > > + ret = __process_pages_contig(inode->i_mapping, locked_page, start_index, > > + end_index, page_ops, &index); > > + if (ret == -EAGAIN) { > > + __unlock_for_delalloc(inode, locked_page, delalloc_start, > > + ((u64)index) << PAGE_SHIFT); > > } > > + > > return ret; > > } -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs: fix assertion failure when freeing block groups at close_ctree()
On Thu, Feb 02, 2017 at 05:56:33PM +, fdman...@kernel.org wrote: > From: Filipe Manana > > At close_ctree() we free the block groups and then only after we wait for > any running worker kthreads to finish and shutdown the workqueues. This > behaviour is racy and it triggers an assertion failure when freeing block > groups because while we are doing it we can have for example a block group > caching kthread running, and in that case the block group's reference > count can still be greater than 1 by the time we assert its reference count > is 1, leading to an assertion failure: > > [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, file: > fs/btrfs/extent-tree.c, line: 9799 > [19041.200584] [ cut here ] > [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418! > [19041.202830] invalid opcode: [#1] PREEMPT SMP > [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod > crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis > parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop > autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi > ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last > unloaded: btrfs] > [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted > 4.9.0-rc7-btrfs-next-36+ #1 > [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 > [19041.208082] task: 88015f028980 task.stack: c9000ad34000 > [19041.208082] RIP: 0010:[] [] > assfail.constprop.41+0x1c/0x1e [btrfs] > [19041.208082] RSP: 0018:c9000ad37d60 EFLAGS: 00010286 > [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: > 0001 > [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: > > [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: > > [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: > 88023431d170 > [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: > 88015ecb4100 > [19041.208082] FS: 7f44f3d42840() GS:88023f38() > knlGS: > [19041.208082] CS: 0010 DS: ES: CR0: 80050033 > [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: > 06e0 > [19041.208082] Stack: > [19041.208082] c9000ad37d98 a035989f 88015ecb4000 > 88015ecb5630 > [19041.208082] 88014f6be000 7ffcf0ba6a10 > c9000ad37df8 > [19041.208082] a0368cd4 88014e9658e0 c9000ad37e08 > 811a634d > [19041.208082] Call Trace: > [19041.208082] [] btrfs_free_block_groups+0x17f/0x392 > [btrfs] > [19041.208082] [] close_ctree+0x1c5/0x2e1 [btrfs] > [19041.208082] [] ? evict_inodes+0x132/0x141 > [19041.208082] [] btrfs_put_super+0x15/0x17 [btrfs] > [19041.208082] [] generic_shutdown_super+0x6a/0xeb > [19041.208082] [] kill_anon_super+0x12/0x1c > [19041.208082] [] btrfs_kill_super+0x16/0x21 [btrfs] > [19041.208082] [] deactivate_locked_super+0x3b/0x68 > [19041.208082] [] deactivate_super+0x36/0x39 > [19041.208082] [] cleanup_mnt+0x58/0x76 > [19041.208082] [] __cleanup_mnt+0x12/0x14 > [19041.208082] [] task_work_run+0x6f/0x95 > [19041.208082] [] prepare_exit_to_usermode+0xa3/0xc1 > [19041.208082] [] syscall_return_slowpath+0x16e/0x1d2 > [19041.208082] [] entry_SYSCALL_64_fastpath+0xab/0xad > [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 48 > c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 <0f> > 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9 > [19041.208082] RIP [] assfail.constprop.41+0x1c/0x1e > [btrfs] > [19041.208082] RSP > [19041.279264] ---[ end trace 23330586f16f064d ]--- > > This started happening as of kernel 4.8, since commit f3bca8028bd9 > ("Btrfs: add ASSERT for block group's memory leak") introduced these > assertions. > > So fix this by freeing the block groups only after waiting for all > worker kthreads to complete and shutdown the workqueues. Reviewed-by: Liu Bo Thanks, -liubo > > Signed-off-by: Filipe Manana > --- > > v2: Make error path of open_ctree() also stop all workers before freeing > the block groups. Assert that when freeing block groups, no block > group can be in the caching started state. > > fs/btrfs/disk-io.c | 6 +++--- > fs/btrfs/extent-tree.c | 9 ++--- > 2 files changed, 9 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 066d9b9..bf54d7d 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3263,7 +3263,6 @@ int open_ctree(struct super_block *sb, > > fail_block_groups: > btrfs_put_block_group_cache(fs_info); > - btrfs_free_block_groups(fs_info); > > fail_tree_roots: > free_root_pointers(fs_info, 1); > @@ -3271,6 +3270,7 @@ int open_ctree(struct
Re: [PATCH] Btrfs: fix assertion failure when freeing block groups at close_ctree()
On Thu, Feb 02, 2017 at 06:58:03PM +, Filipe Manana wrote: > On Thu, Feb 2, 2017 at 6:53 PM, Liu Bo wrote: > > On Thu, Feb 02, 2017 at 06:32:01PM +, Filipe Manana wrote: > >> On Thu, Feb 2, 2017 at 6:19 PM, Liu Bo wrote: > >> > On Wed, Feb 01, 2017 at 11:01:28PM +, fdman...@kernel.org wrote: > >> >> From: Filipe Manana > >> >> > >> >> At close_ctree() we free the block groups and then only after we wait > >> >> for > >> >> any running worker kthreads to finish and shutdown the workqueues. This > >> >> behaviour is racy and it triggers an assertion failure when freeing > >> >> block > >> >> groups because while we are doing it we can have for example a block > >> >> group > >> >> caching kthread running, and in that case the block group's reference > >> >> count is greater than 1, leading to an assertion failure: > >> >> > >> >> [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, > >> >> file: fs/btrfs/extent-tree.c, line: 9799 > >> >> [19041.200584] [ cut here ] > >> >> [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418! > >> >> [19041.202830] invalid opcode: [#1] PREEMPT SMP > >> >> [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod > >> >> crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev > >> >> tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor > >> >> button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod > >> >> ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio > >> >> e1000 scsi_mod floppy [last unloaded: btrfs] > >> >> [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted > >> >> 4.9.0-rc7-btrfs-next-36+ #1 > >> >> [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > >> >> BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 > >> >> [19041.208082] task: 88015f028980 task.stack: c9000ad34000 > >> >> [19041.208082] RIP: 0010:[] [] > >> >> assfail.constprop.41+0x1c/0x1e [btrfs] > >> >> [19041.208082] RSP: 0018:c9000ad37d60 EFLAGS: 00010286 > >> >> [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: > >> >> 0001 > >> >> [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: > >> >> > >> >> [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: > >> >> > >> >> [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: > >> >> 88023431d170 > >> >> [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: > >> >> 88015ecb4100 > >> >> [19041.208082] FS: 7f44f3d42840() GS:88023f38() > >> >> knlGS: > >> >> [19041.208082] CS: 0010 DS: ES: CR0: 80050033 > >> >> [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: > >> >> 06e0 > >> >> [19041.208082] Stack: > >> >> [19041.208082] c9000ad37d98 a035989f 88015ecb4000 > >> >> 88015ecb5630 > >> >> [19041.208082] 88014f6be000 7ffcf0ba6a10 > >> >> c9000ad37df8 > >> >> [19041.208082] a0368cd4 88014e9658e0 c9000ad37e08 > >> >> 811a634d > >> >> [19041.208082] Call Trace: > >> >> [19041.208082] [] > >> >> btrfs_free_block_groups+0x17f/0x392 [btrfs] > >> >> [19041.208082] [] close_ctree+0x1c5/0x2e1 [btrfs] > >> >> [19041.208082] [] ? evict_inodes+0x132/0x141 > >> >> [19041.208082] [] btrfs_put_super+0x15/0x17 [btrfs] > >> >> [19041.208082] [] generic_shutdown_super+0x6a/0xeb > >> >> [19041.208082] [] kill_anon_super+0x12/0x1c > >> >> [19041.208082] [] btrfs_kill_super+0x16/0x21 [btrfs] > >> >> [19041.208082] [] deactivate_locked_super+0x3b/0x68 > >> >> [19041.208082] [] deactivate_super+0x36/0x39 > >> >> [19041.208082] [] cleanup_mnt+0x58/0x76 > >> >> [19041.208082] [] __cleanup_mnt+0x12/0x14 > >> >> [19041.208082] [] task_work_run+0x6f/0x95 > >> >> [19041.208082] [] prepare_exit_to_usermode+0xa3/0xc1 > >> >> [19041.208082] [] syscall_return_slowpath+0x16e/0x1d2 > >> >> [19041.208082] [] entry_SYSCALL_64_fastpath+0xab/0xad > >> >> [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 > >> >> f1 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 > >> >> d4 e0 <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9 > >> >> [19041.208082] RIP [] assfail.constprop.41+0x1c/0x1e > >> >> [btrfs] > >> >> [19041.208082] RSP > >> >> [19041.279264] ---[ end trace 23330586f16f064d ]--- > >> >> > >> >> This started happening as of kernel 4.8, since commit f3bca8028bd9 > >> >> ("Btrfs: add ASSERT for block group's memory leak") introduced these > >> >> assertions. > >> >> > >> >> So fix this by freeing the block groups only after waiting for all > >> >> worker kthreads to complete and shutdown the workqueues. > >> > > >> > This looks good to me, but I don't understand how that could happen, if > >> > a block group is being cached by the caching worker thread,
Re: [PATCH] Btrfs: fix assertion failure when freeing block groups at close_ctree()
On Thu, Feb 2, 2017 at 6:53 PM, Liu Bo wrote: > On Thu, Feb 02, 2017 at 06:32:01PM +, Filipe Manana wrote: >> On Thu, Feb 2, 2017 at 6:19 PM, Liu Bo wrote: >> > On Wed, Feb 01, 2017 at 11:01:28PM +, fdman...@kernel.org wrote: >> >> From: Filipe Manana >> >> >> >> At close_ctree() we free the block groups and then only after we wait for >> >> any running worker kthreads to finish and shutdown the workqueues. This >> >> behaviour is racy and it triggers an assertion failure when freeing block >> >> groups because while we are doing it we can have for example a block group >> >> caching kthread running, and in that case the block group's reference >> >> count is greater than 1, leading to an assertion failure: >> >> >> >> [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, >> >> file: fs/btrfs/extent-tree.c, line: 9799 >> >> [19041.200584] [ cut here ] >> >> [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418! >> >> [19041.202830] invalid opcode: [#1] PREEMPT SMP >> >> [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod >> >> crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev >> >> tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor >> >> button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod >> >> ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio >> >> e1000 scsi_mod floppy [last unloaded: btrfs] >> >> [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted >> >> 4.9.0-rc7-btrfs-next-36+ #1 >> >> [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >> >> BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 >> >> [19041.208082] task: 88015f028980 task.stack: c9000ad34000 >> >> [19041.208082] RIP: 0010:[] [] >> >> assfail.constprop.41+0x1c/0x1e [btrfs] >> >> [19041.208082] RSP: 0018:c9000ad37d60 EFLAGS: 00010286 >> >> [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: >> >> 0001 >> >> [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: >> >> >> >> [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: >> >> >> >> [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: >> >> 88023431d170 >> >> [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: >> >> 88015ecb4100 >> >> [19041.208082] FS: 7f44f3d42840() GS:88023f38() >> >> knlGS: >> >> [19041.208082] CS: 0010 DS: ES: CR0: 80050033 >> >> [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: >> >> 06e0 >> >> [19041.208082] Stack: >> >> [19041.208082] c9000ad37d98 a035989f 88015ecb4000 >> >> 88015ecb5630 >> >> [19041.208082] 88014f6be000 7ffcf0ba6a10 >> >> c9000ad37df8 >> >> [19041.208082] a0368cd4 88014e9658e0 c9000ad37e08 >> >> 811a634d >> >> [19041.208082] Call Trace: >> >> [19041.208082] [] btrfs_free_block_groups+0x17f/0x392 >> >> [btrfs] >> >> [19041.208082] [] close_ctree+0x1c5/0x2e1 [btrfs] >> >> [19041.208082] [] ? evict_inodes+0x132/0x141 >> >> [19041.208082] [] btrfs_put_super+0x15/0x17 [btrfs] >> >> [19041.208082] [] generic_shutdown_super+0x6a/0xeb >> >> [19041.208082] [] kill_anon_super+0x12/0x1c >> >> [19041.208082] [] btrfs_kill_super+0x16/0x21 [btrfs] >> >> [19041.208082] [] deactivate_locked_super+0x3b/0x68 >> >> [19041.208082] [] deactivate_super+0x36/0x39 >> >> [19041.208082] [] cleanup_mnt+0x58/0x76 >> >> [19041.208082] [] __cleanup_mnt+0x12/0x14 >> >> [19041.208082] [] task_work_run+0x6f/0x95 >> >> [19041.208082] [] prepare_exit_to_usermode+0xa3/0xc1 >> >> [19041.208082] [] syscall_return_slowpath+0x16e/0x1d2 >> >> [19041.208082] [] entry_SYSCALL_64_fastpath+0xab/0xad >> >> [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 >> >> f1 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 >> >> d4 e0 <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9 >> >> [19041.208082] RIP [] assfail.constprop.41+0x1c/0x1e >> >> [btrfs] >> >> [19041.208082] RSP >> >> [19041.279264] ---[ end trace 23330586f16f064d ]--- >> >> >> >> This started happening as of kernel 4.8, since commit f3bca8028bd9 >> >> ("Btrfs: add ASSERT for block group's memory leak") introduced these >> >> assertions. >> >> >> >> So fix this by freeing the block groups only after waiting for all >> >> worker kthreads to complete and shutdown the workqueues. >> > >> > This looks good to me, but I don't understand how that could happen, if >> > a block group is being cached by the caching worker thread, the block >> > group cache has been marked as BTRFS_CACHE_STARTED so we should wait on >> > wait_block_group_cache_done() in btrfs_free_block_groups() before >> > getting to the ASSERT. Maybe something else broke? >> >> Simple. Look at extent-tree.c:caching_kthread
Re: [PATCH] Btrfs: fix assertion failure when freeing block groups at close_ctree()
On Thu, Feb 02, 2017 at 06:32:01PM +, Filipe Manana wrote: > On Thu, Feb 2, 2017 at 6:19 PM, Liu Bo wrote: > > On Wed, Feb 01, 2017 at 11:01:28PM +, fdman...@kernel.org wrote: > >> From: Filipe Manana > >> > >> At close_ctree() we free the block groups and then only after we wait for > >> any running worker kthreads to finish and shutdown the workqueues. This > >> behaviour is racy and it triggers an assertion failure when freeing block > >> groups because while we are doing it we can have for example a block group > >> caching kthread running, and in that case the block group's reference > >> count is greater than 1, leading to an assertion failure: > >> > >> [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, > >> file: fs/btrfs/extent-tree.c, line: 9799 > >> [19041.200584] [ cut here ] > >> [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418! > >> [19041.202830] invalid opcode: [#1] PREEMPT SMP > >> [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod > >> crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev > >> tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor > >> button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod > >> ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio > >> e1000 scsi_mod floppy [last unloaded: btrfs] > >> [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted > >> 4.9.0-rc7-btrfs-next-36+ #1 > >> [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > >> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 > >> [19041.208082] task: 88015f028980 task.stack: c9000ad34000 > >> [19041.208082] RIP: 0010:[] [] > >> assfail.constprop.41+0x1c/0x1e [btrfs] > >> [19041.208082] RSP: 0018:c9000ad37d60 EFLAGS: 00010286 > >> [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: > >> 0001 > >> [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: > >> > >> [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: > >> > >> [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: > >> 88023431d170 > >> [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: > >> 88015ecb4100 > >> [19041.208082] FS: 7f44f3d42840() GS:88023f38() > >> knlGS: > >> [19041.208082] CS: 0010 DS: ES: CR0: 80050033 > >> [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: > >> 06e0 > >> [19041.208082] Stack: > >> [19041.208082] c9000ad37d98 a035989f 88015ecb4000 > >> 88015ecb5630 > >> [19041.208082] 88014f6be000 7ffcf0ba6a10 > >> c9000ad37df8 > >> [19041.208082] a0368cd4 88014e9658e0 c9000ad37e08 > >> 811a634d > >> [19041.208082] Call Trace: > >> [19041.208082] [] btrfs_free_block_groups+0x17f/0x392 > >> [btrfs] > >> [19041.208082] [] close_ctree+0x1c5/0x2e1 [btrfs] > >> [19041.208082] [] ? evict_inodes+0x132/0x141 > >> [19041.208082] [] btrfs_put_super+0x15/0x17 [btrfs] > >> [19041.208082] [] generic_shutdown_super+0x6a/0xeb > >> [19041.208082] [] kill_anon_super+0x12/0x1c > >> [19041.208082] [] btrfs_kill_super+0x16/0x21 [btrfs] > >> [19041.208082] [] deactivate_locked_super+0x3b/0x68 > >> [19041.208082] [] deactivate_super+0x36/0x39 > >> [19041.208082] [] cleanup_mnt+0x58/0x76 > >> [19041.208082] [] __cleanup_mnt+0x12/0x14 > >> [19041.208082] [] task_work_run+0x6f/0x95 > >> [19041.208082] [] prepare_exit_to_usermode+0xa3/0xc1 > >> [19041.208082] [] syscall_return_slowpath+0x16e/0x1d2 > >> [19041.208082] [] entry_SYSCALL_64_fastpath+0xab/0xad > >> [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 > >> 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 > >> <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9 > >> [19041.208082] RIP [] assfail.constprop.41+0x1c/0x1e > >> [btrfs] > >> [19041.208082] RSP > >> [19041.279264] ---[ end trace 23330586f16f064d ]--- > >> > >> This started happening as of kernel 4.8, since commit f3bca8028bd9 > >> ("Btrfs: add ASSERT for block group's memory leak") introduced these > >> assertions. > >> > >> So fix this by freeing the block groups only after waiting for all > >> worker kthreads to complete and shutdown the workqueues. > > > > This looks good to me, but I don't understand how that could happen, if > > a block group is being cached by the caching worker thread, the block > > group cache has been marked as BTRFS_CACHE_STARTED so we should wait on > > wait_block_group_cache_done() in btrfs_free_block_groups() before > > getting to the ASSERT. Maybe something else broke? > > Simple. Look at extent-tree.c:caching_kthread() - the the caching caching_thread() vs caching_kthread(), free space cache vs inode cache, confusing helper names... > state is updated
Re: [PATCH] Btrfs: remove duplicated find_get_pages_contig
On Mon, Jan 30, 2017 at 12:26:30PM -0800, Liu Bo wrote: > This creates a helper to manipulate page bits to avoid duplicate uses. This seems too short for what the patch does. It adds a new page ops bit that would deserve a separate patch. Please try to split it to smaller parts. > +static noinline void __unlock_for_delalloc(struct inode *inode, > +struct page *locked_page, > +u64 start, u64 end) > +{ > + unsigned long page_ops = PAGE_UNLOCK; Used only once, not necessary IMO. > + > + ASSERT(locked_page); > + __process_pages_contig(inode->i_mapping, locked_page, > +start >> PAGE_SHIFT, end >> PAGE_SHIFT, > +page_ops, NULL); > +} > + > +static noinline int lock_delalloc_pages(struct inode *inode, > + struct page *locked_page, > + u64 delalloc_start, > + u64 delalloc_end) > +{ > + pgoff_t index = delalloc_start >> PAGE_SHIFT; > + pgoff_t start_index = index; > + pgoff_t end_index = delalloc_end >> PAGE_SHIFT; > + unsigned long page_ops = PAGE_LOCK; Same here. > + int ret = 0; > + > + ASSERT(locked_page); > + > + ret = __process_pages_contig(inode->i_mapping, locked_page, start_index, > +end_index, page_ops, &index); > + if (ret == -EAGAIN) { > + __unlock_for_delalloc(inode, locked_page, delalloc_start, > + ((u64)index) << PAGE_SHIFT); > } > + > return ret; > } -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix assertion failure when freeing block groups at close_ctree()
On Thu, Feb 2, 2017 at 6:19 PM, Liu Bo wrote: > On Wed, Feb 01, 2017 at 11:01:28PM +, fdman...@kernel.org wrote: >> From: Filipe Manana >> >> At close_ctree() we free the block groups and then only after we wait for >> any running worker kthreads to finish and shutdown the workqueues. This >> behaviour is racy and it triggers an assertion failure when freeing block >> groups because while we are doing it we can have for example a block group >> caching kthread running, and in that case the block group's reference >> count is greater than 1, leading to an assertion failure: >> >> [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, >> file: fs/btrfs/extent-tree.c, line: 9799 >> [19041.200584] [ cut here ] >> [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418! >> [19041.202830] invalid opcode: [#1] PREEMPT SMP >> [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod >> crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis >> parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop >> autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi >> ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last >> unloaded: btrfs] >> [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted >> 4.9.0-rc7-btrfs-next-36+ #1 >> [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 >> [19041.208082] task: 88015f028980 task.stack: c9000ad34000 >> [19041.208082] RIP: 0010:[] [] >> assfail.constprop.41+0x1c/0x1e [btrfs] >> [19041.208082] RSP: 0018:c9000ad37d60 EFLAGS: 00010286 >> [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: >> 0001 >> [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: >> >> [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: >> >> [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: >> 88023431d170 >> [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: >> 88015ecb4100 >> [19041.208082] FS: 7f44f3d42840() GS:88023f38() >> knlGS: >> [19041.208082] CS: 0010 DS: ES: CR0: 80050033 >> [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: >> 06e0 >> [19041.208082] Stack: >> [19041.208082] c9000ad37d98 a035989f 88015ecb4000 >> 88015ecb5630 >> [19041.208082] 88014f6be000 7ffcf0ba6a10 >> c9000ad37df8 >> [19041.208082] a0368cd4 88014e9658e0 c9000ad37e08 >> 811a634d >> [19041.208082] Call Trace: >> [19041.208082] [] btrfs_free_block_groups+0x17f/0x392 >> [btrfs] >> [19041.208082] [] close_ctree+0x1c5/0x2e1 [btrfs] >> [19041.208082] [] ? evict_inodes+0x132/0x141 >> [19041.208082] [] btrfs_put_super+0x15/0x17 [btrfs] >> [19041.208082] [] generic_shutdown_super+0x6a/0xeb >> [19041.208082] [] kill_anon_super+0x12/0x1c >> [19041.208082] [] btrfs_kill_super+0x16/0x21 [btrfs] >> [19041.208082] [] deactivate_locked_super+0x3b/0x68 >> [19041.208082] [] deactivate_super+0x36/0x39 >> [19041.208082] [] cleanup_mnt+0x58/0x76 >> [19041.208082] [] __cleanup_mnt+0x12/0x14 >> [19041.208082] [] task_work_run+0x6f/0x95 >> [19041.208082] [] prepare_exit_to_usermode+0xa3/0xc1 >> [19041.208082] [] syscall_return_slowpath+0x16e/0x1d2 >> [19041.208082] [] entry_SYSCALL_64_fastpath+0xab/0xad >> [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 >> 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 >> <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9 >> [19041.208082] RIP [] assfail.constprop.41+0x1c/0x1e >> [btrfs] >> [19041.208082] RSP >> [19041.279264] ---[ end trace 23330586f16f064d ]--- >> >> This started happening as of kernel 4.8, since commit f3bca8028bd9 >> ("Btrfs: add ASSERT for block group's memory leak") introduced these >> assertions. >> >> So fix this by freeing the block groups only after waiting for all >> worker kthreads to complete and shutdown the workqueues. > > This looks good to me, but I don't understand how that could happen, if > a block group is being cached by the caching worker thread, the block > group cache has been marked as BTRFS_CACHE_STARTED so we should wait on > wait_block_group_cache_done() in btrfs_free_block_groups() before > getting to the ASSERT. Maybe something else broke? Simple. Look at extent-tree.c:caching_kthread() - the the caching state is updated (to error or finished), but only much later (at the very end) the kthread drops its ref count on the block group. So the assertion you added in commit f3bca8028bd9 fails because the block group's ref count is 2 and not 1. So nothing broke, just the assertion made an incorrect assumption. But I think it's good having that and the other assertion
Re: Subject: [REVERT][v4.x.y] btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl
On Thu, Feb 02, 2017 at 12:38:33PM -0500, Joseph Salisbury wrote: > Hello, > > Please consider reverting commit > 4c63c2454eff996c5e27991221106eb511f7db38 in the next v4.x.y release. What release can I remove it from? It isn't in 4.4.y, and 4.9.y doesn't make much sense, unless it's reverted in Linus's tree already? totally confused. greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix assertion failure when freeing block groups at close_ctree()
On Wed, Feb 01, 2017 at 11:01:28PM +, fdman...@kernel.org wrote: > From: Filipe Manana > > At close_ctree() we free the block groups and then only after we wait for > any running worker kthreads to finish and shutdown the workqueues. This > behaviour is racy and it triggers an assertion failure when freeing block > groups because while we are doing it we can have for example a block group > caching kthread running, and in that case the block group's reference > count is greater than 1, leading to an assertion failure: > > [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, file: > fs/btrfs/extent-tree.c, line: 9799 > [19041.200584] [ cut here ] > [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418! > [19041.202830] invalid opcode: [#1] PREEMPT SMP > [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod > crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis > parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop > autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi > ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last > unloaded: btrfs] > [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted > 4.9.0-rc7-btrfs-next-36+ #1 > [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 > [19041.208082] task: 88015f028980 task.stack: c9000ad34000 > [19041.208082] RIP: 0010:[] [] > assfail.constprop.41+0x1c/0x1e [btrfs] > [19041.208082] RSP: 0018:c9000ad37d60 EFLAGS: 00010286 > [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: > 0001 > [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: > > [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: > > [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: > 88023431d170 > [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: > 88015ecb4100 > [19041.208082] FS: 7f44f3d42840() GS:88023f38() > knlGS: > [19041.208082] CS: 0010 DS: ES: CR0: 80050033 > [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: > 06e0 > [19041.208082] Stack: > [19041.208082] c9000ad37d98 a035989f 88015ecb4000 > 88015ecb5630 > [19041.208082] 88014f6be000 7ffcf0ba6a10 > c9000ad37df8 > [19041.208082] a0368cd4 88014e9658e0 c9000ad37e08 > 811a634d > [19041.208082] Call Trace: > [19041.208082] [] btrfs_free_block_groups+0x17f/0x392 > [btrfs] > [19041.208082] [] close_ctree+0x1c5/0x2e1 [btrfs] > [19041.208082] [] ? evict_inodes+0x132/0x141 > [19041.208082] [] btrfs_put_super+0x15/0x17 [btrfs] > [19041.208082] [] generic_shutdown_super+0x6a/0xeb > [19041.208082] [] kill_anon_super+0x12/0x1c > [19041.208082] [] btrfs_kill_super+0x16/0x21 [btrfs] > [19041.208082] [] deactivate_locked_super+0x3b/0x68 > [19041.208082] [] deactivate_super+0x36/0x39 > [19041.208082] [] cleanup_mnt+0x58/0x76 > [19041.208082] [] __cleanup_mnt+0x12/0x14 > [19041.208082] [] task_work_run+0x6f/0x95 > [19041.208082] [] prepare_exit_to_usermode+0xa3/0xc1 > [19041.208082] [] syscall_return_slowpath+0x16e/0x1d2 > [19041.208082] [] entry_SYSCALL_64_fastpath+0xab/0xad > [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 48 > c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 <0f> > 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9 > [19041.208082] RIP [] assfail.constprop.41+0x1c/0x1e > [btrfs] > [19041.208082] RSP > [19041.279264] ---[ end trace 23330586f16f064d ]--- > > This started happening as of kernel 4.8, since commit f3bca8028bd9 > ("Btrfs: add ASSERT for block group's memory leak") introduced these > assertions. > > So fix this by freeing the block groups only after waiting for all > worker kthreads to complete and shutdown the workqueues. This looks good to me, but I don't understand how that could happen, if a block group is being cached by the caching worker thread, the block group cache has been marked as BTRFS_CACHE_STARTED so we should wait on wait_block_group_cache_done() in btrfs_free_block_groups() before getting to the ASSERT. Maybe something else broke? Thanks, -liubo > > Signed-off-by: Filipe Manana > --- > fs/btrfs/disk-io.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 066d9b9..a90e40e 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3985,8 +3985,6 @@ void close_ctree(struct btrfs_fs_info *fs_info) > > btrfs_put_block_group_cache(fs_info); > > - btrfs_free_block_groups(fs_info); > - > /* >* we must make sure there is not any read request to >* submit after we stoppi
Re: [PATCH] Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist
On Mon, Jan 30, 2017 at 12:25:28PM -0800, Liu Bo wrote: > run_delalloc_nocow has used trans in two places where they don't actually need > @trans. > > For btrfs_lookup_file_extent, we search for file extents without COWing > anything, and for btrfs_cross_ref_exist, the only place where we need @trans > is > deferencing it in order to get running_transaction which we could easily get > from the global fs_info. > > Signed-off-by: Liu Bo Reviewed-by: David Sterba Removing the transaction is ok, backed by some patch archeology. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Btrfs: fix assertion failure when freeing block groups at close_ctree()
From: Filipe Manana At close_ctree() we free the block groups and then only after we wait for any running worker kthreads to finish and shutdown the workqueues. This behaviour is racy and it triggers an assertion failure when freeing block groups because while we are doing it we can have for example a block group caching kthread running, and in that case the block group's reference count can still be greater than 1 by the time we assert its reference count is 1, leading to an assertion failure: [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, file: fs/btrfs/extent-tree.c, line: 9799 [19041.200584] [ cut here ] [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418! [19041.202830] invalid opcode: [#1] PREEMPT SMP [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs] [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 4.9.0-rc7-btrfs-next-36+ #1 [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [19041.208082] task: 88015f028980 task.stack: c9000ad34000 [19041.208082] RIP: 0010:[] [] assfail.constprop.41+0x1c/0x1e [btrfs] [19041.208082] RSP: 0018:c9000ad37d60 EFLAGS: 00010286 [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: 0001 [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: 88023431d170 [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: 88015ecb4100 [19041.208082] FS: 7f44f3d42840() GS:88023f38() knlGS: [19041.208082] CS: 0010 DS: ES: CR0: 80050033 [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: 06e0 [19041.208082] Stack: [19041.208082] c9000ad37d98 a035989f 88015ecb4000 88015ecb5630 [19041.208082] 88014f6be000 7ffcf0ba6a10 c9000ad37df8 [19041.208082] a0368cd4 88014e9658e0 c9000ad37e08 811a634d [19041.208082] Call Trace: [19041.208082] [] btrfs_free_block_groups+0x17f/0x392 [btrfs] [19041.208082] [] close_ctree+0x1c5/0x2e1 [btrfs] [19041.208082] [] ? evict_inodes+0x132/0x141 [19041.208082] [] btrfs_put_super+0x15/0x17 [btrfs] [19041.208082] [] generic_shutdown_super+0x6a/0xeb [19041.208082] [] kill_anon_super+0x12/0x1c [19041.208082] [] btrfs_kill_super+0x16/0x21 [btrfs] [19041.208082] [] deactivate_locked_super+0x3b/0x68 [19041.208082] [] deactivate_super+0x36/0x39 [19041.208082] [] cleanup_mnt+0x58/0x76 [19041.208082] [] __cleanup_mnt+0x12/0x14 [19041.208082] [] task_work_run+0x6f/0x95 [19041.208082] [] prepare_exit_to_usermode+0xa3/0xc1 [19041.208082] [] syscall_return_slowpath+0x16e/0x1d2 [19041.208082] [] entry_SYSCALL_64_fastpath+0xab/0xad [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9 [19041.208082] RIP [] assfail.constprop.41+0x1c/0x1e [btrfs] [19041.208082] RSP [19041.279264] ---[ end trace 23330586f16f064d ]--- This started happening as of kernel 4.8, since commit f3bca8028bd9 ("Btrfs: add ASSERT for block group's memory leak") introduced these assertions. So fix this by freeing the block groups only after waiting for all worker kthreads to complete and shutdown the workqueues. Signed-off-by: Filipe Manana --- v2: Make error path of open_ctree() also stop all workers before freeing the block groups. Assert that when freeing block groups, no block group can be in the caching started state. fs/btrfs/disk-io.c | 6 +++--- fs/btrfs/extent-tree.c | 9 ++--- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 066d9b9..bf54d7d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3263,7 +3263,6 @@ int open_ctree(struct super_block *sb, fail_block_groups: btrfs_put_block_group_cache(fs_info); - btrfs_free_block_groups(fs_info); fail_tree_roots: free_root_pointers(fs_info, 1); @@ -3271,6 +3270,7 @@ int open_ctree(struct super_block *sb, fail_sb_buffer: btrfs_stop_all_workers(fs_info); + btrfs_free_block_groups(fs_info); fail_alloc: fail_iput: btrfs_mapping_tree_free(&fs_info->mapping_tree); @@ -3985,8 +3985,6 @@ void close_ctree(struct btrfs_fs_info *fs_info) btrfs_put_block_group_cache(fs_info); -
[PATCH 0/24 RFC] fs: Convert all embedded bdis into separate ones
Hello, this patch series converts all embedded occurences of struct backing_dev_info to use standalone dynamically allocated structures. This makes bdi handling unified across all bdi users and generally removes some boilerplate code from filesystems setting up their own bdi. It also allows us to remove some code from generic bdi implementation. The patches were only compile-tested for most filesystems (I've tested mounting only for NFS & btrfs) so fs maintainers please have a look whether the changes look sound to you. This series is based on top of bdi fixes that were merged into linux-block git tree. Honza -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_head
On Mon, Jan 30, 2017 at 12:24:37PM -0800, Liu Bo wrote: > All we need is @delayed_refs, all callers have get it ahead of calling > btrfs_find_delayed_ref_head since lock needs to be acquired firstly, there is > no reason to deference it again inside the function. > > Signed-off-by: Liu Bo Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Subject: [REVERT][v4.x.y] btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl
Hello, Please consider reverting commit 4c63c2454eff996c5e27991221106eb511f7db38 in the next v4.x.y release. It was included upstream as of v4.7-rc1 This commit introduced a regression, described in the following bug: http://bugs.launchpad.net/bugs/1619918 This new regression was discussed in the following thread: https://lkml.org/lkml/2017/1/6/467 Sincerely, Joseph Salisbury -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: remove unused trans in read_block_for_search
On Mon, Jan 30, 2017 at 12:23:42PM -0800, Liu Bo wrote: > @trans is not used at all, this removes it. > > Signed-off-by: Liu Bo Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/24] btrfs: Convert to separately allocated bdi
Allocate struct backing_dev_info separately instead of embedding it inside superblock. This unifies handling of bdi among users. CC: Chris Mason CC: Josef Bacik CC: David Sterba CC: linux-btrfs@vger.kernel.org Signed-off-by: Jan Kara --- fs/btrfs/ctree.h | 1 - fs/btrfs/disk-io.c | 36 +++- fs/btrfs/super.c | 7 +++ 3 files changed, 14 insertions(+), 30 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 6a823719b6c5..1dc06f66dfcf 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -801,7 +801,6 @@ struct btrfs_fs_info { struct btrfs_super_block *super_for_commit; struct super_block *sb; struct inode *btree_inode; - struct backing_dev_info bdi; struct mutex tree_log_mutex; struct mutex transaction_kthread_mutex; struct mutex cleaner_mutex; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 37a31b12bb0c..b25723e729c0 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1810,21 +1810,6 @@ static int btrfs_congested_fn(void *congested_data, int bdi_bits) return ret; } -static int setup_bdi(struct btrfs_fs_info *info, struct backing_dev_info *bdi) -{ - int err; - - err = bdi_setup_and_register(bdi, "btrfs"); - if (err) - return err; - - bdi->ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_SIZE; - bdi->congested_fn = btrfs_congested_fn; - bdi->congested_data = info; - bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK; - return 0; -} - /* * called by the kthread helper functions to finally call the bio end_io * functions. This is where read checksum verification actually happens @@ -2598,16 +2583,10 @@ int open_ctree(struct super_block *sb, goto fail; } - ret = setup_bdi(fs_info, &fs_info->bdi); - if (ret) { - err = ret; - goto fail_srcu; - } - ret = percpu_counter_init(&fs_info->dirty_metadata_bytes, 0, GFP_KERNEL); if (ret) { err = ret; - goto fail_bdi; + goto fail_srcu; } fs_info->dirty_metadata_batch = PAGE_SIZE * (1 + ilog2(nr_cpu_ids)); @@ -2715,7 +2694,6 @@ int open_ctree(struct super_block *sb, sb->s_blocksize = 4096; sb->s_blocksize_bits = blksize_bits(4096); - sb->s_bdi = &fs_info->bdi; btrfs_init_btree_inode(fs_info); @@ -2912,9 +2890,12 @@ int open_ctree(struct super_block *sb, goto fail_sb_buffer; } - fs_info->bdi.ra_pages *= btrfs_super_num_devices(disk_super); - fs_info->bdi.ra_pages = max(fs_info->bdi.ra_pages, - SZ_4M / PAGE_SIZE); + sb->s_bdi->congested_fn = btrfs_congested_fn; + sb->s_bdi->congested_data = fs_info; + sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK; + sb->s_bdi->ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_SIZE; + sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super); + sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE); sb->s_blocksize = sectorsize; sb->s_blocksize_bits = blksize_bits(sectorsize); @@ -3282,8 +3263,6 @@ int open_ctree(struct super_block *sb, percpu_counter_destroy(&fs_info->delalloc_bytes); fail_dirty_metadata_bytes: percpu_counter_destroy(&fs_info->dirty_metadata_bytes); -fail_bdi: - bdi_destroy(&fs_info->bdi); fail_srcu: cleanup_srcu_struct(&fs_info->subvol_srcu); fail: @@ -4010,7 +3989,6 @@ void close_ctree(struct btrfs_fs_info *fs_info) percpu_counter_destroy(&fs_info->dirty_metadata_bytes); percpu_counter_destroy(&fs_info->delalloc_bytes); percpu_counter_destroy(&fs_info->bio_counter); - bdi_destroy(&fs_info->bdi); cleanup_srcu_struct(&fs_info->subvol_srcu); btrfs_free_stripe_hash_table(fs_info); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index b5ae7d3d1896..08ef08b63132 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1133,6 +1133,13 @@ static int btrfs_fill_super(struct super_block *sb, #endif sb->s_flags |= MS_I_VERSION; sb->s_iflags |= SB_I_CGROUPWB; + + err = super_setup_bdi(sb); + if (err) { + btrfs_err(fs_info, "super_setup_bdi failed"); + return err; + } + err = open_ctree(sb, fs_devices, (char *)data); if (err) { btrfs_err(fs_info, "open_ctree failed"); -- 2.10.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/24] fs: Provide infrastructure for dynamic BDIs in filesystems
Provide helper functions for setting up dynamically allocated backing_dev_info structures for filesystems and cleaning them up on superblock destruction. CC: linux-...@lists.infradead.org CC: linux-...@vger.kernel.org CC: Petr Vandrovec CC: linux-ni...@vger.kernel.org CC: cluster-de...@redhat.com CC: osd-...@open-osd.org CC: codal...@coda.cs.cmu.edu CC: linux-...@lists.infradead.org CC: ecryp...@vger.kernel.org CC: linux-c...@vger.kernel.org CC: ceph-de...@vger.kernel.org CC: linux-btrfs@vger.kernel.org CC: v9fs-develo...@lists.sourceforge.net CC: lustre-de...@lists.lustre.org Signed-off-by: Jan Kara --- fs/super.c | 49 include/linux/backing-dev-defs.h | 2 +- include/linux/fs.h | 6 + 3 files changed, 56 insertions(+), 1 deletion(-) diff --git a/fs/super.c b/fs/super.c index ea662b0e5e78..31dc4c6450ef 100644 --- a/fs/super.c +++ b/fs/super.c @@ -446,6 +446,11 @@ void generic_shutdown_super(struct super_block *sb) hlist_del_init(&sb->s_instances); spin_unlock(&sb_lock); up_write(&sb->s_umount); + if (sb->s_iflags & SB_I_DYNBDI) { + bdi_put(sb->s_bdi); + sb->s_bdi = &noop_backing_dev_info; + sb->s_iflags &= ~SB_I_DYNBDI; + } } EXPORT_SYMBOL(generic_shutdown_super); @@ -1249,6 +1254,50 @@ mount_fs(struct file_system_type *type, int flags, const char *name, void *data) } /* + * Setup private BDI for given superblock. I gets automatically cleaned up + * in generic_shutdown_super(). + */ +int super_setup_bdi_name(struct super_block *sb, char *fmt, ...) +{ + struct backing_dev_info *bdi; + int err; + va_list args; + + bdi = bdi_alloc(GFP_KERNEL); + if (!bdi) + return -ENOMEM; + + bdi->name = sb->s_type->name; + + va_start(args, fmt); + err = bdi_register_va(bdi, NULL, fmt, args); + va_end(args); + if (err) { + bdi_put(bdi); + return err; + } + WARN_ON(sb->s_bdi != &noop_backing_dev_info); + sb->s_bdi = bdi; + sb->s_iflags |= SB_I_DYNBDI; + + return 0; +} +EXPORT_SYMBOL(super_setup_bdi_name); + +/* + * Setup private BDI for given superblock. I gets automatically cleaned up + * in generic_shutdown_super(). + */ +int super_setup_bdi(struct super_block *sb) +{ + static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0); + + return super_setup_bdi_name(sb, "%.28s-%ld", sb->s_type->name, + atomic_long_inc_return(&bdi_seq)); +} +EXPORT_SYMBOL(super_setup_bdi); + +/* * This is an internal function, please use sb_end_{write,pagefault,intwrite} * instead. */ diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 2ecafc8a2d06..70080b4217f4 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -143,7 +143,7 @@ struct backing_dev_info { congested_fn *congested_fn; /* Function pointer if device is md/dm */ void *congested_data; /* Pointer to aux data for congested func */ - char *name; + const char *name; struct kref refcnt; /* Reference counter for the structure */ unsigned int registered:1; /* Is bdi registered? */ diff --git a/include/linux/fs.h b/include/linux/fs.h index c930cbc19342..8ed8b6d1bc54 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1267,6 +1267,9 @@ struct mm_struct; /* sb->s_iflags to limit user namespace mounts */ #define SB_I_USERNS_VISIBLE0x0010 /* fstype already mounted */ +/* Temporary flag until all filesystems are converted to dynamic bdis */ +#define SB_I_DYNBDI0x0100 + /* Possible states of 'frozen' field */ enum { SB_UNFROZEN = 0,/* FS is unfrozen */ @@ -2103,6 +2106,9 @@ extern int vfs_ustat(dev_t, struct kstatfs *); extern int freeze_super(struct super_block *super); extern int thaw_super(struct super_block *super); extern bool our_mnt(struct vfsmount *mnt); +extern __printf(2, 3) +int super_setup_bdi_name(struct super_block *sb, char *fmt, ...); +extern int super_setup_bdi(struct super_block *sb); extern int current_umask(void); -- 2.10.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
On 2017-02-02 09:25, Adam Borowski wrote: On Thu, Feb 02, 2017 at 07:49:50AM -0500, Austin S. Hemmelgarn wrote: This is a severe bug that makes a not all that uncommon (albeit bad) use case fail completely. The fix had no dependencies itself and I don't see what's bad in mounting a RAID degraded. Yeah, it provides no redundancy but that's no worse than using a single disk from the start. And most people not doing storage/server farm don't have a stack of spare disks at hand, so getting a replacement might take a while. Running degraded is bad. Period. If you don't have a disk on hand to replace the failed one (and if you care about redundancy, you should have at least one spare on hand), you should be converting to a single disk, not continuing to run in degraded mode until you get a new disk. The moment you start talking about running degraded long enough that you will be _booting_ the system with the array degraded, you need to be converting to a single disk. This is of course impractical for something like a hardware array or an LVM volume, but it's _trivial_ with BTRFS, and protects you from all kinds of bad situations that can't happen with a single disk but can completely destroy the filesystem if it's a degraded array. Running a single disk is not exactly the same as running a degraded array, it's actually marginally safer (even if you aren't using dup profile for metadata) because there are fewer moving parts to go wrong. It's also exponentially more efficient. Being able to continue to run when a disk fails is the whole point of RAID -- despite what some folks think, RAIDs are not for backups but for uptime. And if your uptime goes to hell because the moment a disk fails you need to drop everything and replace the disk immediately, why would you use RAID? Because just replacing a disk and rebuilding the array is almost always much cheaper in terms of time than rebuilding the system from a backup. IOW, even if you have to drop everything and replace the disk immediately, it's still less time consuming than restoring from a backup. It also has the advantage that you don't lose any data. I /thought/ the immediate benefit was obvious enough that it would be mainline-merged by now, not hoovered-up into some long-term project with no real hint as to /when/ it might be merged. Oh, well... I think (although I'm not sure about it) that this: http://www.spinics.net/lists/linux-btrfs/msg47283.html is the first posting of the patch series. Is there a more recent version somewhere? Mechanically rebasing+resolving conflicts doesn't work, I'd need to do a more involved refresh, which would be a waste of time if it's already done by someone with an actual clue about this code. There may be, but I haven't looked very far. Qu would probably be the person to ask. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
On Thu, Feb 02, 2017 at 07:49:50AM -0500, Austin S. Hemmelgarn wrote: > This is a severe bug that makes a not all that uncommon (albeit bad) use > case fail completely. The fix had no dependencies itself and I don't see what's bad in mounting a RAID degraded. Yeah, it provides no redundancy but that's no worse than using a single disk from the start. And most people not doing storage/server farm don't have a stack of spare disks at hand, so getting a replacement might take a while. Being able to continue to run when a disk fails is the whole point of RAID -- despite what some folks think, RAIDs are not for backups but for uptime. And if your uptime goes to hell because the moment a disk fails you need to drop everything and replace the disk immediately, why would you use RAID? > > I /thought/ the immediate benefit was obvious enough that it > > would be mainline-merged by now, not hoovered-up into some long-term > > project with no real hint as to /when/ it might be merged. Oh, well... > I think (although I'm not sure about it) that this: > http://www.spinics.net/lists/linux-btrfs/msg47283.html > is the first posting of the patch series. Is there a more recent version somewhere? Mechanically rebasing+resolving conflicts doesn't work, I'd need to do a more involved refresh, which would be a waste of time if it's already done by someone with an actual clue about this code. Meow! -- Autotools hint: to do a zx-spectrum build on a pdp11 host, type: ./configure --host=zx-spectrum --build=pdp11 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs receive leaves new subvolume modifiable during operation
On 2017-02-02 05:52, Graham Cobb wrote: On 02/02/17 00:02, Duncan wrote: If it's a workaround, then many of the Linux procedures we as admins and users use every day are equally workarounds. Setting 007 perms on a dir that doesn't have anything immediately security vulnerable in it, simply to keep other users from even potentially seeing or being able to write to something N layers down the subdir tree, is standard practice. No. There is no need to normally place a read-only snapshot below a no-execute directory just to prevent write access to it. That is not part of the admin's expectation. That depends on the admin. I for example would absolutely expect that to be needed _while the snapshot is being created_ if the operation isn't being done by the kernel. Which is my point. This is no different than standard security practice, that an admin should be familiar with and using without even having to think about it. Btrfs is simply making the same assumptions that everyone else does, that an admin knows what they are doing and sets the upstream permissions with that in mind. If they don't, how is that btrfs' fault? Because btrfs intends the receive snapshot to be read-only. That is the expectation of the sysadmin. It is an important and useful feature which makes send/receive very useful for creating user-readable-but-not-modifiable backups (without it, send/receive are useful for many things but less useful for creating backups). That feature has a bug. If that's your use case, then from a consistency perspective, you should be receiving into a location the user can't read and then moving the subvolume into a user readable location once receive is done anyway. Otherwise, they may see a partial snapshot, and think something has been deleted that really hasn't. This is a pretty basic method of ensuring consistency that's used literally all over the place on UNIX systems to atomically replace or update data sets. Doing so also cleanly avoids needing to manipulate permissions on the directory the snapshots are in to prevent users from modifying the snapshots being received. Just because you don't personally use the feature, doesn't mean it isn't a bug! Many of us do rely on that feature. Even though it is security-related, I agree it isn't the highest priority btrfs bug. It can probably wait until receive is being worked on for other reasons. But if it isn't going to be fixed any time soon, it should be documented in the Wiki and the man page, with the suggested workround for anyone who needs to make sure the receive won't be tampered with. I agree that it should be documented. I don't agree that a specific workaround needs to be stated, as whether or not any particular workaround is needed is entirely dependent on the use case. For example, if your users can only access the received snapshots over a networked filesystem, there's a much simpler option of just exporting the directories the snapshots are received into as read-only. All that needs to be stated is that the way to prevent this is to make sure users can't write to the snapshots while they're being received. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
On 2017-02-01 17:48, Duncan wrote: Adam Borowski posted on Wed, 01 Feb 2017 12:55:30 +0100 as excerpted: On Wed, Feb 01, 2017 at 05:23:16AM +, Duncan wrote: Hans Deragon posted on Tue, 31 Jan 2017 21:51:22 -0500 as excerpted: But the current scenario makes it difficult for me to put redundancy back into service! How much time did I waited until I find the mailing list, subscribe to it, post my email and get an answer? Wouldn't it be better if the user could actually add the disk at anytime, mostly ASAP? And to fix this, I have to learn how to patch and compile the kernel. I have not done this since the beginning of the century. More delays, more risk added to the system (what if I compile the kernel with the wrong parameters?). The patch fixing the problem and making return from degraded not the one- shot thing it tends to be now will eventually be merged Not anything like the one I posted to this thread -- this one merely removes a check that can't handle this particular (common) case of an otherwise healthy RAID that lost one device then was mounted degraded twice. We need instead a better check, one that sees whether every block group is present. This can be done quite easily, as as far as I know, the list of block group is at that moment fully present in memory, but someone actually has to code that, and I for one don't know btrfs internals (yet?). I didn't mention it because you spared me the trouble with your hack- patch that did the current job, but FWIW, there's actually a patch that does per-chunk testing as you indicate, but it got merged into a longer running feature-add project (hot-spares, IIRC), and thus isn't likely to be mainline-merged until that project is merged. But who knows when that might be? Could be years before it is considered ready. Meanwhile, perhaps it's simply because I'm not a dev and am not appreciating the complexities of some detail or other, but as demonstrated by the people who have local-merged that patch to get out of just this sort of jam, as well as the continuing saga of more and more people appearing here with the same problem, it could be an arguably high priority fix on its own, and should have been reviewed and ultimately mainline-merged on its own merits, instead of being stuck in someone's feature-add project queue for potentially years, while more and more people who could have definitely used it have to either suffer without it or go and find and local-merge it themselves. Even if this feature is critical to the longer term feature, merge of this little one now would make the final patch set for the longer term feature that much smaller. Agreed, it should have been mainlined. I have no issue with the hot-spare patches depending on it, but it's a severe bug. But that's a btrfs-using sysadmin's viewpoint, not a developer viewpoint, and it's the developer's doing the work, so they get to define when and how it gets merged, and us non-devs must either simply live with it, or if the circumstances allow, fund some dev to have our specific interests as their priority and take care of it for us. I don't care in this case if I draw some flak from the developers, but this particular developer viewpoint is wrong. If this was a commercial software product, the person responsible would at least be facing some serious reprimand, and depending on the company, possibly would be out of a job. This is a severe bug that makes a not all that uncommon (albeit bad) use case fail completely. The fix had no dependencies itself and Meanwhile, perhaps I should have bookmarked that patch at least as it appeared on-list, but I didn't, so while I know it exists, I too would have to go looking to actually find it, should I end up needing it. In my defense, I /thought/ the immediate benefit was obvious enough that it would be mainline-merged by now, not hoovered-up into some long-term project with no real hint as to /when/ it might be merged. Oh, well... I think (although I'm not sure about it) that this: http://www.spinics.net/lists/linux-btrfs/msg47283.html is the first posting of the patch series. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists
On Sunday 28 August 2016 15:29:08 Kai Krakow wrote: > Hello list! Hi list > It happened again. While using VirtualBox the following crash happened, > btrfs check found a lot of errors which it couldn't repair. Earlier > that day my system crashed which may already introduced errors into my > filesystem. Apparently, I couldn't create an image (not enough space > available), I only can give this trace from dmesg: > > [44819.903435] [ cut here ] > [44819.903443] WARNING: CPU: 3 PID: 2787 at fs/btrfs/extent-tree.c:2963 > btrfs_run_delayed_refs+0x26c/0x290 [44819.903444] BTRFS: Transaction > aborted (error -17) > [44819.903445] Modules linked in: nls_iso8859_15 nls_cp437 vfat fat fuse > rfcomm veth af_packet ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat > nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp > llc w83627ehf bnep hwmon_vid cachefiles btusb btintel bluetooth > snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic > snd_hda_intel snd_hda_codec rfkill snd_hwdep snd_hda_core snd_pcm snd_timer > coretemp hwmon snd r8169 mii kvm_intel kvm iTCO_wdt iTCO_vendor_support > rtc_cmos irqbypass soundcore ip_tables uas usb_storage nvidia_drm(PO) > vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO) > nvidia(PO) efivarfs unix ipv6 [44819.903484] CPU: 3 PID: 2787 Comm: > BrowserBlocking Tainted: P O4.7.2-gentoo #2 [44819.903485] > Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3, BIOS > L2.16A 02/22/2013 [44819.903487] 8130af2d > 8800b7d03d20 [44819.903489] 810865fa > 880409374428 8800b7d03d70 8803bf299760 [44819.903491] > ffef 8803f677f000 8108666a > [44819.903493] Call Trace: > [44819.903496] [] ? dump_stack+0x46/0x59 > [44819.903500] [] ? __warn+0xba/0xe0 > [44819.903502] [] ? warn_slowpath_fmt+0x4a/0x50 > [44819.903504] [] ? btrfs_run_delayed_refs+0x26c/0x290 > [44819.903507] [] ? btrfs_release_path+0xe/0x80 > [44819.903509] [] ? > btrfs_start_dirty_block_groups+0x2da/0x420 [44819.903511] > [] ? btrfs_commit_transaction+0x143/0x990 [44819.903514] > [] ? kmem_cache_free+0x165/0x180 [44819.903516] > [] ? btrfs_wait_ordered_range+0x7c/0x110 [44819.903518] > [] ? btrfs_sync_file+0x286/0x360 [44819.903522] > [] ? do_fsync+0x33/0x60 > [44819.903524] [] ? SyS_fdatasync+0xa/0x10 > [44819.903528] [] ? entry_SYSCALL_64_fastpath+0x13/0x8f > [44819.903529] ---[ end trace 6944811e170a0e57 ]--- > [44819.903531] BTRFS: error (device bcache2) in btrfs_run_delayed_refs:2963: > errno=-17 Object already exists [44819.903533] BTRFS info (device bcache2): > forced readonly I got the same error myself, with this stack trace: -- Logs begin at Fr 2016-04-01 17:07:28 CEST, end at Mi 2017-02-01 22:03:57 CET. -- Feb 01 01:46:26 diefledermaus kernel: [ cut here ] Feb 01 01:46:26 diefledermaus kernel: WARNING: CPU: 1 PID: 16727 at fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0x278/0x2b0 Feb 01 01:46:26 diefledermaus kernel: BTRFS: Transaction aborted (error -17) Feb 01 01:46:26 diefledermaus kernel: BTRFS: error (device sdb2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists Feb 01 01:46:27 diefledermaus kernel: BTRFS info (device sdb2): forced readonly Feb 01 01:46:27 diefledermaus kernel: Modules linked in: msr ctr ccm tun arc4 snd_hda_codec_idt applesmc snd_hda_codec_generic input_polldev hwmon snd_hda_intel ath5k snd_hda_codec mac80211 snd_hda_core ath snd_pcm cfg80211 snd_timer video acpi_cpufreq snd backlight sky2 rfkill processor button soundcore sg usb_storage sr_mod cdrom ata_generic pata_acpi uhci_hcd ahci libahci ata_piix libata ehci_pci ehci_hcd Feb 01 01:46:27 diefledermaus kernel: CPU: 1 PID: 16727 Comm: kworker/u4:0 Not tainted 4.9.6-gentoo #1 Feb 01 01:46:27 diefledermaus kernel: Hardware name: Apple Inc. Macmini2,1/Mac-F4208EAA, BIOS MM21.88Z.009A.B00.0706281359 06/28/07 Feb 01 01:46:27 diefledermaus kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper Feb 01 01:46:27 diefledermaus kernel: 812cf739 c9000285fd60 Feb 01 01:46:27 diefledermaus kernel: 8104908a 8800428df1e0 c9000285fdb0 0020 Feb 01 01:46:27 diefledermaus kernel: 880003c1b1b8 8800bb73e900 810490fa Feb 01 01:46:27 diefledermaus kernel: Call Trace: Feb 01 01:46:27 diefledermaus kernel: [] ? dump_stack+0x46/0x5d Feb 01 01:46:27 diefledermaus kernel: [] ? __warn+0xba/0xe0 Feb 01 01:46:27 diefledermaus kernel: [] ? warn_slowpath_fmt+0x4a/0x50 Feb 01 01:46:27 diefledermaus kernel: [] ? btrfs_run_delayed_refs+0x278/0x2b0 Feb 01 01:46:27 diefledermaus kernel: [] ? delayed_ref_async_start+0x84/0xa0 Feb 01 01:46:27 diefledermaus kernel: [] ? process_one_work+0x126/0x310 Feb 01 01:46:27 diefledermaus kernel: [] ? pwq_activate_delayed_work+0x3
Re: btrfs receive leaves new subvolume modifiable during operation
On 02/02/17 00:02, Duncan wrote: > If it's a workaround, then many of the Linux procedures we as admins and > users use every day are equally workarounds. Setting 007 perms on a dir > that doesn't have anything immediately security vulnerable in it, simply > to keep other users from even potentially seeing or being able to write > to something N layers down the subdir tree, is standard practice. No. There is no need to normally place a read-only snapshot below a no-execute directory just to prevent write access to it. That is not part of the admin's expectation. > Which is my point. This is no different than standard security practice, > that an admin should be familiar with and using without even having to > think about it. Btrfs is simply making the same assumptions that > everyone else does, that an admin knows what they are doing and sets the > upstream permissions with that in mind. If they don't, how is that > btrfs' fault? Because btrfs intends the receive snapshot to be read-only. That is the expectation of the sysadmin. It is an important and useful feature which makes send/receive very useful for creating user-readable-but-not-modifiable backups (without it, send/receive are useful for many things but less useful for creating backups). That feature has a bug. Just because you don't personally use the feature, doesn't mean it isn't a bug! Many of us do rely on that feature. Even though it is security-related, I agree it isn't the highest priority btrfs bug. It can probably wait until receive is being worked on for other reasons. But if it isn't going to be fixed any time soon, it should be documented in the Wiki and the man page, with the suggested workround for anyone who needs to make sure the receive won't be tampered with. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] btrfs-progs: fsck-tests: verify 'btrfs check --repair' fixes corrupted nlink field
Signed-off-by: Lakshmipathi.G --- tests/fsck-tests/026-check-inode-link/test.sh | 30 +++ 1 file changed, 30 insertions(+) create mode 100755 tests/fsck-tests/026-check-inode-link/test.sh diff --git a/tests/fsck-tests/026-check-inode-link/test.sh b/tests/fsck-tests/026-check-inode-link/test.sh new file mode 100755 index 000..6822ee2 --- /dev/null +++ b/tests/fsck-tests/026-check-inode-link/test.sh @@ -0,0 +1,30 @@ +#!/bin/bash +# verify that 'btrfs check --repair' fixes corrupted inode nlink field + +source $TOP/tests/common + +check_prereq btrfs-corrupt-block +check_prereq mkfs.btrfs + +setup_root_helper +prepare_test_dev 512M + +test_inode_nlink_field() +{ + run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $TEST_DEV + + run_check_mount_test_dev + run_check $SUDO_HELPER touch $TEST_MNT/test_nlink.txt + + # find inode_number + inode_number=`stat -c%i $TEST_MNT/test_nlink.txt` + run_check_umount_test_dev + + # corrupt nlink field of inode object +run_check $SUDO_HELPER $TOP/btrfs-corrupt-block -i $inode_number \ + -f nlink $TEST_DEV + + check_image $TEST_DEV +} + +test_inode_nlink_field -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html