Another 4.8-rc locked splat: btrfs_close_devices()
Hello, This one seems to have appeared after Anand's commit 142388194191 ("btrfs: do not background blkdev_put()") got merged into 4.8-rc4. Thanks, Ilya [ 983.284212] == [ 983.290401] [ INFO: possible circular locking dependency detected ] [ 983.296677] 4.8.0-rc5-ceph-00023-g1b39cec2 #1 Not tainted [ 983.302081] --- [ 983.308357] umount/21720 is trying to acquire lock: [ 983.313243] (>bd_mutex){+.+.+.}, at: [] blkdev_put+0x31/0x150 [ 983.321264] [ 983.321264] but task is already holding lock: [ 983.327101] (_devs->device_list_mutex){+.+...}, at: [] __btrfs_close_devices+0x46/0x200 [btrfs] [ 983.337839] [ 983.337839] which lock already depends on the new lock. [ 983.337839] [ 983.346024] [ 983.346024] the existing dependency chain (in reverse order) is: [ 983.353512] -> #4 (_devs->device_list_mutex){+.+...}: [ 983.359096][] lock_acquire+0x1bc/0x1f0 [ 983.365143][] mutex_lock_nested+0x65/0x350 [ 983.371521][] btrfs_show_devname+0x36/0x1f0 [btrfs] [ 983.378710][] show_vfsmnt+0x4e/0x150 [ 983.384593][] m_show+0x17/0x20 [ 983.389957][] seq_read+0x2b5/0x3b0 [ 983.395669][] __vfs_read+0x28/0x100 [ 983.401464][] vfs_read+0xab/0x150 [ 983.407080][] SyS_read+0x52/0xb0 [ 983.412609][] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.419617] -> #3 (namespace_sem){++}: [ 983.424024][] lock_acquire+0x1bc/0x1f0 [ 983.430074][] down_write+0x49/0x80 [ 983.435785][] lock_mount+0x67/0x1c0 [ 983.441582][] do_add_mount+0x32/0xf0 [ 983.447458][] finish_automount+0x5a/0xc0 [ 983.453682][] follow_managed+0x1b3/0x2a0 [ 983.459912][] lookup_fast+0x300/0x350 [ 983.465875][] path_openat+0x3a7/0xaa0 [ 983.471846][] do_filp_open+0x85/0xe0 [ 983.477731][] do_sys_open+0x14c/0x1f0 [ 983.483702][] SyS_open+0x1e/0x20 [ 983.489240][] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.496254] -> #2 (>s_type->i_mutex_key#3){+.+.+.}: [ 983.501798][] lock_acquire+0x1bc/0x1f0 [ 983.507855][] down_write+0x49/0x80 [ 983.513558][] start_creating+0x87/0x100 [ 983.519703][] debugfs_create_dir+0x17/0x100 [ 983.526195][] bdi_register+0x93/0x210 [ 983.532165][] bdi_register_owner+0x43/0x70 [ 983.538570][] device_add_disk+0x1fb/0x450 [ 983.544888][] loop_add+0x1e6/0x290 [ 983.550596][] loop_init+0x10b/0x14f [ 983.556394][] do_one_initcall+0xa7/0x180 [ 983.562618][] kernel_init_freeable+0x1cc/0x266 [ 983.569370][] kernel_init+0xe/0x100 [ 983.575166][] ret_from_fork+0x1f/0x40 [ 983.581131] -> #1 (loop_index_mutex){+.+.+.}: [ 983.585801][] lock_acquire+0x1bc/0x1f0 [ 983.591858][] mutex_lock_nested+0x65/0x350 [ 983.598256][] lo_open+0x1f/0x60 [ 983.603704][] __blkdev_get+0x123/0x400 [ 983.609757][] blkdev_get+0x34a/0x350 [ 983.615639][] blkdev_open+0x64/0x80 [ 983.621428][] do_dentry_open+0x1c6/0x2d0 [ 983.627651][] vfs_open+0x69/0x80 [ 983.633181][] path_openat+0x834/0xaa0 [ 983.639152][] do_filp_open+0x85/0xe0 [ 983.645035][] do_sys_open+0x14c/0x1f0 [ 983.650999][] SyS_open+0x1e/0x20 [ 983.656535][] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.663541] -> #0 (>bd_mutex){+.+.+.}: [ 983.668107][] __lock_acquire+0x1003/0x17b0 [ 983.674510][] lock_acquire+0x1bc/0x1f0 [ 983.680561][] mutex_lock_nested+0x65/0x350 [ 983.686967][] blkdev_put+0x31/0x150 [ 983.692761][] btrfs_close_bdev+0x4f/0x60 [btrfs] [ 983.699699][] __btrfs_close_devices+0xcb/0x200 [btrfs] [ 983.707178][] btrfs_close_devices+0x2b/0xa0 [btrfs] [ 983.714380][] close_ctree+0x265/0x340 [btrfs] [ 983.721061][] btrfs_put_super+0x19/0x20 [btrfs] [ 983.727908][] generic_shutdown_super+0x6f/0x100 [ 983.734744][] kill_anon_super+0x16/0x30 [ 983.740888][] btrfs_kill_super+0x1e/0x130 [btrfs] [ 983.747909][] deactivate_locked_super+0x49/0x80 [ 983.754745][] deactivate_super+0x5d/0x70 [ 983.760977][] cleanup_mnt+0x5c/0x80 [ 983.766773][] __cleanup_mnt+0x12/0x20 [ 983.772738][] task_work_run+0x7e/0xc0 [ 983.778708][] exit_to_usermode_loop+0x7e/0xb4 [ 983.785373][] syscall_return_slowpath+0xbb/0xd0 [ 983.792212][] entry_SYSCALL_64_fastpath+0xbf/0xc1 [ 983.799225] [ 983.799225] other info that might help us debug this: [ 983.799225] [ 983.807291] Chain exists of: >bd_mutex --> namespace_sem --> _devs->device_list_mutex [ 983.816521] Possible unsafe locking scenario: [ 983.816521] [ 983.822489]CPU0CPU1 [ 983.827043]
Possible recursive locking on btrfs_inode::log_mutex
Hi, I'm seeing the attached lockdep splat on 4.8-rc2 - I'm guessing it got introduced in 4.8-rc. It appears to be very easy to trigger and shows up within seconds after mkfs + mount. Thanks, Ilya kernel: = kernel: [ INFO: possible recursive locking detected ] kernel: 4.8.0-rc2-ceph-00297-gab9f8d2 #1 Not tainted kernel: - kernel: ceph-osd/25016 is trying to acquire lock: kernel: (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x17e/0xb50 [btrfs] kernel: #012but task is already holding lock: kernel: (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x17e/0xb50 [btrfs] kernel: #012other info that might help us debug this: kernel: Possible unsafe locking scenario: kernel: CPU0 kernel: kernel: lock(>log_mutex); kernel: lock(>log_mutex); kernel: #012 *** DEADLOCK *** kernel: May be due to missing lock nesting notation kernel: 6 locks held by ceph-osd/25016: kernel: #0: (sb_writers#15){.+.+.+}, at: [] __sb_start_write+0xad/0xe0 kernel: #1: (>i_mutex_dir_key#4/1){+.+.+.}, at: [] lock_rename+0x2e/0x100 kernel: #2: (>s_type->i_mutex_key#19){+.+.+.}, at: [] lock_two_nondirectories+0x3e/0x70 kernel: #3: (>s_type->i_mutex_key#19/4){+.+...}, at: [] lock_two_nondirectories+0x66/0x70 kernel: #4: (sb_internal#2){.+.+.+}, at: [] __sb_start_write+0x7b/0xe0 kernel: #5: (>log_mutex){+.+...}, at: [] btrfs_log_inode+0x17e/0xb50 [btrfs] kernel: #012stack backtrace: kernel: CPU: 0 PID: 25016 Comm: ceph-osd Not tainted 4.8.0-rc2-ceph-00297-gab9f8d2 #1 kernel: Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.0c 02/25/2010 kernel: 0086 668dbca4 88042a277870 81413e51 kernel: 828f4630 828f4630 88042a277930 811069c0 kernel: dfc5c151 8804 4efe3813430e32cf 880429f38000 kernel: Call Trace: kernel: [] dump_stack+0x85/0xc4 kernel: [] __lock_acquire+0xce0/0x1680 kernel: [] ? mutex_unlock+0xe/0x10 kernel: [] lock_acquire+0x197/0x1f0 kernel: [] ? btrfs_log_inode+0x17e/0xb50 [btrfs] kernel: [] mutex_lock_nested+0x79/0x370 kernel: [] ? btrfs_log_inode+0x17e/0xb50 [btrfs] kernel: [] btrfs_log_inode+0x17e/0xb50 [btrfs] kernel: [] ? __might_sleep+0x70/0x90 kernel: [] ? btrfs_i_callback+0x20/0x20 [btrfs] kernel: [] ? btrfs_invalidate_inodes+0x198/0x1f0 [btrfs] kernel: [] ? _raw_spin_unlock+0x27/0x40 kernel: [] btrfs_log_inode+0x534/0xb50 [btrfs] kernel: [] ? trace_hardirqs_on_caller+0x178/0x1d0 kernel: [] btrfs_log_inode_parent+0x881/0x970 [btrfs] kernel: [] ? _raw_spin_unlock+0x27/0x40 kernel: [] ? btrfs_update_inode+0xd7/0x110 [btrfs] kernel: [] btrfs_log_new_name+0x7b/0x80 [btrfs] kernel: [] btrfs_rename2+0xfd2/0x16c0 [btrfs] kernel: [] ? lock_two_nondirectories+0x66/0x70 kernel: [] ? down_write_nested+0x4f/0x80 kernel: [] ? lock_two_nondirectories+0x66/0x70 kernel: [] vfs_rename+0x467/0x830 kernel: [] ? read_seqcount_begin+0x90/0xa0 kernel: [] SyS_rename+0x1ee/0x3c0 kernel: [] do_syscall_64+0x7a/0x1c0 kernel: [] entry_SYSCALL64_slow_path+0x25/0x25
Re: Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held
On Wed, Mar 30, 2016 at 7:25 PM, Liu Bo <bo.li@oracle.com> wrote: > On Wed, Mar 30, 2016 at 05:24:04PM +0200, Ilya Dryomov wrote: >> Hi, >> >> We are hitting the attached lockup on a somewhat regular basis during >> nightly tests. Looks like a bunch of CPUs spin in find_free_extent() >> on btrfs_free_cluster::lock, which is held by writer, who seems to be >> stuck in an endless loop in btrfs_find_space_cluster(), trying to >> cleanup bitmaps list. Smells like a list corruption to me? > > My objdump shows that find_free_extent() may wait on > down_read(_info->groups_sem); It's spinning on a spinlock: 6228 if (last_ptr) { 6229 spin_lock(_ptr->lock); 6230 if (last_ptr->block_group) > > One possible thing is that there're too many entries in bitmap, and > list_for_each_entry just is just stuck there. I don't think so - look at the two journal_write splats. > > Are these stacks from "Blocked more than 120s"? No, these are all soft lockups. Sorry, I had to edit it to make it readable and concentrated on stack traces. I guess "BUG: soft lockup" were mixed up with modules and object code all over the place. Thanks, Ilya -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held
Hi, We are hitting the attached lockup on a somewhat regular basis during nightly tests. Looks like a bunch of CPUs spin in find_free_extent() on btrfs_free_cluster::lock, which is held by writer, who seems to be stuck in an endless loop in btrfs_find_space_cluster(), trying to cleanup bitmaps list. Smells like a list corruption to me? The kernel is ancient in btrfs terms - ubuntu's 3.13.0-83-generic, but the surroundings look sufficiently similar to upstream and given recent patches like 1b9b922a3a60 ("Btrfs: check for empty bitmap list in setup_cluster_bitmaps") I thought this might be relevant for upstream. Thanks, Ilya [74750.641965] CPU: 6 PID: 12768 Comm: btrfs-transacti Not tainted 3.13.0-83-generic #127-Ubuntu [74750.650976] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015 [74750.658951] task: 88074b0e8000 ti: 88083c564000 task.ti: 88083c564000 [74750.666936] RIP: 0010:[] [] _raw_spin_lock+0x37/0x50 [74750.675645] RSP: 0018:88083c565b50 EFLAGS: 0206 [74750.681469] RAX: 18ca RBX: 88083c565b48 RCX: 09a0 [74750.689126] RDX: 09a6 RSI: 09a6 RDI: 88084f6671b0 [74750.696784] RBP: 88083c565b50 R08: 88083c565cef R09: 0001 [74750.704466] R10: a019c9e6 R11: ea001ff7f6c0 R12: 8807e829d4e0 [74750.712140] R13: a02073ab R14: 88083c565ae0 R15: 8807e829d4e0 [74750.719811] FS: () GS:88087fd8() knlGS: [74750.728450] CS: 0010 DS: ES: CR0: 80050033 [74750.734744] CR2: 7f4d2b6a9060 CR3: 01c0e000 CR4: 001407e0 [74750.742437] Stack: [74750.745001] 88083c565c28 a01b07a3 88074873a000 880530131d20 [74750.753033] 880791982000 88083c565cef 1000 00040002 [74750.761065] a01a3a97 0020 880523f4a800 0001 [74750.769105] Call Trace: [74750.772137] [] find_free_extent+0x213/0xc30 [btrfs] [74750.779259] [] ? btrfs_del_items+0x367/0x470 [btrfs] [74750.786476] [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs] [74750.793869] [] __btrfs_prealloc_file_range+0xe5/0x380 [btrfs] [74750.801876] [] btrfs_prealloc_file_range_trans+0x30/0x40 [btrfs] [74750.810136] [] btrfs_write_dirty_block_groups+0x4d3/0x620 [btrfs] [74750.818469] [] commit_cowonly_roots+0x151/0x213 [btrfs] [74750.825940] [] btrfs_commit_transaction+0x483/0x970 [btrfs] [74750.833765] [] transaction_kthread+0x1b5/0x240 [btrfs] [74750.841156] [] ? btrfs_cleanup_transaction+0x550/0x550 [btrfs] [74750.849244] [] kthread+0xd2/0xf0 [74750.854729] [] ? kthread_create_on_node+0x1c0/0x1c0 [74750.861861] [] ret_from_fork+0x58/0x90 [74750.867868] [] ? kthread_create_on_node+0x1c0/0x1c0 [74758.651947] CPU: 5 PID: 13299 Comm: kworker/u16:1 Not tainted 3.13.0-83-generic #127-Ubuntu [74758.662554] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015 [74758.672309] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-15) [74758.681114] task: 880734153000 ti: 88078ad98000 task.ti: 88078ad98000 [74758.690802] RIP: 0010:[] [] _raw_spin_lock+0x32/0x50 [74758.701202] RSP: 0018:88078ad99728 EFLAGS: 0202 [74758.708662] RAX: 4bf5 RBX: 811f0610 RCX: 09a0 [74758.717910] RDX: 09a4 RSI: 09a4 RDI: 88084f6671b0 [74758.727114] RBP: 88078ad99728 R08: 88078ad998b7 R09: 0001 [74758.736311] R10: R11: ea001b38bc00 R12: 88084e918770 [74758.745503] R13: 88084e9188c0 R14: 8114f8ee R15: 88078ad99698 [74758.754665] FS: () GS:88087fd4() knlGS: [74758.764805] CS: 0010 DS: ES: CR0: 80050033 [74758.772582] CR2: 7f4d2b695000 CR3: 01c0e000 CR4: 001407e0 [74758.781766] Stack: [74758.785786] 88078ad99800 a01b07a3 8807e6bc5000 880853034e70 [74758.795307] 88078ad997b8 88078ad998b7 880853034e40 00040002 [74758.804838] 88078ad99778 0020 880523f4a800 0001 [74758.814337] Call Trace: [74758.818811] [] find_free_extent+0x213/0xc30 [btrfs] [74758.827389] [] ? alloc_extent_state+0x21/0xc0 [btrfs] [74758.836147] [] ? __lookup_extent_mapping+0xa0/0x150 [btrfs] [74758.845413] [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs] [74758.854224] [] cow_file_range+0x135/0x430 [btrfs] [74758.862593] [] run_delalloc_range+0x312/0x350 [btrfs] [74758.871310] [] ? find_lock_delalloc_range.constprop.43+0x1b9/0x1f0 [btrfs] [74758.881857] [] ? ata_scsi_queuecmd+0x133/0x400 [74758.889974] [] __extent_writepage+0x2f4/0x760 [btrfs] [74758.898703] [] ? btrfs_add_delayed_iput+0x61/0xc0 [btrfs] [74758.907763] [] ? __blk_run_queue+0x33/0x40 [74758.915511] [] ? find_get_pages_tag+0xd1/0x180 [74758.923600] [] ? kmem_cache_alloc_trace+0x3c/0x1f0 [74758.932051] []
Re: [PATCH] btrfs: Allow forced conversion of metadata to dup profile on multiple devices
On Thu, Feb 20, 2014 at 6:57 PM, David Sterba dste...@suse.cz wrote: On Wed, Feb 19, 2014 at 11:10:41AM -0500, Austin S Hemmelgarn wrote: Currently, btrfs balance start fails when trying to convert metadata or system chunks to dup profile on filesystems with multiple devices. This requires that a conversion from a multi-device filesystem to a single device filesystem use the following methodology: 1. btrfs balance start -dconvert=single -mconvert=single \ -sconvert=single -f / 2. btrfs device delete / /dev/sdx 3. btrfs balance start -mconvert=dup -sconvert=dup / This results in a period of time (possibly very long if the devices are big) where you don't have the protection guarantees of multiple copies of metadata chunks. After applying this patch, one can instead use the following methodology for conversion from a multi-device filesystem to a single device filesystem: 1. btrfs balance start -dconvert=single -mconvert=dup \ -sconvert=dup -f / 2. btrfs device delete / /dev/sdx This greatly reduces the chances of the operation causing data loss due to a read error during the device delete. Signed-off-by: Austin S. Hemmelgarn ahferro...@gmail.com Reviewed-by: David Sterba dste...@suse.cz Sounds useful. The muliple devices + DUP is allowed setup when the device is added, this patch only adds the 'delete' counterpart. The imroved data loss protection during the process is a good thing. Hi, Have you actually tried to queue it? Unless I'm missing something, it won't compile, and on top of that, it seems to be corrupted too.. IIRC muliple devices + DUP is allowed only until the first balance, has that changed? Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Allow forced conversion of metadata to dup profile on multiple devices
On Mon, Feb 24, 2014 at 3:44 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote: On 2014-02-24 08:37, Ilya Dryomov wrote: On Thu, Feb 20, 2014 at 6:57 PM, David Sterba dste...@suse.cz wrote: On Wed, Feb 19, 2014 at 11:10:41AM -0500, Austin S Hemmelgarn wrote: Currently, btrfs balance start fails when trying to convert metadata or system chunks to dup profile on filesystems with multiple devices. This requires that a conversion from a multi-device filesystem to a single device filesystem use the following methodology: 1. btrfs balance start -dconvert=single -mconvert=single \ -sconvert=single -f / 2. btrfs device delete / /dev/sdx 3. btrfs balance start -mconvert=dup -sconvert=dup / This results in a period of time (possibly very long if the devices are big) where you don't have the protection guarantees of multiple copies of metadata chunks. After applying this patch, one can instead use the following methodology for conversion from a multi-device filesystem to a single device filesystem: 1. btrfs balance start -dconvert=single -mconvert=dup \ -sconvert=dup -f / 2. btrfs device delete / /dev/sdx This greatly reduces the chances of the operation causing data loss due to a read error during the device delete. Signed-off-by: Austin S. Hemmelgarn ahferro...@gmail.com Reviewed-by: David Sterba dste...@suse.cz Sounds useful. The muliple devices + DUP is allowed setup when the device is added, this patch only adds the 'delete' counterpart. The imroved data loss protection during the process is a good thing. Hi, Have you actually tried to queue it? Unless I'm missing something, it won't compile, and on top of that, it seems to be corrupted too.. The patch itself was made using git, AFAICT it should be fine. I've personally built and tested it using UML. It doesn't look fine. It was generated with git, but it got corrupted on the way: either how you pasted it or the email client you use is the problem. On Wed, Feb 19, 2014 at 6:10 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote: Currently, btrfs balance start fails when trying to convert metadata or system chunks to dup profile on filesystems with multiple devices. This requires that a conversion from a multi-device filesystem to a single device filesystem use the following methodology: 1. btrfs balance start -dconvert=single -mconvert=single \ -sconvert=single -f / 2. btrfs device delete / /dev/sdx 3. btrfs balance start -mconvert=dup -sconvert=dup / This results in a period of time (possibly very long if the devices are big) where you don't have the protection guarantees of multiple copies of metadata chunks. After applying this patch, one can instead use the following methodology for conversion from a multi-device filesystem to a single device filesystem: 1. btrfs balance start -dconvert=single -mconvert=dup \ -sconvert=dup -f / 2. btrfs device delete / /dev/sdx This greatly reduces the chances of the operation causing data loss due to a read error during the device delete. Signed-off-by: Austin S. Hemmelgarn ahferro...@gmail.com --- fs/btrfs/volumes.c | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 07629e9..38a9522 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3152,10 +3152,8 @@ int btrfs_balance(struct btrfs_balance_control *bctl, ^^^, that should be a single line num_devices--; } btrfs_dev_replace_unlock(fs_info-dev_replace); - allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE; - if (num_devices == 1) - allowed |= BTRFS_BLOCK_GROUP_DUP; - else if (num_devices 1) + allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP; + if (num_devices 1) allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1); if (num_devices 2) allowed |= BTRFS_BLOCK_GROUP_RAID5; @@ -3221,6 +3219,21 @@ int btrfs_balance(struct btrfs_balance_control *bctl, ^^^, ditto goto out; } } + if (((bctl-sys.flags BTRFS_BALANCE_ARGS_CONVERT) + (bctl-sys.target ~BTRFS_BLOCK_GROUP_DUP) || + (bctl-meta.flags BTRFS_BALANCE_ARGS_CONVERT) + (bctl-meta.target ~BTRFS_BLOCK_GROUP_DUP)) + (num_devs 1)) { + if (bctl-flags BTRFS_BALANCE_FORCE) { + btrfs_info(fs_info, force conversion of metadata + to dup profile on multiple devices); + } else { + btrfs_err(fs_info, balance will reduce metadata + integrity, use force if you want
Re: [PATCH 2/6] Btrfs-progs: add btrfsck functionality to btrfs
On Wed, Nov 13, 2013 at 7:13 PM, David Sterba dste...@suse.cz wrote: Hi, On Sun, Jun 02, 2013 at 05:47:38PM +0200, Dieter Ries wrote: For this to have any effect, 'h' must be added to getopt_long(), see attached patch 1. However, this results in btrfsck -h and --help doing different things: --help prints the usage message to stdout and exits with exit(0). -h prints the usage message to stderr and exits with exit(129). I made a patch to fix this, see attached patch 2. What it doesn't fix though is, that -h/--help and -? don't do the same thing. This is more complicated, as getop_long returns '?' for unknown options. FYI, both patchess added to integration. Hi David, FWIW, I think none of the btrfs sub-commands treat -h as a help option. (This is an artifact that was inherited from the the old btrfs-progs utility.) -h vs --help is actually consistent: -h results in a btrfs check: invalid option -- 'h' message, and therefore exits with 129. Since 'btrfs check -h' has clearly never worked we might want to keep the status quo. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] Btrfs: do not inc uncorrectable_errors counter on ro scrubs
Currently if we discover an error when scrubbing in ro mode we a) blindly increment the uncorrectable_errors counter, and b) spam the dmesg with the 'unable to fixup (regular) error at ...' message, even though a) we haven't tried to determine if the error is correctable or not, and b) we haven't tried to fixup anything. Fix this. Cc: Stefan Behrens sbehr...@giantdisaster.de Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/scrub.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index f21e2df..f94b98d 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -938,8 +938,10 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) BTRFS_DEV_STAT_CORRUPTION_ERRS); } - if (sctx-readonly !sctx-is_dev_replace) - goto did_not_correct_error; + if (sctx-readonly) { + ASSERT(!sctx-is_dev_replace); + goto out; + } if (!is_metadata !have_csum) { struct scrub_fixup_nodatasum *fixup_nodatasum; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] Btrfs: disable online raid-repair on ro mounts
This disables the if needed, write the good copy back before the read is completed part of the read sequence for read-only mounts. Cc: Jan Schmidt list.bt...@jan-o-sch.net Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/extent_io.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c09a40d..a0b7403 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1979,6 +1979,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start, struct btrfs_mapping_tree *map_tree = fs_info-mapping_tree; int ret; + ASSERT(!(fs_info-sb-s_flags MS_RDONLY)); BUG_ON(!mirror_num); /* we can't repair anything in raid56 yet */ @@ -2035,6 +2036,9 @@ int repair_eb_io_failure(struct btrfs_root *root, struct extent_buffer *eb, unsigned long i, num_pages = num_extent_pages(eb-start, eb-len); int ret = 0; + if (root-fs_info-sb-s_flags MS_RDONLY) + return -EROFS; + for (i = 0; i num_pages; i++) { struct page *p = extent_buffer_page(eb, i); ret = repair_io_failure(root-fs_info, start, PAGE_CACHE_SIZE, @@ -2056,12 +2060,12 @@ static int clean_io_failure(u64 start, struct page *page) u64 private; u64 private_failure; struct io_failure_record *failrec; - struct btrfs_fs_info *fs_info; + struct inode *inode = page-mapping-host; + struct btrfs_fs_info *fs_info = BTRFS_I(inode)-root-fs_info; struct extent_state *state; int num_copies; int did_repair = 0; int ret; - struct inode *inode = page-mapping-host; private = 0; ret = count_range_bits(BTRFS_I(inode)-io_failure_tree, private, @@ -2084,6 +2088,8 @@ static int clean_io_failure(u64 start, struct page *page) did_repair = 1; goto out; } + if (fs_info-sb-s_flags MS_RDONLY) + goto out; spin_lock(BTRFS_I(inode)-io_tree.lock); state = find_first_extent_bit_state(BTRFS_I(inode)-io_tree, @@ -2093,7 +2099,6 @@ static int clean_io_failure(u64 start, struct page *page) if (state state-start = failrec-start state-end = failrec-start + failrec-len - 1) { - fs_info = BTRFS_I(inode)-root-fs_info; num_copies = btrfs_num_copies(fs_info, failrec-logical, failrec-len); if (num_copies 1) { -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts
On Fri, Oct 11, 2013 at 4:13 AM, Wang Shilong wangsl.f...@cn.fujitsu.com wrote: On 10/11/2013 01:40 AM, Ilya Dryomov wrote: I have a question in my mind. Can we reach a state that there is operation in progress when filesystem has been readonly?If we do cancel operations on a ro filesystem, we should get No operations in progress . So if I understand you correctly you are saying that we should return the Not in progress error code if the filesystem is mounted ro and there is nothing to cancel. I actually did think about it, but decided against it because one can extend this argument to starting commands with something like If I order a start on a read-only fs and that operation has already been started, I should get EINPROGRESS., and that's not what we are currently doing. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts
On Fri, Oct 11, 2013 at 12:28 PM, Remco Hosman - Yerf-IT re...@yerf-it.nl wrote: i would expect a RO mount never to write anything to a filesystem. not even replay a journal (or a seperate option for that). Its possible that the device is not writable at all, if its a snapshot or a RO iscsi device of some kind. I agree, and that's exactly the point of this patch. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: nuke a bogus rw_devices decrement in __btrfs_close_devices
On mount failures, __btrfs_close_devices can be called well before dev-replace state is read and -is_tgtdev_for_dev_replace is set. This leads to a bogus decrement of -rw_devices and sets off a WARN_ON in __btrfs_close_devices if replace target device happens to be on the lists and we fail early in the mount sequence. Fix this by checking the devid instead of -is_tgtdev_for_dev_replace before the decrement: for replace targets devid is always equal to BTRFS_DEV_REPLACE_DEVID. Cc: Stefan Behrens sbehr...@giantdisaster.de Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 043b215..a306db9 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -666,7 +666,8 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices) if (device-bdev) fs_devices-open_devices--; - if (device-writeable !device-is_tgtdev_for_dev_replace) { + if (device-writeable + device-devid != BTRFS_DEV_REPLACE_DEVID) { list_del_init(device-dev_alloc_list); fs_devices-rw_devices--; } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: don't leak ioctl args in btrfs_ioctl_dev_replace
struct btrfs_ioctl_dev_replace_args memory is leaked if replace is requested on a read-only filesystem. Fix it. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/ioctl.c |9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 9d46f60..17bb98e 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3679,9 +3679,10 @@ static long btrfs_ioctl_dev_replace(struct btrfs_root *root, void __user *arg) switch (p-cmd) { case BTRFS_IOCTL_DEV_REPLACE_CMD_START: - if (root-fs_info-sb-s_flags MS_RDONLY) - return -EROFS; - + if (root-fs_info-sb-s_flags MS_RDONLY) { + ret = -EROFS; + goto out; + } if (atomic_xchg( root-fs_info-mutually_exclusive_operation_running, 1)) { @@ -3707,7 +3708,7 @@ static long btrfs_ioctl_dev_replace(struct btrfs_root *root, void __user *arg) if (copy_to_user(arg, p, sizeof(*p))) ret = -EFAULT; - +out: kfree(p); return ret; } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts
For both balance and replace, cancelling involves changing the on-disk state and committing a transaction, which is not a good thing to do on read-only filesystems. Cc: Stefan Behrens sbehr...@giantdisaster.de Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/dev-replace.c |3 +++ fs/btrfs/volumes.c |3 +++ 2 files changed, 6 insertions(+) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 9efb94e..98df261 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -650,6 +650,9 @@ static u64 __btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info) u64 result; int ret; + if (fs_info-sb-s_flags MS_RDONLY) + return -EROFS; + mutex_lock(dev_replace-lock_finishing_cancel_unmount); btrfs_dev_replace_lock(dev_replace); switch (dev_replace-replace_state) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a306db9..2630f38 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3424,6 +3424,9 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info) int btrfs_cancel_balance(struct btrfs_fs_info *fs_info) { + if (fs_info-sb-s_flags MS_RDONLY) + return -EROFS; + mutex_lock(fs_info-balance_mutex); if (!fs_info-balance_ctl) { mutex_unlock(fs_info-balance_mutex); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix the dev-replace suspend sequence
Replace progresses strictly from lower to higher offsets, and the progress is tracked in chunks, by storing the physical offset of the dev_extent which is being copied in the cursor_left field of btrfs_dev_replace_item. When we are done copying the chunk, left_cursor is updated to point one byte past the dev_extent, so that on resume we can skip the dev_extents that have already been copied. There is a major bug (which goes all the way back to the inception of dev-replace in 3.8) in the way left_cursor is bumped: the bump is done unconditionally, without any regard to the scrub_chunk return value. On suspend (and also on any kind of error) scrub_chunk returns early, i.e. without completing the copy. This leads to us skipping the chunk that hasn't been fully copied yet when resuming. Fix this by doing the cursor_left update only if scrub_chunk ret is 0. (On suspend scrub_chunk returns with -ECANCELED, so this fix covers both suspend and error cases.) Cc: Stefan Behrens sbehr...@giantdisaster.de Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/scrub.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index a18e0e2..f21e2df 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -2717,8 +2717,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, mutex_unlock(fs_info-scrub_lock); wake_up(fs_info-scrub_pause_wait); - dev_replace-cursor_left = dev_replace-cursor_right; - dev_replace-item_needs_writeback = 1; btrfs_put_block_group(cache); if (ret) break; @@ -2732,6 +2730,9 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, break; } + dev_replace-cursor_left = dev_replace-cursor_right; + dev_replace-item_needs_writeback = 1; + key.offset = found_key.offset + length; btrfs_release_path(path); } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: eliminate races in worker stopping code
The current implementation of worker threads in Btrfs has races in worker stopping code, which cause all kinds of panics and lockups when running btrfs/011 xfstest in a loop. The problem is that btrfs_stop_workers is unsynchronized with respect to check_idle_worker, check_busy_worker and __btrfs_start_workers. E.g., check_idle_worker race flow: btrfs_stop_workers():check_idle_worker(aworker): - grabs the lock - splices the idle list into the working list - removes the first worker from the working list - releases the lock to wait for its kthread's completion - grabs the lock - if aworker is on the working list, moves aworker from the working list to the idle list - releases the lock - grabs the lock - puts the worker - removes the second worker from the working list .. btrfs_stop_workers returns, aworker is on the idle list FS is umounted, memory is freed .. aworker is waken up, fireworks ensue With this applied, I wasn't able to trigger the problem in 48 hours, whereas previously I could reliably reproduce at least one of these races within an hour. Reported-by: David Sterba dste...@suse.cz Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/async-thread.c | 25 +++-- fs/btrfs/async-thread.h |2 ++ 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 58b7d14..08cc08f 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -107,7 +107,8 @@ static void check_idle_worker(struct btrfs_worker_thread *worker) worker-idle = 1; /* the list may be empty if the worker is just starting */ - if (!list_empty(worker-worker_list)) { + if (!list_empty(worker-worker_list) + !worker-workers-stopping) { list_move(worker-worker_list, worker-workers-idle_list); } @@ -127,7 +128,8 @@ static void check_busy_worker(struct btrfs_worker_thread *worker) spin_lock_irqsave(worker-workers-lock, flags); worker-idle = 0; - if (!list_empty(worker-worker_list)) { + if (!list_empty(worker-worker_list) + !worker-workers-stopping) { list_move_tail(worker-worker_list, worker-workers-worker_list); } @@ -412,6 +414,7 @@ void btrfs_stop_workers(struct btrfs_workers *workers) int can_stop; spin_lock_irq(workers-lock); + workers-stopping = 1; list_splice_init(workers-idle_list, workers-worker_list); while (!list_empty(workers-worker_list)) { cur = workers-worker_list.next; @@ -455,6 +458,7 @@ void btrfs_init_workers(struct btrfs_workers *workers, char *name, int max, workers-ordered = 0; workers-atomic_start_pending = 0; workers-atomic_worker_start = async_helper; + workers-stopping = 0; } /* @@ -480,15 +484,19 @@ static int __btrfs_start_workers(struct btrfs_workers *workers) atomic_set(worker-num_pending, 0); atomic_set(worker-refs, 1); worker-workers = workers; - worker-task = kthread_run(worker_loop, worker, - btrfs-%s-%d, workers-name, - workers-num_workers + 1); + worker-task = kthread_create(worker_loop, worker, + btrfs-%s-%d, workers-name, + workers-num_workers + 1); if (IS_ERR(worker-task)) { ret = PTR_ERR(worker-task); - kfree(worker); goto fail; } + spin_lock_irq(workers-lock); + if (workers-stopping) { + spin_unlock_irq(workers-lock); + goto fail_kthread; + } list_add_tail(worker-worker_list, workers-idle_list); worker-idle = 1; workers-num_workers++; @@ -496,8 +504,13 @@ static int __btrfs_start_workers(struct btrfs_workers *workers) WARN_ON(workers-num_workers_starting 0); spin_unlock_irq(workers-lock); + wake_up_process(worker-task); return 0; + +fail_kthread: + kthread_stop(worker-task); fail: + kfree(worker); spin_lock_irq(workers-lock); workers-num_workers_starting--; spin_unlock_irq(workers-lock); diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h index 063698b..1f26792 100644 --- a/fs/btrfs/async-thread.h +++ b/fs/btrfs/async-thread.h @@ -107,6 +107,8 @@ struct btrfs_workers { /* extra name for this worker, used
[PATCH] Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev, can be processed before btrfs_scratch_superblock is called, which would result in a use-after-free on btrfs_device contents. Fix this by zeroing the superblock before the rcu callback is registered. Cc: Stefan Behrens sbehr...@giantdisaster.de Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/dev-replace.c |5 + fs/btrfs/volumes.c |7 ++- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index a644353..5aa0718 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -535,10 +535,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, list_add(tgt_device-dev_alloc_list, fs_info-fs_devices-alloc_list); btrfs_rm_dev_replace_srcdev(fs_info, src_device); - if (src_device-bdev) { - /* zero out the old super */ - btrfs_scratch_superblock(src_device); - } + /* * this is again a consistent state where no dev_replace procedure * is running, the target device is part of the filesystem, the diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 0052ca8..a9ac68a 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1715,6 +1715,7 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info, struct btrfs_device *srcdev) { WARN_ON(!mutex_is_locked(fs_info-fs_devices-device_list_mutex)); + list_del_rcu(srcdev-dev_list); list_del_rcu(srcdev-dev_alloc_list); fs_info-fs_devices-num_devices--; @@ -1724,9 +1725,13 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info, } if (srcdev-can_discard) fs_info-fs_devices-num_can_discard--; - if (srcdev-bdev) + if (srcdev-bdev) { fs_info-fs_devices-open_devices--; + /* zero out the old super */ + btrfs_scratch_superblock(srcdev); + } + call_rcu(srcdev-rcu, free_device); } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfstests: unify apostrophes in output files
On Thu, Sep 19, 2013 at 7:20 PM, Eric Sandeen sand...@redhat.com wrote: xfstests: unify apostrophes in output files From: Tomas Racek tra...@redhat.com With coreutils v8.16 the style of apostrophes changed from `word' to 'word'. This is breaking some tests which use the older form. This commit introduces function changes the golden output of the affected tests and introduces a filter for the older style output. [dchinner: modified to use a global filter in check rather than per-test filters] Signed-off-by: Tomas Racek tra...@redhat.com Signed-off-by: Dave Chinner dchin...@redhat.com Reviewed-by: Eric Sandeen sand...@redhat.com --- (Resending as proper top-level, modified, reviewed patch --Eric) check | 4 common/config | 1 + tests/generic/193.out | 16 tests/generic/230.out | 8 tests/generic/235.out | 2 +- tests/generic/245.out | 2 +- tests/generic/294.out | 8 tests/generic/306.out | 2 +- tests/xfs/103.out | 2 +- tests/xfs/200.out | 2 +- 10 files changed, 26 insertions(+), 21 deletions(-) diff --git a/check b/check index 4085eae..ba7fd21 100755 --- a/check +++ b/check @@ -478,6 +478,10 @@ do echo - no qualified output err=true else + + # coreutils 2.16+ changed quote formats in error messages from Should that be 8.16+? Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 18/20] Btrfs-progs: fix magic return value in cmds-balance.c
On Tue, Sep 10, 2013 at 5:28 AM, Wang Shilong wangsl.f...@cn.fujitsu.com wrote: If there is no balance in progress, resume/pause/cancel will return 2. Usage or syntax errors will return 1. And 0 means operations return successfully. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- v1-v2: Address comments from Ilya Dryomov. Reviewed-by: Ilya Dryomov idryo...@gmail.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 18/20] Btrfs-progs: fix magic return value in cmds-balance.c
Hi Wang, Thank you for doing the grunt work, it's been a long standing todo. See the comments below. On Wed, Sep 4, 2013 at 6:22 PM, Wang Shilong wangshilong1...@gmail.com wrote: From: Wang Shilong wangsl.f...@cn.fujitsu.com If there is no balance in progress.resume/pause/cancel will return 2. For usage or syntal errors will return 1. 0 means operations return successfully. This needs to be reworded (spelling, punctuation). Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- cmds-balance.c | 93 -- 1 file changed, 58 insertions(+), 35 deletions(-) diff --git a/cmds-balance.c b/cmds-balance.c index b7382ef..fd68051 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -58,7 +58,7 @@ static int parse_one_profile(const char *profile, u64 *flags) *flags |= BTRFS_AVAIL_ALLOC_BIT_SINGLE; } else { fprintf(stderr, Unknown profile '%s'\n, profile); - return 1; + return -ENOENT; } return 0; @@ -68,12 +68,14 @@ static int parse_profiles(char *profiles, u64 *flags) { char *this_char; char *save_ptr = NULL; /* Satisfy static checkers */ + int ret; for (this_char = strtok_r(profiles, |, save_ptr); this_char != NULL; this_char = strtok_r(NULL, |, save_ptr)) { - if (parse_one_profile(this_char, flags)) - return 1; + ret = parse_one_profile(this_char, flags); + if (ret) + return ret; } return 0; @@ -86,7 +88,7 @@ static int parse_u64(const char *str, u64 *result) val = strtoull(str, endptr, 10); if (*endptr) - return 1; + return -EINVAL; *result = val; return 0; @@ -95,6 +97,7 @@ static int parse_u64(const char *str, u64 *result) static int parse_range(const char *range, u64 *start, u64 *end) { char *dots; + int ret; dots = strstr(range, ..); if (dots) { @@ -107,29 +110,31 @@ static int parse_range(const char *range, u64 *start, u64 *end) *end = (u64)-1; skipped++; } else { - if (parse_u64(rest, end)) - return 1; + ret = parse_u64(rest, end); + if (ret) + return ret; } if (dots == range) { *start = 0; skipped++; } else { + ret = parse_u64(rest, end); if (parse_u64(range, start)) - return 1; + return ret; } if (*start = *end) { fprintf(stderr, Range %llu..%llu doesn't make sense\n, (unsigned long long)*start, (unsigned long long)*end); - return 1; + return -EINVAL; } if (skipped = 1) return 0; } - return 1; + return -EINVAL; } static int parse_filters(char *filters, struct btrfs_balance_args *args) @@ -137,6 +142,7 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) char *this_char; char *value; char *save_ptr = NULL; /* Satisfy static checkers */ + int ret = 0; if (!filters) return 0; @@ -150,70 +156,72 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) if (!value || !*value) { fprintf(stderr, the profiles filter requires an argument\n); - return 1; + return -EINVAL; } if (parse_profiles(value, args-profiles)) { fprintf(stderr, Invalid profiles argument\n); - return 1; + return -EINVAL; } args-flags |= BTRFS_BALANCE_ARGS_PROFILES; } else if (!strcmp(this_char, usage)) { if (!value || !*value) { fprintf(stderr, the usage filter requires an argument\n); - return 1; + return -EINVAL; } if (parse_u64(value, args-usage) || args-usage 100) { fprintf(stderr,
[PATCH] Btrfs: do not add replace target to the alloc_list
If replace was suspended by the umount, replace target device is added to the fs_devices-alloc_list during a later mount. This is obviously wrong. -is_tgtdev_for_dev_replace is supposed to guard against that, but -is_tgtdev_for_dev_replace is (and can only ever be) initialized *after* everything is opened and fs_devices lists are populated. Fix this by checking the devid instead: for replace targets it's always equal to BTRFS_DEV_REPLACE_DEVID. Cc: Stefan Behrens sbehr...@giantdisaster.de Signed-off-by: Ilya Dryomov idryo...@gmail.com --- At first I thought this was caused by my btrfs_device rollback patch, but no, it just made it easier to spot -- previously one had to reboot or rmmod/insmod before mounting a suspended replace. fs/btrfs/volumes.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c9a0977..5b99f19 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -793,7 +793,8 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices, fs_devices-rotating = 1; fs_devices-open_devices++; - if (device-writeable !device-is_tgtdev_for_dev_replace) { + if (device-writeable + device-devid != BTRFS_DEV_REPLACE_DEVID) { fs_devices-rw_devices++; list_add(device-dev_alloc_list, fs_devices-alloc_list); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: stop refusing the relocation of chunk 0
AFAICT chunk 0 is no longer special, and so it should be restriped just like every other chunk. One reason for this change is us refusing the relocation can lead to filesystems that can only be mounted ro, and never rw -- see the bugzilla [1] for details. The other reason is that device removal code is already doing this: it will happily relocate chunk 0 is part of shrinking the device. [1] https://bugzilla.kernel.org/show_bug.cgi?id=60594 Reported-by: Xavier Bassery xav...@bartica.org Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c |6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c9a0977..e8325de 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2991,10 +2991,6 @@ again: if (found_key.objectid != key.objectid) break; - /* chunk zero is special */ - if (found_key.offset == 0) - break; - chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk); if (!counting) { @@ -3030,6 +3026,8 @@ again: spin_unlock(fs_info-balance_lock); } loop: + if (found_key.offset == 0) + break; key.offset = found_key.offset - 1; } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] Btrfs: add btrfs_alloc_device and switch to it
On Fri, Aug 23, 2013 at 10:35 AM, Stefan Behrens sbehr...@giantdisaster.de wrote: This WARN_ON(1) is triggered with the device replace procedure because BTRFS_DEV_REPLACE_DEVID is zero. Ah, a dreaded 0 in C. What a screw up. This came from my rebuild branch where btrfs_init_dev_replace_tgtdev is a bit different, and I just auto-merged it. I'll fix it up. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/4] Btrfs: add btrfs_alloc_device and switch to it
Currently btrfs_device is allocated ad-hoc in a few different places, and as a result not all fields are initialized properly. In particular, readahead state is only initialized in device_list_add (at scan time), and not in btrfs_init_new_device (when the new device is added with 'btrfs dev add'). Fix this by adding an allocation helper and switch everybody but __btrfs_close_devices to it. (__btrfs_close_devices is dealt with in a later commit.) Signed-off-by: Ilya Dryomov idryo...@gmail.com --- v2: - fixed a dev-replace regression spotted by Stefan Behrens fs/btrfs/volumes.c | 151 fs/btrfs/volumes.h |3 ++ 2 files changed, 97 insertions(+), 57 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ae1bcb0..100c21c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -101,6 +101,27 @@ void btrfs_cleanup_fs_uuids(void) } } +static struct btrfs_device *__alloc_device(void) +{ + struct btrfs_device *dev; + + dev = kzalloc(sizeof(*dev), GFP_NOFS); + if (!dev) + return ERR_PTR(-ENOMEM); + + INIT_LIST_HEAD(dev-dev_list); + INIT_LIST_HEAD(dev-dev_alloc_list); + + spin_lock_init(dev-io_lock); + + spin_lock_init(dev-reada_lock); + atomic_set(dev-reada_in_flight, 0); + INIT_RADIX_TREE(dev-reada_zones, GFP_NOFS ~__GFP_WAIT); + INIT_RADIX_TREE(dev-reada_extents, GFP_NOFS ~__GFP_WAIT); + + return dev; +} + static noinline struct btrfs_device *__find_device(struct list_head *head, u64 devid, u8 *uuid) { @@ -414,17 +435,12 @@ static noinline int device_list_add(const char *path, if (fs_devices-opened) return -EBUSY; - device = kzalloc(sizeof(*device), GFP_NOFS); - if (!device) { + device = btrfs_alloc_device(NULL, devid, + disk_super-dev_item.uuid); + if (IS_ERR(device)) { /* we can safely leave the fs_devices entry around */ - return -ENOMEM; + return PTR_ERR(device); } - device-devid = devid; - device-dev_stats_valid = 0; - device-work.func = pending_bios_fn; - memcpy(device-uuid, disk_super-dev_item.uuid, - BTRFS_UUID_SIZE); - spin_lock_init(device-io_lock); name = rcu_string_strdup(path, GFP_NOFS); if (!name) { @@ -432,15 +448,6 @@ static noinline int device_list_add(const char *path, return -ENOMEM; } rcu_assign_pointer(device-name, name); - INIT_LIST_HEAD(device-dev_alloc_list); - - /* init readahead state */ - spin_lock_init(device-reada_lock); - device-reada_curr_zone = NULL; - atomic_set(device-reada_in_flight, 0); - device-reada_next = 0; - INIT_RADIX_TREE(device-reada_zones, GFP_NOFS ~__GFP_WAIT); - INIT_RADIX_TREE(device-reada_extents, GFP_NOFS ~__GFP_WAIT); mutex_lock(fs_devices-device_list_mutex); list_add_rcu(device-dev_list, fs_devices-devices); @@ -491,8 +498,9 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) list_for_each_entry(orig_dev, orig-devices, dev_list) { struct rcu_string *name; - device = kzalloc(sizeof(*device), GFP_NOFS); - if (!device) + device = btrfs_alloc_device(NULL, orig_dev-devid, + orig_dev-uuid); + if (IS_ERR(device)) goto error; /* @@ -506,13 +514,6 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) } rcu_assign_pointer(device-name, name); - device-devid = orig_dev-devid; - device-work.func = pending_bios_fn; - memcpy(device-uuid, orig_dev-uuid, sizeof(device-uuid)); - spin_lock_init(device-io_lock); - INIT_LIST_HEAD(device-dev_list); - INIT_LIST_HEAD(device-dev_alloc_list); - list_add(device-dev_list, fs_devices-devices); device-fs_devices = fs_devices; fs_devices-num_devices++; @@ -1956,10 +1957,10 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path) } mutex_unlock(root-fs_info-fs_devices-device_list_mutex); - device = kzalloc(sizeof(*device), GFP_NOFS); - if (!device) { + device = btrfs_alloc_device(root-fs_info, NULL, NULL); + if (IS_ERR(device)) { /* we can safely leave the fs_devices entry around */ - ret = -ENOMEM
[PATCH 4/4] Btrfs: rollback btrfs_device fields on umount
It turns out we don't properly rollback in-core btrfs_device state on umount. We zero out -bdev, -in_fs_metadata and that's about it. In particular, we don't zero out -generation, and this can lead to us refusing a mount -- a non-NULL fs_devices-latest_bdev is essential, but btrfs_close_extra_devices will happily assign NULL to -latest_bdev if the first device on the dev_list happens to be missing and consequently has no bdev attached. This happens because since commit a6b0d5c8 btrfs_close_extra_devices adjusts -latest_bdev, and in doing that, relies on the -generation. Fix this, and possibly other problems, by zeroing out everything except for what device_list_add sets, so that a mount right after insmod and 'btrfs dev scan' is no different from any later mount in this respect. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index af17b17..2f6bc12 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -673,22 +673,19 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices) if (device-can_discard) fs_devices-num_can_discard--; - new_device = kmalloc(sizeof(*new_device), GFP_NOFS); - BUG_ON(!new_device); /* -ENOMEM */ - memcpy(new_device, device, sizeof(*new_device)); + new_device = btrfs_alloc_device(NULL, device-devid, + device-uuid); + BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ /* Safe because we are under uuid_mutex */ if (device-name) { name = rcu_string_strdup(device-name-str, GFP_NOFS); - BUG_ON(device-name !name); /* -ENOMEM */ + BUG_ON(!name); /* -ENOMEM */ rcu_assign_pointer(new_device-name, name); } - new_device-bdev = NULL; - new_device-writeable = 0; - new_device-in_fs_metadata = 0; - new_device-can_discard = 0; - spin_lock_init(new_device-io_lock); + list_replace_rcu(device-dev_list, new_device-dev_list); + new_device-fs_devices = device-fs_devices; call_rcu(device-rcu, free_device); } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] Btrfs: add btrfs_alloc_device and switch to it
Currently btrfs_device is allocated ad-hoc in a few different places, and as a result not all fields are initialized properly. In particular, readahead state is only initialized in device_list_add (at scan time), and not in btrfs_init_new_device (when the new device is added with 'btrfs dev add'). Fix this by adding an allocation helper and switch everybody but __btrfs_close_devices to it. (__btrfs_close_devices is dealt with in a later commit.) Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c | 150 fs/btrfs/volumes.h |3 ++ 2 files changed, 96 insertions(+), 57 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ae1bcb0..fe52704 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -101,6 +101,27 @@ void btrfs_cleanup_fs_uuids(void) } } +static struct btrfs_device *__alloc_device(void) +{ + struct btrfs_device *dev; + + dev = kzalloc(sizeof(*dev), GFP_NOFS); + if (!dev) + return ERR_PTR(-ENOMEM); + + INIT_LIST_HEAD(dev-dev_list); + INIT_LIST_HEAD(dev-dev_alloc_list); + + spin_lock_init(dev-io_lock); + + spin_lock_init(dev-reada_lock); + atomic_set(dev-reada_in_flight, 0); + INIT_RADIX_TREE(dev-reada_zones, GFP_NOFS ~__GFP_WAIT); + INIT_RADIX_TREE(dev-reada_extents, GFP_NOFS ~__GFP_WAIT); + + return dev; +} + static noinline struct btrfs_device *__find_device(struct list_head *head, u64 devid, u8 *uuid) { @@ -414,17 +435,12 @@ static noinline int device_list_add(const char *path, if (fs_devices-opened) return -EBUSY; - device = kzalloc(sizeof(*device), GFP_NOFS); - if (!device) { + device = btrfs_alloc_device(NULL, devid, + disk_super-dev_item.uuid); + if (IS_ERR(device)) { /* we can safely leave the fs_devices entry around */ - return -ENOMEM; + return PTR_ERR(device); } - device-devid = devid; - device-dev_stats_valid = 0; - device-work.func = pending_bios_fn; - memcpy(device-uuid, disk_super-dev_item.uuid, - BTRFS_UUID_SIZE); - spin_lock_init(device-io_lock); name = rcu_string_strdup(path, GFP_NOFS); if (!name) { @@ -432,15 +448,6 @@ static noinline int device_list_add(const char *path, return -ENOMEM; } rcu_assign_pointer(device-name, name); - INIT_LIST_HEAD(device-dev_alloc_list); - - /* init readahead state */ - spin_lock_init(device-reada_lock); - device-reada_curr_zone = NULL; - atomic_set(device-reada_in_flight, 0); - device-reada_next = 0; - INIT_RADIX_TREE(device-reada_zones, GFP_NOFS ~__GFP_WAIT); - INIT_RADIX_TREE(device-reada_extents, GFP_NOFS ~__GFP_WAIT); mutex_lock(fs_devices-device_list_mutex); list_add_rcu(device-dev_list, fs_devices-devices); @@ -491,8 +498,9 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) list_for_each_entry(orig_dev, orig-devices, dev_list) { struct rcu_string *name; - device = kzalloc(sizeof(*device), GFP_NOFS); - if (!device) + device = btrfs_alloc_device(NULL, orig_dev-devid, + orig_dev-uuid); + if (IS_ERR(device)) goto error; /* @@ -506,13 +514,6 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) } rcu_assign_pointer(device-name, name); - device-devid = orig_dev-devid; - device-work.func = pending_bios_fn; - memcpy(device-uuid, orig_dev-uuid, sizeof(device-uuid)); - spin_lock_init(device-io_lock); - INIT_LIST_HEAD(device-dev_list); - INIT_LIST_HEAD(device-dev_alloc_list); - list_add(device-dev_list, fs_devices-devices); device-fs_devices = fs_devices; fs_devices-num_devices++; @@ -1956,10 +1957,10 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path) } mutex_unlock(root-fs_info-fs_devices-device_list_mutex); - device = kzalloc(sizeof(*device), GFP_NOFS); - if (!device) { + device = btrfs_alloc_device(root-fs_info, NULL, NULL); + if (IS_ERR(device)) { /* we can safely leave the fs_devices entry around */ - ret = -ENOMEM; + ret = PTR_ERR(device); goto error
[PATCH 3/4] Btrfs: add alloc_fs_devices and switch to it
In the spirit of btrfs_alloc_device, add a helper for allocating and doing some common initialization of btrfs_fs_devices struct. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c | 71 +++- 1 file changed, 53 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index fe52704..af17b17 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -62,6 +62,48 @@ static void unlock_chunks(struct btrfs_root *root) mutex_unlock(root-fs_info-chunk_mutex); } +static struct btrfs_fs_devices *__alloc_fs_devices(void) +{ + struct btrfs_fs_devices *fs_devs; + + fs_devs = kzalloc(sizeof(*fs_devs), GFP_NOFS); + if (!fs_devs) + return ERR_PTR(-ENOMEM); + + mutex_init(fs_devs-device_list_mutex); + + INIT_LIST_HEAD(fs_devs-devices); + INIT_LIST_HEAD(fs_devs-alloc_list); + INIT_LIST_HEAD(fs_devs-list); + + return fs_devs; +} + +/** + * alloc_fs_devices - allocate struct btrfs_fs_devices + * @fsid: a pointer to UUID for this FS. If NULL a new UUID is + * generated. + * + * Return: a pointer to a new struct btrfs_fs_devices on success; + * ERR_PTR() on error. Returned struct is not linked onto any lists and + * can be destroyed with kfree() right away. + */ +static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid) +{ + struct btrfs_fs_devices *fs_devs; + + fs_devs = __alloc_fs_devices(); + if (IS_ERR(fs_devs)) + return fs_devs; + + if (fsid) + memcpy(fs_devs-fsid, fsid, BTRFS_FSID_SIZE); + else + generate_random_uuid(fs_devs-fsid); + + return fs_devs; +} + static void free_fs_devices(struct btrfs_fs_devices *fs_devices) { struct btrfs_device *device; @@ -416,16 +458,14 @@ static noinline int device_list_add(const char *path, fs_devices = find_fsid(disk_super-fsid); if (!fs_devices) { - fs_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS); - if (!fs_devices) - return -ENOMEM; - INIT_LIST_HEAD(fs_devices-devices); - INIT_LIST_HEAD(fs_devices-alloc_list); + fs_devices = alloc_fs_devices(disk_super-fsid); + if (IS_ERR(fs_devices)) + return PTR_ERR(fs_devices); + list_add(fs_devices-list, fs_uuids); - memcpy(fs_devices-fsid, disk_super-fsid, BTRFS_FSID_SIZE); fs_devices-latest_devid = devid; fs_devices-latest_trans = found_transid; - mutex_init(fs_devices-device_list_mutex); + device = NULL; } else { device = __find_device(fs_devices-devices, devid, @@ -481,18 +521,13 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) struct btrfs_device *device; struct btrfs_device *orig_dev; - fs_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS); - if (!fs_devices) - return ERR_PTR(-ENOMEM); + fs_devices = alloc_fs_devices(orig-fsid); + if (IS_ERR(fs_devices)) + return fs_devices; - INIT_LIST_HEAD(fs_devices-devices); - INIT_LIST_HEAD(fs_devices-alloc_list); - INIT_LIST_HEAD(fs_devices-list); - mutex_init(fs_devices-device_list_mutex); fs_devices-latest_devid = orig-latest_devid; fs_devices-latest_trans = orig-latest_trans; fs_devices-total_devices = orig-total_devices; - memcpy(fs_devices-fsid, orig-fsid, sizeof(fs_devices-fsid)); /* We have held the volume lock, it is safe to get the devices. */ list_for_each_entry(orig_dev, orig-devices, dev_list) { @@ -1794,9 +1829,9 @@ static int btrfs_prepare_sprout(struct btrfs_root *root) if (!fs_devices-seeding) return -EINVAL; - seed_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS); - if (!seed_devices) - return -ENOMEM; + seed_devices = __alloc_fs_devices(); + if (IS_ERR(seed_devices)) + return PTR_ERR(seed_devices); old_devices = clone_fs_devices(fs_devices); if (IS_ERR(old_devices)) { -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] btrfs_device fixes
Hello, This patch series does two things: - adds allocation helpers for struct btrfs_device and struct btrfs_fs_devices so that all the list_heads, spinlocks, etc are properly initialized and the code for that is in one place; - fixes a bug in the umount sequence, which, under certain conditions, can lead to mount failures. This is on top of 3.11-rc3, but applies cleanly to btrfs-next/master as of now. You can pull from: git://github.com/idryomov/btrfs-unstable.git dev-fixes Thanks, Ilya Ilya Dryomov (4): Btrfs: find_next_devid: root - fs_info Btrfs: add btrfs_alloc_device and switch to it Btrfs: add alloc_fs_devices and switch to it Btrfs: rollback btrfs_device fields on umount fs/btrfs/volumes.c | 250 +--- fs/btrfs/volumes.h |3 + 2 files changed, 162 insertions(+), 91 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] Btrfs: find_next_devid: root - fs_info
find_next_devid() knows which root to search, so it should take an fs_info instead of an arbitrary root. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 78b8717..ae1bcb0 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1307,15 +1307,14 @@ static u64 find_next_chunk(struct btrfs_fs_info *fs_info) return ret; } -static noinline int find_next_devid(struct btrfs_root *root, u64 *objectid) +static noinline int find_next_devid(struct btrfs_fs_info *fs_info, + u64 *devid_ret) { int ret; struct btrfs_key key; struct btrfs_key found_key; struct btrfs_path *path; - root = root-fs_info-chunk_root; - path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -1324,20 +1323,21 @@ static noinline int find_next_devid(struct btrfs_root *root, u64 *objectid) key.type = BTRFS_DEV_ITEM_KEY; key.offset = (u64)-1; - ret = btrfs_search_slot(NULL, root, key, path, 0, 0); + ret = btrfs_search_slot(NULL, fs_info-chunk_root, key, path, 0, 0); if (ret 0) goto error; BUG_ON(ret == 0); /* Corruption */ - ret = btrfs_previous_item(root, path, BTRFS_DEV_ITEMS_OBJECTID, + ret = btrfs_previous_item(fs_info-chunk_root, path, + BTRFS_DEV_ITEMS_OBJECTID, BTRFS_DEV_ITEM_KEY); if (ret) { - *objectid = 1; + *devid_ret = 1; } else { btrfs_item_key_to_cpu(path-nodes[0], found_key, path-slots[0]); - *objectid = found_key.offset + 1; + *devid_ret = found_key.offset + 1; } ret = 0; error: @@ -1971,7 +1971,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path) } rcu_assign_pointer(device-name, name); - ret = find_next_devid(root, device-devid); + ret = find_next_devid(root-fs_info, device-devid); if (ret) { rcu_string_free(device-name); kfree(device); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: num_tolerated_disk_barrier_failures is incorrect for RAID6
Currently num_tolerated_disk_barrier_failures gets the value of fs_devices-num_devices in the RAID6 case. But, RAID6 can tolerate only two simultaneous failures, so set it to 2. CC: Stefan Behrens sbehr...@giantdisaster.de Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/disk-io.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b8b60b6..aecf788 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3258,7 +3258,7 @@ int btrfs_calc_num_tolerated_disk_barrier_failures( BTRFS_BLOCK_GROUP_RAID10)) { num_tolerated_disk_barrier_failures = 1; } else if (flags - BTRFS_BLOCK_GROUP_RAID5) { + BTRFS_BLOCK_GROUP_RAID6) { num_tolerated_disk_barrier_failures = 2; } } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs balance resume + raid5/6
On Sun, Jun 16, 2013 at 06:00:10AM +, Tim Castle wrote: Greetings! I'm testing raid6, and recently added two drives. I haven't been able to properly resume a balance operation: the number of total chunks is always too low. It seems that the balance starts and pauses properly, but always resumes with ~7 chunks. Hi Tim, This is the expected behaviour. Balance does not actually remember which chunks it had processed before it was paused, instead it uses a heuristic which is based on the amount of space used out of each chunk. As you have an almost full filesystem, all of your chunks, except for those ~7, are fully (or almost fully) used, and balance resume simply skips them. If you really want to balance everything (why?) you would have to run balance to completion, w/o pausing it. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] How to enable btrfs reserve block space from specific device?
On Wed, Mar 27, 2013 at 09:45:59AM +0800, Zhi Yong Wu wrote: HI, When i work on btrfs hot relocation feature, i hit one question about block reservation. btrfs hot relocation need to reserve block space from specific devices such as SSD, but current btrfs reserving code doesn't differentiate if block space is reserved from SSD or HDD. In order to make btrfs support this, I thought that we can introduce one new block group or flag, but this will maybe make large impact on current existing other functions. For this, does any guy have some better idea? thanks. Hi, Sorry for the late reply. I have a lot on my plate right now, so I haven't looked at your WIP code. However, based on my knowledge, I can tell you that block reservation problem is completely orthogonal to the way you choose to handle hot data storage. A separate block group is one way to do it, and probably the easiest one. Block groups - chunks is a 1:1 mapping, so, for now, it might make sense to have one giant block group spanning over the entire hot data device. OTOH, IIRC the original implementation from IBM just used the rotation flag to detect hot data devices. You have a lot of choices here, but, if you are planning on implementing mirrored/striped hot data devices in future, I'd definitely go with block groups, since that'll give you some of the infrastructure for that for free. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: document mount options in Documentation/fs/btrfs.txt
On Sat, Mar 23, 2013 at 12:48:54PM -0500, Eric Sandeen wrote: Document all current btrfs mount options. Signed-off-by: Eric Sandeen sand...@redhat.com --- please, Please, PLEASE review this and suggest improvements. I'm no btrfs wizard but I've done my best to get this all right based on commit logs, code reading, and wiki reading. In cases where any of those 3 disagreed, I've done my best to document reality, but I'm sure it could use corrections, clarifications, etc. In particular, some of these mount options could really use more description of why a user might want them, rather than simply stating what they do. notreelog is particularly egregious, as I have nothing but self-referential documentation. So this can use some iteration, I'm sure, but hopefully it's a decent start. Thanks, -Eric diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt index 7671352..02a19c8 100644 --- a/Documentation/filesystems/btrfs.txt +++ b/Documentation/filesystems/btrfs.txt @@ -1,6 +1,6 @@ - BTRFS - = +BTRFS += Btrfs is a new copy on write filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, @@ -34,9 +34,173 @@ The main Btrfs features include: * Online filesystem defragmentation - - MAILING LIST - +Mount Options += + +When mounting a btrfs filesystem, the following option are accepted. +Unless otherwise specified, all options default to off. + + alloc_start=bytes + Debugging option to force all block allocations above a threshold. + The value is specified in bytes, optionally with a K, M, or G suffix, + case insensitive. Default is 1MB. + + autodefrag + Detect small random writes into files and queue them up for the + defrag process. Works best for small files; Not well suited for + large database workloads. + + check_int + check_int_data + check_int_print_mask=value + These debugging options control the behavior of the integrity checking + module (the BTRFS_FS_CHECK_INTEGRITY config option required). + + check_int enables the integrity checker module, which examines all + block write requests to ensure on-disk consistency, at a large + memory and CPU cost. + + check_int_data includes extent data in the integrity checks, and + implies the check_int option. + + check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values + as defined in fs/btrfs/check-integrity.c, to control the integrity + checker module behavior. + + See comments at the top of fs/btrfs/check-integrity.c for more info. + + compress + compress=type + compress-force + compress-force=type + Control BTRFS file data compression. Type may be specified as zlib + lzo or no (for no compression, used for remounting). If no type + is specified, zlib is used. If compress-force is specified, + all files will be compressed, whether or not they compress well. + If compression is enabled, nodatacow and nodatasum are disabled. + + degraded + Allow mounts to continue with missing devices. A read-write mount may + fail with too many devices missing, for example if a stripe member + is completely missing. + + device=devicepath + Specify a device during mount so that ioctls on the control device + can be avoided. Especialy useful when trying to mount a multi-device + setup as root. May be specified multiple times for multiple devices. + + discard + Issue command to let the block device reclaim space freed by the + filesystem. This is useful for SSD devices, thinly provisioned + LUNs and virtual machine images, but may have a performance + impact. + + enospc_debug + Debugging option to be more verbose in some ENOSPC conditions. + + fatal_errors=action + Action to take when encountering a fatal error: + bug - BUG() on a fatal error. This is the default. + panic - panic() on a fatal error. + + flushoncommit + The 'flushoncommit' mount option forces any data dirtied by a write in a + prior transaction to commit as part of the current commit. This makes + the committed state a fully consistent view of the file system from the + application's perspective (i.e., it includes all completed file system + operations). This was previously the behavior only when a snapshot is + created. + + inode_cache + Enable free inode number caching. Defaults to off due to an overflow + problem when the free space crcs don't fit inside a single page. + + max_inline=bytes + Specify the maximum amount of space, in bytes, that can be inlined in + a metadata B-tree leaf. The value is specified in bytes, optionally + with a K, M, or G suffix, case insensitive. In practice, this value + is
[PATCH] Btrfs: fix a mismerge in btrfs_balance()
Raid56 merge (merge commit e942f88) had mistakenly removed a call to __cancel_balance(), which resulted in balance not cleaning up after itself after a successful finish. (Cleanup includes switching the state, removing the balance item and releasing mut_ex_op testnset lock.) Bring it back. Reported-by: David Sterba dste...@suse.cz Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 35bb2d4..33440da 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3230,6 +3230,11 @@ int btrfs_balance(struct btrfs_balance_control *bctl, update_ioctl_balance_args(fs_info, 0, bargs); } + if ((ret ret != -ECANCELED ret != -ENOSPC) || + balance_need_close(fs_info)) { + __cancel_balance(fs_info); + } + wake_up(fs_info-balance_wait_q); return ret; -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] btrfs: clean snapshots one by one
On Fri, Mar 01, 2013 at 05:30:57PM +0100, David Sterba wrote: The mail did not make it to the list on Monday, anyway, now I have a few testing results to share. The umount responsiveness is much improved, it took a few seconds when there were many roots to clean scheduled. With $ echo 'func btrfs_drop_snapshot +pf' /sys/debug/kernel/dynamic_debug/control $ echo 'func btrfs_clean_one_deleted_snapshot +pf' /sys/debug/kernel/dynamic_debug/control one can watch how it proceeds. After umount and and mount, the cleaning process starts again, after 30 seconds when transaction commit triggers. Next test was to let it run with balance (just 'fi balance', no other options). Worked well, but I've hit an oops from balance cancel command that was triggered sometime during the balance. According to the log it happened right after the balance finished. CCing Ilya, I'm not sure if this is somehow induced by the patch. 3407 BUG_ON(fs_info-balance_ctl || atomic_read(fs_info-balance_running)); Were you by any chance running this on top of a for-linus that included a raid56 merge commit? If so, this is result of a raid56 mismerge -- fs_info-balance_ctl is supposed to be cleared in __cancel_balance(), a call to which was accidentally nuked. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: return EPERM in btrfs_rm_device()
On Sat, Mar 02, 2013 at 12:13:59AM -0700, Jerry Snitselaar wrote: Currently there are error paths in btrfs_rm_device() where EINVAL is returned telling the user they passed an invalid argument even though they passed a valid device. Change to return EPERM instead as the operation is not permitted. Signed-off-by: Jerry Snitselaar jerry.snitsel...@oracle.com --- fs/btrfs/volumes.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5cbb7f4..3e1586c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1392,14 +1392,14 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path) if ((all_avail BTRFS_BLOCK_GROUP_RAID10) num_devices = 4) { printk(KERN_ERR btrfs: unable to go below four devices on raid10\n); - ret = -EINVAL; + ret = -EPERM; goto out; } if ((all_avail BTRFS_BLOCK_GROUP_RAID1) num_devices = 2) { printk(KERN_ERR btrfs: unable to go below two devices on raid1\n); - ret = -EINVAL; + ret = -EPERM; goto out; } @@ -1449,14 +1449,14 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path) if (device-is_tgtdev_for_dev_replace) { pr_err(btrfs: unable to remove the dev_replace target dev\n); - ret = -EINVAL; + ret = -EPERM; goto error_brelse; } if (device-writeable root-fs_info-fs_devices-rw_devices == 1) { printk(KERN_ERR btrfs: unable to remove the only writeable device\n); - ret = -EINVAL; + ret = -EPERM; I don't think returning EPERM in these cases is any better than EINVAL. FWIW, many other btrfs ioctls, especially balance, suffer from this as well. What we really need is some kind of error message delivery system, but that's not going to happen any time soon... Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/17 V2] btrfs-progs: replace strtok_r with strsep
On Tue, Feb 26, 2013 at 02:20:30PM -0600, Eric Sandeen wrote: The coverity runs had a false positive complaining that save_ptr is uninitialized in the call to strtok_r. We could initialize it, but Zach points out that just using strsep is a lot simpler if there's only one delimiter, so just switch to that. Signed-off-by: Eric Sandeen sand...@redhat.com --- V2: Remove accidentally-added debug printfs, thanks Geoffredo! diff --git a/cmds-balance.c b/cmds-balance.c index b671e1d..cfbb8eb 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -67,11 +67,8 @@ static int parse_one_profile(const char *profile, u64 *flags) static int parse_profiles(char *profiles, u64 *flags) { char *this_char; - char *save_ptr; - for (this_char = strtok_r(profiles, |, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, |, save_ptr)) { + while ((this_char = strsep(profiles, |))) { if (parse_one_profile(this_char, flags)) return 1; } @@ -136,14 +133,11 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) { char *this_char; char *value; - char *save_ptr; if (!filters) return 0; - for (this_char = strtok_r(filters, ,, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, ,, save_ptr)) { + while ((this_char = strsep(filters , ,))) { ^^^ whitespace One of the differences between strtok() and strsep() is that the former allows multiple delimiters between two tokens. With strsep(), this btrfs balance -dfoo1=bar1,,,foo2=bar2 mnt fails with error, whereas with strtok() it passes. I don't have a strong opinion here (this has been loosely modeled on the way mount(8) handles -o options), but might it be better to just initialize save_ptr? (And yes, I know that strsep() is better ;)) Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/17 V2] btrfs-progs: replace strtok_r with strsep
On Tue, Feb 26, 2013 at 02:46:30PM -0600, Eric Sandeen wrote: On 2/26/13 2:40 PM, Ilya Dryomov wrote: On Tue, Feb 26, 2013 at 02:20:30PM -0600, Eric Sandeen wrote: The coverity runs had a false positive complaining that save_ptr is uninitialized in the call to strtok_r. We could initialize it, but Zach points out that just using strsep is a lot simpler if there's only one delimiter, so just switch to that. Signed-off-by: Eric Sandeen sand...@redhat.com --- V2: Remove accidentally-added debug printfs, thanks Geoffredo! diff --git a/cmds-balance.c b/cmds-balance.c index b671e1d..cfbb8eb 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -67,11 +67,8 @@ static int parse_one_profile(const char *profile, u64 *flags) static int parse_profiles(char *profiles, u64 *flags) { char *this_char; - char *save_ptr; - for (this_char = strtok_r(profiles, |, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, |, save_ptr)) { + while ((this_char = strsep(profiles, |))) { if (parse_one_profile(this_char, flags)) return 1; } @@ -136,14 +133,11 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) { char *this_char; char *value; - char *save_ptr; if (!filters) return 0; - for (this_char = strtok_r(filters, ,, save_ptr); - this_char != NULL; - this_char = strtok_r(NULL, ,, save_ptr)) { + while ((this_char = strsep(filters , ,))) { ^^^ whitespace One of the differences between strtok() and strsep() is that the former allows multiple delimiters between two tokens. With strsep(), this btrfs balance -dfoo1=bar1,,,foo2=bar2 mnt fails with error, whereas with strtok() it passes. I don't have a strong opinion here (this has been loosely modeled on the way mount(8) handles -o options), but might it be better to just initialize save_ptr? (And yes, I know that strsep() is better ;)) I don't really care much either way, TBH. Initializing it seems a little bit magic, but with a comment as to why, it'd be fine. If you did it this way intentionally to allow the above format, and changing it would break expectations, then I'll happily just initialize save_ptr. Yeah, although I doubt anybody would notice, it's probably better to keep the old behaviour. (And, I realize that lots of these changes are pedantic and seemingly pointless, but we've gotten the static checker errors down from over 100 to under 30 and dropping; the more noise we remove the more likely we are to pay attention to the output and catch actual errors. At least that's my feeling; if people think this is getting to be pointless churn, I'm ok with that, too). Not at all. Btrfs-progs had rarely seen any cleanups, with you and Zach patching it up and David's integration work it has gotten a whole new life ;) BTW, huge thanks for the kernel-style build output, I am forever grateful.. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: eliminate a use-after-free in btrfs_balance()
Commit 5af3e8cc introduced a use-after-free at volumes.c:3139: bctl is freed above in __cancel_balance() in all cases except for balance pause. Fix this by moving the offending check a couple statements above, the meaning of the check is preserved. Reported-by: Chris Mason chris.ma...@fusionio.com Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 15f6efd..5d25aed 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3126,6 +3126,11 @@ int btrfs_balance(struct btrfs_balance_control *bctl, mutex_lock(fs_info-balance_mutex); atomic_dec(fs_info-balance_running); + if (bctl-sys.flags BTRFS_BALANCE_ARGS_CONVERT) { + fs_info-num_tolerated_disk_barrier_failures = + btrfs_calc_num_tolerated_disk_barrier_failures(fs_info); + } + if (bargs) { memset(bargs, 0, sizeof(*bargs)); update_ioctl_balance_args(fs_info, 0, bargs); @@ -3136,11 +3141,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl, __cancel_balance(fs_info); } - if (bctl-sys.flags BTRFS_BALANCE_ARGS_CONVERT) { - fs_info-num_tolerated_disk_barrier_failures = - btrfs_calc_num_tolerated_disk_barrier_failures(fs_info); - } - wake_up(fs_info-balance_wait_q); return ret; -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: allow for selecting only completely empty chunks
Enhance balance usage filter by making it possible to balance out only completely empty chunks. Today, usage filter properly acts on values from 1 to 99 inclusive, usage=100 selects all chunks, and usage=0 selects no chunks. This commit changes the usage=0 case: the new meaning is to restripe only completely empty chunks and nothing else. Suggested-by: David Sterba dste...@suse.cz Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 15f6efd..7e41619 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2615,7 +2615,7 @@ static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset, chunk_used = btrfs_block_group_used(cache-item); if (bargs-usage == 0) - user_thresh = 0; + user_thresh = 1; else if (bargs-usage 100) user_thresh = cache-key.offset; else -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: accept zero for balance usage filter
On Mon, Feb 11, 2013 at 01:33:52PM +0100, David Sterba wrote: The condition can be relaxed to accept also 0 which will delete unoccupied chunks and does not need space for the actual data relocation. Until there is an automatic empty chunk reclaim, we can use this as a last resort option under enospc. CC: Ilya Dryomov idryo...@gmail.com Signed-off-by: David Sterba dste...@suse.cz Josef, please don't pull this one into btrfs-next, it's been deprecated by Btrfs: allow for selecting only completely empty chunks from me. David and I discussed this on IRC, he is OK with it. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs-progs: make 0 a valid usage filter argument
This is a progs counterpart to a Btrfs: allow for selecting only completely empty chunks. usage=0 now means select only only completely empty chunks and nothing else. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- cmds-balance.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/cmds-balance.c b/cmds-balance.c index 38a7426..1f888db 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -159,7 +159,7 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args) return 1; } if (parse_u64(value, args-usage) || - args-usage 1 || args-usage 100) { + args-usage 100) { fprintf(stderr, Invalid usage argument: %s\n, value); return 1; -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs-progs: move crc32c optimization init
Don't call crc32c_optimization_init() until we know that a command is actually going to be invoked. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- btrfs.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/btrfs.c b/btrfs.c index 687acec..7752bd6 100644 --- a/btrfs.c +++ b/btrfs.c @@ -261,8 +261,6 @@ int main(int argc, char **argv) { const struct cmd_struct *cmd; - crc32c_optimization_init(); - argc--; argv++; handle_options(argc, argv); @@ -278,6 +276,8 @@ int main(int argc, char **argv) handle_help_options_next_level(cmd, argc, argv); + crc32c_optimization_init(); + fixup_argv0(argv, cmd-token); exit(cmd-fn(argc, argv)); } -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] Btrfs-progs: add the rescue section to btrfs
On Fri, Feb 08, 2013 at 01:37:02AM +0100, Ian Kumlien wrote: the btrfs command now lists: btrfs rescue select-super -s number device Select a superblock btrfs rescue dump-super device Dump a superblock to disk btrfs rescue debug-tree [options] device Debug the filesystem btrfs-dump-super.c was imported in to cmds-rescue-super-ops.c This patch integrates all the functionality... cmds-rescue.c is used to glue cmds-rescue-debug-tree.c, cmds-rescue-restore.c and cmds-rescue-super-ops.c together to make the source files more managable. I think btrfs-debug-tree should go under debug -- btrfs debug dump-tree. Same goes for btrfs-dump-super -- btrfs debug dump-super. These commands won't help an average user to rescue a single byte, they are there to help developers. I suppose you can also import btrfs-image under btrfs debug dump-image. This leaves the question of whether we want select-super under rescue. Given that it can easily do more harm than good, it might not be the best place for it.. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] include btrfsck in btrfs - including name check
On Wed, Jan 30, 2013 at 10:11:44PM +0100, Ian Kumlien wrote: On Wed, Jan 30, 2013 at 12:33:42PM -0800, Filipe Brandenburger wrote: Hi Ian, On Tue, Jan 29, 2013 at 3:03 PM, Ian Kumlien po...@demius.net wrote: This patch includes fsck as a subcommand of btrfs, but if you rename the binary to btrfsck (or, preferably, use a symlink) it will act like the old btrfs command. You can rename files in your git (there's git mv for that), only thing is when you generate the patch with format-patch (or git show, git diff etc.) pass it the -M option to detect moves and act appropriately. git send-email seems to send the full diff, diffing against /dev/null =P This is why i skipped that part. Regarding your patches, I really like the idea of btrfs fsck but I think I'd prefer to keep the external commands as wrapper scripts instead of adding busybox-style name detection to btrfs... But then, that's just my opinion. Well, now both works. I guess I would have a btrfsck that would simply contain: #! /bin/sh exec btrfs fsck $@ Downside is that error reporting (e.g. invalid syntax, etc.) would show btrfs fsck instead of the command the user actually typed... Actually it still does, due to how btrfs handles things... It's a simple enough fix and it will make rescue cd's or dracut images, or just about anything. I understand your point, but i think this is a simpler solution =) FWIW I agree with Filipe, this name detection thing looks ugly to me. The merge itself is a good idea, but I think we should stick with shell wrappers for everything else. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix a regression in balance usage filter
Commit 3fed40cc (Btrfs: cleanup duplicated division functions), which was merged into 3.8-rc1, has introduced a regression by removing logic that was guarding us against bad user input. Bring it back. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c |9 - 1 files changed, 8 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 86279c3..b577aee 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2614,7 +2614,14 @@ static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset, cache = btrfs_lookup_block_group(fs_info, chunk_offset); chunk_used = btrfs_block_group_used(cache-item); - user_thresh = div_factor_fine(cache-key.offset, bargs-usage); + if (bargs-usage == 0) + user_thresh = 0; + else if (bargs-usage 100) + user_thresh = cache-key.offset; + else + user_thresh = div_factor_fine(cache-key.offset, + bargs-usage); + if (chunk_used user_thresh) ret = 0; -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] Btrfs: fix mutually exclusive op is running error code
The error code that is returned in response to starting a mutually exclusive operation when there is one already running got silently changed from EINVAL to EINPROGRESS by 5ac00add. Returning EINPROGRESS to, say, add_dev, when rm_dev is running is misleading. Furthermore, the operation itself may want to use EINPROGRESS for other purposes. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/ioctl.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index ab82636..3aebb28 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1337,7 +1337,7 @@ static noinline int btrfs_ioctl_resize(struct file *file, 1)) { pr_info(btrfs: dev add/delete/balance/replace/resize operation in progress\n); mnt_drop_write_file(file); - return -EINPROGRESS; + return -EINVAL; } mutex_lock(root-fs_info-volume_mutex); @@ -2193,7 +2193,7 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp) if (atomic_xchg(root-fs_info-mutually_exclusive_operation_running, 1)) { pr_info(btrfs: dev add/delete/balance/replace/resize operation in progress\n); - return -EINPROGRESS; + return -EINVAL; } ret = mnt_want_write_file(file); if (ret) { @@ -2267,7 +2267,7 @@ static long btrfs_ioctl_add_dev(struct btrfs_root *root, void __user *arg) if (atomic_xchg(root-fs_info-mutually_exclusive_operation_running, 1)) { pr_info(btrfs: dev add/delete/balance/replace/resize operation in progress\n); - return -EINPROGRESS; + return -EINVAL; } mutex_lock(root-fs_info-volume_mutex); @@ -2304,7 +2304,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg) 1)) { pr_info(btrfs: dev add/delete/balance/replace/resize operation in progress\n); mnt_drop_write_file(file); - return -EINPROGRESS; + return -EINVAL; } mutex_lock(root-fs_info-volume_mutex); -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] Btrfs: fix unlock order in btrfs_ioctl_resize
Fix unlock order in btrfs_ioctl_resize(). Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/ioctl.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 3aebb28..b9a0190 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1447,8 +1447,8 @@ out_free: kfree(vol_args); out: mutex_unlock(root-fs_info-volume_mutex); - mnt_drop_write_file(file); atomic_set(root-fs_info-mutually_exclusive_operation_running, 0); + mnt_drop_write_file(file); return ret; } -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] Btrfs: fix unlock order in btrfs_ioctl_rm_dev
Fix unlock order in btrfs_ioctl_rm_dev(). Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/ioctl.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index b9a0190..9c079dd 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2320,8 +2320,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg) kfree(vol_args); out: mutex_unlock(root-fs_info-volume_mutex); - mnt_drop_write_file(file); atomic_set(root-fs_info-mutually_exclusive_operation_running, 0); + mnt_drop_write_file(file); return ret; } -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] Mutually exclusive ops fixes
Hello, This is a set of fixes for mutually exclusive ops stuff added by Stefan as part of the dev-replace patchset. The main one here is the first patch, it fixes a regression in balancing interface. The second patch brings EINVAL error code back. The rest are relatively minor ordering fixes. Thanks, Ilya Ilya Dryomov (5): Btrfs: bring back balance pause/resume logic Btrfs: fix mutually exclusive op is running error code Btrfs: fix unlock order in btrfs_ioctl_resize Btrfs: fix unlock order in btrfs_ioctl_rm_dev Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag fs/btrfs/ioctl.c | 107 ++- fs/btrfs/volumes.c | 10 +++- 2 files changed, 86 insertions(+), 31 deletions(-) -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Mutually exclusive ops fixes
On Sun, Jan 20, 2013 at 05:22:27PM +0200, Ilya Dryomov wrote: Hello, This is a set of fixes for mutually exclusive ops stuff added by Stefan as part of the dev-replace patchset. The main one here is the first patch, it fixes a regression in balancing interface. The second patch brings EINVAL error code back. The rest are relatively minor ordering fixes. You can pull from git://github.com/idryomov/btrfs-unstable.git mutex-ops@next-master top commit 2f68aafa90fedb3cdb80f0a8517a2130c5c035b0 Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] Btrfs: bring back balance pause/resume logic
Balance pause/resume logic got broken by 5ac00add (went in into 3.8-rc1 as part of dev-replace merge). Offending commit took a stab at making mutually exclusive volume operations (add_dev, rm_dev, resize, balance, replace_dev) not block behind volume_mutex if another such operation is in progress and instead return an error right away. Balancing front-end relied on the blocking behaviour, so the fix is ugly, but short of a complete rework, it's the best we can do. Reported-by: Liu Bo bo.li@oracle.com Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/ioctl.c | 78 ++- fs/btrfs/volumes.c | 10 -- 2 files changed, 71 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 171786a..ab82636 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3441,8 +3441,8 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg) struct btrfs_fs_info *fs_info = root-fs_info; struct btrfs_ioctl_balance_args *bargs; struct btrfs_balance_control *bctl; + bool need_unlock; /* for mut. excl. ops lock */ int ret; - int need_to_clear_lock = 0; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -3451,14 +3451,61 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg) if (ret) return ret; - mutex_lock(fs_info-volume_mutex); +again: + if (!atomic_xchg(fs_info-mutually_exclusive_operation_running, 1)) { + mutex_lock(fs_info-volume_mutex); + mutex_lock(fs_info-balance_mutex); + need_unlock = true; + goto locked; + } + + /* +* mut. excl. ops lock is locked. Three possibilites: +* (1) some other op is running +* (2) balance is running +* (3) balance is paused -- special case (think resume) +*/ mutex_lock(fs_info-balance_mutex); + if (fs_info-balance_ctl) { + /* this is either (2) or (3) */ + if (!atomic_read(fs_info-balance_running)) { + mutex_unlock(fs_info-balance_mutex); + if (!mutex_trylock(fs_info-volume_mutex)) + goto again; + mutex_lock(fs_info-balance_mutex); + + if (fs_info-balance_ctl + !atomic_read(fs_info-balance_running)) { + /* this is (3) */ + need_unlock = false; + goto locked; + } + + mutex_unlock(fs_info-balance_mutex); + mutex_unlock(fs_info-volume_mutex); + goto again; + } else { + /* this is (2) */ + mutex_unlock(fs_info-balance_mutex); + ret = -EINPROGRESS; + goto out; + } + } else { + /* this is (1) */ + mutex_unlock(fs_info-balance_mutex); + pr_info(btrfs: dev add/delete/balance/replace/resize operation in progress\n); + ret = -EINVAL; + goto out; + } + +locked: + BUG_ON(!atomic_read(fs_info-mutually_exclusive_operation_running)); if (arg) { bargs = memdup_user(arg, sizeof(*bargs)); if (IS_ERR(bargs)) { ret = PTR_ERR(bargs); - goto out; + goto out_unlock; } if (bargs-flags BTRFS_BALANCE_RESUME) { @@ -3478,13 +3525,10 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg) bargs = NULL; } - if (atomic_xchg(root-fs_info-mutually_exclusive_operation_running, - 1)) { - pr_info(btrfs: dev add/delete/balance/replace/resize operation in progress\n); + if (fs_info-balance_ctl) { ret = -EINPROGRESS; goto out_bargs; } - need_to_clear_lock = 1; bctl = kzalloc(sizeof(*bctl), GFP_NOFS); if (!bctl) { @@ -3505,11 +3549,17 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg) } do_balance: - ret = btrfs_balance(bctl, bargs); /* -* bctl is freed in __cancel_balance or in free_fs_info if -* restriper was paused all the way until unmount +* Ownership of bctl and mutually_exclusive_operation_running +* goes to to btrfs_balance. bctl is freed in __cancel_balance, +* or, if restriper was paused all the way until unmount, in +* free_fs_info. mutually_exclusive_operation_running is +* cleared in __cancel_balance. */ + need_unlock = false; + + ret = btrfs_balance(bctl, bargs); + if (arg
Re: [PATCH] Btrfs: fix crash of starting balance
On Tue, Jan 15, 2013 at 10:47:57PM +0800, Liu Bo wrote: We will crash on BUG_ON(ret == -EEXIST) when we do not resume the existing balance but attempt to start a new one. The steps can be: 1. start balance 2. pause balance 3. start balance Signed-off-by: Liu Bo bo.li@oracle.com --- fs/btrfs/volumes.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5cce6aa..3901654 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3100,7 +3100,12 @@ int btrfs_balance(struct btrfs_balance_control *bctl, goto out; if (!(bctl-flags BTRFS_BALANCE_RESUME)) { - BUG_ON(ret == -EEXIST); + /* + * This can happen when we do not resume the existing balance + * but try to start a new one instead. + */ + if (ret == -EEXIST) + goto out; set_balance_control(bctl); } else { BUG_ON(ret != -EEXIST); OK, it seems balance pause/resume logic got broken by dev-replace code (5ac00addc7ac09110995fe967071d191b5981cc1), which went into v3.8-rc1. This is most certainly not the right way to fix it, that BUG_ON is there for a reason. I'll send a fix in a couple of days. Thanks, Ilya -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df
On Wed, Oct 03, 2012 at 01:43:14PM +0200, Goffredo Baroncelli wrote: This serie of patches updated the command btrfs filesystem df. I update this command because it is not so easy to get the information about the disk usage from the command fi df and fi show. From the man page (see 2nd patch): [...] The command btrfs filesystem df is used to query the status of the chunks, how many space on the disk(s) are used by the chunks, how many space are available in the chunks, and an estimation of the free space of the filesystem. [...] $ ./btrfs filesystem df --help usage: btrfs filesystem disk-usage [-d][-s][-k] path [path..] Show space usage information for a mount point(s). -k Set KB (1024 bytes) as unit -s Don't show the summary section -d Don't show the detail section Hi Goffredo, Why not show just the summary section by default and have a flag (-v) for the details section? $ ./btrfs filesystem df / Path: / Summary: Disk_size: 72.57GB Disk_allocated:25.10GB Disk_unallocated: 47.48GB Logical_size: 23.06GB Used: 11.01GB Free_(Estimated): 55.66GB(Max: 59.52GB, Min: 35.78GB) Data_to_disk_ratio: 92 % Details: Chunk-type Mode Chunk-size Logical-sizeUsed Type for the first column is probably enough. Why is the third column called Chunk-size? If my understanding is correct, it's just a break down of Disk_allocated from the summary section. If so, why not call it Disk_allocated to avoid confusion? Also, why do you use dashes instead of underbars for table headers? Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Update btrfs filesystem df command
On Wed, Oct 03, 2012 at 01:43:15PM +0200, Goffredo Baroncelli wrote: The command btrfs filesystem df is used to query the status of the chunks, how many space on the disk(s) are used by the chunks, how many space are available in the chunks, and an estimation of the free space of the filesystem. --- cmds-filesystem.c | 264 +++-- 1 file changed, 215 insertions(+), 49 deletions(-) diff --git a/cmds-filesystem.c b/cmds-filesystem.c index b1457de..6c3ebdc 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c (snipped) +static int cmd_disk_free(int argc, char **argv) +{ + + int flags=DF_SHOW_SUMMARY|DF_SHOW_DETAIL|DF_HUMAN_UNIT; + int i, more_than_one=0; + + if (check_argc_min(argc, 2)) + usage(cmd_disk_free_usage); + + for(i=1; i argc ; i++){ + if(!strcmp(argv[i],-d)) + flags = ~DF_SHOW_DETAIL; + else if(!strcmp(argv[i],-s)) + flags = ~DF_SHOW_SUMMARY; + else if(!strcmp(argv[i],-k)) + flags = ~DF_HUMAN_UNIT; + else{ + int r; + if(more_than_one) + printf(\n); + r = _cmd_disk_free(argv[i], flags); + if( r ) return r; + more_than_one=1; + } Is there any reason getopt(), or better yet, getopt_long() won't work? + + } + + return 0; +} + + + static int uuid_search(struct btrfs_fs_devices *fs_devices, char *search) { char uuidbuf[37]; @@ -529,7 +695,7 @@ static int cmd_label(int argc, char **argv) const struct cmd_group filesystem_cmd_group = { filesystem_cmd_group_usage, NULL, { - { df, cmd_df, cmd_df_usage, NULL, 0 }, + { df, cmd_disk_free, cmd_disk_free_usage, NULL, 0 }, If this command is going to replace df, you should change the function name back to cmd_df. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Update btrfs filesystem df command
On Wed, Oct 03, 2012 at 06:34:38PM +0200, Goffredo Baroncelli wrote: On 10/03/2012 05:02 PM, Ilya Dryomov wrote: On Wed, Oct 03, 2012 at 01:43:15PM +0200, Goffredo Baroncelli wrote: The command btrfs filesystem df is used to query the status of the chunks, how many space on the disk(s) are used by the chunks, how many space are available in the chunks, and an estimation of the free space of the filesystem. --- cmds-filesystem.c | 264 +++-- 1 file changed, 215 insertions(+), 49 deletions(-) diff --git a/cmds-filesystem.c b/cmds-filesystem.c index b1457de..6c3ebdc 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c (snipped) +static int cmd_disk_free(int argc, char **argv) +{ + + int flags=DF_SHOW_SUMMARY|DF_SHOW_DETAIL|DF_HUMAN_UNIT; + int i, more_than_one=0; + + if (check_argc_min(argc, 2)) + usage(cmd_disk_free_usage); + + for(i=1; i argc ; i++){ + if(!strcmp(argv[i],-d)) + flags= ~DF_SHOW_DETAIL; + else if(!strcmp(argv[i],-s)) + flags= ~DF_SHOW_SUMMARY; + else if(!strcmp(argv[i],-k)) + flags= ~DF_HUMAN_UNIT; + else{ + int r; + if(more_than_one) + printf(\n); + r = _cmd_disk_free(argv[i], flags); + if( r ) return r; + more_than_one=1; + } Is there any reason getopt(), or better yet, getopt_long() won't work? In the beginning there were also the switches +d, +s, +k, then I dropped them. So I leaved this style. The code is simple enough to not justify a change. It's not a matter of style. It is generally expected that several short options can be specified after one dash. Further, if somebody else were to add an option with an optional parameter, they'd have to spend time rewriting your stuff. In addition to that, other parts of progs already use getopt() in similar situations. One example would be the scrub subgroup: scrub commands are equally simple (a few short switches, no parameters), and yet getopt() is used. + + } + + return 0; +} + + + static int uuid_search(struct btrfs_fs_devices *fs_devices, char *search) { char uuidbuf[37]; @@ -529,7 +695,7 @@ static int cmd_label(int argc, char **argv) const struct cmd_group filesystem_cmd_group = { filesystem_cmd_group_usage, NULL, { - { df, cmd_df, cmd_df_usage, NULL, 0 }, + { df, cmd_disk_free, cmd_disk_free_usage, NULL, 0 }, If this command is going to replace df, you should change the function name back to cmd_df. I was never convinced to use 'df'. At the beginning when I wrote the first parser of btrfs, was suggested (not by me) to use long command and allow the parser to match a contracted command until there was any ambiguity. I suggested to use disk-free, but everybody were confortable with df.. so I leaved it as official name. But I prefer for the internal code a more verbose name. Well, all your patch is doing is extending the functionality of an existing command. The official name for that command is df, and you are not changing its name. Why change the name of an existing function? Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df
On Wed, Oct 03, 2012 at 06:46:00PM +0200, Goffredo Baroncelli wrote: On 10/03/2012 05:01 PM, Ilya Dryomov wrote: Type for the first column is probably enough. Why is the third column called Chunk-size? If my understanding is correct, it's just a break down of Disk_allocated from the summary section. If so, why not call it Disk_allocated to avoid confusion? Using everywhere Disk_something was my first attempt. But after some thoughts I decided that these are two different kind of information. It is true that Disk_allocated is the sum of Chunk-Sizes... But my feels is that this is a kind of implementation details. If some other type of allocation unit will be added to BTRFS, then these will be added to Disk_allocated, but not to Chunk list... I prefer to not change the wording until an enough critical mass of people converge to a unique solution . It is the chunks that is the implementation detail that we want to hide. Average Btrfs user wouldn't want to know anything about chunks, the only thing he'd be interested in is Disk_allocated and similar fields. Moreover, I am pretty sure Chunk-Size would actually confuse people. I stared at your example output for a few seconds trying to comprehend a 21GB Chunk-Size on a 72GB partition. What you call Chunk-Size is actually a sum of sizes of chunks of that particular type, and a few lines above you call the same exact sum (only this time over all types of chunks) Disk_allocated. So I think it's only logical to rename the column in question to Disk_allocated to match the summary section. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df
On Wed, Oct 03, 2012 at 10:01:00PM +0200, Goffredo Baroncelli wrote: On 10/03/2012 07:46 PM, Ilya Dryomov wrote: On Wed, Oct 03, 2012 at 06:46:00PM +0200, Goffredo Baroncelli wrote: On 10/03/2012 05:01 PM, Ilya Dryomov wrote: Type for the first column is probably enough. Why is the third column called Chunk-size? If my understanding is correct, it's just a break down of Disk_allocated from the summary section. If so, why not call it Disk_allocated to avoid confusion? Using everywhere Disk_something was my first attempt. But after some thoughts I decided that these are two different kind of information. It is true that Disk_allocated is the sum of Chunk-Sizes... But my feels is that this is a kind of implementation details. If some other type of allocation unit will be added to BTRFS, then these will be added to Disk_allocated, but not to Chunk list... I prefer to not change the wording until an enough critical mass of people converge to a unique solution . It is the chunks that is the implementation detail that we want to hide. Average Btrfs user wouldn't want to know anything about chunks, the only thing he'd be interested in is Disk_allocated and similar fields. The df standard tool id sufficient for the average user. We need only to export these information via the standard syscall stat[v]fs. Basically we should try to implement the algorithm of the Free_(Estimated) space for the statfs(2) syscall. Who uses btrfs tools, is an user with knowledge of btrfs higher than the average. Moreover, I am pretty sure Chunk-Size would actually confuse people. I stared at your example output for a few seconds trying to comprehend a 21GB Chunk-Size on a 72GB partition. What you call Chunk-Size is actually a sum of sizes of chunks of that particular type, and a few lines above you call the same exact sum (only this time over all types of chunks) Disk_allocated. So I think it's only logical to rename the column in question to Disk_allocated to match the summary section. What about [...] Details: Chunk_type Mode Size_(disk) Size_(logical) Used DataSingle 21.01GB 21.01GB 10.53GB System DUP 80.00MB 40.00MB 4.00KB [...] ? This is definitely better. Can you also drop Chunk_type in favor of Type? Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Update btrfs filesystem df command
On Wed, Oct 03, 2012 at 07:22:31PM +0200, Goffredo Baroncelli wrote: From: Goffredo Baroncelli kreij...@inwind.it The command btrfs filesystem df is used to query the status of the chunks, how many space on the disk(s) are used by the chunks, how many space are available in the chunks, and an estimation of the free space of the filesystem. --- cmds-filesystem.c | 284 - 1 file changed, 235 insertions(+), 49 deletions(-) diff --git a/cmds-filesystem.c b/cmds-filesystem.c index b1457de..2701904 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -22,6 +22,7 @@ #include errno.h #include uuid/uuid.h #include ctype.h +#include sys/vfs.h #include kerncompat.h #include ctree.h @@ -39,25 +40,55 @@ static const char * const filesystem_cmd_group_usage[] = { NULL }; -static const char * const cmd_df_usage[] = { - btrfs filesystem df path, - Show space usage information for a mount point, - NULL -}; +static u64 disk_size( char *path){ + struct statfs sfs; + + if( statfs(path, sfs) 0 ) + return 0; + else + return sfs.f_bsize * sfs.f_blocks; + +} + +static void print_string(char *s, int w) +{ + int i; + + printf(%s, s); + for( i = strlen(s) ; i w ; i++ ) + putchar(' '); +} + +#define DF_SHOW_SUMMARY (11) +#define DF_SHOW_DETAIL (12) +#define DF_HUMAN_UNIT(13) + +static void pretty_sizes_r(u64 size, int w, int mode) +{ + if( mode DF_HUMAN_UNIT ){ + char *s = pretty_sizes(size); + printf(%*s, w, s); + free(s); + } else { + printf(%*llu, w, size/1024); + + } +} -static int cmd_df(int argc, char **argv) +static int _cmd_disk_free(char *path, int mode) { struct btrfs_ioctl_space_args *sargs; u64 count = 0, i; int ret; int fd; - int e; - char *path; - - if (check_argc_exact(argc, 2)) - usage(cmd_df_usage); - - path = argv[1]; + int e, width; + u64 total_disk;/* filesystem size == sum of +disks sizes */ + u64 total_chunks; /* sum of chunks sizes on disk(s) */ + u64 total_used;/* logical space used */ + u64 total_free;/* logical space un-used */ + double K; + fd = open_file_or_dir(path); if (fd 0) { @@ -103,56 +134,211 @@ static int cmd_df(int argc, char **argv) return ret; } - for (i = 0; i sargs-total_spaces; i++) { - char description[80]; - char *total_bytes; - char *used_bytes; - int written = 0; - u64 flags = sargs-spaces[i].flags; + total_disk = disk_size(path); + e = errno; + if( total_disk == 0 ){ + fprintf(stderr, ERROR: couldn't get space info on '%s' - %s\n, + path, strerror(e)); + close(fd); + free(sargs); + return 19; + } + + total_chunks = total_used = total_free = 0; - memset(description, 0, 80); + for (i = 0; i sargs-total_spaces; i++) { + int ratio=1; + u64 allocated; - if (flags BTRFS_BLOCK_GROUP_DATA) { - if (flags BTRFS_BLOCK_GROUP_METADATA) { - snprintf(description, 14, %s, - Data+Metadata); - written += 13; - } else { - snprintf(description, 5, %s, Data); - written += 4; - } - } else if (flags BTRFS_BLOCK_GROUP_SYSTEM) { - snprintf(description, 7, %s, System); - written += 6; - } else if (flags BTRFS_BLOCK_GROUP_METADATA) { - snprintf(description, 9, %s, Metadata); - written += 8; - } + u64 flags = sargs-spaces[i].flags; if (flags BTRFS_BLOCK_GROUP_RAID0) { - snprintf(description+written, 8, %s, , RAID0); - written += 7; + ratio=1; } else if (flags BTRFS_BLOCK_GROUP_RAID1) { - snprintf(description+written, 8, %s, , RAID1); - written += 7; + ratio=2; } else if (flags BTRFS_BLOCK_GROUP_DUP) { - snprintf(description+written, 6, %s, , DUP); - written += 5; + ratio=2; } else if (flags BTRFS_BLOCK_GROUP_RAID10) { - snprintf(description+written, 9, %s, , RAID10); -
Re: [PATCH V4 1/2] Btrfs: cleanup duplicated division functions
Hi Miao, You haven't addressed any of my concerns with v3. On Thu, Sep 27, 2012 at 06:19:58PM +0800, Miao Xie wrote: (snipped) the parameters are right. So besides the code cleanup, this patch also add a check for the usage of the space balance, it is the only place that we need add check to make sure the parameters of div_factor are right till now. Besides that, the old kernel may hold the wrong usage value, so we must rectify it. Cleaning up/unifying duplicated functions and changing the existing logic are two very different things. If you, in the course of writing this patch, became unhappy with the way balancing ioctl deals with invalid input, please send a separate patch. Before your patch, volumes.c had its own copy of div_factor_fine(): static u64 div_factor_fine(u64 num, int factor) { if (factor = 0) return 0; if (factor = 100) return num; num *= factor; do_div(num, 100); return num; } which was called from chunk_usage_filter() on unvalidated user input. As far as the cleanup part of your patch goes, you've dropped factor = 0 / factor = 100 logic, merged volumes.c's copy with extent-tree.c's copy and renamed div_factor_fine() to div_factor(). To make chunk_usage_filter() happy again, it's enough to move the dropped logic directly to the call site: static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset, struct btrfs_balance_args *bargs) { ... - user_thresh = div_factor_fine(cache-key.offset, bargs-usage); + if (bargs-usage == 0) + user_thresh = 0; + else if (bargs-usage = 100) + user_thresh = cache-key.offset; + else + user_thresh = div_factor(cache-key.offset, bargs-usage); ... } So I would suggest you drop all hunks related to changing the way balancing ioctl works and make the above change to chunk_usage_filter() instead. Once again, if you are unhappy with usage filter argument handling, send a separate patch. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 1/2] Btrfs: cleanup duplicated division functions
On Mon, Sep 24, 2012 at 06:47:42PM +0200, David Sterba wrote: On Mon, Sep 24, 2012 at 10:05:02AM +0800, Miao Xie wrote: Generally, we should check the value when it is input. If not, we might run our program with the wrong value, and it is possible to cause unknown problems. So I think the above chunk is necessary. The difference is that giving an invalid value will exit early (your version), while Ilya's version will clamp the 'usage' to the allowed range during processing. From usability POV, I'd prefer to let the command fail early with a verbose error eg. that the range is invalid. I think that's the job for progs, and that's where this and a few other checks are currently implemented. There is no possibility for unknown problems, that's exactly how it's been implemented before the cleanup. The purpose of balance filters (and the usage filter in particular) is to lower the number of chunks we have to relocate. If someone decides to use raw ioctls, and supplies us with the invalid value, we just cut it down to a 100 and proceed. usage=100 is the equivalent of not using the usage filter at all, so the raw ioctl user will get what he deserves. This is very similar to what we currently do with other filters. For example, soft can only be used with convert, and this is checked by progs. But, if somebody were to set a soft flag without setting convert we would simply ignore that soft. Same goes for drange and devid, invalid ranges, invalid devids, etc. Pulling all these checks into the kernel is questionable at best, and, if enough people insist on it, should be discussed separately. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 1/2] Btrfs: cleanup duplicated division functions
On Fri, Sep 21, 2012 at 05:07:46PM +0800, Miao Xie wrote: div_factor{_fine} has been implemented for two times, and these two functions are very similar, so cleanup the reduplicate implement and drop the original div_factor(), and then rename div_factor_fine() to div_factor(). So the divisor of the new div_factor() is 100, not 10. And I move div_factor into a independent file named math.h because it is a common math function, may be used by every composition of btrfs. Because these functions are mostly used on the hot path, and we are sure the parameters are right in the most cases, we don't add complex checks for the parameters. But in the other place, we must check and make sure the parameters are right. So besides the code cleanup, this patch also add a check for the usage of the space balance, it is the only place that we need add check to make sure the parameters of div_factor are right till now. Signed-off-by: Miao Xie mi...@cn.fujitsu.com --- Changelog v2 - v3: - drop the original div_factor and rename div_factor_fine to div_factor - drop the check of the factor Changelog v1 - v2: - add missing check --- fs/btrfs/extent-tree.c | 29 ++--- fs/btrfs/ioctl.c | 18 ++ fs/btrfs/math.h| 33 + fs/btrfs/relocation.c |2 +- fs/btrfs/transaction.c |2 +- fs/btrfs/volumes.c | 30 +- 6 files changed, 64 insertions(+), 50 deletions(-) create mode 100644 fs/btrfs/math.h diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a010234..bcb9ced 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -33,6 +33,7 @@ #include volumes.h #include locking.h #include free-space-cache.h +#include math.h #undef SCRAMBLE_DELAYED_REFS @@ -648,24 +649,6 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info) rcu_read_unlock(); } -static u64 div_factor(u64 num, int factor) -{ - if (factor == 10) - return num; - num *= factor; - do_div(num, 10); - return num; -} - -static u64 div_factor_fine(u64 num, int factor) -{ - if (factor == 100) - return num; - num *= factor; - do_div(num, 100); - return num; -} - u64 btrfs_find_block_group(struct btrfs_root *root, u64 search_start, u64 search_hint, int owner) { @@ -674,7 +657,7 @@ u64 btrfs_find_block_group(struct btrfs_root *root, u64 last = max(search_hint, search_start); u64 group_start = 0; int full_search = 0; - int factor = 9; + int factor = 90; int wrapped = 0; again: while (1) { @@ -708,7 +691,7 @@ again: if (!full_search factor 10) { last = search_start; full_search = 1; - factor = 10; + factor = 100; goto again; } found: @@ -3513,7 +3496,7 @@ static int should_alloc_chunk(struct btrfs_root *root, if (force == CHUNK_ALLOC_LIMITED) { thresh = btrfs_super_total_bytes(root-fs_info-super_copy); thresh = max_t(u64, 64 * 1024 * 1024, -div_factor_fine(thresh, 1)); +div_factor(thresh, 1)); if (num_bytes - num_allocated thresh) return 1; @@ -3521,12 +3504,12 @@ static int should_alloc_chunk(struct btrfs_root *root, thresh = btrfs_super_total_bytes(root-fs_info-super_copy); /* 256MB or 2% of the FS */ - thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 2)); + thresh = max_t(u64, 256 * 1024 * 1024, div_factor(thresh, 2)); /* system chunks need a much small threshold */ if (sinfo-flags BTRFS_BLOCK_GROUP_SYSTEM) thresh = 32 * 1024 * 1024; - if (num_bytes thresh sinfo-bytes_used div_factor(num_bytes, 8)) + if (num_bytes thresh sinfo-bytes_used div_factor(num_bytes, 80)) return 0; return 1; } diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 9384a2a..d8d53f7 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3335,6 +3335,24 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg) goto do_balance; } + + if ((bargs-data.flags BTRFS_BALANCE_ARGS_USAGE) + (bargs-data.usage 0 || bargs-data.usage 100)) { + ret = -EINVAL; + goto out_bargs; + } + + if ((bargs-meta.flags BTRFS_BALANCE_ARGS_USAGE) + (bargs-meta.usage 0 || bargs-meta.usage 100)) { + ret = -EINVAL; + goto out_bargs; + } + + if ((bargs-sys.flags BTRFS_BALANCE_ARGS_USAGE) + (bargs-sys.usage 0 || bargs-sys.usage 100)) { +
Re: [PATCH] Btrfs-progs: btrfs subvolume delete could delete subvolumes
On Fri, Sep 21, 2012 at 02:54:08PM +0800, Anand jain wrote: From: Anand Jain anand.j...@oracle.com With this user will be able to provide more than one subvolume to delete. eg: btrfs subvolume delete subvol1 subvol2 Signed-off-by: Anand Jain anand.j...@oracle.com --- cmds-subvolume.c | 36 man/btrfs.8.in |4 ++-- 2 files changed, 26 insertions(+), 14 deletions(-) diff --git a/cmds-subvolume.c b/cmds-subvolume.c index f4aa80f..cfeaa8d 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -188,31 +188,34 @@ int test_issubvolume(char *path) } static const char * const cmd_subvol_delete_usage[] = { - btrfs subvolume delete name, - Delete a subvolume, + btrfs subvolume delete subvolume [subvolume...], + Delete subvolume(s), NULL }; static int cmd_subvol_delete(int argc, char **argv) { - int res, fd, len, e; + int res, fd, len, e, cnt = 1, ret = 0; struct btrfs_ioctl_vol_args args; char*dname, *vname, *cpath; char*path; - if (check_argc_exact(argc, 2)) + if (argc 2) usage(cmd_subvol_delete_usage); check_argc_min(argc, 2) Haven't looked at the rest. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Btrfs: cleanup duplicated division functions
On Mon, Sep 17, 2012 at 10:21:00AM +0800, Miao Xie wrote: On fri, 14 Sep 2012 15:54:18 +0200, David Sterba wrote: On Thu, Sep 13, 2012 at 06:51:36PM +0800, Miao Xie wrote: div_factor{_fine} has been implemented for two times, cleanup it. And I move them into a independent file named math.h because they are common math functions. You removed the sanity checks: - if (factor = 0) - return 0; - if (factor = 100) - return num; As inline functions, they should not contain complex checks, the caller should make sure the parameters are right. I think. div_factor_fine() in volumes.c is not inline, and is called from chunk_usage_filter() on unvalidated user input. If you think the caller should do those checks, you should move them to the caller as part of your patch. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Change small filesystem to normal
On Sun, Jul 22, 2012 at 05:06:24PM +0200, Swâmi Petaramesh wrote: Hi, I've created a small BTRFS filesystem, where metadata+data are mixed (and metadata are not DUP'ed). Then I've enlarged the FS to 1 GB ; now I'd like to make it normal with separate data and metadata, and DUP'ed metadata. Is there a way tp do this without reformatting the FS ? No, currently there is no way to do this. You'll have to create a new filesystem with mkfs.btrfs. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs fi df won't show total=
On Mon, Jul 09, 2012 at 10:06:24PM +0200, Jan Engelhardt wrote: On Monday 2012-07-09 21:25, Hugo Mills wrote: On Mon, Jul 09, 2012 at 09:14:03PM +0200, Jan Engelhardt wrote: On openSUSE_12.1 with Btrfs v0.19+20120406, the following can be observed: after a change of the profiles, total=,used= is no longer shown: 20:49 mmsrv1:~ # btrfs fi df /top.srv/ Data, RAID10: total=152.00GiB, used=121.07GiB System, RAID1: total=40.00MiB, used=44.00KiB System: total=4.00MiB, used=0.00 Metadata, RAID1: total=112.00GiB, used=1.30GiB Metadata: total=8.00MiB, used=0.00 [...] 21:10 mmsrv1:~ # btrfs fi df /top.srv/ Data, RAID10: total=156.00GiB, used=124.35GiB System, RAID10: total=128.00MiB, used=48.00KiB System: total=4.00MiB, used=0.00 Metadata, RAID10: total=112.00GiB, used=1.38GiB What's the problem here? You no longer have any RAID1 chunks, so it's not showing them. Rather tha a 4-line output, I would have expected this 6-line output that I would also get when mkfs'ing a new fresh btrfs volume with raid10 from the start: Data, RAID10: total=156.00GiB, used=124.35GiB Data: total=foo, used=bar System, RAID10: total=128.00MiB, used=48.00KiB System: total=4.00MiB, used=0.00 Metadata, RAID10: total=112.00GiB, used=1.38GiB Metadata: total=foo, used=bar mkfs creates those non-raid chunks for a pretty stupid reason, they really shouldn't be there if you create a raid10 fs. Balance does the right thing and removes them. Fixing this along with another mkfs annoyance related to this one is on my TODO list. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times
On Thu, Jul 05, 2012 at 10:20:16AM -0700, Zach Brown wrote: On 07/05/2012 10:14 AM, Alexander Block wrote: On Thu, Jul 5, 2012 at 7:08 PM, Zach Brownz...@zabbo.net wrote: Careful, timespec will be different sizes in 32bit userspace and a 64bit kernel. I'd use btrfs_timespec to get a fixed size timespec and avoid all the compat_timespec noise. (I'd then also worry about padding and might pack the struct.. I always lose track of the best practice across all archs.) Hmm we currently don't have ctree.h in ioctl.h. Can I include it there or are there problems with that? As an alternative I could define my own struct for that. Hmm, yeah, it looks like ioctl.h is well isolated and doesn't really have a precedent for pulling in format bits from the kernel implementation. I'd do as you suggested and just make its own ioctl_timespec with a comment that its duplicating other similar structures to keep ioctl.h from getting tangled up in the kernel-side includes. This has been done for restriper, see struct btrfs_balance_args vs struct btrfs_disk_balance_args. You could do the same thing: struct btrfs_ioctl_timespec { __u64 sec; __u32 nsec; } __attribute__ ((__packed__)); and take endianess into account with le{64,32}_to_cpu and cpu_to_le{64,32} macros. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times
On Thu, Jul 05, 2012 at 11:37:40AM -0700, Zach Brown wrote: and take endianess into account with le{64,32}_to_cpu and cpu_to_le{64,32} macros. The kernel doesn't support system calls from userspace of a different endianness, no worries there :) What if you are on a big-endian machine with a big-endian kernel and userspace? Everything on-disk should be little-endian, so if you are going to write stuff you got from userspace to disk, at some point you have to make sure you are writing out bytes in the right order. Alex already does that, so my remarks are moot ;) + root_item-stime.sec = cpu_to_le64(sa-stime.tv_sec); + root_item-stime.nsec = cpu_to_le64(sa-stime.tv_nsec); + root_item-rtime.sec = cpu_to_le64(sa-rtime.tv_sec); + root_item-rtime.nsec = cpu_to_le64(sa-rtime.tv_nsec); Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times
On Thu, Jul 05, 2012 at 09:18:41PM +0200, Alexander Block wrote: On Thu, Jul 5, 2012 at 9:01 PM, Zach Brown z...@zabbo.net wrote: On 07/05/2012 11:59 AM, Ilya Dryomov wrote: What if you are on a big-endian machine with a big-endian kernel and userspace? Everything on-disk should be little-endian, so if you are going to write stuff you got from userspace to disk, at some point you have to make sure you are writing out bytes in the right order. Alex already does that, so my remarks are moot ;) Yeah, indeed, we were only talking about the ioctl interface crossing the user-kernel barrier :). I decided to not use __leXX variables in the new btrfs_ioctl_timespec structure because as most arguments found ioctls are currently in cpu dependent endianess. The kernel will then do the endianess conversion as it already did with struct timespec. That's exactly the point of adding btrfs_ioctl_timespec instead of just copying btrfs_timespec definition. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] Btrfs-progs: make get default report correctly
On Mon, Jul 02, 2012 at 09:39:45AM +0800, Liu Bo wrote: On 06/29/2012 06:36 PM, Ilya Dryomov wrote: On Fri, Jun 29, 2012 at 06:00:37PM +0800, Liu Bo wrote: We have both set-default and get-default, but actually our get-default does not show which one is the default one. This patch plus the kernel side patch fix this. Are you referring to the fact that get-default in Chris' btrfs-progs master lists all subvolumes instead of showing the default one? If so, I fixed that problem a while ago, patch is in Hugo's integration branch. So can you please show me a link about your patch? I'm sorry, it turns out Hugo hasn't pushed it yet. Here is a link: http://thread.gmane.org/gmane.comp.file-systems.btrfs/16187/ Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Btrfs-progs: add support to set subvolume/snapshot readonly
On Mon, Jul 02, 2012 at 10:07:42AM +0800, Liu Bo wrote: On 06/29/2012 06:21 PM, Ilya Dryomov wrote: On Fri, Jun 29, 2012 at 06:00:36PM +0800, Liu Bo wrote: Setting subvolume/snapshot readonly has been missing for a long time. With this patch, we can set a subvolume/snapshot readonly via: obtrfs subvolume set-ro path Alexander's 'btrfs property' patches do exactly this, but in a much more generic and extensible way. 'btrfs property' subgroup provides a uniform interface for getting and setting properties of filesystem objects in general, not only those of subvolumes and snapshots. It provides a much better user interface, and it also allows us to easily rethink kernel-user interface for generic get/set in future. Thanks for the explanation! But I prefer keeping the current categories {subvolume,filesystem,device,...}: o Compatibility, we cannot remove the old commands until we make sure that no users will use them. We are not going to remove old commands any time soon. However, adding new ones that clearly fall into get/set category, is not a good idea. Especially when there is a generic interface on its way. o We've three properties {default, readonly, lable}, is it worthy making another new interface? It's not just about subvolumes. There will be a lot more properties on the table as filesystem matures, for example device speeds, subvolume profiles, quotas. o Current categories are clear and clean. Once again, it's not just about subvolumes. Current categories are indeed clear, but adding two commands for each non-trivial property that comes up in future does not seem practical to me. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: no dev_stats entry found / OK on first mount after mkfs
On Mon, Jul 02, 2012 at 02:46:11PM -0700, Christian Kujau wrote: Hi, after upgrading from 3.4.0 to 3.5.0-rc5 on this powerpc machine, the following is printed during bootup: [ 18.630750] device fsid ce8c9df5-0a93-47c6-adf6-25084f352a4f devid 1 transid 11061 /dev/hda7 [ 18.637193] btrfs: disk space caching is enabled [ 18.706423] btrfs: no dev_stats entry found for device /dev/hda7 (devid 1) (OK on first mount after mkfs) The btrfs on hda7 has been created many months ago and has been mounted several times since then. Assuming first mount after mkfs does not apply here, is it then NOT OK that no dev_stats entry has been found? IOW: should I worry about this message? No, you should not. If this is your first mount after upgrading your kernel to the one with btrfs dev stats feature, first mount after mkfs sort of applies. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Btrfs-progs: add support to set subvolume/snapshot readonly
On Fri, Jun 29, 2012 at 06:00:36PM +0800, Liu Bo wrote: Setting subvolume/snapshot readonly has been missing for a long time. With this patch, we can set a subvolume/snapshot readonly via: obtrfs subvolume set-ro path Alexander's 'btrfs property' patches do exactly this, but in a much more generic and extensible way. 'btrfs property' subgroup provides a uniform interface for getting and setting properties of filesystem objects in general, not only those of subvolumes and snapshots. It provides a much better user interface, and it also allows us to easily rethink kernel-user interface for generic get/set in future. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] Btrfs-progs: make get default report correctly
On Fri, Jun 29, 2012 at 06:00:37PM +0800, Liu Bo wrote: We have both set-default and get-default, but actually our get-default does not show which one is the default one. This patch plus the kernel side patch fix this. Are you referring to the fact that get-default in Chris' btrfs-progs master lists all subvolumes instead of showing the default one? If so, I fixed that problem a while ago, patch is in Hugo's integration branch. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Btrfs: restore restriper state on all mounts
First of all, thanks for reviewing! On Tue, Jun 26, 2012 at 06:17:39PM +0200, David Sterba wrote: On Fri, Jun 22, 2012 at 09:24:12PM +0300, Ilya Dryomov wrote: Fix a bug that triggered asserts in btrfs_balance() in both normal and resume modes -- restriper state was not properly restored on read-only mounts. This factors out resuming code from btrfs_restore_balance(), which is now also called earlier in the mount sequence to avoid the problem of some early writes getting the old profile. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 77872da..dae7cd6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2492,9 +2497,6 @@ retry_root_backup: err = btrfs_orphan_cleanup(fs_info-tree_root); up_read(fs_info-cleanup_work_sem); - if (!err) - err = btrfs_recover_balance(fs_info-tree_root); - if (err) { close_ctree(tree_root); return err; @@ -2518,6 +2520,9 @@ fail_cleaner: fail_block_groups: btrfs_free_block_groups(fs_info); +fail_balance_ctl: + kfree(fs_info-balance_ctl); I think you need to set fs_info-balance_ctl to NULL, otherwise this could lead to double free from free_fs_info. I was looking along the Yes, I do. I meant to call unset_balance_control(fs_info) there, but changed it to kfree(), because of the BUG_ON() in the former. unset_balance_control(), of course, sets -balance_ctl to NULL ;) call paths and didn't see free_fs_info called on the mount failure path: vfs-mount btrfs_mount btrfs_fill_super open_ctree (recover balance fails, frees ctl) error is propagated back to vfs, no other fs callback is done (like kill_super which does call free_fs_info). The only exit path that is not going through free_fs_info is after error from btrfs_fill_super, and this can fail from various reasons. Either I'm missing something, or we leak a btrfs_fs_info every time a mount fails ... No, we don't, you just missed it. It's freed from btrfs_kill_super(), which is called from deactivate_locked_super() after btrfs_fill_super() errors out. Back to your patch, apart from the balance_ctl pointer reset, both are ok and given the number of bug reports [useless padding text here] this should go to 3.5-rc. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: do not return EINVAL instead of ENOMEM from open_ctree()
When bailing from open_ctree() err is returned, not ret. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/disk-io.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index e1890b1..f06db81 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2244,7 +2244,7 @@ int open_ctree(struct super_block *sb, ret |= btrfs_start_workers(fs_info-caching_workers); ret |= btrfs_start_workers(fs_info-readahead_workers); if (ret) { - ret = -ENOMEM; + err = -ENOMEM; goto fail_sb_buffer; } -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: do not ignore errors from btrfs_cleanup_fs_roots() when mounting
There used to be a BUG_ON(ret) there before EH patch (79787eaa) went in. Bail out with EINVAL. Cc: David Sterba dste...@suse.cz Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/disk-io.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f06db81..77872da 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2461,8 +2461,8 @@ retry_root_backup: if (!(sb-s_flags MS_RDONLY)) { ret = btrfs_cleanup_fs_roots(fs_info); - if (ret) { - } + if (ret) + goto fail_trans_kthread; ret = btrfs_recover_relocation(tree_root); if (ret 0) { -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] Btrfs: resume balance on rw (re)mounts properly
This introduces btrfs_resume_balance_async(), which, given that restriper state was recovered earlier by btrfs_recover_balance(), resumes balance in btrfs-balance kthread. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/disk-io.c | 24 +++- fs/btrfs/super.c |4 fs/btrfs/volumes.c | 36 +++- fs/btrfs/volumes.h |1 + 4 files changed, 47 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index dae7cd6..e863f58 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2490,17 +2490,23 @@ retry_root_backup: goto fail_trans_kthread; } - if (!(sb-s_flags MS_RDONLY)) { - down_read(fs_info-cleanup_work_sem); - err = btrfs_orphan_cleanup(fs_info-fs_root); - if (!err) - err = btrfs_orphan_cleanup(fs_info-tree_root); + if (sb-s_flags MS_RDONLY) + return 0; + + down_read(fs_info-cleanup_work_sem); + if ((ret = btrfs_orphan_cleanup(fs_info-fs_root)) || + (ret = btrfs_orphan_cleanup(fs_info-tree_root))) { up_read(fs_info-cleanup_work_sem); + close_ctree(tree_root); + return ret; + } + up_read(fs_info-cleanup_work_sem); - if (err) { - close_ctree(tree_root); - return err; - } + ret = btrfs_resume_balance_async(fs_info); + if (ret) { + printk(KERN_WARNING btrfs: failed to resume balance\n); + close_ctree(tree_root); + return ret; } return 0; diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 0eb9a4d..e239915 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1187,6 +1187,10 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) if (ret) goto restore; + ret = btrfs_resume_balance_async(fs_info); + if (ret) + goto restore; + sb-s_flags = ~MS_RDONLY; } diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f7649b9..0f1778c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2845,28 +2845,46 @@ out: static int balance_kthread(void *data) { - struct btrfs_balance_control *bctl = - (struct btrfs_balance_control *)data; - struct btrfs_fs_info *fs_info = bctl-fs_info; + struct btrfs_fs_info *fs_info = data; int ret = 0; mutex_lock(fs_info-volume_mutex); mutex_lock(fs_info-balance_mutex); - set_balance_control(bctl); - - if (btrfs_test_opt(fs_info-tree_root, SKIP_BALANCE)) { - printk(KERN_INFO btrfs: force skipping balance\n); - } else { + if (fs_info-balance_ctl) { printk(KERN_INFO btrfs: continuing balance\n); - ret = btrfs_balance(bctl, NULL); + ret = btrfs_balance(fs_info-balance_ctl, NULL); } mutex_unlock(fs_info-balance_mutex); mutex_unlock(fs_info-volume_mutex); + return ret; } +int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info) +{ + struct task_struct *tsk; + + spin_lock(fs_info-balance_lock); + if (!fs_info-balance_ctl) { + spin_unlock(fs_info-balance_lock); + return 0; + } + spin_unlock(fs_info-balance_lock); + + if (btrfs_test_opt(fs_info-tree_root, SKIP_BALANCE)) { + printk(KERN_INFO btrfs: force skipping balance\n); + return 0; + } + + tsk = kthread_run(balance_kthread, fs_info, btrfs-balance); + if (IS_ERR(tsk)) + return PTR_ERR(tsk); + + return 0; +} + int btrfs_recover_balance(struct btrfs_fs_info *fs_info) { struct btrfs_balance_control *bctl; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index e1b1a64..95f6637 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -281,6 +281,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size); int btrfs_init_new_device(struct btrfs_root *root, char *path); int btrfs_balance(struct btrfs_balance_control *bctl, struct btrfs_ioctl_balance_args *bargs); +int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info); int btrfs_recover_balance(struct btrfs_fs_info *fs_info); int btrfs_pause_balance(struct btrfs_fs_info *fs_info); int btrfs_cancel_balance(struct btrfs_fs_info *fs_info); -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] Btrfs: restore restriper state on all mounts
Fix a bug that triggered asserts in btrfs_balance() in both normal and resume modes -- restriper state was not properly restored on read-only mounts. This factors out resuming code from btrfs_restore_balance(), which is now also called earlier in the mount sequence to avoid the problem of some early writes getting the old profile. Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/disk-io.c | 15 ++- fs/btrfs/volumes.c | 39 +++ fs/btrfs/volumes.h |2 +- 3 files changed, 30 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 77872da..dae7cd6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2354,17 +2354,22 @@ retry_root_backup: BTRFS_CSUM_TREE_OBJECTID, csum_root); if (ret) goto recovery_tree_root; - csum_root-track_dirty = 1; fs_info-generation = generation; fs_info-last_trans_committed = generation; + ret = btrfs_recover_balance(fs_info); + if (ret) { + printk(KERN_WARNING btrfs: failed to recover balance\n); + goto fail_tree_roots; + } + ret = btrfs_init_dev_stats(fs_info); if (ret) { printk(KERN_ERR btrfs: failed to init dev_stats: %d\n, ret); - goto fail_block_groups; + goto fail_balance_ctl; } ret = btrfs_init_space_info(fs_info); @@ -2492,9 +2497,6 @@ retry_root_backup: err = btrfs_orphan_cleanup(fs_info-tree_root); up_read(fs_info-cleanup_work_sem); - if (!err) - err = btrfs_recover_balance(fs_info-tree_root); - if (err) { close_ctree(tree_root); return err; @@ -2518,6 +2520,9 @@ fail_cleaner: fail_block_groups: btrfs_free_block_groups(fs_info); +fail_balance_ctl: + kfree(fs_info-balance_ctl); + fail_tree_roots: free_root_pointers(fs_info, 1); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8a3d259..f7649b9 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2867,9 +2867,8 @@ static int balance_kthread(void *data) return ret; } -int btrfs_recover_balance(struct btrfs_root *tree_root) +int btrfs_recover_balance(struct btrfs_fs_info *fs_info) { - struct task_struct *tsk; struct btrfs_balance_control *bctl; struct btrfs_balance_item *item; struct btrfs_disk_balance_args disk_bargs; @@ -2882,29 +2881,30 @@ int btrfs_recover_balance(struct btrfs_root *tree_root) if (!path) return -ENOMEM; - bctl = kzalloc(sizeof(*bctl), GFP_NOFS); - if (!bctl) { - ret = -ENOMEM; - goto out; - } - key.objectid = BTRFS_BALANCE_OBJECTID; key.type = BTRFS_BALANCE_ITEM_KEY; key.offset = 0; - ret = btrfs_search_slot(NULL, tree_root, key, path, 0, 0); + ret = btrfs_search_slot(NULL, fs_info-tree_root, key, path, 0, 0); if (ret 0) - goto out_bctl; + goto out; if (ret 0) { /* ret = -ENOENT; */ ret = 0; - goto out_bctl; + goto out; + } + + bctl = kzalloc(sizeof(*bctl), GFP_NOFS); + if (!bctl) { + ret = -ENOMEM; + goto out; } leaf = path-nodes[0]; item = btrfs_item_ptr(leaf, path-slots[0], struct btrfs_balance_item); - bctl-fs_info = tree_root-fs_info; - bctl-flags = btrfs_balance_flags(leaf, item) | BTRFS_BALANCE_RESUME; + bctl-fs_info = fs_info; + bctl-flags = btrfs_balance_flags(leaf, item); + bctl-flags |= BTRFS_BALANCE_RESUME; btrfs_balance_data(leaf, item, disk_bargs); btrfs_disk_balance_args_to_cpu(bctl-data, disk_bargs); @@ -2913,14 +2913,13 @@ int btrfs_recover_balance(struct btrfs_root *tree_root) btrfs_balance_sys(leaf, item, disk_bargs); btrfs_disk_balance_args_to_cpu(bctl-sys, disk_bargs); - tsk = kthread_run(balance_kthread, bctl, btrfs-balance); - if (IS_ERR(tsk)) - ret = PTR_ERR(tsk); - else - goto out; + mutex_lock(fs_info-volume_mutex); + mutex_lock(fs_info-balance_mutex); -out_bctl: - kfree(bctl); + set_balance_control(bctl); + + mutex_unlock(fs_info-balance_mutex); + mutex_unlock(fs_info-volume_mutex); out: btrfs_free_path(path); return ret; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 74366f2..e1b1a64 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -281,7 +281,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size); int btrfs_init_new_device(struct btrfs_root *root, char *path); int btrfs_balance(struct btrfs_balance_control *bctl, struct
[PATCH 0/2] Restriper restore bugfix
This fixes restriper restore bug which triggered asserts for users with interrupted balances on rootfs btrfs. Thanks, Ilya Ilya Dryomov (2): Btrfs: restore restriper state on all mounts Btrfs: resume balance on rw (re)mounts properly fs/btrfs/disk-io.c | 39 ++-- fs/btrfs/super.c |4 +++ fs/btrfs/volumes.c | 73 fs/btrfs/volumes.h |3 +- 4 files changed, 76 insertions(+), 43 deletions(-) -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] Btrfs: count the chunks which will be relocated at first
On Thu, May 17, 2012 at 07:56:53PM +0800, Miao Xie wrote: the balance function should count the chunks which will be relocated at first, and then relocate those chunks one by one. Signed-off-by: Miao Xie mi...@cn.fujitsu.com --- fs/btrfs/volumes.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 759d024..91da8a2 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2580,7 +2580,7 @@ again: chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk); - if (!counting) { + if (counting) { spin_lock(fs_info-balance_lock); bctl-stat.considered++; spin_unlock(fs_info-balance_lock); __btrfs_balance() already calculates the approximate number of chunks that will be relocated and stores that value in bctl-stat.expected. The stat.considered counter OTOH is supposed to reflect the number of chunks processed through balance filters and it is meant to be updated at relocation pass, so AFAICS if (!counting) is the right test. What exactly are you trying to fix here ? Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] Btrfs: pause/recover the space balance when doing remount
On Thu, May 17, 2012 at 07:57:40PM +0800, Miao Xie wrote: pause the space balance threads when remounting the fs to be readonly, and recover it when remounting it from r/o to r/w Signed-off-by: Miao Xie mi...@cn.fujitsu.com --- fs/btrfs/super.c |9 - fs/btrfs/volumes.c |8 +++- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 7deb00e..ea17f0a 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1148,6 +1148,9 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) if (*flags MS_RDONLY) { sb-s_flags |= MS_RDONLY; + /* pause restriper - we want to resume on remount to r/w */ + btrfs_pause_balance(root-fs_info); + ret = btrfs_commit_super(root); if (ret) goto restore; @@ -1174,7 +1177,10 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) if (ret) goto restore; - sb-s_flags = ~MS_RDONLY; + if (sb-s_flags MS_RDONLY) { + sb-s_flags = ~MS_RDONLY; + btrfs_recover_balance(fs_info-tree_root); + } } return 0; @@ -1190,6 +1196,7 @@ restore: fs_info-alloc_start = old_alloc_start; fs_info-thread_pool_size = old_thread_pool_size; fs_info-metadata_ratio = old_metadata_ratio; + return ret; } diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 91da8a2..c536d52 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2833,7 +2833,13 @@ static int balance_kthread(void *data) mutex_lock(fs_info-volume_mutex); mutex_lock(fs_info-balance_mutex); - set_balance_control(bctl); + if (fs_info-balance_ctl) { + kfree(bctl); + bctl = fs_info-balance_ctl; + bctl-flags = bctl-flags | BTRFS_BALANCE_RESUME; + } else { + set_balance_control(bctl); + } if (btrfs_test_opt(fs_info-tree_root, SKIP_BALANCE)) { printk(KERN_INFO btrfs: force skipping balance\n); This is a known bug. There is a deeper problem here, related to the fact that we restore balancing state not early enough and that we don't restore it on ro mounts at all. I have a patch in the works to fix that problem, and it also fixes this one the right way. I'm backed up with other things right now, but I'll post it as soon as I get a chance. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] Btrfs: fix memory leak in btrfs_pause_balance()
On Thu, May 17, 2012 at 07:58:53PM +0800, Miao Xie wrote: We forget to free fs_info-balance_ctl in the btrfs_pause_balance() when umounting the fs. Signed-off-by: Miao Xie mi...@cn.fujitsu.com --- fs/btrfs/volumes.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c536d52..fd7fe80 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2937,6 +2937,9 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info) ret = -ENOTCONN; } + if (btrfs_fs_closing(fs_info) fs_info-balance_ctl) + unset_balance_control(fs_info); + mutex_unlock(fs_info-balance_mutex); return ret; } It is kfree()'d in free_fs_info(), which should be called on unmount. Am I missing something here ? Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: failed disk (was: kernel 3.3.4 damages filesystem (?))
On Wed, May 09, 2012 at 03:37:35PM +0100, Hugo Mills wrote: On Wed, May 09, 2012 at 04:25:00PM +0200, Helmut Hullen wrote: Du meintest am 07.05.12: [...] With a file system like ext2/3/4 I can work with several directories which are mounted together, but (as said before) one broken disk doesn't disturb the others. mkfs.btrfs -m raid1 -d single should give you that. Just a small bug, perhaps: created a system with mkfs.btrfs -m raid1 -d single /dev/sdl1 mount /dev/sdl1 /mnt/Scsi btrfs device add /dev/sdk1 /mnt/Scsi btrfs device add /dev/sdm1 /mnt/Scsi (filling with data) and btrfs fi df /mnt/Scsi now tells Data, RAID0: total=183.18GB, used=76.60GB Data: total=80.01GB, used=79.83GB System, DUP: total=8.00MB, used=32.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=1.00GB, used=192.74MB Metadata: total=8.00MB, used=0.00 -- Data, RAID0 confuses me (not very much ...), and the system for metadata (RAID1) is not told. DUP is two copies of each block, but it allows the two copies to live on the same device. It's done this because you started with a single device, and you can't do RAID-1 on one device. The first bit of What Hugo said. Newer mkfs.btrfs will error out if you try to do this. metadata you write to it should automatically upgrade the DUP chunk to RAID-1. We don't upgrade chunks in place, only during balance. As to the spurious upgrade of single to RAID-0, I thought Ilya had stopped it doing that. What kernel version are you running? I did, but again, we were doing it only as part of balance, not as part of normal operation. Helmut, do you have any additional data points - the output of btrfs fi df right after you created FS or somewhere in the middle of filling it ? Also could you please paste the output of btrfs fi show and tell us what kernel version you are running ? Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is it possible to reclaim block groups once they are allocated to data or metadata?
On Sun, May 06, 2012 at 01:07:06PM +1000, Mike Sampson wrote: On Sat, May 5, 2012 at 10:52 PM, Hugo Mills h...@carfax.org.uk wrote: On Sat, May 05, 2012 at 10:37:01PM +1000, Mike Sampson wrote: Hello list, recently reformatted my home partition from XFS to RAID1 btrfs. I used the default options to mkfs.btrfs except for enabling raid1 for data as well as metadata. Filesystem is made up of two 1TB drives. mike@mercury (0) pts/3 ~ $ sudo btrfs filesystem show Label: none uuid: f08a8896-e03e-4064-9b94-9342fb547e47 Total devices 2 FS bytes used 888.06GB devid 1 size 931.51GB used 931.51GB path /dev/sdb1 devid 2 size 931.51GB used 931.49GB path /dev/sdc1 Btrfs Btrfs v0.19 mike@mercury (0) pts/3 ~ $ btrfs filesystem df /home Data, RAID1: total=893.48GB, used=886.60GB Data: total=8.00MB, used=0.00 System, RAID1: total=8.00MB, used=136.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=38.00GB, used=2.54GB Metadata: total=8.00MB, used=0.00 As can be seen I don't have a lot of free space left and while I am planning on adding more storage soon I would like to gain a little breathing room until I can do this. While I don't have a lot of space remaining in Data, RAID1 I do have a good chunk in Metadata, RAID1. 2.5GB used out of 38GB. Does this excess become available automatically to the file system when the block groups in Data, RAID1 are exhausted or, if not, is there a way to manually reallocate them? Youre best bet at the moment is to try a partial balance of metadata chunks: # btrfs balance start -m /home That will rewrite all of your metadata, putting it through the allocator again, and removing the original allocated chunks. This should have the effect of reducing the allocation of metadata chunks. You will need a 3.3 kernel, or later, and an up-to-date userspace from cmason's git repository. Gave this a shot and it did help. mike@mercury (0) pts/3 ~/.../btrfs-progs git:master $ uname -r 3.3.4-2-ARCH mike@mercury (0) pts/3 ~/.../btrfs-progs git:master $ sudo ./btrfs balance start -m /home Done, had to relocate 40 out of 934 chunks mike@mercury (0) pts/3 ~/.../btrfs-progs git:master $ ./btrfs filesystem df /home Data, RAID1: total=900.97GB, used=880.06GB Data: total=8.00MB, used=0.00 System, RAID1: total=32.00MB, used=136.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=30.50GB, used=2.54GB There is now 8GB less in Metadata and I was able to delete some files as well to free up space. There is still a lot of wasted space in the metadata block groups. It seems that it allocates more metadata block groups than required for my filesystem. This will do until I am able to add a couple of devices to the system. Is there anyway to adjust the block group allocation strategy at filesystem creation? No. Chunk allocator currently allocates a lot more chunks than actually needed, and it impacts both balancing and normal operation. Try this: # btrfs balance start -musage=10 /home This is suboptimal, but it should get rid of more chunks. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is it possible to reclaim block groups once they are allocated to data or metadata?
On Sun, May 06, 2012 at 11:37:27AM +0100, Hugo Mills wrote: On Sun, May 06, 2012 at 01:26:45PM +0300, Ilya Dryomov wrote: On Sun, May 06, 2012 at 01:07:06PM +1000, Mike Sampson wrote: There is now 8GB less in Metadata and I was able to delete some files as well to free up space. There is still a lot of wasted space in the metadata block groups. It seems that it allocates more metadata block groups than required for my filesystem. This will do until I am able to add a couple of devices to the system. Is there anyway to adjust the block group allocation strategy at filesystem creation? No. Chunk allocator currently allocates a lot more chunks than actually needed, and it impacts both balancing and normal operation. Try this: # btrfs balance start -musage=10 /home This is suboptimal, but it should get rid of more chunks. While we're talking about it, what is the parameter to the usage option? I'm assuming it selects chunks which are less than some amount full -- but is the value a percentage, or a quantity in megabytes (power-of-10 or power-of-2), or something else? It's a percentage, so the command above will balance out chunks that are less than 10 percent full. I'll update btrfs man page and the wiki page you started as soon as I can. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: balancing metadata fails with no space left on device
On Sun, May 06, 2012 at 01:19:38PM +0200, Martin Steigerwald wrote: Am Freitag, 4. Mai 2012 schrieb Martin Steigerwald: Am Freitag, 4. Mai 2012 schrieb Martin Steigerwald: Hi! merkaba:~ btrfs balance start -m / ERROR: error during balancing '/' - No space left on device There may be more info in syslog - try dmesg | tail merkaba:~#19 dmesg | tail -22 [ 62.918734] CPU0: Package power limit normal [ 525.229976] btrfs: relocating block group 20422066176 flags 1 [ 526.940452] btrfs: found 3048 extents [ 528.803778] btrfs: found 3048 extents […] [ 635.906517] btrfs: found 1 extents [ 636.038096] btrfs: 1 enospc errors during balance merkaba:~ btrfs filesystem show failed to read /dev/sr0 Label: 'debian' uuid: […] Total devices 1 FS bytes used 7.89GB devid1 size 18.62GB used 17.58GB path /dev/dm-0 Btrfs Btrfs v0.19 merkaba:~ btrfs filesystem df / Data: total=15.52GB, used=7.31GB System, DUP: total=32.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=1.00GB, used=587.83MB I thought data tree might have been to big, so out of curiousity I tried a full balance. It shrunk the data tree but it failed as well: merkaba:~ btrfs balance start / ERROR: error during balancing '/' - No space left on device There may be more info in syslog - try dmesg | tail merkaba:~#19 dmesg | tail -63 [ 89.306718] postgres (2876): /proc/2876/oom_adj is deprecated, please use /proc/2876/oom_score_adj instead. [ 159.939728] btrfs: relocating block group 21994930176 flags 34 [ 160.010427] btrfs: relocating block group 21860712448 flags 1 [ 161.188104] btrfs: found 6 extents [ 161.507388] btrfs: found 6 extents […] [ 335.897953] btrfs: relocating block group 1103101952 flags 1 [ 347.888295] btrfs: found 28458 extents [ 352.736987] btrfs: found 28458 extents [ 353.099659] btrfs: 1 enospc errors during balance merkaba:~ btrfs filesystem df / Data: total=10.00GB, used=7.31GB System, DUP: total=64.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=1.12GB, used=587.20MB merkaba:~ btrfs filesystem show failed to read /dev/sr0 Label: 'debian' uuid: […] Total devices 1 FS bytes used 7.88GB devid1 size 18.62GB used 12.38GB path /dev/dm-0 For the sake of it I tried another time. It failed again: martin@merkaba:~ dmesg | tail -32 [ 353.099659] btrfs: 1 enospc errors during balance [ 537.057375] btrfs: relocating block group 32833011712 flags 36 […] [ 641.479140] btrfs: relocating block group 22062039040 flags 34 [ 641.695614] btrfs: relocating block group 22028484608 flags 34 [ 641.840179] btrfs: found 1 extents [ 641.965843] btrfs: 1 enospc errors during balance merkaba:~#19 btrfs filesystem df / Data: total=10.00GB, used=7.31GB System, DUP: total=32.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=1.12GB, used=586.74MB merkaba:~ btrfs filesystem show failed to read /dev/sr0 Label: 'debian' uuid: […] Total devices 1 FS bytes used 7.88GB devid1 size 18.62GB used 12.32GB path /dev/dm-0 Btrfs Btrfs v0.19 Well, in order to be gentle to the SSD again I stop my experiments now ;). I had subjective impression that the speed of the BTRFS filesystem decreased after all these Anyway, after reading the a -musage hint by Ilya in thread Is it possible to reclaim block groups once they ar allocated to data or metadata? Currently there is no way to reclaim block groups other than performing a balance. We will add a kernel thread for this in future, but a couple of things have to be fixed before that can happen. I tried: merkaba:~ btrfs filesystem df / Data: total=10.00GB, used=7.34GB System, DUP: total=32.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=1.12GB, used=586.39MB merkaba:~ btrfs balance start -musage=1 / Done, had to relocate 2 out of 13 chunks merkaba:~ btrfs filesystem df / Data: total=10.00GB, used=7.34GB System, DUP: total=32.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=1.00GB, used=586.39MB So this worked. But I wasn´t able to specify less than a Gig: A follow up to the -musage hint says that the argument it takes is the percentage. That is -musage=X will balance out block groups that are less than X percent used. merkaba:~ btrfs balance start -musage=0.8 / Invalid usage argument: 0.8 merkaba:~#1 btrfs balance start -musage=700M / Invalid usage argument: 700M When I try without usage I get the old behavior back: merkaba:~#1 btrfs balance start -m / ERROR: error during balancing '/' - No space left on device There may be more info in syslog - try dmesg | tail merkaba:~ btrfs balance start -musage=1 / Done, had to relocate 2 out of 13 chunks
Re: 'filesystem resize max' tries to use devid 1
On Tue, Apr 24, 2012 at 11:31:15AM -0400, Josef Bacik wrote: On Sun, Apr 22, 2012 at 11:03:06PM -0400, Jeremy Atkins wrote: Back story: I started my pool with a 200gb partition at the end of my drive (sdc5) , until I was able to clear out the data at the beginning of my drive. When I was ready, I ran `btrfs dev add /dev/sdc4 /` then `btrfs dev del /dev/sdc5 /`, $ sudo btrfs fi resize max / Resize '/' of 'max' ERROR: unable to resize '/' - Invalid argument in dmesg i see: [ 72.729685] btrfs: resizer unable to find device 1 $ sudo btrfs fi df / Data: total=34.00GB, used=17.79GB System, DUP: total=64.00MB, used=20.00KB Metadata, DUP: total=33.50GB, used=394.93MB $ sudo btrfs fi sh Label: none uuid: b0ad55e2-09e0-4658-8cab-d2e11ba03753 Total devices 1 FS bytes used 17.18GB devid 2 size 1.62TB used 101.12GB path /dev/sdc4 $ uname -r 3.3.2-1.fc16.x86_64 btrfs-progs version is current git master ( commit 1957076 ) After writing this email, and searching around the wiki some, I discovered the command to resize specific devids, [antrat@tbox ~]$ sudo btrfs fi resize 2:max / Resize '/' of '2:max' and in dmesg: [ 1661.933884] btrfs: resizing devid 2 [ 1661.933895] btrfs: new size for /dev/sdc4 is 1995564908544 Hrm, if we're using filesystem resize max we should probably just max out all the devices and not require one specific device, what do you think Ilya? I'd keep the devid:max syntax for maxing out a single device and add a new fs:max or similar to resize all devices. ABI here is just a string, so I think it can be easily done. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 'filesystem resize max' tries to use devid 1
On Tue, Apr 24, 2012 at 11:42:26AM -0400, Josef Bacik wrote: On Tue, Apr 24, 2012 at 06:39:03PM +0300, Ilya Dryomov wrote: On Tue, Apr 24, 2012 at 11:31:15AM -0400, Josef Bacik wrote: On Sun, Apr 22, 2012 at 11:03:06PM -0400, Jeremy Atkins wrote: Back story: I started my pool with a 200gb partition at the end of my drive (sdc5) , until I was able to clear out the data at the beginning of my drive. When I was ready, I ran `btrfs dev add /dev/sdc4 /` then `btrfs dev del /dev/sdc5 /`, $ sudo btrfs fi resize max / Resize '/' of 'max' ERROR: unable to resize '/' - Invalid argument in dmesg i see: [ 72.729685] btrfs: resizer unable to find device 1 $ sudo btrfs fi df / Data: total=34.00GB, used=17.79GB System, DUP: total=64.00MB, used=20.00KB Metadata, DUP: total=33.50GB, used=394.93MB $ sudo btrfs fi sh Label: none uuid: b0ad55e2-09e0-4658-8cab-d2e11ba03753 Total devices 1 FS bytes used 17.18GB devid 2 size 1.62TB used 101.12GB path /dev/sdc4 $ uname -r 3.3.2-1.fc16.x86_64 btrfs-progs version is current git master ( commit 1957076 ) After writing this email, and searching around the wiki some, I discovered the command to resize specific devids, [antrat@tbox ~]$ sudo btrfs fi resize 2:max / Resize '/' of '2:max' and in dmesg: [ 1661.933884] btrfs: resizing devid 2 [ 1661.933895] btrfs: new size for /dev/sdc4 is 1995564908544 Hrm, if we're using filesystem resize max we should probably just max out all the devices and not require one specific device, what do you think Ilya? I'd keep the devid:max syntax for maxing out a single device and add a new fs:max or similar to resize all devices. ABI here is just a string, so I think it can be easily done. Right I think the devid:max thing is handy for people who want to do specific things, but I think just a normal btrfs fi resize max should do all devices for simplicity and easy of use, no need for a fs:max. Thanks, Sorry, I misread. 'btrfs fi resize max mnt' should of course resize the entire FS. I think this popped up before, at least a couple times. I'll do it if nobody else wants to. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix max chunk size check in chunk allocator
Fix a bug, where in case we need to adjust stripe_size so that the length of the resulting chunk is less than or equal to max_chunk_size, DUP chunks turn out to be only half as big as they could be. Cc: Arne Jansen sensi...@gmx.net Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/volumes.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a872b48..c462d84 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3324,12 +3324,14 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, stripe_size = devices_info[ndevs-1].max_avail; num_stripes = ndevs * dev_stripes; - if (stripe_size * num_stripes max_chunk_size * ncopies) { + if (stripe_size * ndevs max_chunk_size * ncopies) { stripe_size = max_chunk_size * ncopies; - do_div(stripe_size, num_stripes); + do_div(stripe_size, ndevs); } do_div(stripe_size, dev_stripes); + + /* align to BTRFS_STRIPE_LEN */ do_div(stripe_size, BTRFS_STRIPE_LEN); stripe_size *= BTRFS_STRIPE_LEN; -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at fs/btrfs/volumes.c:2733
On Fri, Mar 30, 2012 at 10:44:05PM +0200, Sander wrote: Ilya Dryomov wrote (ao): On Fri, Mar 30, 2012 at 07:49:56PM +0200, Sander wrote: Thanks. btrfs-debug-tree confirms that you've got a balance item on media. After that mount it back and see if there is btrfs: continuing balance line in dmesg (and if btrfs-balance kthread shows up)? There is no such line in dmesg, and currently no btrfs-balance kthread is running. I've pulled Chris Masons for-linus and booted with the resulting kernel. And given the above it's weird. We are failing to locate the item during mount for some reason and I would like to find out why. So if you are up for running debugging patches (really just compiling btrfs module and sending me dmesg output) I would appreciate that. Sure, please send me patches. I'm sorry this took a week, I was backed up. If you still have that fs around in that state, could you please apply the patch below, mount it and send me dmesg output ? (no need to run balance or anything, just mount) Thanks, Ilya diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 20196f4..86fa082 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1867,6 +1867,7 @@ int open_ctree(struct super_block *sb, csum_root = fs_info-csum_root = btrfs_alloc_root(fs_info); chunk_root = fs_info-chunk_root = btrfs_alloc_root(fs_info); dev_root = fs_info-dev_root = btrfs_alloc_root(fs_info); +printk(open_ctree\n); if (!tree_root || !extent_root || !csum_root || !chunk_root || !dev_root) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a872b48..2e39348 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2834,6 +2834,7 @@ static int balance_kthread(void *data) mutex_lock(fs_info-balance_mutex); set_balance_control(bctl); +printk(balance_kthread: flags %llu\n, (unsigned long long)bctl-flags); if (btrfs_test_opt(fs_info-tree_root, SKIP_BALANCE)) { printk(KERN_INFO btrfs: force skipping balance\n); @@ -2858,6 +2859,7 @@ int btrfs_recover_balance(struct btrfs_root *tree_root) struct btrfs_key key; int ret; +printk(recover_balance\n); path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -2872,7 +2874,11 @@ int btrfs_recover_balance(struct btrfs_root *tree_root) key.type = BTRFS_BALANCE_ITEM_KEY; key.offset = 0; +printk(key.obj %llu\n, (unsigned long long)key.objectid); +printk(key.type %d\n, key.type); +printk(key.off %llu\n, (unsigned long long)key.offset); ret = btrfs_search_slot(NULL, tree_root, key, path, 0, 0); +printk(search ret %d\n, ret); if (ret 0) goto out_bctl; if (ret 0) { /* ret = -ENOENT; */ -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: remove lock assert from get_restripe_target()
This fixes a regression introduced by fc67c450. spin_is_locked() always returns 0 on UP kernels, which caused assert in get_restripe_target() to be fired on every call from btrfs_reduce_alloc_profile() on UP systems. Remove it completely for now, it's not clear if it's going to be needed in future. Reported-by: Bobby Powers bobbypow...@gmail.com Reported-by: Mitch Harder mitch.har...@sabayonlinux.org Tested-by: Mitch Harder mitch.har...@sabayonlinux.org Signed-off-by: Ilya Dryomov idryo...@gmail.com --- fs/btrfs/extent-tree.c |5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a844204..8db0884 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3152,15 +3152,14 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags) /* * returns target flags in extended format or 0 if restripe for this * chunk_type is not in progress + * + * should be called with either volume_mutex or balance_lock held */ static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags) { struct btrfs_balance_control *bctl = fs_info-balance_ctl; u64 target = 0; - BUG_ON(!mutex_is_locked(fs_info-volume_mutex) - !spin_is_locked(fs_info-balance_lock)); - if (!bctl) return 0; -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at extent-tree.c:3162 (get_restripe_target)
On Fri, Apr 06, 2012 at 10:52:13AM -0500, Mitch Harder wrote: On Fri, Apr 6, 2012 at 10:43 AM, David Sterba d...@jikos.cz wrote: On Fri, Apr 06, 2012 at 10:33:04AM -0500, Mitch Harder wrote: I'm consistently hitting the BUG_ON in the new get_restripe_target function that was added by [PATCH 4/8] Btrfs: add get_restripe_target() helper when trying to initially mount a btrfs partition. Lines fs/btrfs/extent-tree.c:3161-3162: BUG_ON(!mutex_is_locked(fs_info-volume_mutex) !spin_is_locked(fs_info-balance_lock)); Is it a single CPU machine? See http://www.spinics.net/lists/linux-btrfs/msg15884.html Thanks! Yes, this is a single CPU machine. I'll test the solution from that thread. Sorry about that. I'm going to hold off a bit to see if there will be any objections to v2 from that thread, but we will definitely do something about this by rc3 or rc4 at worst. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs, lockdep: get_restripe_target: use lockdep in BUG_ON
On Wed, Apr 04, 2012 at 10:04:12PM -0400, Bobby Powers wrote: spin_is_locked always returns 0 on non-SMP systems, which causes btrfs to fail the mount. There is documentation pending as to why checking for spin_is_locked is a bad idea: https://lkml.org/lkml/2012/3/27/413 The suggested lockdep_assert_held() is not appropriate in this case, as what get_restripe_target() is checking for is that either volume_mutex is held or balance_lock is held. Luckily lockdep_assert_held() is a simple macro: WARN_ON(debug_locks !lockdep_is_held(l)) We can mimic the structure in get_restripe_target(), but we need to make sure lockdep_is_held() is defined for the !LOCKDEP case. CC: Ilya Dryomov idryo...@gmail.com CC: Chris Mason chris.ma...@oracle.com CC: Andi Kleen a...@linux.intel.com CC: Jeff Mahoney je...@suse.de CC: Ingo Molnar mi...@redhat.com CC: linux-ker...@vger.kernel.org Signed-off-by: Bobby Powers bobbypow...@gmail.com --- fs/btrfs/extent-tree.c |5 +++-- include/linux/lockdep.h |1 + 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a844204..4d13eb1 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -24,6 +24,7 @@ #include linux/kthread.h #include linux/slab.h #include linux/ratelimit.h +#include linux/lockdep.h #include compat.h #include hash.h #include ctree.h @@ -3158,8 +3159,8 @@ static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags) struct btrfs_balance_control *bctl = fs_info-balance_ctl; u64 target = 0; - BUG_ON(!mutex_is_locked(fs_info-volume_mutex) -!spin_is_locked(fs_info-balance_lock)); + BUG_ON(debug_locks !lockdep_is_held(fs_info-volume_mutex) +!lockdep_is_held(fs_info-balance_lock)); if (!bctl) return 0; diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index d36619e..94c0edb 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -392,6 +392,7 @@ struct lock_class_key { }; #define lockdep_depth(tsk) (0) +#define lockdep_is_held(l) (0) #define lockdep_assert_held(l) do { } while (0) #define lockdep_recursing(tsk) (0) -- 1.7.10.rc3.3.g19a6c OK, Mitch's report prompted me to actually take a look. This patch is wrong: by defining lockdep_is_held(l) to 0 in !LOCKDEP case you effectively mimic the behaviour of spin_is_locked() which is what getting us in the first place. get_restripe_target() interface is WIP so I will replace BUG_ON with a comment and send a patch through btrfs tree. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs, lockdep: get_restripe_target: use lockdep in BUG_ON
On Thu, Apr 05, 2012 at 12:23:01PM -0400, Bobby Powers wrote: On Wed, Apr 4, 2012 at 10:04 PM, Bobby Powers bobbypow...@gmail.com wrote: spin_is_locked always returns 0 on non-SMP systems, which causes btrfs to fail the mount. There is documentation pending as to why checking I guess I should be explicit in stating that this is a regression, so this patch or something else that addresses it should be pulled into 3.4. Yes, this is a regression and spin_is_locked() definitely has to go. I don't have a strong opinion on this assert, if there are objections to v2 I'm OK with ripping that BUG_ON entirely and replacing it with a comment (this function and its callers are WIP). Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at fs/btrfs/volumes.c:2733
On Fri, Mar 30, 2012 at 07:49:56PM +0200, Sander wrote: Hello Ilya, Ilya Dryomov wrote (ao): I'm definitely intrested in reproducing it. Could you please umount this filesystem, capture the output of 'btrfs-debug-tree -d dev' and post it somewhere ? Will do. It is the / filesystem, so I'll need to reboot. I need this to confirm that balance item is on disk. I'm sorry it took so long. I'll mail the output to you directly. Thanks. btrfs-debug-tree confirms that you've got a balance item on media. After that mount it back and see if there is btrfs: continuing balance line in dmesg (and if btrfs-balance kthread shows up)? There is no such line in dmesg, and currently no btrfs-balance kthread is running. I've pulled Chris Masons for-linus and booted with the resulting kernel. And given the above it's weird. We are failing to locate the item during mount for some reason and I would like to find out why. So if you are up for running debugging patches (really just compiling btrfs module and sending me dmesg output) I would appreciate that. If you don't want to do that you can try to compile btrfs-progs from git, mount fs and run 'btrfs balance resume mnt'. If that doesn't work I'll send you a small util that will simply delete the item from disk. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at fs/btrfs/volumes.c:2733
On Thu, Mar 29, 2012 at 12:52:35PM +0200, Sander wrote: Hello all, I can't seem to balance my btrfs filesystem. It segfaults, and gives a kernel bug: [ 1355.139099] [ cut here ] [ 1355.139099] kernel BUG at fs/btrfs/volumes.c:2733! [ 1355.149322] Internal error: Oops - BUG: 0 [#1] SMP [ 1355.149322] Modules linked in: [ 1355.154479] CPU: 0Not tainted (3.3.0 #8) [ 1355.162109] PC is at btrfs_balance+0x312/0xb04 [ 1355.166778] LR is at btrfs_run_delayed_iputs+0x2d/0xac [ 1355.166931] pc : [c0138c3a]lr : [c01234d5]psr: 6033 [ 1355.166931] sp : cb141d98 ip : fp : be83fdb4 [ 1355.166931] r10: r9 : r8 : [ 1355.184173] r7 : r6 : ffef r5 : ede7f000 r4 : ed730e00 [ 1355.189636] r3 : r2 : r1 : r0 : 0007 [ 1355.203277] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user [ 1355.203277] Control: 50c5387d Table: 8b15c04a DAC: 0015 [ 1355.203277] Process btrfs (pid: 1798, stack limit = 0xcb1402f8) [ 1355.203277] Stack: (0xcb141d98 to 0xcb142000) [ 1355.203277] 1d80: c145f944 [ 1355.227691] 1da0: 0003 ee478d40 0015 [ 1355.227691] 1dc0: [ 1355.253356] 1de0: ede7fcd4 ede7fcd8 [ 1355.253356] 1e00: 271aee1c 000200da c0160fd5 [ 1355.253356] 1e20: ed74ec00 d6257680 ed730e00 ede7f4e8 ede7fcb0 [ 1355.279022] 1e40: ede7f000 be83fdb4 c013d489 eec9c118 be83ebf8 ed74ec00 eec8a370 [ 1355.279022] 1e60: d6257680 eec8a528 be83fdb4 c013fc6b 001d 00eb [ 1355.279022] 1e80: 0007 0001 e6f7e680 c015cecd cb141ea4 cb141ef0 [ 1355.296142] 1ea0: cb15c000 01ff 0001 eeabaac0 0001 [ 1355.296142] 1ec0: ed5428c0 c1414788 d6257688 00eb cb141ef0 c016011b cb141ef0 [ 1355.321807] 1ee0: 0001 c016018f 0817 0001 271aee1c ede92250 d6257680 [ 1355.321807] 1f00: be83ebf8 be83ebf8 eec8a528 cb14 be83fdb4 c0088075 [ 1355.321807] 1f20: 4000 c00887ff [ 1355.338928] 1f40: 271aee1c 0003 d6257680 [ 1355.338928] 1f60: be83ebf8 5000940c d6257680 be83ebf8 5000940c 0003 cb14 [ 1355.364593] 1f80: c008885d 0003 be83fec7 0003 0013c478 0036 [ 1355.364593] 1fa0: c000c5a4 c000c401 be83fec7 0003 0003 5000940c be83ebf8 be83fbf8 [ 1355.364593] 1fc0: be83fec7 0003 0013c478 0036 0002 b7ad 0001 be83fdb4 [ 1355.381713] 1fe0: 00024b3d be83ebf0 b7f7 b6ea7f9c 8010 0003 00052d17 00090224 [ 1355.381713] [c0138c3a] (btrfs_balance+0x312/0xb04) from [c013d489] (btrfs_ioctl_balance+0x109/0x174) [ 1355.381713] [c013d489] (btrfs_ioctl_balance+0x109/0x174) from [c013fc6b] (btrfs_ioctl+0xbf5/0xd42) [ 1355.418518] [c013fc6b] (btrfs_ioctl+0xbf5/0xd42) from [c0088075] (vfs_ioctl+0xd/0x28) [ 1355.418518] [c0088075] (vfs_ioctl+0xd/0x28) from [c00887ff] (do_vfs_ioctl+0x35d/0x38e) [ 1355.427093] [c00887ff] (do_vfs_ioctl+0x35d/0x38e) from [c008885d] (sys_ioctl+0x2d/0x44) [ 1355.88] [c008885d] (sys_ioctl+0x2d/0x44) from [c000c401] (ret_fast_syscall+0x1/0x44) [ 1355.88] Code: d107 f116 0f11 d100 (de02) 4620 [ 1355.458343] ---[ end trace f06b6b8fcd08e6d5 ]--- A new 'btrfs filesystem balance /' seems to just hang, and is unkillable. After a reboot, I tried again, with the same result: [ 81.048767] [ cut here ] [ 81.053619] kernel BUG at fs/btrfs/volumes.c:2733! [ 81.053619] Internal error: Oops - BUG: 0 [#1] SMP [ 81.059295] Modules linked in: [ 81.059295] CPU: 1Not tainted (3.3.0 #8) [ 81.071411] PC is at btrfs_balance+0x312/0xb04 [ 81.074890] LR is at btrfs_run_delayed_iputs+0x2d/0xac [ 81.074890] pc : [c0138c3a]lr : [c01234d5]psr: 6133 [ 81.074890] sp : edda5d98 ip : fp : beb62d64 [ 81.093475] r10: r9 : r8 : [ 81.098327] r7 : r6 : ffef r5 : ed73f000 r4 : ee311c00 [ 81.098327] r3 : r2 : r1 : r0 : 0007 [ 81.112609] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user [ 81.112609] Control: 50c5387d Table: a7fb404a DAC: 0015 [ 81.112609] Process btrfs (pid: 752, stack limit = 0xedda42f8) [ 81.132354] Stack: (0xedda5d98 to 0xedda6000) [ 81.132354] 5d80: c145f944 [ 81.145477] 5da0:
Re: kernel BUG at fs/btrfs/volumes.c:2733
On Thu, Mar 29, 2012 at 05:14:22PM +0200, Sander wrote: Ilya Dryomov wrote (ao): On Thu, Mar 29, 2012 at 12:52:35PM +0200, Sander wrote: After a reboot, I tried again, with the same result: [ 81.048767] [ cut here ] [ 81.053619] kernel BUG at fs/btrfs/volumes.c:2733! [ 81.053619] Internal error: Oops - BUG: 0 [#1] SMP [ 81.059295] Modules linked in: [ 81.059295] CPU: 1Not tainted (3.3.0 #8) [ 81.071411] PC is at btrfs_balance+0x312/0xb04 [ 81.074890] LR is at btrfs_run_delayed_iputs+0x2d/0xac So you have balance item on disk, but the kernel doesn't seem to know about it in advance, which is odd and so when you try to run balance it panics on one of the safety checks. The system is a pandaboard running a plain Linus kernel 3.3.0 with a btrfs filesystem, over two Intel 320 600GB ssd's, connected via usb (on an usb hub), on top of md_crypt. Mount options: subvol=rootvolume,space_cache,inode_cache,compress=lzo,ssd Before the balance, I deleted about 2500 snapshots and waited for the btrfs kernel threads to calm down. Then I initiated a btrfs filesystem scrub. Unfortunately during the scrub, the filesystem balance started. Might be related. That's indeed pretty cool, I wonder how that could happen. I create 5 snapshots of 5 different subvolumes every 5 minutes, and the system is low on memory: total used free sharedbuffers cached Mem: 745712 33 0 0480 -/+ buffers/cache:231514 Swap:0 0 0 There is ample space on the fileystem: panda:~# df -h / Filesystem Size Used Avail Use% Mounted on /dev/mapper/ata-INTEL_SSDSA2CW600G3_CVPR112405AJ600FGN 1.1T 17G 1.1T 2% / panda:~# btrfs filesystem df / Data, RAID0: total=24.00GB, used=15.69GB System, RAID1: total=64.00MB, used=12.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=23.00GB, used=231.26MB Do you need more information? No, that's enough for now. I'm definitely intrested in reproducing it. Could you please umount this filesystem, capture the output of 'btrfs-debug-tree -d dev' and post it somewhere ? Will do. It is the / filesystem, so I'll need to reboot. I need this to confirm that balance item is on disk. After that mount it back and see if there is btrfs: continuing balance line in dmesg (and if btrfs-balance kthread shows up)? There was none after the first reboot, but I'll pay extra attention to that after the next reboot. If so, just let it run, it should finish the balance and remove on-disk item. (You can query the status of running balance with 'btrfs balance status mnt') Do I need newer tools for that? This is Debian Sid (unstable): Yeah, you do. That command is in master now, but it's not really needed. If btrfs-balance shows up, just wait for it to finish, it should get rid of the balance item. If it doesn't show up but the item is there we will have to dig deeper. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at fs/btrfs/volumes.c:2733
On Thu, Mar 29, 2012 at 04:39:29PM +0200, Sander wrote: Hello Josef, Josef Bacik wrote (ao): On Thu, Mar 29, 2012 at 12:52:35PM +0200, Sander wrote: I can't seem to balance my btrfs filesystem. It segfaults, and gives a kernel bug: [ 1355.139099] [ cut here ] [ 1355.139099] kernel BUG at fs/btrfs/volumes.c:2733! [ 1355.149322] Internal error: Oops - BUG: 0 [#1] SMP [ 1355.149322] Modules linked in: [ 1355.154479] CPU: 0Not tainted (3.3.0 #8) [ 1355.162109] PC is at btrfs_balance+0x312/0xb04 [ 1355.166778] LR is at btrfs_run_delayed_iputs+0x2d/0xac The system is a pandaboard running a plain Linus kernel 3.3.0 with a btrfs filesystem, over two Intel 320 600GB ssd's, connected via usb (on an usb hub), on top of md_crypt. Mount options: subvol=rootvolume,space_cache,inode_cache,compress=lzo,ssd Before the balance, I deleted about 2500 snapshots and waited for the btrfs kernel threads to calm down. Then I initiated a btrfs filesystem scrub. Unfortunately during the scrub, the filesystem balance started. Might be related. Well that's kind of cool. So 2 options 1) If you are in a hurry and need this stuff back right away run btrfs fi balance resume / and it should work, buuutt 2) If you aren't in a hurry I'd really like to try and reproduce this locally and if I can't I'd like to be able to send you patches to help me figure out how to fix this problem. I am in no hurry at all. The filesystem seems just fine the way it is (after a reboot), so there is no stuff to get back right away. Does the kernel bug suggest the filesystem is fubar? No, as I said in another mail you are trapping over a simle sanity check. FS should be OK. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/8] Restriper fixes
Hi Chris, The main one here is the improvement to btrfs_can_relocate(), which is now a tiny bit smarter and does not return ENOSPC when there's plenty of unallocated space for target chunks. This, in addition to my patch which disables silent profile upgrades, should lower a number of corner cases in profile changing. The rest are a bunch of cleanups and some minor fixes. Please pull from git://github.com/idryomov/btrfs-unstable.git for-chris top commit 213e64da90d14537cd63f7090d6c4d1fcc75d9f8 Thanks, Ilya Ilya Dryomov (8): Btrfs: add wrappers for working with alloc profiles Btrfs: make profile_is_valid() check more strict Btrfs: move alloc_profile_is_valid() to volumes.c Btrfs: add get_restripe_target() helper Btrfs: add __get_block_group_index() helper Btrfs: improve the logic in btrfs_can_relocate() Btrfs: validate target profiles only if we are going to use them Btrfs: allow dup for data chunks in mixed mode fs/btrfs/ctree.h | 33 +-- fs/btrfs/extent-tree.c | 158 ++-- fs/btrfs/volumes.c | 88 --- 3 files changed, 152 insertions(+), 127 deletions(-) -- 1.7.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html