Another 4.8-rc locked splat: btrfs_close_devices()

2016-09-08 Thread Ilya Dryomov
Hello,

This one seems to have appeared after Anand's commit
142388194191 ("btrfs: do not background blkdev_put()") got merged into
4.8-rc4.

Thanks,

Ilya
[  983.284212] ==
[  983.290401] [ INFO: possible circular locking dependency detected ]
[  983.296677] 4.8.0-rc5-ceph-00023-g1b39cec2 #1 Not tainted
[  983.302081] ---
[  983.308357] umount/21720 is trying to acquire lock:
[  983.313243]  (>bd_mutex){+.+.+.}, at: [] 
blkdev_put+0x31/0x150
[  983.321264] 
[  983.321264] but task is already holding lock:
[  983.327101]  (_devs->device_list_mutex){+.+...}, at: [] 
__btrfs_close_devices+0x46/0x200 [btrfs]
[  983.337839] 
[  983.337839] which lock already depends on the new lock.
[  983.337839] 
[  983.346024] 
[  983.346024] the existing dependency chain (in reverse order) is:
[  983.353512] 
-> #4 (_devs->device_list_mutex){+.+...}:
[  983.359096][] lock_acquire+0x1bc/0x1f0
[  983.365143][] mutex_lock_nested+0x65/0x350
[  983.371521][] btrfs_show_devname+0x36/0x1f0 [btrfs]
[  983.378710][] show_vfsmnt+0x4e/0x150
[  983.384593][] m_show+0x17/0x20
[  983.389957][] seq_read+0x2b5/0x3b0
[  983.395669][] __vfs_read+0x28/0x100
[  983.401464][] vfs_read+0xab/0x150
[  983.407080][] SyS_read+0x52/0xb0
[  983.412609][] entry_SYSCALL_64_fastpath+0x23/0xc1
[  983.419617] 
-> #3 (namespace_sem){++}:
[  983.424024][] lock_acquire+0x1bc/0x1f0
[  983.430074][] down_write+0x49/0x80
[  983.435785][] lock_mount+0x67/0x1c0
[  983.441582][] do_add_mount+0x32/0xf0
[  983.447458][] finish_automount+0x5a/0xc0
[  983.453682][] follow_managed+0x1b3/0x2a0
[  983.459912][] lookup_fast+0x300/0x350
[  983.465875][] path_openat+0x3a7/0xaa0
[  983.471846][] do_filp_open+0x85/0xe0
[  983.477731][] do_sys_open+0x14c/0x1f0
[  983.483702][] SyS_open+0x1e/0x20
[  983.489240][] entry_SYSCALL_64_fastpath+0x23/0xc1
[  983.496254] 
-> #2 (>s_type->i_mutex_key#3){+.+.+.}:
[  983.501798][] lock_acquire+0x1bc/0x1f0
[  983.507855][] down_write+0x49/0x80
[  983.513558][] start_creating+0x87/0x100
[  983.519703][] debugfs_create_dir+0x17/0x100
[  983.526195][] bdi_register+0x93/0x210
[  983.532165][] bdi_register_owner+0x43/0x70
[  983.538570][] device_add_disk+0x1fb/0x450
[  983.544888][] loop_add+0x1e6/0x290
[  983.550596][] loop_init+0x10b/0x14f
[  983.556394][] do_one_initcall+0xa7/0x180
[  983.562618][] kernel_init_freeable+0x1cc/0x266
[  983.569370][] kernel_init+0xe/0x100
[  983.575166][] ret_from_fork+0x1f/0x40
[  983.581131] 
-> #1 (loop_index_mutex){+.+.+.}:
[  983.585801][] lock_acquire+0x1bc/0x1f0
[  983.591858][] mutex_lock_nested+0x65/0x350
[  983.598256][] lo_open+0x1f/0x60
[  983.603704][] __blkdev_get+0x123/0x400
[  983.609757][] blkdev_get+0x34a/0x350
[  983.615639][] blkdev_open+0x64/0x80
[  983.621428][] do_dentry_open+0x1c6/0x2d0
[  983.627651][] vfs_open+0x69/0x80
[  983.633181][] path_openat+0x834/0xaa0
[  983.639152][] do_filp_open+0x85/0xe0
[  983.645035][] do_sys_open+0x14c/0x1f0
[  983.650999][] SyS_open+0x1e/0x20
[  983.656535][] entry_SYSCALL_64_fastpath+0x23/0xc1
[  983.663541] 
-> #0 (>bd_mutex){+.+.+.}:
[  983.668107][] __lock_acquire+0x1003/0x17b0
[  983.674510][] lock_acquire+0x1bc/0x1f0
[  983.680561][] mutex_lock_nested+0x65/0x350
[  983.686967][] blkdev_put+0x31/0x150
[  983.692761][] btrfs_close_bdev+0x4f/0x60 [btrfs]
[  983.699699][] __btrfs_close_devices+0xcb/0x200 
[btrfs]
[  983.707178][] btrfs_close_devices+0x2b/0xa0 [btrfs]
[  983.714380][] close_ctree+0x265/0x340 [btrfs]
[  983.721061][] btrfs_put_super+0x19/0x20 [btrfs]
[  983.727908][] generic_shutdown_super+0x6f/0x100
[  983.734744][] kill_anon_super+0x16/0x30
[  983.740888][] btrfs_kill_super+0x1e/0x130 [btrfs]
[  983.747909][] deactivate_locked_super+0x49/0x80
[  983.754745][] deactivate_super+0x5d/0x70
[  983.760977][] cleanup_mnt+0x5c/0x80
[  983.766773][] __cleanup_mnt+0x12/0x20
[  983.772738][] task_work_run+0x7e/0xc0
[  983.778708][] exit_to_usermode_loop+0x7e/0xb4
[  983.785373][] syscall_return_slowpath+0xbb/0xd0
[  983.792212][] entry_SYSCALL_64_fastpath+0xbf/0xc1
[  983.799225] 
[  983.799225] other info that might help us debug this:
[  983.799225] 
[  983.807291] Chain exists of:
  >bd_mutex --> namespace_sem --> _devs->device_list_mutex

[  983.816521]  Possible unsafe locking scenario:
[  983.816521] 
[  983.822489]CPU0CPU1
[  983.827043]   

Possible recursive locking on btrfs_inode::log_mutex

2016-08-20 Thread Ilya Dryomov
Hi,

I'm seeing the attached lockdep splat on 4.8-rc2 - I'm guessing it got
introduced in 4.8-rc.  It appears to be very easy to trigger and shows up
within seconds after mkfs + mount.

Thanks,

Ilya
kernel: =
kernel: [ INFO: possible recursive locking detected ]
kernel: 4.8.0-rc2-ceph-00297-gab9f8d2 #1 Not tainted
kernel: -
kernel: ceph-osd/25016 is trying to acquire lock:
kernel: (>log_mutex){+.+...}, at: [] 
btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: #012but task is already holding lock:
kernel: (>log_mutex){+.+...}, at: [] 
btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: #012other info that might help us debug this:
kernel: Possible unsafe locking scenario:
kernel:   CPU0
kernel:   
kernel:  lock(>log_mutex);
kernel:  lock(>log_mutex);
kernel: #012 *** DEADLOCK ***
kernel: May be due to missing lock nesting notation
kernel: 6 locks held by ceph-osd/25016:
kernel: #0:  (sb_writers#15){.+.+.+}, at: [] 
__sb_start_write+0xad/0xe0
kernel: #1:  (>i_mutex_dir_key#4/1){+.+.+.}, at: [] 
lock_rename+0x2e/0x100
kernel: #2:  (>s_type->i_mutex_key#19){+.+.+.}, at: [] 
lock_two_nondirectories+0x3e/0x70
kernel: #3:  (>s_type->i_mutex_key#19/4){+.+...}, at: [] 
lock_two_nondirectories+0x66/0x70
kernel: #4:  (sb_internal#2){.+.+.+}, at: [] 
__sb_start_write+0x7b/0xe0
kernel: #5:  (>log_mutex){+.+...}, at: [] 
btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: #012stack backtrace:
kernel: CPU: 0 PID: 25016 Comm: ceph-osd Not tainted 
4.8.0-rc2-ceph-00297-gab9f8d2 #1
kernel: Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.0c 02/25/2010
kernel: 0086 668dbca4 88042a277870 81413e51
kernel: 828f4630 828f4630 88042a277930 811069c0
kernel: dfc5c151 8804 4efe3813430e32cf 880429f38000
kernel: Call Trace:
kernel: [] dump_stack+0x85/0xc4
kernel: [] __lock_acquire+0xce0/0x1680
kernel: [] ? mutex_unlock+0xe/0x10
kernel: [] lock_acquire+0x197/0x1f0
kernel: [] ? btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: [] mutex_lock_nested+0x79/0x370
kernel: [] ? btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: [] btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: [] ? __might_sleep+0x70/0x90
kernel: [] ? btrfs_i_callback+0x20/0x20 [btrfs]
kernel: [] ? btrfs_invalidate_inodes+0x198/0x1f0 [btrfs]
kernel: [] ? _raw_spin_unlock+0x27/0x40
kernel: [] btrfs_log_inode+0x534/0xb50 [btrfs]
kernel: [] ? trace_hardirqs_on_caller+0x178/0x1d0
kernel: [] btrfs_log_inode_parent+0x881/0x970 [btrfs]
kernel: [] ? _raw_spin_unlock+0x27/0x40
kernel: [] ? btrfs_update_inode+0xd7/0x110 [btrfs]
kernel: [] btrfs_log_new_name+0x7b/0x80 [btrfs]
kernel: [] btrfs_rename2+0xfd2/0x16c0 [btrfs]
kernel: [] ? lock_two_nondirectories+0x66/0x70
kernel: [] ? down_write_nested+0x4f/0x80
kernel: [] ? lock_two_nondirectories+0x66/0x70
kernel: [] vfs_rename+0x467/0x830
kernel: [] ? read_seqcount_begin+0x90/0xa0
kernel: [] SyS_rename+0x1ee/0x3c0
kernel: [] do_syscall_64+0x7a/0x1c0
kernel: [] entry_SYSCALL64_slow_path+0x25/0x25


Re: Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held

2016-03-30 Thread Ilya Dryomov
On Wed, Mar 30, 2016 at 7:25 PM, Liu Bo <bo.li@oracle.com> wrote:
> On Wed, Mar 30, 2016 at 05:24:04PM +0200, Ilya Dryomov wrote:
>> Hi,
>>
>> We are hitting the attached lockup on a somewhat regular basis during
>> nightly tests.  Looks like a bunch of CPUs spin in find_free_extent()
>> on btrfs_free_cluster::lock, which is held by writer, who seems to be
>> stuck in an endless loop in btrfs_find_space_cluster(), trying to
>> cleanup bitmaps list.  Smells like a list corruption to me?
>
> My objdump shows that find_free_extent() may wait on 
> down_read(_info->groups_sem);

It's spinning on a spinlock:

6228 if (last_ptr) {
6229 spin_lock(_ptr->lock);
6230 if (last_ptr->block_group)

>
> One possible thing is that there're too many entries in bitmap, and
> list_for_each_entry just is just stuck there.

I don't think so - look at the two journal_write splats.

>
> Are these stacks from "Blocked more than 120s"?

No, these are all soft lockups.  Sorry, I had to edit it to make it
readable and concentrated on stack traces.  I guess "BUG: soft lockup"
were mixed up with modules and object code all over the place.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held

2016-03-30 Thread Ilya Dryomov
Hi,

We are hitting the attached lockup on a somewhat regular basis during
nightly tests.  Looks like a bunch of CPUs spin in find_free_extent()
on btrfs_free_cluster::lock, which is held by writer, who seems to be
stuck in an endless loop in btrfs_find_space_cluster(), trying to
cleanup bitmaps list.  Smells like a list corruption to me?

The kernel is ancient in btrfs terms - ubuntu's 3.13.0-83-generic, but
the surroundings look sufficiently similar to upstream and given recent
patches like 1b9b922a3a60 ("Btrfs: check for empty bitmap list in
setup_cluster_bitmaps") I thought this might be relevant for upstream.

Thanks,

Ilya
[74750.641965] CPU: 6 PID: 12768 Comm: btrfs-transacti Not tainted 
3.13.0-83-generic #127-Ubuntu
[74750.650976] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 
09/07/2015
[74750.658951] task: 88074b0e8000 ti: 88083c564000 task.ti: 
88083c564000
[74750.666936] RIP: 0010:[]  [] 
_raw_spin_lock+0x37/0x50
[74750.675645] RSP: 0018:88083c565b50  EFLAGS: 0206
[74750.681469] RAX: 18ca RBX: 88083c565b48 RCX: 09a0
[74750.689126] RDX: 09a6 RSI: 09a6 RDI: 88084f6671b0
[74750.696784] RBP: 88083c565b50 R08: 88083c565cef R09: 0001
[74750.704466] R10: a019c9e6 R11: ea001ff7f6c0 R12: 8807e829d4e0
[74750.712140] R13: a02073ab R14: 88083c565ae0 R15: 8807e829d4e0
[74750.719811] FS:  () GS:88087fd8() 
knlGS:
[74750.728450] CS:  0010 DS:  ES:  CR0: 80050033
[74750.734744] CR2: 7f4d2b6a9060 CR3: 01c0e000 CR4: 001407e0
[74750.742437] Stack:
[74750.745001]  88083c565c28 a01b07a3 88074873a000 
880530131d20
[74750.753033]  880791982000 88083c565cef 1000 
00040002
[74750.761065]  a01a3a97 0020 880523f4a800 
0001
[74750.769105] Call Trace:
[74750.772137]  [] find_free_extent+0x213/0xc30 [btrfs]
[74750.779259]  [] ? btrfs_del_items+0x367/0x470 [btrfs]
[74750.786476]  [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
[74750.793869]  [] __btrfs_prealloc_file_range+0xe5/0x380 
[btrfs]
[74750.801876]  [] btrfs_prealloc_file_range_trans+0x30/0x40 
[btrfs]
[74750.810136]  [] btrfs_write_dirty_block_groups+0x4d3/0x620 
[btrfs]
[74750.818469]  [] commit_cowonly_roots+0x151/0x213 [btrfs]
[74750.825940]  [] btrfs_commit_transaction+0x483/0x970 
[btrfs]
[74750.833765]  [] transaction_kthread+0x1b5/0x240 [btrfs]
[74750.841156]  [] ? btrfs_cleanup_transaction+0x550/0x550 
[btrfs]
[74750.849244]  [] kthread+0xd2/0xf0
[74750.854729]  [] ? kthread_create_on_node+0x1c0/0x1c0
[74750.861861]  [] ret_from_fork+0x58/0x90
[74750.867868]  [] ? kthread_create_on_node+0x1c0/0x1c0

[74758.651947] CPU: 5 PID: 13299 Comm: kworker/u16:1 Not tainted 
3.13.0-83-generic #127-Ubuntu
[74758.662554] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 
09/07/2015
[74758.672309] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-15)
[74758.681114] task: 880734153000 ti: 88078ad98000 task.ti: 
88078ad98000
[74758.690802] RIP: 0010:[]  [] 
_raw_spin_lock+0x32/0x50
[74758.701202] RSP: 0018:88078ad99728  EFLAGS: 0202
[74758.708662] RAX: 4bf5 RBX: 811f0610 RCX: 09a0
[74758.717910] RDX: 09a4 RSI: 09a4 RDI: 88084f6671b0
[74758.727114] RBP: 88078ad99728 R08: 88078ad998b7 R09: 0001
[74758.736311] R10:  R11: ea001b38bc00 R12: 88084e918770
[74758.745503] R13: 88084e9188c0 R14: 8114f8ee R15: 88078ad99698
[74758.754665] FS:  () GS:88087fd4() 
knlGS:
[74758.764805] CS:  0010 DS:  ES:  CR0: 80050033
[74758.772582] CR2: 7f4d2b695000 CR3: 01c0e000 CR4: 001407e0
[74758.781766] Stack:
[74758.785786]  88078ad99800 a01b07a3 8807e6bc5000 
880853034e70
[74758.795307]  88078ad997b8 88078ad998b7 880853034e40 
00040002
[74758.804838]  88078ad99778 0020 880523f4a800 
0001
[74758.814337] Call Trace:
[74758.818811]  [] find_free_extent+0x213/0xc30 [btrfs]
[74758.827389]  [] ? alloc_extent_state+0x21/0xc0 [btrfs]
[74758.836147]  [] ? __lookup_extent_mapping+0xa0/0x150 
[btrfs]
[74758.845413]  [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
[74758.854224]  [] cow_file_range+0x135/0x430 [btrfs]
[74758.862593]  [] run_delalloc_range+0x312/0x350 [btrfs]
[74758.871310]  [] ? 
find_lock_delalloc_range.constprop.43+0x1b9/0x1f0 [btrfs]
[74758.881857]  [] ? ata_scsi_queuecmd+0x133/0x400
[74758.889974]  [] __extent_writepage+0x2f4/0x760 [btrfs]
[74758.898703]  [] ? btrfs_add_delayed_iput+0x61/0xc0 [btrfs]
[74758.907763]  [] ? __blk_run_queue+0x33/0x40
[74758.915511]  [] ? find_get_pages_tag+0xd1/0x180
[74758.923600]  [] ? kmem_cache_alloc_trace+0x3c/0x1f0
[74758.932051]  [] 

Re: [PATCH] btrfs: Allow forced conversion of metadata to dup profile on multiple devices

2014-02-24 Thread Ilya Dryomov
On Thu, Feb 20, 2014 at 6:57 PM, David Sterba dste...@suse.cz wrote:
 On Wed, Feb 19, 2014 at 11:10:41AM -0500, Austin S Hemmelgarn wrote:
 Currently, btrfs balance start fails when trying to convert metadata or
 system chunks to dup profile on filesystems with multiple devices.  This
 requires that a conversion from a multi-device filesystem to a single
 device filesystem use the following methodology:
 1. btrfs balance start -dconvert=single -mconvert=single \
-sconvert=single -f /
 2. btrfs device delete / /dev/sdx
 3. btrfs balance start -mconvert=dup -sconvert=dup /
 This results in a period of time (possibly very long if the devices are
 big) where you don't have the protection guarantees of multiple copies
 of metadata chunks.

 After applying this patch, one can instead use the following methodology
 for conversion from a multi-device filesystem to a single device
 filesystem:
 1. btrfs balance start -dconvert=single -mconvert=dup \
-sconvert=dup -f /
 2. btrfs device delete / /dev/sdx
 This greatly reduces the chances of the operation causing data loss due
 to a read error during the device delete.

 Signed-off-by: Austin S. Hemmelgarn ahferro...@gmail.com
 Reviewed-by: David Sterba dste...@suse.cz

 Sounds useful. The muliple devices + DUP is allowed setup when the
 device is added, this patch only adds the 'delete' counterpart. The
 imroved data loss protection during the process is a good thing.

Hi,

Have you actually tried to queue it?  Unless I'm missing something, it won't
compile, and on top of that, it seems to be corrupted too..

IIRC muliple devices + DUP is allowed only until the first balance, has that
changed?

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Allow forced conversion of metadata to dup profile on multiple devices

2014-02-24 Thread Ilya Dryomov
On Mon, Feb 24, 2014 at 3:44 PM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:
 On 2014-02-24 08:37, Ilya Dryomov wrote:
 On Thu, Feb 20, 2014 at 6:57 PM, David Sterba dste...@suse.cz wrote:
 On Wed, Feb 19, 2014 at 11:10:41AM -0500, Austin S Hemmelgarn wrote:
 Currently, btrfs balance start fails when trying to convert metadata or
 system chunks to dup profile on filesystems with multiple devices.  This
 requires that a conversion from a multi-device filesystem to a single
 device filesystem use the following methodology:
 1. btrfs balance start -dconvert=single -mconvert=single \
-sconvert=single -f /
 2. btrfs device delete / /dev/sdx
 3. btrfs balance start -mconvert=dup -sconvert=dup /
 This results in a period of time (possibly very long if the devices are
 big) where you don't have the protection guarantees of multiple copies
 of metadata chunks.

 After applying this patch, one can instead use the following methodology
 for conversion from a multi-device filesystem to a single device
 filesystem:
 1. btrfs balance start -dconvert=single -mconvert=dup \
-sconvert=dup -f /
 2. btrfs device delete / /dev/sdx
 This greatly reduces the chances of the operation causing data loss due
 to a read error during the device delete.

 Signed-off-by: Austin S. Hemmelgarn ahferro...@gmail.com
 Reviewed-by: David Sterba dste...@suse.cz

 Sounds useful. The muliple devices + DUP is allowed setup when the
 device is added, this patch only adds the 'delete' counterpart. The
 imroved data loss protection during the process is a good thing.

 Hi,

 Have you actually tried to queue it?  Unless I'm missing something, it won't
 compile, and on top of that, it seems to be corrupted too..
 The patch itself was made using git, AFAICT it should be fine.  I've
 personally built and tested it using UML.

It doesn't look fine.  It was generated with git, but it got corrupted
on the way: either how you pasted it or the email client you use is the
problem.

On Wed, Feb 19, 2014 at 6:10 PM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:
 Currently, btrfs balance start fails when trying to convert metadata or
 system chunks to dup profile on filesystems with multiple devices.  This
 requires that a conversion from a multi-device filesystem to a single
 device filesystem use the following methodology:
 1. btrfs balance start -dconvert=single -mconvert=single \
-sconvert=single -f /
 2. btrfs device delete / /dev/sdx
 3. btrfs balance start -mconvert=dup -sconvert=dup /
 This results in a period of time (possibly very long if the devices are
 big) where you don't have the protection guarantees of multiple copies
 of metadata chunks.

 After applying this patch, one can instead use the following methodology
 for conversion from a multi-device filesystem to a single device
 filesystem:
 1. btrfs balance start -dconvert=single -mconvert=dup \
-sconvert=dup -f /
 2. btrfs device delete / /dev/sdx
 This greatly reduces the chances of the operation causing data loss due
 to a read error during the device delete.

 Signed-off-by: Austin S. Hemmelgarn ahferro...@gmail.com
 ---
  fs/btrfs/volumes.c | 21 +
  1 file changed, 17 insertions(+), 4 deletions(-)

 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 07629e9..38a9522 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -3152,10 +3152,8 @@ int btrfs_balance(struct btrfs_balance_control
 *bctl,

^^^, that should be a single line

 num_devices--;
 }
 btrfs_dev_replace_unlock(fs_info-dev_replace);
 -   allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
 -   if (num_devices == 1)
 -   allowed |= BTRFS_BLOCK_GROUP_DUP;
 -   else if (num_devices  1)
 +   allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP;
 +   if (num_devices  1)
 allowed |= (BTRFS_BLOCK_GROUP_RAID0 | 
 BTRFS_BLOCK_GROUP_RAID1);
 if (num_devices  2)
 allowed |= BTRFS_BLOCK_GROUP_RAID5;
 @@ -3221,6 +3219,21 @@ int btrfs_balance(struct btrfs_balance_control
 *bctl,

^^^, ditto

 goto out;
 }
 }
 +   if (((bctl-sys.flags  BTRFS_BALANCE_ARGS_CONVERT) 
 +   (bctl-sys.target  ~BTRFS_BLOCK_GROUP_DUP) ||
 +   (bctl-meta.flags  BTRFS_BALANCE_ARGS_CONVERT) 
 +   (bctl-meta.target  ~BTRFS_BLOCK_GROUP_DUP)) 
 +   (num_devs  1)) {
 +   if (bctl-flags  BTRFS_BALANCE_FORCE) {
 +   btrfs_info(fs_info, force conversion of 
 metadata 
 +  to dup profile on multiple 
 devices);
 +   } else {
 +   btrfs_err(fs_info, balance will reduce 
 metadata 
 + integrity, use force if you want

Re: [PATCH 2/6] Btrfs-progs: add btrfsck functionality to btrfs

2013-11-14 Thread Ilya Dryomov
On Wed, Nov 13, 2013 at 7:13 PM, David Sterba dste...@suse.cz wrote:
 Hi,

 On Sun, Jun 02, 2013 at 05:47:38PM +0200, Dieter Ries wrote:
 For this to have any effect, 'h' must be added to getopt_long(), see
 attached patch 1.

 However, this results in btrfsck -h and --help doing different things:

 --help prints the usage message to stdout and exits with exit(0).
 -h prints the usage message to stderr and exits with exit(129).

 I made a patch to fix this, see attached patch 2.
 What it doesn't fix though is, that -h/--help and -? don't do the same
 thing. This is more complicated, as getop_long returns '?' for unknown
 options.

 FYI, both patchess added to integration.

Hi David,

FWIW, I think none of the btrfs sub-commands treat -h as a help option.
(This is an artifact that was inherited from the the old btrfs-progs
utility.)  -h vs --help is actually consistent: -h results in a btrfs
check: invalid option -- 'h' message, and therefore exits with 129.
Since 'btrfs check -h' has clearly never worked we might want to keep
the status quo.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] Btrfs: do not inc uncorrectable_errors counter on ro scrubs

2013-11-03 Thread Ilya Dryomov
Currently if we discover an error when scrubbing in ro mode we a)
blindly increment the uncorrectable_errors counter, and b) spam the
dmesg with the 'unable to fixup (regular) error at ...' message, even
though a) we haven't tried to determine if the error is correctable or
not, and b) we haven't tried to fixup anything.  Fix this.

Cc: Stefan Behrens sbehr...@giantdisaster.de
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/scrub.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f21e2df..f94b98d 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -938,8 +938,10 @@ static int scrub_handle_errored_block(struct scrub_block 
*sblock_to_check)
BTRFS_DEV_STAT_CORRUPTION_ERRS);
}
 
-   if (sctx-readonly  !sctx-is_dev_replace)
-   goto did_not_correct_error;
+   if (sctx-readonly) {
+   ASSERT(!sctx-is_dev_replace);
+   goto out;
+   }
 
if (!is_metadata  !have_csum) {
struct scrub_fixup_nodatasum *fixup_nodatasum;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] Btrfs: disable online raid-repair on ro mounts

2013-11-03 Thread Ilya Dryomov
This disables the if needed, write the good copy back before the read
is completed part of the read sequence for read-only mounts.

Cc: Jan Schmidt list.bt...@jan-o-sch.net
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/extent_io.c |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c09a40d..a0b7403 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1979,6 +1979,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 
start,
struct btrfs_mapping_tree *map_tree = fs_info-mapping_tree;
int ret;
 
+   ASSERT(!(fs_info-sb-s_flags  MS_RDONLY));
BUG_ON(!mirror_num);
 
/* we can't repair anything in raid56 yet */
@@ -2035,6 +2036,9 @@ int repair_eb_io_failure(struct btrfs_root *root, struct 
extent_buffer *eb,
unsigned long i, num_pages = num_extent_pages(eb-start, eb-len);
int ret = 0;
 
+   if (root-fs_info-sb-s_flags  MS_RDONLY)
+   return -EROFS;
+
for (i = 0; i  num_pages; i++) {
struct page *p = extent_buffer_page(eb, i);
ret = repair_io_failure(root-fs_info, start, PAGE_CACHE_SIZE,
@@ -2056,12 +2060,12 @@ static int clean_io_failure(u64 start, struct page 
*page)
u64 private;
u64 private_failure;
struct io_failure_record *failrec;
-   struct btrfs_fs_info *fs_info;
+   struct inode *inode = page-mapping-host;
+   struct btrfs_fs_info *fs_info = BTRFS_I(inode)-root-fs_info;
struct extent_state *state;
int num_copies;
int did_repair = 0;
int ret;
-   struct inode *inode = page-mapping-host;
 
private = 0;
ret = count_range_bits(BTRFS_I(inode)-io_failure_tree, private,
@@ -2084,6 +2088,8 @@ static int clean_io_failure(u64 start, struct page *page)
did_repair = 1;
goto out;
}
+   if (fs_info-sb-s_flags  MS_RDONLY)
+   goto out;
 
spin_lock(BTRFS_I(inode)-io_tree.lock);
state = find_first_extent_bit_state(BTRFS_I(inode)-io_tree,
@@ -2093,7 +2099,6 @@ static int clean_io_failure(u64 start, struct page *page)
 
if (state  state-start = failrec-start 
state-end = failrec-start + failrec-len - 1) {
-   fs_info = BTRFS_I(inode)-root-fs_info;
num_copies = btrfs_num_copies(fs_info, failrec-logical,
  failrec-len);
if (num_copies  1)  {
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts

2013-10-11 Thread Ilya Dryomov
On Fri, Oct 11, 2013 at 4:13 AM, Wang Shilong
wangsl.f...@cn.fujitsu.com wrote:
 On 10/11/2013 01:40 AM, Ilya Dryomov wrote:

 I have a question in my mind.

 Can we reach a state that there is operation in progress when filesystem
 has been readonly?If we do cancel operations on a ro filesystem, we should
 get No operations in progress .

So if I understand you correctly you are saying that we should return
the Not in progress error code if the filesystem is mounted ro and
there is nothing to cancel.  I actually did think about it, but decided
against it because one can extend this argument to starting commands
with something like If I order a start on a read-only fs and that
operation has already been started, I should get EINPROGRESS., and
that's not what we are currently doing.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts

2013-10-11 Thread Ilya Dryomov
On Fri, Oct 11, 2013 at 12:28 PM, Remco Hosman - Yerf-IT
re...@yerf-it.nl wrote:
 i would expect a RO mount never to write anything to a filesystem. not even
 replay a journal (or a seperate option for that).
 Its possible that the device is not writable at all, if its a snapshot or a
 RO iscsi device of some kind.

I agree, and that's exactly the point of this patch.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: nuke a bogus rw_devices decrement in __btrfs_close_devices

2013-10-10 Thread Ilya Dryomov
On mount failures, __btrfs_close_devices can be called well before
dev-replace state is read and -is_tgtdev_for_dev_replace is set.  This
leads to a bogus decrement of -rw_devices and sets off a WARN_ON in
__btrfs_close_devices if replace target device happens to be on the
lists and we fail early in the mount sequence.  Fix this by checking
the devid instead of -is_tgtdev_for_dev_replace before the decrement:
for replace targets devid is always equal to BTRFS_DEV_REPLACE_DEVID.

Cc: Stefan Behrens sbehr...@giantdisaster.de
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 043b215..a306db9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -666,7 +666,8 @@ static int __btrfs_close_devices(struct btrfs_fs_devices 
*fs_devices)
if (device-bdev)
fs_devices-open_devices--;
 
-   if (device-writeable  !device-is_tgtdev_for_dev_replace) {
+   if (device-writeable 
+   device-devid != BTRFS_DEV_REPLACE_DEVID) {
list_del_init(device-dev_alloc_list);
fs_devices-rw_devices--;
}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: don't leak ioctl args in btrfs_ioctl_dev_replace

2013-10-10 Thread Ilya Dryomov
struct btrfs_ioctl_dev_replace_args memory is leaked if replace is
requested on a read-only filesystem.  Fix it.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ioctl.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..17bb98e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3679,9 +3679,10 @@ static long btrfs_ioctl_dev_replace(struct btrfs_root 
*root, void __user *arg)
 
switch (p-cmd) {
case BTRFS_IOCTL_DEV_REPLACE_CMD_START:
-   if (root-fs_info-sb-s_flags  MS_RDONLY)
-   return -EROFS;
-
+   if (root-fs_info-sb-s_flags  MS_RDONLY) {
+   ret = -EROFS;
+   goto out;
+   }
if (atomic_xchg(
root-fs_info-mutually_exclusive_operation_running,
1)) {
@@ -3707,7 +3708,7 @@ static long btrfs_ioctl_dev_replace(struct btrfs_root 
*root, void __user *arg)
 
if (copy_to_user(arg, p, sizeof(*p)))
ret = -EFAULT;
-
+out:
kfree(p);
return ret;
 }
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts

2013-10-10 Thread Ilya Dryomov
For both balance and replace, cancelling involves changing the on-disk
state and committing a transaction, which is not a good thing to do on
read-only filesystems.

Cc: Stefan Behrens sbehr...@giantdisaster.de
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/dev-replace.c |3 +++
 fs/btrfs/volumes.c |3 +++
 2 files changed, 6 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 9efb94e..98df261 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -650,6 +650,9 @@ static u64 __btrfs_dev_replace_cancel(struct btrfs_fs_info 
*fs_info)
u64 result;
int ret;
 
+   if (fs_info-sb-s_flags  MS_RDONLY)
+   return -EROFS;
+
mutex_lock(dev_replace-lock_finishing_cancel_unmount);
btrfs_dev_replace_lock(dev_replace);
switch (dev_replace-replace_state) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a306db9..2630f38 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3424,6 +3424,9 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
 
 int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 {
+   if (fs_info-sb-s_flags  MS_RDONLY)
+   return -EROFS;
+
mutex_lock(fs_info-balance_mutex);
if (!fs_info-balance_ctl) {
mutex_unlock(fs_info-balance_mutex);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix the dev-replace suspend sequence

2013-10-07 Thread Ilya Dryomov
Replace progresses strictly from lower to higher offsets, and the
progress is tracked in chunks, by storing the physical offset of the
dev_extent which is being copied in the cursor_left field of
btrfs_dev_replace_item.  When we are done copying the chunk,
left_cursor is updated to point one byte past the dev_extent, so that
on resume we can skip the dev_extents that have already been copied.

There is a major bug (which goes all the way back to the inception of
dev-replace in 3.8) in the way left_cursor is bumped: the bump is done
unconditionally, without any regard to the scrub_chunk return value.
On suspend (and also on any kind of error) scrub_chunk returns early,
i.e. without completing the copy.  This leads to us skipping the chunk
that hasn't been fully copied yet when resuming.

Fix this by doing the cursor_left update only if scrub_chunk ret is 0.
(On suspend scrub_chunk returns with -ECANCELED, so this fix covers
both suspend and error cases.)

Cc: Stefan Behrens sbehr...@giantdisaster.de
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/scrub.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a18e0e2..f21e2df 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2717,8 +2717,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
mutex_unlock(fs_info-scrub_lock);
wake_up(fs_info-scrub_pause_wait);
 
-   dev_replace-cursor_left = dev_replace-cursor_right;
-   dev_replace-item_needs_writeback = 1;
btrfs_put_block_group(cache);
if (ret)
break;
@@ -2732,6 +2730,9 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
break;
}
 
+   dev_replace-cursor_left = dev_replace-cursor_right;
+   dev_replace-item_needs_writeback = 1;
+
key.offset = found_key.offset + length;
btrfs_release_path(path);
}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: eliminate races in worker stopping code

2013-10-02 Thread Ilya Dryomov
The current implementation of worker threads in Btrfs has races in
worker stopping code, which cause all kinds of panics and lockups when
running btrfs/011 xfstest in a loop.  The problem is that
btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
check_busy_worker and __btrfs_start_workers.

E.g., check_idle_worker race flow:

   btrfs_stop_workers():check_idle_worker(aworker):
- grabs the lock
- splices the idle list into the
  working list
- removes the first worker from the
  working list
- releases the lock to wait for
  its kthread's completion
  - grabs the lock
  - if aworker is on the working list,
moves aworker from the working list
to the idle list
  - releases the lock
- grabs the lock
- puts the worker
- removes the second worker from the
  working list
  ..
btrfs_stop_workers returns, aworker is on the idle list
 FS is umounted, memory is freed
  ..
  aworker is waken up, fireworks ensue

With this applied, I wasn't able to trigger the problem in 48 hours,
whereas previously I could reliably reproduce at least one of these
races within an hour.

Reported-by: David Sterba dste...@suse.cz
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/async-thread.c |   25 +++--
 fs/btrfs/async-thread.h |2 ++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 58b7d14..08cc08f 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -107,7 +107,8 @@ static void check_idle_worker(struct btrfs_worker_thread 
*worker)
worker-idle = 1;
 
/* the list may be empty if the worker is just starting */
-   if (!list_empty(worker-worker_list)) {
+   if (!list_empty(worker-worker_list) 
+   !worker-workers-stopping) {
list_move(worker-worker_list,
 worker-workers-idle_list);
}
@@ -127,7 +128,8 @@ static void check_busy_worker(struct btrfs_worker_thread 
*worker)
spin_lock_irqsave(worker-workers-lock, flags);
worker-idle = 0;
 
-   if (!list_empty(worker-worker_list)) {
+   if (!list_empty(worker-worker_list) 
+   !worker-workers-stopping) {
list_move_tail(worker-worker_list,
  worker-workers-worker_list);
}
@@ -412,6 +414,7 @@ void btrfs_stop_workers(struct btrfs_workers *workers)
int can_stop;
 
spin_lock_irq(workers-lock);
+   workers-stopping = 1;
list_splice_init(workers-idle_list, workers-worker_list);
while (!list_empty(workers-worker_list)) {
cur = workers-worker_list.next;
@@ -455,6 +458,7 @@ void btrfs_init_workers(struct btrfs_workers *workers, char 
*name, int max,
workers-ordered = 0;
workers-atomic_start_pending = 0;
workers-atomic_worker_start = async_helper;
+   workers-stopping = 0;
 }
 
 /*
@@ -480,15 +484,19 @@ static int __btrfs_start_workers(struct btrfs_workers 
*workers)
atomic_set(worker-num_pending, 0);
atomic_set(worker-refs, 1);
worker-workers = workers;
-   worker-task = kthread_run(worker_loop, worker,
-  btrfs-%s-%d, workers-name,
-  workers-num_workers + 1);
+   worker-task = kthread_create(worker_loop, worker,
+ btrfs-%s-%d, workers-name,
+ workers-num_workers + 1);
if (IS_ERR(worker-task)) {
ret = PTR_ERR(worker-task);
-   kfree(worker);
goto fail;
}
+
spin_lock_irq(workers-lock);
+   if (workers-stopping) {
+   spin_unlock_irq(workers-lock);
+   goto fail_kthread;
+   }
list_add_tail(worker-worker_list, workers-idle_list);
worker-idle = 1;
workers-num_workers++;
@@ -496,8 +504,13 @@ static int __btrfs_start_workers(struct btrfs_workers 
*workers)
WARN_ON(workers-num_workers_starting  0);
spin_unlock_irq(workers-lock);
 
+   wake_up_process(worker-task);
return 0;
+
+fail_kthread:
+   kthread_stop(worker-task);
 fail:
+   kfree(worker);
spin_lock_irq(workers-lock);
workers-num_workers_starting--;
spin_unlock_irq(workers-lock);
diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
index 063698b..1f26792 100644
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -107,6 +107,8 @@ struct btrfs_workers {
 
/* extra name for this worker, used

[PATCH] Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing

2013-10-02 Thread Ilya Dryomov
free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
can be processed before btrfs_scratch_superblock is called, which would
result in a use-after-free on btrfs_device contents.  Fix this by
zeroing the superblock before the rcu callback is registered.

Cc: Stefan Behrens sbehr...@giantdisaster.de
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/dev-replace.c |5 +
 fs/btrfs/volumes.c |7 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index a644353..5aa0718 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -535,10 +535,7 @@ static int btrfs_dev_replace_finishing(struct 
btrfs_fs_info *fs_info,
list_add(tgt_device-dev_alloc_list, fs_info-fs_devices-alloc_list);
 
btrfs_rm_dev_replace_srcdev(fs_info, src_device);
-   if (src_device-bdev) {
-   /* zero out the old super */
-   btrfs_scratch_superblock(src_device);
-   }
+
/*
 * this is again a consistent state where no dev_replace procedure
 * is running, the target device is part of the filesystem, the
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0052ca8..a9ac68a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1715,6 +1715,7 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info 
*fs_info,
 struct btrfs_device *srcdev)
 {
WARN_ON(!mutex_is_locked(fs_info-fs_devices-device_list_mutex));
+
list_del_rcu(srcdev-dev_list);
list_del_rcu(srcdev-dev_alloc_list);
fs_info-fs_devices-num_devices--;
@@ -1724,9 +1725,13 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info 
*fs_info,
}
if (srcdev-can_discard)
fs_info-fs_devices-num_can_discard--;
-   if (srcdev-bdev)
+   if (srcdev-bdev) {
fs_info-fs_devices-open_devices--;
 
+   /* zero out the old super */
+   btrfs_scratch_superblock(srcdev);
+   }
+
call_rcu(srcdev-rcu, free_device);
 }
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfstests: unify apostrophes in output files

2013-09-19 Thread Ilya Dryomov
On Thu, Sep 19, 2013 at 7:20 PM, Eric Sandeen sand...@redhat.com wrote:
 xfstests: unify apostrophes in output files

 From: Tomas Racek tra...@redhat.com

 With coreutils v8.16 the style of apostrophes changed from `word' to
 'word'. This is breaking some tests which use the older form.

 This commit introduces function changes the golden output of the
 affected tests and introduces a filter for the older style output.

 [dchinner: modified to use a global filter in check rather than
 per-test filters]

 Signed-off-by: Tomas Racek tra...@redhat.com
 Signed-off-by: Dave Chinner dchin...@redhat.com
 Reviewed-by: Eric Sandeen sand...@redhat.com
 ---

 (Resending as proper top-level, modified, reviewed patch --Eric)

  check |  4 
  common/config |  1 +
  tests/generic/193.out | 16 
  tests/generic/230.out |  8 
  tests/generic/235.out |  2 +-
  tests/generic/245.out |  2 +-
  tests/generic/294.out |  8 
  tests/generic/306.out |  2 +-
  tests/xfs/103.out |  2 +-
  tests/xfs/200.out |  2 +-
  10 files changed, 26 insertions(+), 21 deletions(-)

 diff --git a/check b/check
 index 4085eae..ba7fd21 100755
 --- a/check
 +++ b/check
 @@ -478,6 +478,10 @@ do
 echo  - no qualified output
 err=true
 else
 +
 +   # coreutils 2.16+ changed quote formats in error messages from

Should that be 8.16+?

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 18/20] Btrfs-progs: fix magic return value in cmds-balance.c

2013-09-10 Thread Ilya Dryomov
On Tue, Sep 10, 2013 at 5:28 AM, Wang Shilong
wangsl.f...@cn.fujitsu.com wrote:
 If there is no balance in progress, resume/pause/cancel
 will return 2. Usage or syntax errors will return 1.
 And 0 means operations return successfully.

 Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
 ---
 v1-v2:
 Address comments from Ilya Dryomov.

Reviewed-by: Ilya Dryomov idryo...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/20] Btrfs-progs: fix magic return value in cmds-balance.c

2013-09-04 Thread Ilya Dryomov
Hi Wang,

Thank you for doing the grunt work, it's been a long standing todo.
See the comments below.

On Wed, Sep 4, 2013 at 6:22 PM, Wang Shilong wangshilong1...@gmail.com wrote:
 From: Wang Shilong wangsl.f...@cn.fujitsu.com

 If there is no balance in progress.resume/pause/cancel will
 return 2. For usage or syntal errors will return 1. 0 means
 operations return successfully.

This needs to be reworded (spelling, punctuation).


 Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
 ---
  cmds-balance.c | 93 
 --
  1 file changed, 58 insertions(+), 35 deletions(-)

 diff --git a/cmds-balance.c b/cmds-balance.c
 index b7382ef..fd68051 100644
 --- a/cmds-balance.c
 +++ b/cmds-balance.c
 @@ -58,7 +58,7 @@ static int parse_one_profile(const char *profile, u64 
 *flags)
 *flags |= BTRFS_AVAIL_ALLOC_BIT_SINGLE;
 } else {
 fprintf(stderr, Unknown profile '%s'\n, profile);
 -   return 1;
 +   return -ENOENT;
 }

 return 0;
 @@ -68,12 +68,14 @@ static int parse_profiles(char *profiles, u64 *flags)
  {
 char *this_char;
 char *save_ptr = NULL; /* Satisfy static checkers */
 +   int ret;

 for (this_char = strtok_r(profiles, |, save_ptr);
  this_char != NULL;
  this_char = strtok_r(NULL, |, save_ptr)) {
 -   if (parse_one_profile(this_char, flags))
 -   return 1;
 +   ret = parse_one_profile(this_char, flags);
 +   if (ret)
 +   return ret;
 }

 return 0;
 @@ -86,7 +88,7 @@ static int parse_u64(const char *str, u64 *result)

 val = strtoull(str, endptr, 10);
 if (*endptr)
 -   return 1;
 +   return -EINVAL;

 *result = val;
 return 0;
 @@ -95,6 +97,7 @@ static int parse_u64(const char *str, u64 *result)
  static int parse_range(const char *range, u64 *start, u64 *end)
  {
 char *dots;
 +   int ret;

 dots = strstr(range, ..);
 if (dots) {
 @@ -107,29 +110,31 @@ static int parse_range(const char *range, u64 *start, 
 u64 *end)
 *end = (u64)-1;
 skipped++;
 } else {
 -   if (parse_u64(rest, end))
 -   return 1;
 +   ret = parse_u64(rest, end);
 +   if (ret)
 +   return ret;
 }
 if (dots == range) {
 *start = 0;
 skipped++;
 } else {
 +   ret = parse_u64(rest, end);
 if (parse_u64(range, start))
 -   return 1;
 +   return ret;
 }

 if (*start = *end) {
 fprintf(stderr, Range %llu..%llu doesn't make 
 sense\n, (unsigned long long)*start,
 (unsigned long long)*end);
 -   return 1;
 +   return -EINVAL;
 }

 if (skipped = 1)
 return 0;
 }

 -   return 1;
 +   return -EINVAL;
  }

  static int parse_filters(char *filters, struct btrfs_balance_args *args)
 @@ -137,6 +142,7 @@ static int parse_filters(char *filters, struct 
 btrfs_balance_args *args)
 char *this_char;
 char *value;
 char *save_ptr = NULL; /* Satisfy static checkers */
 +   int ret = 0;

 if (!filters)
 return 0;
 @@ -150,70 +156,72 @@ static int parse_filters(char *filters, struct 
 btrfs_balance_args *args)
 if (!value || !*value) {
 fprintf(stderr, the profiles filter requires 
 
an argument\n);
 -   return 1;
 +   return -EINVAL;
 }
 if (parse_profiles(value, args-profiles)) {
 fprintf(stderr, Invalid profiles 
 argument\n);
 -   return 1;
 +   return -EINVAL;
 }
 args-flags |= BTRFS_BALANCE_ARGS_PROFILES;
 } else if (!strcmp(this_char, usage)) {
 if (!value || !*value) {
 fprintf(stderr, the usage filter requires 
an argument\n);
 -   return 1;
 +   return -EINVAL;
 }
 if (parse_u64(value, args-usage) ||
 args-usage  100) {
 fprintf(stderr, 

[PATCH] Btrfs: do not add replace target to the alloc_list

2013-09-01 Thread Ilya Dryomov
If replace was suspended by the umount, replace target device is added
to the fs_devices-alloc_list during a later mount.  This is obviously
wrong.  -is_tgtdev_for_dev_replace is supposed to guard against that,
but -is_tgtdev_for_dev_replace is (and can only ever be) initialized
*after* everything is opened and fs_devices lists are populated.  Fix
this by checking the devid instead: for replace targets it's always
equal to BTRFS_DEV_REPLACE_DEVID.

Cc: Stefan Behrens sbehr...@giantdisaster.de
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
At first I thought this was caused by my btrfs_device rollback patch,
but no, it just made it easier to spot -- previously one had to reboot
or rmmod/insmod before mounting a suspended replace.

 fs/btrfs/volumes.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c9a0977..5b99f19 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -793,7 +793,8 @@ static int __btrfs_open_devices(struct btrfs_fs_devices 
*fs_devices,
fs_devices-rotating = 1;
 
fs_devices-open_devices++;
-   if (device-writeable  !device-is_tgtdev_for_dev_replace) {
+   if (device-writeable 
+   device-devid != BTRFS_DEV_REPLACE_DEVID) {
fs_devices-rw_devices++;
list_add(device-dev_alloc_list,
 fs_devices-alloc_list);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: stop refusing the relocation of chunk 0

2013-08-27 Thread Ilya Dryomov
AFAICT chunk 0 is no longer special, and so it should be restriped just
like every other chunk.  One reason for this change is us refusing the
relocation can lead to filesystems that can only be mounted ro, and
never rw -- see the bugzilla [1] for details.  The other reason is that
device removal code is already doing this: it will happily relocate
chunk 0 is part of shrinking the device.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=60594

Reported-by: Xavier Bassery xav...@bartica.org
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c9a0977..e8325de 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2991,10 +2991,6 @@ again:
if (found_key.objectid != key.objectid)
break;
 
-   /* chunk zero is special */
-   if (found_key.offset == 0)
-   break;
-
chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
 
if (!counting) {
@@ -3030,6 +3026,8 @@ again:
spin_unlock(fs_info-balance_lock);
}
 loop:
+   if (found_key.offset == 0)
+   break;
key.offset = found_key.offset - 1;
}
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] Btrfs: add btrfs_alloc_device and switch to it

2013-08-23 Thread Ilya Dryomov
On Fri, Aug 23, 2013 at 10:35 AM, Stefan Behrens
sbehr...@giantdisaster.de wrote:
 This WARN_ON(1) is triggered with the device replace procedure because
 BTRFS_DEV_REPLACE_DEVID is zero.

Ah, a dreaded 0 in C.  What a screw up.  This came from my rebuild
branch where btrfs_init_dev_replace_tgtdev is a bit different, and I
just auto-merged it.  I'll fix it up.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/4] Btrfs: add btrfs_alloc_device and switch to it

2013-08-23 Thread Ilya Dryomov
Currently btrfs_device is allocated ad-hoc in a few different places,
and as a result not all fields are initialized properly.  In particular,
readahead state is only initialized in device_list_add (at scan time),
and not in btrfs_init_new_device (when the new device is added with
'btrfs dev add').  Fix this by adding an allocation helper and switch
everybody but __btrfs_close_devices to it.  (__btrfs_close_devices is
dealt with in a later commit.)

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
v2:
- fixed a dev-replace regression spotted by Stefan Behrens

 fs/btrfs/volumes.c |  151 
 fs/btrfs/volumes.h |3 ++
 2 files changed, 97 insertions(+), 57 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ae1bcb0..100c21c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -101,6 +101,27 @@ void btrfs_cleanup_fs_uuids(void)
}
 }
 
+static struct btrfs_device *__alloc_device(void)
+{
+   struct btrfs_device *dev;
+
+   dev = kzalloc(sizeof(*dev), GFP_NOFS);
+   if (!dev)
+   return ERR_PTR(-ENOMEM);
+
+   INIT_LIST_HEAD(dev-dev_list);
+   INIT_LIST_HEAD(dev-dev_alloc_list);
+
+   spin_lock_init(dev-io_lock);
+
+   spin_lock_init(dev-reada_lock);
+   atomic_set(dev-reada_in_flight, 0);
+   INIT_RADIX_TREE(dev-reada_zones, GFP_NOFS  ~__GFP_WAIT);
+   INIT_RADIX_TREE(dev-reada_extents, GFP_NOFS  ~__GFP_WAIT);
+
+   return dev;
+}
+
 static noinline struct btrfs_device *__find_device(struct list_head *head,
   u64 devid, u8 *uuid)
 {
@@ -414,17 +435,12 @@ static noinline int device_list_add(const char *path,
if (fs_devices-opened)
return -EBUSY;
 
-   device = kzalloc(sizeof(*device), GFP_NOFS);
-   if (!device) {
+   device = btrfs_alloc_device(NULL, devid,
+   disk_super-dev_item.uuid);
+   if (IS_ERR(device)) {
/* we can safely leave the fs_devices entry around */
-   return -ENOMEM;
+   return PTR_ERR(device);
}
-   device-devid = devid;
-   device-dev_stats_valid = 0;
-   device-work.func = pending_bios_fn;
-   memcpy(device-uuid, disk_super-dev_item.uuid,
-  BTRFS_UUID_SIZE);
-   spin_lock_init(device-io_lock);
 
name = rcu_string_strdup(path, GFP_NOFS);
if (!name) {
@@ -432,15 +448,6 @@ static noinline int device_list_add(const char *path,
return -ENOMEM;
}
rcu_assign_pointer(device-name, name);
-   INIT_LIST_HEAD(device-dev_alloc_list);
-
-   /* init readahead state */
-   spin_lock_init(device-reada_lock);
-   device-reada_curr_zone = NULL;
-   atomic_set(device-reada_in_flight, 0);
-   device-reada_next = 0;
-   INIT_RADIX_TREE(device-reada_zones, GFP_NOFS  ~__GFP_WAIT);
-   INIT_RADIX_TREE(device-reada_extents, GFP_NOFS  ~__GFP_WAIT);
 
mutex_lock(fs_devices-device_list_mutex);
list_add_rcu(device-dev_list, fs_devices-devices);
@@ -491,8 +498,9 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
list_for_each_entry(orig_dev, orig-devices, dev_list) {
struct rcu_string *name;
 
-   device = kzalloc(sizeof(*device), GFP_NOFS);
-   if (!device)
+   device = btrfs_alloc_device(NULL, orig_dev-devid,
+   orig_dev-uuid);
+   if (IS_ERR(device))
goto error;
 
/*
@@ -506,13 +514,6 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
}
rcu_assign_pointer(device-name, name);
 
-   device-devid = orig_dev-devid;
-   device-work.func = pending_bios_fn;
-   memcpy(device-uuid, orig_dev-uuid, sizeof(device-uuid));
-   spin_lock_init(device-io_lock);
-   INIT_LIST_HEAD(device-dev_list);
-   INIT_LIST_HEAD(device-dev_alloc_list);
-
list_add(device-dev_list, fs_devices-devices);
device-fs_devices = fs_devices;
fs_devices-num_devices++;
@@ -1956,10 +1957,10 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
}
mutex_unlock(root-fs_info-fs_devices-device_list_mutex);
 
-   device = kzalloc(sizeof(*device), GFP_NOFS);
-   if (!device) {
+   device = btrfs_alloc_device(root-fs_info, NULL, NULL);
+   if (IS_ERR(device)) {
/* we can safely leave the fs_devices entry around */
-   ret = -ENOMEM

[PATCH 4/4] Btrfs: rollback btrfs_device fields on umount

2013-08-12 Thread Ilya Dryomov
It turns out we don't properly rollback in-core btrfs_device state on
umount.  We zero out -bdev, -in_fs_metadata and that's about it.  In
particular, we don't zero out -generation, and this can lead to us
refusing a mount -- a non-NULL fs_devices-latest_bdev is essential, but
btrfs_close_extra_devices will happily assign NULL to -latest_bdev if
the first device on the dev_list happens to be missing and consequently
has no bdev attached.  This happens because since commit a6b0d5c8
btrfs_close_extra_devices adjusts -latest_bdev, and in doing that,
relies on the -generation.  Fix this, and possibly other problems, by
zeroing out everything except for what device_list_add sets, so that a
mount right after insmod and 'btrfs dev scan' is no different from any
later mount in this respect.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |   15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index af17b17..2f6bc12 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -673,22 +673,19 @@ static int __btrfs_close_devices(struct btrfs_fs_devices 
*fs_devices)
if (device-can_discard)
fs_devices-num_can_discard--;
 
-   new_device = kmalloc(sizeof(*new_device), GFP_NOFS);
-   BUG_ON(!new_device); /* -ENOMEM */
-   memcpy(new_device, device, sizeof(*new_device));
+   new_device = btrfs_alloc_device(NULL, device-devid,
+   device-uuid);
+   BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
 
/* Safe because we are under uuid_mutex */
if (device-name) {
name = rcu_string_strdup(device-name-str, GFP_NOFS);
-   BUG_ON(device-name  !name); /* -ENOMEM */
+   BUG_ON(!name); /* -ENOMEM */
rcu_assign_pointer(new_device-name, name);
}
-   new_device-bdev = NULL;
-   new_device-writeable = 0;
-   new_device-in_fs_metadata = 0;
-   new_device-can_discard = 0;
-   spin_lock_init(new_device-io_lock);
+
list_replace_rcu(device-dev_list, new_device-dev_list);
+   new_device-fs_devices = device-fs_devices;
 
call_rcu(device-rcu, free_device);
}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] Btrfs: add btrfs_alloc_device and switch to it

2013-08-12 Thread Ilya Dryomov
Currently btrfs_device is allocated ad-hoc in a few different places,
and as a result not all fields are initialized properly.  In particular,
readahead state is only initialized in device_list_add (at scan time),
and not in btrfs_init_new_device (when the new device is added with
'btrfs dev add').  Fix this by adding an allocation helper and switch
everybody but __btrfs_close_devices to it.  (__btrfs_close_devices is
dealt with in a later commit.)

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |  150 
 fs/btrfs/volumes.h |3 ++
 2 files changed, 96 insertions(+), 57 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ae1bcb0..fe52704 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -101,6 +101,27 @@ void btrfs_cleanup_fs_uuids(void)
}
 }
 
+static struct btrfs_device *__alloc_device(void)
+{
+   struct btrfs_device *dev;
+
+   dev = kzalloc(sizeof(*dev), GFP_NOFS);
+   if (!dev)
+   return ERR_PTR(-ENOMEM);
+
+   INIT_LIST_HEAD(dev-dev_list);
+   INIT_LIST_HEAD(dev-dev_alloc_list);
+
+   spin_lock_init(dev-io_lock);
+
+   spin_lock_init(dev-reada_lock);
+   atomic_set(dev-reada_in_flight, 0);
+   INIT_RADIX_TREE(dev-reada_zones, GFP_NOFS  ~__GFP_WAIT);
+   INIT_RADIX_TREE(dev-reada_extents, GFP_NOFS  ~__GFP_WAIT);
+
+   return dev;
+}
+
 static noinline struct btrfs_device *__find_device(struct list_head *head,
   u64 devid, u8 *uuid)
 {
@@ -414,17 +435,12 @@ static noinline int device_list_add(const char *path,
if (fs_devices-opened)
return -EBUSY;
 
-   device = kzalloc(sizeof(*device), GFP_NOFS);
-   if (!device) {
+   device = btrfs_alloc_device(NULL, devid,
+   disk_super-dev_item.uuid);
+   if (IS_ERR(device)) {
/* we can safely leave the fs_devices entry around */
-   return -ENOMEM;
+   return PTR_ERR(device);
}
-   device-devid = devid;
-   device-dev_stats_valid = 0;
-   device-work.func = pending_bios_fn;
-   memcpy(device-uuid, disk_super-dev_item.uuid,
-  BTRFS_UUID_SIZE);
-   spin_lock_init(device-io_lock);
 
name = rcu_string_strdup(path, GFP_NOFS);
if (!name) {
@@ -432,15 +448,6 @@ static noinline int device_list_add(const char *path,
return -ENOMEM;
}
rcu_assign_pointer(device-name, name);
-   INIT_LIST_HEAD(device-dev_alloc_list);
-
-   /* init readahead state */
-   spin_lock_init(device-reada_lock);
-   device-reada_curr_zone = NULL;
-   atomic_set(device-reada_in_flight, 0);
-   device-reada_next = 0;
-   INIT_RADIX_TREE(device-reada_zones, GFP_NOFS  ~__GFP_WAIT);
-   INIT_RADIX_TREE(device-reada_extents, GFP_NOFS  ~__GFP_WAIT);
 
mutex_lock(fs_devices-device_list_mutex);
list_add_rcu(device-dev_list, fs_devices-devices);
@@ -491,8 +498,9 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
list_for_each_entry(orig_dev, orig-devices, dev_list) {
struct rcu_string *name;
 
-   device = kzalloc(sizeof(*device), GFP_NOFS);
-   if (!device)
+   device = btrfs_alloc_device(NULL, orig_dev-devid,
+   orig_dev-uuid);
+   if (IS_ERR(device))
goto error;
 
/*
@@ -506,13 +514,6 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
}
rcu_assign_pointer(device-name, name);
 
-   device-devid = orig_dev-devid;
-   device-work.func = pending_bios_fn;
-   memcpy(device-uuid, orig_dev-uuid, sizeof(device-uuid));
-   spin_lock_init(device-io_lock);
-   INIT_LIST_HEAD(device-dev_list);
-   INIT_LIST_HEAD(device-dev_alloc_list);
-
list_add(device-dev_list, fs_devices-devices);
device-fs_devices = fs_devices;
fs_devices-num_devices++;
@@ -1956,10 +1957,10 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
}
mutex_unlock(root-fs_info-fs_devices-device_list_mutex);
 
-   device = kzalloc(sizeof(*device), GFP_NOFS);
-   if (!device) {
+   device = btrfs_alloc_device(root-fs_info, NULL, NULL);
+   if (IS_ERR(device)) {
/* we can safely leave the fs_devices entry around */
-   ret = -ENOMEM;
+   ret = PTR_ERR(device);
goto error

[PATCH 3/4] Btrfs: add alloc_fs_devices and switch to it

2013-08-12 Thread Ilya Dryomov
In the spirit of btrfs_alloc_device, add a helper for allocating and
doing some common initialization of btrfs_fs_devices struct.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |   71 +++-
 1 file changed, 53 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fe52704..af17b17 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -62,6 +62,48 @@ static void unlock_chunks(struct btrfs_root *root)
mutex_unlock(root-fs_info-chunk_mutex);
 }
 
+static struct btrfs_fs_devices *__alloc_fs_devices(void)
+{
+   struct btrfs_fs_devices *fs_devs;
+
+   fs_devs = kzalloc(sizeof(*fs_devs), GFP_NOFS);
+   if (!fs_devs)
+   return ERR_PTR(-ENOMEM);
+
+   mutex_init(fs_devs-device_list_mutex);
+
+   INIT_LIST_HEAD(fs_devs-devices);
+   INIT_LIST_HEAD(fs_devs-alloc_list);
+   INIT_LIST_HEAD(fs_devs-list);
+
+   return fs_devs;
+}
+
+/**
+ * alloc_fs_devices - allocate struct btrfs_fs_devices
+ * @fsid:  a pointer to UUID for this FS.  If NULL a new UUID is
+ * generated.
+ *
+ * Return: a pointer to a new struct btrfs_fs_devices on success;
+ * ERR_PTR() on error.  Returned struct is not linked onto any lists and
+ * can be destroyed with kfree() right away.
+ */
+static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
+{
+   struct btrfs_fs_devices *fs_devs;
+
+   fs_devs = __alloc_fs_devices();
+   if (IS_ERR(fs_devs))
+   return fs_devs;
+
+   if (fsid)
+   memcpy(fs_devs-fsid, fsid, BTRFS_FSID_SIZE);
+   else
+   generate_random_uuid(fs_devs-fsid);
+
+   return fs_devs;
+}
+
 static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
 {
struct btrfs_device *device;
@@ -416,16 +458,14 @@ static noinline int device_list_add(const char *path,
 
fs_devices = find_fsid(disk_super-fsid);
if (!fs_devices) {
-   fs_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS);
-   if (!fs_devices)
-   return -ENOMEM;
-   INIT_LIST_HEAD(fs_devices-devices);
-   INIT_LIST_HEAD(fs_devices-alloc_list);
+   fs_devices = alloc_fs_devices(disk_super-fsid);
+   if (IS_ERR(fs_devices))
+   return PTR_ERR(fs_devices);
+
list_add(fs_devices-list, fs_uuids);
-   memcpy(fs_devices-fsid, disk_super-fsid, BTRFS_FSID_SIZE);
fs_devices-latest_devid = devid;
fs_devices-latest_trans = found_transid;
-   mutex_init(fs_devices-device_list_mutex);
+
device = NULL;
} else {
device = __find_device(fs_devices-devices, devid,
@@ -481,18 +521,13 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
struct btrfs_device *device;
struct btrfs_device *orig_dev;
 
-   fs_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS);
-   if (!fs_devices)
-   return ERR_PTR(-ENOMEM);
+   fs_devices = alloc_fs_devices(orig-fsid);
+   if (IS_ERR(fs_devices))
+   return fs_devices;
 
-   INIT_LIST_HEAD(fs_devices-devices);
-   INIT_LIST_HEAD(fs_devices-alloc_list);
-   INIT_LIST_HEAD(fs_devices-list);
-   mutex_init(fs_devices-device_list_mutex);
fs_devices-latest_devid = orig-latest_devid;
fs_devices-latest_trans = orig-latest_trans;
fs_devices-total_devices = orig-total_devices;
-   memcpy(fs_devices-fsid, orig-fsid, sizeof(fs_devices-fsid));
 
/* We have held the volume lock, it is safe to get the devices. */
list_for_each_entry(orig_dev, orig-devices, dev_list) {
@@ -1794,9 +1829,9 @@ static int btrfs_prepare_sprout(struct btrfs_root *root)
if (!fs_devices-seeding)
return -EINVAL;
 
-   seed_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS);
-   if (!seed_devices)
-   return -ENOMEM;
+   seed_devices = __alloc_fs_devices();
+   if (IS_ERR(seed_devices))
+   return PTR_ERR(seed_devices);
 
old_devices = clone_fs_devices(fs_devices);
if (IS_ERR(old_devices)) {
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] btrfs_device fixes

2013-08-12 Thread Ilya Dryomov
Hello,

This patch series does two things:

- adds allocation helpers for struct btrfs_device and struct
  btrfs_fs_devices so that all the list_heads, spinlocks, etc are
  properly initialized and the code for that is in one place;

- fixes a bug in the umount sequence, which, under certain conditions,
  can lead to mount failures.

This is on top of 3.11-rc3, but applies cleanly to btrfs-next/master as
of now.  You can pull from:

git://github.com/idryomov/btrfs-unstable.git dev-fixes
 
Thanks,

Ilya


Ilya Dryomov (4):
  Btrfs: find_next_devid: root - fs_info
  Btrfs: add btrfs_alloc_device and switch to it
  Btrfs: add alloc_fs_devices and switch to it
  Btrfs: rollback btrfs_device fields on umount

 fs/btrfs/volumes.c |  250 +---
 fs/btrfs/volumes.h |3 +
 2 files changed, 162 insertions(+), 91 deletions(-)

-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] Btrfs: find_next_devid: root - fs_info

2013-08-12 Thread Ilya Dryomov
find_next_devid() knows which root to search, so it should take an
fs_info instead of an arbitrary root.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 78b8717..ae1bcb0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1307,15 +1307,14 @@ static u64 find_next_chunk(struct btrfs_fs_info 
*fs_info)
return ret;
 }
 
-static noinline int find_next_devid(struct btrfs_root *root, u64 *objectid)
+static noinline int find_next_devid(struct btrfs_fs_info *fs_info,
+   u64 *devid_ret)
 {
int ret;
struct btrfs_key key;
struct btrfs_key found_key;
struct btrfs_path *path;
 
-   root = root-fs_info-chunk_root;
-
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
@@ -1324,20 +1323,21 @@ static noinline int find_next_devid(struct btrfs_root 
*root, u64 *objectid)
key.type = BTRFS_DEV_ITEM_KEY;
key.offset = (u64)-1;
 
-   ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
+   ret = btrfs_search_slot(NULL, fs_info-chunk_root, key, path, 0, 0);
if (ret  0)
goto error;
 
BUG_ON(ret == 0); /* Corruption */
 
-   ret = btrfs_previous_item(root, path, BTRFS_DEV_ITEMS_OBJECTID,
+   ret = btrfs_previous_item(fs_info-chunk_root, path,
+ BTRFS_DEV_ITEMS_OBJECTID,
  BTRFS_DEV_ITEM_KEY);
if (ret) {
-   *objectid = 1;
+   *devid_ret = 1;
} else {
btrfs_item_key_to_cpu(path-nodes[0], found_key,
  path-slots[0]);
-   *objectid = found_key.offset + 1;
+   *devid_ret = found_key.offset + 1;
}
ret = 0;
 error:
@@ -1971,7 +1971,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
}
rcu_assign_pointer(device-name, name);
 
-   ret = find_next_devid(root, device-devid);
+   ret = find_next_devid(root-fs_info, device-devid);
if (ret) {
rcu_string_free(device-name);
kfree(device);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: num_tolerated_disk_barrier_failures is incorrect for RAID6

2013-07-10 Thread Ilya Dryomov
Currently num_tolerated_disk_barrier_failures gets the value of
fs_devices-num_devices in the RAID6 case.  But, RAID6 can tolerate only
two simultaneous failures, so set it to 2.

CC: Stefan Behrens sbehr...@giantdisaster.de
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/disk-io.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b8b60b6..aecf788 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3258,7 +3258,7 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
BTRFS_BLOCK_GROUP_RAID10)) {

num_tolerated_disk_barrier_failures = 1;
} else if (flags 
-  BTRFS_BLOCK_GROUP_RAID5) {
+  BTRFS_BLOCK_GROUP_RAID6) {

num_tolerated_disk_barrier_failures = 2;
}
}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balance resume + raid5/6

2013-06-22 Thread Ilya Dryomov
On Sun, Jun 16, 2013 at 06:00:10AM +, Tim Castle wrote:
 Greetings! 
 
 I'm testing raid6, and recently added two drives. 
 I haven't been able to properly resume a balance operation: the number of
 total chunks is always too low.  
 
 It seems that the balance starts and pauses properly, but always resumes
 with ~7 chunks. 

Hi Tim,

This is the expected behaviour.  Balance does not actually remember
which chunks it had processed before it was paused, instead it uses a
heuristic which is based on the amount of space used out of each chunk.
As you have an almost full filesystem, all of your chunks, except for
those ~7, are fully (or almost fully) used, and balance resume simply
skips them.  If you really want to balance everything (why?) you would
have to run balance to completion, w/o pausing it.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] How to enable btrfs reserve block space from specific device?

2013-03-27 Thread Ilya Dryomov
On Wed, Mar 27, 2013 at 09:45:59AM +0800, Zhi Yong Wu wrote:
 HI,
 
 When i work on btrfs hot relocation feature, i hit one question
 about block reservation. btrfs hot relocation need to reserve block
 space from specific devices such as SSD, but current btrfs reserving
 code doesn't differentiate if block space is reserved from SSD or HDD.
 In order to make btrfs support this, I thought that we can introduce
 one new block group or flag, but this will maybe make large impact on
 current existing other functions. For this, does any guy have some
 better idea? thanks.

Hi,

Sorry for the late reply.  I have a lot on my plate right now, so I
haven't looked at your WIP code.  However, based on my knowledge, I can
tell you that block reservation problem is completely orthogonal to the
way you choose to handle hot data storage.  A separate block group is
one way to do it, and probably the easiest one.  Block groups - chunks
is a 1:1 mapping, so, for now, it might make sense to have one giant
block group spanning over the entire hot data device.  OTOH, IIRC the
original implementation from IBM just used the rotation flag to detect
hot data devices.  You have a lot of choices here, but, if you are
planning on implementing mirrored/striped hot data devices in future,
I'd definitely go with block groups, since that'll give you some of the
infrastructure for that for free.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: document mount options in Documentation/fs/btrfs.txt

2013-03-23 Thread Ilya Dryomov
On Sat, Mar 23, 2013 at 12:48:54PM -0500, Eric Sandeen wrote:
 Document all current btrfs mount options.
 
 Signed-off-by: Eric Sandeen sand...@redhat.com
 ---
 
 please, Please, PLEASE review this and suggest improvements.
 I'm no btrfs wizard but I've done my best to get this all right
 based on commit logs, code reading, and wiki reading.  In cases where
 any of those 3 disagreed, I've done my best to document reality, but I'm
 sure it could use corrections, clarifications, etc.
 
 In particular, some of these mount options could really use more
 description of why a user might want them, rather than simply stating
 what they do.
 
 notreelog is particularly egregious, as I have nothing but self-referential
 documentation.
 
 So this can use some iteration, I'm sure, but hopefully it's a decent
 start.
 
 Thanks,
 -Eric
 
 diff --git a/Documentation/filesystems/btrfs.txt 
 b/Documentation/filesystems/btrfs.txt
 index 7671352..02a19c8 100644
 --- a/Documentation/filesystems/btrfs.txt
 +++ b/Documentation/filesystems/btrfs.txt
 @@ -1,6 +1,6 @@
  
 - BTRFS
 - =
 +BTRFS
 +=
  
  Btrfs is a new copy on write filesystem for Linux aimed at
  implementing advanced features while focusing on fault tolerance,
 @@ -34,9 +34,173 @@ The main Btrfs features include:
  * Online filesystem defragmentation
  
  
 -
 - MAILING LIST
 - 
 +Mount Options
 +=
 +
 +When mounting a btrfs filesystem, the following option are accepted.
 +Unless otherwise specified, all options default to off.
 +
 +  alloc_start=bytes
 + Debugging option to force all block allocations above a threshold.
 + The value is specified in bytes, optionally with a K, M, or G suffix,
 + case insensitive.  Default is 1MB.
 +
 +  autodefrag
 + Detect small random writes into files and queue them up for the
 + defrag process.  Works best for small files; Not well suited for
 + large database workloads.
 +
 +  check_int
 +  check_int_data
 +  check_int_print_mask=value
 + These debugging options control the behavior of the integrity checking
 + module (the BTRFS_FS_CHECK_INTEGRITY config option required).
 +
 + check_int enables the integrity checker module, which examines all
 + block write requests to ensure on-disk consistency, at a large
 + memory and CPU cost.  
 +
 + check_int_data includes extent data in the integrity checks, and
 + implies the check_int option.
 +
 + check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values
 + as defined in fs/btrfs/check-integrity.c, to control the integrity
 + checker module behavior.
 +
 + See comments at the top of fs/btrfs/check-integrity.c for more info.
 +
 +  compress
 +  compress=type
 +  compress-force
 +  compress-force=type
 + Control BTRFS file data compression.  Type may be specified as zlib
 + lzo or no (for no compression, used for remounting).  If no type
 + is specified, zlib is used.  If compress-force is specified,
 + all files will be compressed, whether or not they compress well.
 + If compression is enabled, nodatacow and nodatasum are disabled.
 +
 +  degraded
 + Allow mounts to continue with missing devices.  A read-write mount may
 + fail with too many devices missing, for example if a stripe member
 + is completely missing.
 +
 +  device=devicepath
 + Specify a device during mount so that ioctls on the control device
 + can be avoided.  Especialy useful when trying to mount a multi-device
 + setup as root.  May be specified multiple times for multiple devices.
 +
 +  discard
 + Issue command to let the block device reclaim space freed by the
 + filesystem.  This is useful for SSD devices, thinly provisioned
 + LUNs and virtual machine images, but may have a performance
 + impact.
 +
 +  enospc_debug
 + Debugging option to be more verbose in some ENOSPC conditions.
 +
 +  fatal_errors=action
 + Action to take when encountering a fatal error: 
 +   bug - BUG() on a fatal error.  This is the default.
 +   panic - panic() on a fatal error.
 +
 +  flushoncommit
 + The 'flushoncommit' mount option forces any data dirtied by a write in a
 + prior transaction to commit as part of the current commit.  This makes
 + the committed state a fully consistent view of the file system from the
 + application's perspective (i.e., it includes all completed file system
 + operations).  This was previously the behavior only when a snapshot is
 + created.
 +
 +  inode_cache
 + Enable free inode number caching.   Defaults to off due to an overflow
 + problem when the free space crcs don't fit inside a single page.
 +
 +  max_inline=bytes
 + Specify the maximum amount of space, in bytes, that can be inlined in
 + a metadata B-tree leaf.  The value is specified in bytes, optionally 
 + with a K, M, or G suffix, case insensitive.  In practice, this value
 + is 

[PATCH] Btrfs: fix a mismerge in btrfs_balance()

2013-03-06 Thread Ilya Dryomov
Raid56 merge (merge commit e942f88) had mistakenly removed a call to
__cancel_balance(), which resulted in balance not cleaning up after itself
after a successful finish.  (Cleanup includes switching the state, removing
the balance item and releasing mut_ex_op testnset lock.)  Bring it back.

Reported-by: David Sterba dste...@suse.cz
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 35bb2d4..33440da 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3230,6 +3230,11 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
update_ioctl_balance_args(fs_info, 0, bargs);
}
 
+   if ((ret  ret != -ECANCELED  ret != -ENOSPC) ||
+   balance_need_close(fs_info)) {
+   __cancel_balance(fs_info);
+   }
+
wake_up(fs_info-balance_wait_q);
 
return ret;
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: clean snapshots one by one

2013-03-06 Thread Ilya Dryomov
On Fri, Mar 01, 2013 at 05:30:57PM +0100, David Sterba wrote:
 The mail did not make it to the list on Monday, anyway, now I have a
 few testing results to share.
 
 The umount responsiveness is much improved, it took a few seconds when
 there were many roots to clean scheduled.
 
 With
 
 $ echo 'func btrfs_drop_snapshot +pf'  
 /sys/debug/kernel/dynamic_debug/control
 $ echo 'func btrfs_clean_one_deleted_snapshot +pf'  
 /sys/debug/kernel/dynamic_debug/control
 
 one can watch how it proceeds.
 
 After umount and and mount, the cleaning process starts again, after 30
 seconds when transaction commit triggers.
 
 Next test was to let it run with balance (just 'fi balance', no other
 options). Worked well, but I've hit an oops from balance cancel command that
 was triggered sometime during the balance. According to the log it happened
 right after the balance finished. CCing Ilya, I'm not sure if this is somehow
 induced by the patch.
 
 3407 BUG_ON(fs_info-balance_ctl || 
 atomic_read(fs_info-balance_running));

Were you by any chance running this on top of a for-linus that included
a raid56 merge commit?  If so, this is result of a raid56 mismerge --
fs_info-balance_ctl is supposed to be cleared in __cancel_balance(), a
call to which was accidentally nuked.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: return EPERM in btrfs_rm_device()

2013-03-02 Thread Ilya Dryomov
On Sat, Mar 02, 2013 at 12:13:59AM -0700, Jerry Snitselaar wrote:
 Currently there are error paths in btrfs_rm_device() where EINVAL is
 returned telling the user they passed an invalid argument even though
 they passed a valid device. Change to return EPERM instead as the
 operation is not permitted.
 
 Signed-off-by: Jerry Snitselaar jerry.snitsel...@oracle.com
 ---
  fs/btrfs/volumes.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 5cbb7f4..3e1586c 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -1392,14 +1392,14 @@ int btrfs_rm_device(struct btrfs_root *root, char 
 *device_path)
   if ((all_avail  BTRFS_BLOCK_GROUP_RAID10)  num_devices = 4) {
   printk(KERN_ERR btrfs: unable to go below four devices 
  on raid10\n);
 - ret = -EINVAL;
 + ret = -EPERM;
   goto out;
   }
  
   if ((all_avail  BTRFS_BLOCK_GROUP_RAID1)  num_devices = 2) {
   printk(KERN_ERR btrfs: unable to go below two 
  devices on raid1\n);
 - ret = -EINVAL;
 + ret = -EPERM;
   goto out;
   }
  
 @@ -1449,14 +1449,14 @@ int btrfs_rm_device(struct btrfs_root *root, char 
 *device_path)
  
   if (device-is_tgtdev_for_dev_replace) {
   pr_err(btrfs: unable to remove the dev_replace target dev\n);
 - ret = -EINVAL;
 + ret = -EPERM;
   goto error_brelse;
   }
  
   if (device-writeable  root-fs_info-fs_devices-rw_devices == 1) {
   printk(KERN_ERR btrfs: unable to remove the only writeable 
  device\n);
 - ret = -EINVAL;
 + ret = -EPERM;

I don't think returning EPERM in these cases is any better than EINVAL.
FWIW, many other btrfs ioctls, especially balance, suffer from this as
well.  What we really need is some kind of error message delivery
system, but that's not going to happen any time soon...

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 17/17 V2] btrfs-progs: replace strtok_r with strsep

2013-02-26 Thread Ilya Dryomov
On Tue, Feb 26, 2013 at 02:20:30PM -0600, Eric Sandeen wrote:
 The coverity runs had a false positive complaining that save_ptr
 is uninitialized in the call to strtok_r.
 
 We could initialize it, but Zach points out that just using
 strsep is a lot simpler if there's only one delimiter,
 so just switch to that.
 
 Signed-off-by: Eric Sandeen sand...@redhat.com
 ---
 
 V2: Remove accidentally-added debug printfs, thanks Geoffredo!
 
 diff --git a/cmds-balance.c b/cmds-balance.c
 index b671e1d..cfbb8eb 100644
 --- a/cmds-balance.c
 +++ b/cmds-balance.c
 @@ -67,11 +67,8 @@ static int parse_one_profile(const char *profile, u64 
 *flags)
  static int parse_profiles(char *profiles, u64 *flags)
  {
   char *this_char;
 - char *save_ptr;
  
 - for (this_char = strtok_r(profiles, |, save_ptr);
 -  this_char != NULL;
 -  this_char = strtok_r(NULL, |, save_ptr)) {
 + while ((this_char = strsep(profiles, |))) {
   if (parse_one_profile(this_char, flags))
   return 1;
   }
 @@ -136,14 +133,11 @@ static int parse_filters(char *filters, struct 
 btrfs_balance_args *args)
  {
   char *this_char;
   char *value;
 - char *save_ptr;
  
   if (!filters)
   return 0;
  
 - for (this_char = strtok_r(filters, ,, save_ptr);
 -  this_char != NULL;
 -  this_char = strtok_r(NULL, ,, save_ptr)) {
 + while ((this_char = strsep(filters , ,))) {

  ^^^ whitespace


One of the differences between strtok() and strsep() is that the former
allows multiple delimiters between two tokens.  With strsep(), this

btrfs balance -dfoo1=bar1,,,foo2=bar2 mnt

fails with error, whereas with strtok() it passes.  I don't have a
strong opinion here (this has been loosely modeled on the way mount(8)
handles -o options), but might it be better to just initialize save_ptr?
(And yes, I know that strsep() is better ;))

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 17/17 V2] btrfs-progs: replace strtok_r with strsep

2013-02-26 Thread Ilya Dryomov
On Tue, Feb 26, 2013 at 02:46:30PM -0600, Eric Sandeen wrote:
 On 2/26/13 2:40 PM, Ilya Dryomov wrote:
  On Tue, Feb 26, 2013 at 02:20:30PM -0600, Eric Sandeen wrote:
  The coverity runs had a false positive complaining that save_ptr
  is uninitialized in the call to strtok_r.
 
  We could initialize it, but Zach points out that just using
  strsep is a lot simpler if there's only one delimiter,
  so just switch to that.
 
  Signed-off-by: Eric Sandeen sand...@redhat.com
  ---
 
  V2: Remove accidentally-added debug printfs, thanks Geoffredo!
 
  diff --git a/cmds-balance.c b/cmds-balance.c
  index b671e1d..cfbb8eb 100644
  --- a/cmds-balance.c
  +++ b/cmds-balance.c
  @@ -67,11 +67,8 @@ static int parse_one_profile(const char *profile, u64 
  *flags)
   static int parse_profiles(char *profiles, u64 *flags)
   {
 char *this_char;
  -  char *save_ptr;
   
  -  for (this_char = strtok_r(profiles, |, save_ptr);
  -   this_char != NULL;
  -   this_char = strtok_r(NULL, |, save_ptr)) {
  +  while ((this_char = strsep(profiles, |))) {
 if (parse_one_profile(this_char, flags))
 return 1;
 }
  @@ -136,14 +133,11 @@ static int parse_filters(char *filters, struct 
  btrfs_balance_args *args)
   {
 char *this_char;
 char *value;
  -  char *save_ptr;
   
 if (!filters)
 return 0;
   
  -  for (this_char = strtok_r(filters, ,, save_ptr);
  -   this_char != NULL;
  -   this_char = strtok_r(NULL, ,, save_ptr)) {
  +  while ((this_char = strsep(filters , ,))) {
  
^^^ whitespace
  
  
  One of the differences between strtok() and strsep() is that the former
  allows multiple delimiters between two tokens.  With strsep(), this
  
  btrfs balance -dfoo1=bar1,,,foo2=bar2 mnt
  
  fails with error, whereas with strtok() it passes.  I don't have a
  strong opinion here (this has been loosely modeled on the way mount(8)
  handles -o options), but might it be better to just initialize save_ptr?
  (And yes, I know that strsep() is better ;))
 
 I don't really care much either way, TBH.  Initializing it seems a little
 bit magic, but with a comment as to why, it'd be fine.  If you did it this
 way intentionally to allow the above format, and changing it would break
 expectations, then I'll happily just initialize save_ptr.

Yeah, although I doubt anybody would notice, it's probably better to
keep the old behaviour.

 
 (And, I realize that lots of these changes are pedantic and seemingly
 pointless, but we've gotten the static checker errors down from over 100
 to under 30 and dropping; the more noise we remove the more likely we are
 to pay attention to the output and catch actual errors.  At least that's
 my feeling; if people think this is getting to be pointless churn, I'm
 ok with that, too).

Not at all.  Btrfs-progs had rarely seen any cleanups, with you and Zach
patching it up and David's integration work it has gotten a whole new
life ;)  BTW, huge thanks for the kernel-style build output, I am
forever grateful..

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: eliminate a use-after-free in btrfs_balance()

2013-02-12 Thread Ilya Dryomov
Commit 5af3e8cc introduced a use-after-free at volumes.c:3139: bctl is freed
above in __cancel_balance() in all cases except for balance pause.  Fix this
by moving the offending check a couple statements above, the meaning of the
check is preserved.

Reported-by: Chris Mason chris.ma...@fusionio.com
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 15f6efd..5d25aed 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3126,6 +3126,11 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
mutex_lock(fs_info-balance_mutex);
atomic_dec(fs_info-balance_running);
 
+   if (bctl-sys.flags  BTRFS_BALANCE_ARGS_CONVERT) {
+   fs_info-num_tolerated_disk_barrier_failures =
+   btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
+   }
+
if (bargs) {
memset(bargs, 0, sizeof(*bargs));
update_ioctl_balance_args(fs_info, 0, bargs);
@@ -3136,11 +3141,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
__cancel_balance(fs_info);
}
 
-   if (bctl-sys.flags  BTRFS_BALANCE_ARGS_CONVERT) {
-   fs_info-num_tolerated_disk_barrier_failures =
-   btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-   }
-
wake_up(fs_info-balance_wait_q);
 
return ret;
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: allow for selecting only completely empty chunks

2013-02-12 Thread Ilya Dryomov
Enhance balance usage filter by making it possible to balance out only
completely empty chunks.  Today, usage filter properly acts on values
from 1 to 99 inclusive, usage=100 selects all chunks, and usage=0
selects no chunks.  This commit changes the usage=0 case: the new
meaning is to restripe only completely empty chunks and nothing else.

Suggested-by: David Sterba dste...@suse.cz
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 15f6efd..7e41619 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2615,7 +2615,7 @@ static int chunk_usage_filter(struct btrfs_fs_info 
*fs_info, u64 chunk_offset,
chunk_used = btrfs_block_group_used(cache-item);
 
if (bargs-usage == 0)
-   user_thresh = 0;
+   user_thresh = 1;
else if (bargs-usage  100)
user_thresh = cache-key.offset;
else
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: accept zero for balance usage filter

2013-02-12 Thread Ilya Dryomov
On Mon, Feb 11, 2013 at 01:33:52PM +0100, David Sterba wrote:
 The condition can be relaxed to accept also 0 which will delete
 unoccupied chunks and does not need space for the actual data
 relocation. Until there is an automatic empty chunk reclaim, we
 can use this as a last resort option under enospc.
 
 CC: Ilya Dryomov idryo...@gmail.com
 Signed-off-by: David Sterba dste...@suse.cz

Josef, please don't pull this one into btrfs-next, it's been deprecated
by Btrfs: allow for selecting only completely empty chunks from me.
David and I discussed this on IRC, he is OK with it.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs-progs: make 0 a valid usage filter argument

2013-02-12 Thread Ilya Dryomov
This is a progs counterpart to a Btrfs: allow for selecting only
completely empty chunks.  usage=0 now means select only only
completely empty chunks and nothing else.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 cmds-balance.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/cmds-balance.c b/cmds-balance.c
index 38a7426..1f888db 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -159,7 +159,7 @@ static int parse_filters(char *filters, struct 
btrfs_balance_args *args)
return 1;
}
if (parse_u64(value, args-usage) ||
-   args-usage  1 || args-usage  100) {
+   args-usage  100) {
fprintf(stderr, Invalid usage argument: %s\n,
   value);
return 1;
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs-progs: move crc32c optimization init

2013-02-12 Thread Ilya Dryomov
Don't call crc32c_optimization_init() until we know that a command is
actually going to be invoked.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 btrfs.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 687acec..7752bd6 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -261,8 +261,6 @@ int main(int argc, char **argv)
 {
const struct cmd_struct *cmd;
 
-   crc32c_optimization_init();
-
argc--;
argv++;
handle_options(argc, argv);
@@ -278,6 +276,8 @@ int main(int argc, char **argv)
 
handle_help_options_next_level(cmd, argc, argv);
 
+   crc32c_optimization_init();
+
fixup_argv0(argv, cmd-token);
exit(cmd-fn(argc, argv));
 }
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] Btrfs-progs: add the rescue section to btrfs

2013-02-08 Thread Ilya Dryomov
On Fri, Feb 08, 2013 at 01:37:02AM +0100, Ian Kumlien wrote:
 the btrfs command now lists:
 btrfs rescue select-super -s number device
 Select a superblock
 btrfs rescue dump-super device
 Dump a superblock to disk
 btrfs rescue debug-tree [options] device
 Debug the filesystem
 
 btrfs-dump-super.c was imported in to cmds-rescue-super-ops.c
 
 This patch integrates all the functionality...
 
 cmds-rescue.c is used to glue cmds-rescue-debug-tree.c,
 cmds-rescue-restore.c and cmds-rescue-super-ops.c together to
 make the source files more managable.

I think btrfs-debug-tree should go under debug -- btrfs debug dump-tree.
Same goes for btrfs-dump-super -- btrfs debug dump-super.  These
commands won't help an average user to rescue a single byte, they are
there to help developers.  I suppose you can also import btrfs-image
under btrfs debug dump-image.

This leaves the question of whether we want select-super under rescue.
Given that it can easily do more harm than good, it might not be the
best place for it..

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] include btrfsck in btrfs - including name check

2013-01-30 Thread Ilya Dryomov
On Wed, Jan 30, 2013 at 10:11:44PM +0100, Ian Kumlien wrote:
 On Wed, Jan 30, 2013 at 12:33:42PM -0800, Filipe Brandenburger wrote:
  Hi Ian,
  
  On Tue, Jan 29, 2013 at 3:03 PM, Ian Kumlien po...@demius.net wrote:
   This patch includes fsck as a subcommand of btrfs, but if you rename
   the binary to btrfsck (or, preferably, use a symlink) it will act like
   the old btrfs command.
  
  You can rename files in your git (there's git mv for that), only
  thing is when you generate the patch with format-patch (or git show,
  git diff etc.) pass it the -M option to detect moves and act
  appropriately.
 
 git send-email seems to send the full diff, diffing against /dev/null =P
 This is why i skipped that part.
 
  Regarding your patches, I really like the idea of btrfs fsck but I
  think I'd prefer to keep the external commands as wrapper scripts
  instead of adding busybox-style name detection to btrfs... But then,
  that's just my opinion.
 
 Well, now both works.
 
  I guess I would have a btrfsck that would simply contain:
  
  #! /bin/sh
  exec btrfs fsck $@
  
  Downside is that error reporting (e.g. invalid syntax, etc.) would
  show btrfs fsck instead of the command the user actually typed...
 
 Actually it still does, due to how btrfs handles things... It's a simple
 enough fix and it will make rescue cd's or dracut images, or just about
 anything.
 
 I understand your point, but i think this is a simpler solution =)

FWIW I agree with Filipe, this name detection thing looks ugly to me.
The merge itself is a good idea, but I think we should stick with shell
wrappers for everything else.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix a regression in balance usage filter

2013-01-21 Thread Ilya Dryomov
Commit 3fed40cc (Btrfs: cleanup duplicated division functions), which
was merged into 3.8-rc1, has introduced a regression by removing logic
that was guarding us against bad user input.  Bring it back.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 86279c3..b577aee 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2614,7 +2614,14 @@ static int chunk_usage_filter(struct btrfs_fs_info 
*fs_info, u64 chunk_offset,
cache = btrfs_lookup_block_group(fs_info, chunk_offset);
chunk_used = btrfs_block_group_used(cache-item);
 
-   user_thresh = div_factor_fine(cache-key.offset, bargs-usage);
+   if (bargs-usage == 0)
+   user_thresh = 0;
+   else if (bargs-usage  100)
+   user_thresh = cache-key.offset;
+   else
+   user_thresh = div_factor_fine(cache-key.offset,
+ bargs-usage);
+
if (chunk_used  user_thresh)
ret = 0;
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] Btrfs: fix mutually exclusive op is running error code

2013-01-20 Thread Ilya Dryomov
The error code that is returned in response to starting a mutually
exclusive operation when there is one already running got silently
changed from EINVAL to EINPROGRESS by 5ac00add.  Returning EINPROGRESS
to, say, add_dev, when rm_dev is running is misleading.  Furthermore,
the operation itself may want to use EINPROGRESS for other purposes.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ioctl.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ab82636..3aebb28 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1337,7 +1337,7 @@ static noinline int btrfs_ioctl_resize(struct file *file,
1)) {
pr_info(btrfs: dev add/delete/balance/replace/resize operation 
in progress\n);
mnt_drop_write_file(file);
-   return -EINPROGRESS;
+   return -EINVAL;
}
 
mutex_lock(root-fs_info-volume_mutex);
@@ -2193,7 +2193,7 @@ static int btrfs_ioctl_defrag(struct file *file, void 
__user *argp)
if (atomic_xchg(root-fs_info-mutually_exclusive_operation_running,
1)) {
pr_info(btrfs: dev add/delete/balance/replace/resize operation 
in progress\n);
-   return -EINPROGRESS;
+   return -EINVAL;
}
ret = mnt_want_write_file(file);
if (ret) {
@@ -2267,7 +2267,7 @@ static long btrfs_ioctl_add_dev(struct btrfs_root *root, 
void __user *arg)
if (atomic_xchg(root-fs_info-mutually_exclusive_operation_running,
1)) {
pr_info(btrfs: dev add/delete/balance/replace/resize operation 
in progress\n);
-   return -EINPROGRESS;
+   return -EINVAL;
}
 
mutex_lock(root-fs_info-volume_mutex);
@@ -2304,7 +2304,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void 
__user *arg)
1)) {
pr_info(btrfs: dev add/delete/balance/replace/resize operation 
in progress\n);
mnt_drop_write_file(file);
-   return -EINPROGRESS;
+   return -EINVAL;
}
 
mutex_lock(root-fs_info-volume_mutex);
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] Btrfs: fix unlock order in btrfs_ioctl_resize

2013-01-20 Thread Ilya Dryomov
Fix unlock order in btrfs_ioctl_resize().

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ioctl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3aebb28..b9a0190 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1447,8 +1447,8 @@ out_free:
kfree(vol_args);
 out:
mutex_unlock(root-fs_info-volume_mutex);
-   mnt_drop_write_file(file);
atomic_set(root-fs_info-mutually_exclusive_operation_running, 0);
+   mnt_drop_write_file(file);
return ret;
 }
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] Btrfs: fix unlock order in btrfs_ioctl_rm_dev

2013-01-20 Thread Ilya Dryomov
Fix unlock order in btrfs_ioctl_rm_dev().

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ioctl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b9a0190..9c079dd 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2320,8 +2320,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void 
__user *arg)
kfree(vol_args);
 out:
mutex_unlock(root-fs_info-volume_mutex);
-   mnt_drop_write_file(file);
atomic_set(root-fs_info-mutually_exclusive_operation_running, 0);
+   mnt_drop_write_file(file);
return ret;
 }
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] Mutually exclusive ops fixes

2013-01-20 Thread Ilya Dryomov
Hello,

This is a set of fixes for mutually exclusive ops stuff added by Stefan
as part of the dev-replace patchset.  The main one here is the first
patch, it fixes a regression in balancing interface.  The second patch
brings EINVAL error code back.  The rest are relatively minor ordering
fixes.

Thanks,

Ilya


Ilya Dryomov (5):
  Btrfs: bring back balance pause/resume logic
  Btrfs: fix mutually exclusive op is running error code
  Btrfs: fix unlock order in btrfs_ioctl_resize
  Btrfs: fix unlock order in btrfs_ioctl_rm_dev
  Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag

 fs/btrfs/ioctl.c   |  107 ++-
 fs/btrfs/volumes.c |   10 +++-
 2 files changed, 86 insertions(+), 31 deletions(-)

-- 
1.7.9.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] Mutually exclusive ops fixes

2013-01-20 Thread Ilya Dryomov
On Sun, Jan 20, 2013 at 05:22:27PM +0200, Ilya Dryomov wrote:
 Hello,
 
 This is a set of fixes for mutually exclusive ops stuff added by Stefan
 as part of the dev-replace patchset.  The main one here is the first
 patch, it fixes a regression in balancing interface.  The second patch
 brings EINVAL error code back.  The rest are relatively minor ordering
 fixes.

You can pull from

git://github.com/idryomov/btrfs-unstable.git mutex-ops@next-master

top commit 2f68aafa90fedb3cdb80f0a8517a2130c5c035b0

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] Btrfs: bring back balance pause/resume logic

2013-01-20 Thread Ilya Dryomov
Balance pause/resume logic got broken by 5ac00add (went in into 3.8-rc1
as part of dev-replace merge).  Offending commit took a stab at making
mutually exclusive volume operations (add_dev, rm_dev, resize, balance,
replace_dev) not block behind volume_mutex if another such operation is
in progress and instead return an error right away.  Balancing front-end
relied on the blocking behaviour, so the fix is ugly, but short of a
complete rework, it's the best we can do.

Reported-by: Liu Bo bo.li@oracle.com
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ioctl.c   |   78 ++-
 fs/btrfs/volumes.c |   10 --
 2 files changed, 71 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 171786a..ab82636 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3441,8 +3441,8 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
struct btrfs_fs_info *fs_info = root-fs_info;
struct btrfs_ioctl_balance_args *bargs;
struct btrfs_balance_control *bctl;
+   bool need_unlock; /* for mut. excl. ops lock */
int ret;
-   int need_to_clear_lock = 0;
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -3451,14 +3451,61 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
if (ret)
return ret;
 
-   mutex_lock(fs_info-volume_mutex);
+again:
+   if (!atomic_xchg(fs_info-mutually_exclusive_operation_running, 1)) {
+   mutex_lock(fs_info-volume_mutex);
+   mutex_lock(fs_info-balance_mutex);
+   need_unlock = true;
+   goto locked;
+   }
+
+   /*
+* mut. excl. ops lock is locked.  Three possibilites:
+*   (1) some other op is running
+*   (2) balance is running
+*   (3) balance is paused -- special case (think resume)
+*/
mutex_lock(fs_info-balance_mutex);
+   if (fs_info-balance_ctl) {
+   /* this is either (2) or (3) */
+   if (!atomic_read(fs_info-balance_running)) {
+   mutex_unlock(fs_info-balance_mutex);
+   if (!mutex_trylock(fs_info-volume_mutex))
+   goto again;
+   mutex_lock(fs_info-balance_mutex);
+
+   if (fs_info-balance_ctl 
+   !atomic_read(fs_info-balance_running)) {
+   /* this is (3) */
+   need_unlock = false;
+   goto locked;
+   }
+
+   mutex_unlock(fs_info-balance_mutex);
+   mutex_unlock(fs_info-volume_mutex);
+   goto again;
+   } else {
+   /* this is (2) */
+   mutex_unlock(fs_info-balance_mutex);
+   ret = -EINPROGRESS;
+   goto out;
+   }
+   } else {
+   /* this is (1) */
+   mutex_unlock(fs_info-balance_mutex);
+   pr_info(btrfs: dev add/delete/balance/replace/resize operation 
in progress\n);
+   ret = -EINVAL;
+   goto out;
+   }
+
+locked:
+   BUG_ON(!atomic_read(fs_info-mutually_exclusive_operation_running));
 
if (arg) {
bargs = memdup_user(arg, sizeof(*bargs));
if (IS_ERR(bargs)) {
ret = PTR_ERR(bargs);
-   goto out;
+   goto out_unlock;
}
 
if (bargs-flags  BTRFS_BALANCE_RESUME) {
@@ -3478,13 +3525,10 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
bargs = NULL;
}
 
-   if (atomic_xchg(root-fs_info-mutually_exclusive_operation_running,
-   1)) {
-   pr_info(btrfs: dev add/delete/balance/replace/resize operation 
in progress\n);
+   if (fs_info-balance_ctl) {
ret = -EINPROGRESS;
goto out_bargs;
}
-   need_to_clear_lock = 1;
 
bctl = kzalloc(sizeof(*bctl), GFP_NOFS);
if (!bctl) {
@@ -3505,11 +3549,17 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
}
 
 do_balance:
-   ret = btrfs_balance(bctl, bargs);
/*
-* bctl is freed in __cancel_balance or in free_fs_info if
-* restriper was paused all the way until unmount
+* Ownership of bctl and mutually_exclusive_operation_running
+* goes to to btrfs_balance.  bctl is freed in __cancel_balance,
+* or, if restriper was paused all the way until unmount, in
+* free_fs_info.  mutually_exclusive_operation_running is
+* cleared in __cancel_balance.
 */
+   need_unlock = false;
+
+   ret = btrfs_balance(bctl, bargs);
+
if (arg

Re: [PATCH] Btrfs: fix crash of starting balance

2013-01-15 Thread Ilya Dryomov
On Tue, Jan 15, 2013 at 10:47:57PM +0800, Liu Bo wrote:
 We will crash on BUG_ON(ret == -EEXIST) when we do not resume the existing
 balance but attempt to start a new one.
 
 The steps can be:
 1. start balance
 2. pause balance
 3. start balance
 
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
  fs/btrfs/volumes.c |7 ++-
  1 files changed, 6 insertions(+), 1 deletions(-)
 
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 5cce6aa..3901654 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -3100,7 +3100,12 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
   goto out;
  
   if (!(bctl-flags  BTRFS_BALANCE_RESUME)) {
 - BUG_ON(ret == -EEXIST);
 + /*
 +  * This can happen when we do not resume the existing balance
 +  * but try to start a new one instead.
 +  */
 + if (ret == -EEXIST)
 + goto out;
   set_balance_control(bctl);
   } else {
   BUG_ON(ret != -EEXIST);

OK, it seems balance pause/resume logic got broken by dev-replace code
(5ac00addc7ac09110995fe967071d191b5981cc1), which went into v3.8-rc1.
This is most certainly not the right way to fix it, that BUG_ON is there
for a reason.  I'll send a fix in a couple of days.

Thanks,

Ilya

 -- 
 1.7.7.6
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df

2012-10-03 Thread Ilya Dryomov
On Wed, Oct 03, 2012 at 01:43:14PM +0200, Goffredo Baroncelli wrote:
 This serie of patches updated the command btrfs filesystem
 df. I update this command because it is not so easy to get
 the information about the disk usage from the command fi df and fi show.
 
 From the man page (see 2nd patch):
 
 [...]
 The  command  btrfs  filesystem  df is used to query the
 status of the chunks, how many space on the disk(s) are used  by
 the  chunks,  how many space are available in the chunks, and an
 estimation of the free space of the filesystem.
 [...]
 
 $ ./btrfs filesystem df --help
 usage: btrfs filesystem disk-usage [-d][-s][-k] path [path..]
 
 Show space usage information for a mount point(s).
 
 -k  Set KB (1024 bytes) as unit
 -s  Don't show the summary section
 -d  Don't show the detail section

Hi Goffredo,

Why not show just the summary section by default and have a flag (-v)
for the details section?

 $ ./btrfs filesystem df /
 Path: /
 Summary:
   Disk_size: 72.57GB
   Disk_allocated:25.10GB
   Disk_unallocated:  47.48GB
   Logical_size:  23.06GB
   Used:  11.01GB
   Free_(Estimated):  55.66GB(Max: 59.52GB, Min: 35.78GB)
   Data_to_disk_ratio:   92 %
 
 Details:
   Chunk-type  Mode  Chunk-size Logical-sizeUsed

Type for the first column is probably enough.

Why is the third column called Chunk-size?  If my understanding is
correct, it's just a break down of Disk_allocated from the summary
section.  If so, why not call it Disk_allocated to avoid confusion?

Also, why do you use dashes instead of underbars for table headers?

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Update btrfs filesystem df command

2012-10-03 Thread Ilya Dryomov
On Wed, Oct 03, 2012 at 01:43:15PM +0200, Goffredo Baroncelli wrote:
 The  command  btrfs  filesystem  df is used to query the
 status of the chunks, how many space on the disk(s) are used  by
 the  chunks,  how many space are available in the chunks, and an
 estimation of the free space of the filesystem.
 ---
  cmds-filesystem.c |  264 
 +++--
  1 file changed, 215 insertions(+), 49 deletions(-)
 
 diff --git a/cmds-filesystem.c b/cmds-filesystem.c
 index b1457de..6c3ebdc 100644
 --- a/cmds-filesystem.c
 +++ b/cmds-filesystem.c

(snipped)

 +static int cmd_disk_free(int argc, char **argv)
 +{
 +
 + int flags=DF_SHOW_SUMMARY|DF_SHOW_DETAIL|DF_HUMAN_UNIT;
 + int i, more_than_one=0;
 +
 + if (check_argc_min(argc, 2))
 + usage(cmd_disk_free_usage);
 +
 + for(i=1; i argc ; i++){
 + if(!strcmp(argv[i],-d))
 + flags = ~DF_SHOW_DETAIL;
 + else if(!strcmp(argv[i],-s))
 + flags = ~DF_SHOW_SUMMARY;
 + else if(!strcmp(argv[i],-k))
 + flags = ~DF_HUMAN_UNIT;
 + else{
 + int r;
 + if(more_than_one)
 + printf(\n);
 + r = _cmd_disk_free(argv[i], flags);
 + if( r ) return r;
 + more_than_one=1;
 + }

Is there any reason getopt(), or better yet, getopt_long() won't work?

 +
 + }
 +
 + return 0;
 +}
 + 
 +
 + 
  static int uuid_search(struct btrfs_fs_devices *fs_devices, char *search)
  {
   char uuidbuf[37];
 @@ -529,7 +695,7 @@ static int cmd_label(int argc, char **argv)
  
  const struct cmd_group filesystem_cmd_group = {
   filesystem_cmd_group_usage, NULL, {
 - { df, cmd_df, cmd_df_usage, NULL, 0 },
 + { df, cmd_disk_free, cmd_disk_free_usage, NULL, 0 },

If this command is going to replace df, you should change the function
name back to cmd_df.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Update btrfs filesystem df command

2012-10-03 Thread Ilya Dryomov
On Wed, Oct 03, 2012 at 06:34:38PM +0200, Goffredo Baroncelli wrote:
 On 10/03/2012 05:02 PM, Ilya Dryomov wrote:
 On Wed, Oct 03, 2012 at 01:43:15PM +0200, Goffredo Baroncelli wrote:
 The  command  btrfs  filesystem  df is used to query the
 status of the chunks, how many space on the disk(s) are used  by
 the  chunks,  how many space are available in the chunks, and an
 estimation of the free space of the filesystem.
 ---
   cmds-filesystem.c |  264 
  +++--
   1 file changed, 215 insertions(+), 49 deletions(-)
 
 diff --git a/cmds-filesystem.c b/cmds-filesystem.c
 index b1457de..6c3ebdc 100644
 --- a/cmds-filesystem.c
 +++ b/cmds-filesystem.c
 
 (snipped)
 
 +static int cmd_disk_free(int argc, char **argv)
 +{
 +
 +   int flags=DF_SHOW_SUMMARY|DF_SHOW_DETAIL|DF_HUMAN_UNIT;
 +   int i, more_than_one=0;
 +
 +   if (check_argc_min(argc, 2))
 +   usage(cmd_disk_free_usage);
 +
 +   for(i=1; i  argc ; i++){
 +   if(!strcmp(argv[i],-d))
 +   flags= ~DF_SHOW_DETAIL;
 +   else if(!strcmp(argv[i],-s))
 +   flags= ~DF_SHOW_SUMMARY;
 +   else if(!strcmp(argv[i],-k))
 +   flags= ~DF_HUMAN_UNIT;
 +   else{
 +   int r;
 +   if(more_than_one)
 +   printf(\n);
 +   r = _cmd_disk_free(argv[i], flags);
 +   if( r ) return r;
 +   more_than_one=1;
 +   }
 
 Is there any reason getopt(), or better yet, getopt_long() won't work?
 
 In the beginning there were also the switches +d, +s, +k, then I
 dropped them. So I leaved this style. The code is simple enough to
 not justify a change.

It's not a matter of style.  It is generally expected that several short
options can be specified after one dash.  Further, if somebody else were
to add an option with an optional parameter, they'd have to spend time
rewriting your stuff.

In addition to that, other parts of progs already use getopt() in
similar situations.  One example would be the scrub subgroup: scrub
commands are equally simple (a few short switches, no parameters), and
yet getopt() is used.

 
 
 +
 +   }
 +
 +   return 0;
 +}  
 +   
 +
 +   
   static int uuid_search(struct btrfs_fs_devices *fs_devices, char *search)
   {
 char uuidbuf[37];
 @@ -529,7 +695,7 @@ static int cmd_label(int argc, char **argv)
 
   const struct cmd_group filesystem_cmd_group = {
 filesystem_cmd_group_usage, NULL, {
 -   { df, cmd_df, cmd_df_usage, NULL, 0 },
 +   { df, cmd_disk_free, cmd_disk_free_usage, NULL, 0 },
 
 If this command is going to replace df, you should change the function
 name back to cmd_df.
 
 I was never convinced to use 'df'. At the beginning when I wrote the
 first parser of btrfs, was suggested (not by me) to use long
 command and allow the parser to match a contracted command until
 there was any ambiguity. I suggested to use disk-free, but everybody
 were confortable with df.. so I leaved it as official name. But I
 prefer for the internal code a more verbose name.

Well, all your patch is doing is extending the functionality of an
existing command.  The official name for that command is df, and you
are not changing its name.  Why change the name of an existing function?

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df

2012-10-03 Thread Ilya Dryomov
On Wed, Oct 03, 2012 at 06:46:00PM +0200, Goffredo Baroncelli wrote:
 On 10/03/2012 05:01 PM, Ilya Dryomov wrote:
 Type for the first column is probably enough.
 
 Why is the third column called Chunk-size?  If my understanding is
 correct, it's just a break down of Disk_allocated from the summary
 section.  If so, why not call it Disk_allocated to avoid confusion?
 
 Using everywhere Disk_something was my first attempt. But after
 some thoughts I decided that these are two different kind of
 information. It is true that Disk_allocated is the sum of
 Chunk-Sizes... But my feels is that this is a kind of
 implementation details. If some other type of allocation unit will
 be added to BTRFS, then these will be added to Disk_allocated, but
 not to Chunk list...
 I prefer to not change the wording until an enough critical mass of
 people converge to a unique solution .

It is the chunks that is the implementation detail that we want to hide.
Average Btrfs user wouldn't want to know anything about chunks, the only
thing he'd be interested in is Disk_allocated and similar fields.

Moreover, I am pretty sure Chunk-Size would actually confuse people.
I stared at your example output for a few seconds trying to comprehend a
21GB Chunk-Size on a 72GB partition.  What you call Chunk-Size is
actually a sum of sizes of chunks of that particular type, and a few
lines above you call the same exact sum (only this time over all types
of chunks) Disk_allocated.  So I think it's only logical to rename the
column in question to Disk_allocated to match the summary section.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df

2012-10-03 Thread Ilya Dryomov
On Wed, Oct 03, 2012 at 10:01:00PM +0200, Goffredo Baroncelli wrote:
 On 10/03/2012 07:46 PM, Ilya Dryomov wrote:
 On Wed, Oct 03, 2012 at 06:46:00PM +0200, Goffredo Baroncelli wrote:
 On 10/03/2012 05:01 PM, Ilya Dryomov wrote:
 Type for the first column is probably enough.
 
 Why is the third column called Chunk-size?  If my understanding is
 correct, it's just a break down of Disk_allocated from the summary
 section.  If so, why not call it Disk_allocated to avoid confusion?
 
 Using everywhere Disk_something  was my first attempt. But after
 some thoughts I decided that these are two different kind of
 information. It is true that Disk_allocated is the sum of
 Chunk-Sizes... But my feels is that this is a kind of
 implementation details. If some other type of allocation unit will
 be added to BTRFS, then these will be added to Disk_allocated, but
 not to Chunk list...
 I prefer to not change the wording until an enough critical mass of
 people converge to a unique solution .
 
 It is the chunks that is the implementation detail that we want to hide.
 Average Btrfs user wouldn't want to know anything about chunks, the only
 thing he'd be interested in is Disk_allocated and similar fields.
 
 The df standard tool id sufficient for the average user.
 We need only to export these information via the standard syscall
 stat[v]fs. Basically we should try to implement the algorithm of the
 Free_(Estimated) space for the statfs(2) syscall.
 Who uses btrfs tools, is an user with knowledge of btrfs higher than
 the average.
 
 Moreover, I am pretty sure Chunk-Size would actually confuse people.
 I stared at your example output for a few seconds trying to comprehend a
 21GB Chunk-Size on a 72GB partition.  What you call Chunk-Size is
 actually a sum of sizes of chunks of that particular type, and a few
 lines above you call the same exact sum (only this time over all types
 of chunks) Disk_allocated.  So I think it's only logical to rename the
 column in question to Disk_allocated to match the summary section.
 
 What about
 [...]
 Details:
   Chunk_type  Mode  Size_(disk) Size_(logical) Used
   DataSingle   21.01GB  21.01GB 10.53GB
   System  DUP  80.00MB  40.00MB  4.00KB
 [...]
 
 ?

This is definitely better.  Can you also drop Chunk_type in favor of
Type?

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Update btrfs filesystem df command

2012-10-03 Thread Ilya Dryomov
On Wed, Oct 03, 2012 at 07:22:31PM +0200, Goffredo Baroncelli wrote:
 From: Goffredo Baroncelli kreij...@inwind.it
 
 The  command  btrfs  filesystem  df is used to query the
 status of the chunks, how many space on the disk(s) are used  by
 the  chunks,  how many space are available in the chunks, and an
 estimation of the free space of the filesystem.
 ---
  cmds-filesystem.c |  284 
 -
  1 file changed, 235 insertions(+), 49 deletions(-)
 
 diff --git a/cmds-filesystem.c b/cmds-filesystem.c
 index b1457de..2701904 100644
 --- a/cmds-filesystem.c
 +++ b/cmds-filesystem.c
 @@ -22,6 +22,7 @@
  #include errno.h
  #include uuid/uuid.h
  #include ctype.h
 +#include sys/vfs.h
  
  #include kerncompat.h
  #include ctree.h
 @@ -39,25 +40,55 @@ static const char * const filesystem_cmd_group_usage[] = {
   NULL
  };
  
 -static const char * const cmd_df_usage[] = {
 - btrfs filesystem df path,
 - Show space usage information for a mount point,
 - NULL
 -};
 +static u64 disk_size( char *path){
 + struct statfs   sfs;
 +
 + if( statfs(path, sfs)  0 )
 + return 0;
 + else
 + return sfs.f_bsize * sfs.f_blocks;
 +
 +}
 +
 +static void print_string(char *s, int w)
 +{
 + int i;
 +
 + printf(%s, s);
 + for( i = strlen(s) ; i  w ; i++ ) 
 + putchar(' ');
 +}
 +
 +#define DF_SHOW_SUMMARY  (11)
 +#define DF_SHOW_DETAIL   (12)
 +#define DF_HUMAN_UNIT(13)
 +
 +static void pretty_sizes_r(u64 size, int w, int mode)
 +{
 + if( mode  DF_HUMAN_UNIT ){
 + char *s = pretty_sizes(size);
 + printf(%*s, w, s);
 + free(s);
 + } else {
 + printf(%*llu, w, size/1024);
 +
 + }
 +}
  
 -static int cmd_df(int argc, char **argv)
 +static int _cmd_disk_free(char *path, int mode)
  {
   struct btrfs_ioctl_space_args *sargs;
   u64 count = 0, i;
   int ret;
   int fd;
 - int e;
 - char *path;
 -
 - if (check_argc_exact(argc, 2))
 - usage(cmd_df_usage);
 -
 - path = argv[1];
 + int e, width;
 + u64  total_disk;/* filesystem size == sum of
 +disks sizes */
 + u64  total_chunks;  /* sum of chunks sizes on disk(s) */
 + u64  total_used;/* logical space used */
 + u64  total_free;/* logical space un-used */
 + double K;
 + 
  
   fd = open_file_or_dir(path);
   if (fd  0) {
 @@ -103,56 +134,211 @@ static int cmd_df(int argc, char **argv)
   return ret;
   }
  
 - for (i = 0; i  sargs-total_spaces; i++) {
 - char description[80];
 - char *total_bytes;
 - char *used_bytes;
 - int written = 0;
 - u64 flags = sargs-spaces[i].flags;
 + total_disk = disk_size(path);
 + e = errno;
 + if( total_disk == 0 ){
 + fprintf(stderr, ERROR: couldn't get space info on '%s' - %s\n,
 + path, strerror(e));
 + close(fd);
 + free(sargs);
 + return 19;
 + }
 + 
 + total_chunks = total_used = total_free = 0;
  
 - memset(description, 0, 80);
 + for (i = 0; i  sargs-total_spaces; i++) {
 + int  ratio=1;
 + u64  allocated;
  
 - if (flags  BTRFS_BLOCK_GROUP_DATA) {
 - if (flags  BTRFS_BLOCK_GROUP_METADATA) {
 - snprintf(description, 14, %s,
 -  Data+Metadata);
 - written += 13;
 - } else {
 - snprintf(description, 5, %s, Data);
 - written += 4;
 - }
 - } else if (flags  BTRFS_BLOCK_GROUP_SYSTEM) {
 - snprintf(description, 7, %s, System);
 - written += 6;
 - } else if (flags  BTRFS_BLOCK_GROUP_METADATA) {
 - snprintf(description, 9, %s, Metadata);
 - written += 8;
 - }
 + u64 flags = sargs-spaces[i].flags;
  
   if (flags  BTRFS_BLOCK_GROUP_RAID0) {
 - snprintf(description+written, 8, %s, , RAID0);
 - written += 7;
 + ratio=1;
   } else if (flags  BTRFS_BLOCK_GROUP_RAID1) {
 - snprintf(description+written, 8, %s, , RAID1);
 - written += 7;
 + ratio=2;
   } else if (flags  BTRFS_BLOCK_GROUP_DUP) {
 - snprintf(description+written, 6, %s, , DUP);
 - written += 5;
 + ratio=2;
   } else if (flags  BTRFS_BLOCK_GROUP_RAID10) {
 - snprintf(description+written, 9, %s, , RAID10);
 -

Re: [PATCH V4 1/2] Btrfs: cleanup duplicated division functions

2012-09-27 Thread Ilya Dryomov
Hi Miao,

You haven't addressed any of my concerns with v3.

On Thu, Sep 27, 2012 at 06:19:58PM +0800, Miao Xie wrote:

(snipped)

 the parameters are right. So besides the code cleanup, this patch also
 add a check for the usage of the space balance, it is the only place that
 we need add check to make sure the parameters of div_factor are right till
 now. Besides that, the old kernel may hold the wrong usage value, so we
 must rectify it.

Cleaning up/unifying duplicated functions and changing the existing
logic are two very different things.  If you, in the course of writing
this patch, became unhappy with the way balancing ioctl deals with
invalid input, please send a separate patch.

Before your patch, volumes.c had its own copy of div_factor_fine():

static u64 div_factor_fine(u64 num, int factor)
{
if (factor = 0)
return 0;
if (factor = 100)
return num;

num *= factor;
do_div(num, 100);
return num;
}

which was called from chunk_usage_filter() on unvalidated user input.
As far as the cleanup part of your patch goes, you've dropped
factor = 0 / factor = 100 logic, merged volumes.c's copy with
extent-tree.c's copy and renamed div_factor_fine() to div_factor().  To
make chunk_usage_filter() happy again, it's enough to move the dropped
logic directly to the call site:

static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
  struct btrfs_balance_args *bargs)
{
...

-   user_thresh = div_factor_fine(cache-key.offset, bargs-usage);
+   if (bargs-usage == 0)
+   user_thresh = 0;
+   else if (bargs-usage = 100)
+   user_thresh = cache-key.offset;
+   else
+   user_thresh = div_factor(cache-key.offset, bargs-usage);

...
}

So I would suggest you drop all hunks related to changing the way
balancing ioctl works and make the above change to chunk_usage_filter()
instead.  Once again, if you are unhappy with usage filter argument
handling, send a separate patch.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 1/2] Btrfs: cleanup duplicated division functions

2012-09-24 Thread Ilya Dryomov
On Mon, Sep 24, 2012 at 06:47:42PM +0200, David Sterba wrote:
 On Mon, Sep 24, 2012 at 10:05:02AM +0800, Miao Xie wrote:
  Generally, we should check the value when it is input. If not, we might
  run our program with the wrong value, and it is possible to cause unknown
  problems. So I think the above chunk is necessary.
 
 The difference is that giving an invalid value will exit early (your
 version), while Ilya's version will clamp the 'usage' to the allowed
 range during processing. From usability POV, I'd prefer to let the
 command fail early with a verbose error eg. that the range is invalid.

I think that's the job for progs, and that's where this and a few other
checks are currently implemented.

There is no possibility for unknown problems, that's exactly how it's
been implemented before the cleanup.  The purpose of balance filters
(and the usage filter in particular) is to lower the number of chunks we
have to relocate.  If someone decides to use raw ioctls, and supplies us
with the invalid value, we just cut it down to a 100 and proceed.
usage=100 is the equivalent of not using the usage filter at all, so the
raw ioctl user will get what he deserves.

This is very similar to what we currently do with other filters.  For
example, soft can only be used with convert, and this is checked by
progs.  But, if somebody were to set a soft flag without setting
convert we would simply ignore that soft.  Same goes for drange
and devid, invalid ranges, invalid devids, etc.  Pulling all these
checks into the kernel is questionable at best, and, if enough people
insist on it, should be discussed separately.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 1/2] Btrfs: cleanup duplicated division functions

2012-09-23 Thread Ilya Dryomov
On Fri, Sep 21, 2012 at 05:07:46PM +0800, Miao Xie wrote:
 div_factor{_fine} has been implemented for two times, and these two functions
 are very similar, so cleanup the reduplicate implement and drop the original
 div_factor(), and then rename div_factor_fine() to div_factor(). So the 
 divisor
 of the new div_factor() is 100, not 10.
 
 And I move div_factor into a independent file named math.h because it is a
 common math function, may be used by every composition of btrfs.
 
 Because these functions are mostly used on the hot path, and we are sure
 the parameters are right in the most cases, we don't add complex checks
 for the parameters. But in the other place, we must check and make sure
 the parameters are right. So besides the code cleanup, this patch also
 add a check for the usage of the space balance, it is the only place that
 we need add check to make sure the parameters of div_factor are right till 
 now.
 
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
 Changelog v2 - v3:
 - drop the original div_factor and rename div_factor_fine to div_factor
 - drop the check of the factor
 
 Changelog v1 - v2:
 - add missing check
 ---
  fs/btrfs/extent-tree.c |   29 ++---
  fs/btrfs/ioctl.c   |   18 ++
  fs/btrfs/math.h|   33 +
  fs/btrfs/relocation.c  |2 +-
  fs/btrfs/transaction.c |2 +-
  fs/btrfs/volumes.c |   30 +-
  6 files changed, 64 insertions(+), 50 deletions(-)
  create mode 100644 fs/btrfs/math.h
 
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index a010234..bcb9ced 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -33,6 +33,7 @@
  #include volumes.h
  #include locking.h
  #include free-space-cache.h
 +#include math.h
  
  #undef SCRAMBLE_DELAYED_REFS
  
 @@ -648,24 +649,6 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info 
 *info)
   rcu_read_unlock();
  }
  
 -static u64 div_factor(u64 num, int factor)
 -{
 - if (factor == 10)
 - return num;
 - num *= factor;
 - do_div(num, 10);
 - return num;
 -}
 -
 -static u64 div_factor_fine(u64 num, int factor)
 -{
 - if (factor == 100)
 - return num;
 - num *= factor;
 - do_div(num, 100);
 - return num;
 -}
 -
  u64 btrfs_find_block_group(struct btrfs_root *root,
  u64 search_start, u64 search_hint, int owner)
  {
 @@ -674,7 +657,7 @@ u64 btrfs_find_block_group(struct btrfs_root *root,
   u64 last = max(search_hint, search_start);
   u64 group_start = 0;
   int full_search = 0;
 - int factor = 9;
 + int factor = 90;
   int wrapped = 0;
  again:
   while (1) {
 @@ -708,7 +691,7 @@ again:
   if (!full_search  factor  10) {
   last = search_start;
   full_search = 1;
 - factor = 10;
 + factor = 100;
   goto again;
   }
  found:
 @@ -3513,7 +3496,7 @@ static int should_alloc_chunk(struct btrfs_root *root,
   if (force == CHUNK_ALLOC_LIMITED) {
   thresh = btrfs_super_total_bytes(root-fs_info-super_copy);
   thresh = max_t(u64, 64 * 1024 * 1024,
 -div_factor_fine(thresh, 1));
 +div_factor(thresh, 1));
  
   if (num_bytes - num_allocated  thresh)
   return 1;
 @@ -3521,12 +3504,12 @@ static int should_alloc_chunk(struct btrfs_root *root,
   thresh = btrfs_super_total_bytes(root-fs_info-super_copy);
  
   /* 256MB or 2% of the FS */
 - thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 2));
 + thresh = max_t(u64, 256 * 1024 * 1024, div_factor(thresh, 2));
   /* system chunks need a much small threshold */
   if (sinfo-flags  BTRFS_BLOCK_GROUP_SYSTEM)
   thresh = 32 * 1024 * 1024;
  
 - if (num_bytes  thresh  sinfo-bytes_used  div_factor(num_bytes, 8))
 + if (num_bytes  thresh  sinfo-bytes_used  div_factor(num_bytes, 80))
   return 0;
   return 1;
  }
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index 9384a2a..d8d53f7 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -3335,6 +3335,24 @@ static long btrfs_ioctl_balance(struct file *file, 
 void __user *arg)
  
   goto do_balance;
   }
 +
 + if ((bargs-data.flags  BTRFS_BALANCE_ARGS_USAGE) 
 + (bargs-data.usage  0 || bargs-data.usage  100)) {
 + ret = -EINVAL;
 + goto out_bargs;
 + }
 +
 + if ((bargs-meta.flags  BTRFS_BALANCE_ARGS_USAGE) 
 + (bargs-meta.usage  0 || bargs-meta.usage  100)) {
 + ret = -EINVAL;
 + goto out_bargs;
 + }
 +
 + if ((bargs-sys.flags  BTRFS_BALANCE_ARGS_USAGE) 
 + (bargs-sys.usage  0 || bargs-sys.usage  100)) {
 +

Re: [PATCH] Btrfs-progs: btrfs subvolume delete could delete subvolumes

2012-09-21 Thread Ilya Dryomov
On Fri, Sep 21, 2012 at 02:54:08PM +0800, Anand jain wrote:
 From: Anand Jain anand.j...@oracle.com
 
 With this user will be able to provide more than one subvolume
 to delete.
 eg: btrfs subvolume delete subvol1 subvol2
 
 Signed-off-by: Anand Jain anand.j...@oracle.com
 ---
  cmds-subvolume.c |   36 
  man/btrfs.8.in   |4 ++--
  2 files changed, 26 insertions(+), 14 deletions(-)
 
 diff --git a/cmds-subvolume.c b/cmds-subvolume.c
 index f4aa80f..cfeaa8d 100644
 --- a/cmds-subvolume.c
 +++ b/cmds-subvolume.c
 @@ -188,31 +188,34 @@ int test_issubvolume(char *path)
  }
  
  static const char * const cmd_subvol_delete_usage[] = {
 - btrfs subvolume delete name,
 - Delete a subvolume,
 + btrfs subvolume delete subvolume [subvolume...],
 + Delete subvolume(s),
   NULL
  };
  
  static int cmd_subvol_delete(int argc, char **argv)
  {
 - int res, fd, len, e;
 + int res, fd, len, e, cnt = 1, ret = 0;
   struct btrfs_ioctl_vol_args args;
   char*dname, *vname, *cpath;
   char*path;
  
 - if (check_argc_exact(argc, 2))
 + if (argc  2)
   usage(cmd_subvol_delete_usage);

check_argc_min(argc, 2)

Haven't looked at the rest.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Btrfs: cleanup duplicated division functions

2012-09-17 Thread Ilya Dryomov
On Mon, Sep 17, 2012 at 10:21:00AM +0800, Miao Xie wrote:
 On fri, 14 Sep 2012 15:54:18 +0200, David Sterba wrote:
  On Thu, Sep 13, 2012 at 06:51:36PM +0800, Miao Xie wrote:
  div_factor{_fine} has been implemented for two times, cleanup it.
  And I move them into a independent file named math.h because they are
  common math functions.
  
  You removed the sanity checks:
  
  -   if (factor = 0)
  -   return 0;
  -   if (factor = 100)
  -   return num;
 
 As inline functions, they should not contain complex checks, the caller should
 make sure the parameters are right. I think.

div_factor_fine() in volumes.c is not inline, and is called from
chunk_usage_filter() on unvalidated user input.  If you think the caller
should do those checks, you should move them to the caller as part of
your patch.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Change small filesystem to normal

2012-07-22 Thread Ilya Dryomov
On Sun, Jul 22, 2012 at 05:06:24PM +0200, Swâmi Petaramesh wrote:
 Hi,
 
 I've created a small BTRFS filesystem, where metadata+data are mixed
 (and metadata are not DUP'ed).
 
 Then I've enlarged the FS to 1 GB ; now I'd like to make it normal
 with separate data and metadata, and DUP'ed metadata.
 
 Is there a way tp do this without reformatting the FS ?

No, currently there is no way to do this.  You'll have to create a new
filesystem with mkfs.btrfs.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs fi df won't show total=

2012-07-09 Thread Ilya Dryomov
On Mon, Jul 09, 2012 at 10:06:24PM +0200, Jan Engelhardt wrote:
 
 On Monday 2012-07-09 21:25, Hugo Mills wrote:
 On Mon, Jul 09, 2012 at 09:14:03PM +0200, Jan Engelhardt wrote:
  
  On openSUSE_12.1 with Btrfs v0.19+20120406, the following can be
  observed: after a change of the profiles, total=,used= is no
  longer shown:
  
  20:49 mmsrv1:~ # btrfs fi df /top.srv/
  Data, RAID10: total=152.00GiB, used=121.07GiB
  System, RAID1: total=40.00MiB, used=44.00KiB
  System: total=4.00MiB, used=0.00
  Metadata, RAID1: total=112.00GiB, used=1.30GiB
  Metadata: total=8.00MiB, used=0.00
 [...]
  21:10 mmsrv1:~ # btrfs fi df /top.srv/
  Data, RAID10: total=156.00GiB, used=124.35GiB
  System, RAID10: total=128.00MiB, used=48.00KiB
  System: total=4.00MiB, used=0.00
  Metadata, RAID10: total=112.00GiB, used=1.38GiB
 
What's the problem here? You no longer have any RAID1 chunks, so
 it's not showing them.
 
 Rather tha a 4-line output, I would have expected this 6-line output
 that I would also get when mkfs'ing a new fresh btrfs volume with
 raid10 from the start:
 
 Data, RAID10: total=156.00GiB, used=124.35GiB
 Data: total=foo, used=bar
 System, RAID10: total=128.00MiB, used=48.00KiB
 System: total=4.00MiB, used=0.00
 Metadata, RAID10: total=112.00GiB, used=1.38GiB
 Metadata: total=foo, used=bar

mkfs creates those non-raid chunks for a pretty stupid reason, they
really shouldn't be there if you create a raid10 fs.  Balance does the
right thing and removes them.  Fixing this along with another mkfs
annoyance related to this one is on my TODO list.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times

2012-07-05 Thread Ilya Dryomov
On Thu, Jul 05, 2012 at 10:20:16AM -0700, Zach Brown wrote:
 On 07/05/2012 10:14 AM, Alexander Block wrote:
 On Thu, Jul 5, 2012 at 7:08 PM, Zach Brownz...@zabbo.net  wrote:
 
 Careful, timespec will be different sizes in 32bit userspace and a 64bit
 kernel.  I'd use btrfs_timespec to get a fixed size timespec and avoid
 all the compat_timespec noise.  (I'd then also worry about padding and
 might pack the struct.. I always lose track of the best practice across
 all archs.)
 
 Hmm we currently don't have ctree.h in ioctl.h. Can I include it there
 or are there problems with that? As an alternative I could define my
 own struct for that.
 
 Hmm, yeah, it looks like ioctl.h is well isolated and doesn't really
 have a precedent for pulling in format bits from the kernel
 implementation.
 
 I'd do as you suggested and just make its own ioctl_timespec with a
 comment that its duplicating other similar structures to keep ioctl.h
 from getting tangled up in the kernel-side includes.

This has been done for restriper, see struct btrfs_balance_args vs
struct btrfs_disk_balance_args.  You could do the same thing:

struct btrfs_ioctl_timespec {
__u64 sec;
__u32 nsec;
} __attribute__ ((__packed__));

and take endianess into account with le{64,32}_to_cpu and
cpu_to_le{64,32} macros.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times

2012-07-05 Thread Ilya Dryomov
On Thu, Jul 05, 2012 at 11:37:40AM -0700, Zach Brown wrote:
 
 and take endianess into account with le{64,32}_to_cpu and
 cpu_to_le{64,32} macros.
 
 The kernel doesn't support system calls from userspace of a different
 endianness, no worries there :)

What if you are on a big-endian machine with a big-endian kernel and
userspace?  Everything on-disk should be little-endian, so if you are
going to write stuff you got from userspace to disk, at some point you
have to make sure you are writing out bytes in the right order.

Alex already does that, so my remarks are moot ;)

+   root_item-stime.sec = cpu_to_le64(sa-stime.tv_sec);
+   root_item-stime.nsec = cpu_to_le64(sa-stime.tv_nsec);
+   root_item-rtime.sec = cpu_to_le64(sa-rtime.tv_sec);
+   root_item-rtime.nsec = cpu_to_le64(sa-rtime.tv_nsec);

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times

2012-07-05 Thread Ilya Dryomov
On Thu, Jul 05, 2012 at 09:18:41PM +0200, Alexander Block wrote:
 On Thu, Jul 5, 2012 at 9:01 PM, Zach Brown z...@zabbo.net wrote:
  On 07/05/2012 11:59 AM, Ilya Dryomov wrote:
 
  What if you are on a big-endian machine with a big-endian kernel and
  userspace?  Everything on-disk should be little-endian, so if you are
  going to write stuff you got from userspace to disk, at some point you
  have to make sure you are writing out bytes in the right order.
 
  Alex already does that, so my remarks are moot ;)
 
 
  Yeah, indeed, we were only talking about the ioctl interface crossing
  the user-kernel barrier :).
 
 I decided to not use __leXX variables in the new btrfs_ioctl_timespec
 structure because as most arguments found ioctls are currently in cpu
 dependent endianess. The kernel will then do the endianess conversion
 as it already did with struct timespec.

That's exactly the point of adding btrfs_ioctl_timespec instead of just
copying btrfs_timespec definition.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] Btrfs-progs: make get default report correctly

2012-07-02 Thread Ilya Dryomov
On Mon, Jul 02, 2012 at 09:39:45AM +0800, Liu Bo wrote:
 On 06/29/2012 06:36 PM, Ilya Dryomov wrote:
 
  On Fri, Jun 29, 2012 at 06:00:37PM +0800, Liu Bo wrote:
  We have both set-default and get-default, but actually our get-default
  does not show which one is the default one.
 
  This patch plus the kernel side patch fix this.
  
  Are you referring to the fact that get-default in Chris' btrfs-progs
  master lists all subvolumes instead of showing the default one?  If so,
  I fixed that problem a while ago, patch is in Hugo's integration branch.
  
 
 
 So can you please show me a link about your patch?

I'm sorry, it turns out Hugo hasn't pushed it yet.  Here is a link:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/16187/

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Btrfs-progs: add support to set subvolume/snapshot readonly

2012-07-02 Thread Ilya Dryomov
On Mon, Jul 02, 2012 at 10:07:42AM +0800, Liu Bo wrote:
 On 06/29/2012 06:21 PM, Ilya Dryomov wrote:
 
  On Fri, Jun 29, 2012 at 06:00:36PM +0800, Liu Bo wrote:
  Setting subvolume/snapshot readonly has been missing for a long time.
 
  With this patch, we can set a subvolume/snapshot readonly via:
 
  obtrfs subvolume set-ro path
  
  Alexander's 'btrfs property' patches do exactly this, but in a much more
  generic and extensible way.  'btrfs property' subgroup provides a
  uniform interface for getting and setting properties of filesystem
  objects in general, not only those of subvolumes and snapshots.  It
  provides a much better user interface, and it also allows us to easily
  rethink kernel-user interface for generic get/set in future.
  
 
 
 Thanks for the explanation!
 
 But I prefer keeping the current categories {subvolume,filesystem,device,...}:
 
 o Compatibility, we cannot remove the old commands until we make sure that no 
 users will
   use them.

We are not going to remove old commands any time soon.  However, adding
new ones that clearly fall into get/set category, is not a good idea.
Especially when there is a generic interface on its way.

 
 o We've three properties {default, readonly, lable}, is it worthy making 
 another new interface?

It's not just about subvolumes.  There will be a lot more properties on
the table as filesystem matures, for example device speeds, subvolume
profiles, quotas.

 
 o Current categories are clear and clean.

Once again, it's not just about subvolumes.  Current categories are
indeed clear, but adding two commands for each non-trivial property that
comes up in future does not seem practical to me.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: no dev_stats entry found / OK on first mount after mkfs

2012-07-02 Thread Ilya Dryomov
On Mon, Jul 02, 2012 at 02:46:11PM -0700, Christian Kujau wrote:
 Hi,
 
 after upgrading from 3.4.0 to 3.5.0-rc5 on this powerpc machine, the 
 following is printed during bootup:
 
   [   18.630750] device fsid ce8c9df5-0a93-47c6-adf6-25084f352a4f devid 1 
 transid 11061 /dev/hda7
   [   18.637193] btrfs: disk space caching is enabled
   [   18.706423] btrfs: no dev_stats entry found for device /dev/hda7 (devid 
 1) (OK on first mount after mkfs)
 
 The btrfs on hda7 has been created many months ago and has been mounted 
 several times since then. Assuming first mount after mkfs does not apply 
 here, is it then NOT OK that no dev_stats entry has been found?
 
 IOW: should I worry about this message?

No, you should not.  If this is your first mount after upgrading your
kernel to the one with btrfs dev stats feature, first mount after mkfs
sort of applies.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Btrfs-progs: add support to set subvolume/snapshot readonly

2012-06-29 Thread Ilya Dryomov
On Fri, Jun 29, 2012 at 06:00:36PM +0800, Liu Bo wrote:
 Setting subvolume/snapshot readonly has been missing for a long time.
 
 With this patch, we can set a subvolume/snapshot readonly via:
 
 obtrfs subvolume set-ro path

Alexander's 'btrfs property' patches do exactly this, but in a much more
generic and extensible way.  'btrfs property' subgroup provides a
uniform interface for getting and setting properties of filesystem
objects in general, not only those of subvolumes and snapshots.  It
provides a much better user interface, and it also allows us to easily
rethink kernel-user interface for generic get/set in future.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] Btrfs-progs: make get default report correctly

2012-06-29 Thread Ilya Dryomov
On Fri, Jun 29, 2012 at 06:00:37PM +0800, Liu Bo wrote:
 We have both set-default and get-default, but actually our get-default
 does not show which one is the default one.
 
 This patch plus the kernel side patch fix this.

Are you referring to the fact that get-default in Chris' btrfs-progs
master lists all subvolumes instead of showing the default one?  If so,
I fixed that problem a while ago, patch is in Hugo's integration branch.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Btrfs: restore restriper state on all mounts

2012-06-26 Thread Ilya Dryomov
First of all, thanks for reviewing!

On Tue, Jun 26, 2012 at 06:17:39PM +0200, David Sterba wrote:
 On Fri, Jun 22, 2012 at 09:24:12PM +0300, Ilya Dryomov wrote:
  Fix a bug that triggered asserts in btrfs_balance() in both normal and
  resume modes -- restriper state was not properly restored on read-only
  mounts.  This factors out resuming code from btrfs_restore_balance(),
  which is now also called earlier in the mount sequence to avoid the
  problem of some early writes getting the old profile.
  
  Signed-off-by: Ilya Dryomov idryo...@gmail.com
  ---
  diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
  index 77872da..dae7cd6 100644
  --- a/fs/btrfs/disk-io.c
  +++ b/fs/btrfs/disk-io.c
  @@ -2492,9 +2497,6 @@ retry_root_backup:
  err = btrfs_orphan_cleanup(fs_info-tree_root);
  up_read(fs_info-cleanup_work_sem);
   
  -   if (!err)
  -   err = btrfs_recover_balance(fs_info-tree_root);
  -
  if (err) {
  close_ctree(tree_root);
  return err;
  @@ -2518,6 +2520,9 @@ fail_cleaner:
   fail_block_groups:
  btrfs_free_block_groups(fs_info);
   
  +fail_balance_ctl:
  +   kfree(fs_info-balance_ctl);
 
 I think you need to set fs_info-balance_ctl to NULL, otherwise this
 could lead to double free from free_fs_info. I was looking along the

Yes, I do.  I meant to call unset_balance_control(fs_info) there, but
changed it to kfree(), because of the BUG_ON() in the former.

unset_balance_control(), of course, sets -balance_ctl to NULL ;)

 call paths and didn't see free_fs_info called on the mount failure path:
 
 vfs-mount
   btrfs_mount
 btrfs_fill_super
   open_ctree
 (recover balance fails, frees ctl)
 
 error is propagated back to vfs, no other fs callback is done (like
 kill_super which does call free_fs_info).
 
 The only exit path that is not going through free_fs_info is after error
 from btrfs_fill_super, and this can fail from various reasons.
 
 Either I'm missing something, or we leak a btrfs_fs_info every time a
 mount fails ...

No, we don't, you just missed it.  It's freed from btrfs_kill_super(),
which is called from deactivate_locked_super() after btrfs_fill_super()
errors out.

 
 
 Back to your patch, apart from the balance_ctl pointer reset, both are
 ok and given the number of bug reports [useless padding text here]
 
   this should go to 3.5-rc.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: do not return EINVAL instead of ENOMEM from open_ctree()

2012-06-22 Thread Ilya Dryomov
When bailing from open_ctree() err is returned, not ret.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/disk-io.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e1890b1..f06db81 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2244,7 +2244,7 @@ int open_ctree(struct super_block *sb,
ret |= btrfs_start_workers(fs_info-caching_workers);
ret |= btrfs_start_workers(fs_info-readahead_workers);
if (ret) {
-   ret = -ENOMEM;
+   err = -ENOMEM;
goto fail_sb_buffer;
}
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: do not ignore errors from btrfs_cleanup_fs_roots() when mounting

2012-06-22 Thread Ilya Dryomov
There used to be a BUG_ON(ret) there before EH patch (79787eaa) went in.
Bail out with EINVAL.

Cc: David Sterba dste...@suse.cz
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/disk-io.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f06db81..77872da 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2461,8 +2461,8 @@ retry_root_backup:
 
if (!(sb-s_flags  MS_RDONLY)) {
ret = btrfs_cleanup_fs_roots(fs_info);
-   if (ret) {
-   }
+   if (ret)
+   goto fail_trans_kthread;
 
ret = btrfs_recover_relocation(tree_root);
if (ret  0) {
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Btrfs: resume balance on rw (re)mounts properly

2012-06-22 Thread Ilya Dryomov
This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/disk-io.c |   24 +++-
 fs/btrfs/super.c   |4 
 fs/btrfs/volumes.c |   36 +++-
 fs/btrfs/volumes.h |1 +
 4 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dae7cd6..e863f58 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2490,17 +2490,23 @@ retry_root_backup:
goto fail_trans_kthread;
}
 
-   if (!(sb-s_flags  MS_RDONLY)) {
-   down_read(fs_info-cleanup_work_sem);
-   err = btrfs_orphan_cleanup(fs_info-fs_root);
-   if (!err)
-   err = btrfs_orphan_cleanup(fs_info-tree_root);
+   if (sb-s_flags  MS_RDONLY)
+   return 0;
+
+   down_read(fs_info-cleanup_work_sem);
+   if ((ret = btrfs_orphan_cleanup(fs_info-fs_root)) ||
+   (ret = btrfs_orphan_cleanup(fs_info-tree_root))) {
up_read(fs_info-cleanup_work_sem);
+   close_ctree(tree_root);
+   return ret;
+   }
+   up_read(fs_info-cleanup_work_sem);
 
-   if (err) {
-   close_ctree(tree_root);
-   return err;
-   }
+   ret = btrfs_resume_balance_async(fs_info);
+   if (ret) {
+   printk(KERN_WARNING btrfs: failed to resume balance\n);
+   close_ctree(tree_root);
+   return ret;
}
 
return 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0eb9a4d..e239915 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1187,6 +1187,10 @@ static int btrfs_remount(struct super_block *sb, int 
*flags, char *data)
if (ret)
goto restore;
 
+   ret = btrfs_resume_balance_async(fs_info);
+   if (ret)
+   goto restore;
+
sb-s_flags = ~MS_RDONLY;
}
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f7649b9..0f1778c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2845,28 +2845,46 @@ out:
 
 static int balance_kthread(void *data)
 {
-   struct btrfs_balance_control *bctl =
-   (struct btrfs_balance_control *)data;
-   struct btrfs_fs_info *fs_info = bctl-fs_info;
+   struct btrfs_fs_info *fs_info = data;
int ret = 0;
 
mutex_lock(fs_info-volume_mutex);
mutex_lock(fs_info-balance_mutex);
 
-   set_balance_control(bctl);
-
-   if (btrfs_test_opt(fs_info-tree_root, SKIP_BALANCE)) {
-   printk(KERN_INFO btrfs: force skipping balance\n);
-   } else {
+   if (fs_info-balance_ctl) {
printk(KERN_INFO btrfs: continuing balance\n);
-   ret = btrfs_balance(bctl, NULL);
+   ret = btrfs_balance(fs_info-balance_ctl, NULL);
}
 
mutex_unlock(fs_info-balance_mutex);
mutex_unlock(fs_info-volume_mutex);
+
return ret;
 }
 
+int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info)
+{
+   struct task_struct *tsk;
+
+   spin_lock(fs_info-balance_lock);
+   if (!fs_info-balance_ctl) {
+   spin_unlock(fs_info-balance_lock);
+   return 0;
+   }
+   spin_unlock(fs_info-balance_lock);
+
+   if (btrfs_test_opt(fs_info-tree_root, SKIP_BALANCE)) {
+   printk(KERN_INFO btrfs: force skipping balance\n);
+   return 0;
+   }
+
+   tsk = kthread_run(balance_kthread, fs_info, btrfs-balance);
+   if (IS_ERR(tsk))
+   return PTR_ERR(tsk);
+
+   return 0;
+}
+
 int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 {
struct btrfs_balance_control *bctl;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index e1b1a64..95f6637 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -281,6 +281,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 
new_size);
 int btrfs_init_new_device(struct btrfs_root *root, char *path);
 int btrfs_balance(struct btrfs_balance_control *bctl,
  struct btrfs_ioctl_balance_args *bargs);
+int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info);
 int btrfs_recover_balance(struct btrfs_fs_info *fs_info);
 int btrfs_pause_balance(struct btrfs_fs_info *fs_info);
 int btrfs_cancel_balance(struct btrfs_fs_info *fs_info);
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Btrfs: restore restriper state on all mounts

2012-06-22 Thread Ilya Dryomov
Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts.  This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/disk-io.c |   15 ++-
 fs/btrfs/volumes.c |   39 +++
 fs/btrfs/volumes.h |2 +-
 3 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 77872da..dae7cd6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2354,17 +2354,22 @@ retry_root_backup:
  BTRFS_CSUM_TREE_OBJECTID, csum_root);
if (ret)
goto recovery_tree_root;
-
csum_root-track_dirty = 1;
 
fs_info-generation = generation;
fs_info-last_trans_committed = generation;
 
+   ret = btrfs_recover_balance(fs_info);
+   if (ret) {
+   printk(KERN_WARNING btrfs: failed to recover balance\n);
+   goto fail_tree_roots;
+   }
+
ret = btrfs_init_dev_stats(fs_info);
if (ret) {
printk(KERN_ERR btrfs: failed to init dev_stats: %d\n,
   ret);
-   goto fail_block_groups;
+   goto fail_balance_ctl;
}
 
ret = btrfs_init_space_info(fs_info);
@@ -2492,9 +2497,6 @@ retry_root_backup:
err = btrfs_orphan_cleanup(fs_info-tree_root);
up_read(fs_info-cleanup_work_sem);
 
-   if (!err)
-   err = btrfs_recover_balance(fs_info-tree_root);
-
if (err) {
close_ctree(tree_root);
return err;
@@ -2518,6 +2520,9 @@ fail_cleaner:
 fail_block_groups:
btrfs_free_block_groups(fs_info);
 
+fail_balance_ctl:
+   kfree(fs_info-balance_ctl);
+
 fail_tree_roots:
free_root_pointers(fs_info, 1);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8a3d259..f7649b9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2867,9 +2867,8 @@ static int balance_kthread(void *data)
return ret;
 }
 
-int btrfs_recover_balance(struct btrfs_root *tree_root)
+int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 {
-   struct task_struct *tsk;
struct btrfs_balance_control *bctl;
struct btrfs_balance_item *item;
struct btrfs_disk_balance_args disk_bargs;
@@ -2882,29 +2881,30 @@ int btrfs_recover_balance(struct btrfs_root *tree_root)
if (!path)
return -ENOMEM;
 
-   bctl = kzalloc(sizeof(*bctl), GFP_NOFS);
-   if (!bctl) {
-   ret = -ENOMEM;
-   goto out;
-   }
-
key.objectid = BTRFS_BALANCE_OBJECTID;
key.type = BTRFS_BALANCE_ITEM_KEY;
key.offset = 0;
 
-   ret = btrfs_search_slot(NULL, tree_root, key, path, 0, 0);
+   ret = btrfs_search_slot(NULL, fs_info-tree_root, key, path, 0, 0);
if (ret  0)
-   goto out_bctl;
+   goto out;
if (ret  0) { /* ret = -ENOENT; */
ret = 0;
-   goto out_bctl;
+   goto out;
+   }
+
+   bctl = kzalloc(sizeof(*bctl), GFP_NOFS);
+   if (!bctl) {
+   ret = -ENOMEM;
+   goto out;
}
 
leaf = path-nodes[0];
item = btrfs_item_ptr(leaf, path-slots[0], struct btrfs_balance_item);
 
-   bctl-fs_info = tree_root-fs_info;
-   bctl-flags = btrfs_balance_flags(leaf, item) | BTRFS_BALANCE_RESUME;
+   bctl-fs_info = fs_info;
+   bctl-flags = btrfs_balance_flags(leaf, item);
+   bctl-flags |= BTRFS_BALANCE_RESUME;
 
btrfs_balance_data(leaf, item, disk_bargs);
btrfs_disk_balance_args_to_cpu(bctl-data, disk_bargs);
@@ -2913,14 +2913,13 @@ int btrfs_recover_balance(struct btrfs_root *tree_root)
btrfs_balance_sys(leaf, item, disk_bargs);
btrfs_disk_balance_args_to_cpu(bctl-sys, disk_bargs);
 
-   tsk = kthread_run(balance_kthread, bctl, btrfs-balance);
-   if (IS_ERR(tsk))
-   ret = PTR_ERR(tsk);
-   else
-   goto out;
+   mutex_lock(fs_info-volume_mutex);
+   mutex_lock(fs_info-balance_mutex);
 
-out_bctl:
-   kfree(bctl);
+   set_balance_control(bctl);
+
+   mutex_unlock(fs_info-balance_mutex);
+   mutex_unlock(fs_info-volume_mutex);
 out:
btrfs_free_path(path);
return ret;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 74366f2..e1b1a64 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -281,7 +281,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 
new_size);
 int btrfs_init_new_device(struct btrfs_root *root, char *path);
 int btrfs_balance(struct btrfs_balance_control *bctl,
  struct

[PATCH 0/2] Restriper restore bugfix

2012-06-22 Thread Ilya Dryomov
This fixes restriper restore bug which triggered asserts for users with
interrupted balances on rootfs btrfs.

Thanks,

Ilya


Ilya Dryomov (2):
  Btrfs: restore restriper state on all mounts
  Btrfs: resume balance on rw (re)mounts properly

 fs/btrfs/disk-io.c |   39 ++--
 fs/btrfs/super.c   |4 +++
 fs/btrfs/volumes.c |   73 
 fs/btrfs/volumes.h |3 +-
 4 files changed, 76 insertions(+), 43 deletions(-)

-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] Btrfs: count the chunks which will be relocated at first

2012-05-17 Thread Ilya Dryomov
On Thu, May 17, 2012 at 07:56:53PM +0800, Miao Xie wrote:
 the balance function should count the chunks which will be relocated at first,
 and then relocate those chunks one by one.
 
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/volumes.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 759d024..91da8a2 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -2580,7 +2580,7 @@ again:
  
   chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
  
 - if (!counting) {
 + if (counting) {
   spin_lock(fs_info-balance_lock);
   bctl-stat.considered++;
   spin_unlock(fs_info-balance_lock);

__btrfs_balance() already calculates the approximate number of chunks
that will be relocated and stores that value in bctl-stat.expected.
The stat.considered counter OTOH is supposed to reflect the number of
chunks processed through balance filters and it is meant to be updated
at relocation pass, so AFAICS if (!counting) is the right test.

What exactly are you trying to fix here ?

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] Btrfs: pause/recover the space balance when doing remount

2012-05-17 Thread Ilya Dryomov
On Thu, May 17, 2012 at 07:57:40PM +0800, Miao Xie wrote:
 pause the space balance threads when remounting the fs to be readonly,
 and recover it when remounting it from r/o to r/w
 
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/super.c   |9 -
  fs/btrfs/volumes.c |8 +++-
  2 files changed, 15 insertions(+), 2 deletions(-)
 
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 7deb00e..ea17f0a 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -1148,6 +1148,9 @@ static int btrfs_remount(struct super_block *sb, int 
 *flags, char *data)
   if (*flags  MS_RDONLY) {
   sb-s_flags |= MS_RDONLY;
  
 + /* pause restriper - we want to resume on remount to r/w */
 + btrfs_pause_balance(root-fs_info);
 +
   ret = btrfs_commit_super(root);
   if (ret)
   goto restore;
 @@ -1174,7 +1177,10 @@ static int btrfs_remount(struct super_block *sb, int 
 *flags, char *data)
   if (ret)
   goto restore;
  
 - sb-s_flags = ~MS_RDONLY;
 + if (sb-s_flags  MS_RDONLY) {
 + sb-s_flags = ~MS_RDONLY;
 + btrfs_recover_balance(fs_info-tree_root);
 + }
   }
  
   return 0;
 @@ -1190,6 +1196,7 @@ restore:
   fs_info-alloc_start = old_alloc_start;
   fs_info-thread_pool_size = old_thread_pool_size;
   fs_info-metadata_ratio = old_metadata_ratio;
 +
   return ret;
  }
  
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 91da8a2..c536d52 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -2833,7 +2833,13 @@ static int balance_kthread(void *data)
   mutex_lock(fs_info-volume_mutex);
   mutex_lock(fs_info-balance_mutex);
  
 - set_balance_control(bctl);
 + if (fs_info-balance_ctl) {
 + kfree(bctl);
 + bctl = fs_info-balance_ctl;
 + bctl-flags = bctl-flags | BTRFS_BALANCE_RESUME;
 + } else {
 + set_balance_control(bctl);
 + }
  
   if (btrfs_test_opt(fs_info-tree_root, SKIP_BALANCE)) {
   printk(KERN_INFO btrfs: force skipping balance\n);

This is a known bug.  There is a deeper problem here, related to the
fact that we restore balancing state not early enough and that we don't
restore it on ro mounts at all.  I have a patch in the works to fix that
problem, and it also fixes this one the right way.  I'm backed up with
other things right now, but I'll post it as soon as I get a chance.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Btrfs: fix memory leak in btrfs_pause_balance()

2012-05-17 Thread Ilya Dryomov
On Thu, May 17, 2012 at 07:58:53PM +0800, Miao Xie wrote:
 We forget to free fs_info-balance_ctl in the btrfs_pause_balance()
 when umounting the fs.
 
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/volumes.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index c536d52..fd7fe80 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -2937,6 +2937,9 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
   ret = -ENOTCONN;
   }
  
 + if (btrfs_fs_closing(fs_info)  fs_info-balance_ctl)
 + unset_balance_control(fs_info);
 +
   mutex_unlock(fs_info-balance_mutex);
   return ret;
  }

It is kfree()'d in free_fs_info(), which should be called on unmount.
Am I missing something here ?

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: failed disk (was: kernel 3.3.4 damages filesystem (?))

2012-05-09 Thread Ilya Dryomov
On Wed, May 09, 2012 at 03:37:35PM +0100, Hugo Mills wrote:
 On Wed, May 09, 2012 at 04:25:00PM +0200, Helmut Hullen wrote:
  Du meintest am 07.05.12:
  
  [...]
  
   With a file system like ext2/3/4 I can work with several directories
   which are mounted together, but (as said before) one broken disk
   doesn't disturb the others.
  
  mkfs.btrfs -m raid1 -d single should give you that.
  
  Just a small bug, perhaps:
  
  created a system with
  
  mkfs.btrfs -m raid1 -d single /dev/sdl1
  mount /dev/sdl1 /mnt/Scsi
  btrfs device add /dev/sdk1 /mnt/Scsi
  btrfs device add /dev/sdm1 /mnt/Scsi
  (filling with data)
  
  and
  
  btrfs fi df /mnt/Scsi
  
  now tells
  
  Data, RAID0: total=183.18GB, used=76.60GB
  Data: total=80.01GB, used=79.83GB
  System, DUP: total=8.00MB, used=32.00KB
  System: total=4.00MB, used=0.00
  Metadata, DUP: total=1.00GB, used=192.74MB
  Metadata: total=8.00MB, used=0.00
  
  --
  
  Data, RAID0 confuses me (not very much ...), and the system for  
  metadata (RAID1) is not told.
 
DUP is two copies of each block, but it allows the two copies to
 live on the same device. It's done this because you started with a
 single device, and you can't do RAID-1 on one device. The first bit of

What Hugo said.  Newer mkfs.btrfs will error out if you try to do this.

 metadata you write to it should automatically upgrade the DUP chunk to
 RAID-1.

We don't upgrade chunks in place, only during balance.

 
As to the spurious upgrade of single to RAID-0, I thought Ilya
 had stopped it doing that. What kernel version are you running?

I did, but again, we were doing it only as part of balance, not as part
of normal operation.

Helmut, do you have any additional data points - the output of btrfs fi
df right after you created FS or somewhere in the middle of filling it ?

Also could you please paste the output of btrfs fi show and tell us what
kernel version you are running ?

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is it possible to reclaim block groups once they are allocated to data or metadata?

2012-05-06 Thread Ilya Dryomov
On Sun, May 06, 2012 at 01:07:06PM +1000, Mike Sampson wrote:
 On Sat, May 5, 2012 at 10:52 PM, Hugo Mills h...@carfax.org.uk wrote:
  On Sat, May 05, 2012 at 10:37:01PM +1000, Mike Sampson wrote:
  Hello list,
 
  recently reformatted my home partition from XFS to RAID1 btrfs. I used
  the default options to mkfs.btrfs except for enabling raid1 for data
  as well as metadata. Filesystem is made up of two 1TB drives.
 
  mike@mercury (0) pts/3 ~ $ sudo btrfs filesystem show
  Label: none  uuid: f08a8896-e03e-4064-9b94-9342fb547e47
        Total devices 2 FS bytes used 888.06GB
        devid    1 size 931.51GB used 931.51GB path /dev/sdb1
        devid    2 size 931.51GB used 931.49GB path /dev/sdc1
 
  Btrfs Btrfs v0.19
 
  mike@mercury (0) pts/3 ~ $ btrfs filesystem df /home
  Data, RAID1: total=893.48GB, used=886.60GB
  Data: total=8.00MB, used=0.00
  System, RAID1: total=8.00MB, used=136.00KB
  System: total=4.00MB, used=0.00
  Metadata, RAID1: total=38.00GB, used=2.54GB
  Metadata: total=8.00MB, used=0.00
 
  As can be seen I don't have a lot of free space left and while I am
  planning on adding more storage soon I would like to gain a little
  breathing room until I can do this. While I don't have a lot of space
  remaining in Data, RAID1 I do have a good chunk in Metadata, RAID1.
  2.5GB used out of 38GB. Does this excess become available
  automatically to the file system when the block groups in Data, RAID1
  are exhausted or, if not, is there a way to manually reallocate them?
 
    Youre best bet at the moment is to try a partial balance of
  metadata chunks:
 
  # btrfs balance start -m /home
 
    That will rewrite all of your metadata, putting it through the
  allocator again, and removing the original allocated chunks. This
  should have the effect of reducing the allocation of metadata chunks.
 
    You will need a 3.3 kernel, or later, and an up-to-date userspace
  from cmason's git repository.
 
 Gave this a shot and it did help.
 
 mike@mercury (0) pts/3 ~/.../btrfs-progs git:master $ uname -r
 3.3.4-2-ARCH
 
 mike@mercury (0) pts/3 ~/.../btrfs-progs git:master $ sudo ./btrfs
 balance start -m /home
 
 Done, had to relocate 40 out of 934 chunks
 
 mike@mercury (0) pts/3 ~/.../btrfs-progs git:master $ ./btrfs
 filesystem df /home
 Data, RAID1: total=900.97GB, used=880.06GB
 Data: total=8.00MB, used=0.00
 System, RAID1: total=32.00MB, used=136.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID1: total=30.50GB, used=2.54GB
 
 There is now 8GB less in Metadata and I was able to delete some files
 as well to free up space. There is still a lot of wasted space in the
 metadata block groups. It seems that it allocates more metadata block
 groups than required for my filesystem. This will do until I am able
 to add a couple of devices to the system. Is there anyway to adjust
 the block group allocation strategy at filesystem creation?

No.  Chunk allocator currently allocates a lot more chunks than actually
needed, and it impacts both balancing and normal operation.  Try this:

# btrfs balance start -musage=10 /home

This is suboptimal, but it should get rid of more chunks.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is it possible to reclaim block groups once they are allocated to data or metadata?

2012-05-06 Thread Ilya Dryomov
On Sun, May 06, 2012 at 11:37:27AM +0100, Hugo Mills wrote:
 On Sun, May 06, 2012 at 01:26:45PM +0300, Ilya Dryomov wrote:
  On Sun, May 06, 2012 at 01:07:06PM +1000, Mike Sampson wrote:
   There is now 8GB less in Metadata and I was able to delete some files
   as well to free up space. There is still a lot of wasted space in the
   metadata block groups. It seems that it allocates more metadata block
   groups than required for my filesystem. This will do until I am able
   to add a couple of devices to the system. Is there anyway to adjust
   the block group allocation strategy at filesystem creation?
  
  No.  Chunk allocator currently allocates a lot more chunks than actually
  needed, and it impacts both balancing and normal operation.  Try this:
  
  # btrfs balance start -musage=10 /home
  
  This is suboptimal, but it should get rid of more chunks.
 
While we're talking about it, what is the parameter to the usage
 option? I'm assuming it selects chunks which are less than some amount
 full -- but is the value a percentage, or a quantity in megabytes
 (power-of-10 or power-of-2), or something else?

It's a percentage, so the command above will balance out chunks that are
less than 10 percent full.  I'll update btrfs man page and the wiki page
you started as soon as I can.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: balancing metadata fails with no space left on device

2012-05-06 Thread Ilya Dryomov
On Sun, May 06, 2012 at 01:19:38PM +0200, Martin Steigerwald wrote:
 Am Freitag, 4. Mai 2012 schrieb Martin Steigerwald:
  Am Freitag, 4. Mai 2012 schrieb Martin Steigerwald:
   Hi!
   
   merkaba:~ btrfs balance start -m /
   ERROR: error during balancing '/' - No space left on device
   There may be more info in syslog - try dmesg | tail
   merkaba:~#19 dmesg | tail -22
   [   62.918734] CPU0: Package power limit normal
   [  525.229976] btrfs: relocating block group 20422066176 flags 1
   [  526.940452] btrfs: found 3048 extents
   [  528.803778] btrfs: found 3048 extents
 […]
   [  635.906517] btrfs: found 1 extents
   [  636.038096] btrfs: 1 enospc errors during balance
   
   
   merkaba:~ btrfs filesystem show
   failed to read /dev/sr0
   Label: 'debian'  uuid: […]
   
   Total devices 1 FS bytes used 7.89GB
   devid1 size 18.62GB used 17.58GB path /dev/dm-0
   
   Btrfs Btrfs v0.19
   merkaba:~ btrfs filesystem df /
   Data: total=15.52GB, used=7.31GB
   System, DUP: total=32.00MB, used=4.00KB
   System: total=4.00MB, used=0.00
   Metadata, DUP: total=1.00GB, used=587.83MB
  
  I thought data tree might have been to big, so out of curiousity I
  tried a full balance. It shrunk the data tree but it failed as well:
  
  merkaba:~ btrfs balance start /
  ERROR: error during balancing '/' - No space left on device
  There may be more info in syslog - try dmesg | tail
  merkaba:~#19 dmesg | tail -63
  [   89.306718] postgres (2876): /proc/2876/oom_adj is deprecated,
  please use /proc/2876/oom_score_adj instead.
  [  159.939728] btrfs: relocating block group 21994930176 flags 34
  [  160.010427] btrfs: relocating block group 21860712448 flags 1
  [  161.188104] btrfs: found 6 extents
  [  161.507388] btrfs: found 6 extents
 […]
  [  335.897953] btrfs: relocating block group 1103101952 flags 1
  [  347.888295] btrfs: found 28458 extents
  [  352.736987] btrfs: found 28458 extents
  [  353.099659] btrfs: 1 enospc errors during balance
  
  merkaba:~ btrfs filesystem df /
  Data: total=10.00GB, used=7.31GB
  System, DUP: total=64.00MB, used=4.00KB
  System: total=4.00MB, used=0.00
  Metadata, DUP: total=1.12GB, used=587.20MB
  
  merkaba:~ btrfs filesystem show
  failed to read /dev/sr0
  Label: 'debian'  uuid: […]
  Total devices 1 FS bytes used 7.88GB
  devid1 size 18.62GB used 12.38GB path /dev/dm-0
  
  
  For the sake of it I tried another time. It failed again:
  
  martin@merkaba:~ dmesg | tail -32
  [  353.099659] btrfs: 1 enospc errors during balance
  [  537.057375] btrfs: relocating block group 32833011712 flags 36
 […]
  [  641.479140] btrfs: relocating block group 22062039040 flags 34
  [  641.695614] btrfs: relocating block group 22028484608 flags 34
  [  641.840179] btrfs: found 1 extents
  [  641.965843] btrfs: 1 enospc errors during balance
  
  
  merkaba:~#19 btrfs filesystem df /
  Data: total=10.00GB, used=7.31GB
  System, DUP: total=32.00MB, used=4.00KB
  System: total=4.00MB, used=0.00
  Metadata, DUP: total=1.12GB, used=586.74MB
  merkaba:~ btrfs filesystem show
  failed to read /dev/sr0
  Label: 'debian'  uuid: […]
  Total devices 1 FS bytes used 7.88GB
  devid1 size 18.62GB used 12.32GB path /dev/dm-0
  
  Btrfs Btrfs v0.19
  
  
  Well, in order to be gentle to the SSD again I stop my experiments now
  ;).
 
 I had subjective impression that the speed of the BTRFS filesystem 
 decreased after all these
 
 Anyway, after reading the a -musage hint by Ilya in thread
 
 Is it possible to reclaim block groups once they ar allocated to data or 
 metadata?

Currently there is no way to reclaim block groups other than performing
a balance.  We will add a kernel thread for this in future, but a couple
of things have to be fixed before that can happen.

 
 
 I tried:
 
 merkaba:~ btrfs filesystem df /
 Data: total=10.00GB, used=7.34GB
 System, DUP: total=32.00MB, used=4.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=1.12GB, used=586.39MB
 
 merkaba:~ btrfs balance start -musage=1 / 
 Done, had to relocate 2 out of 13 chunks
 
 merkaba:~ btrfs filesystem df /  
 Data: total=10.00GB, used=7.34GB
 System, DUP: total=32.00MB, used=4.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=1.00GB, used=586.39MB
 
 So this worked.
 
 But I wasn´t able to specify less than a Gig:

A follow up to the -musage hint says that the argument it takes is the
percentage.  That is -musage=X will balance out block groups that are
less than X percent used.

 
 merkaba:~ btrfs balance start -musage=0.8 /
 Invalid usage argument: 0.8
 merkaba:~#1 btrfs balance start -musage=700M /
 Invalid usage argument: 700M
 
 
 When I try without usage I get the old behavior back:
 
 merkaba:~#1 btrfs balance start -m /  
 ERROR: error during balancing '/' - No space left on device
 There may be more info in syslog - try dmesg | tail
 
 
 merkaba:~ btrfs balance start -musage=1 /   
 Done, had to relocate 2 out of 13 chunks
 

Re: 'filesystem resize max' tries to use devid 1

2012-04-24 Thread Ilya Dryomov
On Tue, Apr 24, 2012 at 11:31:15AM -0400, Josef Bacik wrote:
 On Sun, Apr 22, 2012 at 11:03:06PM -0400, Jeremy Atkins wrote:
  Back story:
  I started my pool with a 200gb partition at the end of my drive (sdc5)
  , until I was able to clear out the data at the beginning of my drive.
   When I was ready, I ran `btrfs dev add /dev/sdc4 /` then `btrfs dev
  del /dev/sdc5 /`,
  
  $ sudo btrfs fi resize max /
  Resize '/' of 'max'
  ERROR: unable to resize '/' - Invalid argument
  
  in dmesg i see:
  [   72.729685] btrfs: resizer unable to find device 1
  
  $ sudo btrfs fi df /
  Data: total=34.00GB, used=17.79GB
  System, DUP: total=64.00MB, used=20.00KB
  Metadata, DUP: total=33.50GB, used=394.93MB
  
  $ sudo btrfs fi sh
  Label: none  uuid: b0ad55e2-09e0-4658-8cab-d2e11ba03753
          Total devices 1 FS bytes used 17.18GB
          devid    2 size 1.62TB used 101.12GB path /dev/sdc4
  
  
  $ uname -r
  3.3.2-1.fc16.x86_64
  
  btrfs-progs version is current git master ( commit 1957076 )
  
  After writing this email, and searching around the wiki some, I
  discovered the command to resize specific devids,
  
  [antrat@tbox ~]$ sudo btrfs fi resize 2:max /
  Resize '/' of '2:max'
  
  and in dmesg:
  [ 1661.933884] btrfs: resizing devid 2
  [ 1661.933895] btrfs: new size for /dev/sdc4 is 1995564908544
 
 Hrm, if we're using filesystem resize max we should probably just max out all
 the devices and not require one specific device, what do you think Ilya?

I'd keep the devid:max syntax for maxing out a single device and add a
new fs:max or similar to resize all devices.  ABI here is just a string,
so I think it can be easily done.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'filesystem resize max' tries to use devid 1

2012-04-24 Thread Ilya Dryomov
On Tue, Apr 24, 2012 at 11:42:26AM -0400, Josef Bacik wrote:
 On Tue, Apr 24, 2012 at 06:39:03PM +0300, Ilya Dryomov wrote:
  On Tue, Apr 24, 2012 at 11:31:15AM -0400, Josef Bacik wrote:
   On Sun, Apr 22, 2012 at 11:03:06PM -0400, Jeremy Atkins wrote:
Back story:
I started my pool with a 200gb partition at the end of my drive (sdc5)
, until I was able to clear out the data at the beginning of my drive.
 When I was ready, I ran `btrfs dev add /dev/sdc4 /` then `btrfs dev
del /dev/sdc5 /`,

$ sudo btrfs fi resize max /
Resize '/' of 'max'
ERROR: unable to resize '/' - Invalid argument

in dmesg i see:
[   72.729685] btrfs: resizer unable to find device 1

$ sudo btrfs fi df /
Data: total=34.00GB, used=17.79GB
System, DUP: total=64.00MB, used=20.00KB
Metadata, DUP: total=33.50GB, used=394.93MB

$ sudo btrfs fi sh
Label: none  uuid: b0ad55e2-09e0-4658-8cab-d2e11ba03753
        Total devices 1 FS bytes used 17.18GB
        devid    2 size 1.62TB used 101.12GB path /dev/sdc4


$ uname -r
3.3.2-1.fc16.x86_64

btrfs-progs version is current git master ( commit 1957076 )

After writing this email, and searching around the wiki some, I
discovered the command to resize specific devids,

[antrat@tbox ~]$ sudo btrfs fi resize 2:max /
Resize '/' of '2:max'

and in dmesg:
[ 1661.933884] btrfs: resizing devid 2
[ 1661.933895] btrfs: new size for /dev/sdc4 is 1995564908544
   
   Hrm, if we're using filesystem resize max we should probably just max out 
   all
   the devices and not require one specific device, what do you think Ilya?
  
  I'd keep the devid:max syntax for maxing out a single device and add a
  new fs:max or similar to resize all devices.  ABI here is just a string,
  so I think it can be easily done.
  
 
 Right I think the devid:max thing is handy for people who want to do specific
 things, but I think just a normal btrfs fi resize max should do all devices 
 for
 simplicity and easy of use, no need for a fs:max.  Thanks,

Sorry, I misread.  'btrfs fi resize max mnt' should of course resize
the entire FS.  I think this popped up before, at least a couple times.
I'll do it if nobody else wants to.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix max chunk size check in chunk allocator

2012-04-13 Thread Ilya Dryomov
Fix a bug, where in case we need to adjust stripe_size so that the
length of the resulting chunk is less than or equal to max_chunk_size,
DUP chunks turn out to be only half as big as they could be.

Cc: Arne Jansen sensi...@gmx.net
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a872b48..c462d84 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3324,12 +3324,14 @@ static int __btrfs_alloc_chunk(struct 
btrfs_trans_handle *trans,
stripe_size = devices_info[ndevs-1].max_avail;
num_stripes = ndevs * dev_stripes;
 
-   if (stripe_size * num_stripes  max_chunk_size * ncopies) {
+   if (stripe_size * ndevs  max_chunk_size * ncopies) {
stripe_size = max_chunk_size * ncopies;
-   do_div(stripe_size, num_stripes);
+   do_div(stripe_size, ndevs);
}
 
do_div(stripe_size, dev_stripes);
+
+   /* align to BTRFS_STRIPE_LEN */
do_div(stripe_size, BTRFS_STRIPE_LEN);
stripe_size *= BTRFS_STRIPE_LEN;
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at fs/btrfs/volumes.c:2733

2012-04-10 Thread Ilya Dryomov
On Fri, Mar 30, 2012 at 10:44:05PM +0200, Sander wrote:
 Ilya Dryomov wrote (ao):
  On Fri, Mar 30, 2012 at 07:49:56PM +0200, Sander wrote:
  Thanks. btrfs-debug-tree confirms that you've got a balance item on
  media.
 
  After that mount it back and see if there is btrfs: continuing
  balance line in dmesg (and if btrfs-balance kthread shows up)?
   
   There is no such line in dmesg, and currently no btrfs-balance kthread
   is running. I've pulled Chris Masons for-linus and booted with the
   resulting kernel.
  
  And given the above it's weird. We are failing to locate the item
  during mount for some reason and I would like to find out why. So if
  you are up for running debugging patches (really just compiling btrfs
  module and sending me dmesg output) I would appreciate that.
 
 Sure, please send me patches.

I'm sorry this took a week, I was backed up.  If you still have that fs
around in that state, could you please apply the patch below, mount it
and send me dmesg output ?  (no need to run balance or anything, just
mount)

Thanks,

Ilya



diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 20196f4..86fa082 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1867,6 +1867,7 @@ int open_ctree(struct super_block *sb,
csum_root = fs_info-csum_root = btrfs_alloc_root(fs_info);
chunk_root = fs_info-chunk_root = btrfs_alloc_root(fs_info);
dev_root = fs_info-dev_root = btrfs_alloc_root(fs_info);
+printk(open_ctree\n);
 
if (!tree_root || !extent_root || !csum_root ||
!chunk_root || !dev_root) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a872b48..2e39348 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2834,6 +2834,7 @@ static int balance_kthread(void *data)
mutex_lock(fs_info-balance_mutex);
 
set_balance_control(bctl);
+printk(balance_kthread: flags %llu\n, (unsigned long long)bctl-flags);
 
if (btrfs_test_opt(fs_info-tree_root, SKIP_BALANCE)) {
printk(KERN_INFO btrfs: force skipping balance\n);
@@ -2858,6 +2859,7 @@ int btrfs_recover_balance(struct btrfs_root *tree_root)
struct btrfs_key key;
int ret;
 
+printk(recover_balance\n);
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
@@ -2872,7 +2874,11 @@ int btrfs_recover_balance(struct btrfs_root *tree_root)
key.type = BTRFS_BALANCE_ITEM_KEY;
key.offset = 0;
 
+printk(key.obj %llu\n, (unsigned long long)key.objectid);
+printk(key.type %d\n, key.type);
+printk(key.off %llu\n, (unsigned long long)key.offset);
ret = btrfs_search_slot(NULL, tree_root, key, path, 0, 0);
+printk(search ret %d\n, ret);
if (ret  0)
goto out_bctl;
if (ret  0) { /* ret = -ENOENT; */
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: remove lock assert from get_restripe_target()

2012-04-09 Thread Ilya Dryomov
This fixes a regression introduced by fc67c450.  spin_is_locked() always
returns 0 on UP kernels, which caused assert in get_restripe_target() to
be fired on every call from btrfs_reduce_alloc_profile() on UP systems.
Remove it completely for now, it's not clear if it's going to be needed
in future.

Reported-by: Bobby Powers bobbypow...@gmail.com
Reported-by: Mitch Harder mitch.har...@sabayonlinux.org
Tested-by: Mitch Harder mitch.har...@sabayonlinux.org
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/extent-tree.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a844204..8db0884 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3152,15 +3152,14 @@ static void set_avail_alloc_bits(struct btrfs_fs_info 
*fs_info, u64 flags)
 /*
  * returns target flags in extended format or 0 if restripe for this
  * chunk_type is not in progress
+ *
+ * should be called with either volume_mutex or balance_lock held
  */
 static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags)
 {
struct btrfs_balance_control *bctl = fs_info-balance_ctl;
u64 target = 0;
 
-   BUG_ON(!mutex_is_locked(fs_info-volume_mutex) 
-  !spin_is_locked(fs_info-balance_lock));
-
if (!bctl)
return 0;
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at extent-tree.c:3162 (get_restripe_target)

2012-04-06 Thread Ilya Dryomov
On Fri, Apr 06, 2012 at 10:52:13AM -0500, Mitch Harder wrote:
 On Fri, Apr 6, 2012 at 10:43 AM, David Sterba d...@jikos.cz wrote:
  On Fri, Apr 06, 2012 at 10:33:04AM -0500, Mitch Harder wrote:
  I'm consistently hitting the BUG_ON in the new get_restripe_target
  function that was added by [PATCH 4/8] Btrfs: add
  get_restripe_target() helper when trying to initially mount a btrfs
  partition.
 
  Lines fs/btrfs/extent-tree.c:3161-3162:
 
        BUG_ON(!mutex_is_locked(fs_info-volume_mutex) 
               !spin_is_locked(fs_info-balance_lock));
 
  Is it a single CPU machine? See
 
  http://www.spinics.net/lists/linux-btrfs/msg15884.html
 
 
 Thanks!
 
 Yes, this is a single CPU machine.  I'll test the solution from that thread.

Sorry about that.  I'm going to hold off a bit to see if there will be
any objections to v2 from that thread, but we will definitely do
something about this by rc3 or rc4 at worst.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs, lockdep: get_restripe_target: use lockdep in BUG_ON

2012-04-06 Thread Ilya Dryomov
On Wed, Apr 04, 2012 at 10:04:12PM -0400, Bobby Powers wrote:
 spin_is_locked always returns 0 on non-SMP systems, which causes btrfs
 to fail the mount.  There is documentation pending as to why checking
 for spin_is_locked is a bad idea:
 
 https://lkml.org/lkml/2012/3/27/413
 
 The suggested lockdep_assert_held() is not appropriate in this case,
 as what get_restripe_target() is checking for is that either
 volume_mutex is held or balance_lock is held.  Luckily
 lockdep_assert_held() is a simple macro:
 
 WARN_ON(debug_locks  !lockdep_is_held(l))
 
 We can mimic the structure in get_restripe_target(), but we need to
 make sure lockdep_is_held() is defined for the !LOCKDEP case.
 
 CC: Ilya Dryomov idryo...@gmail.com
 CC: Chris Mason chris.ma...@oracle.com
 CC: Andi Kleen a...@linux.intel.com
 CC: Jeff Mahoney je...@suse.de
 CC: Ingo Molnar mi...@redhat.com
 CC: linux-ker...@vger.kernel.org
 Signed-off-by: Bobby Powers bobbypow...@gmail.com
 ---
  fs/btrfs/extent-tree.c  |5 +++--
  include/linux/lockdep.h |1 +
  2 files changed, 4 insertions(+), 2 deletions(-)
 
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index a844204..4d13eb1 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -24,6 +24,7 @@
  #include linux/kthread.h
  #include linux/slab.h
  #include linux/ratelimit.h
 +#include linux/lockdep.h
  #include compat.h
  #include hash.h
  #include ctree.h
 @@ -3158,8 +3159,8 @@ static u64 get_restripe_target(struct btrfs_fs_info 
 *fs_info, u64 flags)
   struct btrfs_balance_control *bctl = fs_info-balance_ctl;
   u64 target = 0;
  
 - BUG_ON(!mutex_is_locked(fs_info-volume_mutex) 
 -!spin_is_locked(fs_info-balance_lock));
 + BUG_ON(debug_locks  !lockdep_is_held(fs_info-volume_mutex) 
 +!lockdep_is_held(fs_info-balance_lock));
  
   if (!bctl)
   return 0;
 diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
 index d36619e..94c0edb 100644
 --- a/include/linux/lockdep.h
 +++ b/include/linux/lockdep.h
 @@ -392,6 +392,7 @@ struct lock_class_key { };
  
  #define lockdep_depth(tsk)   (0)
  
 +#define lockdep_is_held(l)   (0)
  #define lockdep_assert_held(l)   do { } while (0)
  
  #define lockdep_recursing(tsk)   (0)
 -- 
 1.7.10.rc3.3.g19a6c

OK, Mitch's report prompted me to actually take a look.  This patch is
wrong: by defining lockdep_is_held(l) to 0 in !LOCKDEP case you
effectively mimic the behaviour of spin_is_locked() which is what
getting us in the first place.

get_restripe_target() interface is WIP so I will replace BUG_ON with a
comment and send a patch through btrfs tree.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs, lockdep: get_restripe_target: use lockdep in BUG_ON

2012-04-05 Thread Ilya Dryomov
On Thu, Apr 05, 2012 at 12:23:01PM -0400, Bobby Powers wrote:
 On Wed, Apr 4, 2012 at 10:04 PM, Bobby Powers bobbypow...@gmail.com wrote:
  spin_is_locked always returns 0 on non-SMP systems, which causes btrfs
  to fail the mount.  There is documentation pending as to why checking
 
 I guess I should be explicit in stating that this is a regression, so
 this patch or something else that addresses it should be pulled into
 3.4.

Yes, this is a regression and spin_is_locked() definitely has to go.  I
don't have a strong opinion on this assert, if there are objections to
v2 I'm OK with ripping that BUG_ON entirely and replacing it with a
comment (this function and its callers are WIP).

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at fs/btrfs/volumes.c:2733

2012-03-30 Thread Ilya Dryomov
On Fri, Mar 30, 2012 at 07:49:56PM +0200, Sander wrote:
 Hello Ilya,
 
 Ilya Dryomov wrote (ao):
I'm definitely intrested in reproducing it. Could you please umount this
filesystem, capture the output of 'btrfs-debug-tree -d dev' and post 
it
somewhere ?
   
   Will do. It is the / filesystem, so I'll need to reboot.
  
  I need this to confirm that balance item is on disk.
 
 I'm sorry it took so long. I'll mail the output to you directly.

Thanks.  btrfs-debug-tree confirms that you've got a balance item on
media.

 
After that mount it back and see if there is btrfs: continuing
balance line in dmesg (and if btrfs-balance kthread shows up)?
 
 There is no such line in dmesg, and currently no btrfs-balance kthread
 is running. I've pulled Chris Masons for-linus and booted with the
 resulting kernel.

And given the above it's weird.  We are failing to locate the item
during mount for some reason and I would like to find out why.  So if
you are up for running debugging patches (really just compiling btrfs
module and sending me dmesg output) I would appreciate that.

If you don't want to do that you can try to compile btrfs-progs from
git, mount fs and run 'btrfs balance resume mnt'.  If that doesn't
work I'll send you a small util that will simply delete the item from
disk.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at fs/btrfs/volumes.c:2733

2012-03-29 Thread Ilya Dryomov
On Thu, Mar 29, 2012 at 12:52:35PM +0200, Sander wrote:
 Hello all,
 
 I can't seem to balance my btrfs filesystem. It segfaults, and gives a
 kernel bug:
 
 [ 1355.139099] [ cut here ]
 [ 1355.139099] kernel BUG at fs/btrfs/volumes.c:2733!
 [ 1355.149322] Internal error: Oops - BUG: 0 [#1] SMP
 [ 1355.149322] Modules linked in:
 [ 1355.154479] CPU: 0Not tainted  (3.3.0 #8)
 [ 1355.162109] PC is at btrfs_balance+0x312/0xb04
 [ 1355.166778] LR is at btrfs_run_delayed_iputs+0x2d/0xac
 [ 1355.166931] pc : [c0138c3a]lr : [c01234d5]psr: 6033
 [ 1355.166931] sp : cb141d98  ip :   fp : be83fdb4
 [ 1355.166931] r10:   r9 :   r8 : 
 [ 1355.184173] r7 :   r6 : ffef  r5 : ede7f000  r4 : ed730e00
 [ 1355.189636] r3 :   r2 :   r1 :   r0 : 0007
 [ 1355.203277] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment 
 user
 [ 1355.203277] Control: 50c5387d  Table: 8b15c04a  DAC: 0015
 [ 1355.203277] Process btrfs (pid: 1798, stack limit = 0xcb1402f8)
 [ 1355.203277] Stack: (0xcb141d98 to 0xcb142000)
 [ 1355.203277] 1d80:   
 c145f944 
 [ 1355.227691] 1da0:     0003 ee478d40 
 0015 
 [ 1355.227691] 1dc0:       
  
 [ 1355.253356] 1de0:       
 ede7fcd4 ede7fcd8
 [ 1355.253356] 1e00:      271aee1c 
 000200da c0160fd5
 [ 1355.253356] 1e20: ed74ec00  d6257680  ed730e00  
 ede7f4e8 ede7fcb0
 [ 1355.279022] 1e40: ede7f000  be83fdb4 c013d489 eec9c118 be83ebf8 
 ed74ec00 eec8a370
 [ 1355.279022] 1e60: d6257680  eec8a528  be83fdb4 c013fc6b 
 001d 00eb
 [ 1355.279022] 1e80: 0007 0001 e6f7e680 c015cecd cb141ea4  
 cb141ef0 
 [ 1355.296142] 1ea0: cb15c000 01ff   0001  
 eeabaac0 0001
 [ 1355.296142] 1ec0: ed5428c0 c1414788 d6257688 00eb cb141ef0 c016011b 
 cb141ef0 
 [ 1355.321807] 1ee0: 0001 c016018f  0817 0001 271aee1c 
 ede92250 d6257680
 [ 1355.321807] 1f00: be83ebf8 be83ebf8 eec8a528  cb14  
 be83fdb4 c0088075
 [ 1355.321807] 1f20: 4000 c00887ff     
  
 [ 1355.338928] 1f40:      271aee1c 
 0003 d6257680
 [ 1355.338928] 1f60: be83ebf8 5000940c d6257680 be83ebf8 5000940c 0003 
  cb14
 [ 1355.364593] 1f80:  c008885d 0003  be83fec7 0003 
 0013c478 0036
 [ 1355.364593] 1fa0: c000c5a4 c000c401 be83fec7 0003 0003 5000940c 
 be83ebf8 be83fbf8
 [ 1355.364593] 1fc0: be83fec7 0003 0013c478 0036 0002 b7ad 
 0001 be83fdb4
 [ 1355.381713] 1fe0: 00024b3d be83ebf0 b7f7 b6ea7f9c 8010 0003 
 00052d17 00090224
 [ 1355.381713] [c0138c3a] (btrfs_balance+0x312/0xb04) from [c013d489] 
 (btrfs_ioctl_balance+0x109/0x174)
 [ 1355.381713] [c013d489] (btrfs_ioctl_balance+0x109/0x174) from 
 [c013fc6b] (btrfs_ioctl+0xbf5/0xd42)
 [ 1355.418518] [c013fc6b] (btrfs_ioctl+0xbf5/0xd42) from [c0088075] 
 (vfs_ioctl+0xd/0x28)
 [ 1355.418518] [c0088075] (vfs_ioctl+0xd/0x28) from [c00887ff] 
 (do_vfs_ioctl+0x35d/0x38e)
 [ 1355.427093] [c00887ff] (do_vfs_ioctl+0x35d/0x38e) from [c008885d] 
 (sys_ioctl+0x2d/0x44)
 [ 1355.88] [c008885d] (sys_ioctl+0x2d/0x44) from [c000c401] 
 (ret_fast_syscall+0x1/0x44)
 [ 1355.88] Code: d107 f116 0f11 d100 (de02) 4620 
 [ 1355.458343] ---[ end trace f06b6b8fcd08e6d5 ]---
 
 A new 'btrfs filesystem balance /' seems to just hang, and is
 unkillable.
 
 After a reboot, I tried again, with the same result:
 
 [   81.048767] [ cut here ]
 [   81.053619] kernel BUG at fs/btrfs/volumes.c:2733!
 [   81.053619] Internal error: Oops - BUG: 0 [#1] SMP
 [   81.059295] Modules linked in:
 [   81.059295] CPU: 1Not tainted  (3.3.0 #8)
 [   81.071411] PC is at btrfs_balance+0x312/0xb04
 [   81.074890] LR is at btrfs_run_delayed_iputs+0x2d/0xac
 [   81.074890] pc : [c0138c3a]lr : [c01234d5]psr: 6133
 [   81.074890] sp : edda5d98  ip :   fp : beb62d64
 [   81.093475] r10:   r9 :   r8 : 
 [   81.098327] r7 :   r6 : ffef  r5 : ed73f000  r4 : ee311c00
 [   81.098327] r3 :   r2 :   r1 :   r0 : 0007
 [   81.112609] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment 
 user
 [   81.112609] Control: 50c5387d  Table: a7fb404a  DAC: 0015
 [   81.112609] Process btrfs (pid: 752, stack limit = 0xedda42f8)
 [   81.132354] Stack: (0xedda5d98 to 0xedda6000)
 [   81.132354] 5d80:   
 c145f944 
 [   81.145477] 5da0:     

Re: kernel BUG at fs/btrfs/volumes.c:2733

2012-03-29 Thread Ilya Dryomov
On Thu, Mar 29, 2012 at 05:14:22PM +0200, Sander wrote:
 Ilya Dryomov wrote (ao):
  On Thu, Mar 29, 2012 at 12:52:35PM +0200, Sander wrote:
   After a reboot, I tried again, with the same result:
   
   [   81.048767] [ cut here ]
   [   81.053619] kernel BUG at fs/btrfs/volumes.c:2733!
   [   81.053619] Internal error: Oops - BUG: 0 [#1] SMP
   [   81.059295] Modules linked in:
   [   81.059295] CPU: 1Not tainted  (3.3.0 #8)
   [   81.071411] PC is at btrfs_balance+0x312/0xb04
   [   81.074890] LR is at btrfs_run_delayed_iputs+0x2d/0xac
 
  So you have balance item on disk, but the kernel doesn't seem to know
  about it in advance, which is odd and so when you try to run balance it
  panics on one of the safety checks.
  
   The system is a pandaboard running a plain Linus kernel 3.3.0 with a
   btrfs filesystem, over two Intel 320 600GB ssd's, connected via usb (on
   an usb hub), on top of md_crypt. Mount options:
   subvol=rootvolume,space_cache,inode_cache,compress=lzo,ssd
   
   Before the balance, I deleted about 2500 snapshots and waited for the
   btrfs kernel threads to calm down. Then I initiated a btrfs filesystem
   scrub. Unfortunately during the scrub, the filesystem balance started.
   Might be related.
  
  That's indeed pretty cool, I wonder how that could happen.
 
 I create 5 snapshots of 5 different subvolumes every 5 minutes, and the
 system is low on memory:
 
  total   used   free sharedbuffers cached
 Mem:   745712 33  0  0480
 -/+ buffers/cache:231514
 Swap:0  0  0
 
 There is ample space on the fileystem:
 panda:~# df -h /
 Filesystem  Size  Used Avail Use% 
 Mounted on
 /dev/mapper/ata-INTEL_SSDSA2CW600G3_CVPR112405AJ600FGN  1.1T   17G  1.1T   2% 
 /
 
 panda:~# btrfs filesystem df /
 Data, RAID0: total=24.00GB, used=15.69GB
 System, RAID1: total=64.00MB, used=12.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID1: total=23.00GB, used=231.26MB
 
 Do you need more information?

No, that's enough for now.

 
  I'm definitely intrested in reproducing it. Could you please umount this
  filesystem, capture the output of 'btrfs-debug-tree -d dev' and post it
  somewhere ?
 
 Will do. It is the / filesystem, so I'll need to reboot.

I need this to confirm that balance item is on disk.

 
  After that mount it back and see if there is btrfs: continuing
  balance line in dmesg (and if btrfs-balance kthread shows up)?
 
 There was none after the first reboot, but I'll pay extra attention to
 that after the next reboot.
 
  If so, just let it run, it should finish the balance and remove
  on-disk item. (You can query the status of running balance with 'btrfs
  balance status mnt')
 
 Do I need newer tools for that? This is Debian Sid (unstable):

Yeah, you do.  That command is in master now, but it's not really
needed.  If btrfs-balance shows up, just wait for it to finish, it
should get rid of the balance item.  If it doesn't show up but the item
is there we will have to dig deeper.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at fs/btrfs/volumes.c:2733

2012-03-29 Thread Ilya Dryomov
On Thu, Mar 29, 2012 at 04:39:29PM +0200, Sander wrote:
 Hello Josef,
 
 Josef Bacik wrote (ao):
  On Thu, Mar 29, 2012 at 12:52:35PM +0200, Sander wrote:
   I can't seem to balance my btrfs filesystem. It segfaults, and gives a
   kernel bug:
   
   [ 1355.139099] [ cut here ]
   [ 1355.139099] kernel BUG at fs/btrfs/volumes.c:2733!
   [ 1355.149322] Internal error: Oops - BUG: 0 [#1] SMP
   [ 1355.149322] Modules linked in:
   [ 1355.154479] CPU: 0Not tainted  (3.3.0 #8)
   [ 1355.162109] PC is at btrfs_balance+0x312/0xb04
   [ 1355.166778] LR is at btrfs_run_delayed_iputs+0x2d/0xac
 
   The system is a pandaboard running a plain Linus kernel 3.3.0 with a
   btrfs filesystem, over two Intel 320 600GB ssd's, connected via usb (on
   an usb hub), on top of md_crypt. Mount options:
   subvol=rootvolume,space_cache,inode_cache,compress=lzo,ssd
   
   Before the balance, I deleted about 2500 snapshots and waited for the
   btrfs kernel threads to calm down. Then I initiated a btrfs filesystem
   scrub. Unfortunately during the scrub, the filesystem balance started.
   Might be related.
  
  Well that's kind of cool.  So 2 options
  
  1) If you are in a hurry and need this stuff back right away run btrfs fi
  balance resume / and it should work, buuutt
  
  2) If you aren't in a hurry I'd really like to try and reproduce this 
  locally
  and if I can't I'd like to be able to send you patches to help me figure 
  out how
  to fix this problem.
 
 I am in no hurry at all. The filesystem seems just fine the way it is
 (after a reboot), so there is no stuff to get back right away. Does
 the kernel bug suggest the filesystem is fubar?

No, as I said in another mail you are trapping over a simle sanity
check.  FS should be OK.

Thanks,

Ilya
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/8] Restriper fixes

2012-03-27 Thread Ilya Dryomov
Hi Chris,

The main one here is the improvement to btrfs_can_relocate(), which is
now a tiny bit smarter and does not return ENOSPC when there's plenty of
unallocated space for target chunks.  This, in addition to my patch
which disables silent profile upgrades, should lower a number of
corner cases in profile changing.

The rest are a bunch of cleanups and some minor fixes.

Please pull from

git://github.com/idryomov/btrfs-unstable.git for-chris

top commit 213e64da90d14537cd63f7090d6c4d1fcc75d9f8

Thanks,

Ilya


Ilya Dryomov (8):
  Btrfs: add wrappers for working with alloc profiles
  Btrfs: make profile_is_valid() check more strict
  Btrfs: move alloc_profile_is_valid() to volumes.c
  Btrfs: add get_restripe_target() helper
  Btrfs: add __get_block_group_index() helper
  Btrfs: improve the logic in btrfs_can_relocate()
  Btrfs: validate target profiles only if we are going to use them
  Btrfs: allow dup for data chunks in mixed mode

 fs/btrfs/ctree.h   |   33 +--
 fs/btrfs/extent-tree.c |  158 ++--
 fs/btrfs/volumes.c |   88 ---
 3 files changed, 152 insertions(+), 127 deletions(-)

-- 
1.7.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   >