from:"Ilya Dryomov"

Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held

2016-03-30 Thread Ilya Dryomov

Hi,

We are hitting the attached lockup on a somewhat regular basis during
nightly tests.  Looks like a bunch of CPUs spin in find_free_extent()
on btrfs_free_cluster::lock, which is held by writer, who seems to be
stuck in an endless loop in btrfs_find_space_cluster(), trying to
cleanup bitmaps list.  Smells like a list corruption to me?

The kernel is ancient in btrfs terms - ubuntu's 3.13.0-83-generic, but
the surroundings look sufficiently similar to upstream and given recent
patches like 1b9b922a3a60 ("Btrfs: check for empty bitmap list in
setup_cluster_bitmaps") I thought this might be relevant for upstream.

Thanks,

Ilya
[74750.641965] CPU: 6 PID: 12768 Comm: btrfs-transacti Not tainted 
3.13.0-83-generic #127-Ubuntu
[74750.650976] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 
09/07/2015
[74750.658951] task: 88074b0e8000 ti: 88083c564000 task.ti: 
88083c564000
[74750.666936] RIP: 0010:[]  [] 
_raw_spin_lock+0x37/0x50
[74750.675645] RSP: 0018:88083c565b50  EFLAGS: 0206
[74750.681469] RAX: 18ca RBX: 88083c565b48 RCX: 09a0
[74750.689126] RDX: 09a6 RSI: 09a6 RDI: 88084f6671b0
[74750.696784] RBP: 88083c565b50 R08: 88083c565cef R09: 0001
[74750.704466] R10: a019c9e6 R11: ea001ff7f6c0 R12: 8807e829d4e0
[74750.712140] R13: a02073ab R14: 88083c565ae0 R15: 8807e829d4e0
[74750.719811] FS:  () GS:88087fd8() 
knlGS:
[74750.728450] CS:  0010 DS:  ES:  CR0: 80050033
[74750.734744] CR2: 7f4d2b6a9060 CR3: 01c0e000 CR4: 001407e0
[74750.742437] Stack:
[74750.745001]  88083c565c28 a01b07a3 88074873a000 
880530131d20
[74750.753033]  880791982000 88083c565cef 1000 
00040002
[74750.761065]  a01a3a97 0020 880523f4a800 
0001
[74750.769105] Call Trace:
[74750.772137]  [] find_free_extent+0x213/0xc30 [btrfs]
[74750.779259]  [] ? btrfs_del_items+0x367/0x470 [btrfs]
[74750.786476]  [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
[74750.793869]  [] __btrfs_prealloc_file_range+0xe5/0x380 
[btrfs]
[74750.801876]  [] btrfs_prealloc_file_range_trans+0x30/0x40 
[btrfs]
[74750.810136]  [] btrfs_write_dirty_block_groups+0x4d3/0x620 
[btrfs]
[74750.818469]  [] commit_cowonly_roots+0x151/0x213 [btrfs]
[74750.825940]  [] btrfs_commit_transaction+0x483/0x970 
[btrfs]
[74750.833765]  [] transaction_kthread+0x1b5/0x240 [btrfs]
[74750.841156]  [] ? btrfs_cleanup_transaction+0x550/0x550 
[btrfs]
[74750.849244]  [] kthread+0xd2/0xf0
[74750.854729]  [] ? kthread_create_on_node+0x1c0/0x1c0
[74750.861861]  [] ret_from_fork+0x58/0x90
[74750.867868]  [] ? kthread_create_on_node+0x1c0/0x1c0

[74758.651947] CPU: 5 PID: 13299 Comm: kworker/u16:1 Not tainted 
3.13.0-83-generic #127-Ubuntu
[74758.662554] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 
09/07/2015
[74758.672309] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-15)
[74758.681114] task: 880734153000 ti: 88078ad98000 task.ti: 
88078ad98000
[74758.690802] RIP: 0010:[]  [] 
_raw_spin_lock+0x32/0x50
[74758.701202] RSP: 0018:88078ad99728  EFLAGS: 0202
[74758.708662] RAX: 4bf5 RBX: 811f0610 RCX: 09a0
[74758.717910] RDX: 09a4 RSI: 09a4 RDI: 88084f6671b0
[74758.727114] RBP: 88078ad99728 R08: 88078ad998b7 R09: 0001
[74758.736311] R10:  R11: ea001b38bc00 R12: 88084e918770
[74758.745503] R13: 88084e9188c0 R14: 8114f8ee R15: 88078ad99698
[74758.754665] FS:  () GS:88087fd4() 
knlGS:
[74758.764805] CS:  0010 DS:  ES:  CR0: 80050033
[74758.772582] CR2: 7f4d2b695000 CR3: 01c0e000 CR4: 001407e0
[74758.781766] Stack:
[74758.785786]  88078ad99800 a01b07a3 8807e6bc5000 
880853034e70
[74758.795307]  88078ad997b8 88078ad998b7 880853034e40 
00040002
[74758.804838]  88078ad99778 0020 880523f4a800 
0001
[74758.814337] Call Trace:
[74758.818811]  [] find_free_extent+0x213/0xc30 [btrfs]
[74758.827389]  [] ? alloc_extent_state+0x21/0xc0 [btrfs]
[74758.836147]  [] ? __lookup_extent_mapping+0xa0/0x150 
[btrfs]
[74758.845413]  [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
[74758.854224]  [] cow_file_range+0x135/0x430 [btrfs]
[74758.862593]  [] run_delalloc_range+0x312/0x350 [btrfs]
[74758.871310]  [] ? 
find_lock_delalloc_range.constprop.43+0x1b9/0x1f0 [btrfs]
[74758.881857]  [] ? ata_scsi_queuecmd+0x133/0x400
[74758.889974]  [] __extent_writepage+0x2f4/0x760 [btrfs]
[74758.898703]  [] ? btrfs_add_delayed_iput+0x61/0xc0 [btrfs]
[74758.907763]  [] ? __blk_run_queue+0x33/0x40
[74758.915511]  [] ? find_get_pages_tag+0xd1/0x180
[74758.923600]  [] ? kmem_cache_alloc_trace+0x3c/0x1f0
[74758.932051]  [] 
ext

Re: Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held

2016-03-30 Thread Ilya Dryomov

On Wed, Mar 30, 2016 at 7:25 PM, Liu Bo  wrote:
> On Wed, Mar 30, 2016 at 05:24:04PM +0200, Ilya Dryomov wrote:
>> Hi,
>>
>> We are hitting the attached lockup on a somewhat regular basis during
>> nightly tests.  Looks like a bunch of CPUs spin in find_free_extent()
>> on btrfs_free_cluster::lock, which is held by writer, who seems to be
>> stuck in an endless loop in btrfs_find_space_cluster(), trying to
>> cleanup bitmaps list.  Smells like a list corruption to me?
>
> My objdump shows that find_free_extent() may wait on 
> down_read(&space_info->groups_sem);

It's spinning on a spinlock:

6228 if (last_ptr) {
6229 spin_lock(&last_ptr->lock);
6230 if (last_ptr->block_group)

>
> One possible thing is that there're too many entries in bitmap, and
> list_for_each_entry just is just stuck there.

I don't think so - look at the two journal_write splats.

>
> Are these stacks from "Blocked more than 120s"?

No, these are all soft lockups.  Sorry, I had to edit it to make it
readable and concentrated on stack traces.  I guess "BUG: soft lockup"
were mixed up with modules and object code all over the place.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Possible recursive locking on btrfs_inode::log_mutex

2016-08-20 Thread Ilya Dryomov

Hi,

I'm seeing the attached lockdep splat on 4.8-rc2 - I'm guessing it got
introduced in 4.8-rc.  It appears to be very easy to trigger and shows up
within seconds after mkfs + mount.

Thanks,

Ilya
kernel: =
kernel: [ INFO: possible recursive locking detected ]
kernel: 4.8.0-rc2-ceph-00297-gab9f8d2 #1 Not tainted
kernel: -
kernel: ceph-osd/25016 is trying to acquire lock:
kernel: (&ei->log_mutex){+.+...}, at: [] 
btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: #012but task is already holding lock:
kernel: (&ei->log_mutex){+.+...}, at: [] 
btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: #012other info that might help us debug this:
kernel: Possible unsafe locking scenario:
kernel:   CPU0
kernel:   
kernel:  lock(&ei->log_mutex);
kernel:  lock(&ei->log_mutex);
kernel: #012 *** DEADLOCK ***
kernel: May be due to missing lock nesting notation
kernel: 6 locks held by ceph-osd/25016:
kernel: #0:  (sb_writers#15){.+.+.+}, at: [] 
__sb_start_write+0xad/0xe0
kernel: #1:  (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [] 
lock_rename+0x2e/0x100
kernel: #2:  (&sb->s_type->i_mutex_key#19){+.+.+.}, at: [] 
lock_two_nondirectories+0x3e/0x70
kernel: #3:  (&sb->s_type->i_mutex_key#19/4){+.+...}, at: [] 
lock_two_nondirectories+0x66/0x70
kernel: #4:  (sb_internal#2){.+.+.+}, at: [] 
__sb_start_write+0x7b/0xe0
kernel: #5:  (&ei->log_mutex){+.+...}, at: [] 
btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: #012stack backtrace:
kernel: CPU: 0 PID: 25016 Comm: ceph-osd Not tainted 
4.8.0-rc2-ceph-00297-gab9f8d2 #1
kernel: Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.0c 02/25/2010
kernel: 0086 668dbca4 88042a277870 81413e51
kernel: 828f4630 828f4630 88042a277930 811069c0
kernel: dfc5c151 8804 4efe3813430e32cf 880429f38000
kernel: Call Trace:
kernel: [] dump_stack+0x85/0xc4
kernel: [] __lock_acquire+0xce0/0x1680
kernel: [] ? mutex_unlock+0xe/0x10
kernel: [] lock_acquire+0x197/0x1f0
kernel: [] ? btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: [] mutex_lock_nested+0x79/0x370
kernel: [] ? btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: [] btrfs_log_inode+0x17e/0xb50 [btrfs]
kernel: [] ? __might_sleep+0x70/0x90
kernel: [] ? btrfs_i_callback+0x20/0x20 [btrfs]
kernel: [] ? btrfs_invalidate_inodes+0x198/0x1f0 [btrfs]
kernel: [] ? _raw_spin_unlock+0x27/0x40
kernel: [] btrfs_log_inode+0x534/0xb50 [btrfs]
kernel: [] ? trace_hardirqs_on_caller+0x178/0x1d0
kernel: [] btrfs_log_inode_parent+0x881/0x970 [btrfs]
kernel: [] ? _raw_spin_unlock+0x27/0x40
kernel: [] ? btrfs_update_inode+0xd7/0x110 [btrfs]
kernel: [] btrfs_log_new_name+0x7b/0x80 [btrfs]
kernel: [] btrfs_rename2+0xfd2/0x16c0 [btrfs]
kernel: [] ? lock_two_nondirectories+0x66/0x70
kernel: [] ? down_write_nested+0x4f/0x80
kernel: [] ? lock_two_nondirectories+0x66/0x70
kernel: [] vfs_rename+0x467/0x830
kernel: [] ? read_seqcount_begin+0x90/0xa0
kernel: [] SyS_rename+0x1ee/0x3c0
kernel: [] do_syscall_64+0x7a/0x1c0
kernel: [] entry_SYSCALL64_slow_path+0x25/0x25

Another 4.8-rc locked splat: btrfs_close_devices()

2016-09-08 Thread Ilya Dryomov

Hello,

This one seems to have appeared after Anand's commit
142388194191 ("btrfs: do not background blkdev_put()") got merged into
4.8-rc4.

Thanks,

Ilya
[  983.284212] ==
[  983.290401] [ INFO: possible circular locking dependency detected ]
[  983.296677] 4.8.0-rc5-ceph-00023-g1b39cec2 #1 Not tainted
[  983.302081] ---
[  983.308357] umount/21720 is trying to acquire lock:
[  983.313243]  (&bdev->bd_mutex){+.+.+.}, at: [] 
blkdev_put+0x31/0x150
[  983.321264] 
[  983.321264] but task is already holding lock:
[  983.327101]  (&fs_devs->device_list_mutex){+.+...}, at: [] 
__btrfs_close_devices+0x46/0x200 [btrfs]
[  983.337839] 
[  983.337839] which lock already depends on the new lock.
[  983.337839] 
[  983.346024] 
[  983.346024] the existing dependency chain (in reverse order) is:
[  983.353512] 
-> #4 (&fs_devs->device_list_mutex){+.+...}:
[  983.359096][] lock_acquire+0x1bc/0x1f0
[  983.365143][] mutex_lock_nested+0x65/0x350
[  983.371521][] btrfs_show_devname+0x36/0x1f0 [btrfs]
[  983.378710][] show_vfsmnt+0x4e/0x150
[  983.384593][] m_show+0x17/0x20
[  983.389957][] seq_read+0x2b5/0x3b0
[  983.395669][] __vfs_read+0x28/0x100
[  983.401464][] vfs_read+0xab/0x150
[  983.407080][] SyS_read+0x52/0xb0
[  983.412609][] entry_SYSCALL_64_fastpath+0x23/0xc1
[  983.419617] 
-> #3 (namespace_sem){++}:
[  983.424024][] lock_acquire+0x1bc/0x1f0
[  983.430074][] down_write+0x49/0x80
[  983.435785][] lock_mount+0x67/0x1c0
[  983.441582][] do_add_mount+0x32/0xf0
[  983.447458][] finish_automount+0x5a/0xc0
[  983.453682][] follow_managed+0x1b3/0x2a0
[  983.459912][] lookup_fast+0x300/0x350
[  983.465875][] path_openat+0x3a7/0xaa0
[  983.471846][] do_filp_open+0x85/0xe0
[  983.477731][] do_sys_open+0x14c/0x1f0
[  983.483702][] SyS_open+0x1e/0x20
[  983.489240][] entry_SYSCALL_64_fastpath+0x23/0xc1
[  983.496254] 
-> #2 (&sb->s_type->i_mutex_key#3){+.+.+.}:
[  983.501798][] lock_acquire+0x1bc/0x1f0
[  983.507855][] down_write+0x49/0x80
[  983.513558][] start_creating+0x87/0x100
[  983.519703][] debugfs_create_dir+0x17/0x100
[  983.526195][] bdi_register+0x93/0x210
[  983.532165][] bdi_register_owner+0x43/0x70
[  983.538570][] device_add_disk+0x1fb/0x450
[  983.544888][] loop_add+0x1e6/0x290
[  983.550596][] loop_init+0x10b/0x14f
[  983.556394][] do_one_initcall+0xa7/0x180
[  983.562618][] kernel_init_freeable+0x1cc/0x266
[  983.569370][] kernel_init+0xe/0x100
[  983.575166][] ret_from_fork+0x1f/0x40
[  983.581131] 
-> #1 (loop_index_mutex){+.+.+.}:
[  983.585801][] lock_acquire+0x1bc/0x1f0
[  983.591858][] mutex_lock_nested+0x65/0x350
[  983.598256][] lo_open+0x1f/0x60
[  983.603704][] __blkdev_get+0x123/0x400
[  983.609757][] blkdev_get+0x34a/0x350
[  983.615639][] blkdev_open+0x64/0x80
[  983.621428][] do_dentry_open+0x1c6/0x2d0
[  983.627651][] vfs_open+0x69/0x80
[  983.633181][] path_openat+0x834/0xaa0
[  983.639152][] do_filp_open+0x85/0xe0
[  983.645035][] do_sys_open+0x14c/0x1f0
[  983.650999][] SyS_open+0x1e/0x20
[  983.656535][] entry_SYSCALL_64_fastpath+0x23/0xc1
[  983.663541] 
-> #0 (&bdev->bd_mutex){+.+.+.}:
[  983.668107][] __lock_acquire+0x1003/0x17b0
[  983.674510][] lock_acquire+0x1bc/0x1f0
[  983.680561][] mutex_lock_nested+0x65/0x350
[  983.686967][] blkdev_put+0x31/0x150
[  983.692761][] btrfs_close_bdev+0x4f/0x60 [btrfs]
[  983.699699][] __btrfs_close_devices+0xcb/0x200 
[btrfs]
[  983.707178][] btrfs_close_devices+0x2b/0xa0 [btrfs]
[  983.714380][] close_ctree+0x265/0x340 [btrfs]
[  983.721061][] btrfs_put_super+0x19/0x20 [btrfs]
[  983.727908][] generic_shutdown_super+0x6f/0x100
[  983.734744][] kill_anon_super+0x16/0x30
[  983.740888][] btrfs_kill_super+0x1e/0x130 [btrfs]
[  983.747909][] deactivate_locked_super+0x49/0x80
[  983.754745][] deactivate_super+0x5d/0x70
[  983.760977][] cleanup_mnt+0x5c/0x80
[  983.766773][] __cleanup_mnt+0x12/0x20
[  983.772738][] task_work_run+0x7e/0xc0
[  983.778708][] exit_to_usermode_loop+0x7e/0xb4
[  983.785373][] syscall_return_slowpath+0xbb/0xd0
[  983.792212][] entry_SYSCALL_64_fastpath+0xbf/0xc1
[  983.799225] 
[  983.799225] other info that might help us debug this:
[  983.799225] 
[  983.807291] Chain exists of:
  &bdev->bd_mutex --> namespace_sem --> &fs_devs->device_list_mutex

[  983.816521]  Possible unsafe locking scenario:
[  983.816521] 
[  983.822489]CPU0CPU1
[

Re: btrfs balance resume + raid5/6

2013-06-22 Thread Ilya Dryomov

On Sun, Jun 16, 2013 at 06:00:10AM +, Tim Castle wrote:
> Greetings! 
> 
> I'm testing raid6, and recently added two drives. 
> I haven't been able to properly resume a balance operation: the number of
> total chunks is always too low.  
> 
> It seems that the balance starts and pauses properly, but always resumes
> with ~7 chunks. 

Hi Tim,

This is the expected behaviour.  Balance does not actually remember
which chunks it had processed before it was paused, instead it uses a
heuristic which is based on the amount of space used out of each chunk.
As you have an almost full filesystem, all of your chunks, except for
those ~7, are fully (or almost fully) used, and balance resume simply
skips them.  If you really want to balance everything (why?) you would
have to run balance to completion, w/o pausing it.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: num_tolerated_disk_barrier_failures is incorrect for RAID6

2013-07-10 Thread Ilya Dryomov

Currently num_tolerated_disk_barrier_failures gets the value of
fs_devices->num_devices in the RAID6 case.  But, RAID6 can tolerate only
two simultaneous failures, so set it to 2.

CC: Stefan Behrens 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/disk-io.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b8b60b6..aecf788 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3258,7 +3258,7 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
BTRFS_BLOCK_GROUP_RAID10)) {

num_tolerated_disk_barrier_failures = 1;
} else if (flags &
-  BTRFS_BLOCK_GROUP_RAID5) {
+  BTRFS_BLOCK_GROUP_RAID6) {

num_tolerated_disk_barrier_failures = 2;
}
}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/4] Btrfs: rollback btrfs_device fields on umount

2013-08-12 Thread Ilya Dryomov

It turns out we don't properly rollback in-core btrfs_device state on
umount.  We zero out ->bdev, ->in_fs_metadata and that's about it.  In
particular, we don't zero out ->generation, and this can lead to us
refusing a mount -- a non-NULL fs_devices->latest_bdev is essential, but
btrfs_close_extra_devices will happily assign NULL to ->latest_bdev if
the first device on the dev_list happens to be missing and consequently
has no bdev attached.  This happens because since commit a6b0d5c8
btrfs_close_extra_devices adjusts ->latest_bdev, and in doing that,
relies on the ->generation.  Fix this, and possibly other problems, by
zeroing out everything except for what device_list_add sets, so that a
mount right after insmod and 'btrfs dev scan' is no different from any
later mount in this respect.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |   15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index af17b17..2f6bc12 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -673,22 +673,19 @@ static int __btrfs_close_devices(struct btrfs_fs_devices 
*fs_devices)
if (device->can_discard)
fs_devices->num_can_discard--;
 
-   new_device = kmalloc(sizeof(*new_device), GFP_NOFS);
-   BUG_ON(!new_device); /* -ENOMEM */
-   memcpy(new_device, device, sizeof(*new_device));
+   new_device = btrfs_alloc_device(NULL, &device->devid,
+   device->uuid);
+   BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
 
/* Safe because we are under uuid_mutex */
if (device->name) {
name = rcu_string_strdup(device->name->str, GFP_NOFS);
-   BUG_ON(device->name && !name); /* -ENOMEM */
+   BUG_ON(!name); /* -ENOMEM */
rcu_assign_pointer(new_device->name, name);
}
-   new_device->bdev = NULL;
-   new_device->writeable = 0;
-   new_device->in_fs_metadata = 0;
-   new_device->can_discard = 0;
-   spin_lock_init(&new_device->io_lock);
+
list_replace_rcu(&device->dev_list, &new_device->dev_list);
+   new_device->fs_devices = device->fs_devices;
 
call_rcu(&device->rcu, free_device);
}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/4] Btrfs: add btrfs_alloc_device and switch to it

2013-08-12 Thread Ilya Dryomov

Currently btrfs_device is allocated ad-hoc in a few different places,
and as a result not all fields are initialized properly.  In particular,
readahead state is only initialized in device_list_add (at scan time),
and not in btrfs_init_new_device (when the new device is added with
'btrfs dev add').  Fix this by adding an allocation helper and switch
everybody but __btrfs_close_devices to it.  (__btrfs_close_devices is
dealt with in a later commit.)

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |  150 
 fs/btrfs/volumes.h |3 ++
 2 files changed, 96 insertions(+), 57 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ae1bcb0..fe52704 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -101,6 +101,27 @@ void btrfs_cleanup_fs_uuids(void)
}
 }
 
+static struct btrfs_device *__alloc_device(void)
+{
+   struct btrfs_device *dev;
+
+   dev = kzalloc(sizeof(*dev), GFP_NOFS);
+   if (!dev)
+   return ERR_PTR(-ENOMEM);
+
+   INIT_LIST_HEAD(&dev->dev_list);
+   INIT_LIST_HEAD(&dev->dev_alloc_list);
+
+   spin_lock_init(&dev->io_lock);
+
+   spin_lock_init(&dev->reada_lock);
+   atomic_set(&dev->reada_in_flight, 0);
+   INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
+   INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
+
+   return dev;
+}
+
 static noinline struct btrfs_device *__find_device(struct list_head *head,
   u64 devid, u8 *uuid)
 {
@@ -414,17 +435,12 @@ static noinline int device_list_add(const char *path,
if (fs_devices->opened)
return -EBUSY;
 
-   device = kzalloc(sizeof(*device), GFP_NOFS);
-   if (!device) {
+   device = btrfs_alloc_device(NULL, &devid,
+   disk_super->dev_item.uuid);
+   if (IS_ERR(device)) {
/* we can safely leave the fs_devices entry around */
-   return -ENOMEM;
+   return PTR_ERR(device);
}
-   device->devid = devid;
-   device->dev_stats_valid = 0;
-   device->work.func = pending_bios_fn;
-   memcpy(device->uuid, disk_super->dev_item.uuid,
-  BTRFS_UUID_SIZE);
-   spin_lock_init(&device->io_lock);
 
name = rcu_string_strdup(path, GFP_NOFS);
if (!name) {
@@ -432,15 +448,6 @@ static noinline int device_list_add(const char *path,
return -ENOMEM;
}
rcu_assign_pointer(device->name, name);
-   INIT_LIST_HEAD(&device->dev_alloc_list);
-
-   /* init readahead state */
-   spin_lock_init(&device->reada_lock);
-   device->reada_curr_zone = NULL;
-   atomic_set(&device->reada_in_flight, 0);
-   device->reada_next = 0;
-   INIT_RADIX_TREE(&device->reada_zones, GFP_NOFS & ~__GFP_WAIT);
-   INIT_RADIX_TREE(&device->reada_extents, GFP_NOFS & ~__GFP_WAIT);
 
mutex_lock(&fs_devices->device_list_mutex);
list_add_rcu(&device->dev_list, &fs_devices->devices);
@@ -491,8 +498,9 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
list_for_each_entry(orig_dev, &orig->devices, dev_list) {
struct rcu_string *name;
 
-   device = kzalloc(sizeof(*device), GFP_NOFS);
-   if (!device)
+   device = btrfs_alloc_device(NULL, &orig_dev->devid,
+   orig_dev->uuid);
+   if (IS_ERR(device))
goto error;
 
/*
@@ -506,13 +514,6 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
}
rcu_assign_pointer(device->name, name);
 
-   device->devid = orig_dev->devid;
-   device->work.func = pending_bios_fn;
-   memcpy(device->uuid, orig_dev->uuid, sizeof(device->uuid));
-   spin_lock_init(&device->io_lock);
-   INIT_LIST_HEAD(&device->dev_list);
-   INIT_LIST_HEAD(&device->dev_alloc_list);
-
list_add(&device->dev_list, &fs_devices->devices);
device->fs_devices = fs_devices;
fs_devices->num_devices++;
@@ -1956,10 +1957,10 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
}
mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 
-

[PATCH 3/4] Btrfs: add alloc_fs_devices and switch to it

2013-08-12 Thread Ilya Dryomov

In the spirit of btrfs_alloc_device, add a helper for allocating and
doing some common initialization of btrfs_fs_devices struct.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |   71 +++-
 1 file changed, 53 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fe52704..af17b17 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -62,6 +62,48 @@ static void unlock_chunks(struct btrfs_root *root)
mutex_unlock(&root->fs_info->chunk_mutex);
 }
 
+static struct btrfs_fs_devices *__alloc_fs_devices(void)
+{
+   struct btrfs_fs_devices *fs_devs;
+
+   fs_devs = kzalloc(sizeof(*fs_devs), GFP_NOFS);
+   if (!fs_devs)
+   return ERR_PTR(-ENOMEM);
+
+   mutex_init(&fs_devs->device_list_mutex);
+
+   INIT_LIST_HEAD(&fs_devs->devices);
+   INIT_LIST_HEAD(&fs_devs->alloc_list);
+   INIT_LIST_HEAD(&fs_devs->list);
+
+   return fs_devs;
+}
+
+/**
+ * alloc_fs_devices - allocate struct btrfs_fs_devices
+ * @fsid:  a pointer to UUID for this FS.  If NULL a new UUID is
+ * generated.
+ *
+ * Return: a pointer to a new &struct btrfs_fs_devices on success;
+ * ERR_PTR() on error.  Returned struct is not linked onto any lists and
+ * can be destroyed with kfree() right away.
+ */
+static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
+{
+   struct btrfs_fs_devices *fs_devs;
+
+   fs_devs = __alloc_fs_devices();
+   if (IS_ERR(fs_devs))
+   return fs_devs;
+
+   if (fsid)
+   memcpy(fs_devs->fsid, fsid, BTRFS_FSID_SIZE);
+   else
+   generate_random_uuid(fs_devs->fsid);
+
+   return fs_devs;
+}
+
 static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
 {
struct btrfs_device *device;
@@ -416,16 +458,14 @@ static noinline int device_list_add(const char *path,
 
fs_devices = find_fsid(disk_super->fsid);
if (!fs_devices) {
-   fs_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS);
-   if (!fs_devices)
-   return -ENOMEM;
-   INIT_LIST_HEAD(&fs_devices->devices);
-   INIT_LIST_HEAD(&fs_devices->alloc_list);
+   fs_devices = alloc_fs_devices(disk_super->fsid);
+   if (IS_ERR(fs_devices))
+   return PTR_ERR(fs_devices);
+
list_add(&fs_devices->list, &fs_uuids);
-   memcpy(fs_devices->fsid, disk_super->fsid, BTRFS_FSID_SIZE);
fs_devices->latest_devid = devid;
fs_devices->latest_trans = found_transid;
-   mutex_init(&fs_devices->device_list_mutex);
+
device = NULL;
} else {
device = __find_device(&fs_devices->devices, devid,
@@ -481,18 +521,13 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
struct btrfs_device *device;
struct btrfs_device *orig_dev;
 
-   fs_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS);
-   if (!fs_devices)
-   return ERR_PTR(-ENOMEM);
+   fs_devices = alloc_fs_devices(orig->fsid);
+   if (IS_ERR(fs_devices))
+   return fs_devices;
 
-   INIT_LIST_HEAD(&fs_devices->devices);
-   INIT_LIST_HEAD(&fs_devices->alloc_list);
-   INIT_LIST_HEAD(&fs_devices->list);
-   mutex_init(&fs_devices->device_list_mutex);
fs_devices->latest_devid = orig->latest_devid;
fs_devices->latest_trans = orig->latest_trans;
fs_devices->total_devices = orig->total_devices;
-   memcpy(fs_devices->fsid, orig->fsid, sizeof(fs_devices->fsid));
 
/* We have held the volume lock, it is safe to get the devices. */
list_for_each_entry(orig_dev, &orig->devices, dev_list) {
@@ -1794,9 +1829,9 @@ static int btrfs_prepare_sprout(struct btrfs_root *root)
if (!fs_devices->seeding)
return -EINVAL;
 
-   seed_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS);
-   if (!seed_devices)
-   return -ENOMEM;
+   seed_devices = __alloc_fs_devices();
+   if (IS_ERR(seed_devices))
+   return PTR_ERR(seed_devices);
 
old_devices = clone_fs_devices(fs_devices);
if (IS_ERR(old_devices)) {
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/4] btrfs_device fixes

2013-08-12 Thread Ilya Dryomov

Hello,

This patch series does two things:

- adds allocation helpers for struct btrfs_device and struct
  btrfs_fs_devices so that all the list_heads, spinlocks, etc are
  properly initialized and the code for that is in one place;

- fixes a bug in the umount sequence, which, under certain conditions,
  can lead to mount failures.

This is on top of 3.11-rc3, but applies cleanly to btrfs-next/master as
of now.  You can pull from:

git://github.com/idryomov/btrfs-unstable.git dev-fixes
 
Thanks,

Ilya


Ilya Dryomov (4):
  Btrfs: find_next_devid: root -> fs_info
  Btrfs: add btrfs_alloc_device and switch to it
  Btrfs: add alloc_fs_devices and switch to it
  Btrfs: rollback btrfs_device fields on umount

 fs/btrfs/volumes.c |  250 +---
 fs/btrfs/volumes.h |3 +
 2 files changed, 162 insertions(+), 91 deletions(-)

-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/4] Btrfs: find_next_devid: root -> fs_info

2013-08-12 Thread Ilya Dryomov

find_next_devid() knows which root to search, so it should take an
fs_info instead of an arbitrary root.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 78b8717..ae1bcb0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1307,15 +1307,14 @@ static u64 find_next_chunk(struct btrfs_fs_info 
*fs_info)
return ret;
 }
 
-static noinline int find_next_devid(struct btrfs_root *root, u64 *objectid)
+static noinline int find_next_devid(struct btrfs_fs_info *fs_info,
+   u64 *devid_ret)
 {
int ret;
struct btrfs_key key;
struct btrfs_key found_key;
struct btrfs_path *path;
 
-   root = root->fs_info->chunk_root;
-
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
@@ -1324,20 +1323,21 @@ static noinline int find_next_devid(struct btrfs_root 
*root, u64 *objectid)
key.type = BTRFS_DEV_ITEM_KEY;
key.offset = (u64)-1;
 
-   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+   ret = btrfs_search_slot(NULL, fs_info->chunk_root, &key, path, 0, 0);
if (ret < 0)
goto error;
 
BUG_ON(ret == 0); /* Corruption */
 
-   ret = btrfs_previous_item(root, path, BTRFS_DEV_ITEMS_OBJECTID,
+   ret = btrfs_previous_item(fs_info->chunk_root, path,
+ BTRFS_DEV_ITEMS_OBJECTID,
  BTRFS_DEV_ITEM_KEY);
if (ret) {
-   *objectid = 1;
+   *devid_ret = 1;
} else {
btrfs_item_key_to_cpu(path->nodes[0], &found_key,
  path->slots[0]);
-   *objectid = found_key.offset + 1;
+   *devid_ret = found_key.offset + 1;
}
ret = 0;
 error:
@@ -1971,7 +1971,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
}
rcu_assign_pointer(device->name, name);
 
-   ret = find_next_devid(root, &device->devid);
+   ret = find_next_devid(root->fs_info, &device->devid);
if (ret) {
rcu_string_free(device->name);
kfree(device);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/4] Btrfs: add btrfs_alloc_device and switch to it

2013-08-23 Thread Ilya Dryomov

On Fri, Aug 23, 2013 at 10:35 AM, Stefan Behrens
 wrote:
> This WARN_ON(1) is triggered with the device replace procedure because
> BTRFS_DEV_REPLACE_DEVID is zero.

Ah, a dreaded 0 in C.  What a screw up.  This came from my rebuild
branch where btrfs_init_dev_replace_tgtdev is a bit different, and I
just auto-merged it.  I'll fix it up.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 2/4] Btrfs: add btrfs_alloc_device and switch to it

2013-08-23 Thread Ilya Dryomov

Currently btrfs_device is allocated ad-hoc in a few different places,
and as a result not all fields are initialized properly.  In particular,
readahead state is only initialized in device_list_add (at scan time),
and not in btrfs_init_new_device (when the new device is added with
'btrfs dev add').  Fix this by adding an allocation helper and switch
everybody but __btrfs_close_devices to it.  (__btrfs_close_devices is
dealt with in a later commit.)

Signed-off-by: Ilya Dryomov 
---
v2:
- fixed a dev-replace regression spotted by Stefan Behrens

 fs/btrfs/volumes.c |  151 
 fs/btrfs/volumes.h |3 ++
 2 files changed, 97 insertions(+), 57 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ae1bcb0..100c21c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -101,6 +101,27 @@ void btrfs_cleanup_fs_uuids(void)
}
 }
 
+static struct btrfs_device *__alloc_device(void)
+{
+   struct btrfs_device *dev;
+
+   dev = kzalloc(sizeof(*dev), GFP_NOFS);
+   if (!dev)
+   return ERR_PTR(-ENOMEM);
+
+   INIT_LIST_HEAD(&dev->dev_list);
+   INIT_LIST_HEAD(&dev->dev_alloc_list);
+
+   spin_lock_init(&dev->io_lock);
+
+   spin_lock_init(&dev->reada_lock);
+   atomic_set(&dev->reada_in_flight, 0);
+   INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
+   INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
+
+   return dev;
+}
+
 static noinline struct btrfs_device *__find_device(struct list_head *head,
   u64 devid, u8 *uuid)
 {
@@ -414,17 +435,12 @@ static noinline int device_list_add(const char *path,
if (fs_devices->opened)
return -EBUSY;
 
-   device = kzalloc(sizeof(*device), GFP_NOFS);
-   if (!device) {
+   device = btrfs_alloc_device(NULL, &devid,
+   disk_super->dev_item.uuid);
+   if (IS_ERR(device)) {
/* we can safely leave the fs_devices entry around */
-   return -ENOMEM;
+   return PTR_ERR(device);
}
-   device->devid = devid;
-   device->dev_stats_valid = 0;
-   device->work.func = pending_bios_fn;
-   memcpy(device->uuid, disk_super->dev_item.uuid,
-  BTRFS_UUID_SIZE);
-   spin_lock_init(&device->io_lock);
 
name = rcu_string_strdup(path, GFP_NOFS);
if (!name) {
@@ -432,15 +448,6 @@ static noinline int device_list_add(const char *path,
return -ENOMEM;
}
rcu_assign_pointer(device->name, name);
-   INIT_LIST_HEAD(&device->dev_alloc_list);
-
-   /* init readahead state */
-   spin_lock_init(&device->reada_lock);
-   device->reada_curr_zone = NULL;
-   atomic_set(&device->reada_in_flight, 0);
-   device->reada_next = 0;
-   INIT_RADIX_TREE(&device->reada_zones, GFP_NOFS & ~__GFP_WAIT);
-   INIT_RADIX_TREE(&device->reada_extents, GFP_NOFS & ~__GFP_WAIT);
 
mutex_lock(&fs_devices->device_list_mutex);
list_add_rcu(&device->dev_list, &fs_devices->devices);
@@ -491,8 +498,9 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
list_for_each_entry(orig_dev, &orig->devices, dev_list) {
struct rcu_string *name;
 
-   device = kzalloc(sizeof(*device), GFP_NOFS);
-   if (!device)
+   device = btrfs_alloc_device(NULL, &orig_dev->devid,
+   orig_dev->uuid);
+   if (IS_ERR(device))
goto error;
 
/*
@@ -506,13 +514,6 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
}
rcu_assign_pointer(device->name, name);
 
-   device->devid = orig_dev->devid;
-   device->work.func = pending_bios_fn;
-   memcpy(device->uuid, orig_dev->uuid, sizeof(device->uuid));
-   spin_lock_init(&device->io_lock);
-   INIT_LIST_HEAD(&device->dev_list);
-   INIT_LIST_HEAD(&device->dev_alloc_list);
-
list_add(&device->dev_list, &fs_devices->devices);
device->fs_devices = fs_devices;
fs_devices->num_devices++;
@@ -1956,10 +1957,10 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
}
mutex_unlock(&root-&

[PATCH] Btrfs: stop refusing the relocation of chunk 0

2013-08-27 Thread Ilya Dryomov

AFAICT chunk 0 is no longer special, and so it should be restriped just
like every other chunk.  One reason for this change is us refusing the
relocation can lead to filesystems that can only be mounted ro, and
never rw -- see the bugzilla [1] for details.  The other reason is that
device removal code is already doing this: it will happily relocate
chunk 0 is part of shrinking the device.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=60594

Reported-by: Xavier Bassery 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c9a0977..e8325de 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2991,10 +2991,6 @@ again:
if (found_key.objectid != key.objectid)
break;
 
-   /* chunk zero is special */
-   if (found_key.offset == 0)
-   break;
-
chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
 
if (!counting) {
@@ -3030,6 +3026,8 @@ again:
spin_unlock(&fs_info->balance_lock);
}
 loop:
+   if (found_key.offset == 0)
+   break;
key.offset = found_key.offset - 1;
}
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: do not add replace target to the alloc_list

2013-09-01 Thread Ilya Dryomov

If replace was suspended by the umount, replace target device is added
to the fs_devices->alloc_list during a later mount.  This is obviously
wrong.  ->is_tgtdev_for_dev_replace is supposed to guard against that,
but ->is_tgtdev_for_dev_replace is (and can only ever be) initialized
*after* everything is opened and fs_devices lists are populated.  Fix
this by checking the devid instead: for replace targets it's always
equal to BTRFS_DEV_REPLACE_DEVID.

Cc: Stefan Behrens 
Signed-off-by: Ilya Dryomov 
---
At first I thought this was caused by my btrfs_device rollback patch,
but no, it just made it easier to spot -- previously one had to reboot
or rmmod/insmod before mounting a suspended replace.

 fs/btrfs/volumes.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c9a0977..5b99f19 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -793,7 +793,8 @@ static int __btrfs_open_devices(struct btrfs_fs_devices 
*fs_devices,
fs_devices->rotating = 1;
 
fs_devices->open_devices++;
-   if (device->writeable && !device->is_tgtdev_for_dev_replace) {
+   if (device->writeable &&
+   device->devid != BTRFS_DEV_REPLACE_DEVID) {
fs_devices->rw_devices++;
list_add(&device->dev_alloc_list,
 &fs_devices->alloc_list);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 18/20] Btrfs-progs: fix magic return value in cmds-balance.c

2013-09-04 Thread Ilya Dryomov

Hi Wang,

Thank you for doing the grunt work, it's been a long standing todo.
See the comments below.

On Wed, Sep 4, 2013 at 6:22 PM, Wang Shilong  wrote:
> From: Wang Shilong 
>
> If there is no balance in progress.resume/pause/cancel will
> return 2. For usage or syntal errors will return 1. 0 means
> operations return successfully.

This needs to be reworded (spelling, punctuation).

>
> Signed-off-by: Wang Shilong 
> ---
>  cmds-balance.c | 93 
> --
>  1 file changed, 58 insertions(+), 35 deletions(-)
>
> diff --git a/cmds-balance.c b/cmds-balance.c
> index b7382ef..fd68051 100644
> --- a/cmds-balance.c
> +++ b/cmds-balance.c
> @@ -58,7 +58,7 @@ static int parse_one_profile(const char *profile, u64 
> *flags)
> *flags |= BTRFS_AVAIL_ALLOC_BIT_SINGLE;
> } else {
> fprintf(stderr, "Unknown profile '%s'\n", profile);
> -   return 1;
> +   return -ENOENT;
> }
>
> return 0;
> @@ -68,12 +68,14 @@ static int parse_profiles(char *profiles, u64 *flags)
>  {
> char *this_char;
> char *save_ptr = NULL; /* Satisfy static checkers */
> +   int ret;
>
> for (this_char = strtok_r(profiles, "|", &save_ptr);
>  this_char != NULL;
>  this_char = strtok_r(NULL, "|", &save_ptr)) {
> -   if (parse_one_profile(this_char, flags))
> -   return 1;
> +   ret = parse_one_profile(this_char, flags);
> +   if (ret)
> +   return ret;
> }
>
> return 0;
> @@ -86,7 +88,7 @@ static int parse_u64(const char *str, u64 *result)
>
> val = strtoull(str, &endptr, 10);
> if (*endptr)
> -   return 1;
> +   return -EINVAL;
>
> *result = val;
> return 0;
> @@ -95,6 +97,7 @@ static int parse_u64(const char *str, u64 *result)
>  static int parse_range(const char *range, u64 *start, u64 *end)
>  {
> char *dots;
> +   int ret;
>
> dots = strstr(range, "..");
> if (dots) {
> @@ -107,29 +110,31 @@ static int parse_range(const char *range, u64 *start, 
> u64 *end)
> *end = (u64)-1;
> skipped++;
> } else {
> -   if (parse_u64(rest, end))
> -   return 1;
> +   ret = parse_u64(rest, end);
> +   if (ret)
> +   return ret;
> }
> if (dots == range) {
> *start = 0;
> skipped++;
> } else {
> +   ret = parse_u64(rest, end);
> if (parse_u64(range, start))
> -   return 1;
> +   return ret;
> }
>
> if (*start >= *end) {
> fprintf(stderr, "Range %llu..%llu doesn't make "
> "sense\n", (unsigned long long)*start,
> (unsigned long long)*end);
> -   return 1;
> +   return -EINVAL;
> }
>
> if (skipped <= 1)
> return 0;
> }
>
> -   return 1;
> +   return -EINVAL;
>  }
>
>  static int parse_filters(char *filters, struct btrfs_balance_args *args)
> @@ -137,6 +142,7 @@ static int parse_filters(char *filters, struct 
> btrfs_balance_args *args)
> char *this_char;
> char *value;
> char *save_ptr = NULL; /* Satisfy static checkers */
> +   int ret = 0;
>
> if (!filters)
> return 0;
> @@ -150,70 +156,72 @@ static int parse_filters(char *filters, struct 
> btrfs_balance_args *args)
> if (!value || !*value) {
> fprintf(stderr, "the profiles filter requires 
> "
>"an argument\n");
> -   return 1;
> +   return -EINVAL;
> }
> if (parse_profiles(value, &args->profiles)) {
> fprintf(stderr, "Invalid profiles 
> argument\n");
> -   return 1;
> +   return -EINVAL;
> }
> args->flags |= BTRFS_BALANCE_ARGS_PROFILES;
> } else if (!strcmp(this_char, "usage")) {
> if (!value || !*value) {
> fprintf(stderr, "the usage filter requires "
>"an argument\n");
> -   return 1;
> +   return -EINVAL;
> }
> if (parse_u64(value, &args->usage) ||
>

Re: [PATCH v2 18/20] Btrfs-progs: fix magic return value in cmds-balance.c

2013-09-10 Thread Ilya Dryomov

On Tue, Sep 10, 2013 at 5:28 AM, Wang Shilong
 wrote:
> If there is no balance in progress, resume/pause/cancel
> will return 2. Usage or syntax errors will return 1.
> And 0 means operations return successfully.
>
> Signed-off-by: Wang Shilong 
> ---
> v1->v2:
> Address comments from Ilya Dryomov.

Reviewed-by: Ilya Dryomov 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] btrfs-progs: libify some parts of btrfs-progs

2013-01-15 Thread Ilya Dryomov

On Mon, Jan 14, 2013 at 03:18:14PM +0100, Arvin Schnell wrote:
> 
> Hi,
> 
> please find attached a patch to make the new libbtrfs usable from
> C++ (at least for the parts snapper will likely need).
> 
> Regards,
>   Arvin
> 
> -- 
> Arvin Schnell, 
> Senior Software Engineer, Research & Development
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
> HRB 16746 (AG Nürnberg)
> Maxfeldstraße 5
> 90409 Nürnberg
> Germany

> diff --git a/cmds-send.c b/cmds-send.c
> index 9b47e70..c51310a 100644
> --- a/cmds-send.c
> +++ b/cmds-send.c
> @@ -40,6 +40,10 @@
>  #include "send.h"
>  #include "send-utils.h"
>  
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>  static int g_verbose = 0;
>  
>  struct btrfs_send {
> @@ -654,3 +658,7 @@ int cmd_send(int argc, char **argv)
>  {
>   return cmd_send_start(argc, argv);
>  }
> +
> +#ifdef __cplusplus
> +}
> +#endif

Hi Arvin,

Why exactly do we need this in a non-header file?

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: fix crash of starting balance

2013-01-15 Thread Ilya Dryomov

On Tue, Jan 15, 2013 at 10:47:57PM +0800, Liu Bo wrote:
> We will crash on BUG_ON(ret == -EEXIST) when we do not resume the existing
> balance but attempt to start a new one.
> 
> The steps can be:
> 1. start balance
> 2. pause balance
> 3. start balance
> 
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/volumes.c |7 ++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 5cce6aa..3901654 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3100,7 +3100,12 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>   goto out;
>  
>   if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
> - BUG_ON(ret == -EEXIST);
> + /*
> +  * This can happen when we do not resume the existing balance
> +  * but try to start a new one instead.
> +  */
> + if (ret == -EEXIST)
> + goto out;
>   set_balance_control(bctl);
>   } else {
>   BUG_ON(ret != -EEXIST);

OK, it seems balance pause/resume logic got broken by dev-replace code
(5ac00addc7ac09110995fe967071d191b5981cc1), which went into v3.8-rc1.
This is most certainly not the right way to fix it, that BUG_ON is there
for a reason.  I'll send a fix in a couple of days.

Thanks,

Ilya

> -- 
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/5] Btrfs: fix "mutually exclusive op is running" error code

2013-01-20 Thread Ilya Dryomov

The error code that is returned in response to starting a mutually
exclusive operation when there is one already running got silently
changed from EINVAL to EINPROGRESS by 5ac00add.  Returning EINPROGRESS
to, say, add_dev, when rm_dev is running is misleading.  Furthermore,
the operation itself may want to use EINPROGRESS for other purposes.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/ioctl.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ab82636..3aebb28 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1337,7 +1337,7 @@ static noinline int btrfs_ioctl_resize(struct file *file,
1)) {
pr_info("btrfs: dev add/delete/balance/replace/resize operation 
in progress\n");
mnt_drop_write_file(file);
-   return -EINPROGRESS;
+   return -EINVAL;
}
 
mutex_lock(&root->fs_info->volume_mutex);
@@ -2193,7 +2193,7 @@ static int btrfs_ioctl_defrag(struct file *file, void 
__user *argp)
if (atomic_xchg(&root->fs_info->mutually_exclusive_operation_running,
1)) {
pr_info("btrfs: dev add/delete/balance/replace/resize operation 
in progress\n");
-   return -EINPROGRESS;
+   return -EINVAL;
}
ret = mnt_want_write_file(file);
if (ret) {
@@ -2267,7 +2267,7 @@ static long btrfs_ioctl_add_dev(struct btrfs_root *root, 
void __user *arg)
if (atomic_xchg(&root->fs_info->mutually_exclusive_operation_running,
1)) {
pr_info("btrfs: dev add/delete/balance/replace/resize operation 
in progress\n");
-   return -EINPROGRESS;
+   return -EINVAL;
}
 
mutex_lock(&root->fs_info->volume_mutex);
@@ -2304,7 +2304,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void 
__user *arg)
1)) {
pr_info("btrfs: dev add/delete/balance/replace/resize operation 
in progress\n");
mnt_drop_write_file(file);
-   return -EINPROGRESS;
+   return -EINVAL;
}
 
mutex_lock(&root->fs_info->volume_mutex);
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/5] Btrfs: fix unlock order in btrfs_ioctl_resize

2013-01-20 Thread Ilya Dryomov

Fix unlock order in btrfs_ioctl_resize().

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/ioctl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3aebb28..b9a0190 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1447,8 +1447,8 @@ out_free:
kfree(vol_args);
 out:
mutex_unlock(&root->fs_info->volume_mutex);
-   mnt_drop_write_file(file);
atomic_set(&root->fs_info->mutually_exclusive_operation_running, 0);
+   mnt_drop_write_file(file);
return ret;
 }
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/5] Btrfs: fix unlock order in btrfs_ioctl_rm_dev

2013-01-20 Thread Ilya Dryomov

Fix unlock order in btrfs_ioctl_rm_dev().

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/ioctl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b9a0190..9c079dd 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2320,8 +2320,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void 
__user *arg)
kfree(vol_args);
 out:
mutex_unlock(&root->fs_info->volume_mutex);
-   mnt_drop_write_file(file);
atomic_set(&root->fs_info->mutually_exclusive_operation_running, 0);
+   mnt_drop_write_file(file);
return ret;
 }
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/5] Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag

2013-01-20 Thread Ilya Dryomov

Operation-specific check (whether subvol is readonly or not) should go
after the mutual exclusiveness check.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/ioctl.c |   17 +
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9c079dd..f198d5d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2187,19 +2187,20 @@ static int btrfs_ioctl_defrag(struct file *file, void 
__user *argp)
struct btrfs_ioctl_defrag_range_args *range;
int ret;
 
-   if (btrfs_root_readonly(root))
-   return -EROFS;
+   ret = mnt_want_write_file(file);
+   if (ret)
+   return ret;
 
if (atomic_xchg(&root->fs_info->mutually_exclusive_operation_running,
1)) {
pr_info("btrfs: dev add/delete/balance/replace/resize operation 
in progress\n");
+   mnt_drop_write_file(file);
return -EINVAL;
}
-   ret = mnt_want_write_file(file);
-   if (ret) {
-   atomic_set(&root->fs_info->mutually_exclusive_operation_running,
-  0);
-   return ret;
+
+   if (btrfs_root_readonly(root)) {
+   ret = -EROFS;
+   goto out;
}
 
switch (inode->i_mode & S_IFMT) {
@@ -2251,8 +2252,8 @@ static int btrfs_ioctl_defrag(struct file *file, void 
__user *argp)
ret = -EINVAL;
}
 out:
-   mnt_drop_write_file(file);
atomic_set(&root->fs_info->mutually_exclusive_operation_running, 0);
+   mnt_drop_write_file(file);
return ret;
 }
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/5] Mutually exclusive ops fixes

2013-01-20 Thread Ilya Dryomov

Hello,

This is a set of fixes for mutually exclusive ops stuff added by Stefan
as part of the dev-replace patchset.  The main one here is the first
patch, it fixes a regression in balancing interface.  The second patch
brings EINVAL error code back.  The rest are relatively minor ordering
fixes.

Thanks,

Ilya


Ilya Dryomov (5):
  Btrfs: bring back balance pause/resume logic
  Btrfs: fix "mutually exclusive op is running" error code
  Btrfs: fix unlock order in btrfs_ioctl_resize
  Btrfs: fix unlock order in btrfs_ioctl_rm_dev
  Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag

 fs/btrfs/ioctl.c   |  107 ++-
 fs/btrfs/volumes.c |   10 +++-
 2 files changed, 86 insertions(+), 31 deletions(-)

-- 
1.7.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/5] Mutually exclusive ops fixes

2013-01-20 Thread Ilya Dryomov

On Sun, Jan 20, 2013 at 05:22:27PM +0200, Ilya Dryomov wrote:
> Hello,
> 
> This is a set of fixes for mutually exclusive ops stuff added by Stefan
> as part of the dev-replace patchset.  The main one here is the first
> patch, it fixes a regression in balancing interface.  The second patch
> brings EINVAL error code back.  The rest are relatively minor ordering
> fixes.

You can pull from

git://github.com/idryomov/btrfs-unstable.git mutex-ops@next-master

top commit 2f68aafa90fedb3cdb80f0a8517a2130c5c035b0

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/5] Btrfs: bring back balance pause/resume logic

2013-01-20 Thread Ilya Dryomov

Balance pause/resume logic got broken by 5ac00add (went in into 3.8-rc1
as part of dev-replace merge).  Offending commit took a stab at making
mutually exclusive volume operations (add_dev, rm_dev, resize, balance,
replace_dev) not block behind volume_mutex if another such operation is
in progress and instead return an error right away.  Balancing front-end
relied on the blocking behaviour, so the fix is ugly, but short of a
complete rework, it's the best we can do.

Reported-by: Liu Bo 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/ioctl.c   |   78 ++-
 fs/btrfs/volumes.c |   10 --
 2 files changed, 71 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 171786a..ab82636 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3441,8 +3441,8 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_ioctl_balance_args *bargs;
struct btrfs_balance_control *bctl;
+   bool need_unlock; /* for mut. excl. ops lock */
int ret;
-   int need_to_clear_lock = 0;
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -3451,14 +3451,61 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
if (ret)
return ret;
 
-   mutex_lock(&fs_info->volume_mutex);
+again:
+   if (!atomic_xchg(&fs_info->mutually_exclusive_operation_running, 1)) {
+   mutex_lock(&fs_info->volume_mutex);
+   mutex_lock(&fs_info->balance_mutex);
+   need_unlock = true;
+   goto locked;
+   }
+
+   /*
+* mut. excl. ops lock is locked.  Three possibilites:
+*   (1) some other op is running
+*   (2) balance is running
+*   (3) balance is paused -- special case (think resume)
+*/
mutex_lock(&fs_info->balance_mutex);
+   if (fs_info->balance_ctl) {
+   /* this is either (2) or (3) */
+   if (!atomic_read(&fs_info->balance_running)) {
+   mutex_unlock(&fs_info->balance_mutex);
+   if (!mutex_trylock(&fs_info->volume_mutex))
+   goto again;
+   mutex_lock(&fs_info->balance_mutex);
+
+   if (fs_info->balance_ctl &&
+   !atomic_read(&fs_info->balance_running)) {
+   /* this is (3) */
+   need_unlock = false;
+   goto locked;
+   }
+
+   mutex_unlock(&fs_info->balance_mutex);
+   mutex_unlock(&fs_info->volume_mutex);
+   goto again;
+   } else {
+   /* this is (2) */
+   mutex_unlock(&fs_info->balance_mutex);
+   ret = -EINPROGRESS;
+   goto out;
+   }
+   } else {
+   /* this is (1) */
+   mutex_unlock(&fs_info->balance_mutex);
+   pr_info("btrfs: dev add/delete/balance/replace/resize operation 
in progress\n");
+   ret = -EINVAL;
+   goto out;
+   }
+
+locked:
+   BUG_ON(!atomic_read(&fs_info->mutually_exclusive_operation_running));
 
if (arg) {
bargs = memdup_user(arg, sizeof(*bargs));
if (IS_ERR(bargs)) {
ret = PTR_ERR(bargs);
-   goto out;
+   goto out_unlock;
}
 
if (bargs->flags & BTRFS_BALANCE_RESUME) {
@@ -3478,13 +3525,10 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
bargs = NULL;
}
 
-   if (atomic_xchg(&root->fs_info->mutually_exclusive_operation_running,
-   1)) {
-   pr_info("btrfs: dev add/delete/balance/replace/resize operation 
in progress\n");
+   if (fs_info->balance_ctl) {
ret = -EINPROGRESS;
goto out_bargs;
}
-   need_to_clear_lock = 1;
 
bctl = kzalloc(sizeof(*bctl), GFP_NOFS);
if (!bctl) {
@@ -3505,11 +3549,17 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
}
 
 do_balance:
-   ret = btrfs_balance(bctl, bargs);
/*
-* bctl is freed in __cancel_balance or in free_fs_info if
-* restriper was paused all the way until unmount
+* Ownership of bctl and mutually_exclusive_operation_running
+* goes to to btrfs_balance.  bctl is freed in __cancel_balance,
+* or, if restriper was paused all the way until unmount, in
+* free_fs_info.  mutually_exclus

[PATCH] Btrfs: fix a regression in balance usage filter

2013-01-21 Thread Ilya Dryomov

Commit 3fed40cc ("Btrfs: cleanup duplicated division functions"), which
was merged into 3.8-rc1, has introduced a regression by removing logic
that was guarding us against bad user input.  Bring it back.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 86279c3..b577aee 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2614,7 +2614,14 @@ static int chunk_usage_filter(struct btrfs_fs_info 
*fs_info, u64 chunk_offset,
cache = btrfs_lookup_block_group(fs_info, chunk_offset);
chunk_used = btrfs_block_group_used(&cache->item);
 
-   user_thresh = div_factor_fine(cache->key.offset, bargs->usage);
+   if (bargs->usage == 0)
+   user_thresh = 0;
+   else if (bargs->usage > 100)
+   user_thresh = cache->key.offset;
+   else
+   user_thresh = div_factor_fine(cache->key.offset,
+ bargs->usage);
+
if (chunk_used < user_thresh)
ret = 0;
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] [RFC] include btrfsck in btrfs - including "name check"

2013-01-30 Thread Ilya Dryomov

On Wed, Jan 30, 2013 at 10:11:44PM +0100, Ian Kumlien wrote:
> On Wed, Jan 30, 2013 at 12:33:42PM -0800, Filipe Brandenburger wrote:
> > Hi Ian,
> > 
> > On Tue, Jan 29, 2013 at 3:03 PM, Ian Kumlien  wrote:
> > > This patch includes fsck as a subcommand of btrfs, but if you rename
> > > the binary to btrfsck (or, preferably, use a symlink) it will act like
> > > the old btrfs command.
> > 
> > You can rename files in your git (there's "git mv" for that), only
> > thing is when you generate the patch with format-patch (or "git show",
> > "git diff" etc.) pass it the -M option to detect moves and act
> > appropriately.
> 
> git send-email seems to send the full diff, diffing against /dev/null =P
> This is why i skipped that part.
> 
> > Regarding your patches, I really like the idea of "btrfs fsck" but I
> > think I'd prefer to keep the external commands as wrapper scripts
> > instead of adding busybox-style name detection to btrfs... But then,
> > that's just my opinion.
> 
> Well, now both works.
> 
> > I guess I would have a "btrfsck" that would simply contain:
> > 
> > #! /bin/sh
> > exec btrfs fsck "$@"
> > 
> > Downside is that error reporting (e.g. invalid syntax, etc.) would
> > show "btrfs fsck" instead of the command the user actually typed...
> 
> Actually it still does, due to how btrfs handles things... It's a simple
> enough fix and it will make rescue cd's or dracut images, or just about
> anything.
> 
> I understand your point, but i think this is a simpler solution =)

FWIW I agree with Filipe, this name detection thing looks ugly to me.
The merge itself is a good idea, but I think we should stick with shell
wrappers for everything else.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/6] Btrfs-progs: add the rescue section to btrfs

2013-02-08 Thread Ilya Dryomov

On Fri, Feb 08, 2013 at 01:37:02AM +0100, Ian Kumlien wrote:
> the btrfs command now lists:
> btrfs rescue select-super -s  
> Select a superblock
> btrfs rescue dump-super 
> Dump a superblock to disk
> btrfs rescue debug-tree [options] 
> Debug the filesystem
> 
> btrfs-dump-super.c was imported in to cmds-rescue-super-ops.c
> 
> This patch integrates all the functionality...
> 
> cmds-rescue.c is used to glue cmds-rescue-debug-tree.c,
> cmds-rescue-restore.c and cmds-rescue-super-ops.c together to
> make the source files more managable.

I think btrfs-debug-tree should go under debug -- btrfs debug dump-tree.
Same goes for btrfs-dump-super -- btrfs debug dump-super.  These
commands won't help an average user to rescue a single byte, they are
there to help developers.  I suppose you can also import btrfs-image
under btrfs debug dump-image.

This leaves the question of whether we want select-super under rescue.
Given that it can easily do more harm than good, it might not be the
best place for it..

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: eliminate a use-after-free in btrfs_balance()

2013-02-12 Thread Ilya Dryomov

Commit 5af3e8cc introduced a use-after-free at volumes.c:3139: bctl is freed
above in __cancel_balance() in all cases except for balance pause.  Fix this
by moving the offending check a couple statements above, the meaning of the
check is preserved.

Reported-by: Chris Mason 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 15f6efd..5d25aed 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3126,6 +3126,11 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
mutex_lock(&fs_info->balance_mutex);
atomic_dec(&fs_info->balance_running);
 
+   if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
+   fs_info->num_tolerated_disk_barrier_failures =
+   btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
+   }
+
if (bargs) {
memset(bargs, 0, sizeof(*bargs));
update_ioctl_balance_args(fs_info, 0, bargs);
@@ -3136,11 +3141,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
__cancel_balance(fs_info);
}
 
-   if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-   fs_info->num_tolerated_disk_barrier_failures =
-   btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-   }
-
wake_up(&fs_info->balance_wait_q);
 
return ret;
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: allow for selecting only completely empty chunks

2013-02-12 Thread Ilya Dryomov

Enhance balance usage filter by making it possible to balance out only
completely empty chunks.  Today, usage filter properly acts on values
from 1 to 99 inclusive, usage=100 selects all chunks, and usage=0
selects no chunks.  This commit changes the usage=0 case: the new
meaning is to restripe only completely empty chunks and nothing else.

Suggested-by: David Sterba 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 15f6efd..7e41619 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2615,7 +2615,7 @@ static int chunk_usage_filter(struct btrfs_fs_info 
*fs_info, u64 chunk_offset,
chunk_used = btrfs_block_group_used(&cache->item);
 
if (bargs->usage == 0)
-   user_thresh = 0;
+   user_thresh = 1;
else if (bargs->usage > 100)
user_thresh = cache->key.offset;
else
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: accept zero for balance usage filter

2013-02-12 Thread Ilya Dryomov

On Mon, Feb 11, 2013 at 01:33:52PM +0100, David Sterba wrote:
> The condition can be relaxed to accept also 0 which will delete
> unoccupied chunks and does not need space for the actual data
> relocation. Until there is an automatic empty chunk reclaim, we
> can use this as a last resort option under enospc.
> 
> CC: Ilya Dryomov 
> Signed-off-by: David Sterba 

Josef, please don't pull this one into btrfs-next, it's been deprecated
by "Btrfs: allow for selecting only completely empty chunks" from me.
David and I discussed this on IRC, he is OK with it.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs-progs: make 0 a valid usage filter argument

2013-02-12 Thread Ilya Dryomov

This is a progs counterpart to a "Btrfs: allow for selecting only
completely empty chunks".  usage=0 now means "select only only
completely empty chunks and nothing else".

Signed-off-by: Ilya Dryomov 
---
 cmds-balance.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/cmds-balance.c b/cmds-balance.c
index 38a7426..1f888db 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -159,7 +159,7 @@ static int parse_filters(char *filters, struct 
btrfs_balance_args *args)
return 1;
}
if (parse_u64(value, &args->usage) ||
-   args->usage < 1 || args->usage > 100) {
+   args->usage > 100) {
fprintf(stderr, "Invalid usage argument: %s\n",
   value);
return 1;
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs-progs: move crc32c optimization init

2013-02-12 Thread Ilya Dryomov

Don't call crc32c_optimization_init() until we know that a command is
actually going to be invoked.

Signed-off-by: Ilya Dryomov 
---
 btrfs.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 687acec..7752bd6 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -261,8 +261,6 @@ int main(int argc, char **argv)
 {
const struct cmd_struct *cmd;
 
-   crc32c_optimization_init();
-
argc--;
argv++;
handle_options(&argc, &argv);
@@ -278,6 +276,8 @@ int main(int argc, char **argv)
 
handle_help_options_next_level(cmd, argc, argv);
 
+   crc32c_optimization_init();
+
fixup_argv0(argv, cmd->token);
exit(cmd->fn(argc, argv));
 }
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 17/17 V2] btrfs-progs: replace strtok_r with strsep

2013-02-26 Thread Ilya Dryomov

On Tue, Feb 26, 2013 at 02:20:30PM -0600, Eric Sandeen wrote:
> The coverity runs had a false positive complaining that save_ptr
> is uninitialized in the call to strtok_r.
> 
> We could initialize it, but Zach points out that just using
> strsep is a lot simpler if there's only one delimiter,
> so just switch to that.
> 
> Signed-off-by: Eric Sandeen 
> ---
> 
> V2: Remove accidentally-added debug printfs, thanks Geoffredo!
> 
> diff --git a/cmds-balance.c b/cmds-balance.c
> index b671e1d..cfbb8eb 100644
> --- a/cmds-balance.c
> +++ b/cmds-balance.c
> @@ -67,11 +67,8 @@ static int parse_one_profile(const char *profile, u64 
> *flags)
>  static int parse_profiles(char *profiles, u64 *flags)
>  {
>   char *this_char;
> - char *save_ptr;
>  
> - for (this_char = strtok_r(profiles, "|", &save_ptr);
> -  this_char != NULL;
> -  this_char = strtok_r(NULL, "|", &save_ptr)) {
> + while ((this_char = strsep(&profiles, "|"))) {
>   if (parse_one_profile(this_char, flags))
>   return 1;
>   }
> @@ -136,14 +133,11 @@ static int parse_filters(char *filters, struct 
> btrfs_balance_args *args)
>  {
>   char *this_char;
>   char *value;
> - char *save_ptr;
>  
>   if (!filters)
>   return 0;
>  
> - for (this_char = strtok_r(filters, ",", &save_ptr);
> -  this_char != NULL;
> -  this_char = strtok_r(NULL, ",", &save_ptr)) {
> + while ((this_char = strsep(&filters , ","))) {

  ^^^ whitespace


One of the differences between strtok() and strsep() is that the former
allows multiple delimiters between two tokens.  With strsep(), this

btrfs balance -dfoo1=bar1,,,foo2=bar2 

fails with error, whereas with strtok() it passes.  I don't have a
strong opinion here (this has been loosely modeled on the way mount(8)
handles -o options), but might it be better to just initialize save_ptr?
(And yes, I know that strsep() is better ;))

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 17/17 V2] btrfs-progs: replace strtok_r with strsep

2013-02-26 Thread Ilya Dryomov

On Tue, Feb 26, 2013 at 02:46:30PM -0600, Eric Sandeen wrote:
> On 2/26/13 2:40 PM, Ilya Dryomov wrote:
> > On Tue, Feb 26, 2013 at 02:20:30PM -0600, Eric Sandeen wrote:
> >> The coverity runs had a false positive complaining that save_ptr
> >> is uninitialized in the call to strtok_r.
> >>
> >> We could initialize it, but Zach points out that just using
> >> strsep is a lot simpler if there's only one delimiter,
> >> so just switch to that.
> >>
> >> Signed-off-by: Eric Sandeen 
> >> ---
> >>
> >> V2: Remove accidentally-added debug printfs, thanks Geoffredo!
> >>
> >> diff --git a/cmds-balance.c b/cmds-balance.c
> >> index b671e1d..cfbb8eb 100644
> >> --- a/cmds-balance.c
> >> +++ b/cmds-balance.c
> >> @@ -67,11 +67,8 @@ static int parse_one_profile(const char *profile, u64 
> >> *flags)
> >>  static int parse_profiles(char *profiles, u64 *flags)
> >>  {
> >>char *this_char;
> >> -  char *save_ptr;
> >>  
> >> -  for (this_char = strtok_r(profiles, "|", &save_ptr);
> >> -   this_char != NULL;
> >> -   this_char = strtok_r(NULL, "|", &save_ptr)) {
> >> +  while ((this_char = strsep(&profiles, "|"))) {
> >>if (parse_one_profile(this_char, flags))
> >>return 1;
> >>}
> >> @@ -136,14 +133,11 @@ static int parse_filters(char *filters, struct 
> >> btrfs_balance_args *args)
> >>  {
> >>char *this_char;
> >>char *value;
> >> -  char *save_ptr;
> >>  
> >>if (!filters)
> >>return 0;
> >>  
> >> -  for (this_char = strtok_r(filters, ",", &save_ptr);
> >> -   this_char != NULL;
> >> -   this_char = strtok_r(NULL, ",", &save_ptr)) {
> >> +  while ((this_char = strsep(&filters , ","))) {
> > 
> >   ^^^ whitespace
> > 
> > 
> > One of the differences between strtok() and strsep() is that the former
> > allows multiple delimiters between two tokens.  With strsep(), this
> > 
> > btrfs balance -dfoo1=bar1,,,foo2=bar2 
> > 
> > fails with error, whereas with strtok() it passes.  I don't have a
> > strong opinion here (this has been loosely modeled on the way mount(8)
> > handles -o options), but might it be better to just initialize save_ptr?
> > (And yes, I know that strsep() is better ;))
> 
> I don't really care much either way, TBH.  Initializing it seems a little
> bit magic, but with a comment as to why, it'd be fine.  If you did it this
> way intentionally to allow the above format, and changing it would break
> expectations, then I'll happily just initialize save_ptr.

Yeah, although I doubt anybody would notice, it's probably better to
keep the old behaviour.

> 
> (And, I realize that lots of these changes are pedantic and seemingly
> pointless, but we've gotten the static checker errors down from over 100
> to under 30 and dropping; the more noise we remove the more likely we are
> to pay attention to the output and catch actual errors.  At least that's
> my feeling; if people think this is getting to be pointless churn, I'm
> ok with that, too).

Not at all.  Btrfs-progs had rarely seen any cleanups, with you and Zach
patching it up and David's integration work it has gotten a whole new
life ;)  BTW, huge thanks for the kernel-style build output, I am
forever grateful..

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: return EPERM in btrfs_rm_device()

2013-03-02 Thread Ilya Dryomov

On Sat, Mar 02, 2013 at 12:13:59AM -0700, Jerry Snitselaar wrote:
> Currently there are error paths in btrfs_rm_device() where EINVAL is
> returned telling the user they passed an invalid argument even though
> they passed a valid device. Change to return EPERM instead as the
> operation is not permitted.
> 
> Signed-off-by: Jerry Snitselaar 
> ---
>  fs/btrfs/volumes.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 5cbb7f4..3e1586c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1392,14 +1392,14 @@ int btrfs_rm_device(struct btrfs_root *root, char 
> *device_path)
>   if ((all_avail & BTRFS_BLOCK_GROUP_RAID10) && num_devices <= 4) {
>   printk(KERN_ERR "btrfs: unable to go below four devices "
>  "on raid10\n");
> - ret = -EINVAL;
> + ret = -EPERM;
>   goto out;
>   }
>  
>   if ((all_avail & BTRFS_BLOCK_GROUP_RAID1) && num_devices <= 2) {
>   printk(KERN_ERR "btrfs: unable to go below two "
>  "devices on raid1\n");
> - ret = -EINVAL;
> + ret = -EPERM;
>   goto out;
>   }
>  
> @@ -1449,14 +1449,14 @@ int btrfs_rm_device(struct btrfs_root *root, char 
> *device_path)
>  
>   if (device->is_tgtdev_for_dev_replace) {
>   pr_err("btrfs: unable to remove the dev_replace target dev\n");
> - ret = -EINVAL;
> + ret = -EPERM;
>   goto error_brelse;
>   }
>  
>   if (device->writeable && root->fs_info->fs_devices->rw_devices == 1) {
>   printk(KERN_ERR "btrfs: unable to remove the only writeable "
>  "device\n");
> - ret = -EINVAL;
> + ret = -EPERM;

I don't think returning EPERM in these cases is any better than EINVAL.
FWIW, many other btrfs ioctls, especially balance, suffer from this as
well.  What we really need is some kind of error message delivery
system, but that's not going to happen any time soon...

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: fix a mismerge in btrfs_balance()

2013-03-06 Thread Ilya Dryomov

Raid56 merge (merge commit e942f88) had mistakenly removed a call to
__cancel_balance(), which resulted in balance not cleaning up after itself
after a successful finish.  (Cleanup includes switching the state, removing
the balance item and releasing mut_ex_op testnset lock.)  Bring it back.

Reported-by: David Sterba 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 35bb2d4..33440da 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3230,6 +3230,11 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
update_ioctl_balance_args(fs_info, 0, bargs);
}
 
+   if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
+   balance_need_close(fs_info)) {
+   __cancel_balance(fs_info);
+   }
+
wake_up(&fs_info->balance_wait_q);
 
return ret;
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] btrfs: clean snapshots one by one

2013-03-06 Thread Ilya Dryomov

On Fri, Mar 01, 2013 at 05:30:57PM +0100, David Sterba wrote:
> The mail did not make it to the list on Monday, anyway, now I have a
> few testing results to share.
> 
> The umount responsiveness is much improved, it took a few seconds when
> there were many roots to clean scheduled.
> 
> With
> 
> $ echo 'func btrfs_drop_snapshot +pf' > 
> /sys/debug/kernel/dynamic_debug/control
> $ echo 'func btrfs_clean_one_deleted_snapshot +pf' > 
> /sys/debug/kernel/dynamic_debug/control
> 
> one can watch how it proceeds.
> 
> After umount and and mount, the cleaning process starts again, after 30
> seconds when transaction commit triggers.
> 
> Next test was to let it run with balance (just 'fi balance', no other
> options). Worked well, but I've hit an oops from balance cancel command that
> was triggered sometime during the balance. According to the log it happened
> right after the balance finished. CCing Ilya, I'm not sure if this is somehow
> induced by the patch.
> 
> 3407 BUG_ON(fs_info->balance_ctl || 
> atomic_read(&fs_info->balance_running));

Were you by any chance running this on top of a for-linus that included
a raid56 merge commit?  If so, this is result of a raid56 mismerge --
fs_info->balance_ctl is supposed to be cleared in __cancel_balance(), a
call to which was accidentally nuked.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: document mount options in Documentation/fs/btrfs.txt

2013-03-23 Thread Ilya Dryomov

On Sat, Mar 23, 2013 at 12:48:54PM -0500, Eric Sandeen wrote:
> Document all current btrfs mount options.
> 
> Signed-off-by: Eric Sandeen 
> ---
> 
> please, Please, PLEASE review this and suggest improvements.
> I'm no btrfs wizard but I've done my best to get this all right
> based on commit logs, code reading, and wiki reading.  In cases where
> any of those 3 disagreed, I've done my best to document reality, but I'm
> sure it could use corrections, clarifications, etc.
> 
> In particular, some of these mount options could really use more
> description of why a user might want them, rather than simply stating
> what they do.
> 
> "notreelog" is particularly egregious, as I have nothing but self-referential
> documentation.
> 
> So this can use some iteration, I'm sure, but hopefully it's a decent
> start.
> 
> Thanks,
> -Eric
> 
> diff --git a/Documentation/filesystems/btrfs.txt 
> b/Documentation/filesystems/btrfs.txt
> index 7671352..02a19c8 100644
> --- a/Documentation/filesystems/btrfs.txt
> +++ b/Documentation/filesystems/btrfs.txt
> @@ -1,6 +1,6 @@
>  
> - BTRFS
> - =
> +BTRFS
> +=
>  
>  Btrfs is a new copy on write filesystem for Linux aimed at
>  implementing advanced features while focusing on fault tolerance,
> @@ -34,9 +34,173 @@ The main Btrfs features include:
>  * Online filesystem defragmentation
>  
>  
> -
> - MAILING LIST
> - 
> +Mount Options
> +=
> +
> +When mounting a btrfs filesystem, the following option are accepted.
> +Unless otherwise specified, all options default to off.
> +
> +  alloc_start=
> + Debugging option to force all block allocations above a threshold.
> + The value is specified in bytes, optionally with a K, M, or G suffix,
> + case insensitive.  Default is 1MB.
> +
> +  autodefrag
> + Detect small random writes into files and queue them up for the
> + defrag process.  Works best for small files; Not well suited for
> + large database workloads.
> +
> +  check_int
> +  check_int_data
> +  check_int_print_mask=
> + These debugging options control the behavior of the integrity checking
> + module (the BTRFS_FS_CHECK_INTEGRITY config option required).
> +
> + check_int enables the integrity checker module, which examines all
> + block write requests to ensure on-disk consistency, at a large
> + memory and CPU cost.  
> +
> + check_int_data includes extent data in the integrity checks, and
> + implies the check_int option.
> +
> + check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values
> + as defined in fs/btrfs/check-integrity.c, to control the integrity
> + checker module behavior.
> +
> + See comments at the top of fs/btrfs/check-integrity.c for more info.
> +
> +  compress
> +  compress=
> +  compress-force
> +  compress-force=
> + Control BTRFS file data compression.  Type may be specified as "zlib"
> + "lzo" or "no" (for no compression, used for remounting).  If no type
> + is specified, zlib is used.  If compress-force is specified,
> + all files will be compressed, whether or not they compress well.
> + If compression is enabled, nodatacow and nodatasum are disabled.
> +
> +  degraded
> + Allow mounts to continue with missing devices.  A read-write mount may
> + fail with too many devices missing, for example if a stripe member
> + is completely missing.
> +
> +  device=
> + Specify a device during mount so that ioctls on the control device
> + can be avoided.  Especialy useful when trying to mount a multi-device
> + setup as root.  May be specified multiple times for multiple devices.
> +
> +  discard
> + Issue command to let the block device reclaim space freed by the
> + filesystem.  This is useful for SSD devices, thinly provisioned
> + LUNs and virtual machine images, but may have a performance
> + impact.
> +
> +  enospc_debug
> + Debugging option to be more verbose in some ENOSPC conditions.
> +
> +  fatal_errors=
> + Action to take when encountering a fatal error: 
> +   "bug" - BUG() on a fatal error.  This is the default.
> +   "panic" - panic() on a fatal error.
> +
> +  flushoncommit
> + The 'flushoncommit' mount option forces any data dirtied by a write in a
> + prior transaction to commit as part of the current commit.  This makes
> + the committed state a fully consistent view of the file system from the
> + application's perspective (i.e., it includes all completed file system
> + operations).  This was previously the behavior only when a snapshot is
> + created.
> +
> +  inode_cache
> + Enable free inode number caching.   Defaults to off due to an overflow
> + problem when the free space crcs don't fit inside a single page.
> +
> +  max_inline=
> + Specify the maximum amount of space, in bytes, that can be inlined in
> + a metadata B-tree leaf.  The value is specified in bytes, optionally 
> + w

Re: [RFC] How to enable btrfs reserve block space from specific device?

2013-03-27 Thread Ilya Dryomov

On Wed, Mar 27, 2013 at 09:45:59AM +0800, Zhi Yong Wu wrote:
> HI,
> 
> When i work on btrfs hot relocation feature, i hit one question
> about block reservation. btrfs hot relocation need to reserve block
> space from specific devices such as SSD, but current btrfs reserving
> code doesn't differentiate if block space is reserved from SSD or HDD.
> In order to make btrfs support this, I thought that we can introduce
> one new block group or flag, but this will maybe make large impact on
> current existing other functions. For this, does any guy have some
> better idea? thanks.

Hi,

Sorry for the late reply.  I have a lot on my plate right now, so I
haven't looked at your WIP code.  However, based on my knowledge, I can
tell you that block reservation problem is completely orthogonal to the
way you choose to handle hot data storage.  A separate block group is
one way to do it, and probably the easiest one.  Block groups -> chunks
is a 1:1 mapping, so, for now, it might make sense to have one giant
block group spanning over the entire hot data device.  OTOH, IIRC the
original implementation from IBM just used the rotation flag to detect
"hot data" devices.  You have a lot of choices here, but, if you are
planning on implementing mirrored/striped "hot data" devices in future,
I'd definitely go with block groups, since that'll give you some of the
infrastructure for that for free.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] xfstests: unify apostrophes in output files

2013-09-19 Thread Ilya Dryomov

On Thu, Sep 19, 2013 at 7:20 PM, Eric Sandeen  wrote:
> xfstests: unify apostrophes in output files
>
> From: Tomas Racek 
>
> With coreutils v8.16 the style of apostrophes changed from `word' to
> 'word'. This is breaking some tests which use the older form.
>
> This commit introduces function changes the golden output of the
> affected tests and introduces a filter for the older style output.
>
> [dchinner: modified to use a global filter in check rather than
> per-test filters]
>
> Signed-off-by: Tomas Racek 
> Signed-off-by: Dave Chinner 
> Reviewed-by: Eric Sandeen 
> ---
>
> (Resending as proper top-level, modified, reviewed patch --Eric)
>
>  check |  4 
>  common/config |  1 +
>  tests/generic/193.out | 16 
>  tests/generic/230.out |  8 
>  tests/generic/235.out |  2 +-
>  tests/generic/245.out |  2 +-
>  tests/generic/294.out |  8 
>  tests/generic/306.out |  2 +-
>  tests/xfs/103.out |  2 +-
>  tests/xfs/200.out |  2 +-
>  10 files changed, 26 insertions(+), 21 deletions(-)
>
> diff --git a/check b/check
> index 4085eae..ba7fd21 100755
> --- a/check
> +++ b/check
> @@ -478,6 +478,10 @@ do
> echo " - no qualified output"
> err=true
> else
> +
> +   # coreutils 2.16+ changed quote formats in error messages from

Should that be 8.16+?

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: eliminate races in worker stopping code

2013-10-02 Thread Ilya Dryomov

The current implementation of worker threads in Btrfs has races in
worker stopping code, which cause all kinds of panics and lockups when
running btrfs/011 xfstest in a loop.  The problem is that
btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
check_busy_worker and __btrfs_start_workers.

E.g., check_idle_worker race flow:

   btrfs_stop_workers():check_idle_worker(aworker):
- grabs the lock
- splices the idle list into the
  working list
- removes the first worker from the
  working list
- releases the lock to wait for
  its kthread's completion
  - grabs the lock
  - if aworker is on the working list,
moves aworker from the working list
to the idle list
  - releases the lock
- grabs the lock
- puts the worker
- removes the second worker from the
  working list
  ..
btrfs_stop_workers returns, aworker is on the idle list
 FS is umounted, memory is freed
  ..
  aworker is waken up, fireworks ensue

With this applied, I wasn't able to trigger the problem in 48 hours,
whereas previously I could reliably reproduce at least one of these
races within an hour.

Reported-by: David Sterba 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/async-thread.c |   25 +++--
 fs/btrfs/async-thread.h |2 ++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 58b7d14..08cc08f 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -107,7 +107,8 @@ static void check_idle_worker(struct btrfs_worker_thread 
*worker)
worker->idle = 1;
 
/* the list may be empty if the worker is just starting */
-   if (!list_empty(&worker->worker_list)) {
+   if (!list_empty(&worker->worker_list) &&
+   !worker->workers->stopping) {
list_move(&worker->worker_list,
 &worker->workers->idle_list);
}
@@ -127,7 +128,8 @@ static void check_busy_worker(struct btrfs_worker_thread 
*worker)
spin_lock_irqsave(&worker->workers->lock, flags);
worker->idle = 0;
 
-   if (!list_empty(&worker->worker_list)) {
+   if (!list_empty(&worker->worker_list) &&
+   !worker->workers->stopping) {
list_move_tail(&worker->worker_list,
  &worker->workers->worker_list);
}
@@ -412,6 +414,7 @@ void btrfs_stop_workers(struct btrfs_workers *workers)
int can_stop;
 
spin_lock_irq(&workers->lock);
+   workers->stopping = 1;
list_splice_init(&workers->idle_list, &workers->worker_list);
while (!list_empty(&workers->worker_list)) {
cur = workers->worker_list.next;
@@ -455,6 +458,7 @@ void btrfs_init_workers(struct btrfs_workers *workers, char 
*name, int max,
workers->ordered = 0;
workers->atomic_start_pending = 0;
workers->atomic_worker_start = async_helper;
+   workers->stopping = 0;
 }
 
 /*
@@ -480,15 +484,19 @@ static int __btrfs_start_workers(struct btrfs_workers 
*workers)
atomic_set(&worker->num_pending, 0);
atomic_set(&worker->refs, 1);
worker->workers = workers;
-   worker->task = kthread_run(worker_loop, worker,
-  "btrfs-%s-%d", workers->name,
-  workers->num_workers + 1);
+   worker->task = kthread_create(worker_loop, worker,
+ "btrfs-%s-%d", workers->name,
+ workers->num_workers + 1);
if (IS_ERR(worker->task)) {
ret = PTR_ERR(worker->task);
-   kfree(worker);
goto fail;
}
+
spin_lock_irq(&workers->lock);
+   if (workers->stopping) {
+   spin_unlock_irq(&workers->lock);
+   goto fail_kthread;
+   }
list_add_tail(&worker->worker_list, &workers->idle_list);
worker->idle = 1;
workers->num_workers++;
@@ -496,8 +504,13 @@ static int __btrfs_start_workers(struct btrfs_workers 
*workers)
WARN_ON(workers->num_workers_starting < 0);
spin_unlock_irq(&workers->lock);
 
+   wake_up_process(worker->task);
return 0;
+
+fail_kthread:
+   kthread_stop(worker->task);
 fail:
+   kfree(worker);
spin_lock_irq

[PATCH] Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing

2013-10-02 Thread Ilya Dryomov

free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
can be processed before btrfs_scratch_superblock is called, which would
result in a use-after-free on btrfs_device contents.  Fix this by
zeroing the superblock before the rcu callback is registered.

Cc: Stefan Behrens 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/dev-replace.c |5 +
 fs/btrfs/volumes.c |7 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index a644353..5aa0718 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -535,10 +535,7 @@ static int btrfs_dev_replace_finishing(struct 
btrfs_fs_info *fs_info,
list_add(&tgt_device->dev_alloc_list, &fs_info->fs_devices->alloc_list);
 
btrfs_rm_dev_replace_srcdev(fs_info, src_device);
-   if (src_device->bdev) {
-   /* zero out the old super */
-   btrfs_scratch_superblock(src_device);
-   }
+
/*
 * this is again a consistent state where no dev_replace procedure
 * is running, the target device is part of the filesystem, the
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0052ca8..a9ac68a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1715,6 +1715,7 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info 
*fs_info,
 struct btrfs_device *srcdev)
 {
WARN_ON(!mutex_is_locked(&fs_info->fs_devices->device_list_mutex));
+
list_del_rcu(&srcdev->dev_list);
list_del_rcu(&srcdev->dev_alloc_list);
fs_info->fs_devices->num_devices--;
@@ -1724,9 +1725,13 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info 
*fs_info,
}
if (srcdev->can_discard)
fs_info->fs_devices->num_can_discard--;
-   if (srcdev->bdev)
+   if (srcdev->bdev) {
fs_info->fs_devices->open_devices--;
 
+   /* zero out the old super */
+   btrfs_scratch_superblock(srcdev);
+   }
+
call_rcu(&srcdev->rcu, free_device);
 }
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: fix the dev-replace suspend sequence

2013-10-07 Thread Ilya Dryomov

Replace progresses strictly from lower to higher offsets, and the
progress is tracked in chunks, by storing the physical offset of the
dev_extent which is being copied in the cursor_left field of
btrfs_dev_replace_item.  When we are done copying the chunk,
left_cursor is updated to point one byte past the dev_extent, so that
on resume we can skip the dev_extents that have already been copied.

There is a major bug (which goes all the way back to the inception of
dev-replace in 3.8) in the way left_cursor is bumped: the bump is done
unconditionally, without any regard to the scrub_chunk return value.
On suspend (and also on any kind of error) scrub_chunk returns early,
i.e. without completing the copy.  This leads to us skipping the chunk
that hasn't been fully copied yet when resuming.

Fix this by doing the cursor_left update only if scrub_chunk ret is 0.
(On suspend scrub_chunk returns with -ECANCELED, so this fix covers
both suspend and error cases.)

Cc: Stefan Behrens 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/scrub.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a18e0e2..f21e2df 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2717,8 +2717,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
mutex_unlock(&fs_info->scrub_lock);
wake_up(&fs_info->scrub_pause_wait);
 
-   dev_replace->cursor_left = dev_replace->cursor_right;
-   dev_replace->item_needs_writeback = 1;
btrfs_put_block_group(cache);
if (ret)
break;
@@ -2732,6 +2730,9 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
break;
}
 
+   dev_replace->cursor_left = dev_replace->cursor_right;
+   dev_replace->item_needs_writeback = 1;
+
key.offset = found_key.offset + length;
btrfs_release_path(path);
}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: nuke a bogus rw_devices decrement in __btrfs_close_devices

2013-10-10 Thread Ilya Dryomov

On mount failures, __btrfs_close_devices can be called well before
dev-replace state is read and ->is_tgtdev_for_dev_replace is set.  This
leads to a bogus decrement of ->rw_devices and sets off a WARN_ON in
__btrfs_close_devices if replace target device happens to be on the
lists and we fail early in the mount sequence.  Fix this by checking
the devid instead of ->is_tgtdev_for_dev_replace before the decrement:
for replace targets devid is always equal to BTRFS_DEV_REPLACE_DEVID.

Cc: Stefan Behrens 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 043b215..a306db9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -666,7 +666,8 @@ static int __btrfs_close_devices(struct btrfs_fs_devices 
*fs_devices)
if (device->bdev)
fs_devices->open_devices--;
 
-   if (device->writeable && !device->is_tgtdev_for_dev_replace) {
+   if (device->writeable &&
+   device->devid != BTRFS_DEV_REPLACE_DEVID) {
list_del_init(&device->dev_alloc_list);
fs_devices->rw_devices--;
}
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: don't leak ioctl args in btrfs_ioctl_dev_replace

2013-10-10 Thread Ilya Dryomov

struct btrfs_ioctl_dev_replace_args memory is leaked if replace is
requested on a read-only filesystem.  Fix it.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/ioctl.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..17bb98e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3679,9 +3679,10 @@ static long btrfs_ioctl_dev_replace(struct btrfs_root 
*root, void __user *arg)
 
switch (p->cmd) {
case BTRFS_IOCTL_DEV_REPLACE_CMD_START:
-   if (root->fs_info->sb->s_flags & MS_RDONLY)
-   return -EROFS;
-
+   if (root->fs_info->sb->s_flags & MS_RDONLY) {
+   ret = -EROFS;
+   goto out;
+   }
if (atomic_xchg(
&root->fs_info->mutually_exclusive_operation_running,
1)) {
@@ -3707,7 +3708,7 @@ static long btrfs_ioctl_dev_replace(struct btrfs_root 
*root, void __user *arg)
 
if (copy_to_user(arg, p, sizeof(*p)))
ret = -EFAULT;
-
+out:
kfree(p);
return ret;
 }
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts

2013-10-10 Thread Ilya Dryomov

For both balance and replace, cancelling involves changing the on-disk
state and committing a transaction, which is not a good thing to do on
read-only filesystems.

Cc: Stefan Behrens 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/dev-replace.c |3 +++
 fs/btrfs/volumes.c |3 +++
 2 files changed, 6 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 9efb94e..98df261 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -650,6 +650,9 @@ static u64 __btrfs_dev_replace_cancel(struct btrfs_fs_info 
*fs_info)
u64 result;
int ret;
 
+   if (fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
mutex_lock(&dev_replace->lock_finishing_cancel_unmount);
btrfs_dev_replace_lock(dev_replace);
switch (dev_replace->replace_state) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a306db9..2630f38 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3424,6 +3424,9 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
 
 int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 {
+   if (fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
mutex_lock(&fs_info->balance_mutex);
if (!fs_info->balance_ctl) {
mutex_unlock(&fs_info->balance_mutex);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts

2013-10-11 Thread Ilya Dryomov

On Fri, Oct 11, 2013 at 4:13 AM, Wang Shilong
 wrote:
> On 10/11/2013 01:40 AM, Ilya Dryomov wrote:
>
> I have a question in my mind.
>
> Can we reach a state that there is operation in progress when filesystem
> has been readonly?If we do cancel operations on a ro filesystem, we should
> get "No operations in progress" .

So if I understand you correctly you are saying that we should return
the "Not in progress" error code if the filesystem is mounted ro and
there is nothing to cancel.  I actually did think about it, but decided
against it because one can extend this argument to starting commands
with something like "If I order a start on a read-only fs and that
operation has already been started, I should get EINPROGRESS.", and
that's not what we are currently doing.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts

2013-10-11 Thread Ilya Dryomov

On Fri, Oct 11, 2013 at 12:28 PM, Remco Hosman - Yerf-IT
 wrote:
> i would expect a RO mount never to write anything to a filesystem. not even
> replay a journal (or a seperate option for that).
> Its possible that the device is not writable at all, if its a snapshot or a
> RO iscsi device of some kind.

I agree, and that's exactly the point of this patch.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] Btrfs: fix __btrfs_start_workers retval

2013-11-03 Thread Ilya Dryomov

__btrfs_start_workers returns 0 in case it raced with
btrfs_stop_workers and lost the race.  This is wrong because worker in
this case is not allowed to start and is in fact destroyed.  Return
-EINVAL instead.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/async-thread.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 08cc08f..1d5f3d7 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -495,6 +495,7 @@ static int __btrfs_start_workers(struct btrfs_workers 
*workers)
spin_lock_irq(&workers->lock);
if (workers->stopping) {
spin_unlock_irq(&workers->lock);
+   ret = -EINVAL;
goto fail_kthread;
}
list_add_tail(&worker->worker_list, &workers->idle_list);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] Btrfs: do not inc uncorrectable_errors counter on ro scrubs

2013-11-03 Thread Ilya Dryomov

Currently if we discover an error when scrubbing in ro mode we a)
blindly increment the uncorrectable_errors counter, and b) spam the
dmesg with the 'unable to fixup (regular) error at ...' message, even
though a) we haven't tried to determine if the error is correctable or
not, and b) we haven't tried to fixup anything.  Fix this.

Cc: Stefan Behrens 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/scrub.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f21e2df..f94b98d 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -938,8 +938,10 @@ static int scrub_handle_errored_block(struct scrub_block 
*sblock_to_check)
BTRFS_DEV_STAT_CORRUPTION_ERRS);
}
 
-   if (sctx->readonly && !sctx->is_dev_replace)
-   goto did_not_correct_error;
+   if (sctx->readonly) {
+   ASSERT(!sctx->is_dev_replace);
+   goto out;
+   }
 
if (!is_metadata && !have_csum) {
struct scrub_fixup_nodatasum *fixup_nodatasum;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] Btrfs: disable online raid-repair on ro mounts

2013-11-03 Thread Ilya Dryomov

This disables the "if needed, write the good copy back before the read
is completed" part of the read sequence for read-only mounts.

Cc: Jan Schmidt 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/extent_io.c |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c09a40d..a0b7403 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1979,6 +1979,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 
start,
struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
int ret;
 
+   ASSERT(!(fs_info->sb->s_flags & MS_RDONLY));
BUG_ON(!mirror_num);
 
/* we can't repair anything in raid56 yet */
@@ -2035,6 +2036,9 @@ int repair_eb_io_failure(struct btrfs_root *root, struct 
extent_buffer *eb,
unsigned long i, num_pages = num_extent_pages(eb->start, eb->len);
int ret = 0;
 
+   if (root->fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
for (i = 0; i < num_pages; i++) {
struct page *p = extent_buffer_page(eb, i);
ret = repair_io_failure(root->fs_info, start, PAGE_CACHE_SIZE,
@@ -2056,12 +2060,12 @@ static int clean_io_failure(u64 start, struct page 
*page)
u64 private;
u64 private_failure;
struct io_failure_record *failrec;
-   struct btrfs_fs_info *fs_info;
+   struct inode *inode = page->mapping->host;
+   struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
struct extent_state *state;
int num_copies;
int did_repair = 0;
int ret;
-   struct inode *inode = page->mapping->host;
 
private = 0;
ret = count_range_bits(&BTRFS_I(inode)->io_failure_tree, &private,
@@ -2084,6 +2088,8 @@ static int clean_io_failure(u64 start, struct page *page)
did_repair = 1;
goto out;
}
+   if (fs_info->sb->s_flags & MS_RDONLY)
+   goto out;
 
spin_lock(&BTRFS_I(inode)->io_tree.lock);
state = find_first_extent_bit_state(&BTRFS_I(inode)->io_tree,
@@ -2093,7 +2099,6 @@ static int clean_io_failure(u64 start, struct page *page)
 
if (state && state->start <= failrec->start &&
state->end >= failrec->start + failrec->len - 1) {
-   fs_info = BTRFS_I(inode)->root->fs_info;
num_copies = btrfs_num_copies(fs_info, failrec->logical,
  failrec->len);
if (num_copies > 1)  {
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/6] Btrfs-progs: add btrfsck functionality to btrfs

2013-11-14 Thread Ilya Dryomov

On Wed, Nov 13, 2013 at 7:13 PM, David Sterba  wrote:
> Hi,
>
> On Sun, Jun 02, 2013 at 05:47:38PM +0200, Dieter Ries wrote:
>> For this to have any effect, 'h' must be added to getopt_long(), see
>> attached patch 1.
>>
>> However, this results in btrfsck -h and --help doing different things:
>>
>> --help prints the usage message to stdout and exits with exit(0).
>> -h prints the usage message to stderr and exits with exit(129).
>>
>> I made a patch to fix this, see attached patch 2.
>> What it doesn't fix though is, that -h/--help and -? don't do the same
>> thing. This is more complicated, as getop_long returns '?' for unknown
>> options.
>
> FYI, both patchess added to integration.

Hi David,

FWIW, I think none of the btrfs sub-commands treat -h as a help option.
(This is an artifact that was inherited from the the old btrfs-progs
utility.)  -h vs --help is actually consistent: -h results in a "btrfs
check: invalid option -- 'h'" message, and therefore exits with 129.
Since 'btrfs check -h' has clearly never worked we might want to keep
the status quo.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: Allow forced conversion of metadata to dup profile on multiple devices

2014-02-24 Thread Ilya Dryomov

On Thu, Feb 20, 2014 at 6:57 PM, David Sterba  wrote:
> On Wed, Feb 19, 2014 at 11:10:41AM -0500, Austin S Hemmelgarn wrote:
>> Currently, btrfs balance start fails when trying to convert metadata or
>> system chunks to dup profile on filesystems with multiple devices.  This
>> requires that a conversion from a multi-device filesystem to a single
>> device filesystem use the following methodology:
>> 1. btrfs balance start -dconvert=single -mconvert=single \
>>-sconvert=single -f /
>> 2. btrfs device delete / /dev/sdx
>> 3. btrfs balance start -mconvert=dup -sconvert=dup /
>> This results in a period of time (possibly very long if the devices are
>> big) where you don't have the protection guarantees of multiple copies
>> of metadata chunks.
>>
>> After applying this patch, one can instead use the following methodology
>> for conversion from a multi-device filesystem to a single device
>> filesystem:
>> 1. btrfs balance start -dconvert=single -mconvert=dup \
>>-sconvert=dup -f /
>> 2. btrfs device delete / /dev/sdx
>> This greatly reduces the chances of the operation causing data loss due
>> to a read error during the device delete.
>>
>> Signed-off-by: Austin S. Hemmelgarn 
> Reviewed-by: David Sterba 
>
> Sounds useful. The muliple devices + DUP is allowed setup when the
> device is added, this patch only adds the 'delete' counterpart. The
> imroved data loss protection during the process is a good thing.

Hi,

Have you actually tried to queue it?  Unless I'm missing something, it won't
compile, and on top of that, it seems to be corrupted too..

IIRC muliple devices + DUP is "allowed" only until the first balance, has that
changed?

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: Allow forced conversion of metadata to dup profile on multiple devices

2014-02-24 Thread Ilya Dryomov

On Mon, Feb 24, 2014 at 3:44 PM, Austin S Hemmelgarn
 wrote:
> On 2014-02-24 08:37, Ilya Dryomov wrote:
>> On Thu, Feb 20, 2014 at 6:57 PM, David Sterba  wrote:
>>> On Wed, Feb 19, 2014 at 11:10:41AM -0500, Austin S Hemmelgarn wrote:
>>>> Currently, btrfs balance start fails when trying to convert metadata or
>>>> system chunks to dup profile on filesystems with multiple devices.  This
>>>> requires that a conversion from a multi-device filesystem to a single
>>>> device filesystem use the following methodology:
>>>> 1. btrfs balance start -dconvert=single -mconvert=single \
>>>>-sconvert=single -f /
>>>> 2. btrfs device delete / /dev/sdx
>>>> 3. btrfs balance start -mconvert=dup -sconvert=dup /
>>>> This results in a period of time (possibly very long if the devices are
>>>> big) where you don't have the protection guarantees of multiple copies
>>>> of metadata chunks.
>>>>
>>>> After applying this patch, one can instead use the following methodology
>>>> for conversion from a multi-device filesystem to a single device
>>>> filesystem:
>>>> 1. btrfs balance start -dconvert=single -mconvert=dup \
>>>>-sconvert=dup -f /
>>>> 2. btrfs device delete / /dev/sdx
>>>> This greatly reduces the chances of the operation causing data loss due
>>>> to a read error during the device delete.
>>>>
>>>> Signed-off-by: Austin S. Hemmelgarn 
>>> Reviewed-by: David Sterba 
>>>
>>> Sounds useful. The muliple devices + DUP is allowed setup when the
>>> device is added, this patch only adds the 'delete' counterpart. The
>>> imroved data loss protection during the process is a good thing.
>>
>> Hi,
>>
>> Have you actually tried to queue it?  Unless I'm missing something, it won't
>> compile, and on top of that, it seems to be corrupted too..
> The patch itself was made using git, AFAICT it should be fine.  I've
> personally built and tested it using UML.

It doesn't look fine.  It was generated with git, but it got corrupted
on the way: either how you pasted it or the email client you use is the
problem.

On Wed, Feb 19, 2014 at 6:10 PM, Austin S Hemmelgarn
 wrote:
> Currently, btrfs balance start fails when trying to convert metadata or
> system chunks to dup profile on filesystems with multiple devices.  This
> requires that a conversion from a multi-device filesystem to a single
> device filesystem use the following methodology:
> 1. btrfs balance start -dconvert=single -mconvert=single \
>-sconvert=single -f /
> 2. btrfs device delete / /dev/sdx
> 3. btrfs balance start -mconvert=dup -sconvert=dup /
> This results in a period of time (possibly very long if the devices are
> big) where you don't have the protection guarantees of multiple copies
> of metadata chunks.
>
> After applying this patch, one can instead use the following methodology
> for conversion from a multi-device filesystem to a single device
> filesystem:
> 1. btrfs balance start -dconvert=single -mconvert=dup \
>-sconvert=dup -f /
> 2. btrfs device delete / /dev/sdx
> This greatly reduces the chances of the operation causing data loss due
> to a read error during the device delete.
>
> Signed-off-by: Austin S. Hemmelgarn 
> ---
>  fs/btrfs/volumes.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 07629e9..38a9522 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3152,10 +3152,8 @@ int btrfs_balance(struct btrfs_balance_control
> *bctl,

^^^, that should be a single line

> num_devices--;
> }
> btrfs_dev_replace_unlock(&fs_info->dev_replace);
> -   allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
> -   if (num_devices == 1)
> -   allowed |= BTRFS_BLOCK_GROUP_DUP;
> -   else if (num_devices > 1)
> +   allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP;
> +   if (num_devices > 1)
> allowed |= (BTRFS_BLOCK_GROUP_RAID0 | 
> BTRFS_BLOCK_GROUP_RAID1);
> if (num_devices > 2)
> allowed |= BTRFS_BLOCK_GROUP_RAID5;
> @@ -3221,6 +3219,21 @@ int btrfs_balance(struct btrfs_balance_control
> *bctl,

^^^, ditto

> goto out;
> }
> }
> +   if (((bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
> +   (bct

[PATCH] Btrfs: fix max chunk size check in chunk allocator

2012-04-13 Thread Ilya Dryomov

Fix a bug, where in case we need to adjust stripe_size so that the
length of the resulting chunk is less than or equal to max_chunk_size,
DUP chunks turn out to be only half as big as they could be.

Cc: Arne Jansen 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a872b48..c462d84 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3324,12 +3324,14 @@ static int __btrfs_alloc_chunk(struct 
btrfs_trans_handle *trans,
stripe_size = devices_info[ndevs-1].max_avail;
num_stripes = ndevs * dev_stripes;
 
-   if (stripe_size * num_stripes > max_chunk_size * ncopies) {
+   if (stripe_size * ndevs > max_chunk_size * ncopies) {
stripe_size = max_chunk_size * ncopies;
-   do_div(stripe_size, num_stripes);
+   do_div(stripe_size, ndevs);
}
 
do_div(stripe_size, dev_stripes);
+
+   /* align to BTRFS_STRIPE_LEN */
do_div(stripe_size, BTRFS_STRIPE_LEN);
stripe_size *= BTRFS_STRIPE_LEN;
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 'filesystem resize max' tries to use devid 1

2012-04-24 Thread Ilya Dryomov

On Tue, Apr 24, 2012 at 11:31:15AM -0400, Josef Bacik wrote:
> On Sun, Apr 22, 2012 at 11:03:06PM -0400, Jeremy Atkins wrote:
> > Back story:
> > I started my pool with a 200gb partition at the end of my drive (sdc5)
> > , until I was able to clear out the data at the beginning of my drive.
> >  When I was ready, I ran `btrfs dev add /dev/sdc4 /` then `btrfs dev
> > del /dev/sdc5 /`,
> > 
> > $ sudo btrfs fi resize max /
> > Resize '/' of 'max'
> > ERROR: unable to resize '/' - Invalid argument
> > 
> > in dmesg i see:
> > [   72.729685] btrfs: resizer unable to find device 1
> > 
> > $ sudo btrfs fi df /
> > Data: total=34.00GB, used=17.79GB
> > System, DUP: total=64.00MB, used=20.00KB
> > Metadata, DUP: total=33.50GB, used=394.93MB
> > 
> > $ sudo btrfs fi sh
> > Label: none  uuid: b0ad55e2-09e0-4658-8cab-d2e11ba03753
> >         Total devices 1 FS bytes used 17.18GB
> >         devid    2 size 1.62TB used 101.12GB path /dev/sdc4
> > 
> > 
> > $ uname -r
> > 3.3.2-1.fc16.x86_64
> > 
> > btrfs-progs version is current git master ( commit 1957076 )
> > 
> > After writing this email, and searching around the wiki some, I
> > discovered the command to resize specific devids,
> > 
> > [antrat@tbox ~]$ sudo btrfs fi resize 2:max /
> > Resize '/' of '2:max'
> > 
> > and in dmesg:
> > [ 1661.933884] btrfs: resizing devid 2
> > [ 1661.933895] btrfs: new size for /dev/sdc4 is 1995564908544
> 
> Hrm, if we're using filesystem resize max we should probably just max out all
> the devices and not require one specific device, what do you think Ilya?

I'd keep the devid:max syntax for maxing out a single device and add a
new fs:max or similar to resize all devices.  ABI here is just a string,
so I think it can be easily done.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 'filesystem resize max' tries to use devid 1

2012-04-24 Thread Ilya Dryomov

On Tue, Apr 24, 2012 at 11:42:26AM -0400, Josef Bacik wrote:
> On Tue, Apr 24, 2012 at 06:39:03PM +0300, Ilya Dryomov wrote:
> > On Tue, Apr 24, 2012 at 11:31:15AM -0400, Josef Bacik wrote:
> > > On Sun, Apr 22, 2012 at 11:03:06PM -0400, Jeremy Atkins wrote:
> > > > Back story:
> > > > I started my pool with a 200gb partition at the end of my drive (sdc5)
> > > > , until I was able to clear out the data at the beginning of my drive.
> > > >  When I was ready, I ran `btrfs dev add /dev/sdc4 /` then `btrfs dev
> > > > del /dev/sdc5 /`,
> > > > 
> > > > $ sudo btrfs fi resize max /
> > > > Resize '/' of 'max'
> > > > ERROR: unable to resize '/' - Invalid argument
> > > > 
> > > > in dmesg i see:
> > > > [   72.729685] btrfs: resizer unable to find device 1
> > > > 
> > > > $ sudo btrfs fi df /
> > > > Data: total=34.00GB, used=17.79GB
> > > > System, DUP: total=64.00MB, used=20.00KB
> > > > Metadata, DUP: total=33.50GB, used=394.93MB
> > > > 
> > > > $ sudo btrfs fi sh
> > > > Label: none  uuid: b0ad55e2-09e0-4658-8cab-d2e11ba03753
> > > >         Total devices 1 FS bytes used 17.18GB
> > > >         devid    2 size 1.62TB used 101.12GB path /dev/sdc4
> > > > 
> > > > 
> > > > $ uname -r
> > > > 3.3.2-1.fc16.x86_64
> > > > 
> > > > btrfs-progs version is current git master ( commit 1957076 )
> > > > 
> > > > After writing this email, and searching around the wiki some, I
> > > > discovered the command to resize specific devids,
> > > > 
> > > > [antrat@tbox ~]$ sudo btrfs fi resize 2:max /
> > > > Resize '/' of '2:max'
> > > > 
> > > > and in dmesg:
> > > > [ 1661.933884] btrfs: resizing devid 2
> > > > [ 1661.933895] btrfs: new size for /dev/sdc4 is 1995564908544
> > > 
> > > Hrm, if we're using filesystem resize max we should probably just max out 
> > > all
> > > the devices and not require one specific device, what do you think Ilya?
> > 
> > I'd keep the devid:max syntax for maxing out a single device and add a
> > new fs:max or similar to resize all devices.  ABI here is just a string,
> > so I think it can be easily done.
> > 
> 
> Right I think the devid:max thing is handy for people who want to do specific
> things, but I think just a normal btrfs fi resize max should do all devices 
> for
> simplicity and easy of use, no need for a fs:max.  Thanks,

Sorry, I misread.  'btrfs fi resize max ' should of course resize
the entire FS.  I think this popped up before, at least a couple times.
I'll do it if nobody else wants to.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Is it possible to reclaim block groups once they are allocated to data or metadata?

2012-05-06 Thread Ilya Dryomov

On Sun, May 06, 2012 at 01:07:06PM +1000, Mike Sampson wrote:
> On Sat, May 5, 2012 at 10:52 PM, Hugo Mills  wrote:
> > On Sat, May 05, 2012 at 10:37:01PM +1000, Mike Sampson wrote:
> >> Hello list,
> >>
> >> recently reformatted my home partition from XFS to RAID1 btrfs. I used
> >> the default options to mkfs.btrfs except for enabling raid1 for data
> >> as well as metadata. Filesystem is made up of two 1TB drives.
> >>
> >> mike@mercury (0) pts/3 ~ $ sudo btrfs filesystem show
> >> Label: none  uuid: f08a8896-e03e-4064-9b94-9342fb547e47
> >>       Total devices 2 FS bytes used 888.06GB
> >>       devid    1 size 931.51GB used 931.51GB path /dev/sdb1
> >>       devid    2 size 931.51GB used 931.49GB path /dev/sdc1
> >>
> >> Btrfs Btrfs v0.19
> >>
> >> mike@mercury (0) pts/3 ~ $ btrfs filesystem df /home
> >> Data, RAID1: total=893.48GB, used=886.60GB
> >> Data: total=8.00MB, used=0.00
> >> System, RAID1: total=8.00MB, used=136.00KB
> >> System: total=4.00MB, used=0.00
> >> Metadata, RAID1: total=38.00GB, used=2.54GB
> >> Metadata: total=8.00MB, used=0.00
> >>
> >> As can be seen I don't have a lot of free space left and while I am
> >> planning on adding more storage soon I would like to gain a little
> >> breathing room until I can do this. While I don't have a lot of space
> >> remaining in Data, RAID1 I do have a good chunk in Metadata, RAID1.
> >> 2.5GB used out of 38GB. Does this excess become available
> >> automatically to the file system when the block groups in Data, RAID1
> >> are exhausted or, if not, is there a way to manually reallocate them?
> >
> >   Youre best bet at the moment is to try a partial balance of
> > metadata chunks:
> >
> > # btrfs balance start -m /home
> >
> >   That will rewrite all of your metadata, putting it through the
> > allocator again, and removing the original allocated chunks. This
> > should have the effect of reducing the allocation of metadata chunks.
> >
> >   You will need a 3.3 kernel, or later, and an up-to-date userspace
> > from cmason's git repository.
> 
> Gave this a shot and it did help.
> 
> mike@mercury (0) pts/3 ~/.../btrfs-progs git:master $ uname -r
> 3.3.4-2-ARCH
> 
> mike@mercury (0) pts/3 ~/.../btrfs-progs git:master $ sudo ./btrfs
> balance start -m /home
> 
> Done, had to relocate 40 out of 934 chunks
> 
> mike@mercury (0) pts/3 ~/.../btrfs-progs git:master $ ./btrfs
> filesystem df /home
> Data, RAID1: total=900.97GB, used=880.06GB
> Data: total=8.00MB, used=0.00
> System, RAID1: total=32.00MB, used=136.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=30.50GB, used=2.54GB
> 
> There is now 8GB less in Metadata and I was able to delete some files
> as well to free up space. There is still a lot of wasted space in the
> metadata block groups. It seems that it allocates more metadata block
> groups than required for my filesystem. This will do until I am able
> to add a couple of devices to the system. Is there anyway to adjust
> the block group allocation strategy at filesystem creation?

No.  Chunk allocator currently allocates a lot more chunks than actually
needed, and it impacts both balancing and normal operation.  Try this:

# btrfs balance start -musage=10 /home

This is suboptimal, but it should get rid of more chunks.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Is it possible to reclaim block groups once they are allocated to data or metadata?

2012-05-06 Thread Ilya Dryomov

On Sun, May 06, 2012 at 11:37:27AM +0100, Hugo Mills wrote:
> On Sun, May 06, 2012 at 01:26:45PM +0300, Ilya Dryomov wrote:
> > On Sun, May 06, 2012 at 01:07:06PM +1000, Mike Sampson wrote:
> > > There is now 8GB less in Metadata and I was able to delete some files
> > > as well to free up space. There is still a lot of wasted space in the
> > > metadata block groups. It seems that it allocates more metadata block
> > > groups than required for my filesystem. This will do until I am able
> > > to add a couple of devices to the system. Is there anyway to adjust
> > > the block group allocation strategy at filesystem creation?
> > 
> > No.  Chunk allocator currently allocates a lot more chunks than actually
> > needed, and it impacts both balancing and normal operation.  Try this:
> > 
> > # btrfs balance start -musage=10 /home
> > 
> > This is suboptimal, but it should get rid of more chunks.
> 
>While we're talking about it, what is the parameter to the "usage"
> option? I'm assuming it selects chunks which are less than some amount
> full -- but is the value a percentage, or a quantity in megabytes
> (power-of-10 or power-of-2), or something else?

It's a percentage, so the command above will balance out chunks that are
less than 10 percent full.  I'll update btrfs man page and the wiki page
you started as soon as I can.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: balancing metadata fails with no space left on device

2012-05-06 Thread Ilya Dryomov

On Sun, May 06, 2012 at 01:19:38PM +0200, Martin Steigerwald wrote:
> Am Freitag, 4. Mai 2012 schrieb Martin Steigerwald:
> > Am Freitag, 4. Mai 2012 schrieb Martin Steigerwald:
> > > Hi!
> > > 
> > > merkaba:~> btrfs balance start -m /
> > > ERROR: error during balancing '/' - No space left on device
> > > There may be more info in syslog - try dmesg | tail
> > > merkaba:~#19> dmesg | tail -22
> > > [   62.918734] CPU0: Package power limit normal
> > > [  525.229976] btrfs: relocating block group 20422066176 flags 1
> > > [  526.940452] btrfs: found 3048 extents
> > > [  528.803778] btrfs: found 3048 extents
> […]
> > > [  635.906517] btrfs: found 1 extents
> > > [  636.038096] btrfs: 1 enospc errors during balance
> > > 
> > > 
> > > merkaba:~> btrfs filesystem show
> > > failed to read /dev/sr0
> > > Label: 'debian'  uuid: […]
> > > 
> > > Total devices 1 FS bytes used 7.89GB
> > > devid1 size 18.62GB used 17.58GB path /dev/dm-0
> > > 
> > > Btrfs Btrfs v0.19
> > > merkaba:~> btrfs filesystem df /
> > > Data: total=15.52GB, used=7.31GB
> > > System, DUP: total=32.00MB, used=4.00KB
> > > System: total=4.00MB, used=0.00
> > > Metadata, DUP: total=1.00GB, used=587.83MB
> > 
> > I thought data tree might have been to big, so out of curiousity I
> > tried a full balance. It shrunk the data tree but it failed as well:
> > 
> > merkaba:~> btrfs balance start /
> > ERROR: error during balancing '/' - No space left on device
> > There may be more info in syslog - try dmesg | tail
> > merkaba:~#19> dmesg | tail -63
> > [   89.306718] postgres (2876): /proc/2876/oom_adj is deprecated,
> > please use /proc/2876/oom_score_adj instead.
> > [  159.939728] btrfs: relocating block group 21994930176 flags 34
> > [  160.010427] btrfs: relocating block group 21860712448 flags 1
> > [  161.188104] btrfs: found 6 extents
> > [  161.507388] btrfs: found 6 extents
> […]
> > [  335.897953] btrfs: relocating block group 1103101952 flags 1
> > [  347.888295] btrfs: found 28458 extents
> > [  352.736987] btrfs: found 28458 extents
> > [  353.099659] btrfs: 1 enospc errors during balance
> > 
> > merkaba:~> btrfs filesystem df /
> > Data: total=10.00GB, used=7.31GB
> > System, DUP: total=64.00MB, used=4.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=1.12GB, used=587.20MB
> > 
> > merkaba:~> btrfs filesystem show
> > failed to read /dev/sr0
> > Label: 'debian'  uuid: […]
> > Total devices 1 FS bytes used 7.88GB
> > devid1 size 18.62GB used 12.38GB path /dev/dm-0
> > 
> > 
> > For the sake of it I tried another time. It failed again:
> > 
> > martin@merkaba:~> dmesg | tail -32
> > [  353.099659] btrfs: 1 enospc errors during balance
> > [  537.057375] btrfs: relocating block group 32833011712 flags 36
> […]
> > [  641.479140] btrfs: relocating block group 22062039040 flags 34
> > [  641.695614] btrfs: relocating block group 22028484608 flags 34
> > [  641.840179] btrfs: found 1 extents
> > [  641.965843] btrfs: 1 enospc errors during balance
> > 
> > 
> > merkaba:~#19> btrfs filesystem df /
> > Data: total=10.00GB, used=7.31GB
> > System, DUP: total=32.00MB, used=4.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=1.12GB, used=586.74MB
> > merkaba:~> btrfs filesystem show
> > failed to read /dev/sr0
> > Label: 'debian'  uuid: […]
> > Total devices 1 FS bytes used 7.88GB
> > devid1 size 18.62GB used 12.32GB path /dev/dm-0
> > 
> > Btrfs Btrfs v0.19
> > 
> > 
> > Well, in order to be gentle to the SSD again I stop my experiments now
> > ;).
> 
> I had subjective impression that the speed of the BTRFS filesystem 
> decreased after all these
> 
> Anyway, after reading the a -musage hint by Ilya in thread
> 
> Is it possible to reclaim block groups once they ar allocated to data or 
> metadata?

Currently there is no way to reclaim block groups other than performing
a balance.  We will add a kernel thread for this in future, but a couple
of things have to be fixed before that can happen.

> 
> 
> I tried:
> 
> merkaba:~> btrfs filesystem df /
> Data: total=10.00GB, used=7.34GB
> System, DUP: total=32.00MB, used=4.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=1.12GB, used=586.39MB
> 
> merkaba:~> btrfs balance start -musage=1 / 
> Done, had to relocate 2 out of 13 chunks
> 
> merkaba:~> btrfs filesystem df /  
> Data: total=10.00GB, used=7.34GB
> System, DUP: total=32.00MB, used=4.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=1.00GB, used=586.39MB
> 
> So this worked.
> 
> But I wasn´t able to specify less than a Gig:

A follow up to the -musage hint says that the argument it takes is the
percentage.  That is -musage=X will balance out block groups that are
less than X percent used.

> 
> merkaba:~> btrfs balance start -musage=0.8 /
> Invalid usage argument: 0.8
> merkaba:~#1> btrfs balance start -musage=700M /
> Invalid usage argument: 700M
> 
> 
> When I try without usage I get the old behavior back:
> 
>

Re: failed disk (was: kernel 3.3.4 damages filesystem (?))

2012-05-09 Thread Ilya Dryomov

On Wed, May 09, 2012 at 03:37:35PM +0100, Hugo Mills wrote:
> On Wed, May 09, 2012 at 04:25:00PM +0200, Helmut Hullen wrote:
> > Du meintest am 07.05.12:
> > 
> > [...]
> > 
> > >> With a file system like ext2/3/4 I can work with several directories
> > >> which are mounted together, but (as said before) one broken disk
> > >> doesn't disturb the others.
> > 
> > >mkfs.btrfs -m raid1 -d single should give you that.
> > 
> > Just a small bug, perhaps:
> > 
> > created a system with
> > 
> > mkfs.btrfs -m raid1 -d single /dev/sdl1
> > mount /dev/sdl1 /mnt/Scsi
> > btrfs device add /dev/sdk1 /mnt/Scsi
> > btrfs device add /dev/sdm1 /mnt/Scsi
> > (filling with data)
> > 
> > and
> > 
> > btrfs fi df /mnt/Scsi
> > 
> > now tells
> > 
> > Data, RAID0: total=183.18GB, used=76.60GB
> > Data: total=80.01GB, used=79.83GB
> > System, DUP: total=8.00MB, used=32.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=1.00GB, used=192.74MB
> > Metadata: total=8.00MB, used=0.00
> > 
> > --
> > 
> > "Data, RAID0" confuses me (not very much ...), and the system for  
> > metadata (RAID1) is not told.
> 
>DUP is two copies of each block, but it allows the two copies to
> live on the same device. It's done this because you started with a
> single device, and you can't do RAID-1 on one device. The first bit of

What Hugo said.  Newer mkfs.btrfs will error out if you try to do this.

> metadata you write to it should automatically upgrade the DUP chunk to
> RAID-1.

We don't "upgrade" chunks in place, only during balance.

> 
>As to the spurious "upgrade" of single to RAID-0, I thought Ilya
> had stopped it doing that. What kernel version are you running?

I did, but again, we were doing it only as part of balance, not as part
of normal operation.

Helmut, do you have any additional data points - the output of btrfs fi
df right after you created FS or somewhere in the middle of filling it ?

Also could you please paste the output of btrfs fi show and tell us what
kernel version you are running ?

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/5] Btrfs: count the chunks which will be relocated at first

2012-05-17 Thread Ilya Dryomov

On Thu, May 17, 2012 at 07:56:53PM +0800, Miao Xie wrote:
> the balance function should count the chunks which will be relocated at first,
> and then relocate those chunks one by one.
> 
> Signed-off-by: Miao Xie 
> ---
>  fs/btrfs/volumes.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 759d024..91da8a2 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2580,7 +2580,7 @@ again:
>  
>   chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
>  
> - if (!counting) {
> + if (counting) {
>   spin_lock(&fs_info->balance_lock);
>   bctl->stat.considered++;
>   spin_unlock(&fs_info->balance_lock);

__btrfs_balance() already calculates the approximate number of chunks
that will be relocated and stores that value in bctl->stat.expected.
The stat.considered counter OTOH is supposed to reflect the number of
chunks processed through balance filters and it is meant to be updated
at relocation pass, so AFAICS if (!counting) is the right test.

What exactly are you trying to fix here ?

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/5] Btrfs: pause/recover the space balance when doing remount

2012-05-17 Thread Ilya Dryomov

On Thu, May 17, 2012 at 07:57:40PM +0800, Miao Xie wrote:
> pause the space balance threads when remounting the fs to be readonly,
> and recover it when remounting it from r/o to r/w
> 
> Signed-off-by: Miao Xie 
> ---
>  fs/btrfs/super.c   |9 -
>  fs/btrfs/volumes.c |8 +++-
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 7deb00e..ea17f0a 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1148,6 +1148,9 @@ static int btrfs_remount(struct super_block *sb, int 
> *flags, char *data)
>   if (*flags & MS_RDONLY) {
>   sb->s_flags |= MS_RDONLY;
>  
> + /* pause restriper - we want to resume on remount to r/w */
> + btrfs_pause_balance(root->fs_info);
> +
>   ret = btrfs_commit_super(root);
>   if (ret)
>   goto restore;
> @@ -1174,7 +1177,10 @@ static int btrfs_remount(struct super_block *sb, int 
> *flags, char *data)
>   if (ret)
>   goto restore;
>  
> - sb->s_flags &= ~MS_RDONLY;
> + if (sb->s_flags & MS_RDONLY) {
> + sb->s_flags &= ~MS_RDONLY;
> + btrfs_recover_balance(fs_info->tree_root);
> + }
>   }
>  
>   return 0;
> @@ -1190,6 +1196,7 @@ restore:
>   fs_info->alloc_start = old_alloc_start;
>   fs_info->thread_pool_size = old_thread_pool_size;
>   fs_info->metadata_ratio = old_metadata_ratio;
> +
>   return ret;
>  }
>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 91da8a2..c536d52 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2833,7 +2833,13 @@ static int balance_kthread(void *data)
>   mutex_lock(&fs_info->volume_mutex);
>   mutex_lock(&fs_info->balance_mutex);
>  
> - set_balance_control(bctl);
> + if (fs_info->balance_ctl) {
> + kfree(bctl);
> + bctl = fs_info->balance_ctl;
> + bctl->flags = bctl->flags | BTRFS_BALANCE_RESUME;
> + } else {
> + set_balance_control(bctl);
> + }
>  
>   if (btrfs_test_opt(fs_info->tree_root, SKIP_BALANCE)) {
>   printk(KERN_INFO "btrfs: force skipping balance\n");

This is a known bug.  There is a deeper problem here, related to the
fact that we restore balancing state not early enough and that we don't
restore it on ro mounts at all.  I have a patch in the works to fix that
problem, and it also fixes this one the right way.  I'm backed up with
other things right now, but I'll post it as soon as I get a chance.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/5] Btrfs: fix memory leak in btrfs_pause_balance()

2012-05-17 Thread Ilya Dryomov

On Thu, May 17, 2012 at 07:58:53PM +0800, Miao Xie wrote:
> We forget to free fs_info->balance_ctl in the btrfs_pause_balance()
> when umounting the fs.
> 
> Signed-off-by: Miao Xie 
> ---
>  fs/btrfs/volumes.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index c536d52..fd7fe80 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2937,6 +2937,9 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
>   ret = -ENOTCONN;
>   }
>  
> + if (btrfs_fs_closing(fs_info) && fs_info->balance_ctl)
> + unset_balance_control(fs_info);
> +
>   mutex_unlock(&fs_info->balance_mutex);
>   return ret;
>  }

It is kfree()'d in free_fs_info(), which should be called on unmount.
Am I missing something here ?

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: do not return EINVAL instead of ENOMEM from open_ctree()

2012-06-22 Thread Ilya Dryomov

When bailing from open_ctree() err is returned, not ret.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/disk-io.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e1890b1..f06db81 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2244,7 +2244,7 @@ int open_ctree(struct super_block *sb,
ret |= btrfs_start_workers(&fs_info->caching_workers);
ret |= btrfs_start_workers(&fs_info->readahead_workers);
if (ret) {
-   ret = -ENOMEM;
+   err = -ENOMEM;
goto fail_sb_buffer;
}
 
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: do not ignore errors from btrfs_cleanup_fs_roots() when mounting

2012-06-22 Thread Ilya Dryomov

There used to be a BUG_ON(ret) there before EH patch (79787eaa) went in.
Bail out with EINVAL.

Cc: David Sterba 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/disk-io.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f06db81..77872da 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2461,8 +2461,8 @@ retry_root_backup:
 
if (!(sb->s_flags & MS_RDONLY)) {
ret = btrfs_cleanup_fs_roots(fs_info);
-   if (ret) {
-   }
+   if (ret)
+   goto fail_trans_kthread;
 
ret = btrfs_recover_relocation(tree_root);
if (ret < 0) {
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] Btrfs: resume balance on rw (re)mounts properly

2012-06-22 Thread Ilya Dryomov

This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/disk-io.c |   24 +++-
 fs/btrfs/super.c   |4 
 fs/btrfs/volumes.c |   36 +++-
 fs/btrfs/volumes.h |1 +
 4 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dae7cd6..e863f58 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2490,17 +2490,23 @@ retry_root_backup:
goto fail_trans_kthread;
}
 
-   if (!(sb->s_flags & MS_RDONLY)) {
-   down_read(&fs_info->cleanup_work_sem);
-   err = btrfs_orphan_cleanup(fs_info->fs_root);
-   if (!err)
-   err = btrfs_orphan_cleanup(fs_info->tree_root);
+   if (sb->s_flags & MS_RDONLY)
+   return 0;
+
+   down_read(&fs_info->cleanup_work_sem);
+   if ((ret = btrfs_orphan_cleanup(fs_info->fs_root)) ||
+   (ret = btrfs_orphan_cleanup(fs_info->tree_root))) {
up_read(&fs_info->cleanup_work_sem);
+   close_ctree(tree_root);
+   return ret;
+   }
+   up_read(&fs_info->cleanup_work_sem);
 
-   if (err) {
-   close_ctree(tree_root);
-   return err;
-   }
+   ret = btrfs_resume_balance_async(fs_info);
+   if (ret) {
+   printk(KERN_WARNING "btrfs: failed to resume balance\n");
+   close_ctree(tree_root);
+   return ret;
}
 
return 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0eb9a4d..e239915 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1187,6 +1187,10 @@ static int btrfs_remount(struct super_block *sb, int 
*flags, char *data)
if (ret)
goto restore;
 
+   ret = btrfs_resume_balance_async(fs_info);
+   if (ret)
+   goto restore;
+
sb->s_flags &= ~MS_RDONLY;
}
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f7649b9..0f1778c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2845,28 +2845,46 @@ out:
 
 static int balance_kthread(void *data)
 {
-   struct btrfs_balance_control *bctl =
-   (struct btrfs_balance_control *)data;
-   struct btrfs_fs_info *fs_info = bctl->fs_info;
+   struct btrfs_fs_info *fs_info = data;
int ret = 0;
 
mutex_lock(&fs_info->volume_mutex);
mutex_lock(&fs_info->balance_mutex);
 
-   set_balance_control(bctl);
-
-   if (btrfs_test_opt(fs_info->tree_root, SKIP_BALANCE)) {
-   printk(KERN_INFO "btrfs: force skipping balance\n");
-   } else {
+   if (fs_info->balance_ctl) {
printk(KERN_INFO "btrfs: continuing balance\n");
-   ret = btrfs_balance(bctl, NULL);
+   ret = btrfs_balance(fs_info->balance_ctl, NULL);
}
 
mutex_unlock(&fs_info->balance_mutex);
mutex_unlock(&fs_info->volume_mutex);
+
return ret;
 }
 
+int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info)
+{
+   struct task_struct *tsk;
+
+   spin_lock(&fs_info->balance_lock);
+   if (!fs_info->balance_ctl) {
+   spin_unlock(&fs_info->balance_lock);
+   return 0;
+   }
+   spin_unlock(&fs_info->balance_lock);
+
+   if (btrfs_test_opt(fs_info->tree_root, SKIP_BALANCE)) {
+   printk(KERN_INFO "btrfs: force skipping balance\n");
+   return 0;
+   }
+
+   tsk = kthread_run(balance_kthread, fs_info, "btrfs-balance");
+   if (IS_ERR(tsk))
+   return PTR_ERR(tsk);
+
+   return 0;
+}
+
 int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 {
struct btrfs_balance_control *bctl;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index e1b1a64..95f6637 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -281,6 +281,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 
new_size);
 int btrfs_init_new_device(struct btrfs_root *root, char *path);
 int btrfs_balance(struct btrfs_balance_control *bctl,
  struct btrfs_ioctl_balance_args *bargs);
+int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info);
 int btrfs_recover_balance(struct btrfs_fs_info *fs_info);
 int btrfs_pause_balance(struct btrfs_fs_info *fs_info);
 int btrfs_cancel_balance(struct btrfs_fs_info *fs_info);
-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] Btrfs: restore restriper state on all mounts

2012-06-22 Thread Ilya Dryomov

Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts.  This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/disk-io.c |   15 ++-
 fs/btrfs/volumes.c |   39 +++
 fs/btrfs/volumes.h |2 +-
 3 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 77872da..dae7cd6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2354,17 +2354,22 @@ retry_root_backup:
  BTRFS_CSUM_TREE_OBJECTID, csum_root);
if (ret)
goto recovery_tree_root;
-
csum_root->track_dirty = 1;
 
fs_info->generation = generation;
fs_info->last_trans_committed = generation;
 
+   ret = btrfs_recover_balance(fs_info);
+   if (ret) {
+   printk(KERN_WARNING "btrfs: failed to recover balance\n");
+   goto fail_tree_roots;
+   }
+
ret = btrfs_init_dev_stats(fs_info);
if (ret) {
printk(KERN_ERR "btrfs: failed to init dev_stats: %d\n",
   ret);
-   goto fail_block_groups;
+   goto fail_balance_ctl;
}
 
ret = btrfs_init_space_info(fs_info);
@@ -2492,9 +2497,6 @@ retry_root_backup:
err = btrfs_orphan_cleanup(fs_info->tree_root);
up_read(&fs_info->cleanup_work_sem);
 
-   if (!err)
-   err = btrfs_recover_balance(fs_info->tree_root);
-
if (err) {
close_ctree(tree_root);
return err;
@@ -2518,6 +2520,9 @@ fail_cleaner:
 fail_block_groups:
btrfs_free_block_groups(fs_info);
 
+fail_balance_ctl:
+   kfree(fs_info->balance_ctl);
+
 fail_tree_roots:
free_root_pointers(fs_info, 1);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8a3d259..f7649b9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2867,9 +2867,8 @@ static int balance_kthread(void *data)
return ret;
 }
 
-int btrfs_recover_balance(struct btrfs_root *tree_root)
+int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 {
-   struct task_struct *tsk;
struct btrfs_balance_control *bctl;
struct btrfs_balance_item *item;
struct btrfs_disk_balance_args disk_bargs;
@@ -2882,29 +2881,30 @@ int btrfs_recover_balance(struct btrfs_root *tree_root)
if (!path)
return -ENOMEM;
 
-   bctl = kzalloc(sizeof(*bctl), GFP_NOFS);
-   if (!bctl) {
-   ret = -ENOMEM;
-   goto out;
-   }
-
key.objectid = BTRFS_BALANCE_OBJECTID;
key.type = BTRFS_BALANCE_ITEM_KEY;
key.offset = 0;
 
-   ret = btrfs_search_slot(NULL, tree_root, &key, path, 0, 0);
+   ret = btrfs_search_slot(NULL, fs_info->tree_root, &key, path, 0, 0);
if (ret < 0)
-   goto out_bctl;
+   goto out;
if (ret > 0) { /* ret = -ENOENT; */
ret = 0;
-   goto out_bctl;
+   goto out;
+   }
+
+   bctl = kzalloc(sizeof(*bctl), GFP_NOFS);
+   if (!bctl) {
+   ret = -ENOMEM;
+   goto out;
}
 
leaf = path->nodes[0];
item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_balance_item);
 
-   bctl->fs_info = tree_root->fs_info;
-   bctl->flags = btrfs_balance_flags(leaf, item) | BTRFS_BALANCE_RESUME;
+   bctl->fs_info = fs_info;
+   bctl->flags = btrfs_balance_flags(leaf, item);
+   bctl->flags |= BTRFS_BALANCE_RESUME;
 
btrfs_balance_data(leaf, item, &disk_bargs);
btrfs_disk_balance_args_to_cpu(&bctl->data, &disk_bargs);
@@ -2913,14 +2913,13 @@ int btrfs_recover_balance(struct btrfs_root *tree_root)
btrfs_balance_sys(leaf, item, &disk_bargs);
btrfs_disk_balance_args_to_cpu(&bctl->sys, &disk_bargs);
 
-   tsk = kthread_run(balance_kthread, bctl, "btrfs-balance");
-   if (IS_ERR(tsk))
-   ret = PTR_ERR(tsk);
-   else
-   goto out;
+   mutex_lock(&fs_info->volume_mutex);
+   mutex_lock(&fs_info->balance_mutex);
 
-out_bctl:
-   kfree(bctl);
+   set_balance_control(bctl);
+
+   mutex_unlock(&fs_info->balance_mutex);
+   mutex_unlock(&fs_info->volume_mutex);
 out:
btrfs_free_path(path);
return ret;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 74366f2..e1b1a64 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -281,7 +281,7 @@ int btrfs_shrink_device(struct bt

[PATCH 0/2] Restriper restore bugfix

2012-06-22 Thread Ilya Dryomov

This fixes restriper restore bug which triggered asserts for users with
interrupted balances on rootfs btrfs.

Thanks,

Ilya


Ilya Dryomov (2):
  Btrfs: restore restriper state on all mounts
  Btrfs: resume balance on rw (re)mounts properly

 fs/btrfs/disk-io.c |   39 ++--
 fs/btrfs/super.c   |4 +++
 fs/btrfs/volumes.c |   73 
 fs/btrfs/volumes.h |3 +-
 4 files changed, 76 insertions(+), 43 deletions(-)

-- 
1.7.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] Btrfs: restore restriper state on all mounts

2012-06-26 Thread Ilya Dryomov

First of all, thanks for reviewing!

On Tue, Jun 26, 2012 at 06:17:39PM +0200, David Sterba wrote:
> On Fri, Jun 22, 2012 at 09:24:12PM +0300, Ilya Dryomov wrote:
> > Fix a bug that triggered asserts in btrfs_balance() in both normal and
> > resume modes -- restriper state was not properly restored on read-only
> > mounts.  This factors out resuming code from btrfs_restore_balance(),
> > which is now also called earlier in the mount sequence to avoid the
> > problem of some early writes getting the old profile.
> > 
> > Signed-off-by: Ilya Dryomov 
> > ---
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index 77872da..dae7cd6 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -2492,9 +2497,6 @@ retry_root_backup:
> > err = btrfs_orphan_cleanup(fs_info->tree_root);
> > up_read(&fs_info->cleanup_work_sem);
> >  
> > -   if (!err)
> > -   err = btrfs_recover_balance(fs_info->tree_root);
> > -
> > if (err) {
> > close_ctree(tree_root);
> > return err;
> > @@ -2518,6 +2520,9 @@ fail_cleaner:
> >  fail_block_groups:
> > btrfs_free_block_groups(fs_info);
> >  
> > +fail_balance_ctl:
> > +   kfree(fs_info->balance_ctl);
> 
> I think you need to set fs_info->balance_ctl to NULL, otherwise this
> could lead to double free from free_fs_info. I was looking along the

Yes, I do.  I meant to call unset_balance_control(fs_info) there, but
changed it to kfree(), because of the BUG_ON() in the former.

unset_balance_control(), of course, sets ->balance_ctl to NULL ;)

> call paths and didn't see free_fs_info called on the mount failure path:
> 
> vfs->mount
>   btrfs_mount
> btrfs_fill_super
>   open_ctree
> (recover balance fails, frees ctl)
> 
> error is propagated back to vfs, no other fs callback is done (like
> kill_super which does call free_fs_info).
> 
> The only exit path that is not going through free_fs_info is after error
> from btrfs_fill_super, and this can fail from various reasons.
> 
> Either I'm missing something, or we leak a btrfs_fs_info every time a
> mount fails ...

No, we don't, you just missed it.  It's freed from btrfs_kill_super(),
which is called from deactivate_locked_super() after btrfs_fill_super()
errors out.

> 
> 
> Back to your patch, apart from the balance_ctl pointer reset, both are
> ok and given the number of bug reports [useless padding text here]
> 
>   this should go to 3.5-rc.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] Btrfs-progs: add support to set subvolume/snapshot readonly

2012-06-29 Thread Ilya Dryomov

On Fri, Jun 29, 2012 at 06:00:36PM +0800, Liu Bo wrote:
> Setting subvolume/snapshot readonly has been missing for a long time.
> 
> With this patch, we can set a subvolume/snapshot readonly via:
> 
> obtrfs subvolume set-ro 

Alexander's 'btrfs property' patches do exactly this, but in a much more
generic and extensible way.  'btrfs property' subgroup provides a
uniform interface for getting and setting properties of filesystem
objects in general, not only those of subvolumes and snapshots.  It
provides a much better user interface, and it also allows us to easily
rethink kernel-user interface for generic get/set in future.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] Btrfs-progs: make get default report correctly

2012-06-29 Thread Ilya Dryomov

On Fri, Jun 29, 2012 at 06:00:37PM +0800, Liu Bo wrote:
> We have both set-default and get-default, but actually our get-default
> does not show which one is the default one.
> 
> This patch plus the kernel side patch fix this.

Are you referring to the fact that get-default in Chris' btrfs-progs
master lists all subvolumes instead of showing the default one?  If so,
I fixed that problem a while ago, patch is in Hugo's integration branch.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] Btrfs-progs: make get default report correctly

2012-07-02 Thread Ilya Dryomov

On Mon, Jul 02, 2012 at 09:39:45AM +0800, Liu Bo wrote:
> On 06/29/2012 06:36 PM, Ilya Dryomov wrote:
> 
> > On Fri, Jun 29, 2012 at 06:00:37PM +0800, Liu Bo wrote:
> >> We have both set-default and get-default, but actually our get-default
> >> does not show which one is the default one.
> >>
> >> This patch plus the kernel side patch fix this.
> > 
> > Are you referring to the fact that get-default in Chris' btrfs-progs
> > master lists all subvolumes instead of showing the default one?  If so,
> > I fixed that problem a while ago, patch is in Hugo's integration branch.
> > 
> 
> 
> So can you please show me a link about your patch?

I'm sorry, it turns out Hugo hasn't pushed it yet.  Here is a link:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/16187/

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] Btrfs-progs: add support to set subvolume/snapshot readonly

2012-07-02 Thread Ilya Dryomov

On Mon, Jul 02, 2012 at 10:07:42AM +0800, Liu Bo wrote:
> On 06/29/2012 06:21 PM, Ilya Dryomov wrote:
> 
> > On Fri, Jun 29, 2012 at 06:00:36PM +0800, Liu Bo wrote:
> >> Setting subvolume/snapshot readonly has been missing for a long time.
> >>
> >> With this patch, we can set a subvolume/snapshot readonly via:
> >>
> >> obtrfs subvolume set-ro 
> > 
> > Alexander's 'btrfs property' patches do exactly this, but in a much more
> > generic and extensible way.  'btrfs property' subgroup provides a
> > uniform interface for getting and setting properties of filesystem
> > objects in general, not only those of subvolumes and snapshots.  It
> > provides a much better user interface, and it also allows us to easily
> > rethink kernel-user interface for generic get/set in future.
> > 
> 
> 
> Thanks for the explanation!
> 
> But I prefer keeping the current categories {subvolume,filesystem,device,...}:
> 
> o Compatibility, we cannot remove the old commands until we make sure that no 
> users will
>   use them.

We are not going to remove old commands any time soon.  However, adding
new ones that clearly fall into get/set category, is not a good idea.
Especially when there is a generic interface on its way.

> 
> o We've three properties {default, readonly, lable}, is it worthy making 
> another new interface?

It's not just about subvolumes.  There will be a lot more properties on
the table as filesystem matures, for example device speeds, subvolume
profiles, quotas.

> 
> o Current categories are clear and clean.

Once again, it's not just about subvolumes.  Current categories are
indeed clear, but adding two commands for each non-trivial property that
comes up in future does not seem practical to me.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: no dev_stats entry found / OK on first mount after mkfs

2012-07-02 Thread Ilya Dryomov

On Mon, Jul 02, 2012 at 02:46:11PM -0700, Christian Kujau wrote:
> Hi,
> 
> after upgrading from 3.4.0 to 3.5.0-rc5 on this powerpc machine, the 
> following is printed during bootup:
> 
>   [   18.630750] device fsid ce8c9df5-0a93-47c6-adf6-25084f352a4f devid 1 
> transid 11061 /dev/hda7
>   [   18.637193] btrfs: disk space caching is enabled
>   [   18.706423] btrfs: no dev_stats entry found for device /dev/hda7 (devid 
> 1) (OK on first mount after mkfs)
> 
> The btrfs on hda7 has been created many months ago and has been mounted 
> several times since then. Assuming "first mount after mkfs" does not apply 
> here, is it then NOT OK that no dev_stats entry has been found?
> 
> IOW: should I worry about this message?

No, you should not.  If this is your first mount after upgrading your
kernel to the one with btrfs dev stats feature, "first mount after mkfs"
sort of applies.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times

2012-07-05 Thread Ilya Dryomov

On Thu, Jul 05, 2012 at 10:20:16AM -0700, Zach Brown wrote:
> On 07/05/2012 10:14 AM, Alexander Block wrote:
> >On Thu, Jul 5, 2012 at 7:08 PM, Zach Brown  wrote:
> >>
> >>Careful, timespec will be different sizes in 32bit userspace and a 64bit
> >>kernel.  I'd use btrfs_timespec to get a fixed size timespec and avoid
> >>all the compat_timespec noise.  (I'd then also worry about padding and
> >>might pack the struct.. I always lose track of the best practice across
> >>all archs.)
> 
> >Hmm we currently don't have ctree.h in ioctl.h. Can I include it there
> >or are there problems with that? As an alternative I could define my
> >own struct for that.
> 
> Hmm, yeah, it looks like ioctl.h is well isolated and doesn't really
> have a precedent for pulling in format bits from the kernel
> implementation.
> 
> I'd do as you suggested and just make its own ioctl_timespec with a
> comment that its duplicating other similar structures to keep ioctl.h
> from getting tangled up in the kernel-side includes.

This has been done for restriper, see struct btrfs_balance_args vs
struct btrfs_disk_balance_args.  You could do the same thing:

struct btrfs_ioctl_timespec {
__u64 sec;
__u32 nsec;
} __attribute__ ((__packed__));

and take endianess into account with le{64,32}_to_cpu and
cpu_to_le{64,32} macros.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times

2012-07-05 Thread Ilya Dryomov

On Thu, Jul 05, 2012 at 11:37:40AM -0700, Zach Brown wrote:
> 
> >and take endianess into account with le{64,32}_to_cpu and
> >cpu_to_le{64,32} macros.
> 
> The kernel doesn't support system calls from userspace of a different
> endianness, no worries there :)

What if you are on a big-endian machine with a big-endian kernel and
userspace?  Everything on-disk should be little-endian, so if you are
going to write stuff you got from userspace to disk, at some point you
have to make sure you are writing out bytes in the right order.

Alex already does that, so my remarks are moot ;)

+   root_item->stime.sec = cpu_to_le64(sa->stime.tv_sec);
+   root_item->stime.nsec = cpu_to_le64(sa->stime.tv_nsec);
+   root_item->rtime.sec = cpu_to_le64(sa->rtime.tv_sec);
+   root_item->rtime.nsec = cpu_to_le64(sa->rtime.tv_nsec);

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times

2012-07-05 Thread Ilya Dryomov

On Thu, Jul 05, 2012 at 09:18:41PM +0200, Alexander Block wrote:
> On Thu, Jul 5, 2012 at 9:01 PM, Zach Brown  wrote:
> > On 07/05/2012 11:59 AM, Ilya Dryomov wrote:
> >
> >> What if you are on a big-endian machine with a big-endian kernel and
> >> userspace?  Everything on-disk should be little-endian, so if you are
> >> going to write stuff you got from userspace to disk, at some point you
> >> have to make sure you are writing out bytes in the right order.
> >>
> >> Alex already does that, so my remarks are moot ;)
> >
> >
> > Yeah, indeed, we were only talking about the ioctl interface crossing
> > the user<->kernel barrier :).
> >
> I decided to not use __leXX variables in the new btrfs_ioctl_timespec
> structure because as most arguments found ioctls are currently in cpu
> dependent endianess. The kernel will then do the endianess conversion
> as it already did with struct timespec.

That's exactly the point of adding btrfs_ioctl_timespec instead of just
copying btrfs_timespec definition.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: btrfs fi df won't show total=

2012-07-09 Thread Ilya Dryomov

On Mon, Jul 09, 2012 at 10:06:24PM +0200, Jan Engelhardt wrote:
> 
> On Monday 2012-07-09 21:25, Hugo Mills wrote:
> >On Mon, Jul 09, 2012 at 09:14:03PM +0200, Jan Engelhardt wrote:
> >> 
> >> On openSUSE_12.1 with Btrfs v0.19+20120406, the following can be
> >> observed: after a change of the profiles, total=,used= is no
> >> longer shown:
> >> 
> >> 20:49 mmsrv1:~ # btrfs fi df /top.srv/
> >> Data, RAID10: total=152.00GiB, used=121.07GiB
> >> System, RAID1: total=40.00MiB, used=44.00KiB
> >> System: total=4.00MiB, used=0.00
> >> Metadata, RAID1: total=112.00GiB, used=1.30GiB
> >> Metadata: total=8.00MiB, used=0.00
> >>[...]
> >> 21:10 mmsrv1:~ # btrfs fi df /top.srv/
> >> Data, RAID10: total=156.00GiB, used=124.35GiB
> >> System, RAID10: total=128.00MiB, used=48.00KiB
> >> System: total=4.00MiB, used=0.00
> >> Metadata, RAID10: total=112.00GiB, used=1.38GiB
> >
> >   What's the problem here? You no longer have any RAID1 chunks, so
> >it's not showing them.
> 
> Rather tha a 4-line output, I would have expected this 6-line output
> that I would also get when mkfs'ing a new fresh btrfs volume with
> raid10 from the start:
> 
> Data, RAID10: total=156.00GiB, used=124.35GiB
> Data: total=foo, used=bar
> System, RAID10: total=128.00MiB, used=48.00KiB
> System: total=4.00MiB, used=0.00
> Metadata, RAID10: total=112.00GiB, used=1.38GiB
> Metadata: total=foo, used=bar

mkfs creates those non-raid chunks for a pretty stupid reason, they
really shouldn't be there if you create a raid10 fs.  Balance does the
right thing and removes them.  Fixing this along with another mkfs
annoyance related to this one is on my TODO list.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Change "small" filesystem to normal

2012-07-22 Thread Ilya Dryomov

On Sun, Jul 22, 2012 at 05:06:24PM +0200, Swâmi Petaramesh wrote:
> Hi,
> 
> I've created a "small" BTRFS filesystem, where metadata+data are mixed
> (and metadata are not DUP'ed).
> 
> Then I've enlarged the FS to 1 GB ; now I'd like to make it "normal"
> with separate data and metadata, and "DUP'ed" metadata.
> 
> Is there a way tp do this without reformatting the FS ?

No, currently there is no way to do this.  You'll have to create a new
filesystem with mkfs.btrfs.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] Btrfs: cleanup duplicated division functions

2012-09-17 Thread Ilya Dryomov

On Mon, Sep 17, 2012 at 10:21:00AM +0800, Miao Xie wrote:
> On fri, 14 Sep 2012 15:54:18 +0200, David Sterba wrote:
> > On Thu, Sep 13, 2012 at 06:51:36PM +0800, Miao Xie wrote:
> >> div_factor{_fine} has been implemented for two times, cleanup it.
> >> And I move them into a independent file named math.h because they are
> >> common math functions.
> > 
> > You removed the sanity checks:
> > 
> > -   if (factor <= 0)
> > -   return 0;
> > -   if (factor >= 100)
> > -   return num;
> 
> As inline functions, they should not contain complex checks, the caller should
> make sure the parameters are right. I think.

div_factor_fine() in volumes.c is not inline, and is called from
chunk_usage_filter() on unvalidated user input.  If you think the caller
should do those checks, you should move them to the caller as part of
your patch.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs-progs: btrfs subvolume delete could delete subvolumes

2012-09-21 Thread Ilya Dryomov

On Fri, Sep 21, 2012 at 02:54:08PM +0800, Anand jain wrote:
> From: Anand Jain 
> 
> With this user will be able to provide more than one subvolume
> to delete.
> eg: btrfs subvolume delete  
> 
> Signed-off-by: Anand Jain 
> ---
>  cmds-subvolume.c |   36 
>  man/btrfs.8.in   |4 ++--
>  2 files changed, 26 insertions(+), 14 deletions(-)
> 
> diff --git a/cmds-subvolume.c b/cmds-subvolume.c
> index f4aa80f..cfeaa8d 100644
> --- a/cmds-subvolume.c
> +++ b/cmds-subvolume.c
> @@ -188,31 +188,34 @@ int test_issubvolume(char *path)
>  }
>  
>  static const char * const cmd_subvol_delete_usage[] = {
> - "btrfs subvolume delete ",
> - "Delete a subvolume",
> + "btrfs subvolume delete  [...]",
> + "Delete subvolume(s)",
>   NULL
>  };
>  
>  static int cmd_subvol_delete(int argc, char **argv)
>  {
> - int res, fd, len, e;
> + int res, fd, len, e, cnt = 1, ret = 0;
>   struct btrfs_ioctl_vol_args args;
>   char*dname, *vname, *cpath;
>   char*path;
>  
> - if (check_argc_exact(argc, 2))
> + if (argc < 2)
>   usage(cmd_subvol_delete_usage);

check_argc_min(argc, 2)

Haven't looked at the rest.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V3 1/2] Btrfs: cleanup duplicated division functions

2012-09-23 Thread Ilya Dryomov

On Fri, Sep 21, 2012 at 05:07:46PM +0800, Miao Xie wrote:
> div_factor{_fine} has been implemented for two times, and these two functions
> are very similar, so cleanup the reduplicate implement and drop the original
> div_factor(), and then rename div_factor_fine() to div_factor(). So the 
> divisor
> of the new div_factor() is 100, not 10.
> 
> And I move div_factor into a independent file named math.h because it is a
> common math function, may be used by every composition of btrfs.
> 
> Because these functions are mostly used on the hot path, and we are sure
> the parameters are right in the most cases, we don't add complex checks
> for the parameters. But in the other place, we must check and make sure
> the parameters are right. So besides the code cleanup, this patch also
> add a check for the usage of the space balance, it is the only place that
> we need add check to make sure the parameters of div_factor are right till 
> now.
> 
> Signed-off-by: Miao Xie 
> ---
> Changelog v2 -> v3:
> - drop the original div_factor and rename div_factor_fine to div_factor
> - drop the check of the factor
> 
> Changelog v1 -> v2:
> - add missing check
> ---
>  fs/btrfs/extent-tree.c |   29 ++---
>  fs/btrfs/ioctl.c   |   18 ++
>  fs/btrfs/math.h|   33 +
>  fs/btrfs/relocation.c  |2 +-
>  fs/btrfs/transaction.c |2 +-
>  fs/btrfs/volumes.c |   30 +-
>  6 files changed, 64 insertions(+), 50 deletions(-)
>  create mode 100644 fs/btrfs/math.h
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a010234..bcb9ced 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -33,6 +33,7 @@
>  #include "volumes.h"
>  #include "locking.h"
>  #include "free-space-cache.h"
> +#include "math.h"
>  
>  #undef SCRAMBLE_DELAYED_REFS
>  
> @@ -648,24 +649,6 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info 
> *info)
>   rcu_read_unlock();
>  }
>  
> -static u64 div_factor(u64 num, int factor)
> -{
> - if (factor == 10)
> - return num;
> - num *= factor;
> - do_div(num, 10);
> - return num;
> -}
> -
> -static u64 div_factor_fine(u64 num, int factor)
> -{
> - if (factor == 100)
> - return num;
> - num *= factor;
> - do_div(num, 100);
> - return num;
> -}
> -
>  u64 btrfs_find_block_group(struct btrfs_root *root,
>  u64 search_start, u64 search_hint, int owner)
>  {
> @@ -674,7 +657,7 @@ u64 btrfs_find_block_group(struct btrfs_root *root,
>   u64 last = max(search_hint, search_start);
>   u64 group_start = 0;
>   int full_search = 0;
> - int factor = 9;
> + int factor = 90;
>   int wrapped = 0;
>  again:
>   while (1) {
> @@ -708,7 +691,7 @@ again:
>   if (!full_search && factor < 10) {
>   last = search_start;
>   full_search = 1;
> - factor = 10;
> + factor = 100;
>   goto again;
>   }
>  found:
> @@ -3513,7 +3496,7 @@ static int should_alloc_chunk(struct btrfs_root *root,
>   if (force == CHUNK_ALLOC_LIMITED) {
>   thresh = btrfs_super_total_bytes(root->fs_info->super_copy);
>   thresh = max_t(u64, 64 * 1024 * 1024,
> -div_factor_fine(thresh, 1));
> +div_factor(thresh, 1));
>  
>   if (num_bytes - num_allocated < thresh)
>   return 1;
> @@ -3521,12 +3504,12 @@ static int should_alloc_chunk(struct btrfs_root *root,
>   thresh = btrfs_super_total_bytes(root->fs_info->super_copy);
>  
>   /* 256MB or 2% of the FS */
> - thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 2));
> + thresh = max_t(u64, 256 * 1024 * 1024, div_factor(thresh, 2));
>   /* system chunks need a much small threshold */
>   if (sinfo->flags & BTRFS_BLOCK_GROUP_SYSTEM)
>   thresh = 32 * 1024 * 1024;
>  
> - if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 8))
> + if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 80))
>   return 0;
>   return 1;
>  }
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 9384a2a..d8d53f7 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3335,6 +3335,24 @@ static long btrfs_ioctl_balance(struct file *file, 
> void __user *arg)
>  
>   goto do_balance;
>   }
> +
> + if ((bargs->data.flags & BTRFS_BALANCE_ARGS_USAGE) &&
> + (bargs->data.usage < 0 || bargs->data.usage > 100)) {
> + ret = -EINVAL;
> + goto out_bargs;
> + }
> +
> + if ((bargs->meta.flags & BTRFS_BALANCE_ARGS_USAGE) &&
> + (bargs->meta.usage < 0 || bargs->meta.usage > 100)) {
> + ret = -EINVAL;
> + goto out_bargs;
> +

Re: [PATCH V3 1/2] Btrfs: cleanup duplicated division functions

2012-09-24 Thread Ilya Dryomov

On Mon, Sep 24, 2012 at 06:47:42PM +0200, David Sterba wrote:
> On Mon, Sep 24, 2012 at 10:05:02AM +0800, Miao Xie wrote:
> > Generally, we should check the value when it is input. If not, we might
> > run our program with the wrong value, and it is possible to cause unknown
> > problems. So I think the above chunk is necessary.
> 
> The difference is that giving an invalid value will exit early (your
> version), while Ilya's version will clamp the 'usage' to the allowed
> range during processing. From usability POV, I'd prefer to let the
> command fail early with a verbose error eg. that the range is invalid.

I think that's the job for progs, and that's where this and a few other
checks are currently implemented.

There is no possibility for "unknown problems", that's exactly how it's
been implemented before the cleanup.  The purpose of balance filters
(and the usage filter in particular) is to lower the number of chunks we
have to relocate.  If someone decides to use raw ioctls, and supplies us
with the invalid value, we just cut it down to a 100 and proceed.
usage=100 is the equivalent of not using the usage filter at all, so the
raw ioctl user will get what he deserves.

This is very similar to what we currently do with other filters.  For
example, "soft" can only be used with "convert", and this is checked by
progs.  But, if somebody were to set a "soft" flag without setting
"convert" we would simply ignore that "soft".  Same goes for "drange"
and "devid", invalid ranges, invalid devids, etc.  Pulling all these
checks into the kernel is questionable at best, and, if enough people
insist on it, should be discussed separately.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V4 1/2] Btrfs: cleanup duplicated division functions

2012-09-27 Thread Ilya Dryomov

Hi Miao,

You haven't addressed any of my concerns with v3.

On Thu, Sep 27, 2012 at 06:19:58PM +0800, Miao Xie wrote:

(snipped)

> the parameters are right. So besides the code cleanup, this patch also
> add a check for the usage of the space balance, it is the only place that
> we need add check to make sure the parameters of div_factor are right till
> now. Besides that, the old kernel may hold the wrong usage value, so we
> must rectify it.

Cleaning up/unifying duplicated functions and changing the existing
logic are two very different things.  If you, in the course of writing
this patch, became unhappy with the way balancing ioctl deals with
"invalid" input, please send a separate patch.

Before your patch, volumes.c had its own copy of div_factor_fine():

static u64 div_factor_fine(u64 num, int factor)
{
if (factor <= 0)
return 0;
if (factor >= 100)
return num;

num *= factor;
do_div(num, 100);
return num;
}

which was called from chunk_usage_filter() on unvalidated user input.
As far as the cleanup part of your patch goes, you've dropped
factor <= 0 / factor >= 100 logic, merged volumes.c's copy with
extent-tree.c's copy and renamed div_factor_fine() to div_factor().  To
make chunk_usage_filter() happy again, it's enough to move the dropped
logic directly to the call site:

static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
  struct btrfs_balance_args *bargs)
{
...

-   user_thresh = div_factor_fine(cache->key.offset, bargs->usage);
+   if (bargs->usage == 0)
+   user_thresh = 0;
+   else if (bargs->usage >= 100)
+   user_thresh = cache->key.offset;
+   else
+   user_thresh = div_factor(cache->key.offset, bargs->usage);

...
}

So I would suggest you drop all hunks related to changing the way
balancing ioctl works and make the above change to chunk_usage_filter()
instead.  Once again, if you are unhappy with usage filter argument
handling, send a separate patch.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df

2012-10-03 Thread Ilya Dryomov

On Wed, Oct 03, 2012 at 01:43:14PM +0200, Goffredo Baroncelli wrote:
> This serie of patches updated the command "btrfs filesystem
> df". I update this command because it is not so easy to get
> the information about the disk usage from the command "fi df" and "fi show".
> 
> >>From the man page (see 2nd patch):
> 
> [...]
> The  command  btrfs  filesystem  df is used to query the
> status of the chunks, how many space on the disk(s) are used  by
> the  chunks,  how many space are available in the chunks, and an
> estimation of the free space of the filesystem.
> [...]
> 
> $ ./btrfs filesystem df --help
> usage: btrfs filesystem disk-usage [-d][-s][-k]  [..]
> 
> Show space usage information for a mount point(s).
> 
> -k  Set KB (1024 bytes) as unit
> -s  Don't show the summary section
> -d  Don't show the detail section

Hi Goffredo,

Why not show just the summary section by default and have a flag (-v)
for the details section?

> $ ./btrfs filesystem df /
> Path: /
> Summary:
>   Disk_size: 72.57GB
>   Disk_allocated:25.10GB
>   Disk_unallocated:  47.48GB
>   Logical_size:  23.06GB
>   Used:  11.01GB
>   Free_(Estimated):  55.66GB(Max: 59.52GB, Min: 35.78GB)
>   Data_to_disk_ratio:   92 %
> 
> Details:
>   Chunk-type  Mode  Chunk-size Logical-sizeUsed

"Type" for the first column is probably enough.

Why is the third column called Chunk-size?  If my understanding is
correct, it's just a break down of Disk_allocated from the summary
section.  If so, why not call it Disk_allocated to avoid confusion?

Also, why do you use dashes instead of underbars for table headers?

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] Update btrfs filesystem df command

2012-10-03 Thread Ilya Dryomov

On Wed, Oct 03, 2012 at 01:43:15PM +0200, Goffredo Baroncelli wrote:
> The  command  btrfs  filesystem  df is used to query the
> status of the chunks, how many space on the disk(s) are used  by
> the  chunks,  how many space are available in the chunks, and an
> estimation of the free space of the filesystem.
> ---
>  cmds-filesystem.c |  264 
> +++--
>  1 file changed, 215 insertions(+), 49 deletions(-)
> 
> diff --git a/cmds-filesystem.c b/cmds-filesystem.c
> index b1457de..6c3ebdc 100644
> --- a/cmds-filesystem.c
> +++ b/cmds-filesystem.c

(snipped)

> +static int cmd_disk_free(int argc, char **argv)
> +{
> +
> + int flags=DF_SHOW_SUMMARY|DF_SHOW_DETAIL|DF_HUMAN_UNIT;
> + int i, more_than_one=0;
> +
> + if (check_argc_min(argc, 2))
> + usage(cmd_disk_free_usage);
> +
> + for(i=1; i< argc ; i++){
> + if(!strcmp(argv[i],"-d"))
> + flags &= ~DF_SHOW_DETAIL;
> + else if(!strcmp(argv[i],"-s"))
> + flags &= ~DF_SHOW_SUMMARY;
> + else if(!strcmp(argv[i],"-k"))
> + flags &= ~DF_HUMAN_UNIT;
> + else{
> + int r;
> + if(more_than_one)
> + printf("\n");
> + r = _cmd_disk_free(argv[i], flags);
> + if( r ) return r;
> + more_than_one=1;
> + }

Is there any reason getopt(), or better yet, getopt_long() won't work?

> +
> + }
> +
> + return 0;
> +}
> + 
> +
> + 
>  static int uuid_search(struct btrfs_fs_devices *fs_devices, char *search)
>  {
>   char uuidbuf[37];
> @@ -529,7 +695,7 @@ static int cmd_label(int argc, char **argv)
>  
>  const struct cmd_group filesystem_cmd_group = {
>   filesystem_cmd_group_usage, NULL, {
> - { "df", cmd_df, cmd_df_usage, NULL, 0 },
> + { "df", cmd_disk_free, cmd_disk_free_usage, NULL, 0 },

If this command is going to replace df, you should change the function
name back to cmd_df.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] Update btrfs filesystem df command

2012-10-03 Thread Ilya Dryomov

On Wed, Oct 03, 2012 at 06:34:38PM +0200, Goffredo Baroncelli wrote:
> On 10/03/2012 05:02 PM, Ilya Dryomov wrote:
> >On Wed, Oct 03, 2012 at 01:43:15PM +0200, Goffredo Baroncelli wrote:
> >>The  command  btrfs  filesystem  df is used to query the
> >>status of the chunks, how many space on the disk(s) are used  by
> >>the  chunks,  how many space are available in the chunks, and an
> >>estimation of the free space of the filesystem.
> >>---
> >>  cmds-filesystem.c |  264 
> >> +++--
> >>  1 file changed, 215 insertions(+), 49 deletions(-)
> >>
> >>diff --git a/cmds-filesystem.c b/cmds-filesystem.c
> >>index b1457de..6c3ebdc 100644
> >>--- a/cmds-filesystem.c
> >>+++ b/cmds-filesystem.c
> >
> >(snipped)
> >
> >>+static int cmd_disk_free(int argc, char **argv)
> >>+{
> >>+
> >>+   int flags=DF_SHOW_SUMMARY|DF_SHOW_DETAIL|DF_HUMAN_UNIT;
> >>+   int i, more_than_one=0;
> >>+
> >>+   if (check_argc_min(argc, 2))
> >>+   usage(cmd_disk_free_usage);
> >>+
> >>+   for(i=1; i<  argc ; i++){
> >>+   if(!strcmp(argv[i],"-d"))
> >>+   flags&= ~DF_SHOW_DETAIL;
> >>+   else if(!strcmp(argv[i],"-s"))
> >>+   flags&= ~DF_SHOW_SUMMARY;
> >>+   else if(!strcmp(argv[i],"-k"))
> >>+   flags&= ~DF_HUMAN_UNIT;
> >>+   else{
> >>+   int r;
> >>+   if(more_than_one)
> >>+   printf("\n");
> >>+   r = _cmd_disk_free(argv[i], flags);
> >>+   if( r ) return r;
> >>+   more_than_one=1;
> >>+   }
> >
> >Is there any reason getopt(), or better yet, getopt_long() won't work?
> 
> In the beginning there were also the switches +d, +s, +k, then I
> dropped them. So I leaved this style. The code is simple enough to
> not justify a change.

It's not a matter of style.  It is generally expected that several short
options can be specified after one dash.  Further, if somebody else were
to add an option with an optional parameter, they'd have to spend time
rewriting your stuff.

In addition to that, other parts of progs already use getopt() in
similar situations.  One example would be the scrub subgroup: scrub
commands are equally simple (a few short switches, no parameters), and
yet getopt() is used.

> 
> >
> >>+
> >>+   }
> >>+
> >>+   return 0;
> >>+}  
> >>+   
> >>+
> >>+   
> >>  static int uuid_search(struct btrfs_fs_devices *fs_devices, char *search)
> >>  {
> >>char uuidbuf[37];
> >>@@ -529,7 +695,7 @@ static int cmd_label(int argc, char **argv)
> >>
> >>  const struct cmd_group filesystem_cmd_group = {
> >>filesystem_cmd_group_usage, NULL, {
> >>-   { "df", cmd_df, cmd_df_usage, NULL, 0 },
> >>+   { "df", cmd_disk_free, cmd_disk_free_usage, NULL, 0 },
> >
> >If this command is going to replace df, you should change the function
> >name back to cmd_df.
> 
> I was never convinced to use 'df'. At the beginning when I wrote the
> first parser of btrfs, was suggested (not by me) to use "long"
> command and allow the parser to match a contracted command until
> there was any ambiguity. I suggested to use disk-free, but everybody
> were confortable with df.. so I leaved it as "official name". But I
> prefer for the internal code a more verbose name.

Well, all your patch is doing is extending the functionality of an
existing command.  The "official name" for that command is "df", and you
are not changing its name.  Why change the name of an existing function?

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df

2012-10-03 Thread Ilya Dryomov

On Wed, Oct 03, 2012 at 06:46:00PM +0200, Goffredo Baroncelli wrote:
> On 10/03/2012 05:01 PM, Ilya Dryomov wrote:
> >"Type" for the first column is probably enough.
> >
> >Why is the third column called Chunk-size?  If my understanding is
> >correct, it's just a break down of Disk_allocated from the summary
> >section.  If so, why not call it Disk_allocated to avoid confusion?
> 
> Using everywhere Disk_ was my first attempt. But after
> some thoughts I decided that these are two different kind of
> information. It is true that Disk_allocated is the sum of
> Chunk-Sizes... But my feels is that this is a kind of
> "implementation details". If some other type of allocation unit will
> be added to BTRFS, then these will be added to Disk_allocated, but
> not to Chunk list...
> I prefer to not change the wording until an enough critical mass of
> people converge to a unique solution .

It is the chunks that is the implementation detail that we want to hide.
Average Btrfs user wouldn't want to know anything about chunks, the only
thing he'd be interested in is Disk_allocated and similar fields.

Moreover, I am pretty sure "Chunk-Size" would actually confuse people.
I stared at your example output for a few seconds trying to comprehend a
21GB Chunk-Size on a 72GB partition.  What you call "Chunk-Size" is
actually a sum of sizes of chunks of that particular type, and a few
lines above you call the same exact sum (only this time over all types
of chunks) "Disk_allocated".  So I think it's only logical to rename the
column in question to "Disk_allocated" to match the summary section.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df

2012-10-03 Thread Ilya Dryomov

On Wed, Oct 03, 2012 at 10:01:00PM +0200, Goffredo Baroncelli wrote:
> On 10/03/2012 07:46 PM, Ilya Dryomov wrote:
> >On Wed, Oct 03, 2012 at 06:46:00PM +0200, Goffredo Baroncelli wrote:
> >>On 10/03/2012 05:01 PM, Ilya Dryomov wrote:
> >>>"Type" for the first column is probably enough.
> >>>
> >>>Why is the third column called Chunk-size?  If my understanding is
> >>>correct, it's just a break down of Disk_allocated from the summary
> >>>section.  If so, why not call it Disk_allocated to avoid confusion?
> >>
> >>Using everywhere Disk_  was my first attempt. But after
> >>some thoughts I decided that these are two different kind of
> >>information. It is true that Disk_allocated is the sum of
> >>Chunk-Sizes... But my feels is that this is a kind of
> >>"implementation details". If some other type of allocation unit will
> >>be added to BTRFS, then these will be added to Disk_allocated, but
> >>not to Chunk list...
> >>I prefer to not change the wording until an enough critical mass of
> >>people converge to a unique solution .
> >
> >It is the chunks that is the implementation detail that we want to hide.
> >Average Btrfs user wouldn't want to know anything about chunks, the only
> >thing he'd be interested in is Disk_allocated and similar fields.
> 
> The "df" standard tool id sufficient for the "average user".
> We need only to export these information via the standard syscall
> stat[v]fs. Basically we should try to implement the algorithm of the
> Free_(Estimated) space for the statfs(2) syscall.
> Who uses btrfs tools, is an user with knowledge of btrfs higher than
> the average.
> 
> >Moreover, I am pretty sure "Chunk-Size" would actually confuse people.
> >I stared at your example output for a few seconds trying to comprehend a
> >21GB Chunk-Size on a 72GB partition.  What you call "Chunk-Size" is
> >actually a sum of sizes of chunks of that particular type, and a few
> >lines above you call the same exact sum (only this time over all types
> >of chunks) "Disk_allocated".  So I think it's only logical to rename the
> >column in question to "Disk_allocated" to match the summary section.
> 
> What about
> [...]
> Details:
>   Chunk_type  Mode  Size_(disk) Size_(logical) Used
>   DataSingle   21.01GB  21.01GB 10.53GB
>   System  DUP  80.00MB  40.00MB  4.00KB
> [...]
> 
> ?

This is definitely better.  Can you also drop "Chunk_type" in favor of
"Type"?

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] Update btrfs filesystem df command

2012-10-03 Thread Ilya Dryomov

On Wed, Oct 03, 2012 at 07:22:31PM +0200, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli 
> 
> The  command  btrfs  filesystem  df is used to query the
> status of the chunks, how many space on the disk(s) are used  by
> the  chunks,  how many space are available in the chunks, and an
> estimation of the free space of the filesystem.
> ---
>  cmds-filesystem.c |  284 
> -
>  1 file changed, 235 insertions(+), 49 deletions(-)
> 
> diff --git a/cmds-filesystem.c b/cmds-filesystem.c
> index b1457de..2701904 100644
> --- a/cmds-filesystem.c
> +++ b/cmds-filesystem.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "kerncompat.h"
>  #include "ctree.h"
> @@ -39,25 +40,55 @@ static const char * const filesystem_cmd_group_usage[] = {
>   NULL
>  };
>  
> -static const char * const cmd_df_usage[] = {
> - "btrfs filesystem df ",
> - "Show space usage information for a mount point",
> - NULL
> -};
> +static u64 disk_size( char *path){
> + struct statfs   sfs;
> +
> + if( statfs(path, &sfs) < 0 )
> + return 0;
> + else
> + return sfs.f_bsize * sfs.f_blocks;
> +
> +}
> +
> +static void print_string(char *s, int w)
> +{
> + int i;
> +
> + printf("%s", s);
> + for( i = strlen(s) ; i < w ; i++ ) 
> + putchar(' ');
> +}
> +
> +#define DF_SHOW_SUMMARY  (1<<1)
> +#define DF_SHOW_DETAIL   (1<<2)
> +#define DF_HUMAN_UNIT(1<<3)
> +
> +static void pretty_sizes_r(u64 size, int w, int mode)
> +{
> + if( mode & DF_HUMAN_UNIT ){
> + char *s = pretty_sizes(size);
> + printf("%*s", w, s);
> + free(s);
> + } else {
> + printf("%*llu", w, size/1024);
> +
> + }
> +}
>  
> -static int cmd_df(int argc, char **argv)
> +static int _cmd_disk_free(char *path, int mode)
>  {
>   struct btrfs_ioctl_space_args *sargs;
>   u64 count = 0, i;
>   int ret;
>   int fd;
> - int e;
> - char *path;
> -
> - if (check_argc_exact(argc, 2))
> - usage(cmd_df_usage);
> -
> - path = argv[1];
> + int e, width;
> + u64  total_disk;/* filesystem size == sum of
> +disks sizes */
> + u64  total_chunks;  /* sum of chunks sizes on disk(s) */
> + u64  total_used;/* logical space used */
> + u64  total_free;/* logical space un-used */
> + double K;
> + 
>  
>   fd = open_file_or_dir(path);
>   if (fd < 0) {
> @@ -103,56 +134,211 @@ static int cmd_df(int argc, char **argv)
>   return ret;
>   }
>  
> - for (i = 0; i < sargs->total_spaces; i++) {
> - char description[80];
> - char *total_bytes;
> - char *used_bytes;
> - int written = 0;
> - u64 flags = sargs->spaces[i].flags;
> + total_disk = disk_size(path);
> + e = errno;
> + if( total_disk == 0 ){
> + fprintf(stderr, "ERROR: couldn't get space info on '%s' - %s\n",
> + path, strerror(e));
> + close(fd);
> + free(sargs);
> + return 19;
> + }
> + 
> + total_chunks = total_used = total_free = 0;
>  
> - memset(description, 0, 80);
> + for (i = 0; i < sargs->total_spaces; i++) {
> + int  ratio=1;
> + u64  allocated;
>  
> - if (flags & BTRFS_BLOCK_GROUP_DATA) {
> - if (flags & BTRFS_BLOCK_GROUP_METADATA) {
> - snprintf(description, 14, "%s",
> -  "Data+Metadata");
> - written += 13;
> - } else {
> - snprintf(description, 5, "%s", "Data");
> - written += 4;
> - }
> - } else if (flags & BTRFS_BLOCK_GROUP_SYSTEM) {
> - snprintf(description, 7, "%s", "System");
> - written += 6;
> - } else if (flags & BTRFS_BLOCK_GROUP_METADATA) {
> - snprintf(description, 9, "%s", "Metadata");
> - written += 8;
> - }
> + u64 flags = sargs->spaces[i].flags;
>  
>   if (flags & BTRFS_BLOCK_GROUP_RAID0) {
> - snprintf(description+written, 8, "%s", ", RAID0");
> - written += 7;
> + ratio=1;
>   } else if (flags & BTRFS_BLOCK_GROUP_RAID1) {
> - snprintf(description+written, 8, "%s", ", RAID1");
> - written += 7;
> + ratio=2;
>   } else if (flags & BTRFS_BLOCK_GROUP_DUP) {
> - snprintf(description+written, 6, "%s", ", DUP");
> - written += 5;
> +

Re: [PATCH 1/2] Update btrfs filesystem df command

2012-10-03 Thread Ilya Dryomov

On Wed, Oct 03, 2012 at 09:38:52PM +0100, Hugo Mills wrote:
> On Wed, Oct 03, 2012 at 11:34:00PM +0300, Ilya Dryomov wrote:
> > On Wed, Oct 03, 2012 at 07:22:31PM +0200, Goffredo Baroncelli wrote:
> [snip]
> > > +static const char * const cmd_disk_free_usage[] = {
> > > + "btrfs filesystem df [-d|-s][-k]  [..]",
> > > + "Show space usage information for a mount point(s).",
> > > + "",
> > > + "-k\tSet KB (1024 bytes) as unit",
> > > + "-s\tShow the summary section only",
> > > + "-d\tShow the detail section only",
> > > + NULL
> > > +};
> > > +
> > > +static int cmd_disk_free(int argc, char **argv)
> > > +{
> > > +
> > > + int flags=DF_SHOW_SUMMARY|DF_SHOW_DETAIL|DF_HUMAN_UNIT;
> > > + int i, more_than_one=0;
> > > +
> > > + optind = 1;
> > > + while(1){
> > > + charc = getopt(argc, argv, "dsk");
> > > + if(c<0)
> > > + break;
> > > + switch(c){
> > > + case 'd':
> > > + flags &= ~DF_SHOW_SUMMARY;
> > > + break;
> > > + case 's':
> > > + flags &= ~DF_SHOW_DETAIL;
> > > + break;
> > > + case 'k':
> > > + flags &= ~DF_HUMAN_UNIT;
> > > + break;
> > > + default:
> > > + usage(cmd_disk_free_usage);
> > > + }
> > > + }
> > > +
> > > + if( !(flags & (DF_SHOW_SUMMARY|DF_SHOW_DETAIL)) ){
> > > + fprintf(stderr, "btrfs filesystem df: it is not possible to 
> > > specify -s AND -d\n");
> > 
> > This doesn't look right at all.  You are adding two switches and
> > specifying both of them is an error?  A little too much for a command
> > whose job is to do some basic math and pretty-print the result.
> > 
> > How about displaying just the summary by default and then adding a
> > *single* switch (-v or whatever) for summary+details?
> 
>I'd prefer to see both sections by default. The reason for this is
> that without both sections, people tend to get confused because they
> don't know they're looking at half the story (e.g. some numbers change
> twice as fast as they think they should).

If we want both sections by default, there is no need for any switches
whatsoever, I think.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: fix memory leak in btrfs_parse_early_options()

2011-01-17 Thread Ilya Dryomov

strsep() modifies the string pointer, therefore freeing it instead of
original one results in a small memory leak.  Fixes kmemleak warning.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/super.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 883c6fa..291b630 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -277,7 +277,7 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
struct btrfs_fs_devices **fs_devices)
 {
substring_t args[MAX_OPT_ARGS];
-   char *opts, *p;
+   char *opts, *opts_orig, *p;
int error = 0;
int intarg;
 
@@ -288,7 +288,7 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
 * strsep changes the string, duplicate it because parse_options
 * gets called twice
 */
-   opts = kstrdup(options, GFP_KERNEL);
+   opts = opts_orig = kstrdup(options, GFP_KERNEL);
if (!opts)
return -ENOMEM;
 
@@ -326,7 +326,7 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
}
 
  out_free_opts:
-   kfree(opts);
+   kfree(opts_orig);
  out:
/*
 * If no subvolume name is specified we use the default one.  Allocate
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: remove unneeded allocation in load_free_space_cache()

2011-01-19 Thread Ilya Dryomov

The checksums array is unused, remove it.  We only need disk_crcs array
to verify checksums.

There is no need to allocate first_page_offset bytes for disk_crcs
array.  It's enough to allocate num_checksums bytes because disk_crcs
only holds checksums while gen pointer is computed and used
independently.

cur_crc is initialized later in the code.  Don't initialize it in a
declaration.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/free-space-cache.c |   12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 60d6842..258e45d 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -215,14 +215,14 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
struct extent_buffer *leaf;
struct page *page;
struct btrfs_path *path;
-   u32 *checksums = NULL, *crc;
+   u32 *crc = NULL;
char *disk_crcs = NULL;
struct btrfs_key key;
struct list_head bitmaps;
u64 num_entries;
u64 num_bitmaps;
u64 generation;
-   u32 cur_crc = ~(u32)0;
+   u32 cur_crc;
pgoff_t index = 0;
unsigned long first_page_offset;
int num_checksums;
@@ -298,11 +298,8 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
 
/* Setup everything for doing checksumming */
num_checksums = i_size_read(inode) / PAGE_CACHE_SIZE;
-   checksums = crc = kzalloc(sizeof(u32) * num_checksums, GFP_NOFS);
-   if (!checksums)
-   goto out;
first_page_offset = (sizeof(u32) * num_checksums) + sizeof(u64);
-   disk_crcs = kzalloc(first_page_offset, GFP_NOFS);
+   disk_crcs = kzalloc(num_checksums, GFP_NOFS);
if (!disk_crcs)
goto out;
 
@@ -352,7 +349,7 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
if (index == 0) {
u64 *gen;
 
-   memcpy(disk_crcs, addr, first_page_offset);
+   memcpy(disk_crcs, addr, num_checksums);
gen = addr + (sizeof(u32) * num_checksums);
if (*gen != BTRFS_I(inode)->generation) {
printk(KERN_ERR "btrfs: space cache generation"
@@ -467,7 +464,6 @@ next:
 
ret = 1;
 out:
-   kfree(checksums);
kfree(disk_crcs);
iput(inode);
return ret;
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] Btrfs: remove unneeded allocation in load_free_space_cache()

2011-01-19 Thread Ilya Dryomov

I messed up size argument of kzalloc() and consequently memcpy().  Here
is an updated version.


The checksums array is unused, remove it.  We only need disk_crcs array
to verify checksums.

There is no need to allocate first_page_offset bytes for disk_crcs
array.  It's enough to allocate (sizeof(u32) * num_checksums) bytes
because disk_crcs should only hold checksums.  gen is fetched
directly from the page.

cur_crc is initialized later in the code.  Don't initialize it in a
declaration.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/free-space-cache.c |   12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 60d6842..ee0af41 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -215,14 +215,14 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
struct extent_buffer *leaf;
struct page *page;
struct btrfs_path *path;
-   u32 *checksums = NULL, *crc;
+   u32 *crc = NULL;
char *disk_crcs = NULL;
struct btrfs_key key;
struct list_head bitmaps;
u64 num_entries;
u64 num_bitmaps;
u64 generation;
-   u32 cur_crc = ~(u32)0;
+   u32 cur_crc;
pgoff_t index = 0;
unsigned long first_page_offset;
int num_checksums;
@@ -298,11 +298,8 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
 
/* Setup everything for doing checksumming */
num_checksums = i_size_read(inode) / PAGE_CACHE_SIZE;
-   checksums = crc = kzalloc(sizeof(u32) * num_checksums, GFP_NOFS);
-   if (!checksums)
-   goto out;
first_page_offset = (sizeof(u32) * num_checksums) + sizeof(u64);
-   disk_crcs = kzalloc(first_page_offset, GFP_NOFS);
+   disk_crcs = kzalloc(sizeof(u32) * num_checksums, GFP_NOFS);
if (!disk_crcs)
goto out;
 
@@ -352,7 +349,7 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
if (index == 0) {
u64 *gen;
 
-   memcpy(disk_crcs, addr, first_page_offset);
+   memcpy(disk_crcs, addr, sizeof(u32) * num_checksums);
gen = addr + (sizeof(u32) * num_checksums);
if (*gen != BTRFS_I(inode)->generation) {
printk(KERN_ERR "btrfs: space cache generation"
@@ -467,7 +464,6 @@ next:
 
ret = 1;
 out:
-   kfree(checksums);
kfree(disk_crcs);
iput(inode);
return ret;
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: use checksums_len instead of first_page_offset

2011-01-19 Thread Ilya Dryomov

Minimizes the number of (sizeof(u32) * num_checksums) we have to do,
removes now useless first_page_offset variable.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/free-space-cache.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ee0af41..46f7541 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -224,7 +224,7 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
u64 generation;
u32 cur_crc;
pgoff_t index = 0;
-   unsigned long first_page_offset;
+   unsigned long checksums_len;
int num_checksums;
int ret = 0;
 
@@ -298,8 +298,8 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
 
/* Setup everything for doing checksumming */
num_checksums = i_size_read(inode) / PAGE_CACHE_SIZE;
-   first_page_offset = (sizeof(u32) * num_checksums) + sizeof(u64);
-   disk_crcs = kzalloc(sizeof(u32) * num_checksums, GFP_NOFS);
+   checksums_len = sizeof(u32) * num_checksums;
+   disk_crcs = kzalloc(checksums_len, GFP_NOFS);
if (!disk_crcs)
goto out;
 
@@ -321,7 +321,7 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
break;
 
if (index == 0) {
-   start_offset = first_page_offset;
+   start_offset = checksums_len + sizeof(u64);
offset = start_offset;
}
 
@@ -349,8 +349,8 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
if (index == 0) {
u64 *gen;
 
-   memcpy(disk_crcs, addr, sizeof(u32) * num_checksums);
-   gen = addr + (sizeof(u32) * num_checksums);
+   memcpy(disk_crcs, addr, checksums_len);
+   gen = addr + checksums_len;
if (*gen != BTRFS_I(inode)->generation) {
printk(KERN_ERR "btrfs: space cache generation"
   " (%llu) does not match inode (%llu) "
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs - Fix memory leak in btrfs_init_new_device()

2011-02-06 Thread Ilya Dryomov

Memory allocated by calling kstrdup() should be freed.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d158530..9649cdd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1601,6 +1601,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
 
ret = find_next_devid(root, &device->devid);
if (ret) {
+   kfree(device->name);
kfree(device);
goto error;
}
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs - use %pU to print fsid

2011-02-09 Thread Ilya Dryomov

Get rid of FIXME comment.  Uuids from dmesg are now the same as uuids
given by btrfs-progs.

Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/volumes.c |8 ++--
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2636a05..83b789c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -714,12 +714,8 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, 
void *holder,
transid = btrfs_super_generation(disk_super);
if (disk_super->label[0])
printk(KERN_INFO "device label %s ", disk_super->label);
-   else {
-   /* FIXME, make a readl uuid parser */
-   printk(KERN_INFO "device fsid %llx-%llx ",
-  *(unsigned long long *)disk_super->fsid,
-  *(unsigned long long *)(disk_super->fsid + 8));
-   }
+   else
+   printk(KERN_INFO "device fsid %pU ", disk_super->fsid);
printk(KERN_CONT "devid %llu transid %llu %s\n",
   (unsigned long long)devid, (unsigned long long)transid, path);
ret = device_list_add(path, disk_super, devid, fs_devices_ret);
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 3 >

1 - 100 of 252 matches

Mail list logo