Re: Confining scrub to a subvolume

2015-12-30 Thread David Sterba
On Wed, Dec 30, 2015 at 01:00:34AM +0100, Sree Harsha Totakura wrote:
> Is it possible to confine scrubbing to a subvolume instead of the whole
> file system?

No. Scrub reads the blocks from devices (without knowing which files own
them) and compares them to the stored checksums.

> [...]  Therefore, I would like to scrub the photos
> and documents subvolumes more often than the backups subvolume.  Would
> this be possible with the current tools?

The closest would be to read the files and look for any reported errors.
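Something like this would do - a rough sketch, with the subvolume
mountpoint assumed:

    # reading every file forces checksum verification as a side effect
    find /mnt/photos -type f -exec cat {} + > /dev/null
    dmesg | grep -i 'csum failed'    # checksum errors show up here

Note this only exercises the one copy that the read path happens to pick
on multi-device profiles.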


Re: Confining scrub to a subvolume

2015-12-30 Thread Duncan
David Sterba posted on Wed, 30 Dec 2015 18:39:49 +0100 as excerpted:

> On Wed, Dec 30, 2015 at 01:00:34AM +0100, Sree Harsha Totakura wrote:
>> Is it possible to confine scrubbing to a subvolume instead of the whole
>> file system?
> 
> No. Scrub reads the blocks from devices (without knowing which files own
> them) and compares them to the stored checksums.

Of course, if like me you prefer not to have all your data eggs in one 
filesystem basket, you can use partitions (or LVM) and multiple 
independent btrfs, in which case you scrub the filesystem you want, and 
don't worry about the others. =:^)
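With independent filesystems, a per-data-type scrub is then just
(sketch, mountpoints assumed to match the example):

    btrfs scrub start -B /mnt/photos    # scrub only the photos filesystem
    btrfs scrub start -B /mnt/backups   # backups on their own schedule

(-B keeps scrub in the foreground, so the exit status tells you whether
errors were found.)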

It definitely helps with maintenance time -- on SSDs with all partitions 
under 50 GiB, scrub times per btrfs are typically under a minute, and 
btrfs check and balance times are similarly short.  Plus, arranging to 
have additional partitions of exactly the same size to use as backups 
works pretty nicely as well.  =:^)  OTOH, people routinely report days 
for multi-terabyte btrfs maintenance commands on spinning rust. =:^(

Tho I do still have my media partition, along with backups, on reiserfs 
on spinning rust.  I should think about switching that over one of these 
days...

>> [...]  Therefore, I would like to scrub the photos and documents
>> subvolumes more often than the backups subvolume.  Would this be
>> possible with the current tools?
> 
> The closest would be to read the files and look for any reported errors.

That should work.  Cat the files to /dev/null and check dmesg.  For 
single mode it should check the only copy.  For raid1/10 or dup, running 
two checks, ensuring one is even-PID while the other is odd-PID, should 
work to check both copies, since the read-scheduler assigns copy based on 
even/odd PID.  Errors will show up in dmesg, as well as cat's STDERR.
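Untested sketch of the even/odd-PID trick (paths are examples; note a
persistent read error from cat would make this loop forever):

    read_with_parity() {  # $1 = file, $2 = wanted PID parity (0 or 1)
        until sh -c '[ $(($$ % 2)) -eq '"$2"' ] && exec cat "$0"' "$1" > /dev/null
        do :; done        # respawn the helper until its PID has the right parity
    }
    read_with_parity /mnt/photos/some.jpg 0   # the copy even PIDs read
    read_with_parity /mnt/photos/some.jpg 1   # the copy odd PIDs read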

Pretty clever thinking there. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Btrfs scrub failure for raid 6 kernel 4.3

2015-12-30 Thread Waxhead

Chris Murphy wrote:

Well all the generations on all devices are now the same, and so are
the chunk trees. I haven't looked at them in detail to see if there
are any discrepancies among them.

If you don't care much for this file system, then you could try btrfs
check --repair, using btrfs-progs 4.3.1 or integration branch. I have
no idea where btrfsck repair is at with raid56.

On the one hand, corruption should be fixed by scrub. But scrub fails
with a kernel trace. Maybe btrfs check --repair can fix the tree block
corruption since scrub can't, and then if that corruption is fixed,
possibly scrub will work.

I could not care less about this particular filesystem, as I wrote in the 
original post. It's just for having some fun with btrfs. What I find 
troublesome is that corrupting one (or even two) drives in a RAID6 
config breaks recovery. Granted, the filesystem "works", e.g. I can 
mount it and access files, but I get an input/output error on a file on 
this filesystem, and btrfs only shows warnings (not errors) on device 
sdg1 where the csum failed.
A RAID6 setup should work fine even with two disks (or, in this case, 
chunks of data) missing, and even if I don't care about this filesystem 
I care about btrfs getting stable ;) so if I can help I'll keep this 
filesystem around for a little longer!




Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread David Sterba
On Wed, Dec 30, 2015 at 06:15:23AM -0500, Sanidhya Solanki wrote:
> Only one problem: I do not run BTRFS on my systems, nor do I have a
> RAID setup, due to possessing a limited number of free drives. So, while
> I may be able to code for it, I will not be able to test it. I will need
> the community's help to do the testing.

Multiple devices can be simulated by loop devices or one physical device
partitioned. I'd expect at least some testing on your side; the
community will help with testing, but that's nothing specific to this
patch. This happens all the time.
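For example, a sketch with sparse files standing in for devices (sizes
and loop names arbitrary):

    for i in 1 2 3 4; do
        truncate -s 2G /tmp/dev$i.img         # sparse backing file
        losetup /dev/loop$i /tmp/dev$i.img    # attach as a loop device
    done
    mkfs.btrfs -d raid6 -m raid6 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4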

> I will get started tomorrow.
> 
> To-do (so far):
> - Implement RAID Stripe length as a compile and runtime option.

I was trying to explain that it's not a compile time option.

> - Implement a way to do an in-place Stripe Length change.

How are you going to implement that? I've suggested the balance filter
style of conversion, which is not in-place, so I'm curious what you
mean by in-place.
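For reference, the existing balance-filter conversion I have in mind is
driven like this, and it rewrites chunks rather than changing them in
place; a stripe-size filter would presumably follow the same pattern:

    btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt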


[PATCH] Btrfs: fix race between free space endio workers and space cache writeout

2015-12-30 Thread fdmanana
From: Filipe Manana 

While running a stress test I ran into the following trace/transaction
abort:

[471626.672243] [ cut here ]
[471626.673322] WARNING: CPU: 9 PID: 19107 at fs/btrfs/extent-tree.c:3740 
btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]()
[471626.675492] BTRFS: Transaction aborted (error -2)
[471626.676748] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor 
raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
loop fuse parport_pc i2c_piix
[471626.688802] CPU: 14 PID: 19107 Comm: fsstress Tainted: GW   
4.3.0-rc5-btrfs-next-17+ #1
[471626.690148] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[471626.691901]   880016037cf0 812566f4 
880016037d38
[471626.695009]  880016037d28 8104d0a6 a040c84e 
fffe
[471626.697490]  88011fe855f8 88000c484cb0 88000d195000 
880016037d90
[471626.699201] Call Trace:
[471626.699804]  [] dump_stack+0x4e/0x79
[471626.701049]  [] warn_slowpath_common+0x9f/0xb8
[471626.702542]  [] ? 
btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
[471626.704326]  [] warn_slowpath_fmt+0x48/0x50
[471626.705636]  [] ? write_one_cache_group.isra.32+0x77/0x82 
[btrfs]
[471626.707048]  [] 
btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
[471626.708616]  [] commit_cowonly_roots+0x1d7/0x25a [btrfs]
[471626.709950]  [] btrfs_commit_transaction+0x4c4/0x991 
[btrfs]
[471626.711286]  [] ? signal_pending_state+0x31/0x31
[471626.712611]  [] btrfs_sync_fs+0x145/0x1ad [btrfs]
[471626.715610]  [] ? SyS_tee+0x226/0x226
[471626.716718]  [] sync_fs_one_sb+0x20/0x22
[471626.717672]  [] iterate_supers+0x75/0xc2
[471626.718800]  [] sys_sync+0x52/0x80
[471626.719990]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
[471626.721835] ---[ end trace baf57f43d76693f4 ]---
[471626.722954] BTRFS: error (device sdc) in 
btrfs_write_dirty_block_groups:3740: errno=-2 No such entry

This is a very rare situation and it happened due to a race between a free
space endio worker and writing the space caches for dirty block groups at
a transaction's commit critical section. The steps leading to this are:

1) A task calls btrfs_commit_transaction() and starts the writeout of the
   space caches for all currently dirty block groups (i.e. it calls
   btrfs_start_dirty_block_groups());

2) The previous step starts writeback for space caches;

3) When the writeback finishes it queues jobs for free space endio work
   queue (fs_info->endio_freespace_worker) that execute
   btrfs_finish_ordered_io();

4) The task committing the transaction sets the transaction's state
   to TRANS_STATE_COMMIT_DOING and shortly after calls
   btrfs_write_dirty_block_groups();

5) A free space endio job joins the transaction, through
   btrfs_join_transaction_nolock(), and updates a free space inode item
   in the root tree through btrfs_update_inode_fallback();

6) Updating the free space inode item resulted in COWing one or more
   nodes/leaves of the root tree, and that resulted in creating a new
   metadata block group, which gets added to the transaction's list
   of dirty block groups (this is a very rare case);

7) The free space endio job has not released yet its transaction handle
   at this point, so the new metadata block group was not yet fully
   created (didn't go through btrfs_create_pending_block_groups() yet);

8) The transaction commit task sees the new metadata block group in
   the transaction's list of dirty block groups and processes it.
   When it attempts to update the block group's block group item in
   the extent tree, through write_one_cache_group(), it isn't able
   to find it and aborts the transaction with error -ENOENT - this
   is because the free space endio job hasn't yet released its
   transaction handle (which calls btrfs_create_pending_block_groups())
   and therefore the block group item was not yet added to the extent
   tree.

Fix this by waiting for free space endio jobs if we fail to find a block
group item in the extent tree, and then retrying the block group item
update once.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/extent-tree.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0617cb7..9bca90d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3763,6 +3763,25 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans,
 	}
 	if (!ret) {
 		ret = write_one_cache_group(trans, root, path, cache);
+		/*
+		 * One of the free space endio workers might have
+		 * created a new block group while updating a free space
+		 * cache's inode (at inode.c:btrfs_finish_ordered_io())
+		 *

Re: [PATCH] Btrfs: fix race between free space endio workers and space cache writeout

2015-12-30 Thread Chris Mason
On Wed, Dec 30, 2015 at 04:02:04PM +0000, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> While running a stress test I ran into the following trace/transaction
> abort:
> 
> [471626.672243] [ cut here ]
> [471626.673322] WARNING: CPU: 9 PID: 19107 at fs/btrfs/extent-tree.c:3740 
> btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]()
> [471626.675492] BTRFS: Transaction aborted (error -2)
> [471626.676748] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor 
> raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
> loop fuse parport_pc i2c_piix
> [471626.688802] CPU: 14 PID: 19107 Comm: fsstress Tainted: GW   
> 4.3.0-rc5-btrfs-next-17+ #1
> [471626.690148] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
> [471626.691901]   880016037cf0 812566f4 
> 880016037d38
> [471626.695009]  880016037d28 8104d0a6 a040c84e 
> fffe
> [471626.697490]  88011fe855f8 88000c484cb0 88000d195000 
> 880016037d90
> [471626.699201] Call Trace:
> [471626.699804]  [] dump_stack+0x4e/0x79
> [471626.701049]  [] warn_slowpath_common+0x9f/0xb8
> [471626.702542]  [] ? 
> btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
> [471626.704326]  [] warn_slowpath_fmt+0x48/0x50
> [471626.705636]  [] ? 
> write_one_cache_group.isra.32+0x77/0x82 [btrfs]
> [471626.707048]  [] 
> btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
> [471626.708616]  [] commit_cowonly_roots+0x1d7/0x25a [btrfs]
> [471626.709950]  [] btrfs_commit_transaction+0x4c4/0x991 
> [btrfs]
> [471626.711286]  [] ? signal_pending_state+0x31/0x31
> [471626.712611]  [] btrfs_sync_fs+0x145/0x1ad [btrfs]
> [471626.715610]  [] ? SyS_tee+0x226/0x226
> [471626.716718]  [] sync_fs_one_sb+0x20/0x22
> [471626.717672]  [] iterate_supers+0x75/0xc2
> [471626.718800]  [] sys_sync+0x52/0x80
> [471626.719990]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
> [471626.721835] ---[ end trace baf57f43d76693f4 ]---
> [471626.722954] BTRFS: error (device sdc) in 
> btrfs_write_dirty_block_groups:3740: errno=-2 No such entry
> 
> This is a very rare situation and it happened due to a race between a free
> space endio worker and writing the space caches for dirty block groups at
> a transaction's commit critical section. The steps leading to this are:
> 

Ugh, thanks Filipe.  I'll get this one into integration after I get back
from vacation (I'm out next week).

-chris


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread David Sterba
On Wed, Dec 30, 2015 at 10:10:44PM +0800, Qu Wenruo wrote:
> Now I am on the same side as David.
> Which means a runtime interface to change it (along with a mkfs option).
> 
> If we provide some configurable feature, then it should be tunable at
> both runtime and mkfs time.
> Or, just don't touch it until there is really enough user demand.
> (In the stripe_len case, that's also a possible choice, as a configurable
> stripe length doesn't really affect much except RAID5/6.)

I think that we need a configurable stripe size regardless. The
performance drop is measurable if the stripe size used by the filesystem
does not match the hardware.

> I totally understand that implementing it will cost you a lot more time, 
> not only on the kernel part but also on the user-tool part.
> 
> But this also means more patches.
> No matter what your motivation for contributing to btrfs is, more 
> patches (apart from the extra time spent) are always good.
> 
> More patches mean more reputation built in the community, and more 
> patches also mean better-split code structures for easier review.

Let me note that a good reputation is also built from patch reviews
(hint hint).


Re: Btrfs scrub failure for raid 6 kernel 4.3

2015-12-30 Thread Waxhead

Waxhead wrote:

Chris Murphy wrote:

Well all the generations on all devices are now the same, and so are
the chunk trees. I haven't looked at them in detail to see if there
are any discrepancies among them.

If you don't care much for this file system, then you could try btrfs
check --repair, using btrfs-progs 4.3.1 or integration branch. I have
no idea where btrfsck repair is at with raid56.

On the one hand, corruption should be fixed by scrub. But scrub fails
with a kernel trace. Maybe btrfs check --repair can fix the tree block
corruption since scrub can't, and then if that corruption is fixed,
possibly scrub will work.

I could not care less about this particular filesystem, as I wrote in the 
original post. It's just for having some fun with btrfs. What I find 
troublesome is that corrupting one (or even two) drives in a RAID6 
config breaks recovery. Granted, the filesystem "works", e.g. I can 
mount it and access files, but I get an input/output error on a file on 
this filesystem, and btrfs only shows warnings (not errors) on device 
sdg1 where the csum failed.
A RAID6 setup should work fine even with two disks (or, in this case, 
chunks of data) missing, and even if I don't care about this filesystem 
I care about btrfs getting stable ;) so if I can help I'll keep this 
filesystem around for a little longer!


For your information, I tried a balance on the filesystem - a new stack 
trace is below (the system is still working).
Sorry for flooding the mailing list with the stack trace - this is what I 
got from dmesg, hope it is of some use... / gets used... :)


[  243.603661] CPU: 0 PID: 1182 Comm: btrfs Tainted: G W   
4.3.0-1-686-pae #1 Debian 4.3.3-2

[  243.603664] Hardware name: Acer AOA150/, BIOS v0.3310 10/06/2008
[  243.603676]   09f7a8eb eef57990 c12ae3c5  c106685d 
c1614e20 
[  243.603687]  049e f86df010 190a f86350ff 0009 f86350ff 
f1dd8b18 
[  243.603697]  0078 eef579a0 c1066962 0009  eef57a6c 
f86350ff 

[  243.603699] Call Trace:
[  243.603716]  [] ? dump_stack+0x3e/0x59
[  243.603724]  [] ? warn_slowpath_common+0x8d/0xc0
[  243.603763]  [] ? __btrfs_free_extent+0xbbf/0xec0 [btrfs]
[  243.603798]  [] ? __btrfs_free_extent+0xbbf/0xec0 [btrfs]
[  243.603806]  [] ? warn_slowpath_null+0x22/0x30
[  243.603837]  [] ? __btrfs_free_extent+0xbbf/0xec0 [btrfs]
[  243.603877]  [] ? __btrfs_run_delayed_refs+0x96e/0x11a0 [btrfs]
[  243.603889]  [] ? __percpu_counter_add+0x8e/0xb0
[  243.603930]  [] ? btrfs_run_delayed_refs+0x6d/0x250 [btrfs]
[  243.603969]  [] ? btrfs_should_end_transaction+0x3c/0x60 
[btrfs]

[  243.604003]  [] ? btrfs_drop_snapshot+0x426/0x850 [btrfs]
[  243.604110]  [] ? merge_reloc_roots+0xee/0x260 [btrfs]
[  243.604152]  [] ? remove_backref_node+0x67/0xe0 [btrfs]
[  243.604198]  [] ? relocate_block_group+0x28f/0x750 [btrfs]
[  243.604242]  [] ? btrfs_relocate_block_group+0x1d8/0x2e0 
[btrfs]
[  243.604282]  [] ? btrfs_relocate_chunk.isra.29+0x3d/0xf0 
[btrfs]

[  243.604326]  [] ? btrfs_balance+0x97c/0x12e0 [btrfs]
[  243.604338]  [] ? __alloc_pages_nodemask+0x13b/0x850
[  243.604345]  [] ? get_page_from_freelist+0x3dd/0x5c0
[  243.604391]  [] ? btrfs_ioctl_balance+0x385/0x390 [btrfs]
[  243.604430]  [] ? btrfs_ioctl+0x793/0x2c50 [btrfs]
[  243.604437]  [] ? __alloc_pages_nodemask+0x13b/0x850
[  243.604443]  [] ? terminate_walk+0x69/0xc0
[  243.604453]  [] ? anon_vma_prepare+0xdf/0x130
[  243.604460]  [] ? page_add_new_anon_rmap+0x6c/0x90
[  243.604468]  [] ? handle_mm_fault+0xa63/0x14f0
[  243.604476]  [] ? __rb_insert_augmented+0xf3/0x1c0
[  243.604520]  [] ? update_ioctl_balance_args+0x1c0/0x1c0 [btrfs]
[  243.604527]  [] ? do_vfs_ioctl+0x2e2/0x500
[  243.604534]  [] ? do_brk+0x113/0x2b0
[  243.604542]  [] ? __do_page_fault+0x1a0/0x460
[  243.604549]  [] ? SyS_ioctl+0x68/0x80
[  243.604557]  [] ? sysenter_do_call+0x12/0x12
[  243.604563] ---[ end trace eb3e6200cba2a564 ]---
[  243.604654] [ cut here ]
[  243.604695] WARNING: CPU: 0 PID: 1182 at 
/build/linux-P8Ifgy/linux-4.3.3/fs/btrfs/extent-tree.c:6410 
__btrfs_free_extent+0xbbf/0xec0 [btrfs]()
[  243.604813] Modules linked in: cpufreq_stats cpufreq_conservative 
cpufreq_userspace bnep cpufreq_powersave zram zsmalloc lz4_compress nfsd 
auth_rpcgss oid_registry nfs_acl lockd grace sunrpc joydev iTCO_wdt 
iTCO_vendor_support sparse_keymap arc4 acerhdf coretemp pcspkr evdev 
psmouse serio_raw i2c_i801 uvcvideo videobuf2_vmalloc videobuf2_memops 
videobuf2_core v4l2_common videodev media lpc_ich mfd_core btusb btrtl 
btbcm btintel rng_core bluetooth ath5k ath snd_hda_codec_realtek 
snd_hda_codec_generic mac80211 jmb38x_ms snd_hda_intel i915 cfg80211 
memstick snd_hda_codec rfkill snd_hda_core snd_hwdep drm_kms_helper 
snd_pcm snd_timer shpchp snd soundcore drm i2c_algo_bit wmi battery 
video ac button acpi_cpufreq processor sg loop autofs4 uas usb_storage 
ext4 crc16 mbcache jbd2 crc32c_generic btrfs xor
[  243.604837]  raid6_pq 

Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Christoph Anton Mitterer
On Tue, 2015-12-29 at 19:06 +0100, David Sterba wrote:
> > Both of course open many questions (how to deal with crashes,
> > etc.)...
> > maybe having a look at how mdadm handles similar problems could be
> > worthwhile.
> 
> The crash consistency should remain, other than that we'd have to
> enhance the balance filters to process only the unconverted chunks to
> continue.

What about nodatacow'ed files? I'd expect that in case of a crash
during reshaping, these files are (likely) garbage then, right?
Not particularly desirable...


But probably that just goes in the direction of the issues/questions I
brought up in the other thread, where I've asked the devs about
possibilities in terms of checksumming on nodatacow'ed areas:

i.e. the stability/integrity of such files.


For me, speaking with the sysadmin's hat on, that was always the main
reason not to do reshapes so far, especially when data is quite
precious, which covers at least one typical use case for nodatacow,
namely DBs.

So having crash resistance for CoW + nodataCoW during RAID reshape
would be desirable.

Time for a journal in btrfs? O;-)


Cheers,
Chris.



Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Duncan
Christoph Anton Mitterer posted on Wed, 30 Dec 2015 21:00:11 +0100 as
excerpted:

> On Tue, 2015-12-29 at 19:06 +0100, David Sterba wrote:
>> > Both of course open many questions (how to deal with crashes,
>> > etc.)...
>> > maybe having a look at how mdadm handles similar problems could be
>> > worthwhile.
>> 
>> The crash consistency should remain, other than that we'd have to
>> enhance the balance filters to process only the unconverted chunks to
>> continue.
> 
> What about nodatacow'ed files? I'd expect that in case of a crash during
> reshaping, these files are (likely) garbage then, right?
> Not particularly desirable...

For something like that, it'd pretty much /have/ to be done as COW, at 
least at the chunk level, tho the address from the outside may stay the 
same.  That's what balance already does, after all.




-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH RESEND 2/2] btrfs:Fix error handling in the function btrfs_dev_replace_kthread

2015-12-30 Thread Chris Mason
On Tue, Dec 29, 2015 at 08:10:07PM -0500, Nicholas Krause wrote:
> This fixes error handling in the function btrfs_dev_replace_kthread
> by checking whether the call to btrfs_dev_replace_continue_on_mount
> has failed and, if so, returning the error code to this function's
> caller in order to signal that a failure has occurred when calling
> this particular function.
> 
> Signed-off-by: Nicholas Krause 
> ---
>  fs/btrfs/dev-replace.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 38ffd73..b26f68c 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -803,6 +803,7 @@ static int btrfs_dev_replace_kthread(void *data)
>   struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
>   struct btrfs_ioctl_dev_replace_args *status_args;
>   u64 progress;
> + int ret;
>  
>   status_args = kzalloc(sizeof(*status_args), GFP_NOFS);
>   if (status_args) {
> @@ -820,7 +821,9 @@ static int btrfs_dev_replace_kthread(void *data)
>   "",
>   (unsigned int)progress);
>   }
> - btrfs_dev_replace_continue_on_mount(fs_info);
> + ret = btrfs_dev_replace_continue_on_mount(fs_info);
> + if (ret)
> + return ret;
>   atomic_set(&fs_info->mutually_exclusive_operation_running, 0);

This atomic_set that you're skipping is really important.

-chris


Re: Confining scrub to a subvolume

2015-12-30 Thread Christoph Anton Mitterer
On Wed, 2015-12-30 at 18:26 +0000, Duncan wrote:
> That should work.  Cat the files to /dev/null and check dmesg.  For 
> single mode it should check the only copy.  For raid1/10 or dup,
> running 
> two checks, ensuring one is even-PID while the other is odd-PID,
> should 
> work to check both copies, since the read-scheduler assigns copy
> based on 
> even/odd PID.  Errors will show up in dmesg, as well as cat's STDERR.
That doesn't seem very reliable to me, to be honest... plus it wouldn't
work in any RAID56 or dupN (with n != 2) case, once those get implemented
sooner or later.

Also, I'd kinda guess (or better said: hope) that the kernel's cache
would destroy these efforts, at least when the two reads happen mostly
in parallel.


Cheers,
Chris.



Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Christoph Anton Mitterer
On Wed, 2015-12-30 at 22:10 +0800, Qu Wenruo wrote:
> Or, just don't touch it until there is really enough user demand.
I definitely think that there is demand... as I've written previously,
when I did some benchmarking tests (though on MD and HW RAID), different
RAID chunk sizes tuned the array for different kinds of IO patterns.

> (In the stripe_len case, that's also a possible choice, as a configurable
> stripe length doesn't really affect much except RAID5/6.)
Sure about that? Admittedly I haven't checked it for the non-parity
RAIDs, but I'd expect that for the same reasons you get different
performance for sequential/random/vector reads/writes, you'd also see
that more or less at least with RAID0.

Cheers,
Chris.



Re: Confining scrub to a subvolume

2015-12-30 Thread Duncan
Christoph Anton Mitterer posted on Wed, 30 Dec 2015 20:28:00 +0100 as
excerpted:

On Wed, 2015-12-30 at 18:26 +0000, Duncan wrote:
>> That should work.  Cat the files to /dev/null and check dmesg.  For
>> single mode it should check the only copy.  For raid1/10 or dup,
>> running two checks, ensuring one is even-PID while the other is
>> odd-PID, should work to check both copies, since the read-scheduler
>> assigns copy based on even/odd PID.  Errors will show up in dmesg, as
>> well as cat's STDERR.

> That doesn't seem very reliable to me, to be honest... plus it wouldn't
> work in any RAID56 or dupN (with n != 2) case, once those get implemented
> sooner or later.

Well, yes, but right now the only exception is raid56... and there's a 
good chance it'll keep working for a year at least, as I've seen no 
first-patches yet to implement n-way (which I'm surely looking forward 
to), and by then perhaps he'll have implemented the multi-btrfs on 
partitions or lvm thing that I actually prefer, myself.

Meanwhile, it's a pretty clever solution, I think. =:^)

> Also, I'd kinda guess (or better said: hope) that the kernel's cache
> would destroy these efforts, at least when the two reads happen mostly
> in parallel.

I was thinking of running them in parallel, but you're right, you'd have 
to run them serially and drop caches between runs.
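E.g. a sketch of two serial passes (run as root; the PID-parity
handling from earlier in the thread would wrap each pass):

    cat /mnt/photos/* > /dev/null       # first pass
    sync
    echo 3 > /proc/sys/vm/drop_caches   # flush the page cache
    cat /mnt/photos/* > /dev/null       # second pass re-reads from disk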

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH RESEND] btrfs:Fix incorrect return statement if failure occurs in the function btrfs_mark_extent_written

2015-12-30 Thread Chris Mason
On Tue, Dec 29, 2015 at 08:20:47PM -0500, Nicholas Krause wrote:
> This fixes the incorrect return statement if a failure occurs by
> returning the variable ret, which may hold an error code, rather than 0,
> to signal to the callers of btrfs_mark_extent_written when a failure
> has occurred, rather than always making this function appear to run
> successfully by always returning zero.
> 
> Signed-off-by: Nicholas Krause 
> ---
>  fs/btrfs/file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index b823fac..7a9ab8e 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1276,7 +1276,7 @@ again:
>   }
>  out:
>   btrfs_free_path(path);
> - return 0;
> + return ret;
>  }

We're checking ret higher up and aborting the transaction properly.
There is at least one place above where ret will be non-zero without it
being an error, but you're passing it to the caller here, making them
think something has gone wrong.

-chris



Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Christoph Anton Mitterer
On Wed, 2015-12-30 at 21:02 +0000, Duncan wrote:
> For something like that, it'd pretty much /have/ to be done as COW,
> at 
> least at the chunk level, tho the address from the outside may stay
> the 
> same.  That's what balance already does, after all.
Ah... of course... it would basically be CoW1 again... sorry for not
having thought about that :-)

Sounds like a good and stable solution then.

Cheers,
Chris.



Re: Confining scrub to a subvolume

2015-12-30 Thread Christoph Anton Mitterer
On Wed, 2015-12-30 at 18:39 +0100, David Sterba wrote:
> The closest would be to read the files and look for any reported
> errors.
Doesn't that fail for any multi-device setup, in which btrfs reads
the blocks only from one device and, if that verifies, doesn't check
the other?


Cheers,
Chris.



Re: Confining scrub to a subvolume

2015-12-30 Thread Christoph Anton Mitterer
On Wed, 2015-12-30 at 20:57 +0000, Duncan wrote:
> Meanwhile, it's a pretty clever solution, I think. =:^)
Well, the problem with such workaround solutions is... end users get
used to them, rely on them, and suddenly they don't work anymore (which
the user probably wouldn't even notice, though).

> I was thinking of running them in parallel, but you're right, you'd
> have to run them serially and drop caches between runs.
And you'd need to clear the cache at least for filesystems < memory,
but even if not, I wouldn't be 100% sure that one can safely do without
clearing the cache.

Cheers,
Chris.



BTRFS in 4.4.0-rc7 keeping system from hibernating

2015-12-30 Thread Jon Christopherson

Hello,

Ever since 4.4.0-rc1 or so, BTRFS and XFS haven't played well with 
hibernation. It may be deeper down, as both filesystems seem to have 
issues with not being able to commit/freeze, as can be seen below:


[81167.893207] PM: Syncing filesystems ... done.
[81168.194298] Freezing user space processes ... (elapsed 0.032 seconds) 
done.

[81168.226832] PM: Marking nosave pages: [mem 0x-0x0fff]
[81168.226839] PM: Marking nosave pages: [mem 0x00058000-0x00058fff]
[81168.226843] PM: Marking nosave pages: [mem 0x0009-0x00090fff]
[81168.226846] PM: Marking nosave pages: [mem 0x0009e000-0x000f]
[81168.226853] PM: Marking nosave pages: [mem 0x8f68f000-0x8f6d9fff]
[81168.226860] PM: Marking nosave pages: [mem 0x8f71e000-0x9022efff]
[81168.226997] PM: Marking nosave pages: [mem 0x95295000-0x97ffefff]
[81168.227549] PM: Marking nosave pages: [mem 0x9800-0x]
[81168.229676] PM: Basic memory bitmaps created
[81168.230414] PM: Preallocating image memory... done (allocated 1559693 
pages)

[81169.667742] PM: Allocated 6238772 kbytes in 1.43 seconds (4362.77 MB/s)
[81169.667743] Freezing remaining freezable tasks ...
[81189.679101] Freezing of tasks failed after 20.010 seconds (2 tasks 
refusing to freeze, wq_busy=0):
[81189.679299] btrfs-cleaner   D 88008841fad8 0  2141  2 
0x
[81189.679308]  88008841fad8 880453bd4a08 81c11500 
880416ab3b00
[81189.679314]  88008842 880466416b00 7fff 
88008841fc48
[81189.679319]  817d4650 88008841faf0 817d3ef5 


[81189.679325] Call Trace:
[81189.679338]  [] ? bit_wait+0x60/0x60
[81189.679343]  [] schedule+0x35/0x80
[81189.679348]  [] schedule_timeout+0x189/0x250
[81189.679391]  [] ? __set_extent_bit+0x430/0x550 [btrfs]
[81189.679398]  [] ? ktime_get+0x37/0xa0
[81189.679427]  [] ? bit_wait+0x60/0x60
[81189.679431]  [] io_schedule_timeout+0xa4/0x110
[81189.679436]  [] bit_wait_io+0x1b/0x70
[81189.679440]  [] __wait_on_bit_lock+0x4e/0xb0
[81189.679474]  [] ? __clear_extent_bit+0x2ec/0x3b0 
[btrfs]

[81189.679481]  [] __lock_page+0xb0/0xc0
[81189.679488]  [] ? autoremove_wake_function+0x40/0x40
[81189.679494]  [] pagecache_get_page+0x17d/0x1c0
[81189.679528]  [] btrfs_defrag_file+0x33b/0xcd0 [btrfs]
[81189.679536]  [] ? put_prev_entity+0x33/0x7e0
[81189.679567]  [] btrfs_run_defrag_inodes+0x1ef/0x300 
[btrfs]

[81189.679593]  [] cleaner_kthread+0xd0/0x200 [btrfs]
[81189.679617]  [] ? check_leaf+0x330/0x330 [btrfs]
[81189.679624]  [] kthread+0xc9/0xe0
[81189.679631]  [] ? kthread_create_on_node+0x180/0x180
[81189.679636]  [] ret_from_fork+0x3f/0x70
[81189.679643]  [] ? kthread_create_on_node+0x180/0x180
[81189.679654] xfsaild/dm-4S 88041615fe08 0  2350  2 
0x
[81189.679659]  88041615fe08  880453f02c40 
880416b4bb00
[81189.679664]  88041616 880416b4bb00  
88045252e100
[81189.679669]  880418acb800 88041615fe20 817d3ef5 


[81189.679674] Call Trace:
[81189.679679]  [] schedule+0x35/0x80
[81189.679734]  [] xfsaild+0x53f/0x5d0 [xfs]
[81189.679780]  [] ? 
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[81189.679820]  [] ? 
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]

[81189.679827]  [] kthread+0xc9/0xe0
[81189.679834]  [] ? kthread_create_on_node+0x180/0x180
[81189.679840]  [] ret_from_fork+0x3f/0x70
[81189.679845]  [] ? kthread_create_on_node+0x180/0x180
[81189.679930]

4.3.0 would hibernate correctly.


--

Regards,

Jon Christopherson
j...@jons.org


Re: [PATCH] BTRFS: Runs the xor function if a Block has failed

2015-12-30 Thread David Sterba
On Wed, Dec 30, 2015 at 01:28:36AM -0500, Sanidhya Solanki wrote:
> The patch adds the xor function after the P stripe
> has failed, without bad data or the Q stripe.

That's just the comment copied; the changelog does not explain why it's
ok to do just the run_xor there. It does not seem trivial to me. Please
describe what end result is expected after the code change.

> @@ -1864,8 +1864,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
> 			/*
> 			 * Just the P stripe has failed, without
> 			 * a bad data or Q stripe.
> -			 * TODO, we should redo the xor here.
> 			 */
> +			run_xor(pointers, rbio->nr_data - 1, PAGE_CACHE_SIZE);
> 			err = -EIO;
> 			goto cleanup;


6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem

2015-12-30 Thread cheater00 .
Hi,
I have a 6TB partition here; it filled up while still just under 2TiB
was on it. btrfs fi df showed that Data is 1.92TiB:

Data, single: total=1.92TiB, used=1.92TiB
System, DUP: total=8.00MiB, used=224.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=5.00GiB, used=3.32GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

btrfs fs resize max . did nothing; I also tried resize -1T and resize
+1T, and that did nothing as well. On IRC I was directed to this:

https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29

"When you haven't hit the "usual" problem

If the conditions above aren't true (i.e. there's plenty of
unallocated space, or there's lots of unused metadata allocation),
then you may have hit a known but unresolved bug. If this is the case,
please report it to either the mailing list, or IRC. In some cases, it
has been possible to deal with the problem, but the approach is new,
and we would like more direct contact with people experiencing this
particular bug."

What do I do now? It's kind of important to me to get that free space.
I'm really jonesing for that free space.

Thanks.

(btw, so far I haven't been able to follow up on that unrelated thread
from a while back. But I hope to be able to do that sometime in
January.)


Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem

2015-12-30 Thread Chris Murphy
kernel and btrfs-progs versions
and output from:
'btrfs fi show '
'btrfs fi usage '
'btrfs-show-super '
'df -h'

Then umount the volume, and mount with option enospc_debug, and try to
reproduce the problem, then include everything from dmesg from the
time the volume was mounted.
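Something like (device and mountpoint here are just examples):

    umount /mnt
    mount -o enospc_debug /dev/sdX /mnt
    # ... reproduce the ENOSPC failure ...
    dmesg                               # everything since the mount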

-- 
Chris Murphy


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Sanidhya Solanki
On Wed, 30 Dec 2015 16:58:05 +0100
David Sterba  wrote:

> On Wed, Dec 30, 2015 at 06:15:23AM -0500, Sanidhya Solanki wrote:

> > - Implement a way to do an in-place Stripe Length change.
>
> How are you going to implement that? I've suggested the balance filter
> style of conversion, which is not in-place, so I'm curious what you
> mean by in-place.

As CAM suggested, it would basically be a CoW, with a checksum
comparison at the end to make sure no data has been corrupted.

In-place: Without taking the drives or filesystem offline or unmounting
them. Doing the conversion while the rest of the RAID is in use.
Risky, slow, but possible, given enough time for large data sets.

Thanks.


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Sanidhya Solanki
On Wed, 30 Dec 2015 17:17:22 +0100
David Sterba  wrote:

> Let me note that a good reputation is also built from patch reviews
> (hint hint).

Unfortunately, there are not too many patches coming in for BTRFS presently.
Mailing list activity is down to 25-35 mails per day, mostly feature
requests and bug reports.

I will try to pitch in with patch reviews where possible.

Thanks.


Re: [PATCH] BTRFS: Runs the xor function if a Block has failed

2015-12-30 Thread Sanidhya Solanki
On Wed, 30 Dec 2015 18:18:26 +0100
David Sterba  wrote:

> That's just the comment copied; the changelog does not explain why
> it's ok to do just the run_xor there. It does not seem trivial to me.
> Please describe what end result is expected after the code change.

In the RAID 6 case after a failure, we discover that the failure
affected the entire P stripe, without any bad data occurring. Hence, we
XOR the previously stored parity data to recover the data that was lost
in the P stripe failure.

The XORed data comes from the parity blocks. Hence, we are left with 
recovered data belonging to the P stripe.

If there is an error during the completion of the XOR (provided by the
patch), we go to the cleanup function.

Hope that is satisfactory.


[PATCH] BTRFS: Runs the xor function if a Block has failed

2015-12-30 Thread Sanidhya Solanki
The patch adds the xor function after the P stripe
has failed, without bad data or the Q stripe.

Signed-off-by: Sanidhya Solanki 
---
 fs/btrfs/raid56.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 1a33d3e..d33734a 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1864,8 +1864,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 			/*
 			 * Just the P stripe has failed, without
 			 * a bad data or Q stripe.
-			 * TODO, we should redo the xor here.
 			 */
+			run_xor(pointers, rbio->nr_data - 1, PAGE_CACHE_SIZE);
 			err = -EIO;
 			goto cleanup;
 		}
-- 
2.5.0



Re: Feedback on inline dedup patches

2015-12-30 Thread Qu Wenruo

Hi Marcel

Thanks a lot for the feedback.

Marcel Ritter wrote on 2015/12/30 09:15 +0100:

Hi Qu Wenruo,

I just wanted to give some feedback on yesterday's dedup patches:

I just applied them to a 4.4-rc7 kernel and did some (very basic)
testing:

Test1: in-memory

Didn't crash on my 350 GB test files. After copying those files again,
though, "btrfs fi df" didn't show much space savings (maybe that's not
the right tool to check with anyway?).

Two reasons: one is the default limit of 4096 hashes.

The other one is a bug where we don't dedup if a transaction has just been
committed. We already have a fix for it internally, but we are busy 
fixing/testing the on-disk backend.

But that's not a big problem, as it only skips the first several hits.


Looking further I found the (default) limit of 4096 hashes (is it really
hashes? with 16k blocks that'd cover a dataset of only 64 MB?).


Yes, that's the default value.

This allows even an embedded device to have a try at btrfs dedup.
The default value shouldn't be so big that it OOMs the system, so I just 
chose the small 4096 default value.



I think I'll start a new test run, with a much higher number of hashes,
but I'd like to know the memory requirements involved - is there
a formula for calculating those memory needs?


The formula is very easy:
Memory usage = btrfs_dedup_hash_size * limit.

Currently, btrfs_dedup_hash_size for SHA-256 is 112 bytes.
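So with the defaults that is only (quick sanity check):

    echo $((112 * 4096))      # 458752 bytes, i.e. 448 KiB
    echo $((112 * 1000000))   # ~107 MiB for a one-million-hash limit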



Test2: ondisk

Created the filesystem with "-O dedup", did a "btrfs dedup enable -s ondisk"
and started copying the same data (see above). Just a few seconds
later I got a kernel crash :-(
I'll try to get a kernel dump - maybe this helps to track down the problem.


We're aware of the bug, and are trying our best to fix it.
But the bug seems quite weird and it may take some time to fix.

So on-disk is not recommended, unless you want to help fix the bug.

Thanks,
Qu




Marcel







Re: Feedback on inline dedup patches

2015-12-30 Thread Qu Wenruo



Qu Wenruo wrote on 2015/12/30 16:38 +0800:

Hi Marcel

Thanks a lot for the feedback.

Marcel Ritter wrote on 2015/12/30 09:15 +0100:

Hi Qu Wenruo,

I just wanted to give some feedback on yesterday's dedup patches:

I just applied them to a 4.4-rc7 kernel and did some (very basic)
testing:

Test1: in-memory

Didn't crash on my 350 GB test files. After copying those files again,
though, "btrfs fi df" didn't show much space savings (maybe that's not
the right tool to check with anyway?).

Two reasons: one is the default limit of 4096 hashes.

The other one is a bug where we don't dedup if a transaction has just been
committed. We already have a fix for it internally, but we are busy
fixing/testing the on-disk backend.
But that's not a big problem, as it only skips the first several hits.


Looking further I found the (default) limit of 4096 hashes (is it really
hashes? with 16k blocks that'd cover a dataset of only 64 MB?).


Yes, that's the default value.

This allows even an embedded device to have a try at btrfs dedup.
The default value shouldn't be so big that it OOMs the system, so I just
chose the small 4096 default value.


I think I'll start a new test run, with a much higher number of hashes,
but I'd like to know the memory requirements involved - is there
a formula for calculating those memory needs?


The formula is very easy:
Memory usage = btrfs_dedup_hash_size * limit.

Currently, btrfs_dedup_hash_size for SHA-256 is 112 bytes.



Test2: ondisk

Created the filesystem with "-O dedup", did a "btrfs dedup enable -s ondisk"
and started copying the same data (see above). Just a few seconds
later I got a kernel crash :-(
I'll try to get a kernel dump - maybe this helps to track down the
problem.


We're aware of the bug, and are trying our best to fix it.
But the bug seems quite weird and it may take some time to fix.


OK

I had just confused "btrfs_item_offset_nr" and "btrfs_item_ptr_offset".

And that's the root cause of the problem for the on-disk backend.

What a SUPER STUPID bug!!!

So the fix will come much sooner than I'd expected.

Thanks,
Qu



So on-disk is not recommended, unless you want to help fix the bug.

Thanks,
Qu




Marcel







Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Sanidhya Solanki
On Tue, 29 Dec 2015 18:06:11 +0100
David Sterba  wrote:

> So you want to make the stripe size configurable?...

As I see it there are 3 ways to do it:
-Make it a compile time option that only configures it for a single
system with any devices that are added to the RAID.
-Make it a runtime option that can change based on how the
administrator configures it.
-A non-user facing option that is configurable by someone like a
distribution maintainer for all systems using the Binary Distribution.

As I see it, DS would like something like the third option, but CAM
(ostensibly a SysAdmin) wants the second option.

On the other hand, I implemented the first option. 

The first and third options can co-exist; the second is an orthogonal
target that needs to be set up separately.

Or we can make all options co-exist, but that makes it more complicated.

Please let me know which implementation is preferable, and whether you just
want me to expand the description (as DS's mail asked for) or redo the
entire setup.

Thanks


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Sanidhya Solanki
On Wed, 30 Dec 2015 19:59:16 +0800
Qu Wenruo  wrote:
> Not really sure about the difference between 2 and 3.

I should have made it clear before: I was asking about the exact use case
in mind when listing the choices. Option 2 would be for SysAdmins running
production software and configuring it as they desire.
Option 3 is what we have in the Kernel now, before my patch, where the
option exists, but it is fixed by the code. You can change it, but you
need to be someone fairly involved in the upstream work (like a
distribution Maintainer). This is what my patch implements (well, this
and option 1).
Option 1 leaves it as a compile time option.

> When you mention runtime option, did you mean ioctl/mount/balance 
> convert option?

Yes, that is correct.

> And what's the third one? Default mkfs time option?
> If you can make it a mkfs time option, it won't really be hard to make
> it configurable.

This would be ideal for all use-cases, but it would make the implementation
much larger than it would be for the other options. Hence, I asked
what the exact use case was for the end-user being targeted.
 
> I didn't consider that David meant something like that.
> As far as I read, he means a balance convert option along with a mkfs
> option.

Hence why I asked.

> At least from what I have learned in recent btrfs development, either
> we provide good enough interfaces (normally, a balance convert ioctl
> with a mkfs time option) to configure some on-disk fields.

Just confirming before starting the implementation.
> So a fixed kernel value is not a really good idea, and should at least
> be replaced by a mkfs time option.

Will do after confirmation.

Thanks


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Qu Wenruo



On 12/30/2015 05:54 PM, Sanidhya Solanki wrote:

On Wed, 30 Dec 2015 19:59:16 +0800
Qu Wenruo  wrote:

Not really sure about the difference between 2 and 3.


I should have made it clear before: I was asking about the exact use case
in mind when listing the choices. Option 2 would be for SysAdmins running
production software and configuring it as they desire.
Option 3 is what we have in the Kernel now, before my patch, where the
option exists, but it is fixed by the code. You can change it, but you
need to be someone fairly involved in the upstream work (like a
distribution Maintainer). This is what my patch implements (well, this
and option 1).
Option 1 leaves it as a compile time option.


When you mention runtime option, did you mean ioctl/mount/balance
convert option?


Yes, that is correct.


And what's the third one? Default mkfs time option?
If you can make it a mkfs time option, it won't really be hard to make
it configurable.


This would be ideal for all use-cases, but it would make the implementation
much larger than it would be for the other options. Hence, I asked
what the exact use case was for the end-user being targeted.


I didn't consider that David meant something like that.
As far as I read, he means a balance convert option along with a mkfs
option.


Hence why I asked.


At least from what I have learned in recent btrfs development, either
we provide good enough interfaces (normally, a balance convert ioctl
with a mkfs time option) to configure some on-disk fields.


Just confirming before starting the implementation.

So a fixed kernel value is not a really good idea, and should at least
be replaced by a mkfs time option.


Will do after confirmation.


Understood now.

Now I am on the same side as David.
Which means a runtime interface to change it (along with a mkfs option).

If we provide some configurable feature, then it should be tunable at 
both runtime and mkfs time.

Or, just don't touch it until there is really enough user demand.
(In the stripe_len case, that's also a possible choice, as a configurable 
stripe length doesn't really affect much except RAID5/6.)



I totally understand that implementing it will cost you a lot more time, 
not only on the kernel part but also on the user-tool part.

But this also means more patches.
No matter what your motivation for contributing to btrfs is, more 
patches (apart from the extra time spent) are always good.

More patches mean more reputation built in the community, and more 
patches also mean better-split code structures for easier review.

And you will also need to do more debugging and testing, to polish your skills.

Thanks,
Qu



Thanks




Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Qu Wenruo



On 12/30/2015 02:39 PM, Sanidhya Solanki wrote:

On Tue, 29 Dec 2015 18:06:11 +0100
David Sterba  wrote:


So you want to make the stripe size configurable?...


As I see it there are 3 ways to do it:
-Make it a compile time option that only configures it for a single
system with any devices that are added to the RAID.
-Make it a runtime option that can change based on how the
administrator configures it.
-A non-user facing option that is configurable by someone like a
distribution maintainer for all systems using the Binary Distribution.


Not really sure about the difference between 2 and 3.

When you mention runtime option, did you mean ioctl/mount/balance 
convert option?


And what's the third one? Default mkfs time option?

If you can make it a mkfs time option, it won't really be hard to make it 
configurable.




As I see it, DS would like something like the third option, but CAM
(ostensibly a SysAdmin) wants the second option.


I didn't consider that David meant something like that.

As far as I read, he means a balance convert option along with a mkfs option.



On the other hand, I implemented the first option.


At least from what I have learned in recent btrfs development, either we 
provide good enough interfaces (normally, a balance convert ioctl with 
a mkfs time option) to configure some on-disk fields,

or we just leave it at a fixed value (normally 0, just like for encryption 
of EXTENT_DATA, and that's the case for the current stripe_size).

So a fixed kernel value is not a really good idea, and should at least be 
replaced by a mkfs time option.




The first and third options can co-exist; the second is an orthogonal
target that needs to be set up separately.

Or we can make all options co-exist, but that makes it more complicated.


No need.
Just refer to how the btrfs kernel handles chunk profiles.

It can be specified at mkfs time (by the -d and -m options), and can also be 
converted later by the balance ioctl (via the btrfs balance convert filter).
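For example (devices and mountpoint are placeholders):

    mkfs.btrfs -d raid0 -m raid1 /dev/sdb /dev/sdc   # profiles chosen at mkfs time
    mount /dev/sdb /mnt
    btrfs balance start -dconvert=raid1 /mnt         # converted later via balance

A configurable stripe size could follow the same pattern.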


The only tricky thing I am a little concerned about is how we keep 
the default chunk stripe size for a fs.


Thanks,
Qu


Please let me know which implementation is preferable, and whether you just
want me to expand the description (as DS's mail asked for) or redo the
entire setup.

Thanks


Re: [PATCH v3 2/2] btrfs: Enhance chunk validation check

2015-12-30 Thread Qu Wenruo

On 12/29/2015 06:11 PM, Chandan Rajendra wrote:

On Tuesday 08 Dec 2015 16:40:42 Qu Wenruo wrote:

Enhance chunk validation:
1) Num_stripes
We already have such a check, but it's only in the super block sys chunk
array.
Now check all on-disk chunks.

2) Chunk logical
It should be aligned to sector size.
This behavior should be *DOUBLE CHECKED* for 64K sector size platforms
like PPC64 or AArch64.
Maybe we can find some hidden bugs.



Sorry about the delayed response. I executed fstests on ppc64 with 64k block
size, and all the tests that used to pass earlier (i.e. without the patch
applied) continue to pass. Hence,

Tested-by: Chandan Rajendra 


Very glad to hear that.

Thanks for all the testing.
Qu


Re: [PATCH] BTRFS: Adds an option to select RAID Stripe size

2015-12-30 Thread Sanidhya Solanki
On Wed, 30 Dec 2015 22:10:44 +0800
Qu Wenruo  wrote:
> Understood now.

Good.

> I totally understand that implement ... to polish your
> skill.

That has got to be the most hilarious way I have ever seen someone
delegate a task. But it was effective.

Only one problem: I do not run BTRFS on my systems, nor do I have a
RAID setup, due to possessing a limited number of free drives. So, while
I may be able to code for it, I will not be able to test it. I will need
the community's help to do the testing.

I will get started tomorrow.

To-do (so far):
- Implement RAID Stripe length as a compile and runtime option.
- Implement a way to do an in-place Stripe Length change.
- Debugging & testing for the above additions.

Thanks.


Confining scrub to a subvolume

2015-12-30 Thread Sree Harsha Totakura
Hi,

Is it possible to confine scrubbing to a subvolume instead of the whole
file system?

The problem I am facing is that I have a 5 TB btrfs FS.  On it I have
created subvolumes for weekly backups, personal photos, music, and
documents.  Obviously, I am more concerned about my photos and documents
than my backups and music.  Therefore, I would like to scrub the photos
and documents subvolumes more often than the backups subvolume.  Would
this be possible with the current tools?

Regards,
Sree