Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem

2016-08-05 Thread Gabriel C

On 04.08.2016 18:53, Lutz Vieweg wrote:
> 
> I was today hit by what I think is probably the same bug:
> A btrfs on a close-to-4TB sized block device, only half filled
> to almost exactly 2 TB, suddenly says "no space left on device"
> upon any attempt to write to it. The filesystem was NOT automatically
> switched to read-only by the kernel, I should mention.
> 
> Re-mounting (which is a pain as this filesystem is used for
> $HOMEs of a multitude of active users who I have to kick from
> the server for doing things like re-mounting) removed the symptom
> for now, but from what I can read in linux-btrfs mailing list
> archives, it is pretty likely that the symptom will re-appear.
> 
> Here are some more details:
> 
> Software versions:
>> linux-4.6.1 (vanilla from kernel.org)
...
> 
> dmesg output from the time the "no space left on device"-symptom
> appeared:
> 
>> [5171203.601620] WARNING: CPU: 4 PID: 23208 at fs/btrfs/inode.c:9261 
>> btrfs_destroy_inode+0x263/0x2a0 [btrfs]


> ...
>> [5171230.306037] WARNING: CPU: 18 PID: 12656 at fs/btrfs/extent-tree.c:4233 
>> btrfs_free_reserved_data_space_noquota+0xf3/0x100 [btrfs]


Sounds like the same bug I hit, too.

To fix this you'll need:


crazy@zwerg:~/Work/linux-git$ git show 8b8b08cbf
commit 8b8b08cbfb9021af4b54b4175fc4c51d655aac8c
Author: Chris Mason 
Date:   Tue Jul 19 05:52:36 2016 -0700

Btrfs: fix delalloc accounting after copy_from_user faults

Commit 56244ef151c3cd11 was almost but not quite enough to fix the
reservation math after btrfs_copy_from_user returned partial copies.

Some users are still seeing warnings in btrfs_destroy_inode, and with a
long enough test run I'm able to trigger them as well.

This patch fixes the accounting math again, bringing it much closer to
the way it was before the sectorsize conversion Chandan did.  The
problem is accounting for the offset into the page/sector when we do a
partial copy.  This one just uses the dirty_sectors variable which
should already be updated properly.

Signed-off-by: Chris Mason 
cc: sta...@vger.kernel.org # v4.6+

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f3f61d1..bcfb4a2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1629,13 +1629,11 @@ again:
 		 * managed to copy.
 		 */
 		if (num_sectors > dirty_sectors) {
-			/*
-			 * we round down because we don't want to count
-			 * any partial blocks actually sent through the
-			 * IO machines
-			 */
-			release_bytes = round_down(release_bytes - copied,
-				      root->sectorsize);
+
+			/* release everything except the sectors we dirtied */
+			release_bytes -= dirty_sectors <<
+				root->fs_info->sb->s_blocksize_bits;
+
 			if (copied > 0) {
 				spin_lock(&BTRFS_I(inode)->lock);
 				BTRFS_I(inode)->outstanding_extents++;
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [4.8] btrfs heats my room with lock contention

2016-08-05 Thread Chris Mason



On 08/04/2016 11:01 PM, Dave Chinner wrote:

> On Thu, Aug 04, 2016 at 10:28:44AM -0400, Chris Mason wrote:
>> On 08/04/2016 02:41 AM, Dave Chinner wrote:
>>> Simple test. 8GB pmem device on a 16p machine:
>>>
>>> # mkfs.btrfs /dev/pmem1
>>> # mount /dev/pmem1 /mnt/scratch
>>> # dbench -t 60 -D /mnt/scratch 16
>>>
>>> And heat your room with the warm air rising from your CPUs. Top
>>> half of the btrfs profile looks like:
>>>
>>> .
>>>
>>> Performance vs CPU usage is:
>>>
>>> nprocs   throughput   cpu usage
>>>      1      440MB/s         50%
>>>      2      770MB/s        100%
>>>      4      880MB/s        250%
>>>      8      690MB/s        450%
>>>     16      280MB/s        950%
>>>
>>> In comparison, at 8-16 threads ext4 is running at ~2600MB/s and
>>> XFS is running at ~3800MB/s. Even if I throw 300-400 processes at
>>> ext4 and XFS, they only drop to ~1500-2000MB/s as they hit internal
>>> limits.


>> Yes, with dbench btrfs does much much better if you make a subvol
>> per dbench dir.  The difference is pretty dramatic.  I'm working on
>> it this month, but focusing more on database workloads right now.


> You've been giving this answer to lock contention reports for the
> past 6-7 years, Chris.  I really don't care about getting big
> benchmark numbers with contrived setups - the "use multiple
> subvolumes" solution is simply not practical for users or their
> workloads.  The default config should behave sanely and not
> contribute to global warming like this.



The btree setup that causes the lock contention here makes some other 
benchmarks faster.  Needing to create subvolumes in order to fix 
performance problems on dbench is far from ideal, but in production here 
the tradeoffs have been worth it.


Basically this one definitely comes up during dbench and fs_mark and 
much less often elsewhere.  For the workloads that hit this lock 
contention, splitting things out into subvolumes hugely reduces metadata 
fragmentation on reads.  So it's not just CPU we're helping with 
subvolumes but spindle time too.


It's true I haven't invested time into guessing when the admin wants to 
split on a per-subvolume basis.  Still, I do love the polar bears, so 
I'll take another shot at the btree lock.


-chris


Re: Crash in btrfs_uuid_tree_iterate during mount

2016-08-05 Thread Nikolay Borisov
On Fri, Aug 5, 2016 at 6:12 PM, Chris Mason  wrote:
>
> On 08/05/2016 07:08 AM, Nikolay Borisov wrote:
>> Hello,
>>
>> Any ideas how btrfs_path can end up all zero? The 1 in the first
>> slot comes from the increment in btrfs_next_old_item.
>
> Thanks for all the extra details.  It really must be this:
>
> if (ret > 0) {
> 	btrfs_release_path(path);
> 	ret = btrfs_uuid_iter_rem(root, uuid, key.type,
> 				  subid_cpu);
> 	if (ret == 0) {
> 		/*
> 		 * this might look inefficient, but the
> 		 * justification is that it is an
> 		 * exception that check_func returns 1,
> 		 * and that in the regular case only one
> 		 * entry per UUID exists.
> 		 */
> 		goto again_search_slot;
> 	}
> 	if (ret < 0 && ret != -ENOENT)
> 		goto out;
> }
> item_size -= sizeof(subid_le);
> offset += sizeof(subid_le);
>
>
> We've released the path, which would explain why it's full of NULLs.  ret
> was -ENOENT, so it kept on going, and we fell through to
> btrfs_next_item().
>
> Once the path is released, we should either be searching again or
> exiting.  A goto again_search_slot would probably fix it, but I'd want
> to also bump the key so we don't just process the same item over and
> over again.
>
> Can you reproduce this reliably?  I'd hate to patch it now and make more
> problems later just because we didn't fully understand the items we were
> tripping over.

Well, there are two things I can do:
 a) Dig more into the crash dump to see whether ret has been saved to
    the stack and extract its value. If your theory is correct, I
    should see -ENOENT.
 b) Patch the code to print a warning when btrfs_uuid_iter_rem returns
    -ENOENT; that way we will at least know that this is happening.

Either way, this will take me until at least next week, at which
point I should be able to give more information.

>
> -chris


Re: Input/output error, nothing appended in dmesg

2016-08-05 Thread Diego Calleja
On Saturday, 6 August 2016 0:45:13 (CEST), Tomasz Chmielewski wrote:
> And, miracle cure O_o
> 
> # file ./2016-08-02/serverX/syslog.log
> ERROR: cannot read `./2016-08-02/serverX/syslog.log' (Input/output error)
> 
> # echo 3 > /proc/sys/vm/drop_caches
> 
> # file 2016-08-02/serverX/syslog.log
> 2016-08-02/serverX/syslog.log: ASCII text, with very long lines

FWIW, bugs similar to this one were reported in the past:

http://www.spinics.net/lists/linux-btrfs/msg54962.html
http://www.spinics.net/lists/linux-btrfs/msg52371.html


Re: Input/output error, nothing appended in dmesg

2016-08-05 Thread Tomasz Chmielewski

On 2016-08-06 00:45, Tomasz Chmielewski wrote:


And, miracle cure O_o

# file ./2016-08-02/serverX/syslog.log
ERROR: cannot read `./2016-08-02/serverX/syslog.log' (Input/output error)


# echo 3 > /proc/sys/vm/drop_caches

# file 2016-08-02/serverX/syslog.log
2016-08-02/serverX/syslog.log: ASCII text, with very long lines

# cat 2016-08-02/serverX/syslog.log
(...)


A few minutes after the previous "echo 3 > /proc/sys/vm/drop_caches" (this
file is around 1.5 MB and hasn't been touched since 2016-06-21):


# file ./2016-06-21/serverY/nginx-dashboard-error.log
./2016-06-21/serverY/nginx-dashboard-error.log: ERROR: cannot read `./2016-06-21/serverY/nginx-dashboard-error.log' (Input/output error)


# echo 3 > /proc/sys/vm/drop_caches

# file ./2016-06-21/serverY/nginx-dashboard-error.log
./2016-06-21/serverY/nginx-dashboard-error.log: ASCII text, with very long lines


# cat ./2016-06-21/serverY/nginx-dashboard-error.log
(...works OK, no corruption...)


Tomasz Chmielewski
https://lxadm.com


Re: Input/output error, nothing appended in dmesg

2016-08-05 Thread Chris Mason


On 08/05/2016 11:45 AM, Tomasz Chmielewski wrote:
> On 2016-08-06 00:38, Tomasz Chmielewski wrote:
> 
>>> Too big for the known problem though.  Still, can you btrfs-debug-tree
>>> and just make sure it doesn't have inline items?
>>
>> Hmmm
>>
>> # btrfs-debug-tree /dev/xvdb > /root/debug.tree
>> parent transid verify failed on 355229302784 wanted 49943295 found 49943301
>> parent transid verify failed on 355229302784 wanted 49943295 found 49943301
>> Ignoring transid failure
>> parent transid verify failed on 355233251328 wanted 49943299 found 49943303
>> parent transid verify failed on 355233251328 wanted 49943299 found 49943303
>> Ignoring transid failure
>> print-tree.c:1105: btrfs_print_tree: Assertion failed.
>> btrfs-debug-tree[0x418d99]
>> btrfs-debug-tree(btrfs_print_tree+0x26a)[0x41acf6]
>> btrfs-debug-tree(main+0x9a5)[0x432589]
>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f2369de0f45]
>> btrfs-debug-tree[0x4070e9]
> 
> And, miracle cure O_o
> 
> # file ./2016-08-02/serverX/syslog.log
> ERROR: cannot read `./2016-08-02/serverX/syslog.log' (Input/output error)
> 
> # echo 3 > /proc/sys/vm/drop_caches
> 
> # file 2016-08-02/serverX/syslog.log
> 2016-08-02/serverX/syslog.log: ASCII text, with very long lines
> 
> # cat 2016-08-02/serverX/syslog.log
> (...)
> 

If you don't already have this commit, please give it a try. It should fix
things up.

commit 8dff9c85341032767d7b519217a79ea04cd676b0
Author: Chris Mason 
Date:   Sat Sep 19 11:28:25 2015 -0700

Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent



Re: Input/output error, nothing appended in dmesg

2016-08-05 Thread Tomasz Chmielewski

On 2016-08-06 00:40, Chris Mason wrote:

Too big for the known problem though.  Still, can you btrfs-debug-tree
and just make sure it doesn't have inline items?


Hmmm

# btrfs-debug-tree /dev/xvdb > /root/debug.tree
parent transid verify failed on 355229302784 wanted 49943295 found 49943301
parent transid verify failed on 355229302784 wanted 49943295 found 49943301
Ignoring transid failure
parent transid verify failed on 355233251328 wanted 49943299 found 49943303
parent transid verify failed on 355233251328 wanted 49943299 found 49943303
Ignoring transid failure
print-tree.c:1105: btrfs_print_tree: Assertion failed.
btrfs-debug-tree[0x418d99]
btrfs-debug-tree(btrfs_print_tree+0x26a)[0x41acf6]
btrfs-debug-tree(main+0x9a5)[0x432589]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f2369de0f45]
btrfs-debug-tree[0x4070e9]


Looks like the FS is mounted?


It is mounted, yes. Does btrfs-debug-tree need an unmounted FS?

Unfortunately, I'm not able to unmount it (in the sense that the system
has to keep working).



Tomasz Chmielewski
https://lxadm.com


Re: Input/output error, nothing appended in dmesg

2016-08-05 Thread Tomasz Chmielewski

On 2016-08-06 00:38, Tomasz Chmielewski wrote:


Too big for the known problem though.  Still, can you btrfs-debug-tree
and just make sure it doesn't have inline items?


Hmmm

# btrfs-debug-tree /dev/xvdb > /root/debug.tree
parent transid verify failed on 355229302784 wanted 49943295 found 49943301
parent transid verify failed on 355229302784 wanted 49943295 found 49943301
Ignoring transid failure
parent transid verify failed on 355233251328 wanted 49943299 found 49943303
parent transid verify failed on 355233251328 wanted 49943299 found 49943303
Ignoring transid failure
print-tree.c:1105: btrfs_print_tree: Assertion failed.
btrfs-debug-tree[0x418d99]
btrfs-debug-tree(btrfs_print_tree+0x26a)[0x41acf6]
btrfs-debug-tree(main+0x9a5)[0x432589]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f2369de0f45]
btrfs-debug-tree[0x4070e9]


And, miracle cure O_o

# file ./2016-08-02/serverX/syslog.log
ERROR: cannot read `./2016-08-02/serverX/syslog.log' (Input/output error)


# echo 3 > /proc/sys/vm/drop_caches

# file 2016-08-02/serverX/syslog.log
2016-08-02/serverX/syslog.log: ASCII text, with very long lines

# cat 2016-08-02/serverX/syslog.log
(...)


Tomasz Chmielewski
https://lxadm.com


Re: Input/output error, nothing appended in dmesg

2016-08-05 Thread Chris Mason



On 08/05/2016 11:38 AM, Tomasz Chmielewski wrote:

On 2016-08-06 00:15, Chris Mason wrote:


# cat 2016-08-02/serverX/syslog.log
cat: 2016-08-02/serverX/syslog.log: Input/output error


How big is the file?  We had one bug with inline files that might have
caused this.


This one's tiny, 158137 bytes.


Too big for the known problem though.  Still, can you btrfs-debug-tree
and just make sure it doesn't have inline items?


Hmmm

# btrfs-debug-tree /dev/xvdb > /root/debug.tree
parent transid verify failed on 355229302784 wanted 49943295 found 49943301
parent transid verify failed on 355229302784 wanted 49943295 found 49943301
Ignoring transid failure
parent transid verify failed on 355233251328 wanted 49943299 found 49943303
parent transid verify failed on 355233251328 wanted 49943299 found 49943303
Ignoring transid failure
print-tree.c:1105: btrfs_print_tree: Assertion failed.
btrfs-debug-tree[0x418d99]
btrfs-debug-tree(btrfs_print_tree+0x26a)[0x41acf6]
btrfs-debug-tree(main+0x9a5)[0x432589]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f2369de0f45]
btrfs-debug-tree[0x4070e9]


Looks like the FS is mounted?

-chris



Re: Input/output error, nothing appended in dmesg

2016-08-05 Thread Tomasz Chmielewski

On 2016-08-06 00:15, Chris Mason wrote:


# cat 2016-08-02/serverX/syslog.log
cat: 2016-08-02/serverX/syslog.log: Input/output error


How big is the file?  We had one bug with inline files that might have
caused this.


This one's tiny, 158137 bytes.


Too big for the known problem though.  Still, can you btrfs-debug-tree
and just make sure it doesn't have inline items?


Hmmm

# btrfs-debug-tree /dev/xvdb > /root/debug.tree
parent transid verify failed on 355229302784 wanted 49943295 found 49943301
parent transid verify failed on 355229302784 wanted 49943295 found 49943301
Ignoring transid failure
parent transid verify failed on 355233251328 wanted 49943299 found 49943303
parent transid verify failed on 355233251328 wanted 49943299 found 49943303
Ignoring transid failure
print-tree.c:1105: btrfs_print_tree: Assertion failed.
btrfs-debug-tree[0x418d99]
btrfs-debug-tree(btrfs_print_tree+0x26a)[0x41acf6]
btrfs-debug-tree(main+0x9a5)[0x432589]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f2369de0f45]
btrfs-debug-tree[0x4070e9]



Tomasz Chmielewski
https://lxadm.com


Re: Input/output error, nothing appended in dmesg

2016-08-05 Thread Chris Mason



On 08/05/2016 10:44 AM, Tomasz Chmielewski wrote:

On 2016-08-05 23:26, Chris Mason wrote:

On 08/05/2016 07:42 AM, Tomasz Chmielewski wrote:

I'm getting occasional (every few weeks) input/output errors on a btrfs
filesystem with compress-force=zlib, running on Amazon EC2, with 4.5.2
kernel:

# cat 2016-08-02/serverX/syslog.log
cat: 2016-08-02/serverX/syslog.log: Input/output error


How big is the file?  We had one bug with inline files that might have
caused this.


This one's tiny, 158137 bytes.


Too big for the known problem though.  Still, can you btrfs-debug-tree 
and just make sure it doesn't have inline items?


-chris


Re: Crash in btrfs_uuid_tree_iterate during mount

2016-08-05 Thread Chris Mason

On 08/05/2016 07:08 AM, Nikolay Borisov wrote:
> Hello,
>
> Recently I started getting the following crashes on some servers,
> running btrfs:
>
> [340435.480338] BTRFS info (device loop7): disk space caching is enabled
> [340435.480509] BTRFS: has skinny extents
> [340441.716174] BTRFS: checking UUID tree
> [340441.912070] BUG: unable to handle kernel NULL pointer dereference at 0098
> [340441.912463] IP: [] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
> [340441.912823] PGD 0
> [340441.913035] Oops:  [#1] SMP
> [340441.913302] Modules linked in:
> [340441.916996] CPU: 10 PID: 24990 Comm: btrfs-uuid Tainted: PW  O 4.4.14-clouder1 #55
> [340441.917287] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
> [340441.917573] task: 8801b95c1b80 ti: 88034e504000 task.ti: 88034e504000
> [340441.917859] RIP: 0010:[]  [] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
> [340441.918212] RSP: 0018:88034e507e20  EFLAGS: 00010246
> [340441.918382] RAX:  RBX: 1600 RCX: 8800
> [340441.918665] RDX: 0001 RSI: 8801e3abd140 RDI: 88046f027f00
> [340441.918952] RBP: 88034e507ea8 R08: 60fb80001760 R09: a07ac1de
> [340441.919236] R10: e8d41760 R11: ea00078eaf40 R12: 8801b98ab750
> [340441.919521] R13: fffe R14: 8801e3abd140 R15: 880049586000
> [340441.919810] FS:  () GS:88047fd4() knlGS:
> [340441.920097] CS:  0010 DS:  ES:  CR0: 80050033
> [340441.920267] CR2: 0098 CR3: 01c0a000 CR4: 000406e0
> [340441.920554] Stack:
> [340441.920717]  880049586000 8801b98ab750 3f7b00014fc0 8803711dec08
> [340441.921186]  a07d0c40 880332342000 0114 1b7088046d7612f8
> [340441.921655]  8cfb42689378e508 70157e0ade97f5d6 8c42689378e5081b 15157e0ade97f5d6
> [340441.922126] Call Trace:
> [340441.922315]  [] ? find_live_mirror.isra.18+0xc0/0xc0 [btrfs]
> [340441.922614]  [] ? btrfs_uuid_scan_kthread+0x3c0/0x3c0 [btrfs]
> [340441.922917]  [] btrfs_uuid_rescan_kthread+0x1b/0x60 [btrfs]
> [340441.923197]  [] kthread+0xef/0x110
> [340441.923363]  [] ? kthread_park+0x60/0x60
> [340441.923531]  [] ret_from_fork+0x3f/0x70
> [340441.923697]  [] ? kthread_park+0x60/0x60
> [340441.923863] Code: 0f 86 a0 00 00 00 48 bb 00 00 00 00 00 16 00 00 41 8b 44 24 40 48 b9 00 00 00 00 00 88 ff ff 8d 50 01 49 8b 04 24 41 89 54 24 40 <48> 03 98 98 00 00 00 48 89 d8 48 c1 f8 06 48 c1 e0 0c 3b 54 08
> [340441.927296] RIP  [] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
> [340441.927641]  RSP
> [340441.927806] CR2: 0098
>
>
> a081f774 is in the heavily inlined btrfs_next_item. Here
> is the decoded instructions, right before the crash with annotations:
>
>    0:  0f 86 a0 00 00 00       jbe    0xa6
>    6:  48 bb 00 00 00 00 00    mov    $0x1600,%rbx
>    d:  16 00 00
>   10:  41 8b 44 24 40          mov    0x40(%r12),%eax   ; r12 is btrfs_path, eax points to first slot
>   15:  48 b9 00 00 00 00 00    mov    $0x8800,%rcx
>   1c:  88 ff ff
>   1f:  8d 50 01                lea    0x1(%rax),%edx    ; incr slot
>   22:  49 8b 04 24             mov    (%r12),%rax       ; load first extent_buffer in rax
>   26:  41 89 54 24 40          mov    %edx,0x40(%r12)   ; save incremented slot
>   2b:* 48 03 98 98 00 00 00    add    0x98(%rax),%rbx   <-- trapping instruction ; load the first page from the extent_buffer
>   32:  48 89 d8                mov    %rbx,%rax
>   35:  48 c1 f8 06             sar    $0x6,%rax
>   39:  48 c1 e0 0c             shl    $0xc,%rax
>   3d:  3b                      .byte 0x3b
>   3e:  54                      push   %rsp
>   3f:  08                      .byte 0x8
>
> So, as can be seen, rax (path->nodes[0]) is zero, and naturally
> dereferencing it faults. What's interesting is the content of the btrfs_path:
>
> struct btrfs_path {
>   nodes = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
>   slots = {1, 0, 0, 0, 0, 0, 0, 0},
>   locks = {0, 0, 0, 0, 0, 0, 0, 0},
>   reada = 0,
>   lowest_level = 0,
>   search_for_split = 0,
>   keep_locks = 0,
>   skip_locking = 0,
>   leave_spinning = 0,
>   search_commit_root = 0,
>   need_commit_sem = 0,
>   skip_release_on_error = 0
> }
>
> Any ideas how btrfs_path can end up all zero? The 1 in the first
> slot comes from the increment in btrfs_next_old_item.

Thanks for all the extra details.  It really must be this:

if (ret > 0) { 
 btrfs_release_path(path); 
 ret = btrfs_uuid_iter_rem(root, uuid, key.type,
   subid_cpu); 
 if (ret == 0) { 
 /* 
  * this might look inefficient, but the
 

Re: Input/output error, nothing appended in dmesg

2016-08-05 Thread Tomasz Chmielewski

On 2016-08-05 23:26, Chris Mason wrote:

On 08/05/2016 07:42 AM, Tomasz Chmielewski wrote:
I'm getting occasional (every few weeks) input/output errors on a btrfs
filesystem with compress-force=zlib, running on Amazon EC2, with 4.5.2
kernel:

# cat 2016-08-02/serverX/syslog.log
cat: 2016-08-02/serverX/syslog.log: Input/output error


How big is the file?  We had one bug with inline files that might have
caused this.


This one's tiny, 158137 bytes.



Tomasz Chmielewski
https://lxadm.com


Re: Input/output error, nothing appended in dmesg

2016-08-05 Thread Chris Mason



On 08/05/2016 07:42 AM, Tomasz Chmielewski wrote:

I'm getting occasional (every few weeks) input/output errors on a btrfs
filesystem with compress-force=zlib, running on Amazon EC2, with 4.5.2
kernel:

# cat 2016-08-02/serverX/syslog.log
cat: 2016-08-02/serverX/syslog.log: Input/output error


How big is the file?  We had one bug with inline files that might have 
caused this.


-chris


Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem

2016-08-05 Thread Lutz Vieweg

On 08/05/2016 02:12 PM, Austin S. Hemmelgarn wrote:

> If you stick to single disk

We do, all our btrfs filesystems reside on one single block device,
redundancy is provided by a DRBD layer below.

> don't use quota groups

We don't use any quotas.

> stick to reasonably sized filesystems (not more than a few TB)

We do, currently 4 TB max, because that's the only way to utilize
different physical storage devices for different filesystem instances
such that we can back them up in parallel within a reasonable time.

> and avoid a couple of specific unconventional storage configurations below it

Configurations like what?

> The whole issue with
> databases is often a non-issue for desktop users in my experience

Well, try a "cat" on a sqlite file that has been used by some ordinary
desktop software (like a browser) for a year - and you'll experience
horrible performance due to the extreme fragmentation.

(Having to manually "de-fragment" a filesystem periodically is something
that I had considered a thing of the past when I started using BSD's hfs
instead of the Amiga FFS in the late 1980s... ;-)

> and if you think VM image
> performance is bad, you should really be looking at using real block storage instead of a file
> (seriously, this will usually get you a bigger performance boost than using ext4 or XFS
> over BTRFS as an underlying filesystem will).

Sure, assigning block devices to each VM would be even better, but
also much less convenient for operations. It's a feature here that any
user can start a new VM instance (without root privileges) at any
time, and that the images used by those VMs are part of the incremental
backup that stores only differences, not "whole files that have been changed".

>> We sure do - actually, the possibility to "run daily backups from a
>> snapshot while write performance remains acceptable" is the one and
>> only reason for me to use btrfs rather than xfs for those $HOME dirs.
>> In every other aspect (stability, performance, suitability for
>> storing VM-images or database-files) xfs wins for me.
>> And the btrfs advantage "file system based snapshot being more
>> performant than block device based snapshot" may fade away
>> with the replacement of magnetic disks with SSDs in the long run.
> I'm going to respond to the two parts of this separately:
> 1. As far as snapshot performance, you'd be surprised. I've got pretty good consumer grade SSD's
> that can do a sustained 250MB/s write speed, which means that to be as fast as a snapshot,
> the data set would have to be less than 25MB

No, I'm talking about LVM snapshots, which utilize Copy-On-Write
on the block device level. Creating such an LVM snapshot is
as quick as creating a btrfs snapshot, regardless of the size.
The only significant draw-back of the LVM snapshot is that whenever
data is written to the filesystem, that causes copy operations from
one part of the (currently magnetic) storage to another part, and
that seriously hurts the write performance.

(Of course, it would not be a reasonable option to take a block device
snapshot by first copying all the data on it.)

> 2. As far as snapshots being the only advantage of BTRFS, that's just bogus.
> XFS does have metadata checksumming now, but that provides no protection for
> data, just metadata.

We check for bit-rot at the block device level: once a week, DRBD verifies
the integrity of the data by reading from both redundant storage devices
and comparing the checksums.

So far, we have never encountered a single bit-rot error, even though the
underlying physical storage devices are "cheap SATA disks".

> XFS also doesn't have transparent compression support

I have no use for that. Disk space is relatively cheap, cheap enough
that we don't bother with RAID-5 or such, but use the "full redundancy"
provided by a shared-nothing DRBD setup.

> filesystems can't be shrunk

I enlarged XFS filesystems multiple times while in use, which worked well.
I have never had to shrink a filesystem, and I cannot imagine a use case
in which I would need to.

> and it stores no backups of any metadata except super-blocks.

Which is fine with me, as redundancy is provided on the block device level
by DRBD.

> While the compression and filesystem shrinking may not be needed in
> your use case, the data integrity features are almost certainly an advantage.

Btrfs sure has some nifty features, and I understand that for some
users, things like "subvolumes" or "deduplication" are important.

But a hundred great features cannot make up for a lack of stability,
so I would rather see those ENOSPC-related issues resolved
than more fancy features built :-)

Regards,

Lutz Vieweg



Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem

2016-08-05 Thread Austin S. Hemmelgarn

On 2016-08-05 06:56, Lutz Vieweg wrote:

> On 08/04/2016 10:30 PM, Chris Murphy wrote:
>> Keep in mind the list is rather self-selecting for problems. People
>> who aren't having problems are unlikely to post their non-problems to
>> the list.
>
> True, but the number of people inclined to post a bug report to
> the list is also a lot smaller than the number of people who
> experienced problems.
>
> Personally, I know at least 2 Linux users who happened to
> get a btrfs filesystem as part of upgrading to a newer Suse
> distribution on their PC, and both of them experienced
> trouble with their filesystems that caused them to re-install
> without using btrfs. They weren't interested enough in which
> filesystem they use to bother investigating what happened
> in detail or to issue bug-reports.
>
> I'm afraid that btrfs' reputation has already taken damage
> from the combination of "early deployment as a root filesystem
> to unsuspecting users" and "being at a development stage where
> users are likely to experience trouble at some time".

FWIW, the 'early deployment' thing is an issue of the distributions 
themselves, and most people who have come to me personally complaining 
about BTRFS have understood this after I explained it to them.


As far as the rest, it's hit or miss whether you have issues.  I've been 
using BTRFS on all my personal systems since about 3.14, and have had 
zero issues with data loss or filesystem corruption (or horrible 
performance) since about 3.18 that were actually BTRFS issues (it's 
helped me ID a lot of marginal hardware though), and in fact, I had more 
issues trying to use ZFS for a year than I've had in the now multiple 
years of using BTRFS, and in the case of BTRFS, I was actually able to 
fix things.  I know quite a few people (and a number of big companies 
for that matter) who have been running BTRFS for longer and had fewer 
issues too.  The biggest issue is that the risks involved aren't well 
characterized, although most filesystems have that same issue.


If you stick to single disk or raid1 mode, don't use quota groups (which 
at least SUSE does by default now), stick to reasonably sized 
filesystems (not more than a few TB), and avoid a couple of specific 
unconventional storage configurations below it, BTRFS works fine.  The 
whole issue with databases is often a non-issue for desktop users in my 
experience, and if you think VM image performance is bad, you should 
really be looking at using real block storage instead of a file 
(seriously, this will usually get you a bigger performance boost than 
using ext4 or XFS over BTRFS as an underlying filesystem will).



>> c. Take some risk and use 4.8 rc1 once it's out. Just make sure to
>> keep backups.
>
> We sure do - actually, the possibility to "run daily backups from a
> snapshot while write performance remains acceptable" is the one and
> only reason for me to use btrfs rather than xfs for those $HOME dirs.
> In every other aspect (stability, performance, suitability for
> storing VM-images or database-files) xfs wins for me.
> And the btrfs advantage "file system based snapshot being more
> performant than block device based snapshot" may fade away
> with the replacement of magnetic disks with SSDs in the long run.

I'm going to respond to the two parts of this separately:
1. As far as snapshot performance goes, you'd be surprised. I've got
pretty good consumer-grade SSDs that can do a sustained 250MB/s write
speed, which means that to be as fast as a snapshot, the data set would
have to be less than 25MB (and that's being generous; snapshots usually
take less than 0.1s to create on my system).  Where the turnover point
occurs varies of course based on storage bandwidth, but I don't see it
being very likely that SSDs will obsolete snapshotting any time soon.
Even if disks suddenly gained the ability to run at the full bandwidth
of the link they're on, a SAS3 disk (12Gbit/s signaling, practical
bandwidth of about 1GB/s) would have a turnover point of about 100MB,
and an NVMe device on a PCIe 4.0 x16 link (31.51GB/s theoretical
bandwidth) would have a turnover point of about 3.1GB.  In theory, a
high-end NVDIMM might be able to do better than a snapshot, but it
probably couldn't get much faster right now than twice the speed of a
PCIe 4.0 x16 link, which means that it would likely have a turnover
point of about 6.3GB.  In comparison, it's not unusual to need a
snapshot of a data set in excess of a terabyte in size.
2. As far as snapshots being the only advantage of BTRFS, that's just 
bogus. XFS does have metadata checksumming now, but that provides no 
protection for data, just metadata.  XFS also doesn't have transparent 
compression support, XFS filesystems can't be shrunk, and XFS stores no 
backups of any metadata except superblocks.  While compression and 
filesystem shrinking may not be needed in your use case, the data 
integrity features are almost certainly an advantage.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body

Input/output error, nothing appended in dmesg

2016-08-05 Thread Tomasz Chmielewski
I'm getting occasional (every few weeks) input/output errors on a btrfs 
filesystem with compress-force=zlib, running on Amazon EC2, with 4.5.2 
kernel:


# cat 2016-08-02/serverX/syslog.log
cat: 2016-08-02/serverX/syslog.log: Input/output error


Strangely, nothing gets appended in dmesg:

# dmesg -c
#


The filesystem stores mostly remote syslog files (so, all text files, 
appended to).


Expected?



# btrfs fi show /var/log/remote/
Label: none  uuid: 5cec93a8-7894-41f6-94a4-9d9b58216dd4
        Total devices 1 FS bytes used 146.55GiB
        devid    1 size 200.00GiB used 153.01GiB path /dev/xvdb


# btrfs fi df /var/log/remote/
Data, single: total=149.00GiB, used=144.50GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=4.01GiB, used=2.05GiB
GlobalReserve, single: total=512.00MiB, used=0.00B



Tomasz Chmielewski
https://lxadm.com



Re: How to stress test raid6 on 122 disk array

2016-08-05 Thread Austin S. Hemmelgarn

On 2016-08-04 17:12, Chris Murphy wrote:

> On Thu, Aug 4, 2016 at 2:51 PM, Martin  wrote:
>
>> Thanks for the benchmark tools and tips on where the issues might be.
>>
>> Is Fedora 24 rawhide preferred over ArchLinux?
>
>
> I'm not sure what Arch does any differently to their kernels from
> kernel.org kernels. But bugzilla.kernel.org offers a Mainline and
> Fedora drop down for identifying the kernel source tree.
IIRC, they're pretty close to mainline kernels.  I don't think they have 
any patches in the filesystem or block layer code at least, but I may be 
wrong, it's been a long time since I looked at an Arch kernel.




>> If I want to compile a mainline kernel, is there anything I need to tune?


> Fedora kernels do not have these options set.
>
> # CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
> # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
> # CONFIG_BTRFS_DEBUG is not set
> # CONFIG_BTRFS_ASSERT is not set
>
> The sanity and integrity tests are both compile-time and mount-time
> options, i.e. they have to be enabled at compile time for the mount
> option to do anything. I can't recall any thread where a developer asked
> a user to set any of these options for testing though.
FWIW, I actually have the integrity checking code built in on most 
kernels I build.  I don't often use it, but it has near zero overhead 
when not enabled, and it's helped me track down lower-level storage 
configuration issues on occasion.




>> When I do the tests, how do I log the info you would like to see, if I
>> find a bug?


> bugzilla.kernel.org for tracking, and then reference the URL for the
> bug with a summary in an email to the list is how I usually do it. The
> main thing is going to be the exact reproduction steps. It's also better,
> I think, to have complete dmesg (or journalctl -k) attached to the bug
> report because not all problems are directly related to Btrfs; they
> can have contributing factors elsewhere. And various MTAs, or more
> commonly MUAs, have a tendency to wrap such wide text as found in
> kernel or journald messages.

Aside from kernel messages, the other general stuff you want to have is:
1. Kernel version and userspace tools version (`uname -a` and `btrfs 
--version`)
2. Any underlying storage configuration if it's not just a plain SSD/HDD 
or partitions (for example, usage of dm-crypt, LVM, mdadm, and similar 
things).
3. Output from `btrfs filesystem show` (this can be trimmed to the 
filesystem that's having the issue).
4. If you can still mount the filesystem, `btrfs filesystem df` output 
can be helpful.
5. If you can't mount the filesystem, output from `btrfs check` run 
without any options will usually be asked for.




Crash in btrfs_uuid_tree_iterate during mount

2016-08-05 Thread Nikolay Borisov
Hello, 

Recently I started getting the following crashes on some servers, 
running btrfs: 

[340435.480338] BTRFS info (device loop7): disk space caching is enabled
[340435.480509] BTRFS: has skinny extents
[340441.716174] BTRFS: checking UUID tree
[340441.912070] BUG: unable to handle kernel NULL pointer dereference at 0098
[340441.912463] IP: [] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
[340441.912823] PGD 0
[340441.913035] Oops:  [#1] SMP
[340441.913302] Modules linked in:
[340441.916996] CPU: 10 PID: 24990 Comm: btrfs-uuid Tainted: PW  O 4.4.14-clouder1 #55
[340441.917287] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
[340441.917573] task: 8801b95c1b80 ti: 88034e504000 task.ti: 88034e504000
[340441.917859] RIP: 0010:[]  [] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
[340441.918212] RSP: 0018:88034e507e20  EFLAGS: 00010246
[340441.918382] RAX:  RBX: 1600 RCX: 8800
[340441.918665] RDX: 0001 RSI: 8801e3abd140 RDI: 88046f027f00
[340441.918952] RBP: 88034e507ea8 R08: 60fb80001760 R09: a07ac1de
[340441.919236] R10: e8d41760 R11: ea00078eaf40 R12: 8801b98ab750
[340441.919521] R13: fffe R14: 8801e3abd140 R15: 880049586000
[340441.919810] FS:  () GS:88047fd4() knlGS:
[340441.920097] CS:  0010 DS:  ES:  CR0: 80050033
[340441.920267] CR2: 0098 CR3: 01c0a000 CR4: 000406e0
[340441.920554] Stack:
[340441.920717]  880049586000 8801b98ab750 3f7b00014fc0 8803711dec08
[340441.921186]  a07d0c40 880332342000 0114 1b7088046d7612f8
[340441.921655]  8cfb42689378e508 70157e0ade97f5d6 8c42689378e5081b 15157e0ade97f5d6
[340441.922126] Call Trace:
[340441.922315]  [] ? find_live_mirror.isra.18+0xc0/0xc0 [btrfs]
[340441.922614]  [] ? btrfs_uuid_scan_kthread+0x3c0/0x3c0 [btrfs]
[340441.922917]  [] btrfs_uuid_rescan_kthread+0x1b/0x60 [btrfs]
[340441.923197]  [] kthread+0xef/0x110
[340441.923363]  [] ? kthread_park+0x60/0x60
[340441.923531]  [] ret_from_fork+0x3f/0x70
[340441.923697]  [] ? kthread_park+0x60/0x60
[340441.923863] Code: 0f 86 a0 00 00 00 48 bb 00 00 00 00 00 16 00 00 41 8b 44 24 40 48 b9 00 00 00 00 00 88 ff ff 8d 50 01 49 8b 04 24 41 89 54 24 40 <48> 03 98 98 00 00 00 48 89 d8 48 c1 f8 06 48 c1 e0 0c 3b 54 08
[340441.927296] RIP  [] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
[340441.927641]  RSP 
[340441.927806] CR2: 0098


a081f774 is in the heavily inlined btrfs_next_item. Here are
the decoded instructions right before the crash, with annotations:

   0:   0f 86 a0 00 00 00       jbe    0xa6
   6:   48 bb 00 00 00 00 00    mov    $0x1600,%rbx
   d:   16 00 00
  10:   41 8b 44 24 40          mov    0x40(%r12),%eax ; r12 is btrfs_path, eax points to first slot
  15:   48 b9 00 00 00 00 00    mov    $0x8800,%rcx
  1c:   88 ff ff
  1f:   8d 50 01                lea    0x1(%rax),%edx ; incr slot
  22:   49 8b 04 24             mov    (%r12),%rax ; load first extent_buffer in rax
  26:   41 89 54 24 40          mov    %edx,0x40(%r12) ; save incremented slot
  2b:*  48 03 98 98 00 00 00    add    0x98(%rax),%rbx <-- trapping instruction ; load the first page from the extent_buffer
  32:   48 89 d8                mov    %rbx,%rax
  35:   48 c1 f8 06             sar    $0x6,%rax
  39:   48 c1 e0 0c             shl    $0xc,%rax
  3d:   3b                      .byte 0x3b
  3e:   54                      push   %rsp
  3f:   08                      .byte 0x8

So, as can be seen, rax is zero, and naturally dereferencing it (at
offset 0x98, matching CR2) faults. What's interesting is the content of the btrfs_path:

struct btrfs_path {
  nodes = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  slots = {1, 0, 0, 0, 0, 0, 0, 0}, 
  locks = {0, 0, 0, 0, 0, 0, 0, 0}, 
  reada = 0, 
  lowest_level = 0, 
  search_for_split = 0, 
  keep_locks = 0, 
  skip_locking = 0, 
  leave_spinning = 0, 
  search_commit_root = 0, 
  need_commit_sem = 0, 
  skip_release_on_error = 0
}

Any ideas how btrfs_path can end up all zero? The 1 in the first
slot comes from the increment in btrfs_next_old_item.

Regards, 
Nikolay 




Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem

2016-08-05 Thread Lutz Vieweg

On 08/04/2016 10:30 PM, Chris Murphy wrote:

> Keep in mind the list is rather self-selecting for problems. People
> who aren't having problems are unlikely to post their non-problems to
> the list.


True, but the number of people inclined to post a bug report to
the list is also a lot smaller than the number of people who
experienced problems.

Personally, I know at least 2 Linux users who happened to
get a btrfs filesystem as part of upgrading to a newer Suse
distribution on their PC, and both of them experienced
trouble with their filesystems that caused them to re-install
without using btrfs. They weren't interested enough in what
filesystem they use to bother investigating what happened
in detail or to file bug reports.

I'm afraid that btrfs' reputation has already taken damage
from the combination of "early deployment as a root filesystem
to unsuspecting users" and "being at a development stage where
users are likely to experience trouble at some time".


> c. Take some risk and use 4.8 rc1 once it's out. Just make sure to
> keep backups.


We sure do - actually, the possibility to "run daily backups from a
snapshot while write performance remains acceptable" is the one and
only reason for me to use btrfs rather than xfs for those $HOME dirs.
In every other aspect (stability, performance, suitability for
storing VM-images or database-files) xfs wins for me.
And the btrfs advantage "file system based snapshot being more
performant than block device based snapshot" may fade away
with the replacement of magnetic disks with SSDs in the long run.


Regards,

Lutz Vieweg



Re: [PATCH] Btrfs: check btree node's nritems

2016-08-05 Thread Holger Hoffstätte
On 08/05/16 11:24, Holger Hoffstätte wrote:
> On Wed, 03 Aug 2016 12:57:28 -0700, Liu Bo wrote:
> 
>> When a btree node (level = 1) has nritems equal to zero,
>> we can end up with a panic due to insert_ptr()'s
>>
>> BUG_ON(slot > nritems);
>>
>> where slot is 1 and nritems is 0, as copy_for_split() calls
>> insert_ptr(.., path->slots[1] + 1, ...);
>>
>> An invalid value results in the whole mess; this adds a check
>> of the btree node's nritems so that we stop reading the block
>> when something is wrong.
>>
>> Signed-off-by: Liu Bo 
>> ---
>>  fs/btrfs/disk-io.c | 17 +
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 37d1780..a5a22be 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -612,6 +612,20 @@ static noinline int check_leaf(struct btrfs_root *root,
>>  return 0;
>>  }
>>  
>> +static noinline int check_node(struct btrfs_root *root,
>> +   struct extent_buffer *node)
>> +{
>> +unsigned long nr = btrfs_header_nritems(node);
>> +
>> +if (nr <= 0 || nr >= BTRFS_NODEPTRS_PER_BLOCK(root)) {
>> +btrfs_crit(root->fs_info,
>> +   "corrupt node: block %llu root %llu nritems %lu\n",
> 
> I think the trailing \n can be dropped here, btrfs_crit() already provides
> a proper newline.

On top of that I get a whole bunch of false positives with this patch.
Files that are perfectly readable without it now error out, in which
case the logged nritems is always 493 - regardless of file or containing
subvolume. Something is fishy here.

-h



Re: How to stress test raid6 on 122 disk array

2016-08-05 Thread Erkki Seppala
Martin  writes:

> The smallest disk of the 122 is 500GB. Is it possible to have btrfs
> see each disk as only e.g. 10GB? That way I can corrupt and resilver
> more disks over a month.

Well, at least you can easily partition the devices for that to happen.

However, I would also suggest: wouldn't it be a more useful use of the
resource to run many arrays in parallel? I.e. one 6-device raid6, one
20-device raid6, and then perhaps use the rest of the devices for a very
large btrfs filesystem? Or, if you have been partitioning, the large
btrfs volume can also be composed of all 122 devices; in fact, you
could even run multiple 122-device raid6s and use a different kind of
testing on each. For performance testing you might want to exercise only
one of the filesystems at a time, though.

-- 
  _
 / __// /__   __   http://www.modeemi.fi/~flux/\   \
/ /_ / // // /\ \/ /\  /
   /_/  /_/ \___/ /_/\_\@modeemi.fi  \/



Re: [PATCH] Btrfs: check btree node's nritems

2016-08-05 Thread Holger Hoffstätte
On Wed, 03 Aug 2016 12:57:28 -0700, Liu Bo wrote:

> When a btree node (level = 1) has nritems equal to zero,
> we can end up with a panic due to insert_ptr()'s
> 
> BUG_ON(slot > nritems);
> 
> where slot is 1 and nritems is 0, as copy_for_split() calls
> insert_ptr(.., path->slots[1] + 1, ...);
> 
> An invalid value results in the whole mess; this adds a check
> of the btree node's nritems so that we stop reading the block
> when something is wrong.
> 
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/disk-io.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 37d1780..a5a22be 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -612,6 +612,20 @@ static noinline int check_leaf(struct btrfs_root *root,
>   return 0;
>  }
>  
> +static noinline int check_node(struct btrfs_root *root,
> +struct extent_buffer *node)
> +{
> + unsigned long nr = btrfs_header_nritems(node);
> +
> + if (nr <= 0 || nr >= BTRFS_NODEPTRS_PER_BLOCK(root)) {
> + btrfs_crit(root->fs_info,
> +"corrupt node: block %llu root %llu nritems %lu\n",

I think the trailing \n can be dropped here, btrfs_crit() already provides
a proper newline.

-h



Re: [PATCH] generic: test accurate shared extent reporting

2016-08-05 Thread Eryu Guan
On Fri, Aug 05, 2016 at 01:02:12AM -0700, Darrick J. Wong wrote:
> On Fri, Aug 05, 2016 at 03:46:07PM +0800, Eryu Guan wrote:
> > On Fri, Aug 05, 2016 at 12:21:47AM -0700, Darrick J. Wong wrote:

> > > +_count_holes $testdir/file2
> > > +echo "file1 shared extents"
> > > +$XFS_IO_PROG -c 'fiemap -v' $testdir/file1 | awk '{print $5}' | grep '0x.*[2367aAbBfF]...$' -c
> > 
> > Missing a command at the end?
> 
> Nope, it echoes the number of shared extents (that's what that awk and grep
> globule does), which /should/ be exactly 2.
> 
> (Unless I'm missing something?)

Ah, thanks! I saw "-c" at the end and thought it was part of the xfs_io
command without looking at it carefully.

Thanks,
Eryu


[PATCH v2] generic: test accurate shared extent reporting

2016-08-05 Thread Darrick J. Wong
Ensure that we can create a file with a single extent, reflink two
blocks out of the middle of that extent, and the resulting fiemap
reports two shared extents, instead of lazily reporting the entire
huge extent as shared.

v2: add _supported_fs

Signed-off-by: Darrick J. Wong 
---
 tests/generic/929 |   90 +
 tests/generic/929.out |   17 +
 tests/generic/group   |1 +
 3 files changed, 108 insertions(+)
 create mode 100755 tests/generic/929
 create mode 100644 tests/generic/929.out

diff --git a/tests/generic/929 b/tests/generic/929
new file mode 100755
index 000..1871789
--- /dev/null
+++ b/tests/generic/929
@@ -0,0 +1,90 @@
+#! /bin/bash
+# FS QA Test No. 929
+#
+# Check that bmap/fiemap accurately report shared extents.
+#
+#---
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+   cd /
+   rm -rf $tmp.*
+   wait
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs generic
+_require_scratch_reflink
+_require_fiemap
+
+echo "Format and mount"
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+testdir=$SCRATCH_MNT/test-$seq
+mkdir $testdir
+
+blocks=5
+blksz=65536
+sz=$((blocks * blksz))
+
+echo "Create the original files"
+$XFS_IO_PROG -f -c "falloc 0 $sz" $testdir/file1 >> $seqres.full
+_pwrite_byte 0x61 0 $sz $testdir/file1 >> $seqres.full
+_scratch_cycle_mount
+
+echo "file1 extents and holes"
+_count_extents $testdir/file1
+_count_holes $testdir/file1
+
+_reflink_range $testdir/file1 $blksz $testdir/file2 $((blksz * 3)) $blksz >> $seqres.full
+_reflink_range $testdir/file1 $((blksz * 3)) $testdir/file2 $blksz $blksz >> $seqres.full
+_scratch_cycle_mount
+
+echo "Compare files"
+md5sum $testdir/file1 | _filter_scratch
+md5sum $testdir/file2 | _filter_scratch
+
+echo "file1 extents and holes"
+_count_extents $testdir/file1
+_count_holes $testdir/file1
+echo "file2 extents and holes"
+_count_extents $testdir/file2
+_count_holes $testdir/file2
+echo "file1 shared extents"
+$XFS_IO_PROG -c 'fiemap -v' $testdir/file1 | awk '{print $5}' | grep -c '0x.*[2367aAbBfF]...$'
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/929.out b/tests/generic/929.out
new file mode 100644
index 000..e290f4c
--- /dev/null
+++ b/tests/generic/929.out
@@ -0,0 +1,17 @@
+QA output created by 929
+Format and mount
+Create the original files
+file1 extents and holes
+1
+0
+Compare files
+17af09af790a9b4c79cddf72f6b642cb  SCRATCH_MNT/test-929/file1
+79418df9c55ab7f58781cb7b9e7d5d91  SCRATCH_MNT/test-929/file2
+file1 extents and holes
+5
+0
+file2 extents and holes
+2
+2
+file1 shared extents
+2
diff --git a/tests/generic/group b/tests/generic/group
index 18b9775..732f6f6 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -375,3 +375,4 @@
 370 auto quick richacl
 927 auto quick clone
 928 auto quick clone dedupe
+929 auto quick clone


Re: [PATCH] generic: test accurate shared extent reporting

2016-08-05 Thread Darrick J. Wong
On Fri, Aug 05, 2016 at 03:46:07PM +0800, Eryu Guan wrote:
> On Fri, Aug 05, 2016 at 12:21:47AM -0700, Darrick J. Wong wrote:
> > Ensure that we can create a file with a single extent, reflink two
> > blocks out of the middle of that extent, and the resulting fiemap
> > reports two shared extents, instead of lazily reporting the entire
> > huge extent as shared.
> > 
> > Signed-off-by: Darrick J. Wong 
> > ---
> >  tests/generic/929 |   89 +
> >  tests/generic/929.out |   17 +
> >  tests/generic/group   |1 +
> >  3 files changed, 107 insertions(+)
> >  create mode 100755 tests/generic/929
> >  create mode 100644 tests/generic/929.out
> > 
> > diff --git a/tests/generic/929 b/tests/generic/929
> > new file mode 100755
> > index 000..9793be0
> > --- /dev/null
> > +++ b/tests/generic/929
> > @@ -0,0 +1,89 @@
> > +#! /bin/bash
> > +# FS QA Test No. 929
> > +#
> > +# Check that bmap/fiemap accurately report shared extents.
> > +#
> > +#---
> > +# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#---
> > +#
> > +
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1   # failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 7 15
> > +
> > +_cleanup()
> > +{
> > +   cd /
> > +   rm -rf $tmp.*
> > +   wait
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +. ./common/reflink
> > +
> > +# real QA test starts here
> > +_supported_os Linux
> 
> Need "_supported_fs generic"

Ok.

> > +_require_scratch_reflink
> > +_require_fiemap
> > +
> > +echo "Format and mount"
> > +_scratch_mkfs > $seqres.full 2>&1
> > +_scratch_mount >> $seqres.full 2>&1
> > +
> > +testdir=$SCRATCH_MNT/test-$seq
> > +mkdir $testdir
> > +
> > +blocks=5
> > +blksz=65536
> > +sz=$((blocks * blksz))
> > +
> > +echo "Create the original files"
> > +$XFS_IO_PROG -f -c "falloc 0 $sz" $testdir/file1 >> $seqres.full
> > +_pwrite_byte 0x61 0 $sz $testdir/file1 >> $seqres.full
> > +_scratch_cycle_mount
> > +
> > +echo "file1 extents and holes"
> > +_count_extents $testdir/file1
> > +_count_holes $testdir/file1
> > +
> > +_reflink_range $testdir/file1 $blksz $testdir/file2 $((blksz * 3)) $blksz >> $seqres.full
> > +_reflink_range $testdir/file1 $((blksz * 3)) $testdir/file2 $blksz $blksz >> $seqres.full
> > +_scratch_cycle_mount
> > +
> > +echo "Compare files"
> > +md5sum $testdir/file1 | _filter_scratch
> > +md5sum $testdir/file2 | _filter_scratch
> > +
> > +echo "file1 extents and holes"
> > +_count_extents $testdir/file1
> > +_count_holes $testdir/file1
> > +echo "file2 extents and holes"
> > +_count_extents $testdir/file2
> > +_count_holes $testdir/file2
> > +echo "file1 shared extents"
> > +$XFS_IO_PROG -c 'fiemap -v' $testdir/file1 | awk '{print $5}' | grep '0x.*[2367aAbBfF]...$' -c
> 
> Missing a command at the end?

Nope, it echoes the number of shared extents (that's what that awk and grep
globule does), which /should/ be exactly 2.

(Unless I'm missing something?)

--D

> 
> Thanks,
> Eryu
> 
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/generic/929.out b/tests/generic/929.out
> > new file mode 100644
> > index 000..e290f4c
> > --- /dev/null
> > +++ b/tests/generic/929.out
> > @@ -0,0 +1,17 @@
> > +QA output created by 929
> > +Format and mount
> > +Create the original files
> > +file1 extents and holes
> > +1
> > +0
> > +Compare files
> > +17af09af790a9b4c79cddf72f6b642cb  SCRATCH_MNT/test-929/file1
> > +79418df9c55ab7f58781cb7b9e7d5d91  SCRATCH_MNT/test-929/file2
> > +file1 extents and holes
> > +5
> > +0
> > +file2 extents and holes
> > +2
> > +2
> > +file1 shared extents
> > +2
> > diff --git a/tests/generic/group b/tests/generic/group
> > index 18b9775..732f6f6 100644
> > --- a/tests/generic/group
> > +++ b/tests/generic/group
> > @@ -375,3 +375,4 @@
> >  370 auto quick richacl
> >  927 auto quick clone
> >  928 auto quick clone dedupe
> > +929 auto quick clone

BTRFS: Transaction aborted (ENOSPC)

2016-08-05 Thread Mordechay Kaganer
B.H.

Hello. I have a setup with 4 RAID10 arrays, 4 drives each (using md).
Device usage is as follows:

# btrfs device usage /storage/bkp1
/dev/md1, ID: 1
   Device size:      10.92TiB
   Device slack:        0.00B
   Data,single:      10.19TiB
   Metadata,RAID1:  199.00GiB
   System,RAID1:      8.00MiB
   Unallocated:     542.79GiB

/dev/md2, ID: 2
   Device size:      10.92TiB
   Device slack:        0.00B
   Data,single:      10.21TiB
   Metadata,RAID1:  181.00GiB
   System,RAID1:      8.00MiB
   Unallocated:     541.80GiB

/dev/md3, ID: 3
   Device size:      10.92TiB
   Device slack:        0.00B
   Data,single:      10.41TiB
   Metadata,RAID1:   65.00GiB
   Unallocated:     457.81GiB

/dev/md4, ID: 4
   Device size:      10.92TiB
   Device slack:        0.00B
   Data,single:       9.89TiB
   Metadata,RAID1:   89.00GiB
   Unallocated:     959.81GiB

Mount options: compress=zlib,commit=60,noatime

This setup is used to store regular backups from 2 different sites
(each on a different subvolume with regular snapshots). The backup is
done using rsync, as the source storage is using xfs, not btrfs. This
setup has been working excellently for about 7 months. Currently, it
has about 100 snapshots in total.
Recently, I've started to face problems with "transaction aborted"
messages and the volume going read-only. This happens unexpectedly, after
several hours of rsync running.

As a precursor, it throws several warnings about tasks blocked for 120
seconds; this is probably connected to the long time required to commit
a transaction.

After a transaction abort, I reboot the server, then restart the backup
and it seems to continue OK until the next crash.

Scrub didn't find any errors on the volume. I'm unable to run btrfs
check as it consumes all of the RAM and crashes.

Any suggestions what's going wrong and how to fix this?

# uname -a
Linux yemot-4u 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.7

Thanks in advance!

-- 
Moshiach NOW!
Moshiach is coming very soon, prepare yourself!
Long live our master, our teacher and our rabbi, King Moshiach, forever and ever!


Re: [PATCH] generic: test accurate shared extent reporting

2016-08-05 Thread Eryu Guan
On Fri, Aug 05, 2016 at 12:21:47AM -0700, Darrick J. Wong wrote:
> Ensure that we can create a file with a single extent, reflink two
> blocks out of the middle of that extent, and the resulting fiemap
> reports two shared extents, instead of lazily reporting the entire
> huge extent as shared.
> 
> Signed-off-by: Darrick J. Wong 
> ---
>  tests/generic/929 |   89 +
>  tests/generic/929.out |   17 +
>  tests/generic/group   |1 +
>  3 files changed, 107 insertions(+)
>  create mode 100755 tests/generic/929
>  create mode 100644 tests/generic/929.out
> 
> diff --git a/tests/generic/929 b/tests/generic/929
> new file mode 100755
> index 000..9793be0
> --- /dev/null
> +++ b/tests/generic/929
> @@ -0,0 +1,89 @@
> +#! /bin/bash
> +# FS QA Test No. 929
> +#
> +# Check that bmap/fiemap accurately report shared extents.
> +#
> +#---
> +# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 7 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -rf $tmp.*
> + wait
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/reflink
> +
> +# real QA test starts here
> +_supported_os Linux

Need "_supported_fs generic"

> +_require_scratch_reflink
> +_require_fiemap
> +
> +echo "Format and mount"
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +testdir=$SCRATCH_MNT/test-$seq
> +mkdir $testdir
> +
> +blocks=5
> +blksz=65536
> +sz=$((blocks * blksz))
> +
> +echo "Create the original files"
> +$XFS_IO_PROG -f -c "falloc 0 $sz" $testdir/file1 >> $seqres.full
> +_pwrite_byte 0x61 0 $sz $testdir/file1 >> $seqres.full
> +_scratch_cycle_mount
> +
> +echo "file1 extents and holes"
> +_count_extents $testdir/file1
> +_count_holes $testdir/file1
> +
> +_reflink_range $testdir/file1 $blksz $testdir/file2 $((blksz * 3)) $blksz >> $seqres.full
> +_reflink_range $testdir/file1 $((blksz * 3)) $testdir/file2 $blksz $blksz >> $seqres.full
> +_scratch_cycle_mount
> +
> +echo "Compare files"
> +md5sum $testdir/file1 | _filter_scratch
> +md5sum $testdir/file2 | _filter_scratch
> +
> +echo "file1 extents and holes"
> +_count_extents $testdir/file1
> +_count_holes $testdir/file1
> +echo "file2 extents and holes"
> +_count_extents $testdir/file2
> +_count_holes $testdir/file2
> +echo "file1 shared extents"
> +$XFS_IO_PROG -c 'fiemap -v' $testdir/file1 | awk '{print $5}' | grep '0x.*[2367aAbBfF]...$' -c

Missing a command at the end?

Thanks,
Eryu

> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/929.out b/tests/generic/929.out
> new file mode 100644
> index 000..e290f4c
> --- /dev/null
> +++ b/tests/generic/929.out
> @@ -0,0 +1,17 @@
> +QA output created by 929
> +Format and mount
> +Create the original files
> +file1 extents and holes
> +1
> +0
> +Compare files
> +17af09af790a9b4c79cddf72f6b642cb  SCRATCH_MNT/test-929/file1
> +79418df9c55ab7f58781cb7b9e7d5d91  SCRATCH_MNT/test-929/file2
> +file1 extents and holes
> +5
> +0
> +file2 extents and holes
> +2
> +2
> +file1 shared extents
> +2
> diff --git a/tests/generic/group b/tests/generic/group
> index 18b9775..732f6f6 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -375,3 +375,4 @@
>  370 auto quick richacl
>  927 auto quick clone
>  928 auto quick clone dedupe
> +929 auto quick clone
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] generic: test accurate shared extent reporting

2016-08-05 Thread Darrick J. Wong
Ensure that we can create a file with a single extent, reflink two
blocks out of the middle of that extent, and that the resulting fiemap
reports two shared extents instead of lazily reporting the entire
huge extent as shared.
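The expected counts in 929.out can be reasoned out block by block. The following is a hypothetical model, not part of the patch: it assumes fiemap splits file1's single extent wherever the shared state changes, and that file2's reflinked blocks are separate extents because holes lie between them. The names `reflinks`, `shared` and `file2` are illustrative only.

```shell
#!/bin/bash
# Hypothetical block-level model of the generic/929 layout (one slot per
# 64k block). Assumption: fiemap starts a new extent in file1 whenever
# the shared state changes from one block to the next.
blocks=5

# The two _reflink_range calls as "src_block dst_block" pairs, one block
# each: file1 block 1 -> file2 block 3, file1 block 3 -> file2 block 1.
reflinks=("1 3" "3 1")

declare -a shared file2
for ((i = 0; i < blocks; i++)); do shared[i]=0; done

file2_blocks=0
for pair in "${reflinks[@]}"; do
	read -r src dst <<<"$pair"
	shared[src]=1                  # this file1 block is now shared
	file2[dst]="file1:$src"        # file2 maps it at block dst
	(( dst + 1 > file2_blocks )) && file2_blocks=$((dst + 1))
done

# file1: count runs of equal shared state, and how many runs are shared.
file1_extents=0 file1_shared=0 prev=
for ((i = 0; i < blocks; i++)); do
	if [[ ${shared[i]} != "$prev" ]]; then
		file1_extents=$((file1_extents + 1))
		(( shared[i] )) && file1_shared=$((file1_shared + 1))
		prev=${shared[i]}
	fi
done

# file2: a hole run starts after a mapped block (or at offset 0); an
# extent starts unless the block continues the previous source block.
file2_extents=0 file2_holes=0 prev_src=-2
for ((i = 0; i < file2_blocks; i++)); do
	if [[ -z ${file2[i]} ]]; then
		(( prev_src != -1 )) && file2_holes=$((file2_holes + 1))
		prev_src=-1
	else
		src=${file2[i]#file1:}
		(( prev_src < 0 || src != prev_src + 1 )) && file2_extents=$((file2_extents + 1))
		prev_src=$src
	fi
done

printf 'file1 extents=%d shared=%d\n' "$file1_extents" "$file1_shared"
printf 'file2 extents=%d holes=%d\n' "$file2_extents" "$file2_holes"
```

Under these assumptions the model prints `file1 extents=5 shared=2` and `file2 extents=2 holes=2`, which lines up with the post-reflink counts in 929.out: file1 is cut into five pieces (unshared, shared, unshared, shared, unshared) and file2 has data at blocks 1 and 3 with holes at blocks 0 and 2.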

Signed-off-by: Darrick J. Wong 
---
 tests/generic/929     |   89 +
 tests/generic/929.out |   17 +
 tests/generic/group   |    1 +
 3 files changed, 107 insertions(+)
 create mode 100755 tests/generic/929
 create mode 100644 tests/generic/929.out
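The last check in the test counts shared extents by grepping the FLAGS column of `xfs_io -c 'fiemap -v'` for a hex value whose 0x?000 digit has bit 0x2 set, i.e. FIEMAP_EXTENT_SHARED (0x2000 in <linux/fiemap.h>). A minimal sketch of the same test done with bash arithmetic instead of a character class follows; `is_shared` and `count_shared` are hypothetical helpers, not xfstests functions, and the input is assumed to be the output of `awk '{print $5}'` as in the test.

```shell
#!/bin/bash
# FIEMAP_EXTENT_SHARED from <linux/fiemap.h>.
FIEMAP_EXTENT_SHARED=0x2000

# is_shared FLAGS: exit status 0 if the shared bit is set in FLAGS.
# Bash arithmetic accepts 0x-prefixed hex literals directly.
is_shared() {
	local flags=$1
	(( (flags & FIEMAP_EXTENT_SHARED) != 0 ))
}

# count_shared: read FLAGS values on stdin, print how many are shared.
# Lines that are not hex values (the file name line, the "FLAGS" header,
# empty fields from hole lines) are skipped.
count_shared() {
	local n=0 f
	while read -r f; do
		[[ $f == 0x* ]] || continue
		if is_shared "$f"; then
			n=$((n + 1))
		fi
	done
	echo "$n"
}
```

For example, `printf '%s\n' FLAGS 0x2000 0x1 0x2001 '' | count_shared` prints `2`. Testing the bit numerically does not depend on how many hex digits xfs_io prints, which the `...$` anchor in the grep pattern does.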

diff --git a/tests/generic/929 b/tests/generic/929
new file mode 100755
index 000..9793be0
--- /dev/null
+++ b/tests/generic/929
@@ -0,0 +1,89 @@
+#! /bin/bash
+# FS QA Test No. 929
+#
+# Check that bmap/fiemap accurately report shared extents.
+#
+#---
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+   cd /
+   rm -rf $tmp.*
+   wait
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_fiemap
+
+echo "Format and mount"
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+testdir=$SCRATCH_MNT/test-$seq
+mkdir $testdir
+
+blocks=5
+blksz=65536
+sz=$((blocks * blksz))
+
+echo "Create the original files"
+$XFS_IO_PROG -f -c "falloc 0 $sz" $testdir/file1 >> $seqres.full
+_pwrite_byte 0x61 0 $sz $testdir/file1 >> $seqres.full
+_scratch_cycle_mount
+
+echo "file1 extents and holes"
+_count_extents $testdir/file1
+_count_holes $testdir/file1
+
+_reflink_range $testdir/file1 $blksz $testdir/file2 $((blksz * 3)) $blksz >> $seqres.full
+_reflink_range $testdir/file1 $((blksz * 3)) $testdir/file2 $blksz $blksz >> $seqres.full
+_scratch_cycle_mount
+
+echo "Compare files"
+md5sum $testdir/file1 | _filter_scratch
+md5sum $testdir/file2 | _filter_scratch
+
+echo "file1 extents and holes"
+_count_extents $testdir/file1
+_count_holes $testdir/file1
+echo "file2 extents and holes"
+_count_extents $testdir/file2
+_count_holes $testdir/file2
+echo "file1 shared extents"
+$XFS_IO_PROG -c 'fiemap -v' $testdir/file1 | awk '{print $5}' | grep '0x.*[2367aAbBfF]...$' -c
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/929.out b/tests/generic/929.out
new file mode 100644
index 000..e290f4c
--- /dev/null
+++ b/tests/generic/929.out
@@ -0,0 +1,17 @@
+QA output created by 929
+Format and mount
+Create the original files
+file1 extents and holes
+1
+0
+Compare files
+17af09af790a9b4c79cddf72f6b642cb  SCRATCH_MNT/test-929/file1
+79418df9c55ab7f58781cb7b9e7d5d91  SCRATCH_MNT/test-929/file2
+file1 extents and holes
+5
+0
+file2 extents and holes
+2
+2
+file1 shared extents
+2
diff --git a/tests/generic/group b/tests/generic/group
index 18b9775..732f6f6 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -375,3 +375,4 @@
 370 auto quick richacl
 927 auto quick clone
 928 auto quick clone dedupe
+929 auto quick clone