Re: corruption: yet another one after deleting a ro snapshot

2017-01-15 Thread Qu Wenruo



At 01/16/2017 12:53 PM, Christoph Anton Mitterer wrote:

On Mon, 2017-01-16 at 11:16 +0800, Qu Wenruo wrote:

It would be very nice if you could paste the output of
"btrfs-debug-tree -t extent " and "btrfs-debug-tree -t
root
"

That would help us to fix the bug in lowmem mode.

I'll send you the link in a private mail ... if any other developer
needs it, just ask me or Qu for the link.



BTW, if it's possible, would you please try to run btrfs-check before
your next deletion on ro-snapshots?

You mean in general, when I do my next runs of backups and
snapshot cleanup, respectively?
Sure, actually I did this this time as well (in original mode, though),
and no error was found.

What should I look out for?


Nothing special, just in case the fs is already corrupted.





Not really needed, as all the corruption happens on tree blocks of root 6403.
That means, if it's a real corruption, it will only disturb you (make the fs
suddenly go RO) when you try to modify something (leaves under that node) in
that subvolume.

Ah... and it couldn't cause corruption to the same data blocks if they
were used by another snapshot?


No, it won't cause corruption to any data block, whether shared or not.






And I highly suspect that subvolume 6403 is the RO snapshot you
just removed.

I guess there is no way to find out whether it was that snapshot, is
there?


"btrfs subvolume list" could do it."
If no output of 6403, then it's removed.

And "btrfs-debug-tree -t root" also has info for it.
A deleted subvolume won't have a corresponding ROOT_BACKREF, and its
ROOT_ITEM should have a non-zero drop key.


And in your case, your subvolume is indeed undergoing deletion.
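
A minimal sketch of those two checks, assuming the fs is still exported as
/dev/nbd0 and mounted read-only at /mnt/backup (both paths are assumptions,
and the exact root item formatting may vary between btrfs-progs versions):

btrfs subvolume list /mnt/backup | grep -w 6403    # no output => gone or being dropped
btrfs-debug-tree -t root /dev/nbd0 | grep -A 8 '(6403 ROOT_ITEM'   # root item + nearby lines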


I also checked the extent tree; the result is a little interesting:
1) Most tree backrefs are good.
   In fact, 3 of the 4 errors reported are tree blocks shared by
   other subvolumes, like:

item 77 key (5120 METADATA_ITEM 1) itemoff 13070 itemsize 42
extent refs 2 gen 11 flags TREE_BLOCK|FULL_BACKREF
tree block skinny level 1
tree block backref root 7285
tree block backref root 6572

This means the tree blocks are shared by 2 other subvolumes,
7285 and 6572.

Subvolume 7285 is completely OK, while 6572 is also undergoing subvolume
deletion (though the real deletion hasn't started yet).


And considering the generations, I assume 6403 was deleted before 6572.

So we're fairly sure that btrfs (maybe only btrfsck) doesn't handle it
well if there are multiple subvolumes undergoing deletion.


This gives us enough info to try to build such an image ourselves now.
(Although it's still quite hard to do.)

That also explains why btrfs-progs test image 021 doesn't trigger
the problem: it has only one subvolume undergoing deletion and no
full-backref extents.
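
For reference, a rough sketch of the shape of reproducer we would aim for;
the image path, sizes and mount point below are assumptions, and the tricky
part is racing the unmount against the cleaner thread so both drops are still
in flight:

truncate -s 1G /tmp/repro.img
mkfs.btrfs -f /tmp/repro.img
mount -o loop,compress /tmp/repro.img /mnt/test
dd if=/dev/urandom of=/mnt/test/data bs=1M count=64
btrfs subvolume snapshot -r /mnt/test /mnt/test/snap1   # ro snapshots share tree blocks
btrfs subvolume snapshot -r /mnt/test /mnt/test/snap2
btrfs subvolume delete /mnt/test/snap1 /mnt/test/snap2  # queue two drops at once
umount /mnt/test                                        # before the cleaner finishes
btrfs check /tmp/repro.img
btrfs check --mode=lowmem /tmp/repro.img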



And as for the scary lowmem mode output, it's a false alert.

I manually checked the used size of a block group and it's OK.

BTW, most of your block groups are completely used, without any free space.
But interestingly, most data extents are just 512K: larger than the
compressed extent upper limit, but still quite small.

In other words, your fs seems to be fragmented, considering the upper
limit of a data extent is 128M.

(Or maybe your case is quite common in the real world?)
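
If you want to eyeball that per file, a minimal sketch (the path is just an
example; on compressed files filefrag may report each small compressed chunk
as its own extent, so the counts are only a rough indicator):

filefrag -v /mnt/backup/some-large-file   # per-extent layout
filefrag /mnt/backup/some-large-file      # just the extent count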






If 'btrfs subvolume list' can't find that subvolume, then I think it's
mostly OK for you to RW mount and wait for the subvolume to be fully
deleted.

And I think you have already provided enough data for us to, at least
try to, reproduce the bug.


I won't do the remount,rw tonight, so you have the rest of your
day/night time to think of anything further I should test or provide
you with from that fs... then it will be "gone" (in the sense of
mounted RW).
Just give your veto if I should wait :)


At least from the extent and root tree dumps, I found nothing wrong.

It's still possible that some full backref needs to be checked from the
subvolume trees (considering your fs size, not really practical) and could
be wrong, but the possibility is quite low.

And in that case, there should be more than 4 extent tree errors reported.

So you are mostly OK to mount it rw any time you want, and I have
already downloaded the raw data.
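
When you do, a minimal sketch of the remount-and-recheck sequence (the device
is the /dev/nbd0 from your check runs, the mount point is an assumption, and
"btrfs subvolume sync" needs a reasonably recent btrfs-progs):

mount -o remount,rw /mnt/backup
btrfs subvolume sync /mnt/backup   # wait until the queued subvolume drops finish
umount /mnt/backup
btrfs check /dev/nbd0              # recheck once the cleaner is done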


The hard part remaining for us developers is to build a small image that
reproduces your situation.


Thanks,
Qu



Thanks,
Chris.






Re: corruption: yet another one after deleting a ro snapshot

2017-01-15 Thread Christoph Anton Mitterer
On Mon, 2017-01-16 at 11:16 +0800, Qu Wenruo wrote:
> It would be very nice if you could paste the output of
> "btrfs-debug-tree -t extent " and "btrfs-debug-tree -t
> root 
> "
> 
> That would help us to fix the bug in lowmem mode.
I'll send you the link in a private mail ... if any other developer
needs it, just ask me or Qu for the link.


> BTW, if it's possible, would you please try to run btrfs-check
> before 
> your next deletion on ro-snapshots?
You mean in general, when I do my next runs of backups and
snapshot cleanup, respectively?
Sure, actually I did this this time as well (in original mode, though),
and no error was found.

What should I look out for?


> Not really needed, as all the corruption happens on tree blocks of root 6403.
> That means, if it's a real corruption, it will only disturb you (make the fs
> suddenly go RO) when you try to modify something (leaves under that node) in
> that subvolume.
Ah... and it couldn't cause corruption to the same data blocks if they
were used by another snapshot?



> And I highly suspect that subvolume 6403 is the RO snapshot you
> just removed.
I guess there is no way to find out whether it was that snapshot, is
there?



> If 'btrfs subvolume list' can't find that subvolume, then I think
> it's 
> mostly OK for you to RW mount and wait for the subvolume to be fully
> deleted.
>
> And I think you have already provided enough data for us to, at
> least 
> try to, reproduce the bug.

I won't do the remount,rw tonight, so you have the rest of your
day/night time to think of anything further I should test or provide
you with from that fs... then it will be "gone" (in the sense of
mounted RW).
Just give your veto if I should wait :)


Thanks,
Chris.



Re: [PATCH] fstests: generic: splitted large dio write could trigger assertion on btrfs

2017-01-15 Thread Eryu Guan
On Thu, Jan 12, 2017 at 02:22:06PM -0800, Liu Bo wrote:
> On btrfs, if a large dio write (>=128MB) gets split, the outstanding_extents
> assertion would complain.  Note that CONFIG_BTRFS_ASSERT is required.
> 
> Regression test for
>   Btrfs: adjust outstanding_extents counter properly when dio write is split
> 
> Signed-off-by: Liu Bo 
> ---
> I didn't figure out how to check if CONFIG_BTRFS_ASSERT is enabled, but since
> there is no btrfs specific stuff in the test case, it might be better to not
> have such a _require check because it doesn't make sense for other filesystems.
> 
>  tests/generic/392 | 75 +++
>  tests/generic/392.out |  2 ++
>  tests/generic/group   |  1 +
>  3 files changed, 78 insertions(+)
>  create mode 100755 tests/generic/392
>  create mode 100644 tests/generic/392.out

There're over 400 generic tests now; it seems your repo hasn't been updated
for a long time :)

> 
> diff --git a/tests/generic/392 b/tests/generic/392
> new file mode 100755
> index 000..4d88c44
> --- /dev/null
> +++ b/tests/generic/392
> @@ -0,0 +1,75 @@
> +#! /bin/bash
> +# FS QA Test generic/392
> +#
> +# If a larger dio write (size >= 128M) got splitted, the assertion in endio
> +# would complain (CONFIG_BTRFS_ASSERT is required).
> +#
> +# Regression test for
> +#   Btrfs: adjust outstanding_extents counter properly when dio write is split
> +#
> +#---
> +# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_odirect
> +
> +# 2G / 1K
> +fsblock=$((1 << 21))
> +fssize=$((1 << 31))
> +_require_fs_space $SCRATCH_MNT $fsblock

You should mkfs & mount $SCRATCH_DEV first before _require_fs_space,
otherwise it's testing against the wrong fs.

> +
> +_scratch_mkfs_sized $fssize >> $seqres.full 2>&1

_scratch_mkfs_sized also makes sure SCRATCH_DEV is big enough to make the
filesystem, so in this test _require_fs_space and _scratch_mkfs_sized could
both do the job; only one of them is needed.

But not all filesystems support _scratch_mkfs_sized, so I'd prefer
using _require_fs_space.
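
A minimal sketch of that ordering (assuming the unsized _scratch_mkfs is
acceptable here, and reusing the test's own 2GiB-in-KiB figure for the space
check):

_scratch_mkfs >> $seqres.full 2>&1
_scratch_mount >> $seqres.full 2>&1
_require_fs_space $SCRATCH_MNT $((2 * 1024 * 1024))	# 2GiB expressed in KiB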

Thanks,
Eryu

> +_scratch_mount >> $seqres.full 2>&1
> +
> +echo "Silence is golden"
> +
> +blocksize=$(( (128 + 1) * 2 * 1024 * 1024))
> +$XFS_IO_PROG -f -d -c "pwrite -b ${blocksize} 0 ${blocksize}" $SCRATCH_MNT/testfile.$seq >> $seqres.full 2>&1
> +
> +_scratch_unmount
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/392.out b/tests/generic/392.out
> new file mode 100644
> index 000..665233c
> --- /dev/null
> +++ b/tests/generic/392.out
> @@ -0,0 +1,2 @@
> +QA output created by 392
> +Silence is golden
> diff --git a/tests/generic/group b/tests/generic/group
> index 2c16bd1..1631933 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -394,3 +394,4 @@
>  389 auto quick acl
>  390 auto freeze stress dangerous
>  391 auto quick rw
> +392 auto quick dangerous
> -- 
> 2.5.0
> 


Re: corruption: yet another one after deleting a ro snapshot

2017-01-15 Thread Qu Wenruo



At 01/16/2017 10:56 AM, Christoph Anton Mitterer wrote:

On Mon, 2017-01-16 at 09:38 +0800, Qu Wenruo wrote:

So the fs is REALLY corrupted.

*sigh* ... (not as in fuck-I'm-losing-my-data™ ... but as in *sigh*
another-possibly-deeply-hidden-bug-in-btrfs-that-might-eventually-
cause-data-loss...)


BTW, lowmem mode seems to have a new false alert when checking the
block group item.


Anything you want me to check there?


It would be very nice if you could paste the output of
"btrfs-debug-tree -t extent " and "btrfs-debug-tree -t root 
"


That would help us to fix the bug in lowmem mode.





Did you have any "lightweight" method to reproduce the bug?

Na, not at all... as I've said this already happened to me once before,
and in both cases I was cleaning up old ro-snapshots.

At least in the current case the fs was only ever filled via
send/receive (well apart from minor mkdirs or so)... so there shouldn't
have been any "extreme ways" of using it.


Since it's mostly populated by receive: yes, receive is completely sane,
as it's done purely in user-space.


So if we have any way to reproduce it, it won't involve anything special.

BTW, if it's possible, would you please try to run btrfs-check before 
your next deletion on ro-snapshots?




I think (but am not sure) that this was also the case on the other
occasion that happened to me with a different fs (i.e. I think it was
also a backup 8TB disk).



For example, on a 1G btrfs fs with moderate operations, for example
15min or so, to reproduce the bug?

Well I could try to reproduce it, but I guess you'd have far better means
to do so.

As I've said, I was mostly doing send (with -p) | receive to do
incremental backups... and after a while I was cleaning up the old
snapshots on the backup fs.
Of course the snapshot subvols are pretty huge... as I've said, close to
8TB (7.5 or so)... everything from quite big files (4GB) to very small,
symlinks (no devices/sockets/fifos)... perhaps some hardlinks...
Some reflinked files. The whole fs has compression enabled.
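
For the record, a minimal sketch of that kind of workflow (the subvolume
names and mount points are just examples, not the exact ones used here):

# initial full transfer of a read-only snapshot
btrfs send /data/snapshots/2017-01-01 | btrfs receive /mnt/backup/

# later: incremental send against the previous snapshot
btrfs send -p /data/snapshots/2017-01-01 /data/snapshots/2017-01-08 \
    | btrfs receive /mnt/backup/

# eventually: clean up old read-only snapshots on the backup fs
btrfs subvolume delete /mnt/backup/2017-01-01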



Shall I rw-mount the fs and do sync and wait and retry? Or is there
anything else that you want me to try before in order to get the
kernel
bug (if any) or btrfs-progs bug nailed down?


Personally speaking, rw mount would help, to verify if it's just a
bug
that will disappear after the deletion is done.

Well, but then we might lose any chance to track it down further.

And even if it went away, it would still at least be a bug in terms
of an fsck false positive, if not more (in the sense that corruption
may happen if some affected parts of the fs are used while not cleaned up
again).



But considering the size of your fs, it may not be a good idea, as we
don't have a reliable method to recover/rebuild the extent tree yet.


So what do you effectively want now?
Wait and try something else?
RW mount and recheck to see whether it goes away with that? (And even
if it does, should I rather re-create/populate the fs from scratch just to be
sure?)

What I can also offer in addition... as mentioned a few times
previously, I do have full lists of the regular files/dirs/symlinks as well
as SHA512 sums of each of the regular files, as they are expected to be on
the fs and the snapshot, respectively.
So I can offer to do a full verification pass of these, to see whether
anything is missing or whether (file) data is actually corrupted.
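
A minimal sketch of such a verification pass, assuming the sums were recorded
with sha512sum into a manifest file (the manifest name and mount point are
hypothetical):

cd /mnt/backup/snapshot-2017-01-08
sha512sum --quiet -c /root/manifest.sha512   # prints only the files that fail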


Not really needed, as all the corruption happens on tree blocks of root 6403.
That means, if it's a real corruption, it will only disturb you (make the fs
suddenly go RO) when you try to modify something (leaves under that node) in
that subvolume.


At least the data is good.

And I highly suspect that subvolume 6403 is the RO snapshot you just 
removed.


If 'btrfs subvolume list' can't find that subvolume, then I think it's 
mostly OK for you to RW mount and wait for the subvolume to be fully deleted.


And I think you have already provided enough data for us to, at least 
try to, reproduce the bug.


Thanks,
Qu



Of course that will take a while, and even if everything verifies, I'm
still not really sure whether I'd trust that fs anymore ;-)


Cheers,
Chris.






Re: corruption: yet another one after deleting a ro snapshot

2017-01-15 Thread Christoph Anton Mitterer
On Mon, 2017-01-16 at 09:38 +0800, Qu Wenruo wrote:
> So the fs is REALLY corrupted.
*sigh* ... (not as in fuck-I'm-losing-my-data™ ... but as in *sigh*
another-possibly-deeply-hidden-bug-in-btrfs-that-might-eventually-
cause-data-loss...)

> BTW, lowmem mode seems to have a new false alert when checking the
> block 
> group item.

Anything you want me to check there?


> Did you have any "lightweight" method to reproduce the bug?
Na, not at all... as I've said this already happened to me once before,
and in both cases I was cleaning up old ro-snapshots.

At least in the current case the fs was only ever filled via
send/receive (well apart from minor mkdirs or so)... so there shouldn't
have been any "extreme ways" of using it.

I think (but am not sure) that this was also the case on the other
occasion that happened to me with a different fs (i.e. I think it was
also a backup 8TB disk).


> For example, on a 1G btrfs fs with moderate operations, for example 
> 15min or so, to reproduce the bug?
Well I could try to reproduce it, but I guess you'd have far better means
to do so.

As I've said, I was mostly doing send (with -p) | receive to do
incremental backups... and after a while I was cleaning up the old
snapshots on the backup fs.
Of course the snapshot subvols are pretty huge... as I've said, close to
8TB (7.5 or so)... everything from quite big files (4GB) to very small,
symlinks (no devices/sockets/fifos)... perhaps some hardlinks...
Some reflinked files. The whole fs has compression enabled.


> > Shall I rw-mount the fs and do sync and wait and retry? Or is there
> > anything else that you want me to try before in order to get the
> > kernel
> > bug (if any) or btrfs-progs bug nailed down?
> 
> Personally speaking, rw mount would help, to verify if it's just a
> bug 
> that will disappear after the deletion is done.
Well, but then we might lose any chance to track it down further.

And even if it went away, it would still at least be a bug in terms
of an fsck false positive, if not more (in the sense that corruption
may happen if some affected parts of the fs are used while not cleaned up
again).


> But considering the size of your fs, it may not be a good idea, as we
> don't have a reliable method to recover/rebuild the extent tree yet.

So what do you effectively want now?
Wait and try something else?
RW mount and recheck to see whether it goes away with that? (And even
if it does, should I rather re-create/populate the fs from scratch just to be
sure?)

What I can also offer in addition... as mentioned a few times
previously, I do have full lists of the regular files/dirs/symlinks as well
as SHA512 sums of each of the regular files, as they are expected to be on
the fs and the snapshot, respectively.
So I can offer to do a full verification pass of these, to see whether
anything is missing or whether (file) data is actually corrupted.

Of course that will take a while, and even if everything verifies, I'm
still not really sure whether I'd trust that fs anymore ;-)


Cheers,
Chris.



[PATCH] btrfs: raid56: Remove unused variant in lock_stripe_add

2017-01-15 Thread Qu Wenruo
Variable 'walk' in lock_stripe_add() is never used.
Remove it.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/raid56.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 453eefdcb591..b8ffd9ea7499 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -693,11 +693,9 @@ static noinline int lock_stripe_add(struct btrfs_raid_bio *rbio)
struct btrfs_raid_bio *freeit = NULL;
struct btrfs_raid_bio *cache_drop = NULL;
int ret = 0;
-   int walk = 0;
 
spin_lock_irqsave(&h->lock, flags);
list_for_each_entry(cur, &h->hash_list, hash_list) {
-   walk++;
if (cur->bbio->raid_map[0] == rbio->bbio->raid_map[0]) {
spin_lock(&cur->bio_list_lock);
 
-- 
2.11.0





Re: [PATCH 0/3 for-4.10] RAID56 scrub fixes

2017-01-15 Thread Qu Wenruo

Hi David, Chris,

Any comment on this patchset?

Although we don't recommend users to use RAID5/6 for now, this
patchset still fixes 2 quite important bugs for RAID5/6.


And so far the tests show no regression.
(Although test cases like btrfs/011 and btrfs/069 are still causing
problems, v4.10-rc without this patchset also causes them.)


And the patchset is small enough for an rc.

Thanks,
Qu

At 12/12/2016 05:38 PM, Qu Wenruo wrote:

Can be fetched from github:
https://github.com/adam900710/linux.git raid56_fixes
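
For anyone who wants to try it, a minimal sketch of fetching that branch into
an existing kernel tree:

git fetch https://github.com/adam900710/linux.git raid56_fixes
git checkout -b raid56_fixes FETCH_HEAD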

Fixes 2 scrub bugs:
1) Scrub recover correct data, but wrong parity
2) Scrub report wrong csum error number, or even unrecoverable error

The patches are still undergoing xfstests, but the current for-linus-4.10 is already
causing a deadlock for btrfs/011, even without the patches.

So I'd remove btrfs/011 and continue the tests, even though these test cases
won't trigger the real recovery code.

But the current internal test cases are quite good so far.
I'll run them for several extra loops, and submit the internal test for
reference.
(Since it's not suitable for xfstests, I'd only submit the test script, which
needs to probe the chunk layout manually.)

Qu Wenruo (3):
  btrfs: scrub: Introduce full stripe lock for RAID56
  btrfs: scrub: Fix RAID56 recovery race conditiong
  btrfs: raid56: Use correct stolen pages to calculate P/Q

 fs/btrfs/ctree.h   |   4 ++
 fs/btrfs/extent-tree.c |   3 +
 fs/btrfs/raid56.c  |  62 ++--
 fs/btrfs/scrub.c   | 192 +
 4 files changed, 257 insertions(+), 4 deletions(-)






Re: corruption: yet another one after deleting a ro snapshot

2017-01-15 Thread Qu Wenruo



At 01/16/2017 01:04 AM, Christoph Anton Mitterer wrote:

On Thu, 2017-01-12 at 10:38 +0800, Qu Wenruo wrote:

IIRC, RO mount won't continue background deletion.

I see.



Would you please try 4.9 btrfs-progs?


Done now, see results (lowmem and original mode) below:

# btrfs version
btrfs-progs v4.9

# btrfs check /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ref mismatch on [37765120 16384] extent item 0, found 1
Backref 37765120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [37765120 16384]
owner ref check failed [37765120 16384]
ref mismatch on [5120 16384] extent item 0, found 1
Backref 5120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5120 16384]
owner ref check failed [5120 16384]
ref mismatch on [78135296 16384] extent item 0, found 1
Backref 78135296 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [78135296 16384]
owner ref check failed [78135296 16384]
ref mismatch on [5960381235200 16384] extent item 0, found 1
Backref 5960381235200 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5960381235200 16384]
checking free space cache
checking fs roots
checking csums
checking root refs
found 7483995824128 bytes used err is 0
total csum bytes: 7296183880
total tree bytes: 10875944960
total fs tree bytes: 2035286016
total extent tree bytes: 1015988224
btree space waste bytes: 920641324
file data blocks allocated: 8267656339456
 referenced 8389440876544
0


# btrfs check --mode=lowmem /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[500393050112 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[581997428736 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[626557714432 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[668433645568 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[948680261632 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[982503129088 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items 
used 1074266112
ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items 
used 1074266112
ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items 
used 0

delete subvolume bad key order

2017-01-15 Thread York-Simon Johannsen

Hello,

I get the message 'bad key order' and, as you can see, the debug output
looks very creepy. The RAM is ECC and I have already checked it: no errors
there. A 'btrfs check --repair' unfortunately did not help. What else can I do?


The system is built as follows:

sda ---\
sdb --\ \
sdc md0(Raid5)-vg0(Stripe)--btrfs2(lv)--btrfs
sdd --/


Log:
http://pastebin.com/Pu9ENpUM

Overview:
http://pastebin.com/PnddKQYz

Debug:
http://pastebin.com/qmR30AjJ

Greetings, York



[ISSUE] uncorrectable errors on Raid1

2017-01-15 Thread randomtech...@laposte.net
Hello all,

I have some concerns about BTRFS RAID 1. I have encountered 114
uncorrectable errors on the directory hosting my 'seafile-data'. Seafile is
software for backing up data. My 2 hard drives seem to be fine. The smartctl
reports do not identify any bad blocks (Reallocated_Event_Count or
Current_Pending_Sector).
How can I have uncorrectable errors when BTRFS is supposed to assure data
integrity? How did my data get corrupted? What can I do to ensure that it
does not happen again?

Sincerely, 


You can find below all the useful information I can think of. If you need more, 
let me know. 

sudo btrfs scrub status /mnt 
scrub status for 89f6f57e-90d9-46ac-1132-144e6ac150e4 
scrub started at Sat Jan 14 17:09:36 2017 and finished after 2207 seconds 
total bytes scrubbed: 598.03GiB with 114 errors 
error details: csum=114 
corrected errors: 0, uncorrectable errors: 114, unverified errors: 0 


If I look at the dmesg log, I can see that both logical blocks seem to be
corrupted.
[ 1047.312852] BTRFS: bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 49, gen 
0 
[ 1047.352631] BTRFS: unable to fixup (regular) error at logical 429848649728 
on dev /dev/sde1 
[ 1062.667080] BTRFS: checksum error at logical 441348554752 on dev /dev/sdd1, 
sector 195114560, root 5, inode 964364, offset 819200, length 4096, links 1 
(path: 
seafile-data/storage/blocks/bd71e3e1-95bd-40fc-b6db-55c4ea9467c1/30/bfa04bb182ff8050fe4a0f357da7df335e7511)
 
[ 1062.667092] BTRFS: bdev /dev/sdd1 errs: wr 0, rd 0, flush 0, corrupt 18, gen 
0 
[ 1062.710999] BTRFS: unable to fixup (regular) error at logical 441348554752 
on dev /dev/sdd1 
[ 1074.536137] BTRFS: checksum error at logical 441348554752 on dev /dev/sde1, 
sector 195075648, root 5, inode 964364, offset 819200, length 4096, links 1 
(path: 
seafile-data/storage/blocks/bd71e3e1-95bd-40fc-b6db-55c4ea9467c1/30/bfa04bb182ff8050fe4a0f357da7df335e7511)
 


sudo btrfs inspect-internal logical-resolve 441348554752 -v /mnt 
ioctl ret=0, total_size=4096, bytes_left=4056, bytes_missing=0, cnt=3, missed=0 
ioctl ret=0, bytes_left=3965, bytes_missing=0, cnt=1, missed=0 
/vault/seafile-data/storage/blocks/bd71e3e1-95bd-40fc-b6db-55c4ea9467c1/30/bfa04bb182ff8050fe4a0f357da7df335e7511
 


If I attempt to read the corresponding file, I get an "Input/output error".


Here is my Raid1 configuration: 

sudo btrfs fi show /mnt 
Label: none uuid: 91f6f57e-23d7-46ac-8056-144e6ac150e4 
Total devices 2 FS bytes used 299.02GiB 
devid 1 size 2.73TiB used 301.03GiB path /dev/sdd1 
devid 2 size 2.73TiB used 301.01GiB path /dev/sde1 

btrfs-progs v3.19.1 

sudo btrfs fi df /mnt 
Data, RAID1: total=299.00GiB, used=298.15GiB 
Data, single: total=8.00MiB, used=0.00B 
System, RAID1: total=8.00MiB, used=64.00KiB 
System, single: total=4.00MiB, used=0.00B 
Metadata, RAID1: total=2.00GiB, used=887.55MiB 
Metadata, single: total=8.00MiB, used=0.00B 
GlobalReserve, single: total=304.00MiB, used=0.00B 


sudo btrfs fi us /mnt 
Overall: 
Device size: 5.46TiB 
Device allocated: 602.04GiB 
Device unallocated: 4.87TiB 
Device missing: 0.00B 
Used: 598.04GiB 
Free (estimated): 2.44TiB (min: 2.44TiB) 
Data ratio: 2.00 
Metadata ratio: 2.00 
Global reserve: 304.00MiB (used: 0.00B) 


Data,single: Size:8.00MiB, Used:0.00B 
/dev/sdd1 8.00MiB 


Data,RAID1: Size:299.00GiB, Used:298.15GiB 
/dev/sdd1 299.00GiB 
/dev/sde1 299.00GiB 


Metadata,single: Size:8.00MiB, Used:0.00B 
/dev/sdd1 8.00MiB 


Metadata,RAID1: Size:2.00GiB, Used:887.55MiB 
/dev/sdd1 2.00GiB 
/dev/sde1 2.00GiB 


System,single: Size:4.00MiB, Used:0.00B 
/dev/sdd1 4.00MiB 


System,RAID1: Size:8.00MiB, Used:64.00KiB 
/dev/sdd1 8.00MiB 
/dev/sde1 8.00MiB 


Unallocated: 
/dev/sdd1 2.43TiB 
/dev/sde1 2.43TiB 


btrfs --version 
btrfs-progs v3.19.1 


sudo smartctl -a /dev/sde 
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.28.3.el7.x86_64] (local 
build) 
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org 
=== START OF INFORMATION SECTION === 
Model Family: Western Digital Red (AF) 
Device Model: WDC WD30EFRX-68EUZN0 
Serial Number: WD-WCC4N1003742 
LU WWN Device Id: 5 0014ee 25f64a417 
Firmware Version: 80.00A80 
User Capacity: 3 000 592 982 016 bytes [3,00 TB] 
Sector Sizes: 512 bytes logical, 4096 bytes physical 
Rotation Rate: 5400 rpm 
Device is: In smartctl database [for details use: -P show] 
ATA Version is: ACS-2 (minor revision not indicated) 
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s) 
Local Time is: Sun Jan 15 16:46:37 2017 CET 
SMART support is: Available - device has SMART capability. 
SMART support is: Enabled 
=== START OF READ SMART DATA SECTION === 
SMART overall-health self-assessment test result: PASSED 


General SMART Values: 
Offline data collection status: (0x00) Offline data collection activity 
was never started. 
Auto Offline Data Collection: Disabled. 
Self-test execution status: ( 0) The previous self-test routine completed 
without error or no self-test has ever 
been run. 
Total 

Re: corruption: yet another one after deleting a ro snapshot

2017-01-15 Thread Christoph Anton Mitterer
On Thu, 2017-01-12 at 10:38 +0800, Qu Wenruo wrote:
> IIRC, RO mount won't continue background deletion.
I see.


> Would you please try 4.9 btrfs-progs?

Done now, see results (lowmem and original mode) below:

# btrfs version
btrfs-progs v4.9

# btrfs check /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ref mismatch on [37765120 16384] extent item 0, found 1
Backref 37765120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [37765120 16384]
owner ref check failed [37765120 16384]
ref mismatch on [5120 16384] extent item 0, found 1
Backref 5120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5120 16384]
owner ref check failed [5120 16384]
ref mismatch on [78135296 16384] extent item 0, found 1
Backref 78135296 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [78135296 16384]
owner ref check failed [78135296 16384]
ref mismatch on [5960381235200 16384] extent item 0, found 1
Backref 5960381235200 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5960381235200 16384]
checking free space cache
checking fs roots
checking csums
checking root refs
found 7483995824128 bytes used err is 0
total csum bytes: 7296183880
total tree bytes: 10875944960
total fs tree bytes: 2035286016
total extent tree bytes: 1015988224
btree space waste bytes: 920641324
file data blocks allocated: 8267656339456
 referenced 8389440876544
0


# btrfs check --mode=lowmem /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[500393050112 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[581997428736 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[626557714432 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[668433645568 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[948680261632 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[982503129088 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items 
used 1074266112
ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items 
used 1207959552
ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items 
used 1074266112
ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items 
used 0
ERROR: block group[6997067956224 1073741824] used 1073741824 

[LSF/MM TOPIC] [LSF/MM ATTEND] BTRFS Encryption

2017-01-15 Thread Anand Jain


 I am working on the BTRFS encryption stage 2 design [1], which centers
 on the requisites of data-center solutions. I shall be presenting
 an overview of the proposed design, so as to obtain constructive
 feedback and comments. I hope this will help to finalize the design
 before it is taken up for implementation.

 Though the current experiment is only within BTRFS, the
 encryption-method part of the design should become part of fs/crypto
 when it is appropriate.

 The presentation and the review will be in two parts: the
 encryption-method part, in which the fs/crypto experts may
 like to participate, and the btrfs part, in which the btrfs experts
 may like to comment on the aspects of the proposed design that are
 specific to btrfs.


 [1] (Working draft, I hope to put more details into it before the LSF).

https://docs.google.com/document/d/1jWB3lyY2PF5CSAzcOnR8Yzh_45xU3Ub9bf3zKoMruWs/edit?usp=sharing



Thanks, Anand

