Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-04 Thread Qu Wenruo



At 05/05/2017 10:40 AM, Marc MERLIN wrote:

On Fri, May 05, 2017 at 09:19:29AM +0800, Qu Wenruo wrote:

Sorry for not noticing the link.
  
no problem, it was only one line amongst many :)

Thanks much for having had a look.


[Conclusion]
After checking the full result, some of fs/subvolume trees are corrupted.

[Details]
Some example here:

---
ref mismatch on [6674127745024 32768] extent item 0, found 1
Backref 6674127745024 parent 7566652473344 owner 0 offset 0 num_refs 0 not
found in extent tree
Incorrect local backref count on 6674127745024 parent 7566652473344 owner 0
offset 0 found 1 wanted 0 back 0x5648afda0f20
backpointer mismatch on [6674127745024 32768]
---

The extent at 6674127745024 is 32K, so it seems to be a *DATA* extent:
the current default nodesize is 16K and the ancient default was 4K.

Unless you specified -n 32K at mkfs time, it's a DATA extent.
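
(For reference, a hedged way to confirm the nodesize recorded in the
superblock, assuming a btrfs-progs new enough to have dump-super; older
versions ship the equivalent btrfs-show-super:)

  btrfs inspect-internal dump-super /dev/mapper/dshelf2 | grep -E 'nodesize|sectorsize'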


I did not, so you must be right about DATA, which should be good, right,
I don't mind losing data as long as the underlying metadata is correct.

I should have given more data on the FS:

gargamel:/var/local/src/btrfs-progs# btrfs fi df /mnt/btrfs_pool2/
Data, single: total=6.28TiB, used=6.12TiB
System, DUP: total=32.00MiB, used=720.00KiB
Metadata, DUP: total=97.00GiB, used=94.39GiB


Tons of metadata since the fs is so large.


GlobalReserve, single: total=512.00MiB, used=0.00B

gargamel:/var/local/src/btrfs-progs# btrfs fi usage /mnt/btrfs_pool2
Overall:
 Device size:   7.28TiB
 Device allocated:  6.47TiB
 Device unallocated:  824.48GiB
 Device missing:  0.00B
 Used:  6.30TiB
 Free (estimated):   994.45GiB  (min: 582.21GiB)
 Data ratio:   1.00
 Metadata ratio:   2.00
 Global reserve:  512.00MiB  (used: 0.00B)

Data,single: Size:6.28TiB, Used:6.12TiB
/dev/mapper/dshelf2 6.28TiB

Metadata,DUP: Size:97.00GiB, Used:94.39GiB
/dev/mapper/dshelf2   194.00GiB

System,DUP: Size:32.00MiB, Used:720.00KiB
/dev/mapper/dshelf2    64.00MiB

Unallocated:
/dev/mapper/dshelf2   824.48GiB



Furthermore, it's a shared data backref, so backref walking goes through its
parent tree block.

And its parent tree block is 7566652473344.
That bytenr can't be found anywhere (including in the csum error output),
which is to say we either can't find that tree block or can't reach the tree
root for it.

Since it's a data extent, its owner is either the root tree or an fs/subvolume tree.


Such cases are everywhere; I found other extents sized from 4K to 44K, so
I'm pretty sure some fs/subvolume tree is corrupted.
(A data extent in the root tree is seldom 4K sized.)

So unfortunately, your fs/subvolume trees are also corrupted,
and there is almost no chance of a graceful recovery.
  
So I'm confused here. You're saying my metadata is not corrupted (and in
my case, I have DUP, so I should have 2 copies),


Nope, here I'm talking entirely about metadata (tree blocks).
The difference is the owner: either the extent tree or an fs/subvolume tree.

The fsck doesn't check data blocks.


but with data blocks
(which are not duped) corrupted, it's also possible to lose the
filesystem in a way that it can't be taken back to a clean state, even
by deleting some corrupted data?


No, it can't be repaired by deleting data.

The problem is, the tree blocks (metadata) that refer to these data blocks
are corrupted.


And they are corrupted in such a way that both the extent tree (the tree
containing extent allocation info) and the fs tree (the tree containing the
real fs info, like inodes and data locations) are affected.


So graceful recovery is not possible now.




[Alternatives]
I would recommend using "btrfs restore -f " to restore a specified
subvolume.
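
(For reference, a hedged sketch of what that could look like; the tree root
bytenr and the destination directory are placeholders, not values from this
thread:)

  btrfs restore -l /dev/mapper/dshelf2                       # list candidate tree roots, read-only
  btrfs restore -D -f <bytenr> /dev/mapper/dshelf2 /mnt/out  # dry-run restore of one subvolume root
  btrfs restore -f <bytenr> /dev/mapper/dshelf2 /mnt/out     # real restore once the dry run looks sane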


I don't need to restore data, the data is a backup. It will just take
many days to recreate (plus many hours of typing from me because the
backup updates are automated, but recreating everything is not
automated).

So if I understand correctly, my metadata is fine (and I guess I have 2
copies, so it would have been unlucky to get both copies corrupted), but
enough data blocks got corrupted that btrfs cannot recover, even by
deleting the corrupted data blocks. Correct?


Unfortunately, no: even though you have 2 copies, a lot of tree blocks are
corrupted such that neither copy matches its checksum.


Just like the following tree block, where both copies have a wrong checksum.
---
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
---



And is it not possible to clear the corrupted blocks like this?
./btrfs-corrupt-block -l  2899180224512 /dev/mapper/dshelf2
and just accept the lost data but get btrfs check repair to deal with
the deleted blocks and bring the rest back to a clean state?

No, that won't help.


Corrupted blocks are corrupted; that command would just corrupt them again.

It won't do the black magic to adjust tree 

Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-04 Thread Marc MERLIN
On Fri, May 05, 2017 at 09:19:29AM +0800, Qu Wenruo wrote:
> Sorry for not noticing the link.
 
no problem, it was only one line amongst many :)
Thanks much for having had a look.

> [Conclusion]
> After checking the full result, some of fs/subvolume trees are corrupted.
> 
> [Details]
> Some example here:
> 
> ---
> ref mismatch on [6674127745024 32768] extent item 0, found 1
> Backref 6674127745024 parent 7566652473344 owner 0 offset 0 num_refs 0 not
> found in extent tree
> Incorrect local backref count on 6674127745024 parent 7566652473344 owner 0
> offset 0 found 1 wanted 0 back 0x5648afda0f20
> backpointer mismatch on [6674127745024 32768]
> ---
> 
> The extent at 6674127745024 seems to be an *DATA* extent.
> While current default nodesize is 16K and ancient default node is 4K.
> 
> Unless you specified -n 32K at mkfs time, it's a DATA extent.

I did not, so you must be right about DATA, which should be good, right,
I don't mind losing data as long as the underlying metadata is correct.

I should have given more data on the FS:

gargamel:/var/local/src/btrfs-progs# btrfs fi df /mnt/btrfs_pool2/
Data, single: total=6.28TiB, used=6.12TiB
System, DUP: total=32.00MiB, used=720.00KiB
Metadata, DUP: total=97.00GiB, used=94.39GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

gargamel:/var/local/src/btrfs-progs# btrfs fi usage /mnt/btrfs_pool2
Overall:
Device size:   7.28TiB
Device allocated:  6.47TiB
Device unallocated:  824.48GiB
Device missing:  0.00B
Used:  6.30TiB
Free (estimated):   994.45GiB  (min: 582.21GiB)
Data ratio:   1.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

Data,single: Size:6.28TiB, Used:6.12TiB
   /dev/mapper/dshelf2 6.28TiB

Metadata,DUP: Size:97.00GiB, Used:94.39GiB
   /dev/mapper/dshelf2   194.00GiB

System,DUP: Size:32.00MiB, Used:720.00KiB
   /dev/mapper/dshelf2    64.00MiB

Unallocated:
   /dev/mapper/dshelf2   824.48GiB


> Further more, it's a shared data backref, it's using its parent tree block
> to do backref walk.
> 
> And its parent tree block is 7566652473344.
> While such bytenr can't be found anywhere (including csum error output),
> that's to say either we can't find that tree block nor can't reach the tree
> root for it.
> 
> Considering it's data extent, its owner is either root or fs/subvolume tree.
> 
> 
> Such cases are everywhere, as I found other extent sized from 4K to 44K, so
> I'm pretty sure there must be some fs/subvolume tree corrupted.
> (Data extent in root tree is seldom 4K sized)
> 
> So unfortunately, your fs/subvolume trees are also corrupted.
> And almost no chance to do a graceful recovery.
 
So I'm confused here. You're saying my metadata is not corrupted (and in
my case, I have DUP, so I should have 2 copies), but with data blocks
(which are not duped) corrupted, it's also possible to lose the
filesystem in a way that it can't be taken back to a clean state, even
by deleting some corrupted data?

> [Alternatives]
> I would recommend to use "btrfs restore -f " to restore specified
> subvolume.

I don't need to restore data, the data is a backup. It will just take
many days to recreate (plus many hours of typing from me because the
backup updates are automated, but recreating everything is not
automated).

So if I understand correctly, my metadata is fine (and I guess I have 2
copies, so it would have been unlucky to get both copies corrupted), but
enough data blocks got corrupted that btrfs cannot recover, even by
deleting the corrupted data blocks. Correct?

And is it not possible to clear the corrupted blocks like this?
./btrfs-corrupt-block -l  2899180224512 /dev/mapper/dshelf2
and just accept the lost data but get btrfs check repair to deal with
the deleted blocks and bring the rest back to a clean state?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-04 Thread Qu Wenruo



At 05/05/2017 09:19 AM, Qu Wenruo wrote:



At 05/02/2017 11:23 AM, Marc MERLIN wrote:

Hi Chris,

Thanks for the reply, much appreciated.

On Mon, May 01, 2017 at 07:50:22PM -0600, Chris Murphy wrote:
What about btrfs check (no repair), without and then also with 
--mode=lowmem?


In theory I like the idea of a 24 hour rollback; but in normal usage
Btrfs will eventually free up space containing stale and no longer
necessary metadata. Like the chunk tree, it's always changing, so you
get to a point, even with snapshots, that the old state of that tree
is just - gone. A snapshot of an fs tree does not make the chunk tree
frozen in time.

Right, of course, I was being way over optimistic here. I kind of forgot
that metadata wasn't COW, my bad.


In any case, it's a big problem in my mind if no existing tools can
fix a file system of this size. So before making any more changes, make
sure you have a btrfs-image somewhere, even if it's huge. The offline
checker needs to be able to repair it, right now it's all we have for
such a case.
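
(A hedged sketch of capturing such an image; btrfs-image records metadata
only, and the output path is a placeholder on some other disk:)

  btrfs-image -c9 -t4 /dev/mapper/dshelf2 /some/other/disk/dshelf2.btrfs-image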


The image will be huge, and take maybe 24H to make (last time it took
some silly amount of time like that), and honestly I'm not sure how
useful it'll be.
Outside of the kernel crashing if I do a btrfs balance, and hopefully
the crash report I gave is good enough, the state I'm in is not btrfs'
fault.

If I can't roll back to a reasonably working state, with data loss of a
known quantity that I can recover from backup, I'll have to destroy the
filesystem and recover from scratch, which will take multiple days.
Since I can't wait too long before getting back to a working state, I
think I'm going to try btrfs check --repair after a scrub to get a list
of all the pathnames/inodes that are known to be damaged, and work from
there.
Sounds reasonable?
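
(A hedged sketch of how the scrub half of that could be gathered; scrub
reports damaged inodes and paths to the kernel log, so something like this
should collect them:)

  btrfs scrub start -Bd /mnt/btrfs_pool2             # -B: run in foreground, -d: per-device stats
  dmesg | grep -E 'checksum error|unable to fixup'   # scrub warnings include inode and path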

Also, how is --mode=lowmem being useful?

And for re-parenting a sub-subvolume, is that possible?
(I want to delete /sub1/ but I can't because I have /sub1/sub2 that's also
a subvolume, and I'm not sure how to re-parent sub2 to somewhere else so
that I can subvolume delete sub1)

In the meantime, a simple check without repair looks like this. It will
likely take many hours to complete:
gargamel:/var/local/space# btrfs check /dev/mapper/dshelf2
Checking filesystem on /dev/mapper/dshelf2
UUID: 03e9a50c-1ae6-4782-ab9c-5f310a98e653
checking extents
checksum verify failed on 3096461459456 found 0E6B7980 wanted FBE5477A
checksum verify failed on 3096461459456 found 0E6B7980 wanted FBE5477A
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
bytenr mismatch, want=2899180224512, have=3981076597540270796
checksum verify failed on 1449488023552 found CECC36AF wanted 199FE6C5
checksum verify failed on 1449488023552 found CECC36AF wanted 199FE6C5
checksum verify failed on 1449544613888 found 895D691B wanted A0C64D2B
checksum verify failed on 1449544613888 found 895D691B wanted A0C64D2B
parent transid verify failed on 1671538819072 wanted 293964 found 293902
parent transid verify failed on 1671538819072 wanted 293964 found 293902
checksum verify failed on 1671603781632 found 18BC28D6 wanted 372655A0
checksum verify failed on 1671603781632 found 18BC28D6 wanted 372655A0
checksum verify failed on 1759425052672 found 843B59F1 wanted F0FF7D00
checksum verify failed on 1759425052672 found 843B59F1 wanted F0FF7D00
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2898779357184 found 96395131 wanted 433D6E09
checksum verify failed on 2898779357184 found 96395131 wanted 433D6E09
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
bytenr mismatch, want=2899180224512, have=3981076597540270796
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
(...)


Full output please.


Sorry for not noticing the link.

[Conclusion]
After checking the full result, some of fs/subvolume trees are corrupted.

[Details]
Some example here:

---
ref mismatch on [6674127745024 32768] extent item 0, found 1
Backref 6674127745024 parent 7566652473344 owner 0 offset 0 num_refs 0 
not found in extent tree
Incorrect local backref count on 6674127745024 parent 7566652473344 
owner 0 offset 0 found 1 wanted 0 back 

[PATCH] btrfs: cleanup qgroup trace event

2017-05-04 Thread Anand Jain
Commit 81fb6f77a026 (btrfs: qgroup: Add new trace point for
qgroup data reserve) added the following events which aren't used.
  btrfs__qgroup_data_map
  btrfs_qgroup_init_data_rsv_map
  btrfs_qgroup_free_data_rsv_map
So remove them.

Signed-off-by: Anand Jain 
  cc: quwen...@cn.fujitsu.com
Reviewed-by: Qu Wenruo 
---
 include/trace/events/btrfs.h | 36 
 1 file changed, 36 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index a3c3cab643a9..5471f9b4dc9e 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1270,42 +1270,6 @@ DEFINE_EVENT(btrfs__workqueue_done, 
btrfs_workqueue_destroy,
TP_ARGS(wq)
 );
 
-DECLARE_EVENT_CLASS(btrfs__qgroup_data_map,
-
-   TP_PROTO(struct inode *inode, u64 free_reserved),
-
-   TP_ARGS(inode, free_reserved),
-
-   TP_STRUCT__entry_btrfs(
-   __field(u64,rootid  )
-   __field(unsigned long,  ino )
-   __field(u64,free_reserved   )
-   ),
-
-   TP_fast_assign_btrfs(btrfs_sb(inode->i_sb),
-   __entry->rootid =   BTRFS_I(inode)->root->objectid;
-   __entry->ino=   inode->i_ino;
-   __entry->free_reserved  =   free_reserved;
-   ),
-
-   TP_printk_btrfs("rootid=%llu ino=%lu free_reserved=%llu",
- __entry->rootid, __entry->ino, __entry->free_reserved)
-);
-
-DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_init_data_rsv_map,
-
-   TP_PROTO(struct inode *inode, u64 free_reserved),
-
-   TP_ARGS(inode, free_reserved)
-);
-
-DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_free_data_rsv_map,
-
-   TP_PROTO(struct inode *inode, u64 free_reserved),
-
-   TP_ARGS(inode, free_reserved)
-);
-
 #define BTRFS_QGROUP_OPERATIONS\
{ QGROUP_RESERVE,   "reserve"   },  \
{ QGROUP_RELEASE,   "release"   },  \
-- 
2.10.0



Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-04 Thread Qu Wenruo



At 05/02/2017 11:23 AM, Marc MERLIN wrote:

Hi Chris,

Thanks for the reply, much appreciated.

On Mon, May 01, 2017 at 07:50:22PM -0600, Chris Murphy wrote:

What about btrfs check (no repair), without and then also with --mode=lowmem?

In theory I like the idea of a 24 hour rollback; but in normal usage
Btrfs will eventually free up space containing stale and no longer
necessary metadata. Like the chunk tree, it's always changing, so you
get to a point, even with snapshots, that the old state of that tree
is just - gone. A snapshot of an fs tree does not make the chunk tree
frozen in time.
  
Right, of course, I was being way over optimistic here. I kind of forgot
that metadata wasn't COW, my bad.


In any case, it's a big problem in my mind if no existing tools can
fix a file system of this size. So before making any more changes, make
sure you have a btrfs-image somewhere, even if it's huge. The offline
checker needs to be able to repair it, right now it's all we have for
such a case.


The image will be huge, and take maybe 24H to make (last time it took
some silly amount of time like that), and honestly I'm not sure how
useful it'll be.
Outside of the kernel crashing if I do a btrfs balance, and hopefully
the crash report I gave is good enough, the state I'm in is not btrfs'
fault.

If I can't roll back to a reasonably working state, with data loss of a
known quantity that I can recover from backup, I'll have to destroy the
filesystem and recover from scratch, which will take multiple days.
Since I can't wait too long before getting back to a working state, I
think I'm going to try btrfs check --repair after a scrub to get a list
of all the pathnames/inodes that are known to be damaged, and work from
there.
Sounds reasonable?

Also, how is --mode=lowmem being useful?

And for re-parenting a sub-subvolume, is that possible?
(I want to delete /sub1/ but I can't because I have /sub1/sub2 that's also a 
subvolume
and I'm not sure how to re-parent sub2 to somewhere else so that I can 
subvolume delete
sub1)

In the meantime, a simple check without repair looks like this. It will
likely take many hours to complete:
gargamel:/var/local/space# btrfs check /dev/mapper/dshelf2
Checking filesystem on /dev/mapper/dshelf2
UUID: 03e9a50c-1ae6-4782-ab9c-5f310a98e653
checking extents
checksum verify failed on 3096461459456 found 0E6B7980 wanted FBE5477A
checksum verify failed on 3096461459456 found 0E6B7980 wanted FBE5477A
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
bytenr mismatch, want=2899180224512, have=3981076597540270796
checksum verify failed on 1449488023552 found CECC36AF wanted 199FE6C5
checksum verify failed on 1449488023552 found CECC36AF wanted 199FE6C5
checksum verify failed on 1449544613888 found 895D691B wanted A0C64D2B
checksum verify failed on 1449544613888 found 895D691B wanted A0C64D2B
parent transid verify failed on 1671538819072 wanted 293964 found 293902
parent transid verify failed on 1671538819072 wanted 293964 found 293902
checksum verify failed on 1671603781632 found 18BC28D6 wanted 372655A0
checksum verify failed on 1671603781632 found 18BC28D6 wanted 372655A0
checksum verify failed on 1759425052672 found 843B59F1 wanted F0FF7D00
checksum verify failed on 1759425052672 found 843B59F1 wanted F0FF7D00
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2898779357184 found 96395131 wanted 433D6E09
checksum verify failed on 2898779357184 found 96395131 wanted 433D6E09
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
bytenr mismatch, want=2899180224512, have=3981076597540270796
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
(...)


Full output please.

I know it will be long, but the point here is, the full output could help
us at least locate where most of the corruption is.


If most of the corruption is only in the extent tree, the chance to recover
increases hugely.


As the extent tree is just backrefs for all allocated extents, it's not
really important if recovery (reading the data) is the primary goal.


But if other trees (fs or subvolume trees important to you) also got
corrupted, I'm afraid your last chance will be 

Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-04 Thread Qu Wenruo



At 05/02/2017 02:08 AM, Marc MERLIN wrote:

So, I forgot to mention that it's my main media and backup server that got
corrupted. Yes, I do actually have a backup of a backup server, but it's
going to take days to recover due to the amount of data to copy back, not
counting lots of manual typing due to the number of subvolumes, btrfs
send/receive relationships and so forth.

Really, I should be able to roll back all writes from the last 24H, run a
check --repair/scrub on top just to be sure, and be back on track.

In the meantime, the good news is that the filesystem doesn't crash the
kernel (the posted crash below) now that I was able to cancel the btrfs 
balance,
but it goes read only at the drop of a hat, even when I'm trying to delete
recent snapshots and all data that was potentially written in the last 24H

On Mon, May 01, 2017 at 10:06:41AM -0700, Marc MERLIN wrote:

I have a filesystem that sadly got corrupted by a SAS card I just installed 
yesterday.

I don't think in a case like this, there is there a way to roll back all
writes across all subvolumes in the last 24H, correct?


Sorry for the late reply.
I thought the case was already finished, as I see little chance to recover. :(

No, there is no way to roll back unless you're completely sure that only 1
transaction commit happened in the last 24H.

(Well, not really possible in the real world.)

Btrfs is only capable of rolling back to the *previous* commit.
That's ensured by forced metadata CoW.

But beyond the previous commit, only god knows.

If all metadata CoW writes went to locations never used by any previous
metadata, then there is a chance to recover.


But mostly the possibility is very low; some mount options like ssd change
the extent allocator behavior in a way that improves the odds, but it still
needs a lot of luck.
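
(For completeness, a hedged sketch of the usual previous-commit options --
read-only, and they only help if the older roots haven't been overwritten
yet:)

  mount -o ro,usebackuproot /dev/mapper/dshelf2 /mnt/btrfs_pool2   # try the backup roots stored in the superblock
  btrfs-find-root /dev/mapper/dshelf2                              # look for older tree roots that may still be intact
  # a surviving root bytenr can then be fed to btrfs restore via -t <bytenr>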


More detailed comment will be replied to btrfs check mail.

Thanks,
Qu



Is the best thing to go in each subvolume, delete the recent snapshots and
rename the one from 24H as the current one?
  
Well, just like I expected, it's a pain in the rear and this can't even help
fix the top level mountpoint which doesn't have snapshots, so I can't roll
it back.
btrfs should really have an easy way to roll back X hours or days to
recover from garbage written after a known good point, given that it is COW
after all.

Is there a way do this with check --repair maybe?

In the meantime, I got stuck while trying to delete snapshots:

Let's say I have this:
ID 428 gen 294021 top level 5 path backup
ID 2023 gen 294021 top level 5 path Soft
ID 3021 gen 294051 top level 428 path backup/debian32
ID 4400 gen 294018 top level 428 path backup/debian64
ID 4930 gen 294019 top level 428 path backup/ubuntu

I can easily
Delete subvolume (no-commit): '/mnt/btrfs_pool2/Soft'
and then:
gargamel:/mnt/btrfs_pool2# mv Soft_rw.20170430_01:50:22 Soft

But I can't delete backup, which actually is mostly only a directory
containing other things (in hindsight I shouldn't have made that a
subvolume)
Delete subvolume (no-commit): '/mnt/btrfs_pool2/backup'
ERROR: cannot delete '/mnt/btrfs_pool2/backup': Directory not empty

This is because backup has a lot of subvolumes due to btrfs send/receive
relationships.

Is it possible to recover there? Can you reparent subvolumes to a different
subvolume without doing a full copy via btrfs send/receive?
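
(One hedged possibility, assuming everything is on the same mounted
filesystem: a plain mv of a subvolume within one btrfs filesystem is just a
rename, so nothing is copied, and incremental send/receive tracks subvolumes
by UUID rather than by path, though scripts that use the old paths would
need updating:)

  mv /mnt/btrfs_pool2/backup/debian64 /mnt/btrfs_pool2/debian64   # reparent a nested subvolume by renaming it
  # repeat for the other nested subvolumes, then delete the parent
  # once it no longer contains subvolumes
  btrfs subvolume delete /mnt/btrfs_pool2/backup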

Thanks,
Marc


BTRFS warning (device dm-5): failed to load free space cache for block group 
6746013696000, rebuilding it now
BTRFS warning (device dm-5): block group 6754603630592 has wrong amount of free 
space
BTRFS warning (device dm-5): failed to load free space cache for block group 
6754603630592, rebuilding it now
BTRFS warning (device dm-5): block group 7125178777600 has wrong amount of free 
space
BTRFS warning (device dm-5): failed to load free space cache for block group 
7125178777600, rebuilding it now
BTRFS error (device dm-5): bad tree block start 3981076597540270796 
2899180224512
BTRFS error (device dm-5): bad tree block start 942082474969670243 2899180224512
BTRFS: error (device dm-5) in __btrfs_free_extent:6944: errno=-5 IO failure
BTRFS info (device dm-5): forced readonly
BTRFS: error (device dm-5) in btrfs_run_delayed_refs:2961: errno=-5 IO failure
BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: __del_reloc_root+0x3f/0xa6
PGD 189a0e067
PUD 189a0f067
PMD 0

Oops:  [#1] PREEMPT SMP
Modules linked in: veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev 
lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc 
ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT 
nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common 
xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio 
iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack 
x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm 

Re: btrfsck lowmem mode shows corruptions

2017-05-04 Thread Qu Wenruo



At 05/05/2017 01:29 AM, Kai Krakow wrote:

Hello!

Since I saw a few kernel freezes lately (due to experimenting with
ck-sources) including some filesystem-related backtraces, I booted my
rescue system to check my btrfs filesystem.

Luckily, it showed no problems. It said, everything's fine. But I also
thought: Okay, let's try lowmem mode. And that showed a frightening
long list of extent corruptions and unreferenced chunks. Should I worry?


Thanks for trying lowmem mode.

Would you please provide the version of btrfs-progs?

IIRC the "ERROR: data extent[96316809216 2097152] backref lost" bug has been 
fixed in a recent release.


And for reference, would you please provide the tree dump of your chunk 
and device tree?


This can be done by running:
# btrfs-debug-tree -t device 
# btrfs-debug-tree -t chunk 

And these 2 dumps only contain the btrfs chunk mapping info, so nothing 
sensitive is included.
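
(A hedged example of what that could look like, using the device path from
the check output below and redirecting to files since the dumps can get
long:)

  btrfs-debug-tree -t device /dev/disk/by-label/system > device-tree.txt
  btrfs-debug-tree -t chunk  /dev/disk/by-label/system > chunk-tree.txt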


Thanks,
Qu


PS: The freezes seem to be related to bfq, switching to deadline solved
these.

Full log attached, here's an excerpt:

---8<---

checking extents
ERROR: chunk[256 4324327424) stripe 0 did not find the related dev extent
ERROR: chunk[256 4324327424) stripe 1 did not find the related dev extent
ERROR: chunk[256 4324327424) stripe 2 did not find the related dev extent
ERROR: chunk[256 7545552896) stripe 0 did not find the related dev extent
ERROR: chunk[256 7545552896) stripe 1 did not find the related dev extent
ERROR: chunk[256 7545552896) stripe 2 did not find the related dev extent
[...]
ERROR: device extent[1, 1094713344, 1073741824] did not find the related chunk
ERROR: device extent[1, 2168455168, 1073741824] did not find the related chunk
ERROR: device extent[1, 3242196992, 1073741824] did not find the related chunk
[...]
ERROR: device extent[2, 608854605824, 1073741824] did not find the related chunk
ERROR: device extent[2, 609928347648, 1073741824] did not find the related chunk
ERROR: device extent[2, 611002089472, 1073741824] did not find the related chunk
[...]
ERROR: device extent[3, 64433946624, 1073741824] did not find the related chunk
ERROR: device extent[3, 65507688448, 1073741824] did not find the related chunk
ERROR: device extent[3, 66581430272, 1073741824] did not find the related chunk
[...]
ERROR: data extent[96316809216 2097152] backref lost
ERROR: data extent[96316809216 2097152] backref lost
ERROR: data extent[96316809216 2097152] backref lost
ERROR: data extent[686074396672 13737984] backref lost
ERROR: data extent[686074396672 13737984] backref lost
ERROR: data extent[686074396672 13737984] backref lost
[...]
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
ERROR: errors found in fs roots
Checking filesystem on /dev/disk/by-label/system
UUID: bc201ce5-8f2b-4263-995a-6641e89d4c88
found 1960075935744 bytes used, error(s) found
total csum bytes: 1673537040
total tree bytes: 4899094528
total fs tree bytes: 2793914368
total extent tree bytes: 190398464
btree space waste bytes: 871743708
file data blocks allocated: 6907169177600
  referenced 1979268648960






Re: [RFC] [PATCH] btrfs: clean up qgroup trace event

2017-05-04 Thread Qu Wenruo



At 05/04/2017 10:04 PM, Anand Jain wrote:

Hi Qu,

The commit 81fb6f77a026 (btrfs: qgroup: Add new trace point for
qgroup data reserve) added the following events which aren't used.
   btrfs__qgroup_data_map
   btrfs_qgroup_init_data_rsv_map
   btrfs_qgroup_free_data_rsv_map
I wonder if it is better to remove or keep it for future use.


Please remove them.

These 2 old tracepoints were never used due to a later patch split.
Some of the old callers never ended up existing.

Reviewed-by: Qu Wenruo 

Thanks for catching this,
Qu



Signed-off-by: Anand Jain 
cc: quwen...@cn.fujitsu.com
---
  include/trace/events/btrfs.h | 36 
  1 file changed, 36 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index a3c3cab643a9..5471f9b4dc9e 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1270,42 +1270,6 @@ DEFINE_EVENT(btrfs__workqueue_done, 
btrfs_workqueue_destroy,
TP_ARGS(wq)
  );
  
-DECLARE_EVENT_CLASS(btrfs__qgroup_data_map,

-
-   TP_PROTO(struct inode *inode, u64 free_reserved),
-
-   TP_ARGS(inode, free_reserved),
-
-   TP_STRUCT__entry_btrfs(
-   __field(u64,rootid  )
-   __field(unsigned long,  ino )
-   __field(u64,free_reserved   )
-   ),
-
-   TP_fast_assign_btrfs(btrfs_sb(inode->i_sb),
-   __entry->rootid  =   
BTRFS_I(inode)->root->objectid;
-   __entry->ino =   inode->i_ino;
-   __entry->free_reserved   =   free_reserved;
-   ),
-
-   TP_printk_btrfs("rootid=%llu ino=%lu free_reserved=%llu",
- __entry->rootid, __entry->ino, __entry->free_reserved)
-);
-
-DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_init_data_rsv_map,
-
-   TP_PROTO(struct inode *inode, u64 free_reserved),
-
-   TP_ARGS(inode, free_reserved)
-);
-
-DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_free_data_rsv_map,
-
-   TP_PROTO(struct inode *inode, u64 free_reserved),
-
-   TP_ARGS(inode, free_reserved)
-);
-
  #define BTRFS_QGROUP_OPERATIONS   \
{ QGROUP_RESERVE,   "reserve" },  \
{ QGROUP_RELEASE,   "release" },  \






Re: File system corruption, btrfsck abort

2017-05-04 Thread Qu Wenruo



At 05/03/2017 10:21 PM, Christophe de Dinechin wrote:



On 2 May 2017, at 02:17, Qu Wenruo  wrote:



At 04/28/2017 04:47 PM, Christophe de Dinechin wrote:

On 28 Apr 2017, at 02:45, Qu Wenruo  wrote:



At 04/26/2017 01:50 AM, Christophe de Dinechin wrote:

Hi,
I've been trying to run btrfs as my primary work filesystem for about 3-4 
months now on Fedora 25 systems. I ran a few times into filesystem corruptions. 
At least one I attributed to a damaged disk, but the last one is with a brand 
new 3T disk that reports no SMART errors. Worse yet, in at least three cases, 
the filesystem corruption caused btrfsck to crash.
The last filesystem corruption is documented here: 
https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.


According to the bugzilla, the btrfs-progs seems to be too old by btrfs 
standards.
What about using the latest btrfs-progs v4.10.2?

I tried 4.10.1-1 https://bugzilla.redhat.com/show_bug.cgi?id=1435567#c4.
I am currently debugging with a build from the master branch as of Tuesday 
(commit bd0ab27afbf14370f9f0da1f5f5ecbb0adc654c1), which is 4.10.2
There was no change in behavior. Runs are split about evenly between list crash 
and abort.
I added instrumentation and tried a fix, which brings me a tiny bit further, 
until I hit a message from delete_duplicate_records:
Ok we have overlapping extents that aren't completely covered by each
other, this is going to require more careful thought.  The extents are
[52428800-16384] and [52432896-16384]


Then I think lowmem mode may have a better chance of handling it without crashing.


I tried it and got:

[root@rescue ~]# /usr/local/bin/btrfsck --mode=lowmem --repair /dev/sda4
enabling repair mode
ERROR: low memory mode doesn't support repair yet

The problem only occurred in --repair mode anyway.





Furthermore, for v4.10.2, btrfs check provides a new mode called lowmem.
You could try "btrfs check --mode=lowmem" to see if such a problem can be avoided.

I will try that, but what makes you think this is a memory-related condition? 
The machine has 16G of RAM, isn’t that enough for an fsck?


Not for memory usage; in fact lowmem mode is a complete rework, so I just 
want to see how well or badly the new lowmem mode handles it.


Is there a prototype with lowmem and repair?


Yes, Su Yue submitted a patchset for it, but repair is still only 
supported for fs tree contents.

https://www.spinics.net/lists/linux-btrfs/msg63316.html

Repairing other trees, especially extent tree, is not supported yet.

Thanks,
Qu



Thanks
Christophe



Thanks,
Qu



For the kernel bug, it seems to be related to a wrongly inserted delayed ref, but 
I could totally be wrong.

For now, I’m focusing on the “repair” part as much as I can, because I assume 
the kernel bug is there anyway, so someone else is bound to hit this problem.
Thanks
Christophe


Thanks,
Qu

The btrfsck crash is here: https://bugzilla.redhat.com/show_bug.cgi?id=1435567. 
I have two crash modes: either an abort or a SIGSEGV. I checked that both still 
happens on master as of today.
The cause of the abort is that we call set_extent_dirty from check_extent_refs 
with rec->max_size == 0. I’ve instrumented to try to see where we set this to 0 
(see https://github.com/c3d/btrfs-progs/tree/rhbz1435567), and indeed, we do 
sometimes see max_size set to 0 in a few locations. My instrumentation shows this:
78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 max_size 
16384 tmpl 0x7fffd120
78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 from 
tmpl 0x7fffcf80
78660 [1.792244:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139ed50 max_size 
16384 tmpl 0x7fffd120
I don’t really know what to make of it.
The cause of the SIGSEGV is that we try to free a list entry that has its next 
set to NULL.
#0  list_del (entry=0x55db0420) at 
/usr/src/debug/btrfs-progs-v4.10.1/kernel-lib/list.h:125
#1  free_all_extent_backrefs (rec=0x55db0350) at cmds-check.c:5386
#2  maybe_free_extent_rec (extent_cache=0x7fffd990, rec=0x55db0350) at 
cmds-check.c:5417
#3  0x555b308f in check_block (flags=, 
buf=0x7b87cdf0, extent_cache=0x7fffd990, root=0x5587d570) at 
cmds-check.c:5851
#4  run_next_block (root=root@entry=0x5587d570, bits=bits@entry=0x558841
I don’t know if the two problems are related, but they seem to be pretty 
consistent on this specific disk, so I think that we have a good opportunity to 
improve btrfsck to make it more robust to this specific form of corruption. But 
I don't want to haphazardly modify code I don't really understand. So it 
would help if anybody could make a suggestion on what the right strategy 
should be when we have max_size == 0, or how to avoid it in the first place.
I don’t know if this is relevant at all, but all the machines that failed that 
way were used to run VMs with KVM/QEMU. Disk activity tends to be somewhat 
intense on occasions, since the 

btrfsck lowmem mode shows corruptions

2017-05-04 Thread Kai Krakow
Hello!

Since I saw a few kernel freezes lately (due to experimenting with
ck-sources) including some filesystem-related backtraces, I booted my
rescue system to check my btrfs filesystem.

Luckily, it showed no problems. It said, everything's fine. But I also
thought: Okay, let's try lowmem mode. And that showed a frightening
long list of extent corruptions and unreferenced chunks. Should I worry?

PS: The freezes seem to be related to bfq, switching to deadline solved
these.

Full log attached, here's an excerpt:

---8<---

checking extents
ERROR: chunk[256 4324327424) stripe 0 did not find the related dev extent
ERROR: chunk[256 4324327424) stripe 1 did not find the related dev extent
ERROR: chunk[256 4324327424) stripe 2 did not find the related dev extent
ERROR: chunk[256 7545552896) stripe 0 did not find the related dev extent
ERROR: chunk[256 7545552896) stripe 1 did not find the related dev extent
ERROR: chunk[256 7545552896) stripe 2 did not find the related dev extent
[...]
ERROR: device extent[1, 1094713344, 1073741824] did not find the related chunk
ERROR: device extent[1, 2168455168, 1073741824] did not find the related chunk
ERROR: device extent[1, 3242196992, 1073741824] did not find the related chunk
[...]
ERROR: device extent[2, 608854605824, 1073741824] did not find the related chunk
ERROR: device extent[2, 609928347648, 1073741824] did not find the related chunk
ERROR: device extent[2, 611002089472, 1073741824] did not find the related chunk
[...]
ERROR: device extent[3, 64433946624, 1073741824] did not find the related chunk
ERROR: device extent[3, 65507688448, 1073741824] did not find the related chunk
ERROR: device extent[3, 66581430272, 1073741824] did not find the related chunk
[...]
ERROR: data extent[96316809216 2097152] backref lost
ERROR: data extent[96316809216 2097152] backref lost
ERROR: data extent[96316809216 2097152] backref lost
ERROR: data extent[686074396672 13737984] backref lost
ERROR: data extent[686074396672 13737984] backref lost
ERROR: data extent[686074396672 13737984] backref lost
[...]
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
ERROR: errors found in fs roots
Checking filesystem on /dev/disk/by-label/system
UUID: bc201ce5-8f2b-4263-995a-6641e89d4c88
found 1960075935744 bytes used, error(s) found
total csum bytes: 1673537040
total tree bytes: 4899094528
total fs tree bytes: 2793914368
total extent tree bytes: 190398464
btree space waste bytes: 871743708
file data blocks allocated: 6907169177600
 referenced 1979268648960

-- 
Regards,
Kai

Replies to list-only preferred.

lowmem.txt.gz
Description: application/gzip


Re: [PATCH 9/9] btrfs-progs: modify: Introduce option to specify the pattern to fill mirror

2017-05-04 Thread David Sterba
On Sun, Apr 23, 2017 at 01:12:42PM +0530, Lakshmipathi.G wrote:
> Thanks for the example and details. I understood some and need to
> re-read couple of more times to understand the remaining.
> 
> btw, I created a corruption framework(with previous org), the sample
> usage and example is below. It looks similar to Btrfs corruption tool.
> thanks.
> 
> --
> corrupt.py --help
[...]

Interesting, can you please share the script? This is another
alternative that seems more plausible for rapid prototyping of various
corruption scenarios. The C utility (either existing btrfs-corrupt-block
or the proposed btrfs-modify) can become tedious to change, but can be
compiled and distributed without the python dependency.

I wanted to use something python-based for tests when Hans announced the
python-btrfs project, but it has broader goals than just the testsuite
needs.  So we could have our own corrupt.py, just for our internal use.

I'm not sure if a compiled tool like btrfs-modify is really needed, but
I don't see why we can't have both.


Re: Struggling with file system slowness

2017-05-04 Thread Duncan
Matt McKinnon posted on Thu, 04 May 2017 09:15:28 -0400 as excerpted:

> Hi All,
> 
> Trying to peg down why I have one server that has btrfs-transacti pegged
> at 100% CPU for most of the time.
> 
> I thought this might have to do with fragmentation as mentioned in the
> Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as
> mentioned in the wiki), but after running a full defrag of the file
> system, and also enabling the 'autodefrag' mount option, the problem
> still persists.
> 
> What's the best way to figure out what btrfs is chugging away at here?
> 
> Kernel: 4.10.13-custom
> btrfs-progs: v4.10.2

Headed for work so briefer than usual...

Three questions:

Number of snapshots per subvolume?

Quotas enabled?

Do you do dedupe or otherwise have lots of reflinks?


These dramatically affect scaling.  Keeping the number of snapshots per 
subvolume under 300, under 100 if possible, should help a lot.  Quotas 
dramatically worsen the problem, so keeping them disabled unless your use-
case calls for them should help (and if your use-case calls for them, 
consider a filesystem where the quota feature is more mature).  And 
reflinks are the mechanism behind snapshots, so too many of them for 
other reasons (such as dedupe) create problems too, tho a snapshot 
basically reflinks /everything/, so it takes quite a few reflinks to 
trigger the scaling issues of a single snapshot, meaning they aren't 
normally a problem unless dedupe is done on a /massive/ scale.
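
A hedged way to check the first two of those from the shell (the mount point 
is a placeholder):

  btrfs subvolume list -s /mnt/pool | wc -l   # rough count of snapshot subvolumes on the filesystem
  btrfs qgroup show /mnt/pool                 # errors out if quotas are not enabled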

Of course defrag interacts with snapshots too, tho it shouldn't affect 
/this/ problem, but potentially eating up more space than expected as it 
breaks the reflinks.


Beyond that, have you tried a (readonly) btrfs check and/or a scrub or 
balance recently?  Perhaps there's something wrong that's snagging 
things, and you simply haven't otherwise detected it yet?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Struggling with file system slowness

2017-05-04 Thread Peter Grandi
> Trying to peg down why I have one server that has
> btrfs-transacti pegged at 100% CPU for most of the time.

Too little information. Is IO happening at the same time? Is
compression on? Deduplicated? Lots of subvolumes? SSD? What kind
of workload and file size/distribution profile?

Typical high-CPU causes are extents (your defragging may not have
worked) and 'qgroups', especially with many subvolumes. It
could be the free space cache in some rare cases.
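
(A hedged first step to see what that thread is actually doing is to sample
its kernel stack a few times; repeated extent or qgroup functions point at
the usual suspects:)

  pid=$(pgrep -o btrfs-transacti)   # the kernel thread name is truncated to 15 chars in comm
  cat /proc/$pid/stack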

  
https://www.google.ca/search?num=100=images_q=cxpu_epq=btrfs-transaction

Judging by this, something like this happens often, but it is not
Btrfs-related; it is triggered for example by near-memory
exhaustion in the kernel memory manager.


[RFC] [PATCH] btrfs: clean up qgroup trace event

2017-05-04 Thread Anand Jain
Hi Qu,

The commit 81fb6f77a026 (btrfs: qgroup: Add new trace point for
qgroup data reserve) added the following events which aren't used.
  btrfs__qgroup_data_map
  btrfs_qgroup_init_data_rsv_map
  btrfs_qgroup_free_data_rsv_map
I wonder if it is better to remove or keep it for future use.

Signed-off-by: Anand Jain 
cc: quwen...@cn.fujitsu.com
---
 include/trace/events/btrfs.h | 36 
 1 file changed, 36 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index a3c3cab643a9..5471f9b4dc9e 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1270,42 +1270,6 @@ DEFINE_EVENT(btrfs__workqueue_done, 
btrfs_workqueue_destroy,
TP_ARGS(wq)
 );
 
-DECLARE_EVENT_CLASS(btrfs__qgroup_data_map,
-
-   TP_PROTO(struct inode *inode, u64 free_reserved),
-
-   TP_ARGS(inode, free_reserved),
-
-   TP_STRUCT__entry_btrfs(
-   __field(u64,rootid  )
-   __field(unsigned long,  ino )
-   __field(u64,free_reserved   )
-   ),
-
-   TP_fast_assign_btrfs(btrfs_sb(inode->i_sb),
-   __entry->rootid =   BTRFS_I(inode)->root->objectid;
-   __entry->ino=   inode->i_ino;
-   __entry->free_reserved  =   free_reserved;
-   ),
-
-   TP_printk_btrfs("rootid=%llu ino=%lu free_reserved=%llu",
- __entry->rootid, __entry->ino, __entry->free_reserved)
-);
-
-DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_init_data_rsv_map,
-
-   TP_PROTO(struct inode *inode, u64 free_reserved),
-
-   TP_ARGS(inode, free_reserved)
-);
-
-DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_free_data_rsv_map,
-
-   TP_PROTO(struct inode *inode, u64 free_reserved),
-
-   TP_ARGS(inode, free_reserved)
-);
-
 #define BTRFS_QGROUP_OPERATIONS\
{ QGROUP_RESERVE,   "reserve"   },  \
{ QGROUP_RELEASE,   "release"   },  \
-- 
2.10.0



Struggling with file system slowness

2017-05-04 Thread Matt McKinnon

Hi All,

Trying to peg down why I have one server that has btrfs-transacti pegged 
at 100% CPU for most of the time.


I thought this might have to do with fragmentation as mentioned in the 
Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as 
mentioned in the wiki), but after running a full defrag of the file 
system, and also enabling the 'autodefrag' mount option, the problem 
still persists.


What's the best way to figure out what btrfs is chugging away at here?

Kernel: 4.10.13-custom
btrfs-progs: v4.10.2


-Matt


Re: File system corruption, btrfsck abort

2017-05-04 Thread Christophe de Dinechin

> On 3 May 2017, at 16:21, Christophe de Dinechin  wrote:
> 
>> 
>> On 2 May 2017, at 02:17, Qu Wenruo  wrote:
>> 
>> 
>> 
>> At 04/28/2017 04:47 PM, Christophe de Dinechin wrote:
 On 28 Apr 2017, at 02:45, Qu Wenruo  wrote:
 
 
 
 At 04/26/2017 01:50 AM, Christophe de Dinechin wrote:
> Hi,
> I've been trying to run btrfs as my primary work filesystem for about 3-4 
> months now on Fedora 25 systems. I ran a few times into filesystem 
> corruptions. At least one I attributed to a damaged disk, but the last 
> one is with a brand new 3T disk that reports no SMART errors. Worse yet, 
> in at least three cases, the filesystem corruption caused btrfsck to 
> crash.
> The last filesystem corruption is documented here: 
> https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in 
> there.
 
 According to the bugzilla, the btrfs-progs seems to be too old in btrfs 
 standard.
 What about using the latest btrfs-progs v4.10.2?
>>> I tried 4.10.1-1 https://bugzilla.redhat.com/show_bug.cgi?id=1435567#c4.
>>> I am currently debugging with a build from the master branch as of Tuesday 
>>> (commit bd0ab27afbf14370f9f0da1f5f5ecbb0adc654c1), which is 4.10.2
>>> There was no change in behavior. Runs are split about evenly between list 
>>> crash and abort.
>>> I added instrumentation and tried a fix, which brings me a tiny bit 
>>> further, until I hit a message from delete_duplicate_records:
>>> Ok we have overlapping extents that aren't completely covered by each
>>> other, this is going to require more careful thought.  The extents are
>>> [52428800-16384] and [52432896-16384]
>> 
>> Then I think lowmem mode may have better chance to handle it without crash.
> 
> I tried it and got:
> 
> [root@rescue ~]# /usr/local/bin/btrfsck --mode=lowmem --repair /dev/sda4
> enabling repair mode
> ERROR: low memory mode doesn't support repair yet
> 
> The problem only occurred in --repair mode anyway.

For what it’s worth, without the --repair option, it gets stuck. I stopped it 
after 24 hours, it had printed:

[root@rescue ~]# /usr/local/bin/btrfsck --mode=lowmem  /dev/sda4
Checking filesystem on /dev/sda4
UUID: 26a0c84c-d2ac-4da8-b880-684f2ea48a22
checking extents
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
Csum didn't match
ERROR: extent [52428800 16384] lost referencer (owner: 7, level: 0)
checksum verify failed on 52445184 found 8D1BE62F wanted 
checksum verify failed on 52445184 found 8D1BE62F wanted 
checksum verify failed on 52445184 found 8D1BE62F wanted 
checksum verify failed on 52445184 found 8D1BE62F wanted 
bytenr mismatch, want=52445184, have=219902322
ERROR: extent [52445184 16384] lost referencer (owner: 2, level: 0)
ERROR: extent[52432896 16384] backref lost (owner: 2, level: 0)
ERROR: check leaf failed root 2 bytenr 52432896 level 0, force continue check

Any tips for further debugging this?


Christophe

> 
> 
>> 
 Furthermore for v4.10.2, btrfs check provides a new mode called lowmem.
 You could try "btrfs check --mode=lowmem" to see if such problem can be 
 avoided.
>>> I will try that, but what makes you think this is a memory-related 
>>> condition? The machine has 16G of RAM, isn’t that enough for an fsck?
>> 
>> Not for memory usage, but in fact lowmem mode is a completely rework, so I 
>> just want to see how good or bad the new lowmem mode handles it.
> 
> Is there a prototype with lowmem and repair?
> 
> 
> Thanks
> Christophe
> 
>> 
>> Thanks,
>> Qu
>> 
 
 For the kernel bug, it seems to be related to wrongly inserted delayed 
 ref, but I can totally be wrong.
>>> For now, I’m focusing on the “repair” part as much as I can, because I 
>>> assume the kernel bug is there anyway, so someone else is bound to hit this 
>>> problem.
>>> Thanks
>>> Christophe
 
 Thanks,
 Qu
> The btrfsck crash is here: 
> https://bugzilla.redhat.com/show_bug.cgi?id=1435567. I have two crash 
> modes: either an abort or a SIGSEGV. I checked that both still happens on 
> master as of today.
> The cause of the abort is that we call set_extent_dirty from 
> check_extent_refs with rec->max_size == 0. I’ve instrumented to try to 
> see where we set this to 0 (see 
> https://github.com/c3d/btrfs-progs/tree/rhbz1435567), and indeed, we do 
> sometimes see max_size set to 0 in a few locations. My instrumentation 
> shows this:
> 78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 
> max_size 16384 tmpl 0x7fffd120
> 78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 
> from tmpl 0x7fffcf80

help converting btrfs to new writeback error tracking?

2017-05-04 Thread Jeff Layton
I've been working on set of patches to clean up how writeback errors are
tracked and handled in the kernel:

http://marc.info/?l=linux-fsdevel=149304074111261=2

The basic idea is that rather than having a set of flags that are
cleared whenever they are checked, we have a sequence counter and error
that are tracked on a per-mapping basis, and can then use that sequence
counter to tell whether the error should be reported.

This changes the way that things like filemap_write_and_wait work.
Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared
inappropriately (and thus losing errors that should be reported), you
can now tell whether there has been a writeback error since a certain
point in time, irrespective of whether anyone else is checking for
errors.

I've been doing some conversions of the existing code to the new scheme,
but btrfs has _really_ complicated error handling. I think it could
probably be simplified with this new scheme, but I could use some help
here.

What I think we probably want to do is to sample the error sequence in
the mapping at well-defined points in time (probably when starting a
transaction?) and then use that to determine whether writeback errors
have occurred since then. Is there anyone in the btrfs community who
could help me here?

Thanks,
-- 
Jeff Layton 