Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?
At 05/05/2017 10:40 AM, Marc MERLIN wrote:
> On Fri, May 05, 2017 at 09:19:29AM +0800, Qu Wenruo wrote:
>> Sorry for not noticing the link.
>
> No problem, it was only one line amongst many :)
> Thanks much for having had a look.
>
>> [Conclusion]
>> After checking the full result, some of the fs/subvolume trees are corrupted.
>>
>> [Details]
>> Some examples here:
>>
>> ---
>> ref mismatch on [6674127745024 32768] extent item 0, found 1
>> Backref 6674127745024 parent 7566652473344 owner 0 offset 0 num_refs 0 not found in extent tree
>> Incorrect local backref count on 6674127745024 parent 7566652473344 owner 0 offset 0 found 1 wanted 0 back 0x5648afda0f20
>> backpointer mismatch on [6674127745024 32768]
>> ---
>>
>> The extent at 6674127745024 seems to be a *DATA* extent.
>> The current default nodesize is 16K, and the ancient default was 4K.
>> Unless you specified -n 32K at mkfs time, it's a DATA extent.
>
> I did not, so you must be right about DATA. That should be good, right? I don't mind losing data as long as the underlying metadata is correct.
>
> I should have given more data on the FS (tons of metadata since the fs is so large):
>
> gargamel:/var/local/src/btrfs-progs# btrfs fi df /mnt/btrfs_pool2/
> Data, single: total=6.28TiB, used=6.12TiB
> System, DUP: total=32.00MiB, used=720.00KiB
> Metadata, DUP: total=97.00GiB, used=94.39GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> gargamel:/var/local/src/btrfs-progs# btrfs fi usage /mnt/btrfs_pool2
> Overall:
>     Device size:           7.28TiB
>     Device allocated:      6.47TiB
>     Device unallocated:  824.48GiB
>     Device missing:          0.00B
>     Used:                  6.30TiB
>     Free (estimated):    994.45GiB  (min: 582.21GiB)
>     Data ratio:               1.00
>     Metadata ratio:           2.00
>     Global reserve:      512.00MiB  (used: 0.00B)
>
> Data,single: Size:6.28TiB, Used:6.12TiB
>    /dev/mapper/dshelf2     6.28TiB
> Metadata,DUP: Size:97.00GiB, Used:94.39GiB
>    /dev/mapper/dshelf2   194.00GiB
> System,DUP: Size:32.00MiB, Used:720.00KiB
>    /dev/mapper/dshelf2    64.00MiB
> Unallocated:
>    /dev/mapper/dshelf2   824.48GiB

>> Furthermore, it's a shared data backref, so it's using its parent tree block to do the backref walk.
>> And its parent tree block is 7566652473344. That bytenr can't be found anywhere (including in the csum error output), which is to say we can neither find that tree block nor reach the tree root for it.
>>
>> Considering it's a data extent, its owner is either the root tree or a fs/subvolume tree.
>>
>> Such cases are everywhere; I found other extents sized from 4K to 44K, so I'm pretty sure some fs/subvolume tree is corrupted. (Data extents in the root tree are seldom 4K sized.)
>>
>> So unfortunately, your fs/subvolume trees are also corrupted, and there is almost no chance of a graceful recovery.
>
> So I'm confused here. You're saying my metadata is not corrupted (and in my case, I have DUP, so I should have 2 copies),

Nope, here I'm talking entirely about metadata (tree blocks). The difference is the owner: either the extent tree or a fs/subvolume tree. The fsck doesn't check data blocks.

> but with data blocks (which are not duped) corrupted, it's also possible to lose the filesystem in a way that it can't be taken back to a clean state, even by deleting some corrupted data?

No, it can't be repaired by deleting data. The problem is that the tree blocks (metadata) that refer to these data blocks are corrupted. And they are corrupted in such a way that both the extent tree (the tree containing extent allocation info) and the fs tree (the tree containing the real fs info, like inodes and data locations) are damaged. So graceful recovery is not possible now.

>> [Alternatives]
>> I would recommend using "btrfs restore -f " to restore a specified subvolume.
>
> I don't need to restore data, the data is a backup. It will just take many days to recreate (plus many hours of typing from me, because the backup updates are automated, but recreating everything is not).
>
> So if I understand correctly, my metadata is fine (and I guess I have 2 copies, so it would have been unlucky to get both copies corrupted), but enough data blocks got corrupted that btrfs cannot recover, even by deleting the corrupted data blocks. Correct?
Unfortunately, no. Even though you have 2 copies, a lot of tree blocks are corrupted such that neither copy matches its checksum. Just like the following tree block, where both copies have wrong checksums:
---
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
---

> And is it not possible to clear the corrupted blocks like this?
> ./btrfs-corrupt-block -l 2899180224512 /dev/mapper/dshelf2
> and just accept the lost data but get btrfs check repair to deal with the deleted blocks and bring the rest back to a clean state?

No, that won't help. Corrupted blocks are corrupted; that command would just corrupt them again. It won't do the black magic to adjust the tree
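The situation Qu describes can be sketched in shell: with DUP metadata the reader tries each mirror in turn and accepts the first copy whose checksum matches, so one bad copy is survivable, but when both copies fail verification (as for block 2899180224512 above) the block is gone for good. This is an illustration only; btrfs actually uses crc32c internally, and POSIX `cksum`'s CRC-32 stands in for it here so the sketch is runnable.

```shell
# csum: CRC of the given string (cksum's CRC-32, a stand-in for crc32c).
csum() { printf '%s' "$1" | cksum | cut -d' ' -f1; }

# read_dup_block WANTED COPY1 [COPY2...]: print the first copy whose
# checksum matches WANTED; fail if every mirror is corrupted.
read_dup_block() {
  wanted=$1; shift
  for copy in "$@"; do
    found=$(csum "$copy")
    if [ "$found" = "$wanted" ]; then printf '%s\n' "$copy"; return 0; fi
    echo "checksum verify failed: found $found wanted $wanted" >&2
  done
  return 1
}

good="tree block payload"
bad="Tree block payload"   # single-bit flip ('t' -> 'T'): CRC must differ
wanted=$(csum "$good")

read_dup_block "$wanted" "$bad" "$good"   # one good mirror left: succeeds
read_dup_block "$wanted" "$bad" "$bad" || echo "both copies bad: unrecoverable"
```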
Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?
On Fri, May 05, 2017 at 09:19:29AM +0800, Qu Wenruo wrote:
> Sorry for not noticing the link.

No problem, it was only one line amongst many :)
Thanks much for having had a look.

> [Conclusion]
> After checking the full result, some of the fs/subvolume trees are corrupted.
>
> [Details]
> Some examples here:
>
> ---
> ref mismatch on [6674127745024 32768] extent item 0, found 1
> Backref 6674127745024 parent 7566652473344 owner 0 offset 0 num_refs 0 not found in extent tree
> Incorrect local backref count on 6674127745024 parent 7566652473344 owner 0 offset 0 found 1 wanted 0 back 0x5648afda0f20
> backpointer mismatch on [6674127745024 32768]
> ---
>
> The extent at 6674127745024 seems to be a *DATA* extent.
> The current default nodesize is 16K, and the ancient default was 4K.
> Unless you specified -n 32K at mkfs time, it's a DATA extent.

I did not, so you must be right about DATA. That should be good, right? I don't mind losing data as long as the underlying metadata is correct.

I should have given more data on the FS:

gargamel:/var/local/src/btrfs-progs# btrfs fi df /mnt/btrfs_pool2/
Data, single: total=6.28TiB, used=6.12TiB
System, DUP: total=32.00MiB, used=720.00KiB
Metadata, DUP: total=97.00GiB, used=94.39GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

gargamel:/var/local/src/btrfs-progs# btrfs fi usage /mnt/btrfs_pool2
Overall:
    Device size:           7.28TiB
    Device allocated:      6.47TiB
    Device unallocated:  824.48GiB
    Device missing:          0.00B
    Used:                  6.30TiB
    Free (estimated):    994.45GiB  (min: 582.21GiB)
    Data ratio:               1.00
    Metadata ratio:           2.00
    Global reserve:      512.00MiB  (used: 0.00B)

Data,single: Size:6.28TiB, Used:6.12TiB
   /dev/mapper/dshelf2     6.28TiB
Metadata,DUP: Size:97.00GiB, Used:94.39GiB
   /dev/mapper/dshelf2   194.00GiB
System,DUP: Size:32.00MiB, Used:720.00KiB
   /dev/mapper/dshelf2    64.00MiB
Unallocated:
   /dev/mapper/dshelf2   824.48GiB

> Furthermore, it's a shared data backref, so it's using its parent tree block to do the backref walk.
> And its parent tree block is 7566652473344. That bytenr can't be found anywhere (including in the csum error output), which is to say we can neither find that tree block nor reach the tree root for it.
>
> Considering it's a data extent, its owner is either the root tree or a fs/subvolume tree.
>
> Such cases are everywhere; I found other extents sized from 4K to 44K, so I'm pretty sure some fs/subvolume tree is corrupted.
> (Data extents in the root tree are seldom 4K sized.)
>
> So unfortunately, your fs/subvolume trees are also corrupted, and there is almost no chance of a graceful recovery.

So I'm confused here. You're saying my metadata is not corrupted (and in my case, I have DUP, so I should have 2 copies), but with data blocks (which are not duped) corrupted, it's also possible to lose the filesystem in a way that it can't be taken back to a clean state, even by deleting some corrupted data?

> [Alternatives]
> I would recommend using "btrfs restore -f " to restore a specified subvolume.

I don't need to restore data, the data is a backup. It will just take many days to recreate (plus many hours of typing from me, because the backup updates are automated, but recreating everything is not).

So if I understand correctly, my metadata is fine (and I guess I have 2 copies, so it would have been unlucky to get both copies corrupted), but enough data blocks got corrupted that btrfs cannot recover, even by deleting the corrupted data blocks. Correct?

And is it not possible to clear the corrupted blocks like this?
./btrfs-corrupt-block -l 2899180224512 /dev/mapper/dshelf2
and just accept the lost data but get btrfs check repair to deal with the deleted blocks and bring the rest back to a clean state?

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
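Qu's suggested fallback above, `btrfs restore`, pulls readable files off an unmountable filesystem without writing to it. A minimal dry-run sketch (the destination path `/mnt/recovery` is hypothetical, and the command is echoed rather than executed):

```shell
# Dry-run sketch of the "btrfs restore" fallback discussed in this thread.
DEV=/dev/mapper/dshelf2   # the device from the thread
DEST=/mnt/recovery        # hypothetical destination with enough free space
# -v lists files as they are restored; -i ignores errors and keeps going
# past damaged extents instead of aborting on the first bad block.
CMD="btrfs restore -v -i $DEV $DEST"
echo "$CMD"
```

Drop the `echo` (run `$CMD` directly) once the destination actually exists and has room for the data.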
Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?
At 05/05/2017 09:19 AM, Qu Wenruo wrote:
> At 05/02/2017 11:23 AM, Marc MERLIN wrote:
>> Hi Chris,
>>
>> Thanks for the reply, much appreciated.
>>
>> On Mon, May 01, 2017 at 07:50:22PM -0600, Chris Murphy wrote:
>>> What about btrfs check (no repair), without and then also with --mode=lowmem?
>>>
>>> In theory I like the idea of a 24 hour rollback; but in normal usage Btrfs will eventually free up space containing stale and no longer necessary metadata. Like the chunk tree, it's always changing, so you get to a point, even with snapshots, that the old state of that tree is just - gone. A snapshot of an fs tree does not make the chunk tree frozen in time.
>>
>> Right, of course, I was being way over optimistic here. I kind of forgot that metadata wasn't COW, my bad. In any case, it's a big problem in my mind if no existing tools can fix a filesystem of this size.
>>
>>> So before making any more changes, make sure you have a btrfs-image somewhere, even if it's huge. The offline checker needs to be able to repair it, right now it's all we have for such a case.
>>
>> The image will be huge, and take maybe 24H to make (last time it took some silly amount of time like that), and honestly I'm not sure how useful it'll be. Outside of the kernel crashing if I do a btrfs balance, and hopefully the crash report I gave is good enough, the state I'm in is not btrfs' fault.
>> If I can't roll back to a reasonably working state, with data loss of a known quantity that I can recover from backup, I'll have to destroy the filesystem and recover from scratch, which will take multiple days.
>>
>> Since I can't wait too long before getting back to a working state, I think I'm going to try btrfs check --repair after a scrub to get a list of all the pathnames/inodes that are known to be damaged, and work from there. Sounds reasonable? Also, how is --mode=lowmem being useful?
>>
>> And for re-parenting a sub-subvolume, is that possible?
>> (I want to delete /sub1/ but I can't, because I have /sub1/sub2 that's also a subvolume, and I'm not sure how to re-parent sub2 to somewhere else so that I can subvolume delete sub1)
>>
>> In the meantime, a simple check without repair looks like this. It will likely take many hours to complete:
>>
>> gargamel:/var/local/space# btrfs check /dev/mapper/dshelf2
>> Checking filesystem on /dev/mapper/dshelf2
>> UUID: 03e9a50c-1ae6-4782-ab9c-5f310a98e653
>> checking extents
>> checksum verify failed on 3096461459456 found 0E6B7980 wanted FBE5477A
>> checksum verify failed on 3096461459456 found 0E6B7980 wanted FBE5477A
>> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
>> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
>> checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
>> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
>> bytenr mismatch, want=2899180224512, have=3981076597540270796
>> checksum verify failed on 1449488023552 found CECC36AF wanted 199FE6C5
>> checksum verify failed on 1449488023552 found CECC36AF wanted 199FE6C5
>> checksum verify failed on 1449544613888 found 895D691B wanted A0C64D2B
>> checksum verify failed on 1449544613888 found 895D691B wanted A0C64D2B
>> parent transid verify failed on 1671538819072 wanted 293964 found 293902
>> parent transid verify failed on 1671538819072 wanted 293964 found 293902
>> checksum verify failed on 1671603781632 found 18BC28D6 wanted 372655A0
>> checksum verify failed on 1671603781632 found 18BC28D6 wanted 372655A0
>> checksum verify failed on 1759425052672 found 843B59F1 wanted F0FF7D00
>> checksum verify failed on 1759425052672 found 843B59F1 wanted F0FF7D00
>> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
>> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
>> checksum verify failed on 2898779357184 found 96395131 wanted 433D6E09
>> checksum verify failed on 2898779357184 found 96395131 wanted 433D6E09
>> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
>> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
>> checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
>> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
>> bytenr mismatch, want=2899180224512, have=3981076597540270796
>> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
>> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
>> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
>> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
>> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
>> (...)
>
> Full output please.

Sorry for not noticing the link.

[Conclusion]
After checking the full result, some of the fs/subvolume trees are corrupted.

[Details]
Some examples here:

---
ref mismatch on [6674127745024 32768] extent item 0, found 1
Backref 6674127745024 parent 7566652473344 owner 0 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 6674127745024 parent 7566652473344 owner 0 offset 0 found 1 wanted 0 back
[PATCH] btrfs: cleanup qgroup trace event
Commit 81fb6f77a026 ("btrfs: qgroup: Add new trace point for qgroup data reserve") added the following events, which aren't used:

  btrfs__qgroup_data_map
  btrfs_qgroup_init_data_rsv_map
  btrfs_qgroup_free_data_rsv_map

So remove them.

Signed-off-by: Anand Jain
cc: quwen...@cn.fujitsu.com
Reviewed-by: Qu Wenruo
---
 include/trace/events/btrfs.h | 36 ------------------------------------
 1 file changed, 36 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index a3c3cab643a9..5471f9b4dc9e 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1270,42 +1270,6 @@ DEFINE_EVENT(btrfs__workqueue_done, btrfs_workqueue_destroy,
 	TP_ARGS(wq)
 );
 
-DECLARE_EVENT_CLASS(btrfs__qgroup_data_map,
-
-	TP_PROTO(struct inode *inode, u64 free_reserved),
-
-	TP_ARGS(inode, free_reserved),
-
-	TP_STRUCT__entry_btrfs(
-		__field(u64,		rootid		)
-		__field(unsigned long,	ino		)
-		__field(u64,		free_reserved	)
-	),
-
-	TP_fast_assign_btrfs(btrfs_sb(inode->i_sb),
-		__entry->rootid		= BTRFS_I(inode)->root->objectid;
-		__entry->ino		= inode->i_ino;
-		__entry->free_reserved	= free_reserved;
-	),
-
-	TP_printk_btrfs("rootid=%llu ino=%lu free_reserved=%llu",
-		__entry->rootid, __entry->ino, __entry->free_reserved)
-);
-
-DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_init_data_rsv_map,
-
-	TP_PROTO(struct inode *inode, u64 free_reserved),
-
-	TP_ARGS(inode, free_reserved)
-);
-
-DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_free_data_rsv_map,
-
-	TP_PROTO(struct inode *inode, u64 free_reserved),
-
-	TP_ARGS(inode, free_reserved)
-);
-
 #define BTRFS_QGROUP_OPERATIONS				\
 	{ QGROUP_RESERVE,	"reserve" },		\
 	{ QGROUP_RELEASE,	"release" },		\
--
2.10.0
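A quick way to confirm such an event really is dead before removing it: each `DEFINE_EVENT(class, name, ...)` generates a `trace_<name>()` call site, so an event with no `trace_<name>` occurrences in any .c file is unused. The sketch below runs the check against a tiny stand-in string rather than a real kernel tree, and the `trace_btrfs_qgroup_reserve` caller in it is a made-up example, not an actual call site from the kernel:

```shell
# Stand-in "source corpus"; in a real check you would grep the tree, e.g.
#   grep -rn 'trace_btrfs_qgroup_init_data_rsv_map' fs/ include/
SRC='trace_btrfs_qgroup_reserve(inode, len);'

still_used=""
for ev in btrfs_qgroup_init_data_rsv_map btrfs_qgroup_free_data_rsv_map btrfs_qgroup_reserve; do
  case "$SRC" in
    # A caller exists: the tracepoint is live, keep it.
    *"trace_${ev}"*) echo "$ev: still has callers"; still_used="$still_used $ev" ;;
    # No trace_<name> anywhere: dead event, safe to delete.
    *)               echo "$ev: unused, safe to remove" ;;
  esac
done
```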
Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?
At 05/02/2017 11:23 AM, Marc MERLIN wrote:
> Hi Chris,
>
> Thanks for the reply, much appreciated.
>
> On Mon, May 01, 2017 at 07:50:22PM -0600, Chris Murphy wrote:
>> What about btrfs check (no repair), without and then also with --mode=lowmem?
>>
>> In theory I like the idea of a 24 hour rollback; but in normal usage Btrfs will eventually free up space containing stale and no longer necessary metadata. Like the chunk tree, it's always changing, so you get to a point, even with snapshots, that the old state of that tree is just - gone. A snapshot of an fs tree does not make the chunk tree frozen in time.
>
> Right, of course, I was being way over optimistic here. I kind of forgot that metadata wasn't COW, my bad. In any case, it's a big problem in my mind if no existing tools can fix a filesystem of this size.
>
>> So before making any more changes, make sure you have a btrfs-image somewhere, even if it's huge. The offline checker needs to be able to repair it, right now it's all we have for such a case.
>
> The image will be huge, and take maybe 24H to make (last time it took some silly amount of time like that), and honestly I'm not sure how useful it'll be. Outside of the kernel crashing if I do a btrfs balance, and hopefully the crash report I gave is good enough, the state I'm in is not btrfs' fault.
> If I can't roll back to a reasonably working state, with data loss of a known quantity that I can recover from backup, I'll have to destroy the filesystem and recover from scratch, which will take multiple days.
>
> Since I can't wait too long before getting back to a working state, I think I'm going to try btrfs check --repair after a scrub to get a list of all the pathnames/inodes that are known to be damaged, and work from there. Sounds reasonable? Also, how is --mode=lowmem being useful?
>
> And for re-parenting a sub-subvolume, is that possible?
> (I want to delete /sub1/ but I can't, because I have /sub1/sub2 that's also a subvolume, and I'm not sure how to re-parent sub2 to somewhere else so that I can subvolume delete sub1)
>
> In the meantime, a simple check without repair looks like this. It will likely take many hours to complete:
>
> gargamel:/var/local/space# btrfs check /dev/mapper/dshelf2
> Checking filesystem on /dev/mapper/dshelf2
> UUID: 03e9a50c-1ae6-4782-ab9c-5f310a98e653
> checking extents
> checksum verify failed on 3096461459456 found 0E6B7980 wanted FBE5477A
> checksum verify failed on 3096461459456 found 0E6B7980 wanted FBE5477A
> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
> checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
> bytenr mismatch, want=2899180224512, have=3981076597540270796
> checksum verify failed on 1449488023552 found CECC36AF wanted 199FE6C5
> checksum verify failed on 1449488023552 found CECC36AF wanted 199FE6C5
> checksum verify failed on 1449544613888 found 895D691B wanted A0C64D2B
> checksum verify failed on 1449544613888 found 895D691B wanted A0C64D2B
> parent transid verify failed on 1671538819072 wanted 293964 found 293902
> parent transid verify failed on 1671538819072 wanted 293964 found 293902
> checksum verify failed on 1671603781632 found 18BC28D6 wanted 372655A0
> checksum verify failed on 1671603781632 found 18BC28D6 wanted 372655A0
> checksum verify failed on 1759425052672 found 843B59F1 wanted F0FF7D00
> checksum verify failed on 1759425052672 found 843B59F1 wanted F0FF7D00
> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
> checksum verify failed on 2898779357184 found 96395131 wanted 433D6E09
> checksum verify failed on 2898779357184 found 96395131 wanted 433D6E09
> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
> checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
> checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
> bytenr mismatch, want=2899180224512, have=3981076597540270796
> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
> checksum verify failed on 2182657212416 found CD8EFC0C wanted 70847071
> (...)

Full output please.

I know it will be long, but the point here is that the full output could help us at least locate where most of the corruption is.

If most of the corruption is only in the extent tree, the chance of recovery increases hugely. As the extent tree is just a backref for all allocated extents, it's not really important if recovery (read) is the primary goal.

But if another tree (a fs or subvolume tree important to you) is also corrupted, I'm afraid your last chance will be
Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?
At 05/02/2017 02:08 AM, Marc MERLIN wrote:
> So, I forgot to mention that it's my main media and backup server that got corrupted. Yes, I do actually have a backup of a backup server, but it's going to take days to recover due to the amount of data to copy back, not counting lots of manual typing due to the number of subvolumes, btrfs send/receive relationships and so forth.
> Really, I should be able to roll back all writes from the last 24H, run a check --repair/scrub on top just to be sure, and be back on track.
>
> In the meantime, the good news is that the filesystem doesn't crash the kernel (the posted crash below) now that I was able to cancel the btrfs balance, but it goes read only at the drop of a hat, even when I'm trying to delete recent snapshots and all data that was potentially written in the last 24H.
>
> On Mon, May 01, 2017 at 10:06:41AM -0700, Marc MERLIN wrote:
>> I have a filesystem that sadly got corrupted by a SAS card I just installed yesterday.
>> I don't think in a case like this there is a way to roll back all writes across all subvolumes in the last 24H, correct?

Sorry for the late reply. I thought the case was already finished, as I see little chance to recover. :(

No, there is no way to roll back unless you're completely sure only 1 transaction commit happened in the last 24H. (Well, not really possible in the real world.)

Btrfs is only capable of rolling back to the *previous* commit. That's ensured by forced metadata CoW. But beyond the previous commit, only god knows.

If all metadata CoW writes were done in places never used by any previous metadata, then there is a chance to recover. But mostly the possibility is very low; some mount options like ssd change the extent allocator behavior in ways that improve the odds, but it still needs a lot of luck.

More detailed comments will be in the reply to the btrfs check mail.

Thanks,
Qu

> Is the best thing to go in each subvolume, delete the recent snapshots, and rename the one from 24H ago as the current one?
> Well, just like I expected, it's a pain in the rear, and this can't even help fix the top level mountpoint, which doesn't have snapshots, so I can't roll it back.
> btrfs should really have an easy way to roll back X hours or days, to recover from garbage written after a known good point, given that it is COW after all. Is there a way to do this with check --repair maybe?
>
> In the meantime, I got stuck while trying to delete snapshots. Let's say I have this:
>
> ID 428  gen 294021 top level 5   path backup
> ID 2023 gen 294021 top level 5   path Soft
> ID 3021 gen 294051 top level 428 path backup/debian32
> ID 4400 gen 294018 top level 428 path backup/debian64
> ID 4930 gen 294019 top level 428 path backup/ubuntu
>
> I can easily delete Soft:
> Delete subvolume (no-commit): '/mnt/btrfs_pool2/Soft'
> and then:
> gargamel:/mnt/btrfs_pool2# mv Soft_rw.20170430_01:50:22 Soft
>
> But I can't delete backup, which actually is mostly only a directory containing other things (in hindsight I shouldn't have made that a subvolume):
> Delete subvolume (no-commit): '/mnt/btrfs_pool2/backup'
> ERROR: cannot delete '/mnt/btrfs_pool2/backup': Directory not empty
>
> This is because backup has a lot of subvolumes due to btrfs send/receive relationships. Is it possible to recover there? Can you reparent subvolumes to a different subvolume without doing a full copy via btrfs send/receive?
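There is no in-place reparent operation, but a common workaround is to snapshot each nested subvolume to a new location, delete the nested original, and then delete the emptied parent. A dry-run sketch using the subvolume names from the listing above (the commands are only collected and echoed, not executed; note that re-snapshotting received subvolumes can break incremental send/receive chains, so this trades the receive relationship for the ability to delete the parent):

```shell
POOL=/mnt/btrfs_pool2
cmds=""
for sub in debian32 debian64 ubuntu; do
  # Snapshot each nested subvolume out of backup/, then drop the original.
  cmds="$cmds
btrfs subvolume snapshot $POOL/backup/$sub $POOL/$sub
btrfs subvolume delete $POOL/backup/$sub"
done
# Once no nested subvolumes remain, the parent itself can be deleted.
cmds="$cmds
btrfs subvolume delete $POOL/backup"
echo "$cmds"
```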
> Thanks,
> Marc

> BTRFS warning (device dm-5): failed to load free space cache for block group 6746013696000, rebuilding it now
> BTRFS warning (device dm-5): block group 6754603630592 has wrong amount of free space
> BTRFS warning (device dm-5): failed to load free space cache for block group 6754603630592, rebuilding it now
> BTRFS warning (device dm-5): block group 7125178777600 has wrong amount of free space
> BTRFS warning (device dm-5): failed to load free space cache for block group 7125178777600, rebuilding it now
> BTRFS error (device dm-5): bad tree block start 3981076597540270796 2899180224512
> BTRFS error (device dm-5): bad tree block start 942082474969670243 2899180224512
> BTRFS: error (device dm-5) in __btrfs_free_extent:6944: errno=-5 IO failure
> BTRFS info (device dm-5): forced readonly
> BTRFS: error (device dm-5) in btrfs_run_delayed_refs:2961: errno=-5 IO failure
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: __del_reloc_root+0x3f/0xa6
> PGD 189a0e067 PUD 189a0f067 PMD 0
> Oops: [#1] PREEMPT SMP
> Modules linked in: veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm
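Since only the previous commit is reachable through the superblock, the closest existing tooling to a rollback is hunting for an older tree root that happens to survive on disk. A dry-run sketch (the commands are echoed, not run; `<bytenr>` is a placeholder for a value you would pick from btrfs-find-root's output, and success depends entirely on the old blocks not having been overwritten, as Qu explains above):

```shell
DEV=/dev/mapper/dshelf2
# btrfs-find-root scans the device for candidate tree roots from older
# generations and prints their bytenrs.
FIND_CMD="btrfs-find-root $DEV"
# btrfs restore -t tries to read files as seen from one candidate root.
RESTORE_CMD="btrfs restore -t <bytenr> -v $DEV /mnt/recovery"
echo "$FIND_CMD"
echo "$RESTORE_CMD"
```

Even when a candidate root passes its checksum, the blocks below it may have been reused, so treat anything restored this way as best-effort salvage rather than a consistent rollback.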
Re: btrfsck lowmem mode shows corruptions
At 05/05/2017 01:29 AM, Kai Krakow wrote:
> Hello!
>
> Since I saw a few kernel freezes lately (due to experimenting with ck-sources), including some filesystem-related backtraces, I booted my rescue system to check my btrfs filesystem.
>
> Luckily, it showed no problems. It said everything's fine. But I also thought: okay, let's try lowmem mode. And that showed a frighteningly long list of extent corruptions and unreferenced chunks.
>
> Should I worry?

Thanks for trying lowmem mode.

Would you please provide the version of btrfs-progs? IIRC the "ERROR: data extent[96316809216 2097152] backref lost" bug has been fixed in a recent release.

And for reference, would you please provide the tree dumps of your chunk and device trees? This can be done by running:

# btrfs-debug-tree -t device <device>
# btrfs-debug-tree -t chunk <device>

These 2 dumps only contain the btrfs chunk mapping info, so nothing sensitive is included.

Thanks,
Qu

> PS: The freezes seem to be related to bfq; switching to deadline solved these.
>
> Full log attached, here's an excerpt:
>
> ---8<---
> checking extents
> ERROR: chunk[256 4324327424) stripe 0 did not find the related dev extent
> ERROR: chunk[256 4324327424) stripe 1 did not find the related dev extent
> ERROR: chunk[256 4324327424) stripe 2 did not find the related dev extent
> ERROR: chunk[256 7545552896) stripe 0 did not find the related dev extent
> ERROR: chunk[256 7545552896) stripe 1 did not find the related dev extent
> ERROR: chunk[256 7545552896) stripe 2 did not find the related dev extent
> [...]
> ERROR: device extent[1, 1094713344, 1073741824] did not find the related chunk
> ERROR: device extent[1, 2168455168, 1073741824] did not find the related chunk
> ERROR: device extent[1, 3242196992, 1073741824] did not find the related chunk
> [...]
> ERROR: device extent[2, 608854605824, 1073741824] did not find the related chunk
> ERROR: device extent[2, 609928347648, 1073741824] did not find the related chunk
> ERROR: device extent[2, 611002089472, 1073741824] did not find the related chunk
> [...]
> ERROR: device extent[3, 64433946624, 1073741824] did not find the related chunk
> ERROR: device extent[3, 65507688448, 1073741824] did not find the related chunk
> ERROR: device extent[3, 66581430272, 1073741824] did not find the related chunk
> [...]
> ERROR: data extent[96316809216 2097152] backref lost
> ERROR: data extent[96316809216 2097152] backref lost
> ERROR: data extent[96316809216 2097152] backref lost
> ERROR: data extent[686074396672 13737984] backref lost
> ERROR: data extent[686074396672 13737984] backref lost
> ERROR: data extent[686074396672 13737984] backref lost
> [...]
> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache
> checking fs roots
> ERROR: errors found in fs roots
> Checking filesystem on /dev/disk/by-label/system
> UUID: bc201ce5-8f2b-4263-995a-6641e89d4c88
> found 1960075935744 bytes used, error(s) found
> total csum bytes: 1673537040
> total tree bytes: 4899094528
> total fs tree bytes: 2793914368
> total extent tree bytes: 190398464
> btree space waste bytes: 871743708
> file data blocks allocated: 6907169177600
>  referenced 1979268648960
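Capturing the two dumps Qu asked for might look like this (dry run, with the commands echoed rather than executed; the output filenames are arbitrary choices, not anything the thread specifies):

```shell
DEV=/dev/disk/by-label/system   # the device from Kai's check output
# Dump the device tree and chunk tree; these contain only chunk mapping
# info (no file names or data), so they are safe to share on the list.
DEV_CMD="btrfs-debug-tree -t device $DEV"
CHUNK_CMD="btrfs-debug-tree -t chunk $DEV"
echo "$DEV_CMD > device-tree.txt"
echo "$CHUNK_CMD > chunk-tree.txt"
```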
Re: [RFC] [PATCH] btrfs: clean up qgroup trace event
At 05/04/2017 10:04 PM, Anand Jain wrote:
> Hi Qu,
>
> The commit 81fb6f77a026 (btrfs: qgroup: Add new trace point for qgroup data reserve) added the following events which aren't used.
>   btrfs__qgroup_data_map
>   btrfs_qgroup_init_data_rsv_map
>   btrfs_qgroup_free_data_rsv_map
> I wonder if it is better to remove them or keep them for future use.

Please remove them. These 2 old tracepoints were never used due to a later patch split; some of the old callers don't even exist anymore.

Reviewed-by: Qu Wenruo

Thanks for catching this,
Qu

> Signed-off-by: Anand Jain
> cc: quwen...@cn.fujitsu.com
> ---
>  include/trace/events/btrfs.h | 36 ------------------------------------
>  1 file changed, 36 deletions(-)
>
> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
> index a3c3cab643a9..5471f9b4dc9e 100644
> --- a/include/trace/events/btrfs.h
> +++ b/include/trace/events/btrfs.h
> @@ -1270,42 +1270,6 @@ DEFINE_EVENT(btrfs__workqueue_done, btrfs_workqueue_destroy,
>  	TP_ARGS(wq)
>  );
>  
> -DECLARE_EVENT_CLASS(btrfs__qgroup_data_map,
> -
> -	TP_PROTO(struct inode *inode, u64 free_reserved),
> -
> -	TP_ARGS(inode, free_reserved),
> -
> -	TP_STRUCT__entry_btrfs(
> -		__field(u64,		rootid		)
> -		__field(unsigned long,	ino		)
> -		__field(u64,		free_reserved	)
> -	),
> -
> -	TP_fast_assign_btrfs(btrfs_sb(inode->i_sb),
> -		__entry->rootid		= BTRFS_I(inode)->root->objectid;
> -		__entry->ino		= inode->i_ino;
> -		__entry->free_reserved	= free_reserved;
> -	),
> -
> -	TP_printk_btrfs("rootid=%llu ino=%lu free_reserved=%llu",
> -		__entry->rootid, __entry->ino, __entry->free_reserved)
> -);
> -
> -DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_init_data_rsv_map,
> -
> -	TP_PROTO(struct inode *inode, u64 free_reserved),
> -
> -	TP_ARGS(inode, free_reserved)
> -);
> -
> -DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_free_data_rsv_map,
> -
> -	TP_PROTO(struct inode *inode, u64 free_reserved),
> -
> -	TP_ARGS(inode, free_reserved)
> -);
> -
>  #define BTRFS_QGROUP_OPERATIONS				\
>  	{ QGROUP_RESERVE,	"reserve" },		\
>  	{ QGROUP_RELEASE,	"release" },		\
Re: File system corruption, btrfsck abort
At 05/03/2017 10:21 PM, Christophe de Dinechin wrote:
> On 2 May 2017, at 02:17, Qu Wenruo wrote:
>> At 04/28/2017 04:47 PM, Christophe de Dinechin wrote:
>>> On 28 Apr 2017, at 02:45, Qu Wenruo wrote:
>>>> At 04/26/2017 01:50 AM, Christophe de Dinechin wrote:
>>>>> Hi,
>>>>>
>>>>> I've been trying to run btrfs as my primary work filesystem for about 3-4 months now on Fedora 25 systems. I ran a few times into filesystem corruptions. At least one I attributed to a damaged disk, but the last one is with a brand new 3T disk that reports no SMART errors. Worse yet, in at least three cases, the filesystem corruption caused btrfsck to crash.
>>>>>
>>>>> The last filesystem corruption is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.
>>>>
>>>> According to the bugzilla, the btrfs-progs seems to be too old by btrfs standards. What about using the latest btrfs-progs v4.10.2?
>>>
>>> I tried 4.10.1-1 https://bugzilla.redhat.com/show_bug.cgi?id=1435567#c4. I am currently debugging with a build from the master branch as of Tuesday (commit bd0ab27afbf14370f9f0da1f5f5ecbb0adc654c1), which is 4.10.2. There was no change in behavior. Runs are split about evenly between list crash and abort.
>>>
>>> I added instrumentation and tried a fix, which brings me a tiny bit further, until I hit a message from delete_duplicate_records: "Ok we have overlapping extents that aren't completely covered by each other, this is going to require more careful thought." The extents are [52428800-16384] and [52432896-16384].
>>
>> Then I think lowmem mode may have a better chance to handle it without crashing.
>
> I tried it and got:
>
> [root@rescue ~]# /usr/local/bin/btrfsck --mode=lowmem --repair /dev/sda4
> enabling repair mode
> ERROR: low memory mode doesn't support repair yet
>
> The problem only occurred in --repair mode anyway.

>>>> Furthermore, for v4.10.2, btrfs check provides a new mode called lowmem. You could try "btrfs check --mode=lowmem" to see if such a problem can be avoided.
I will try that, but what makes you think this is a memory-related condition? The machine has 16G of RAM, isn't that enough for an fsck?

Not for memory usage, but in fact lowmem mode is a complete rework, so I just want to see how well or badly the new lowmem mode handles it.

Is there a prototype with lowmem and repair?

Yes, Su Yue submitted a patchset for it, but repair is still only supported for fs tree contents. https://www.spinics.net/lists/linux-btrfs/msg63316.html Repairing other trees, especially the extent tree, is not supported yet.

Thanks, Qu

Thanks, Christophe

Thanks, Qu

For the kernel bug, it seems to be related to a wrongly inserted delayed ref, but I can totally be wrong. For now, I'm focusing on the "repair" part as much as I can, because I assume the kernel bug is there anyway, so someone else is bound to hit this problem.

Thanks, Christophe

Thanks, Qu

The btrfsck crash is here: https://bugzilla.redhat.com/show_bug.cgi?id=1435567. I have two crash modes: either an abort or a SIGSEGV. I checked that both still happen on master as of today. The cause of the abort is that we call set_extent_dirty from check_extent_refs with rec->max_size == 0. I've instrumented to try to see where we set this to 0 (see https://github.com/c3d/btrfs-progs/tree/rhbz1435567), and indeed, we do sometimes see max_size set to 0 in a few locations. My instrumentation shows this:

78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 max_size 16384 tmpl 0x7fffd120
78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 from tmpl 0x7fffcf80
78660 [1.792244:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139ed50 max_size 16384 tmpl 0x7fffd120

I don't really know what to make of it. The cause of the SIGSEGV is that we try to free a list entry that has its next set to NULL.
#0 list_del (entry=0x55db0420) at /usr/src/debug/btrfs-progs-v4.10.1/kernel-lib/list.h:125
#1 free_all_extent_backrefs (rec=0x55db0350) at cmds-check.c:5386
#2 maybe_free_extent_rec (extent_cache=0x7fffd990, rec=0x55db0350) at cmds-check.c:5417
#3 0x555b308f in check_block (flags=, buf=0x7b87cdf0, extent_cache=0x7fffd990, root=0x5587d570) at cmds-check.c:5851
#4 run_next_block (root=root@entry=0x5587d570, bits=bits@entry=0x558841

I don't know if the two problems are related, but they seem to be pretty consistent on this specific disk, so I think we have a good opportunity to improve btrfsck to make it more robust against this specific form of corruption. But I don't want to haphazardly modify code I don't really understand. So if anybody could make a suggestion on what the right strategy should be when we have max_size == 0, or how to avoid it in the first place. I don't know if this is relevant at all, but all the machines that failed that way were used to run VMs with KVM/QEMU. Disk activity tends to be somewhat intense on occasions, since the
btrfsck lowmem mode shows corruptions
Hello! Since I saw a few kernel freezes lately (due to experimenting with ck-sources), including some filesystem-related backtraces, I booted my rescue system to check my btrfs filesystem. Luckily, it showed no problems. It said everything's fine. But I also thought: okay, let's try lowmem mode. And that showed a frighteningly long list of extent corruptions and unreferenced chunks. Should I worry?

PS: The freezes seem to be related to bfq; switching to deadline solved these.

Full log attached, here's an excerpt:
---8<---
checking extents
ERROR: chunk[256 4324327424) stripe 0 did not find the related dev extent
ERROR: chunk[256 4324327424) stripe 1 did not find the related dev extent
ERROR: chunk[256 4324327424) stripe 2 did not find the related dev extent
ERROR: chunk[256 7545552896) stripe 0 did not find the related dev extent
ERROR: chunk[256 7545552896) stripe 1 did not find the related dev extent
ERROR: chunk[256 7545552896) stripe 2 did not find the related dev extent
[...]
ERROR: device extent[1, 1094713344, 1073741824] did not find the related chunk
ERROR: device extent[1, 2168455168, 1073741824] did not find the related chunk
ERROR: device extent[1, 3242196992, 1073741824] did not find the related chunk
[...]
ERROR: device extent[2, 608854605824, 1073741824] did not find the related chunk
ERROR: device extent[2, 609928347648, 1073741824] did not find the related chunk
ERROR: device extent[2, 611002089472, 1073741824] did not find the related chunk
[...]
ERROR: device extent[3, 64433946624, 1073741824] did not find the related chunk
ERROR: device extent[3, 65507688448, 1073741824] did not find the related chunk
ERROR: device extent[3, 66581430272, 1073741824] did not find the related chunk
[...]
ERROR: data extent[96316809216 2097152] backref lost
ERROR: data extent[96316809216 2097152] backref lost
ERROR: data extent[96316809216 2097152] backref lost
ERROR: data extent[686074396672 13737984] backref lost
ERROR: data extent[686074396672 13737984] backref lost
ERROR: data extent[686074396672 13737984] backref lost
[...]
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
ERROR: errors found in fs roots
Checking filesystem on /dev/disk/by-label/system
UUID: bc201ce5-8f2b-4263-995a-6641e89d4c88
found 1960075935744 bytes used, error(s) found
total csum bytes: 1673537040
total tree bytes: 4899094528
total fs tree bytes: 2793914368
total extent tree bytes: 190398464
btree space waste bytes: 871743708
file data blocks allocated: 6907169177600 referenced 1979268648960

--
Regards, Kai
Replies to list-only preferred.

lowmem.txt.gz
Description: application/gzip
Re: [PATCH 9/9] btrfs-progs: modify: Introduce option to specify the pattern to fill mirror
On Sun, Apr 23, 2017 at 01:12:42PM +0530, Lakshmipathi.G wrote:
> Thanks for the example and details. I understood some and need to
> re-read couple of more times to understand the remaining.
>
> btw, I created a corruption framework (with previous org), the sample
> usage and example is below. It looks similar to Btrfs corruption tool.
> thanks.
>
> --
> corrupt.py --help
[...]

Interesting, can you please share the script? This is another alternative that seems more plausible for rapid prototyping of various corruption scenarios. The C utility (either the existing btrfs-corrupt-block or the proposed btrfs-modify) can become tedious to change, but can be compiled and distributed without the python dependency.

I wanted to use something python-based for tests when Hans announced the python-btrfs project, but it has broader goals than just the testsuite needs. So we could have our own corrupt.py, just for our internal use. I'm not sure if a compiled tool like btrfs-modify is really needed, but why can't we have both.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Struggling with file system slowness
Matt McKinnon posted on Thu, 04 May 2017 09:15:28 -0400 as excerpted:

> Hi All,
>
> Trying to peg down why I have one server that has btrfs-transacti pegged
> at 100% CPU for most of the time.
>
> I thought this might have to do with fragmentation as mentioned in the
> Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as
> mentioned in the wiki), but after running a full defrag of the file
> system, and also enabling the 'autodefrag' mount option, the problem
> still persists.
>
> What's the best way to figure out what btrfs is chugging away at here?
>
> Kernel: 4.10.13-custom
> btrfs-progs: v4.10.2

Headed for work so briefer than usual...

Three questions: Number of snapshots per subvolume? Quotas enabled? Do you do dedupe or otherwise have lots of reflinks? These dramatically affect scaling.

Keeping the number of snapshots per subvolume under 300, under 100 if possible, should help a lot. Quotas dramatically worsen the problem, so keeping them disabled unless your use-case calls for them should help (and if your use-case calls for them, consider a filesystem where the quota feature is more mature). And reflinks are the mechanism behind snapshots, so too many of them for other reasons (such as dedupe) create problems too, tho a snapshot basically reflinks /everything/, so it takes quite a few reflinks to trigger the scaling issues of a single snapshot, meaning they aren't normally a problem unless dedupe is done on a /massive/ scale.

Of course defrag interacts with snapshots too, tho it shouldn't affect /this/ problem, but potentially eating up more space than expected as it breaks the reflinks.

Beyond that, have you tried a (readonly) btrfs check and/or a scrub or balance recently? Perhaps there's something wrong that's snagging things, and you simply haven't otherwise detected it yet?

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master."
Richard Stallman
Re: Struggling with file system slowness
> Trying to peg down why I have one server that has
> btrfs-transacti pegged at 100% CPU for most of the time.

Too little information. Is IO happening at the same time? Is compression on? Deduplicated? Lots of subvolumes? SSD? What kind of workload and file size/distribution profile?

Typical high-CPU causes are extents (your defragging may not have worked) and 'qgroups', especially with many subvolumes. It could be the free space cache in some rare cases. https://www.google.ca/search?num=100=images_q=cxpu_epq=btrfs-transaction

Something like this happens often but is not Btrfs-related, being triggered for example by near-memory exhaustion in the kernel memory manager.
[RFC] [PATCH] btrfs: clean up qgroup trace event
Hi Qu,

The commit 81fb6f77a026 (btrfs: qgroup: Add new trace point for qgroup data reserve) added the following events which aren't used.

  btrfs__qgroup_data_map
  btrfs_qgroup_init_data_rsv_map
  btrfs_qgroup_free_data_rsv_map

I wonder if it is better to remove or keep it for future use.

Signed-off-by: Anand Jain
cc: quwen...@cn.fujitsu.com
---
 include/trace/events/btrfs.h | 36
 1 file changed, 36 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index a3c3cab643a9..5471f9b4dc9e 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1270,42 +1270,6 @@ DEFINE_EVENT(btrfs__workqueue_done, btrfs_workqueue_destroy,
 	TP_ARGS(wq)
 );

-DECLARE_EVENT_CLASS(btrfs__qgroup_data_map,
-
-	TP_PROTO(struct inode *inode, u64 free_reserved),
-
-	TP_ARGS(inode, free_reserved),
-
-	TP_STRUCT__entry_btrfs(
-		__field(u64, rootid)
-		__field(unsigned long, ino)
-		__field(u64, free_reserved)
-	),
-
-	TP_fast_assign_btrfs(btrfs_sb(inode->i_sb),
-		__entry->rootid = BTRFS_I(inode)->root->objectid;
-		__entry->ino = inode->i_ino;
-		__entry->free_reserved = free_reserved;
-	),
-
-	TP_printk_btrfs("rootid=%llu ino=%lu free_reserved=%llu",
-		__entry->rootid, __entry->ino, __entry->free_reserved)
-);
-
-DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_init_data_rsv_map,
-
-	TP_PROTO(struct inode *inode, u64 free_reserved),
-
-	TP_ARGS(inode, free_reserved)
-);
-
-DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_free_data_rsv_map,
-
-	TP_PROTO(struct inode *inode, u64 free_reserved),
-
-	TP_ARGS(inode, free_reserved)
-);
-
 #define BTRFS_QGROUP_OPERATIONS\
 	{ QGROUP_RESERVE, "reserve" }, \
 	{ QGROUP_RELEASE, "release" }, \
--
2.10.0
Struggling with file system slowness
Hi All,

Trying to peg down why I have one server that has btrfs-transacti pegged at 100% CPU for most of the time.

I thought this might have to do with fragmentation as mentioned in the Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as mentioned in the wiki), but after running a full defrag of the file system, and also enabling the 'autodefrag' mount option, the problem still persists.

What's the best way to figure out what btrfs is chugging away at here?

Kernel: 4.10.13-custom
btrfs-progs: v4.10.2

-Matt
Re: File system corruption, btrfsck abort
> On 3 May 2017, at 16:21, Christophe de Dinechin wrote:
>
>> On 2 May 2017, at 02:17, Qu Wenruo wrote:
>>
>> At 04/28/2017 04:47 PM, Christophe de Dinechin wrote:
>>> On 28 Apr 2017, at 02:45, Qu Wenruo wrote:
>>> At 04/26/2017 01:50 AM, Christophe de Dinechin wrote:
>>>> Hi,
>>>> I've been trying to run btrfs as my primary work filesystem for about 3-4 months now on Fedora 25 systems. I ran a few times into filesystem corruptions. At least one I attributed to a damaged disk, but the last one is with a brand new 3T disk that reports no SMART errors. Worse yet, in at least three cases, the filesystem corruption caused btrfsck to crash.
>>>> The last filesystem corruption is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.
>>> According to the bugzilla, the btrfs-progs seems to be too old in btrfs standard. What about using the latest btrfs-progs v4.10.2?
>>> I tried 4.10.1-1 https://bugzilla.redhat.com/show_bug.cgi?id=1435567#c4. I am currently debugging with a build from the master branch as of Tuesday (commit bd0ab27afbf14370f9f0da1f5f5ecbb0adc654c1), which is 4.10.2.
>>> There was no change in behavior. Runs are split about evenly between list crash and abort.
>>> I added instrumentation and tried a fix, which brings me a tiny bit further, until I hit a message from delete_duplicate_records:
>>> Ok we have overlapping extents that aren't completely covered by each other, this is going to require more careful thought. The extents are [52428800-16384] and [52432896-16384]
>>
>> Then I think lowmem mode may have better chance to handle it without crash.
>
> I tried it and got:
>
> [root@rescue ~]# /usr/local/bin/btrfsck --mode=lowmem --repair /dev/sda4
> enabling repair mode
> ERROR: low memory mode doesn't support repair yet
>
> The problem only occurred in --repair mode anyway.

For what it's worth, without the --repair option, it gets stuck.
I stopped it after 24 hours; it had printed:

[root@rescue ~]# /usr/local/bin/btrfsck --mode=lowmem /dev/sda4
Checking filesystem on /dev/sda4
UUID: 26a0c84c-d2ac-4da8-b880-684f2ea48a22
checking extents
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
Csum didn't match
ERROR: extent [52428800 16384] lost referencer (owner: 7, level: 0)
checksum verify failed on 52445184 found 8D1BE62F wanted
checksum verify failed on 52445184 found 8D1BE62F wanted
checksum verify failed on 52445184 found 8D1BE62F wanted
checksum verify failed on 52445184 found 8D1BE62F wanted
bytenr mismatch, want=52445184, have=219902322
ERROR: extent [52445184 16384] lost referencer (owner: 2, level: 0)
ERROR: extent[52432896 16384] backref lost (owner: 2, level: 0)
ERROR: check leaf failed root 2 bytenr 52432896 level 0, force continue check

Any tips for further debugging this?

Christophe

>> Furthermore for v4.10.2, btrfs check provides a new mode called lowmem. You could try "btrfs check --mode=lowmem" to see if such problem can be avoided.
> I will try that, but what makes you think this is a memory-related condition? The machine has 16G of RAM, isn't that enough for an fsck?
>> Not for memory usage, but in fact lowmem mode is a complete rework, so I just want to see how good or bad the new lowmem mode handles it.
> Is there a prototype with lowmem and repair?
>
> Thanks
> Christophe
>
>> Thanks,
>> Qu
>>> For the kernel bug, it seems to be related to wrongly inserted delayed ref, but I can totally be wrong.
>>> For now, I'm focusing on the "repair" part as much as I can, because I assume the kernel bug is there anyway, so someone else is bound to hit this problem.
>>> Thanks >>> Christophe Thanks, Qu > The btrfsck crash is here: > https://bugzilla.redhat.com/show_bug.cgi?id=1435567. I have two crash > modes: either an abort or a SIGSEGV. I checked that both still happens on > master as of today. > The cause of the abort is that we call set_extent_dirty from > check_extent_refs with rec->max_size == 0. I’ve instrumented to try to > see where we set this to 0 (see > https://github.com/c3d/btrfs-progs/tree/rhbz1435567), and indeed, we do > sometimes see max_size set to 0 in a few locations. My instrumentation > shows this: > 78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 > max_size 16384 tmpl 0x7fffd120 > 78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 > from tmpl 0x7fffcf80
help converting btrfs to new writeback error tracking?
I've been working on a set of patches to clean up how writeback errors are tracked and handled in the kernel: http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2

The basic idea is that rather than having a set of flags that are cleared whenever they are checked, we have a sequence counter and error that are tracked on a per-mapping basis, and can then use that sequence counter to tell whether the error should be reported. This changes the way that things like filemap_write_and_wait work. Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared inappropriately (and thus losing errors that should be reported), you can now tell whether there has been a writeback error since a certain point in time, irrespective of whether anyone else is checking for errors.

I've been doing some conversions of the existing code to the new scheme, but btrfs has _really_ complicated error handling. I think it could probably be simplified with this new scheme, but I could use some help here. What I think we probably want to do is to sample the error sequence in the mapping at well-defined points in time (probably when starting a transaction?) and then use that to determine whether writeback errors have occurred since then. Is there anyone in the btrfs community who could help me here?

Thanks,
--
Jeff Layton