On 2017-12-10 08:29, Qu Wenruo wrote:
>
>
> On 2017-12-10 07:12, Benjamin Beichler wrote:
>> Hi Qu,
>>
>> 2017-12-07 12:09 GMT+00:00 Qu Wenruo <quwenruo.bt...@gmx.com>:
>>>
>>> Since the btrfs chunk recovery doesn't work and my dirty quick hack
>>> doesn't work either, I don't expect much to be recovered, unless we
>>> have more detailed info about how and why the BUG_ON() in chunk
>>> recovery is triggered.
>>>
>>> That is to say, it will be quite time-consuming work to use gdb to
>>> locate the problem, and to see if any developer (mostly me) could use
>>> the info to dig further into the problem or fix it.
>>> (Considering the difference in timezone, I expect at least 8 weeks to
>>> reach a conclusion.)
>>
>> I'm really pleased that you want to help me; of course, the current
>> backtrace was quite useless.
>> First, I revised the code a bit: since one run over the 1.7TB drive
>> took about 6h, I thought about saving the state of the chunks already
>> found. I simply saved all valid bytenrs to a file. As a result, the
>> time for scan_one_device dropped to about 30s. If you think this could
>> be interesting for the normal version, I could create a patch for it.
>>
>>>
>>> If you really want to do it, please step into the function
>>> btrfs_insert_item() in __rebuild_device_items() and see at which
>>> point -EIO is returned.
>>>
>>> My guess is the btrfs_search_slot() call in btrfs_insert_empty_items().
>>>
>>> If that's true, please call
>>>
>>> btrfs_print_tree(root->fs_info->chunk_root,
>>> root->fs_info->chunk_root->node, 1)
>>>
>>> in gdb, just before the btrfs_search_slot() call above, to show what
>>> the problem is.
>>>
>> Your guess was right. The current stack trace and btrfs_print_tree
>> output are at: https://gist.github.com/anonymous/2cf40ac1d3ddcbca95177acec78041b2
>
> The output is very helpful.
>
> I was originally thinking it's something more serious, but it turns out
> to be less serious than I expected.
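For illustration only, the bytenr cache Benjamin describes earlier in the thread could look roughly like the minimal C sketch below. His actual change was not posted, so the file format (one decimal bytenr per line) and the helper names save_bytenrs() and load_bytenrs() are hypothetical assumptions, not part of btrfs-progs.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write the bytenr of every valid block found by a
 * full device scan to `path`, one decimal number per line.
 * Returns 0 on success, -1 on error. */
int save_bytenrs(const char *path, const uint64_t *bytenrs, size_t n)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	for (size_t i = 0; i < n; i++)
		fprintf(f, "%llu\n", (unsigned long long)bytenrs[i]);
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read the cached list back so a later run can
 * revisit only these bytenrs instead of rescanning the whole device.
 * Returns a malloc'd array (caller frees) and stores its length in *n,
 * or returns NULL on error. */
uint64_t *load_bytenrs(const char *path, size_t *n)
{
	FILE *f = fopen(path, "r");
	if (!f)
		return NULL;
	size_t cap = 64, len = 0;
	uint64_t *arr = malloc(cap * sizeof(*arr));
	unsigned long long v;
	while (arr && fscanf(f, "%llu", &v) == 1) {
		if (len == cap) {
			/* Grow the array geometrically as more entries arrive. */
			cap *= 2;
			uint64_t *tmp = realloc(arr, cap * sizeof(*arr));
			if (!tmp) {
				free(arr);
				arr = NULL;
				break;
			}
			arr = tmp;
		}
		arr[len++] = v;
	}
	fclose(f);
	if (arr)
		*n = len;
	return arr;
}
```

A text format keeps the cache easy to inspect by hand; dumping the raw u64 values in binary would be smaller and faster to reload if that ever matters.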
>
>>
>> As you can see, the code at disk-io.c:321 explicitly excludes the
>> sector from 0 to sectorsize and reports it as unaligned. I think this
>> triggers the problem because the code found a chunk/block at address
>> zero. Is it possible that chunks/blocks live at address 0, or is this
>> fuzzy data?
>
> 0 is completely valid in btrfs logical address space.
>
> It's the IS_ALIGNED macro which caused the problem.
> So it's quite easy to fix in fact.
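To make the distinction concrete, here is a minimal C sketch of the kind of guard being discussed. It assumes the check at disk-io.c:321 has the form "bytenr < sectorsize || !IS_ALIGNED(bytenr, sectorsize)" and that IS_ALIGNED() is the usual kernel-style power-of-two mask test; the two function names are hypothetical. Since 0 is a multiple of any sectorsize, IS_ALIGNED(0, sectorsize) is true, so it is the "bytenr < sectorsize" half that rejects bytenr 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed kernel-style definition: true when x is a multiple of the
 * power-of-two a. Note that 0 always passes. */
#define IS_ALIGNED(x, a) (((x) & ((uint64_t)(a) - 1)) == 0)

/* Hypothetical stand-in for the original guard: anything below
 * sectorsize is refused, so bytenr 0 fails even though it is aligned. */
bool bytenr_ok_original(uint64_t bytenr, uint32_t sectorsize)
{
	return !(bytenr < sectorsize || !IS_ALIGNED(bytenr, sectorsize));
}

/* Relaxed guard with the "bytenr < sectorsize" test dropped:
 * alignment is the only requirement, so logical address 0 is accepted. */
bool bytenr_ok_relaxed(uint64_t bytenr, uint32_t sectorsize)
{
	return IS_ALIGNED(bytenr, sectorsize);
}
```

With sectorsize 4096, bytenr 16384 passes both versions, 1000 fails both, and 0 fails only the original one.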

Sorry, IS_ALIGNED is working as expected. It's the bytenr < sectorsize
check that causes the problem.

Please remove the bytenr < sectorsize check; I'll submit a patch later
to fix it.

Thanks,
Qu

>
> For 0, always returning it as aligned should fix your problem.
>
> Thanks,
> Qu
>
>>
>>>
>>> BTW, currently nothing in the chunk tree/super block contains any info
>>> about your fs, so feel free to share it with the mailing list, where
>>> more people may help.
>>>
>> I added the list; I simply forgot it in one of my replies.
>>
>>> Thanks,
>>> Qu
>>>
>>
>> thanks
>>
>> Benjamin
>>