Re: Chunk-Recovery fails with alignment error

Qu Wenruo Sat, 09 Dec 2017 16:29:33 -0800


On 2017-12-10 07:12, Benjamin Beichler wrote:
> Hi Qu,
> 
> 2017-12-07 12:09 GMT+00:00 Qu Wenruo <quwenruo.bt...@gmx.com>:
>>
>> Since the btrfs chunk recovery doesn't work and my dirty quick hack
>> doesn't work either, I don't expect much to be recovered.
>>
>> Unless we have more detailed info about how and why the BUG_ON() in
>> chunk recovery is triggered.
>>
>> That is to say, it will be quite time-consuming work to use gdb to
>> locate the problem and see whether any developer (mostly me) could use
>> the info to dig further into the problem or fix it.
>> (Considering the difference in timezone, I expect it to take at least
>> 8 weeks to reach a conclusion.)
> 
> I'm really pleased that you want to help me; of course, the current
> backtrace was quite useless.
> First, I revised the code a bit: since one run over the 1.7 TB
> drive took about 6 h, I thought about saving the state of the chunks
> already found. I simply saved all valid bytenr values to a file. As a
> result, scan_one_device now takes only about 30 s. If you think this
> could be interesting for the upstream version, I could create a patch
> for it.
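The caching idea in the quoted paragraph can be sketched in C roughly as follows. This is a minimal illustration, not Benjamin's actual patch or any btrfs-progs API: `save_bytenr_cache()` and `load_bytenr_cache()` are hypothetical helpers, and the file format (raw native-endian `uint64_t` values stored back to back) is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: persist the valid tree-block bytenrs found by one full
 * device scan so that a later run can skip the multi-hour scan.
 * Assumed format: raw native-endian uint64_t values, back to back.
 * Returns 0 on success, -1 on error. */
static int save_bytenr_cache(const char *path, const uint64_t *bytenrs, size_t n)
{
    assert(path != NULL);
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t written = fwrite(bytenrs, sizeof(*bytenrs), n, f);
    int close_ret = fclose(f); /* fclose also flushes; a flush error counts too */
    return (written == n && close_ret == 0) ? 0 : -1;
}

/* Hypothetical helper: read every cached bytenr from path into a malloc'ed
 * array stored in *out. Returns the number of entries, or -1 on error
 * (in which case *out is left untouched). The caller frees *out. */
static long load_bytenr_cache(const char *path, uint64_t **out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t cap = 1024, n = 0;
    uint64_t *buf = malloc(cap * sizeof(*buf));
    if (!buf) {
        fclose(f);
        return -1;
    }
    uint64_t v;
    /* Read one 8-byte value at a time, growing the array as needed. */
    while (fread(&v, sizeof(v), 1, f) == 1) {
        if (n == cap) {
            cap *= 2;
            uint64_t *tmp = realloc(buf, cap * sizeof(*buf));
            if (!tmp) {
                free(buf);
                fclose(f);
                return -1;
            }
            buf = tmp;
        }
        buf[n++] = v;
    }
    /* A read error (as opposed to a clean EOF) invalidates the cache. */
    if (ferror(f)) {
        free(buf);
        fclose(f);
        return -1;
    }
    fclose(f);
    *out = buf;
    return (long)n;
}
```

Because the values are written in native byte order with no header, such a cache is only meaningful on the machine that wrote it and for the same, unchanged device; a real patch would presumably need some way to detect a stale cache.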
> 
>>
>> If you really want to do it, please step into btrfs_insert_item() in
>> __rebuild_device_items() and see at which point -EIO is returned.
>>
>> My guess is the btrfs_search_slot() call in btrfs_insert_empty_items().
>>
>> If that's true, please call
>>
>> btrfs_print_tree(root->fs_info->chunk_root, root->fs_info->chunk_root->node, 1)
>>
>> in gdb, just before the btrfs_search_slot() call above, to show what the
>> problem is.
>>
> Your guess was right. The current stack trace and btrfs_print_tree output
> are at: https://gist.github.com/anonymous/2cf40ac1d3ddcbca95177acec78041b2


The output is very helpful.

I originally thought it was something more serious, but it turns out
to be less serious than I expected.

> 
> As you can see, the code at disk-io.c:321 explicitly excludes the
> range from 0 to sectorsize and reports it as unaligned. I think the
> problem is triggered because the code found a chunk/block at address
> zero. Is it possible for chunks/blocks to live at address 0, or is
> this just garbage data?

0 is a completely valid address in the btrfs logical address space.

It's the IS_ALIGNED macro that caused the problem, so it's actually
quite easy to fix.

Always treating 0 as aligned should fix your problem.
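To make the suggested fix concrete, here is a small C sketch contrasting the two behaviours. It is an illustration under stated assumptions, not the actual btrfs-progs code: `is_aligned_u64()`, `bytenr_ok_strict()` and `bytenr_ok_fixed()` are hypothetical names, `sectorsize` is assumed to be a power of two, and the strict variant only models the "reject everything below sectorsize" behaviour described in the quoted message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical power-of-two alignment test (not the btrfs-progs macro).
 * When a is a power of two, x & (a - 1) is the remainder of x / a. */
static bool is_aligned_u64(uint64_t x, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0); /* a must be a power of two */
    return (x & ((uint64_t)a - 1)) == 0;
}

/* Models the behaviour described in the thread: everything in
 * [0, sectorsize) is rejected, so a tree block at logical address 0
 * fails the check as "unaligned". */
static bool bytenr_ok_strict(uint64_t bytenr, uint32_t sectorsize)
{
    if (bytenr < sectorsize)
        return false;
    return is_aligned_u64(bytenr, sectorsize);
}

/* The suggested fix: 0 is a valid btrfs logical address, so report it as
 * aligned; any other bytenr must still be a multiple of sectorsize. */
static bool bytenr_ok_fixed(uint64_t bytenr, uint32_t sectorsize)
{
    if (bytenr == 0)
        return true;
    return is_aligned_u64(bytenr, sectorsize);
}
```

The mask test alone already returns true for 0; the explicit `bytenr == 0` branch in the fixed variant just documents the intent that logical address 0 is valid, while offsets 1 to sectorsize - 1 are still rejected as unaligned.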

Thanks,
Qu

> 
>>
>> BTW, currently nothing in the chunk tree/superblock contains any sensitive
>> info about your fs, so feel free to share it with the mailing list, where
>> more people may help.
>>
> I've added the list; I simply forgot it in one of my earlier replies.
> 
>> Thanks,
>> Qu
>>
> 
> thanks
> 
> Benjamin
>
