On 2017-12-10 08:29, Qu Wenruo wrote:
> 
> 
> On 2017-12-10 07:12, Benjamin Beichler wrote:
>> Hi Qu,
>>
>> 2017-12-07 12:09 GMT+00:00 Qu Wenruo <quwenruo.bt...@gmx.com>:
>>>
>>> Since the btrfs chunk recovery doesn't work and my dirty quick hack
>>> doesn't work either, I don't expect much to be recovered.
>>>
>>> Unless we have more detailed info about how and why the BUG_ON() of
>>> chunk recovery is triggered.
>>>
>>> That is to say, it will be quite time-consuming work to use gdb to
>>> locate the problem, and see if any developer (mostly me) could use the
>>> info to further dig into the problem or fix it.
>>> (Considering the difference in timezone, I expect at least 8+ weeks to
>>> get a conclusion)
>>
>> I'm really pleased that you want to help me. Of course, the current
>> backtrace was quite useless.
>> First, I revised the code a bit. Since one run over the 1.7 TB
>> drive took about 6h, I thought about saving the state of already found
>> chunks. I simply saved all valid bytenrs to a file. The
>> consequence was a reduction of the time for scan_one_device to about
>> 30s. If you think this could be interesting for the normal version, I
>> could create a patch for this.
>>
>>>
>>> If you really want to do it, please step into the function
>>> btrfs_insert_item() in __rebuild_device_items() and see at which
>>> point -EIO is returned.
>>>
>>> My guess is the btrfs_search_slot() call in btrfs_insert_empty_items().
>>>
>>> If that's true, please call
>>>
>>> btrfs_print_tree(root->fs_info->chunk_root, 
>>> root->fs_info->chunk_root->node, 1)
>>>
>>> in gdb, just before the btrfs_search_slot() call above, to show what
>>> the problem is.
>>>
>> Your guess was right. The current stack trace and btrfs_print_tree output
>> are at: https://gist.github.com/anonymous/2cf40ac1d3ddcbca95177acec78041b2
> 
> The output is very helpful.
> 
> I originally thought it was something more serious, but it turns out
> to be less serious than I expected.
> 
>>
>> As you can see, the code in disk.io:321 explicitly excludes the
>> sector from 0 to sectorsize, and states it is unaligned. I think
>> this triggers the problem because the code found a chunk/block at
>> address zero. Is it possible that chunks/blocks live at address
>> 0, or is this fuzzy data?
> 
> 0 is completely valid in btrfs logical address space.
> 
> It's the IS_ALIGNED macro which caused the problem.
> So it's quite easy to fix in fact.

Sorry, IS_ALIGNED is working as expected.

It's the bytenr < sectorsize line causing the problem.
Please remove the bytenr < sectorsize check; I'll submit a patch later to
fix it.
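
To illustrate the effect, here is a minimal sketch, not the actual
btrfs-progs code. check_bytenr_old() and check_bytenr_new() are
hypothetical names, -1 is only a stand-in for -EIO, and IS_ALIGNED is
redefined locally in the usual power-of-two style. It assumes sectorsize
is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Local power-of-two alignment test (assumed form, defined here for the sketch). */
#define IS_ALIGNED(x, a) ((((uint64_t)(x)) & ((uint64_t)(a) - 1)) == 0)

/*
 * Hypothetical stand-in for the check discussed above.
 * Returns 0 if bytenr is accepted, -1 (placeholder for -EIO) if rejected.
 * The extra "bytenr < sectorsize" test rejects bytenr 0 even though
 * 0 is aligned and is a valid btrfs logical address.
 */
static int check_bytenr_old(uint64_t bytenr, uint32_t sectorsize)
{
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	if (bytenr < sectorsize || !IS_ALIGNED(bytenr, sectorsize))
		return -1;
	return 0;
}

/* Same check with the "bytenr < sectorsize" test removed. */
static int check_bytenr_new(uint64_t bytenr, uint32_t sectorsize)
{
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	if (!IS_ALIGNED(bytenr, sectorsize))
		return -1;
	return 0;
}
```

With a sectorsize of 4096, the old form rejects bytenr 0 while the new
form accepts it. Both still reject unaligned values such as 4097.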

Thanks,
Qu

> 
> For 0, always returning it as aligned should fix your problem.
> 
> Thanks,
> Qu
> 
>>
>>>
>>> BTW, currently nothing in the chunk tree/super block contains any info about
>>> your fs, so feel free to share it with the mailing list, where more people may help.
>>>
>> I added the list; I simply forgot it in one of my replies.
>>
>>> Thanks,
>>> Qu
>>>
>>
>> thanks
>>
>> Benjamin
>>
> 
