Re: interest in post-mortem examination of a BTRFS system and improving the btrfs-code?

Qu Wenruo Tue, 02 Apr 2019 07:12:32 -0700


On 2019/4/2 9:59 PM, Nik. wrote:
> 
> 
> 2019-04-02 15:24, Qu Wenruo:
>>
>>
>> On 2019/4/2 9:06 PM, Nik. wrote:
>>>
>>> 2019-04-02 02:24, Qu Wenruo:
>>>>
>>>> On 2019/4/1 2:44 AM, bt...@avgustinov.eu wrote:
>>>>> Dear all,
>>>>>
>>>>>
>>>>> I am a big fan of btrfs, and I have been using it since 2013 - in the
>>>>> meantime
>>>>> on at least four different computers. During this time, I suffered at
>>>>> least four bad btrfs-failures leading to unmountable, unreadable and
>>>>> unrecoverable file system. Since in three of the cases I did not
>>>>> manage
>>>>> to recover even a single file, I am beginning to lose my confidence in
>>>>> btrfs: in 35 years of working with different computers, no other file
>>>>> system was so bad at recovering files!
>>>>>
>>>>> Considering the importance of btrfs and keeping in mind the number of
>>>>> similar failures, described in countless forums on the net, I have got
>>>>> an idea: to donate my last two damaged filesystems for investigation
>>>>> purposes and thus hopefully contribute to the improvement of btrfs.
>>>>> One
>>>>> condition: any recovered personal data (mostly pictures and audio
>>>>> files)
>>>>> should remain undisclosed and be deleted.
>>>>>
>>>>> Should anybody be interested in this - feel free to contact me
>>>>> personally (I am not reading the list regularly!), otherwise I am
>>>>> going
>>>>> to reformat and reuse both systems in two weeks from today.
>>>>>
>>>>> Some more info:
>>>>>
>>>>>     - The smaller system is 83.6GB, I could either send you an
>>>>> image of
>>>>> this system on an unneeded hard drive or put it into a dedicated
>>>>> computer and give you root rights and ssh-access to it (the network
>>>>> link
>>>>> is 100Mb down, 50Mb up, so it should be acceptable).
>>>>
>>>> I'm a little more interested in this case, as it's easier to debug.
>>>>
>>>> However there is one requirement before debugging.
>>>>
>>>> *NO* btrfs check --repair/--init-* run at all.
>>>> btrfs check --repair is known to cause transid errors.
>>>
>>> Unfortunately, this file system was used as a testbed, and even
>>> "btrfs check --repair --check-data-csum --init-csum-tree
>>> --init-extent-tree ..." was attempted on it.
>>> So I assume you are not interested.
>>
>> Then the fs may have been further corrupted, so I'm not interested.
>>
>>>
>>> On the larger file system only "btrfs check --repair --readonly ..." was
>>> attempted (without success; most command executions were documented, so
>>> the results can be made available), no writing commands were issued.
>>
>> --repair will cause writes, unless it failed to even open the filesystem.
>>
>> If that's the case, it would be pretty interesting for me to poke
>> around the fs, and obviously, all read-only.
>>
>>>
>>>> And, I'm afraid even with some debugging, the result would be pretty
>>>> predictable.
>>>
>>> I do not need anything from the smaller file system and have (hopefully
>>> fresh enough) backups from the bigger one.
>>> It would be good enough if it helps to find any bugs, which are still in
>>> the code.
>>>
>>>> It will be a transid error in 90% of cases.
>>>> If it's a tree block from the future, then it's something barrier
>>>> related.
>>>> If it's a tree block from the past, then some tree block didn't
>>>> reach disk.
>>>>
>>>> We have been chasing the spectre for a long time, had several
>>>> assumptions but never pinned it down.
>>>
>>> IMHO spectre would lead to much bigger losses - at least in my case it
>>> could have happened all four times, but it did not.
>>>
>>>> But anyway, more info is always better.
>>>>
>>>> I'd like to get the ssh access for this smaller image.
>>>
>>> If you are still interested, please advise how to create the image of
>>> the file system.
>>
>> If the larger fs really didn't get any writes (btrfs check --repair
>> failed to open the fs, thus giving the output "cannot open file system"),
>> I'm interested in that one.
> 
> This is excerpt from the terminal log:
> "# btrfs check --readonly /dev/md0
> incorrect offsets 15003 146075
> ERROR: cannot open file system
> #"


That's great.

And to my surprise, this is a completely different problem.

I believe it will be detected by the latest write-time tree checker
patches in the next kernel release.

This kind of problem is normally caused by a memory bit flip.
That should ring a small alarm bell about the problem.
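
As a quick sanity check of the bit-flip theory, the two numbers in the
"incorrect offsets 15003 146075" line can be compared bit by bit. Below is a
minimal sketch in plain POSIX shell arithmetic. It makes no assumption about
which of the two values is the expected one, and the helper name
bitflip_check is made up for this illustration (it is not part of
btrfs-progs).

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): report whether two
# integers differ in exactly one bit, and if so, which bit.
bitflip_check() {
    a=$1
    b=$2
    x=$(( a ^ b ))                      # bits that differ between a and b
    # x is a power of two exactly when it is non-zero and x & (x - 1) == 0
    if [ "$x" -ne 0 ] && [ $(( x & (x - 1) )) -eq 0 ]; then
        bit=0
        while [ $(( x >> bit )) -gt 1 ]; do
            bit=$(( bit + 1 ))
        done
        echo "single bit flip: xor=$x bit=$bit"
    else
        echo "not a single bit flip: xor=$x"
    fi
}

# The two values from the "incorrect offsets" line quoted above.
bitflip_check 15003 146075
# prints: single bit flip: xor=131072 bit=17
```

The two values differ by exactly 131072 (2^17), which is what a single
flipped bit would produce. That is consistent with a bit flip, but does not
prove a memory fault by itself.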

Anyway, a v5.2 or v5.3 kernel would be much better at catching such problems.

> 
> Btw., since the list allows _plain_text_only_, I wonder how you
> quote?
> 
>> If not, then no.
>>
>>> I can imagine that it is preferable to use the
>>> original, but in my case it is a (not mounted) partition of a bigger
>>> hard drive, and the other partitions are in use. The "btrfs-image" seems
>>> inappropriate to me, "dd" will probably screw things up?
>>
>> Since the fs is too large, I don't think either way is good enough.
>>
>> So in this case, the best way for me to poke around is to give me a
>> caged container with only read access to the larger fs.
> 
> I am afraid that this machine is too weak for using containers on it
> (QNAP SS839Pro NAS, Intel Atom, 2GB RAM), and right now I do not have
> other machine, which could accommodate five hard drives. Let me consider
> how to organize this or give another idea. One way could be "async ssh"
> -  a private ssl-chat on one of my servers, so that you can write your
> commands there, I execute them on the machine as soon as I can and put
> the output back into the chat-window? Sounds silly, but could start
> immediately, and I have no better idea right now, sorry!

Your btrfs check output is already good enough to locate the problem.

The next step would be just to help you recover that image, if that's
what you need.

The proposed idea is not that uncommon. In fact it's just another form of the
"developer shows commands, user executes and reports, developer checks the
output" loop.

In your case, you just need the latest btrfs-progs, and then to re-run
"btrfs check --readonly" on the fs.

If it just shows the same result, meaning I can't get the info about
which tree block is corrupted, then you could try to mount it with -o ro
using the *LATEST* kernel.

The latest kernel will report anything wrong pretty vocally; in that case,
dmesg would include the bytenr of the corrupted tree block.

Then I could craft the needed commands to further debug the fs.
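
The steps above could be collected into a small script, roughly like the
one below. It is only a sketch: /dev/md0 comes from the log earlier in this
thread, while the mount point /mnt/btrfs-ro, the DRY_RUN switch and the
function names run and diagnose are assumptions made for illustration. By
default it only prints the commands; nothing is executed unless DRY_RUN=0
is set.

```shell
#!/bin/sh
# Sketch of the read-only diagnostic loop: check, then ro mount, then dmesg.
DEV="${DEV:-/dev/md0}"            # device taken from the log in this thread
MNT="${MNT:-/mnt/btrfs-ro}"       # hypothetical mount point
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print commands, 0 = execute them

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

diagnose() {
    # Record kernel and btrfs-progs versions so the report is reproducible.
    run uname -r
    run btrfs --version
    # Read-only check; never add --repair or --init-* here.
    run btrfs check --readonly "$DEV" || echo "btrfs check failed, trying a read-only mount"
    # Read-only mount attempt; a recent kernel logs details on failure.
    run mkdir -p "$MNT"
    run mount -t btrfs -o ro "$DEV" "$MNT" || echo "mount failed, see kernel log"
    # Collect the most recent btrfs-related kernel messages.
    run sh -c 'dmesg | grep -i btrfs | tail -n 50'
}

diagnose
```

Running it as "DRY_RUN=0 sh script.sh" as root and sending back the full
output would cover both the btrfs check result and the kernel messages in
one go.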

Thanks,
Qu

> 
> Thank you for trying to improve btrfs!
> 
> Nik.
>>
>> Thanks,
>> Qu
> 
> You are not from the 007 - lab, are you? ;-)
> 
>>>
>>> Kind regards,
>>>
>>> Nik.
>>
