On 2018-05-01 23:50, Michael Wade wrote:
> Hi Qu,
> 
> Oh dear that is not good news!
> 
> I have been running the find-root command since yesterday, but it only
> seems to be outputting the following message:
> 
> ERROR: tree block bytenr 0 is not aligned to sectorsize 4096

It's mostly fine, as find-root goes through all tree blocks and tries
to read them as tree blocks.
btrfs-find-root suppresses csum error output, but this basic tree
validation check is not suppressed, so you get these messages.
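To check whether anything besides that alignment noise made it into the output, the repeated message can simply be filtered out. A sketch, assuming the output was saved to a file (the two-line sample below stands in for the real, much larger capture; the "Superblock thinks..." line mimics the kind of candidate line find-root prints):

```shell
# Stand-in for a saved capture of btrfs-find-root's output (assumed path).
printf '%s\n' \
  'ERROR: tree block bytenr 0 is not aligned to sectorsize 4096' \
  'ERROR: tree block bytenr 0 is not aligned to sectorsize 4096' \
  'Superblock thinks the generation is 151800' > /tmp/find-root.log

# Drop the repeated alignment errors; whatever remains is the
# potentially useful part of the log.
grep -v 'is not aligned to sectorsize' /tmp/find-root.log
```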

> [the same message repeated many more times]
> 
> I tried with the latest btrfs tools compiled from source and the ones
> I have installed with the same result. Is there a CLI utility I could
> use to determine if the log contains any other content?

Did it report any useful info at the end?

Thanks,
Qu

> 
> Kind regards
> Michael
> 
> 
> On 30 April 2018 at 04:02, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>
>>
>> On 2018-04-29 22:08, Michael Wade wrote:
>>> Hi Qu,
>>>
>>> Got this error message:
>>>
>>> ./btrfs inspect dump-tree -b 20800943685632 /dev/md127
>>> btrfs-progs v4.16.1
>>> bytenr mismatch, want=20800943685632, have=3118598835113619663
>>> ERROR: cannot read chunk root
>>> ERROR: unable to open /dev/md127
>>>
>>> I have attached the dumps for:
>>>
>>> dd if=/dev/md127 of=/tmp/chunk_root.copy1 bs=1 count=32K skip=266325721088
>>> dd if=/dev/md127 of=/tmp/chunk_root.copy2 bs=1 count=32K skip=266359275520
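One quick way to eyeball whether such a dump holds anything but garbage is `hexdump -C`, which squeezes runs of identical lines into a `*`, so an all-zero or otherwise degenerate dump is obvious at a glance. A sketch using a zero-filled stand-in for the real chunk_root.copy1 (a healthy tree block would show varied bytes instead):

```shell
# Create a 32 KiB zero-filled file standing in for /tmp/chunk_root.copy1.
dd if=/dev/zero of=/tmp/chunk_root.sample bs=1024 count=32 2>/dev/null

# hexdump -C collapses repeated lines into a single '*', so runs of
# identical bytes (here, all zeros) are immediately visible.
hexdump -C /tmp/chunk_root.sample | head -n 3
```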
>>
>> Unfortunately, both dumps are corrupted and contain mostly garbage.
>> I think the underlying stack (mdraid) has gone wrong or failed to
>> recover its data.
>>
>> This means your last chance will be btrfs-find-root.
>>
>> Please try:
>> # btrfs-find-root -o 3 <device>
>>
>> And provide all the output.
>>
>> But please keep in mind that the chunk root is a critical tree, and so
>> far it's already heavily damaged.
>> Although I can still try to continue the recovery, the chance of
>> success is pretty low now.
>>
>> Thanks,
>> Qu
>>>
>>> Kind regards
>>> Michael
>>>
>>>
>>> On 29 April 2018 at 10:33, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>
>>>>
>>>> On 2018-04-29 16:59, Michael Wade wrote:
>>>>> Ok, will it be possible for me to install the new version of the tools
>>>>> on my current kernel without overwriting the existing install? I'm
>>>>> hesitant to update the kernel/btrfs as it might break the ReadyNAS
>>>>> interface / future firmware upgrades.
>>>>>
>>>>> Perhaps I could grab this:
>>>>> https://github.com/kdave/btrfs-progs/releases/tag/v4.16.1 and
>>>>> hopefully build from source and then run the binaries directly?
>>>>
>>>> Of course, that's how most of us test btrfs-progs builds.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> Kind regards
>>>>>
>>>>> On 29 April 2018 at 09:33, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 2018-04-29 16:11, Michael Wade wrote:
>>>>>>> Thanks Qu,
>>>>>>>
>>>>>>> Please find attached the log file for the chunk recover command.
>>>>>>
>>>>>> Strangely, btrfs chunk-recover found no extra chunks beyond the
>>>>>> current system chunk range.
>>>>>>
>>>>>> Which means the chunk tree itself is corrupted.
>>>>>>
>>>>>> Please dump the chunk tree with the latest btrfs-progs (which provides
>>>>>> the new --follow option).
>>>>>>
>>>>>> # btrfs inspect dump-tree -b 20800943685632 <device>
>>>>>>
>>>>>> If it doesn't work, please provide the following binary dump:
>>>>>>
>>>>>> # dd if=<dev> of=/tmp/chunk_root.copy1 bs=1 count=32K skip=266325721088
>>>>>> # dd if=<dev> of=/tmp/chunk_root.copy2 bs=1 count=32K skip=266359275520
>>>>>> (We will need to repeat similar dumps several more times, depending on
>>>>>> the output of the dump above.)
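The skip= offsets appear to line up with the sector number from the kernel log quoted further down the thread ("checksum error at logical 20800943685632 on dev /dev/md127, sector 520167424"): kernel-reported sectors are 512 bytes, so the byte offset for dd is sector * 512. A sketch of the arithmetic:

```shell
# Sector number reported by the kernel for the chunk root checksum error.
sector=520167424

# dmesg sectors are 512 bytes; sector * 512 gives the byte offset
# usable as dd's skip= value.
echo $((sector * 512))   # 266325721088, the first skip= value above
```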
>>>>>>
>>>>>> Thanks,
>>>>>> Qu
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Kind regards
>>>>>>> Michael
>>>>>>>
>>>>>>> On 28 April 2018 at 12:38, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2018-04-28 17:37, Michael Wade wrote:
>>>>>>>>> Hi Qu,
>>>>>>>>>
>>>>>>>>> Thanks for your reply. I will investigate upgrading the kernel;
>>>>>>>>> however, I worry that future ReadyNAS firmware upgrades would fail on
>>>>>>>>> a newer kernel version (I don't have much Linux experience, so maybe
>>>>>>>>> my concerns are unfounded!?).
>>>>>>>>>
>>>>>>>>> I have attached the output of the dump super command.
>>>>>>>>>
>>>>>>>>> I did actually run chunk recover before, without the verbose option;
>>>>>>>>> it took around 24 hours to finish but did not resolve my issue. Happy
>>>>>>>>> to start that again if you need its output.
>>>>>>>>
>>>>>>>> The system chunk only contains the following chunks:
>>>>>>>> [0, 4194304]:           Initial temporary chunk, not used at all
>>>>>>>> [20971520, 29360128]:   System chunk created by mkfs, should be fully
>>>>>>>>                         used up
>>>>>>>> [20800943685632, 20800977240064]:
>>>>>>>>                         The newly created large system chunk.
>>>>>>>>
>>>>>>>> The chunk root is still in the 2nd chunk and thus valid, but some of
>>>>>>>> its leaves are out of that range.
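For reference, the sizes implied by the ranges listed above work out as follows (plain shell arithmetic, nothing filesystem-specific):

```shell
# Size of the system chunk created by mkfs: [20971520, 29360128)
echo $((29360128 - 20971520))               # 8388608 bytes = 8 MiB

# Size of the newly created large system chunk:
# [20800943685632, 20800977240064)
echo $((20800977240064 - 20800943685632))   # 33554432 bytes = 32 MiB
```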
>>>>>>>>
>>>>>>>> If you can't wait 24h for chunk recovery to run, my advice would be to
>>>>>>>> move the disk to some other computer and use the latest btrfs-progs to
>>>>>>>> execute the following command:
>>>>>>>>
>>>>>>>> # btrfs inspect dump-tree -b 20800943685632 --follow
>>>>>>>>
>>>>>>>> If we're lucky enough, we may read out the tree leaf containing the
>>>>>>>> new system chunk and save the day.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Qu
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks so much for your help.
>>>>>>>>>
>>>>>>>>> Kind regards
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>> On 28 April 2018 at 09:45, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2018-04-28 16:30, Michael Wade wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I was hoping that someone would be able to help me resolve the
>>>>>>>>>>> issues I am having with my ReadyNAS BTRFS volume. Basically my
>>>>>>>>>>> trouble started after a power cut; subsequently the volume would not
>>>>>>>>>>> mount. Here are the details of my setup as it is at the moment:
>>>>>>>>>>>
>>>>>>>>>>> uname -a
>>>>>>>>>>> Linux QAI 4.4.116.alpine.1 #1 SMP Mon Feb 19 21:58:38 PST 2018 armv7l GNU/Linux
>>>>>>>>>>
>>>>>>>>>> The kernel is pretty old for btrfs.
>>>>>>>>>> I strongly recommend upgrading.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> btrfs --version
>>>>>>>>>>> btrfs-progs v4.12
>>>>>>>>>>
>>>>>>>>>> So are the user-space tools.
>>>>>>>>>>
>>>>>>>>>> Although I think that won't be a big problem, as the needed tools
>>>>>>>>>> should be there.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> btrfs fi show
>>>>>>>>>>> Label: '11baed92:data'  uuid: 20628cda-d98f-4f85-955c-932a367f8821
>>>>>>>>>>> Total devices 1 FS bytes used 5.12TiB
>>>>>>>>>>> devid    1 size 7.27TiB used 6.24TiB path /dev/md127
>>>>>>>>>>
>>>>>>>>>> So, it's btrfs on mdraid.
>>>>>>>>>> That normally makes things harder to debug, so I can only provide
>>>>>>>>>> advice from the btrfs side.
>>>>>>>>>> For the mdraid part, I can't guarantee anything.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here are the relevant dmesg logs for the current state of the device:
>>>>>>>>>>>
>>>>>>>>>>> [   19.119391] md: md127 stopped.
>>>>>>>>>>> [   19.120841] md: bind<sdb3>
>>>>>>>>>>> [   19.121120] md: bind<sdc3>
>>>>>>>>>>> [   19.121380] md: bind<sda3>
>>>>>>>>>>> [   19.125535] md/raid:md127: device sda3 operational as raid disk 0
>>>>>>>>>>> [   19.125547] md/raid:md127: device sdc3 operational as raid disk 2
>>>>>>>>>>> [   19.125554] md/raid:md127: device sdb3 operational as raid disk 1
>>>>>>>>>>> [   19.126712] md/raid:md127: allocated 3240kB
>>>>>>>>>>> [   19.126778] md/raid:md127: raid level 5 active with 3 out of 3
>>>>>>>>>>> devices, algorithm 2
>>>>>>>>>>> [   19.126784] RAID conf printout:
>>>>>>>>>>> [   19.126789]  --- level:5 rd:3 wd:3
>>>>>>>>>>> [   19.126794]  disk 0, o:1, dev:sda3
>>>>>>>>>>> [   19.126799]  disk 1, o:1, dev:sdb3
>>>>>>>>>>> [   19.126804]  disk 2, o:1, dev:sdc3
>>>>>>>>>>> [   19.128118] md127: detected capacity change from 0 to 7991637573632
>>>>>>>>>>> [   19.395112] Adding 523708k swap on /dev/md1.  Priority:-1 extents:1 across:523708k
>>>>>>>>>>> [   19.434956] BTRFS: device label 11baed92:data devid 1 transid
>>>>>>>>>>> 151800 /dev/md127
>>>>>>>>>>> [   19.739276] BTRFS info (device md127): setting nodatasum
>>>>>>>>>>> [   19.740440] BTRFS critical (device md127): unable to find logical
>>>>>>>>>>> 3208757641216 len 4096
>>>>>>>>>>> [   19.740450] BTRFS critical (device md127): unable to find logical
>>>>>>>>>>> 3208757641216 len 4096
>>>>>>>>>>> [   19.740498] BTRFS critical (device md127): unable to find logical
>>>>>>>>>>> 3208757641216 len 4096
>>>>>>>>>>> [   19.740512] BTRFS critical (device md127): unable to find logical
>>>>>>>>>>> 3208757641216 len 4096
>>>>>>>>>>> [   19.740552] BTRFS critical (device md127): unable to find logical
>>>>>>>>>>> 3208757641216 len 4096
>>>>>>>>>>> [   19.740560] BTRFS critical (device md127): unable to find logical
>>>>>>>>>>> 3208757641216 len 4096
>>>>>>>>>>> [   19.740576] BTRFS error (device md127): failed to read chunk root
>>>>>>>>>>
>>>>>>>>>> This shows it pretty clearly: btrfs fails to read the chunk root.
>>>>>>>>>> And according to the "len 4096" above, it's a pretty old fs, as it's
>>>>>>>>>> still using a 4K nodesize rather than 16K.
>>>>>>>>>>
>>>>>>>>>> According to the above output, your superblock somehow lacks the
>>>>>>>>>> needed system chunk mapping, which is used to initialize the chunk
>>>>>>>>>> mapping.
>>>>>>>>>>
>>>>>>>>>> Please provide the following command output:
>>>>>>>>>>
>>>>>>>>>> # btrfs inspect dump-super -fFa /dev/md127
>>>>>>>>>>
>>>>>>>>>> Also, please consider running the following command and dump all its
>>>>>>>>>> output:
>>>>>>>>>>
>>>>>>>>>> # btrfs rescue chunk-recover -v /dev/md127
>>>>>>>>>>
>>>>>>>>>> Please note that the above command can take a long time to finish;
>>>>>>>>>> if it works without problems, it may solve your issue.
>>>>>>>>>> But if it doesn't work, its output could help me manually craft a
>>>>>>>>>> fix for your super block.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Qu
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> [   19.783975] BTRFS error (device md127): open_ctree failed
>>>>>>>>>>>
>>>>>>>>>>> In an attempt to recover the volume myself I ran a few BTRFS
>>>>>>>>>>> commands, mostly using advice from here:
>>>>>>>>>>> https://lists.opensuse.org/opensuse/2017-02/msg00930.html. However,
>>>>>>>>>>> that actually seems to have made things worse, as I can no longer
>>>>>>>>>>> mount the file system, not even in readonly mode.
>>>>>>>>>>>
>>>>>>>>>>> So, starting from the beginning, here is a list of things I have
>>>>>>>>>>> done so far (hopefully I remembered the order in which I ran them!):
>>>>>>>>>>>
>>>>>>>>>>> 1. Noticed that my backups to the NAS were not running (didn't get
>>>>>>>>>>> notified that the volume had basically "died")
>>>>>>>>>>> 2. ReadyNAS UI indicated that the volume was inactive.
>>>>>>>>>>> 3. SSHed onto the box and found that the first drive was not marked
>>>>>>>>>>> as operational (the log showed I/O errors / UNKOWN (0x2003)), so I
>>>>>>>>>>> replaced the disk and let the array resync.
>>>>>>>>>>> 4. After the resync the volume was still inaccessible, so I looked
>>>>>>>>>>> at the logs once more and saw something like the following, which
>>>>>>>>>>> seemed to indicate that the replay log had been corrupted when the
>>>>>>>>>>> power went out:
>>>>>>>>>>>
>>>>>>>>>>> BTRFS critical (device md127): corrupt leaf, non-root leaf's nritems
>>>>>>>>>>> is 0: block=232292352, root=7, slot=0
>>>>>>>>>>> BTRFS critical (device md127): corrupt leaf, non-root leaf's nritems
>>>>>>>>>>> is 0: block=232292352, root=7, slot=0
>>>>>>>>>>> BTRFS: error (device md127) in btrfs_replay_log:2524: errno=-5 IO
>>>>>>>>>>> failure (Failed to recover log tree)
>>>>>>>>>>> BTRFS error (device md127): pending csums is 155648
>>>>>>>>>>> BTRFS error (device md127): cleaner transaction attach returned -30
>>>>>>>>>>> BTRFS critical (device md127): corrupt leaf, non-root leaf's nritems
>>>>>>>>>>> is 0: block=232292352, root=7, slot=0
>>>>>>>>>>>
>>>>>>>>>>> 5. Then:
>>>>>>>>>>>
>>>>>>>>>>> btrfs rescue zero-log
>>>>>>>>>>>
>>>>>>>>>>> 6. Was then able to mount the volume in readonly mode.
>>>>>>>>>>>
>>>>>>>>>>> btrfs scrub start
>>>>>>>>>>>
>>>>>>>>>>> Which fixed some errors but not all:
>>>>>>>>>>>
>>>>>>>>>>> scrub status for 20628cda-d98f-4f85-955c-932a367f8821
>>>>>>>>>>>
>>>>>>>>>>> scrub started at Tue Apr 24 17:27:44 2018, running for 04:00:34
>>>>>>>>>>> total bytes scrubbed: 224.26GiB with 6 errors
>>>>>>>>>>> error details: csum=6
>>>>>>>>>>> corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
>>>>>>>>>>>
>>>>>>>>>>> scrub status for 20628cda-d98f-4f85-955c-932a367f8821
>>>>>>>>>>> scrub started at Tue Apr 24 17:27:44 2018, running for 04:34:43
>>>>>>>>>>> total bytes scrubbed: 224.26GiB with 6 errors
>>>>>>>>>>> error details: csum=6
>>>>>>>>>>> corrected errors: 0, uncorrectable errors: 6, unverified errors: 0
>>>>>>>>>>>
>>>>>>>>>>> 7. Seeing this hanging, I rebooted the NAS.
>>>>>>>>>>> 8. I think this is when the volume stopped mounting at all.
>>>>>>>>>>> 9. Seeing log entries like these:
>>>>>>>>>>>
>>>>>>>>>>> BTRFS warning (device md127): checksum error at logical 20800943685632
>>>>>>>>>>> on dev /dev/md127, sector 520167424: metadata node (level 1) in tree 3
>>>>>>>>>>>
>>>>>>>>>>> I ran
>>>>>>>>>>>
>>>>>>>>>>> btrfs check --fix-crc
>>>>>>>>>>>
>>>>>>>>>>> And that brings us to where I am now: some seemingly corrupted BTRFS
>>>>>>>>>>> metadata and an inability to mount the drive, even with the recovery
>>>>>>>>>>> option.
>>>>>>>>>>>
>>>>>>>>>>> Any help you can give is much appreciated!
>>>>>>>>>>>
>>>>>>>>>>> Kind regards
>>>>>>>>>>> Michael
>>>>>>>>>>> --
>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>>>>>>>>> linux-btrfs" in
>>>>>>>>>>> the body of a message to majord...@vger.kernel.org
>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
