Re: Any hope of pool recovery?

Donald Pearson Wed, 01 Jul 2015 14:36:12 -0700

Here is the result of the attempted btrfs rescue chunk-recover:

[root@san01 btrfs-progs]# ./btrfs rescue chunk-recover -v /dev/sdc
All Devices:
        Device: id = 7, name = /dev/sdl
        Device: id = 8, name = /dev/sdm
        Device: id = 9, name = /dev/sdn
        Device: id = 3, name = /dev/sdf
        Device: id = 6, name = /dev/sdi
        Device: id = 4, name = /dev/sdg
        Device: id = 5, name = /dev/sdh
        Device: id = 2, name = /dev/sdd
        Device: id = 10, name = /dev/sdq
        Device: id = 1, name = /dev/sdc


*** Error in `./btrfs': free(): invalid next size (fast): 0x0000000001332100 ***
Segmentation fault
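
For anyone matching devids to device names, the list above can be read into a
table. This is only a sketch in Python: the regular expression is an assumption
based on the ten lines printed here, not a documented output format, and the
"missing" check assumes devids run contiguously from 1, which btrfs does not
guarantee.

```python
import re

# Device list copied from the chunk-recover run above.
CHUNK_RECOVER_OUTPUT = """\
All Devices:
        Device: id = 7, name = /dev/sdl
        Device: id = 8, name = /dev/sdm
        Device: id = 9, name = /dev/sdn
        Device: id = 3, name = /dev/sdf
        Device: id = 6, name = /dev/sdi
        Device: id = 4, name = /dev/sdg
        Device: id = 5, name = /dev/sdh
        Device: id = 2, name = /dev/sdd
        Device: id = 10, name = /dev/sdq
        Device: id = 1, name = /dev/sdc
"""

# Assumed line shape, inferred from the output above.
DEVICE_LINE = re.compile(r"Device: id = (\d+), name = (\S+)")


def parse_devices(text):
    """Return {devid: device_path} for every 'Device:' line in text."""
    return {int(m.group(1)): m.group(2) for m in DEVICE_LINE.finditer(text)}


devices = parse_devices(CHUNK_RECOVER_OUTPUT)

# Assumption: devids are contiguous from 1 up to the highest one seen.
missing = sorted(set(range(1, max(devices) + 1)) - set(devices))

for devid in sorted(devices):
    print(devid, devices[devid])
print("missing devids:", missing)
```

In this run all ten devids are listed, and /dev/sdg, the drive reporting
medium errors in dmesg, is devid 4.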

On Wed, Jul 1, 2015 at 2:05 PM, Donald Pearson
<donaldwhpear...@gmail.com> wrote:
> I should have thought to check and include this earlier.  I'm seeing
> errors for /dev/sdg in dmesg (not surprised, I wanted this drive out of
> the pool to begin with because it's sick).
>
> [  142.612988] BTRFS: open_ctree failed
> [11836.105577] sd 0:0:6:0: [sdg] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [11836.105585] sd 0:0:6:0: [sdg] Sense Key : Medium Error [current]
> [11836.105589] sd 0:0:6:0: [sdg] Add. Sense: Unrecovered read error
> [11836.105592] sd 0:0:6:0: [sdg] CDB: Read(10) 28 00 5a 5b f1 b8 00 01 00 00
> [11836.105596] blk_update_request: critical medium error, dev sdg, sector 1515975096
> [11839.044815] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
> [11839.044843] sd 0:0:6:0: [sdg] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [11839.044848] sd 0:0:6:0: [sdg] Sense Key : Medium Error [current]
> [11839.044857] sd 0:0:6:0: [sdg] Add. Sense: Unrecovered read error
> [11839.044862] sd 0:0:6:0: [sdg] CDB: Read(10) 28 00 5a 5b f2 b8 00 01 00 00
> [11839.044865] blk_update_request: critical medium error, dev sdg, sector 1515975352
> [11842.009545] sd 0:0:6:0: [sdg] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [11842.009554] sd 0:0:6:0: [sdg] Sense Key : Medium Error [current]
> [11842.009558] sd 0:0:6:0: [sdg] Add. Sense: Unrecovered read error
> [11842.009562] sd 0:0:6:0: [sdg] CDB: Read(10) 28 00 5a 5b f2 80 00 00 08 00
> [11842.009565] blk_update_request: critical medium error, dev sdg, sector 1515975296
> [11842.009934] Buffer I/O error on dev sdg, logical block 189496912, async page read
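
The CDB bytes in these messages can be checked against the sector numbers the
kernel reports. A rough Python sketch follows. decode_read10 is a hypothetical
helper name; the field offsets follow the standard READ(10) layout (opcode
0x28, 4-byte big-endian LBA in bytes 2-5, 2-byte transfer length in bytes
7-8); the 512-byte sector and 4096-byte block sizes are assumptions that
happen to fit the numbers in the log.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (lba, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length
    return lba, blocks


# The three CDBs from the dmesg output above.
CDBS = [
    "28 00 5a 5b f1 b8 00 01 00 00",
    "28 00 5a 5b f2 b8 00 01 00 00",
    "28 00 5a 5b f2 80 00 00 08 00",
]

SECTOR_SIZE = 512   # assumption: 512-byte logical sectors
BLOCK_SIZE = 4096   # assumption: 4 KiB blocks in the "Buffer I/O error" line

for cdb in CDBS:
    lba, blocks = decode_read10(cdb)
    print(f"LBA {lba}, {blocks} sectors, "
          f"4 KiB block {lba * SECTOR_SIZE // BLOCK_SIZE}")
```

The decoded LBAs are 1515975096, 1515975352 and 1515975296, the same sectors
blk_update_request reports, and 1515975296 / 8 = 189496912 matches the logical
block in the Buffer I/O error line. All three reads fall within a few hundred
sectors of each other on sdg.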
>
> On Wed, Jul 1, 2015 at 1:58 PM, Donald Pearson
> <donaldwhpear...@gmail.com> wrote:
>> Small update on this, with no idea if this is useful information or not.
>>
>> At some point within the last hour iostat shows that /dev/sdg is no
>> longer under heavy reads.
>>
>> The other 9 drives however are still reading as fast as they are able.
>> There is no new output on the `btrfs rescue chunk-recover` screen so I
>> expect it's still running.
>>
>> There are 4 other drives with the same total capacity as sdg so I
>> would normally have expected them all to complete at about the same
>> time.
>>
>> Regards,
>> Donald
>>
>> On Wed, Jul 1, 2015 at 11:09 AM, Donald Pearson
>> <donaldwhpear...@gmail.com> wrote:
>>> Thanks Chris,
>>>
>>> To my shame it turns out darkling didn't drop off IRC after all; I'm
>>> new to all this and learning quickly that I need to sit on my hands.
>>> I admit that, despite darkling's suggestion that my usertools are probably
>>> fine, I pulled down a newer kernel from elrepo, so currently I'm running
>>> 4.1.1-1.el7.elrepo.x86_64
>>>
>>> I started with 4.0.2-1.el7.elrepo.x86_64
>>>
>>> I also do have btrfs-progs 4.1 that I got from git.
>>>
>>> Here is the 4.0 output
>>> [root@san01 btrfs-progs]# btrfs check /dev/sdc
>>> checksum verify failed on 21364736 found E4E3BDB6 wanted 00000000
>>> checksum verify failed on 21364736 found E4E3BDB6 wanted 00000000
>>> checksum verify failed on 21364736 found EC809498 wanted 0863292E
>>> checksum verify failed on 21364736 found 925303CE wanted 09150E74
>>> checksum verify failed on 21364736 found 925303CE wanted 09150E74
>>> bytenr mismatch, want=21364736, have=1065943040
>>> Couldn't read chunk tree
>>> Couldn't open file system
>>>
>>> Here is the 4.1 output
>>> [root@san01 btrfs-progs]# ./btrfs check /dev/sdc
>>> checksum verify failed on 21364736 found E4E3BDB6 wanted 00000000
>>> checksum verify failed on 21364736 found E4E3BDB6 wanted 00000000
>>> checksum verify failed on 21364736 found EC809498 wanted 0863292E
>>> checksum verify failed on 21364736 found 925303CE wanted 09150E74
>>> checksum verify failed on 21364736 found 925303CE wanted 09150E74
>>> bytenr mismatch, want=21364736, have=1065943040
>>> Couldn't read chunk tree
>>> Couldn't open file system
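
Both runs print the same lines, so it may help to collapse the repeats. The
Python sketch below groups the checksum failures by (bytenr, found, wanted).
The regular expression is an assumption derived from the lines pasted here,
and summarize is a hypothetical helper name, not part of btrfs-progs.

```python
import re
from collections import Counter

# Checksum lines copied from the btrfs check output above.
CHECK_OUTPUT = """\
checksum verify failed on 21364736 found E4E3BDB6 wanted 00000000
checksum verify failed on 21364736 found E4E3BDB6 wanted 00000000
checksum verify failed on 21364736 found EC809498 wanted 0863292E
checksum verify failed on 21364736 found 925303CE wanted 09150E74
checksum verify failed on 21364736 found 925303CE wanted 09150E74
bytenr mismatch, want=21364736, have=1065943040
"""

# Assumed line shape, inferred from the output above.
CSUM_LINE = re.compile(
    r"checksum verify failed on (\d+) found ([0-9A-F]{8}) wanted ([0-9A-F]{8})"
)


def summarize(text):
    """Count identical (bytenr, found, wanted) checksum failures."""
    return Counter(
        (int(bytenr), found, wanted)
        for bytenr, found, wanted in CSUM_LINE.findall(text)
    )


summary = summarize(CHECK_OUTPUT)
for (bytenr, found, wanted), count in summary.items():
    print(f"bytenr {bytenr}: found {found} wanted {wanted} (x{count})")
```

The five failure lines reduce to three distinct found/wanted pairs, all at
bytenr 21364736, the same block named in the bytenr mismatch line.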
>>>
>>> Finally, before I learned of this mailing list I started a run of
>>> btrfs rescue chunk-recover
>>> [root@san01 btrfs-progs]# ./btrfs rescue chunk-recover -v /dev/sdc
>>>
>>> I can see now through iostat that all 10 drives are reading as fast as
>>> they can and my understanding is this will take a long time, but I've
>>> since learned (not only that darkling was still alive on IRC) that
>>> this probably won't solve my problem.
>>>
>>> Regards,
>>> Donald (seijirou)
>>>
>>> On Wed, Jul 1, 2015 at 10:50 AM, Chris Murphy <li...@colorremedies.com> 
>>> wrote:
>>>> btrfs-progs version is 4.0; what are the kernel versions you've tried
>>>> to mount with?
>>>>
>>>> I suggest running btrfs check (without --repair) and including the
>>>> full output. There are a lot of changes in btrfs-progs 4.1, but off
>>>> hand I don't know that they'd affect btrfs check results.
>>>>
>>>>
>>>> Chris Murphy
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majord...@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
