Re: RAID1 filesystem not mounting

Chris Murphy Sun, 03 Feb 2019 10:44:23 -0800

On Sat, Feb 2, 2019 at 10:40 PM Alan Hardman <al...@fastmail.com> wrote:


> # journalctl | grep -A 15 exception
> ...
> Jan 23 01:06:37 localhost kernel: ata3.00: status: { DRDY }
> Jan 23 01:06:37 localhost kernel: ata3.00: failed command: WRITE FPDMA QUEUED
> Jan 23 01:06:37 localhost kernel: ata3.00: cmd 
> 61/b0:98:ea:7a:48/00:00:0a:00:00/40 tag 19 ncq dma 90112 out
>                                            res 
> 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> --
> Jan 31 19:24:32 localhost kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 
> 0x0 action 0x6 frozen
> Jan 31 19:24:32 localhost kernel: ata5.00: failed command: READ DMA EXT
> Jan 31 19:24:32 localhost kernel: ata5.00: cmd 
> 25/00:08:a8:2a:81/00:00:a3:03:00/e0 tag 0 dma 4096 in
>                                            res 
> 40/00:01:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
> Jan 31 19:24:32 localhost kernel: ata5.00: status: { DRDY }
> Jan 31 19:24:32 localhost kernel: ata5: link is slow to respond, please be 
> patient (ready=0)
> Jan 31 19:24:32 localhost kernel: ata5: device not ready (errno=-16), forcing 
> hardreset
> Jan 31 19:24:32 localhost kernel: ata5: soft resetting link

This kind of error on read is common when there is a marginally bad
sector, and the drive is doing deep recovery, and the time for
recovery is longer than the SCSI command timer. The reset clears the
command queue, meaning it's no longer possible to find out what sector
is marginal, and Btrfs can't fix it up. When left for a long time, it
allows bad sectors to accumulate, including during scrubbing. I don't
actually know what Btrfs does in this case, if the scrub is aborted,
or if it continues and shows the error as uncorrectable.

This isn't the only possible explanation for this kind of error. But
the bottom line is that it's generic, indicates there is a problem,
and you need to find out what the real cause is. It's asking for
trouble to just leave this kind of error floating in the wind, even if
it seems like everything is otherwise working.


> Jan 31 19:24:32 localhost kernel: ata5.00: configured for UDMA/33
> Jan 31 19:24:32 localhost kernel: ata5.01: configured for UDMA/33
> Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 FAILED Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : Illegal 
> Request [current]
> Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 Add. Sense: 
> Unaligned write command
> Jan 31 19:24:32 localhost kernel: sd 4:0:0:0: [sde] tag#0 CDB: Read(16) 88 00 
> 00 00 00 03 a3 81 2a a8 00 00 00 08 00 00
>

I don't know what the portion of this error sequence means, "illegal
request" and "unaligned write command". On 512e Advanced Format hard
drives, which have 512 byte logical 4096 byte physical sectors, a 512
byte write must actually be converted to 4096 byte read by the drive
firmware in order to do a read write modify. It can't do an overwrite
of just those 512 bytes. If there's a problem reading that sector, it
comes back as a read error. But... I'm not sure any of that is what
this is. To my knowledge Btrfs never does 512 byte writes, the minimum
read or write is 4096 bytes. Moving on...

>From your attached log, there are lots of failed writes. That really
must be sorted out because for all raid, write failures are fatal.
With md based raid, it'll eject the drive as faulty on a single write
failure. Whereas Btrfs keeps trying, as it has no concept of faulty
drives still. So you've got recent missing writes to whatever drive
has all these write errors. The 'grep -A 15' limited the output so I
can't tell how the error ended up being handled either by libata or
Btrfs. You really need to find out what device has write problems as
the next priority, as this is decently likely to prevent any 'btrfs
check --repair' from succeeding.

Next the SCSI command timer (this is a kernel timeout per block
device) for all devices is the default of 30 seconds. Most drives have
SCT ERC of 70 deciseconds which is *good*. But two drives do not, and
instead I see:

SCT Error Recovery Control command not supported

That's not good because it's possible marginally bad sectors will
cause the drive firmwar

Next, one of those drives has some UDMA errors which suggests an
actual link problem between the drive's controller and the logic
board's controller - the most logical suspect is just reseating the
cable on both ends. But maybe the cable needs replacing. So you'll
want to keep an eye on whether these errors continue or not. They'll
trigger a libata error message and it'll either retry or reset the
link, and therefore might be what caused one of your link reset
errors.

199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    19

OK so you have a corrupt leaf probably from bad memory, which probably
corrupted both copies of the leaf, which is why we don't see any fixup
messages. I don't actually know if 'btrfs check --repair' can fix this
kind of bad memory induced corruption. But in any case, we've got to
get the other hardware problems figured out before a repair is assured
to stick.

> I have been able to successfully recover files via "btrfs restore ...", and 
> there doesn't seem to be anything essential missing from its full output with 
> -D, so if that's necessary to use to offload the entire filesystem, it at 
> least seems possible if it can't be recovered directly.

If the data is important, and if this is the only copy, I always argue
in favor of urgently making a backup. Set aside all other
troubleshooting.

-- 
Chris Murphy

Re: RAID1 filesystem not mounting

Reply via email to