On Wed, Sep 25, 2019 at 8:50 AM Pallissard, Matthew <m...@pallissard.net> wrote:
>
> Version:
> Kernel: 5.2.2-arch1-1-ARCH #1 SMP PREEMPT Sun Jul 21 19:18:34 UTC 2019 x86_64
> GNU/Linux
You need to upgrade to Arch kernel 5.2.14 or newer (they backported the fix that first appeared in stable 5.2.15), or you need to downgrade to the 5.1 series.

https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdman...@kernel.org/T/#u

That's a nasty bug. I don't offhand see evidence that you've hit it, but I'm not certain. So the first thing should be to use a different kernel.

Next, any time there is a crash or power failure with Btrfs raid56, you need to do a complete scrub of the volume. It will obviously take time, but that's what needs to be done first.

OK, actually, before the scrub you need to confirm that each drive's SCT ERC time is *less* than the kernel's SCSI command timer, e.g.

# smartctl -l scterc /dev/sda
# cat /sys/block/sda/device/timeout

The SCT ERC value is in deciseconds, so convert it to seconds. The second value is in seconds. The first value must be shorter than the second.

By default the kernel's command timer is 30 seconds per device, while the error recovery time of typical consumer drives is much longer. So depending on what your drive reports for that smartctl command, you might change either the drive's timer or the SCSI command timer, or it might actually be perfect already. NAS-specific, nearline, and SAS drives all tend to have a short SCT ERC by default, around 7 seconds. That's fine.

Note that the SCT ERC setting made with smartctl is transient: when the drive powers off, it goes back to its default. And on reboot, the kernel's command timer also resets.

-- 
Chris Murphy
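The SCT ERC check is easy to get backwards because the two values use different units (deciseconds vs. seconds). A minimal POSIX shell sketch of the comparison, assuming you have already read both numbers by hand with the smartctl and sysfs commands shown above. The function name `check_erc_vs_timeout` is hypothetical, and it deliberately does not parse smartctl output itself.

```shell
#!/bin/sh
# Hypothetical helper: compare a drive's SCT ERC time with the kernel's
# SCSI command timer. Not part of smartctl or any kernel interface.
#
#   $1: SCT ERC read time in deciseconds, as read from
#       `smartctl -l scterc /dev/sdX` (e.g. 70 = 7.0 seconds),
#       or the word "disabled" if the drive reports it as disabled
#   $2: kernel command timer in seconds, from /sys/block/sdX/device/timeout
#
# Returns 0 when ERC is strictly shorter than the timer, 1 otherwise.
check_erc_vs_timeout() {
    erc_ds=$1
    timeout_s=$2

    # With ERC disabled, recovery time is left to the drive's firmware,
    # so treat it as a mismatch rather than guess a number.
    case $erc_ds in
        [Dd]isabled)
            echo "MISMATCH: SCT ERC disabled, recovery time not bounded by ${timeout_s}s timer"
            return 1 ;;
    esac

    # Compare in deciseconds so shell integer arithmetic is enough.
    timeout_ds=$((timeout_s * 10))
    erc_str="$((erc_ds / 10)).$((erc_ds % 10))s"

    if [ "$erc_ds" -lt "$timeout_ds" ]; then
        echo "OK: SCT ERC $erc_str < command timer ${timeout_s}s"
        return 0
    else
        echo "MISMATCH: SCT ERC $erc_str >= command timer ${timeout_s}s"
        return 1
    fi
}
```

For example, `check_erc_vs_timeout 70 30` (a typical NAS drive against the default 30 second timer) succeeds, while an equal or longer ERC fails because the ERC value must be strictly shorter. If it fails, the two knobs mentioned above are, for instance, `smartctl -l scterc,70,70 /dev/sda` to set read/write ERC to 7.0 seconds on drives that support it, or, as root, writing a larger value such as 180 into /sys/block/sda/device/timeout. Both example values are assumptions, and both settings are lost on power-off or reboot as described above.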