On Wed, Oct 10, 2018 at 12:31 PM, Larkin Lowrey <llow...@nuclearwinter.com> wrote:
> Interesting, because I do not see any indications of any other errors. The
> fs is backed by an mdraid array and the raid checks always pass with no
> mismatches, edac-util doesn't report any ECC errors, smartd doesn't report
> any SMART errors, and I never see any raid controller errors. I have the
> console connected through serial to a logging console server so if there
> were errors reported I would have seen them.

I think Holger is referring to the multiple reports like this:

[ 817.883261] scsi_eh_0 S 0 141 2 0x80000000
[ 817.888866] Call Trace:
[ 817.891391] ? __schedule+0x253/0x860
[ 817.895094] ? scsi_try_target_reset+0x90/0x90
[ 817.899631] ? scsi_eh_get_sense+0x220/0x220
[ 817.904045] schedule+0x28/0x80
[ 817.907260] scsi_error_handler+0x1d2/0x5b0
[ 817.911514] ? __schedule+0x25b/0x860
[ 817.915207] ? scsi_eh_get_sense+0x220/0x220
[ 817.919547] kthread+0x112/0x130
[ 817.922818] ? kthread_create_worker_on_cpu+0x70/0x70
[ 817.928015] ret_from_fork+0x22/0x40

That isn't a SCSI controller or drive error itself; it's a capture of
a thread that's in the state of handling scsi errors (maybe).

I'm finding scsi_try_target_reset here at line 855
https://github.com/torvalds/linux/blob/master/drivers/scsi/scsi_error.c

And also line 2143 for scsi_error_handler
https://github.com/torvalds/linux/blob/master/drivers/scsi/scsi_error.c

Is the problem Btrfs on sysroot? Because if the sysroot file system is
entirely error free, I'd expect to eventually get a lot more error
information from the kernel even without sysrq+t rather than
faceplanting.

Can you post the entire dmesg? The posted one starts at ~815 seconds,
and the problems definitely start before then but as it is we have
nothing really to go on.

--
Chris Murphy