On Wed, Oct 10, 2018 at 12:31 PM, Larkin Lowrey
<llow...@nuclearwinter.com> wrote:

> Interesting, because I do not see any indications of any other errors. The
> fs is backed by an mdraid array and the raid checks always pass with no
> mismatches, edac-util doesn't report any ECC errors, smartd doesn't report
> any SMART errors, and I never see any raid controller errors. I have the
> console connected through serial to a logging console server so if there
> were errors reported I would have seen them.

I think Holger is referring to the multiple reports like this:

[  817.883261] scsi_eh_0       S    0   141      2 0x80000000
[  817.888866] Call Trace:
[  817.891391]  ? __schedule+0x253/0x860
[  817.895094]  ? scsi_try_target_reset+0x90/0x90
[  817.899631]  ? scsi_eh_get_sense+0x220/0x220
[  817.904045]  schedule+0x28/0x80
[  817.907260]  scsi_error_handler+0x1d2/0x5b0
[  817.911514]  ? __schedule+0x25b/0x860
[  817.915207]  ? scsi_eh_get_sense+0x220/0x220
[  817.919547]  kthread+0x112/0x130
[  817.922818]  ? kthread_create_worker_on_cpu+0x70/0x70
[  817.928015]  ret_from_fork+0x22/0x40


That isn't a SCSI controller or drive error itself; it's a capture of
a thread that's in the state of handling SCSI errors (maybe).
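
To make the distinction concrete, here is a rough sketch of pulling the
task name and state letter out of the header line of such a dump. The
field layout (comm, state, free stack, pid, parent pid, flags) is my
assumption based on how this kernel's task dump appears to be printed,
and parse_task_header is a hypothetical helper, not an existing tool.

```python
import re

# Assumed layout of a task header line in a sysrq+t style dump:
#   [timestamp] comm  state  free-stack  pid  ppid  flags
# This layout is an assumption and may differ between kernel versions.
HEADER = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<comm>\S+)\s+(?P<state>[A-Za-z])\s+"
    r"(?P<stack>\d+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+"
    r"(?P<flags>0x[0-9a-fA-F]+)\s*$"
)

# Common task state letters; this is not an exhaustive list.
STATES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
}


def parse_task_header(line):
    """Return a dict describing the task, or None if the line is not a header."""
    m = HEADER.match(line)
    if m is None:
        return None
    return {
        "timestamp": float(m.group("ts")),
        "comm": m.group("comm"),
        "state": m.group("state"),
        "state_meaning": STATES.get(m.group("state"), "unknown"),
        "pid": int(m.group("pid")),
        "ppid": int(m.group("ppid")),
    }


if __name__ == "__main__":
    sample = "[  817.883261] scsi_eh_0       S    0   141      2 0x80000000"
    print(parse_task_header(sample))
```

The state letter is the useful part: it describes what the thread was
doing when the dump was taken, and says nothing by itself about whether
the controller or a drive reported an error.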

I'm finding scsi_try_target_reset here at line 855
https://github.com/torvalds/linux/blob/master/drivers/scsi/scsi_error.c

And also line 2143 for scsi_error_handler
https://github.com/torvalds/linux/blob/master/drivers/scsi/scsi_error.c

Is the problem Btrfs on sysroot? Because if the sysroot file system is
entirely error free, I'd expect to eventually get a lot more error
information from the kernel, even without sysrq+t, rather than
faceplanting. Can you post the entire dmesg? The posted one starts at
~815 seconds, and the problems definitely start before then, so as it
is we have nothing really to go on.


-- 
Chris Murphy
