On 2020-05-05 at 16:05, Tollef Fog Heen wrote:

> ]] The Wanderer
> 
>> I'm not at all sure whether the underlying cause is still the same,
>> but I'm getting a very similar-looking failure - albeit with
>> different BUG details - after a reboot a few nights ago into a new
>> kernel.
> 
> My understanding is that when you get one of those, it indicates a
> kernel problem, isn't that so?  If so, it should probably just be
> reassigned to linux.

That's entirely possible, and I wouldn't object if that happens.

That said, in my case I'm now reasonably certain that the proximate
cause is misbehaving storage-related hardware (either buggy, or outright
starting to fail), as touched on at the end of my initial comment.

After the RAID-array check completed that evening, I ran
/etc/cron.daily/mlocate by hand (as root), which triggered the I/O freeze
from overdoing writes that I mentioned; the system kept running
for a few hours, but the gkrellm clock froze at one second to midnight,
and the system hadn't recovered by 6:30 the next morning. A hard
power-cycle and a full manual fsck (including fixing several errors on
one partition) got things working again, with no sign of current
problems. I haven't tried explicitly initiating the process again, but
it's been long enough that it would have run on its own, and nothing
seems to have gone awry.

The kernel probably still shouldn't BUG when this happens, but I don't
know how far it's reasonable to expect the kernel to go to avoid such
problems, and it's useful to get the notification of what's happened /
hung under the hood - so I wouldn't be too fussed if the kernel people
just declined this as being outside of their scope.

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man.         -- George Bernard Shaw
