On 2020-05-05 at 16:05, Tollef Fog Heen wrote:

> ]] The Wanderer
>
>> I'm not at all sure whether the underlying cause is still the same,
>> but I'm getting a very similar-looking failure - albeit with
>> different BUG details - after a reboot a few nights ago into a new
>> kernel.
>
> My understanding is that when you get one of those, it indicates a
> kernel problem, isn't that so? If so, it should probably just be
> reassigned to linux.

That's entirely possible, and I wouldn't object if that happens.

That said, in my case I'm now reasonably certain that the proximate
underlying cause is misbehaving (either buggy, or outright starting to
fail) storage-related hardware, as touched on at the end of my initial
comment.

After the RAID-array check completed that evening, I ran
/etc/cron.daily/mlocate by hand (as root), and the I/O freeze from
overdoing writes that I mentioned was triggered; the system kept running
for a few hours, but the gkrellm clock froze at one second to midnight,
and the system hadn't recovered by 6:30 the next morning.

A hard power-cycle and a full manual fsck (including fixing several
errors on one partition) got things working again, with no sign of
current problems. I haven't retried explicitly initiating the process
again, but it's been long enough that it would have run on its own, and
nothing seems to have gone awry.

The kernel probably still shouldn't BUG when this happens, but I don't
know how far it's reasonable to expect the kernel to go to avoid such
problems, and it's useful to get the notification of what's happened /
hung under the hood - so I wouldn't be too fussed if the kernel people
just declined this as being outside of their scope.

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man.         -- George Bernard Shaw