Further progress trying to track this down:
I don't have to shut down the system to have problems. "swapoff /dev/hd0s5"
is enough to cause problems, once enough swap is in use. After a failed
swapoff, I have an extra 98 storeio processes running!
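To reproduce the count, something like the following should do. It is only a sketch: count_storeio is a made-up helper name, and I'm assuming that "ps -A" lists every process and that the translators show up with "storeio" in their command line.

```shell
# Hypothetical helper (not a Hurd tool): count the lines on stdin
# that mention storeio. grep -c prints 0 and exits non-zero when
# nothing matches, so "|| true" keeps the function from failing.
count_storeio() {
    grep -c 'storeio' || true
}

# Assumed usage: run once before and once after the failed swapoff
# and compare the two numbers.
#   ps -A | count_storeio
```

The difference between the two counts is the number of extra storeio processes left behind.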
I don't even have to swapoff to see "symptoms". The kernel debugger normally
shows symbolic names, e.g.:
    Stopped at machine_idle+0xe: leave
    machine_idle(0,81a2c630,3806f64,0,9b448b38)+0xe
    idle_thread_continue(9fcbdde0,81028b50,9c0c7fe4,0,9c3d5548)+0x2a
Once I've got enough swap in use, though, it stops doing this. Now I see:
    Stopped at 0x810000be: leave
    0x810000be(0,0,9fcc5990,0,9fb90b30)
    0x810293fa(9fcbdde0,81028b50,99526fe4,0,9c3d5548)
When I see a kernel page fault, it's always in strcmp().
It doesn't matter if an ssh session is open or not (Riccardo Mottola's
suggestion).
I can't task_terminate the auth server; that typically does nothing once
I've started having symptoms. I can, however, kill the auth server from the
command line (just "kill 7"), and that triggers a reboot that leaves the
disk in a clean state.
I'm just learning Hurd. Any ideas?
agape
brent