This one is pretty nasty. It occurs with the stock kernel from Mandrake 8.2 (2.4.18-6mdk). I expect that similar problems exist in the cooker version.
What happens is that, after some indeterminate period of time, the system stops letting you start new processes. Processes that already exist keep running, but nothing new will start, and nothing can be restarted. You cannot even shut down, because you can't start the shutdown script!

After looking through the logs, I think I have found the cause of the problem. It appears that devfs is dying. It takes out enough of the kernel that things stop working correctly, but not enough to bring the kernel down altogether. (Enough to be frustrating.) It looks like the lethal combination involves a remountable ide-scsi device, but that is only a guess at this point.

I believe the cause is one of the patches added to the kernel. (Probably grsecurity.) I rebuilt the kernel using the stock 2.4.18 source from ftp.kernel.org, using the same configuration options needed to keep Mandrake happy. (Devfs and ide-scsi mostly.) That kernel has worked flawlessly. (The other kernel would not last more than a day.) There is definitely a problem here. What the solution is will take more research.

The ksymoops log is attached.

Please Cc me on all mail, as I do not read the cooker list very often. (Far too many other lists to keep up with...)
ksymoops 2.4.3 on i686 2.4.18-6mdk.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.18-6mdk/ (default)
     -m /boot/System.map-2.4.18-6mdk (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Warning (compare_ksyms_lsmod): module ext3 is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says c01ce310, System.map says c0157de0.  Ignoring ksyms_base entry
Apr 7 14:31:39 kludge kernel: Unable to handle kernel paging request at virtual address 204f2f8d
Apr 7 14:31:39 kludge kernel: c0160783
Apr 7 14:31:39 kludge kernel: *pde = 00000000
Apr 7 14:31:39 kludge kernel: Oops: 0000
Apr 7 14:31:39 kludge kernel: CPU:    0
Apr 7 14:31:39 kludge kernel: EIP:    0010:[scan_dir_for_removable+19/64]    Not tainted
Apr 7 14:31:39 kludge kernel: EIP:    0010:[<c0160783>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Apr 7 14:31:39 kludge kernel: EFLAGS: 00010202
Apr 7 14:31:39 kludge kernel: eax: cc181240   ebx: 204f2f49   ecx: 00000000   edx: cc181240
Apr 7 14:31:39 kludge kernel: esi: ce153840   edi: ce3647a0   ebp: ce5f32e0   esp: c8955f28
Apr 7 14:31:39 kludge kernel: ds: 0018   es: 0018   ss: 0018
Apr 7 14:31:39 kludge kernel: Process msec_find (pid: 2952, stackpage=c8955000)
Apr 7 14:31:39 kludge kernel: Stack: ce153840 c0160c16 ce3647a0 c0265a40 00000000 ce153840 ce1538c0 ce1538ac
Apr 7 14:31:39 kludge kernel:        ce5f32e0 c0141690 ce5f32e0 c8955fa0 c0141b90 ce5f32e0 fffffff7 0000000d
Apr 7 14:31:39 kludge kernel:        bfffeac8 c0141d3f ce5f32e0 c0141b90 c8955fa0 ce02dbc0 c01338f7 ce02dbc0
Apr 7 14:31:39 kludge kernel: Call Trace: [devfs_readdir+86/448] [vfs_readdir+96/144] [filldir64+0/352] [sys_getdents64+79/185] [filldir64+0/352]
Apr 7 14:31:39 kludge kernel: Call Trace: [<c0160c16>] [<c0141690>] [<c0141b90>] [<c0141d3f>] [<c0141b90>]
Apr 7 14:31:39 kludge kernel:    [<c01338f7>] [<c0106f23>]
Apr 7 14:31:39 kludge kernel: Code: 66 8b 43 44 25 00 f0 00 00 66 3d 00 60 75 0d f6 43 10 04 74

>>EIP; c0160782 <scan_dir_for_removable+12/40>   <=====
Trace; c0160c16 <devfs_readdir+56/1c0>
Trace; c0141690 <vfs_readdir+60/90>
Trace; c0141b90 <filldir64+0/160>
Trace; c0141d3e <sys_getdents64+4e/b8>
Trace; c0141b90 <filldir64+0/160>
Trace; c01338f6 <sys_fchdir+c6/e0>
Trace; c0106f22 <system_call+32/40>
Code;  c0160782 <scan_dir_for_removable+12/40>
00000000 <_EIP>:
Code;  c0160782 <scan_dir_for_removable+12/40>   <=====
   0:   66 8b 43 44               mov    0x44(%ebx),%ax   <=====
Code;  c0160786 <scan_dir_for_removable+16/40>
   4:   25 00 f0 00 00            and    $0xf000,%eax
Code;  c016078a <scan_dir_for_removable+1a/40>
   9:   66 3d 00 60               cmp    $0x6000,%ax
Code;  c016078e <scan_dir_for_removable+1e/40>
   d:   75 0d                     jne    1c <_EIP+0x1c> c016079e <scan_dir_for_removable+2e/40>
Code;  c0160790 <scan_dir_for_removable+20/40>
   f:   f6 43 10 04               testb  $0x4,0x10(%ebx)
Code;  c0160794 <scan_dir_for_removable+24/40>
  13:   74 00                     je     15 <_EIP+0x15> c0160796 <scan_dir_for_removable+26/40>

Apr 7 14:39:49 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr 7 14:39:57 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609 (SigmaTel STAC9721/23)
Apr 8 04:05:56 kludge kernel: Unable to handle kernel paging request at virtual address 204f2f8d
Apr 8 04:05:56 kludge kernel: c0160783
Apr 8 04:05:56 kludge kernel: *pde = 00000000
Apr 8 04:05:56 kludge kernel: Oops: 0000
Apr 8 04:05:56 kludge kernel: CPU:    0
Apr 8 04:05:56 kludge kernel: EIP:    0010:[scan_dir_for_removable+19/64]    Not tainted
Apr 8 04:05:56 kludge kernel: EIP:    0010:[<c0160783>]    Not tainted
Apr 8 04:05:56 kludge kernel: EFLAGS: 00010202
Apr 8 04:05:56 kludge kernel: eax: cd855da0   ebx: 204f2f49   ecx: 00000000   edx: cd855da0
Apr 8 04:05:56 kludge kernel: esi: c40635c0   edi: ce8807a0   ebp: c81152c0   esp: c8671f28
Apr 8 04:05:56 kludge kernel: ds: 0018   es: 0018   ss: 0018
Apr 8 04:05:56 kludge kernel: Process msec_find (pid: 9575, stackpage=c8671000)
Apr 8 04:05:56 kludge kernel: Stack: c40635c0 c0160c16 ce8807a0 c0265a40 00000000 c40635c0 c4063640 c406362c
Apr 8 04:05:56 kludge kernel:        c81152c0 c0141690 c81152c0 c8671fa0 c0141b90 c81152c0 fffffff7 0000000d
Apr 8 04:05:56 kludge kernel:        bfffece8 c0141d3f c81152c0 c0141b90 c8671fa0 c9e600a0 c01338f7 c9e600a0
Apr 8 04:05:56 kludge kernel: Call Trace: [devfs_readdir+86/448] [vfs_readdir+96/144] [filldir64+0/352] [sys_getdents64+79/185] [filldir64+0/352]
Apr 8 04:05:56 kludge kernel: Call Trace: [<c0160c16>] [<c0141690>] [<c0141b90>] [<c0141d3f>] [<c0141b90>]
Apr 8 04:05:56 kludge kernel:    [<c01338f7>] [<c0106f23>]
Apr 8 04:05:56 kludge kernel: Code: 66 8b 43 44 25 00 f0 00 00 66 3d 00 60 75 0d f6 43 10 04 74

>>EIP; c0160782 <scan_dir_for_removable+12/40>   <=====
Trace; c0160c16 <devfs_readdir+56/1c0>
Trace; c0141690 <vfs_readdir+60/90>
Trace; c0141b90 <filldir64+0/160>
Trace; c0141d3e <sys_getdents64+4e/b8>
Trace; c0141b90 <filldir64+0/160>
Trace; c01338f6 <sys_fchdir+c6/e0>
Trace; c0106f22 <system_call+32/40>
Code;  c0160782 <scan_dir_for_removable+12/40>
00000000 <_EIP>:
Code;  c0160782 <scan_dir_for_removable+12/40>   <=====
   0:   66 8b 43 44               mov    0x44(%ebx),%ax   <=====
Code;  c0160786 <scan_dir_for_removable+16/40>
   4:   25 00 f0 00 00            and    $0xf000,%eax
Code;  c016078a <scan_dir_for_removable+1a/40>
   9:   66 3d 00 60               cmp    $0x6000,%ax
Code;  c016078e <scan_dir_for_removable+1e/40>
   d:   75 0d                     jne    1c <_EIP+0x1c> c016079e <scan_dir_for_removable+2e/40>
Code;  c0160790 <scan_dir_for_removable+20/40>
   f:   f6 43 10 04               testb  $0x4,0x10(%ebx)
Code;  c0160794 <scan_dir_for_removable+24/40>
  13:   74 00                     je     15 <_EIP+0x15> c0160796 <scan_dir_for_removable+26/40>

Apr 8 10:37:11 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr 8 10:37:19 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609 (SigmaTel STAC9721/23)
Apr 8 11:03:53 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr 8 11:03:59 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609 (SigmaTel STAC9721/23)
Apr 8 14:54:09 kludge kernel: Unable to handle kernel paging request at virtual address 204f2f8d
Apr 8 14:54:09 kludge kernel: c0160783
Apr 8 14:54:09 kludge kernel: *pde = 00000000
Apr 8 14:54:09 kludge kernel: Oops: 0000
Apr 8 14:54:09 kludge kernel: CPU:    0
Apr 8 14:54:09 kludge kernel: EIP:    0010:[scan_dir_for_removable+19/64]    Not tainted
Apr 8 14:54:09 kludge kernel: EIP:    0010:[<c0160783>]    Not tainted
Apr 8 14:54:09 kludge kernel: EFLAGS: 00010202
Apr 8 14:54:09 kludge kernel: eax: ceaa41e0   ebx: 204f2f49   ecx: 00000000   edx: ceaa41e0
Apr 8 14:54:09 kludge kernel: esi: cf47b040   edi: cf8496a0   ebp: c9a87ca0   esp: c248bf28
Apr 8 14:54:09 kludge kernel: ds: 0018   es: 0018   ss: 0018
Apr 8 14:54:09 kludge kernel: Process find (pid: 11002, stackpage=c248b000)
Apr 8 14:54:09 kludge kernel: Stack: cf47b040 c0160c16 cf8496a0 c0265a40 00000000 cf47b040 cf47b0c0 cf47b0ac
Apr 8 14:54:09 kludge kernel:        c9a87ca0 c0141690 c9a87ca0 c248bfa0 c0141b90 c9a87ca0 fffffff7 00000004
Apr 8 14:54:09 kludge kernel:        bfffead8 c0141d3f c9a87ca0 c0141b90 c248bfa0 0000057d c1406360 41ed0007
Apr 8 14:54:09 kludge kernel: Call Trace: [devfs_readdir+86/448] [vfs_readdir+96/144] [filldir64+0/352] [sys_getdents64+79/185] [filldir64+0/352]
Apr 8 14:54:09 kludge kernel: Call Trace: [<c0160c16>] [<c0141690>] [<c0141b90>] [<c0141d3f>] [<c0141b90>]
Apr 8 14:54:09 kludge kernel:    [<c0130001>] [<c0106f23>]
Apr 8 14:54:09 kludge kernel: Code: 66 8b 43 44 25 00 f0 00 00 66 3d 00 60 75 0d f6 43 10 04 74

>>EIP; c0160782 <scan_dir_for_removable+12/40>   <=====
Trace; c0160c16 <devfs_readdir+56/1c0>
Trace; c0141690 <vfs_readdir+60/90>
Trace; c0141b90 <filldir64+0/160>
Trace; c0141d3e <sys_getdents64+4e/b8>
Trace; c0141b90 <filldir64+0/160>
Trace; c0130000 <sys_swapoff+170/280>
Trace; c0106f22 <system_call+32/40>
Code;  c0160782 <scan_dir_for_removable+12/40>
00000000 <_EIP>:
Code;  c0160782 <scan_dir_for_removable+12/40>   <=====
   0:   66 8b 43 44               mov    0x44(%ebx),%ax   <=====
Code;  c0160786 <scan_dir_for_removable+16/40>
   4:   25 00 f0 00 00            and    $0xf000,%eax
Code;  c016078a <scan_dir_for_removable+1a/40>
   9:   66 3d 00 60               cmp    $0x6000,%ax
Code;  c016078e <scan_dir_for_removable+1e/40>
   d:   75 0d                     jne    1c <_EIP+0x1c> c016079e <scan_dir_for_removable+2e/40>
Code;  c0160790 <scan_dir_for_removable+20/40>
   f:   f6 43 10 04               testb  $0x4,0x10(%ebx)
Code;  c0160794 <scan_dir_for_removable+24/40>
  13:   74 00                     je     15 <_EIP+0x15> c0160796 <scan_dir_for_removable+26/40>

Apr 8 16:52:11 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr 8 16:52:17 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609 (SigmaTel STAC9721/23)

3 warnings issued.  Results may not be reliable.
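
A few numbers in the oops output can be cross-checked by hand, and the small Python sketch below does that arithmetic. It is only a reading aid, not a diagnosis. The constants are copied from the log; the helper names (faulting_address, register_as_ascii) are made up for this note. All three oopses report the same ebx and the same faulting address, and the faulting instruction is "mov 0x44(%ebx),%ax", so the bad address should simply be ebx + 0x44. The and/cmp pair that follows uses the same constants as the standard file-type mask and block-device type, which would fit a function called scan_dir_for_removable, though what structure ebx was supposed to point at is an assumption on my part.

```python
import stat

# Values copied from the oops output; identical in all three oopses.
EBX = 0x204F2F49          # ebx register at the time of the fault
FAULT_ADDR = 0x204F2F8D   # "Unable to handle kernel paging request at ..."
DISPLACEMENT = 0x44       # from the faulting "mov 0x44(%ebx),%ax"


def faulting_address(base, disp):
    """Effective address of a base+displacement access on 32-bit x86."""
    return (base + disp) & 0xFFFFFFFF


def register_as_ascii(value):
    """Show a 32-bit register as the four bytes it occupies in memory
    on a little-endian machine, decoded as ASCII."""
    return value.to_bytes(4, "little").decode("ascii", errors="replace")


if __name__ == "__main__":
    # The reported fault address is exactly ebx + 0x44.
    print(hex(faulting_address(EBX, DISPLACEMENT)))

    # ebx is not in the 0xc0000000+ kernel range seen in the other
    # registers; its bytes happen to be printable ASCII.
    print(repr(register_as_ascii(EBX)))

    # "and $0xf000" / "cmp $0x6000" use the file-type mask and the
    # block-device type bits.
    print(stat.S_IFMT(0xFFFF) == 0xF000, stat.S_IFBLK == 0x6000)
```

The bytes of ebx decode to the text "I/O " (with a trailing space). That hints that the pointer may have been overwritten with string data rather than being a stale but once-valid address, but it does not prove it, and it says nothing about which code did the overwriting.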