This is pretty nasty.

This is using the stock kernel from Mandrake 8.2.  I expect that similar
problems exist in the cooker version.

What happens is that after some indeterminate period of time, the system
stops letting you start new processes.  Processes that already exist keep
running, but new processes will not start, and processes that have exited
cannot be restarted.  Even shutting down is impossible, because you can't
start the shutdown script!

After looking through the logs, I think I have found the cause of the
problem.  It appears that devfs is dying.  It takes out enough of the
kernel that things stop working correctly, but not enough to bring the
system down altogether.  (Just enough to be frustrating.)  It looks like
the lethal combination is a remountable ide-scsi device, but that is only
a guess at this point.
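
A quick sanity check on the register dump in the attached log, as a small
Python sketch.  The constants are copied from the oops reports (ebx and the
faulting address are identical in all three); the helper names are made up
for illustration.  It checks that the faulting address is ebx plus the 0x44
displacement of the faulting instruction, then shows the bytes ebx would
occupy in little-endian memory.  Those bytes read as printable text, which
might mean a pointer was overwritten with string data, but that is only an
interpretation and not a confirmed diagnosis.

```python
import struct

# Register values copied from the oops reports in the attached log.
EBX = 0x204F2F49          # same value in all three oopses
FAULT_ADDR = 0x204F2F8D   # "virtual address" in the paging request message
DISPLACEMENT = 0x44       # from "mov 0x44(%ebx),%ax" at the faulting EIP


def fault_matches_instruction(base, disp, fault):
    """True if base + disp is the address the kernel failed to read."""
    return (base + disp) & 0xFFFFFFFF == fault


def as_ascii(value):
    """Render a 32-bit value as the bytes it occupies in little-endian memory."""
    raw = struct.pack("<I", value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


print(fault_matches_instruction(EBX, DISPLACEMENT, FAULT_ADDR))  # True
print(repr(as_ascii(EBX)))                                       # 'I/O '
```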

I believe the cause is one of the patches added to the kernel. (Probably
grsecurity.)

I rebuilt the kernel using the stock 2.4.18 source from ftp.kernel.org,
using the same configuration options needed to keep Mandrake happy.
(Devfs and ide-scsi mostly.)  That kernel has worked flawlessly.  (The
other kernel would not last more than a day.)

There is definitely a problem here. What the solution is will take more
research.

The ksymoops log is attached. Please Cc me on all mail, as I do not read
the cooker list very often.  (Far too many other lists to keep up with...)
ksymoops 2.4.3 on i686 2.4.18-6mdk.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.18-6mdk/ (default)
     -m /boot/System.map-2.4.18-6mdk (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Warning (compare_ksyms_lsmod): module ext3 is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says c01ce310, System.map says c0157de0.  Ignoring ksyms_base entry
Apr  7 14:31:39 kludge kernel: Unable to handle kernel paging request at virtual address 204f2f8d
Apr  7 14:31:39 kludge kernel: c0160783
Apr  7 14:31:39 kludge kernel: *pde = 00000000
Apr  7 14:31:39 kludge kernel: Oops: 0000
Apr  7 14:31:39 kludge kernel: CPU:    0
Apr  7 14:31:39 kludge kernel: EIP:    0010:[scan_dir_for_removable+19/64]    Not tainted
Apr  7 14:31:39 kludge kernel: EIP:    0010:[<c0160783>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Apr  7 14:31:39 kludge kernel: EFLAGS: 00010202
Apr  7 14:31:39 kludge kernel: eax: cc181240   ebx: 204f2f49   ecx: 00000000   edx: cc181240
Apr  7 14:31:39 kludge kernel: esi: ce153840   edi: ce3647a0   ebp: ce5f32e0   esp: c8955f28
Apr  7 14:31:39 kludge kernel: ds: 0018   es: 0018   ss: 0018
Apr  7 14:31:39 kludge kernel: Process msec_find (pid: 2952, stackpage=c8955000)
Apr  7 14:31:39 kludge kernel: Stack: ce153840 c0160c16 ce3647a0 c0265a40 00000000 ce153840 ce1538c0 ce1538ac
Apr  7 14:31:39 kludge kernel:        ce5f32e0 c0141690 ce5f32e0 c8955fa0 c0141b90 ce5f32e0 fffffff7 0000000d
Apr  7 14:31:39 kludge kernel:        bfffeac8 c0141d3f ce5f32e0 c0141b90 c8955fa0 ce02dbc0 c01338f7 ce02dbc0
Apr  7 14:31:39 kludge kernel: Call Trace: [devfs_readdir+86/448] [vfs_readdir+96/144] [filldir64+0/352] [sys_getdents64+79/185] [filldir64+0/352]
Apr  7 14:31:39 kludge kernel: Call Trace: [<c0160c16>] [<c0141690>] [<c0141b90>] [<c0141d3f>] [<c0141b90>]
Apr  7 14:31:39 kludge kernel:    [<c01338f7>] [<c0106f23>]
Apr  7 14:31:39 kludge kernel: Code: 66 8b 43 44 25 00 f0 00 00 66 3d 00 60 75 0d f6 43 10 04 74

>>EIP; c0160782 <scan_dir_for_removable+12/40>   <=====
Trace; c0160c16 <devfs_readdir+56/1c0>
Trace; c0141690 <vfs_readdir+60/90>
Trace; c0141b90 <filldir64+0/160>
Trace; c0141d3e <sys_getdents64+4e/b8>
Trace; c0141b90 <filldir64+0/160>
Trace; c01338f6 <sys_fchdir+c6/e0>
Trace; c0106f22 <system_call+32/40>
Code;  c0160782 <scan_dir_for_removable+12/40>
00000000 <_EIP>:
Code;  c0160782 <scan_dir_for_removable+12/40>   <=====
   0:   66 8b 43 44               mov    0x44(%ebx),%ax   <=====
Code;  c0160786 <scan_dir_for_removable+16/40>
   4:   25 00 f0 00 00            and    $0xf000,%eax
Code;  c016078a <scan_dir_for_removable+1a/40>
   9:   66 3d 00 60               cmp    $0x6000,%ax
Code;  c016078e <scan_dir_for_removable+1e/40>
   d:   75 0d                     jne    1c <_EIP+0x1c> c016079e <scan_dir_for_removable+2e/40>
Code;  c0160790 <scan_dir_for_removable+20/40>
   f:   f6 43 10 04               testb  $0x4,0x10(%ebx)
Code;  c0160794 <scan_dir_for_removable+24/40>
  13:   74 00                     je     15 <_EIP+0x15> c0160796 <scan_dir_for_removable+26/40>

Apr  7 14:39:49 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr  7 14:39:57 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609 (SigmaTel STAC9721/23)
Apr  8 04:05:56 kludge kernel: Unable to handle kernel paging request at virtual address 204f2f8d
Apr  8 04:05:56 kludge kernel: c0160783
Apr  8 04:05:56 kludge kernel: *pde = 00000000
Apr  8 04:05:56 kludge kernel: Oops: 0000
Apr  8 04:05:56 kludge kernel: CPU:    0
Apr  8 04:05:56 kludge kernel: EIP:    0010:[scan_dir_for_removable+19/64]    Not tainted
Apr  8 04:05:56 kludge kernel: EIP:    0010:[<c0160783>]    Not tainted
Apr  8 04:05:56 kludge kernel: EFLAGS: 00010202
Apr  8 04:05:56 kludge kernel: eax: cd855da0   ebx: 204f2f49   ecx: 00000000   edx: cd855da0
Apr  8 04:05:56 kludge kernel: esi: c40635c0   edi: ce8807a0   ebp: c81152c0   esp: c8671f28
Apr  8 04:05:56 kludge kernel: ds: 0018   es: 0018   ss: 0018
Apr  8 04:05:56 kludge kernel: Process msec_find (pid: 9575, stackpage=c8671000)
Apr  8 04:05:56 kludge kernel: Stack: c40635c0 c0160c16 ce8807a0 c0265a40 00000000 c40635c0 c4063640 c406362c
Apr  8 04:05:56 kludge kernel:        c81152c0 c0141690 c81152c0 c8671fa0 c0141b90 c81152c0 fffffff7 0000000d
Apr  8 04:05:56 kludge kernel:        bfffece8 c0141d3f c81152c0 c0141b90 c8671fa0 c9e600a0 c01338f7 c9e600a0
Apr  8 04:05:56 kludge kernel: Call Trace: [devfs_readdir+86/448] [vfs_readdir+96/144] [filldir64+0/352] [sys_getdents64+79/185] [filldir64+0/352]
Apr  8 04:05:56 kludge kernel: Call Trace: [<c0160c16>] [<c0141690>] [<c0141b90>] [<c0141d3f>] [<c0141b90>]
Apr  8 04:05:56 kludge kernel:    [<c01338f7>] [<c0106f23>]
Apr  8 04:05:56 kludge kernel: Code: 66 8b 43 44 25 00 f0 00 00 66 3d 00 60 75 0d f6 43 10 04 74

>>EIP; c0160782 <scan_dir_for_removable+12/40>   <=====
Trace; c0160c16 <devfs_readdir+56/1c0>
Trace; c0141690 <vfs_readdir+60/90>
Trace; c0141b90 <filldir64+0/160>
Trace; c0141d3e <sys_getdents64+4e/b8>
Trace; c0141b90 <filldir64+0/160>
Trace; c01338f6 <sys_fchdir+c6/e0>
Trace; c0106f22 <system_call+32/40>
Code;  c0160782 <scan_dir_for_removable+12/40>
00000000 <_EIP>:
Code;  c0160782 <scan_dir_for_removable+12/40>   <=====
   0:   66 8b 43 44               mov    0x44(%ebx),%ax   <=====
Code;  c0160786 <scan_dir_for_removable+16/40>
   4:   25 00 f0 00 00            and    $0xf000,%eax
Code;  c016078a <scan_dir_for_removable+1a/40>
   9:   66 3d 00 60               cmp    $0x6000,%ax
Code;  c016078e <scan_dir_for_removable+1e/40>
   d:   75 0d                     jne    1c <_EIP+0x1c> c016079e <scan_dir_for_removable+2e/40>
Code;  c0160790 <scan_dir_for_removable+20/40>
   f:   f6 43 10 04               testb  $0x4,0x10(%ebx)
Code;  c0160794 <scan_dir_for_removable+24/40>
  13:   74 00                     je     15 <_EIP+0x15> c0160796 <scan_dir_for_removable+26/40>

Apr  8 10:37:11 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr  8 10:37:19 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609 (SigmaTel STAC9721/23)
Apr  8 11:03:53 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr  8 11:03:59 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609 (SigmaTel STAC9721/23)
Apr  8 14:54:09 kludge kernel: Unable to handle kernel paging request at virtual address 204f2f8d
Apr  8 14:54:09 kludge kernel: c0160783
Apr  8 14:54:09 kludge kernel: *pde = 00000000
Apr  8 14:54:09 kludge kernel: Oops: 0000
Apr  8 14:54:09 kludge kernel: CPU:    0
Apr  8 14:54:09 kludge kernel: EIP:    0010:[scan_dir_for_removable+19/64]    Not tainted
Apr  8 14:54:09 kludge kernel: EIP:    0010:[<c0160783>]    Not tainted
Apr  8 14:54:09 kludge kernel: EFLAGS: 00010202
Apr  8 14:54:09 kludge kernel: eax: ceaa41e0   ebx: 204f2f49   ecx: 00000000   edx: ceaa41e0
Apr  8 14:54:09 kludge kernel: esi: cf47b040   edi: cf8496a0   ebp: c9a87ca0   esp: c248bf28
Apr  8 14:54:09 kludge kernel: ds: 0018   es: 0018   ss: 0018
Apr  8 14:54:09 kludge kernel: Process find (pid: 11002, stackpage=c248b000)
Apr  8 14:54:09 kludge kernel: Stack: cf47b040 c0160c16 cf8496a0 c0265a40 00000000 cf47b040 cf47b0c0 cf47b0ac
Apr  8 14:54:09 kludge kernel:        c9a87ca0 c0141690 c9a87ca0 c248bfa0 c0141b90 c9a87ca0 fffffff7 00000004
Apr  8 14:54:09 kludge kernel:        bfffead8 c0141d3f c9a87ca0 c0141b90 c248bfa0 0000057d c1406360 41ed0007
Apr  8 14:54:09 kludge kernel: Call Trace: [devfs_readdir+86/448] [vfs_readdir+96/144] [filldir64+0/352] [sys_getdents64+79/185] [filldir64+0/352]
Apr  8 14:54:09 kludge kernel: Call Trace: [<c0160c16>] [<c0141690>] [<c0141b90>] [<c0141d3f>] [<c0141b90>]
Apr  8 14:54:09 kludge kernel:    [<c0130001>] [<c0106f23>]
Apr  8 14:54:09 kludge kernel: Code: 66 8b 43 44 25 00 f0 00 00 66 3d 00 60 75 0d f6 43 10 04 74

>>EIP; c0160782 <scan_dir_for_removable+12/40>   <=====
Trace; c0160c16 <devfs_readdir+56/1c0>
Trace; c0141690 <vfs_readdir+60/90>
Trace; c0141b90 <filldir64+0/160>
Trace; c0141d3e <sys_getdents64+4e/b8>
Trace; c0141b90 <filldir64+0/160>
Trace; c0130000 <sys_swapoff+170/280>
Trace; c0106f22 <system_call+32/40>
Code;  c0160782 <scan_dir_for_removable+12/40>
00000000 <_EIP>:
Code;  c0160782 <scan_dir_for_removable+12/40>   <=====
   0:   66 8b 43 44               mov    0x44(%ebx),%ax   <=====
Code;  c0160786 <scan_dir_for_removable+16/40>
   4:   25 00 f0 00 00            and    $0xf000,%eax
Code;  c016078a <scan_dir_for_removable+1a/40>
   9:   66 3d 00 60               cmp    $0x6000,%ax
Code;  c016078e <scan_dir_for_removable+1e/40>
   d:   75 0d                     jne    1c <_EIP+0x1c> c016079e <scan_dir_for_removable+2e/40>
Code;  c0160790 <scan_dir_for_removable+20/40>
   f:   f6 43 10 04               testb  $0x4,0x10(%ebx)
Code;  c0160794 <scan_dir_for_removable+24/40>
  13:   74 00                     je     15 <_EIP+0x15> c0160796 <scan_dir_for_removable+26/40>

Apr  8 16:52:11 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr  8 16:52:17 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609 
(SigmaTel STAC9721/23)

3 warnings issued.  Results may not be reliable.
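
For anyone reading the disassembly in the log above: the "and $0xf000" /
"cmp $0x6000" pair right after the faulting load uses the same constants as
the file-type mask (S_IFMT) and the block-device type (S_IFBLK) on Linux, so
the code looks like an S_ISBLK() test on a mode field read from offset 0x44.
That reading of the offset is an assumption on my part and has not been
checked against the devfs source.  The Python sketch below only shows that
the two immediates behave like the standard block-device check; the helper
name and the sample modes are hypothetical.

```python
import stat

# Immediates taken from the disassembly in the log.
TYPE_MASK = 0xF000   # "and $0xf000,%eax"
BLOCK_TYPE = 0x6000  # "cmp $0x6000,%ax"


def is_block_device(mode):
    """Mirror the and/cmp pair: keep the file-type bits, compare to block."""
    return (mode & TYPE_MASK) == BLOCK_TYPE


# Hypothetical sample modes, not taken from the log.
samples = {
    "block device": 0o060660,
    "char device": 0o020666,
    "directory": 0o040755,
}

for name, mode in samples.items():
    # The hand-rolled check should agree with the standard library helper.
    assert is_block_device(mode) == stat.S_ISBLK(mode)
    print(name, is_block_device(mode))
```

If that reading is right, the function faults before it ever gets to the
comparison, because the pointer in ebx is already bad when the mode is
loaded.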
