-----BEGIN PGP SIGNED MESSAGE-----
Hello all,
We've experienced many crashes lately with 2.2.x kernels.
This is a bug report; maybe somebody with more clue than me :-)
can figure it out. I've done the best I could to produce a
good bug report. Please be forgiving, as this is the first bug
I have tried to hunt down :-) Apologies if I've just added to the
noise.
TIA for any help.
(Is there anything else I could do that is not well documented,
like running gdb on vmlinux to locate the buggy code more
precisely? Documentation/oops-tracing.txt was not detailed enough
for a clueless person like me :-) )
=====================================================================
Bug Report
1. SUMMARY:
Kernel 2.2.12 crash under heavy load, NULL pointer dereference
2. DESCRIPTION:
Kernel 2.2.12 on a dual Pentium Pro dies
trying to dereference a null pointer, followed by:
"Kernel panic: Attempted to kill the idle task!"
My (more or less) educated guess is that we've hit a bug
in the core/generic networking code which is triggered by
heavy load on an SMP machine. The reasons for saying this are:
- out of three dual-processor machines, we've experienced crashes
only on the 2 that do heavy networking.
- the kernel was stable when compiled for a single processor.
(Of course I may be way off, I don't pretend to be an expert :-)
just trying to hunt down a very annoying bug)
3. KEYWORDS:
networking, SMP
4. KERNEL VERSION:
2.2.12
5. OUTPUT OF OOPS:
(Note: we switched to a uniprocessor machine after the last crash,
so ksymoops was run under 2.2.5 uniprocessor, but with the right
parameters for 2.2.12, I hope.)
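For anyone not used to the symbol+offset/size notation in the decoded trace that follows: ksymoops maps each raw address onto the nearest preceding symbol from the map given with -m. Below is a minimal Python sketch of that lookup. The table is an assumption: its start addresses are not copied from System.map-2.2.12 but back-computed from the decoded trace (for example c0132788 - 0xc = c013277c for kill_fasync), and the name resolve is made up for illustration. The real tool does considerably more than this.

```python
import bisect

# (start, size, name) triples. ASSUMPTION: back-computed from the
# "symbol+offset/size" entries in the decoded trace of this report,
# not read from the real System.map-2.2.12.
SYMBOLS = sorted([
    (0xc010c2a8, 0x88, "handle_IRQ_event"),
    (0xc013277c, 0x44, "kill_fasync"),
    (0xc0151acc, 0x1c, "sock_wfree"),
    (0xc0152440, 0xa8, "__kfree_skb"),
    (0xc0161614, 0x54, "tcp_write_space"),
])
STARTS = [start for start, _size, _name in SYMBOLS]


def resolve(addr):
    """Return 'name+offset/size' for addr, or None if no symbol covers it."""
    # Index of the last symbol whose start address is <= addr.
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None
    start, size, name = SYMBOLS[i]
    offset = addr - start
    if offset >= size:
        return None  # addr lies past the end of the nearest symbol
    return "%s+%x/%x" % (name, offset, size)


if __name__ == "__main__":
    for addr in (0xc0132788, 0xc0161663, 0xd08884bd):
        print("%08x" % addr, resolve(addr))
```

The address d08884bd falls outside every symbol in this table, which fits ksymoops reporting it only relative to END_OF_CODE: it is presumably inside a loaded module, and no module symbols were available for this run.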
ksymoops 0.7c on i686 2.2.5-22. Options used
-v /usr/src/linux/vmlinux (specified)
-K (specified)
-L (specified)
-o /lib/modules/2.2.12/ (specified)
-m /boot/System.map-2.2.12 (specified)
No modules in ksyms, skipping objects
Unable to handle kernel NULL pointer dereference at virtual address 0000001f
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c0132788>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010212
eax: 00005f7f ebx: 0000001f ecx: c6c32800 edx: 00000000
esi: 015b5872 edi: 00000008 ebp: 015b5785 esp: c0235edc
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=c0235000)
Stack: ceba4c00 0000001d c0161663 ca44691c 00000002 c9f89f20 c0151ae3 c8f9ab00
c0152476 c9f89f20 c64da380 00000008 d08884bd c9f89f20 c692a320 04000001
00000004 0000000a c90b94a0 c6b65ba0 00000034 c0235f5c c90b94a0 00000020
Call Trace: [<c0161663>] [<c0151ae3>] [<c0152476>] [<d08884bd>] [<c010c2fd>]
[<c0110f0e>] [<c010c46f>]
[<c010af48>] [<c01087d1>] [<c0106000>] [<c0106000>] [<c01001b1>]
Code: 81 3b 01 46 00 00 74 10 68 60 ad 1e c0 e8 be 45 fe ff 83 c4
>>EIP; c0132788 <kill_fasync+c/44> <=====
Trace; c0161663 <tcp_write_space+4f/54>
Trace; c0151ae3 <sock_wfree+17/1c>
Trace; c0152476 <__kfree_skb+36/a8>
Trace; d08884bd <END_OF_CODE+1060fe65/????>
Trace; c010c2fd <handle_IRQ_event+55/88>
Trace; c0110f0e <do_level_ioapic_IRQ+62/a0>
Trace; c010c46f <do_IRQ+3b/5c>
Trace; c010af48 <common_interrupt+18/20>
Trace; c01087d1 <cpu_idle+41/54>
Trace; c0106000 <get_options+0/74>
Trace; c0106000 <get_options+0/74>
Trace; c01001b1 <L6+0/2>
Code; c0132788 <kill_fasync+c/44>
00000000 <_EIP>:
Code; c0132788 <kill_fasync+c/44> <=====
0: 81 3b 01 46 00 00 cmpl $0x4601,(%ebx) <=====
Code; c013278e <kill_fasync+12/44>
6: 74 10 je 18 <_EIP+0x18> c01327a0 <kill_fasync+24/44>
Code; c0132790 <kill_fasync+14/44>
8: 68 60 ad 1e c0 pushl $0xc01ead60
Code; c0132795 <kill_fasync+19/44>
d: e8 be 45 fe ff call fffe45d0 <_EIP+0xfffe45d0> c0116d58 <printk+0/18c>
Code; c013279a <kill_fasync+1e/44>
12: 83 c4 00 addl $0x0,%esp
Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing
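As a cross-check on the oops above, the faulting instruction can be decoded by hand from the Code: bytes and matched against the register dump. The Python sketch below does only that, using lines copied from the oops. The helper names are made up for illustration, and the decoder handles just this one instruction form (opcode 0x81 with a plain register-indirect operand). It shows that the instruction compares the 32-bit word at (%ebx) with 0x4601, and that ebx holds exactly the faulting address 0000001f. My understanding, which is an assumption and not something stated in this report, is that 0x4601 is the FASYNC_MAGIC constant from include/linux/fs.h in 2.2, which would mean kill_fasync was handed a bogus fasync_struct pointer.

```python
import re

# Lines copied verbatim from the oops in this report.
OOPS = """\
Unable to handle kernel NULL pointer dereference at virtual address 0000001f
eax: 00005f7f ebx: 0000001f ecx: c6c32800 edx: 00000000
esi: 015b5872 edi: 00000008 ebp: 015b5785 esp: c0235edc
Code: 81 3b 01 46 00 00 74 10 68 60 ad 1e c0 e8 be 45 fe ff 83 c4
"""

# Standard i386 encodings: register numbers in ModRM, and the eight
# "group 1" operations selected by the ModRM reg field for opcode 0x81.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def fault_address(text):
    """Extract the faulting virtual address from the oops header."""
    m = re.search(r"virtual address ([0-9a-f]{8})", text)
    return int(m.group(1), 16)


def registers(text):
    """Return a dict of general-purpose register values from the dump."""
    pairs = re.findall(
        r"\b(eax|ebx|ecx|edx|esi|edi|ebp|esp): ([0-9a-f]{8})", text)
    return {name: int(value, 16) for name, value in pairs}


def decode_first_insn(text):
    """Decode the first instruction, assuming the form '81 /r imm32'."""
    code = bytes.fromhex(re.search(r"Code: ([0-9a-f ]+)", text).group(1))
    opcode, modrm = code[0], code[1]
    assert opcode == 0x81, "sketch only handles opcode 0x81"
    mod, op, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod == 0 with rm other than 4 or 5 means plain (%reg) addressing.
    assert mod == 0 and rm not in (4, 5), "sketch only handles (%reg)"
    imm = int.from_bytes(code[2:6], "little")
    return GROUP1[op], imm, REG32[rm]


addr = fault_address(OOPS)
regs = registers(OOPS)
op, imm, base = decode_first_insn(OOPS)
culprits = [name for name, value in regs.items() if value == addr]

if __name__ == "__main__":
    print("instruction: %sl $0x%x,(%%%s)" % (op, imm, base))
    print("fault address: %08x, registers holding it: %s" % (addr, culprits))
```

The decoded operand register and the only register holding the faulting address are both ebx, so the dereferenced pointer itself is 0x1f rather than a field offset added to a NULL base.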
6. REPLICATION:
IMHO not exactly easy:
sometimes a machine runs for 7-10 days, sometimes it crashes several
times a day. If an SMP box is hammered _really_ heavily it may
be possible to replicate it. Unfortunately I don't have a spare machine
to fiddle with.
The code sequence in the oops is always the same (we had several crashes).
7. ENVIRONMENT:
7.1 ver_linux: NOTE: this was run under 2.2.5-uniprocessor
- -- Versions installed: (if some fields are empty or looks
- -- unusual then possibly you have very old versions)
Linux yukawa 2.2.5-22 #1 Wed Jun 2 08:45:51 EDT 1999 i686 unknown
Kernel modules 2.1.121
Gnu C egcs-2.91.66
Binutils 2.9.1.0.23
Linux C Library 2.1.1
Dynamic linker ldd (GNU libc) 2.1.1
Procps 2.0.2
Mount 2.9o
Net-tools 1.52
Console-tools 1999.03.02
Sh-utils 1.16
Modules Loaded nfsd nfs lockd sunrpc 3c59x aic7xxx
7.2 CPU info
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 1
model name : Pentium Pro
stepping : 9
cpu MHz : 199.434997
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
bogomips : 199.07
7.3 Modules
nfsd
nfs
lockd
sunrpc
3c59x
aic7xxx
7.4 SCSI
Attached devices:
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: IBM Model: DORS-32160 Rev: WA6A
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
Vendor: IBM OEM Model: DCHS04U Rev: 2626
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
Vendor: IBM OEM Model: DCHS04U Rev: 2626
Type: Direct-Access ANSI SCSI revision: 02
7.5 Other information
Output of /proc/pci (gives exact hardware models of SCSI and network cards)
PCI devices found:
Bus 0, device 0, function 0:
Host bridge: Intel 82441FX Natoma (rev 2).
Medium devsel. Fast back-to-back capable. Master Capable. Latency=32.
Bus 0, device 1, function 0:
ISA bridge: Intel 82371SB PIIX3 ISA (rev 1).
Medium devsel. Fast back-to-back capable. Master Capable. No bursts.
Bus 0, device 1, function 1:
IDE interface: Intel 82371SB PIIX3 IDE (rev 0).
Medium devsel. Fast back-to-back capable. Master Capable. Latency=32.
I/O at 0xe800 [0xe801].
Bus 0, device 9, function 0:
SCSI storage controller: Adaptec AIC-7881U (rev 0).
Medium devsel. Fast back-to-back capable. IRQ 9. Master Capable. Latency=32.
Min Gnt=8.Max Lat=8.
I/O at 0xe000 [0xe001].
Non-prefetchable 32 bit memory at 0xfb000000 [0xfb000000].
Bus 0, device 10, function 0:
Ethernet controller: 3Com 3C905B 100bTX (rev 0).
Medium devsel. IRQ 10. Master Capable. Latency=32. Min Gnt=10.Max Lat=10.
I/O at 0xd800 [0xd801].
Non-prefetchable 32 bit memory at 0xfa800000 [0xfa800000].
Bus 0, device 12, function 0:
VGA compatible controller: S3 Inc. ViRGE (rev 6).
Medium devsel. IRQ 11. Master Capable. Latency=32. Min Gnt=4.Max Lat=255.
Non-prefetchable 32 bit memory at 0xf4000000 [0xf4000000].
Cheers,
______________________________________________________________________
Ryurick M. Hristev ()..()/^\/^\ -<:-)
[EMAIL PROTECTED] \/ \#/\#/\) What opinions ?
______________________________________________________________________
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: noconv
iQEVAwUBOAJHJTmhfXaEzMQNAQH80gf/Z87dLuQhxpgP89GV7IH5P5whJsC5OFkW
AVVJrubtjfVRx4Fg4A5K2uQBD2RYgpMWm9OQcdMLr+sABxZb5C6WYL15Gd2ct+Je
Gan7yECdKgKpLQbTVrBuDUAHKVZ1sJ9I1fWm0RvMAxXM4otgYMurHN96n3Kx/kz4
2UIIyFEqB3cQuc/WfJ0SwcpsmZpfs6r3pgNDJIQTBksw0jRaRXPXZ80483CXaYsb
7fpV7O2bjMYiuj3i0Oo1KLBMlMFK+8LPtru/K9eZAz7QFHPBWrUfgMeGWBs/jeJc
szqxSUEa68WFdCRouEBOYOHrAyG+ZkEHW5nejDp/ybnARA3EWR2Rrg==
=yvZX
-----END PGP SIGNATURE-----