[1.] One line summary of the problem:

PROBLEM: various oopses in 2.2.19 SMP kernel.

[2.] Full description of the problem/report:

The machine runs fine for a couple of weeks and then eventually hangs
with an oops. This has happened approximately 5 times to date with
different 2.2.x kernels, but I have not reported it until now. The
oopses are not always the same.

Latest oops (2.2.19 SMP)
-------------------------------------------
Unable to handle kernel NULL pointer dereference at virtual address 00000100
current->tss.cr3 = 1c6ad000, %%cr3 = 1c6ad000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[find_buffer+104/144]
EFLAGS: 00010206
eax: 00000100 ebx: 00000007 ecx: 0005ff98 edx: 00000100
esi: 0000000d edi: 00003006 ebp: 0003198f esp: e426dee4
ds: 0018 es: 0018 ss: 0018
Process postmaster (pid: 13314, process nr: 30, stackpage=e426d000)
Stack: edffc498 00000000 0005ff98 c012bd04 00003006 0003198f 00001000 00000000
       c0142d42 00003006 0003198f 00001000 000003fc 00000000 c0142ece edffc498
       d9eecff0 00000000 0000001b 00000000 edffc498 00000000 d18c5320 c0142f4e
Call Trace: [get_hash_table+24/76] [sync_block+46/216] [sync_indirect+78/128]
       [sync_dindirect+78/128] [ext2_sync_file+103/164] [sys_fsync+143/200]
       [system_call+52/56] [_stext+43/164]
-------------------------------------------

An older oops (2.2.16 SMP, Red Hat's kernel with Red Hat's patches)
-------------------------------------------
Unable to handle kernel NULL pointer dereference at virtual address 00000134
current->tss.cr3 = 03b8f000, %%cr3 = 03b8f000
*pde = 00000000
Oops: 0002
CPU: 1
EIP: 0010:[remove_from_queues+169/328]
EFLAGS: 00010206
eax: 00000100 ebx: de47ae40 ecx: de47ae40 edx: e4b886c0
esi: 0000000c edi: 00000000 ebp: 00000007 esp: c2339ed0
ds: 0018 es: 0018 ss: 0018
Process postmaster (pid: 18747, process nr: 54, stackpage=c2339000)
Stack: 0007a671 c012b752 de47ae40 de47ae40 ece48420 c012bf5a de47ae40 c0148b19
       de47ae40 00000000 eb28c000 00000000 eb28c0d0 00001000 ec54d0b8 00000008
       00000400 0007a66a 00000000 0000002e c0148edf eb28c000 0000000c eb28c0cc
Call Trace: [put_last_free+50/124] [__bforget+34/40] [trunc_indirect+505/692]
       [ext2_truncate+107/516] [do_truncate+85/132] [do_truncate+105/132]
       [filp_open+172/240] [sys_ftruncate+322/368] [system_call+52/56]
Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d
-------------------------------------------

The machine is a dual-CPU (both PIII 500s) SMP machine with a Mylex
DAC960 RAID controller.

[3.] Keywords (i.e., modules, networking, kernel):

linux kernel 2.2.19 SMP oops 0000 [find_buffer+104/144]
loaded modules: 3c59x DAC960

[4.] Kernel version (from /proc/version):

Linux version 2.2.19 ([EMAIL PROTECTED]) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Fri Apr 27 10:49:53 CDT 2001

[5.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/oops-tracing.txt)

-----------------------------------------------------------------------
Latest oops (2.2.19 SMP vanilla)

Options used: -V (default)
              -o /lib/modules/2.2.19/ (default)
              -k /proc/ksyms (default)
              -l /proc/modules (default)
              -m /boot/System.map (specified)
              -c 1 (default)

Unable to handle kernel NULL pointer dereference at virtual address 00000134
current->tss.cr3 = 03b8f000, %%cr3 = 03b8f000
*pde = 00000000
Oops: 0002
CPU: 1
EIP: 0010:[remove_from_queues+169/328]
EFLAGS: 00010206
eax: 00000100 ebx: de47ae40 ecx: de47ae40 edx: e4b886c0
esi: 0000000c edi: 00000000 ebp: 00000007 esp: c2339ed0
ds: 0018 es: 0018 ss: 0018
Process postmaster (pid: 18747, process nr: 54, stackpage=c2339000)
Stack: 0007a671 c012b752 de47ae40 de47ae40 ece48420 c012bf5a de47ae40 c0148b19
       de47ae40 00000000 eb28c000 00000000 eb28c0d0 00001000 ec54d0b8 00000008
       00000400 0007a66a 00000000 0000002e c0148edf eb28c000 0000000c eb28c0cc
Call Trace: [put_last_free+50/124] [__bforget+34/40] [trunc_indirect+505/692]
       [ext2_truncate+107/516] [do_truncate+85/132] [do_truncate+105/132]
       [filp_open+172/240] [sys_ftruncate+322/368] [system_call+52/56]
Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d

Code: 00000000 Before first symbol 00000000 <_IP>: <===
Code: 00000000 Before first symbol 0: 89 50 34 mov %edx,0x34(%eax) <===
Code: 00000003 Before first symbol 3: c7 01 00 00 00 00 movl $0x0,(%ecx)
Code: 00000009 Before first symbol 9: 89 02 mov %eax,(%edx)
Code: 0000000b Before first symbol b: c7 41 34 00 00 00 00 movl $0x0,0x34(%ecx)
Code: 00000012 Before first symbol 12: ff 0d 00 00 00 00 decl 0x0

833 warnings issued. Results may not be reliable.

-----------------------------------------------------------------------
Older oops (2.2.16-3 SMP (Red Hat stock))

Options used: -V (default)
              -o /lib/modules/2.2.19/ (default)
              -k /proc/ksyms (default)
              -l /proc/modules (default)
              -m /boot/System.map-2.2.16-3smp (specified)
              -c 1 (default)

Unable to handle kernel NULL pointer dereference at virtual address 00000134
current->tss.cr3 = 03b8f000, %%cr3 = 03b8f000
*pde = 00000000
Oops: 0002
CPU: 1
EIP: 0010:[remove_from_queues+169/328]
EFLAGS: 00010206
eax: 00000100 ebx: de47ae40 ecx: de47ae40 edx: e4b886c0
esi: 0000000c edi: 00000000 ebp: 00000007 esp: c2339ed0
ds: 0018 es: 0018 ss: 0018
Process postmaster (pid: 18747, process nr: 54, stackpage=c2339000)
Stack: 0007a671 c012b752 de47ae40 de47ae40 ece48420 c012bf5a de47ae40 c0148b19
       de47ae40 00000000 eb28c000 00000000 eb28c0d0 00001000 ec54d0b8 00000008
       00000400 0007a66a 00000000 0000002e c0148edf eb28c000 0000000c eb28c0cc
Call Trace: [put_last_free+50/124] [__bforget+34/40] [trunc_indirect+505/692]
       [ext2_truncate+107/516] [do_truncate+85/132] [do_truncate+105/132]
       [filp_open+172/240] [sys_ftruncate+322/368] [system_call+52/56]
Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d

Code: 00000000 Before first symbol 00000000 <_IP>: <===
Code: 00000000 Before first symbol 0: 89 50 34 mov %edx,0x34(%eax) <===
Code: 00000003 Before first symbol 3: c7 01 00 00 00 00 movl $0x0,(%ecx)
Code: 00000009 Before first symbol 9: 89 02 mov %eax,(%edx)
Code: 0000000b Before first symbol b: c7 41 34 00 00 00 00 movl $0x0,0x34(%ecx)
Code: 00000012 Before first symbol 12: ff 0d 00 00 00 00 decl 0x0

843 warnings issued. Results may not be reliable.

[6.] A small shell script or example program which triggers the
     problem (if possible)

Not applicable. I can't reproduce the problem easily. The oopses have
been happening anytime after 3 to 6 weeks of uptime.

[7.] Environment

The machine is a dual PIII 500 SMP system with 768MB RAM and a Mylex
DAC960 RAID controller in a RAID 5 configuration with 5 active disks
and 1 standby disk. This machine functions as a PostgreSQL (7.0.3)
server. Nothing else significant is running on it.

The machine is running Red Hat 6.2 with all of the released updates
applied, and is running a stock 2.2.19 kernel that was downloaded from
ftp.us.kernel.org and compiled on this same machine.

[7.1.] Software (add the output of the ver_linux script here)

Linux deathstar.gkg-com.com 2.2.19 #1 SMP Fri Apr 27 10:49:53 CDT 2001 i686 unknown

Gnu C                  egcs-2.91.66
Gnu make               3.78.1
binutils               2.9.5.0.22
util-linux             2.10r
modutils               2.3.21
e2fsprogs              1.18
pcmcia-cs              3.1.8
Linux C Library        2.1.3
Dynamic linker (ldd)   2.1.3
Procps                 2.0.6
Net-tools              1.54
Console-tools          0.3.3
Sh-utils               2.0
Modules Loaded         3c59x DAC960

[7.2.] Processor information (from /proc/cpuinfo):

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 7
model name : Pentium III (Katmai)
stepping : 3
cpu MHz : 501.147
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips : 999.42

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 7
model name : Pentium III (Katmai)
stepping : 3
cpu MHz : 501.147
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips : 999.42

[7.3.] Module information (from /proc/modules):

3c59x 22480 2 (autoclean)
DAC960 60848 3

[7.4.] SCSI information (from /proc/scsi/scsi)

/proc/scsi/scsi:

Attached devices: none

/proc/rd/c0/current_status:

***** DAC960 RAID Driver Version 2.2.10 of 1 February 2001 *****
Copyright 1998-2001 by Leonard N. Zubkoff <[EMAIL PROTECTED]>
Configuring Mylex DAC960PTL1 PCI RAID Controller
  Firmware Version: 4.07-0-29, Channels: 1, Memory Size: 8MB
  PCI Bus: 0, Device: 18, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFC8FE000 mapped at 0xF0810000, IRQ Channel: 18
  Controller Queue Depth: 124, Maximum Blocks per Command: 128
  Driver Queue Depth: 123, Scatter/Gather Limit: 33 of 33 Segments
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:0 Vendor: SEAGATE Model: ST39175LW Revision: 0001
        Serial Number: 3AL0NXK100007008KQ52
        Disk Status: Online, 17782784 blocks
    0:1 Vendor: SEAGATE Model: ST39175LW Revision: 0001
        Serial Number: 3AL0P60T00007012R69K
        Disk Status: Online, 17782784 blocks
    0:2 Vendor: SEAGATE Model: ST39175LW Revision: 0001
        Serial Number: 3AL0P4VE00007012R7QV
        Disk Status: Online, 17782784 blocks
    0:3 Vendor: SEAGATE Model: ST39175LW Revision: 0001
        Serial Number: 3AL0P3V700001005HKUC
        Disk Status: Standby, 17782784 blocks
    0:4 Vendor: SEAGATE Model: ST39175LW Revision: 0001
        Serial Number: 3AL0P3SW000070113RRF
        Disk Status: Online, 17782784 blocks
    0:5 Vendor: SEAGATE Model: ST39175LW Revision: 0001
        Serial Number: 3AL0P1MV00007012RDGX
        Disk Status: Online, 17782784 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 71131136 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

[7.5.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):

None.

[X.] Other notes, patches, fixes, workarounds:

None.

Please email me directly if you need any additional information on this
problem.
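
For anyone reading the traces in this report, here is a minimal sketch
for decoding the "Oops:" codes. It assumes the usual i386 page-fault
error-code layout (bit 0: protection violation vs. page not present,
bit 1: write vs. read, bit 2: user vs. kernel mode). The function name
decode_oops_error_code is made up for this illustration and is not part
of any kernel tool. The last step only checks that the fault address in
the remove_from_queues trace matches eax plus the 0x34 displacement in
the disassembled "mov %edx,0x34(%eax)".

```python
def decode_oops_error_code(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code.

    Hypothetical helper for illustration only; the bit meanings are the
    assumed standard i386 layout, not something stated in the report.
    """
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


# Error codes copied from the two traces in this report.
latest = decode_oops_error_code(0x0000)  # find_buffer trace (2.2.19)
older = decode_oops_error_code(0x0002)   # remove_from_queues trace (2.2.16)

# remove_from_queues trace: eax is 0x100 and the faulting instruction
# writes to 0x34(%eax), so the expected fault address is eax + 0x34.
fault_address = 0x100 + 0x34

print("find_buffer:", latest)
print("remove_from_queues:", older)
print("computed fault address:", hex(fault_address))
```

The computed address can be compared by eye against the "virtual address"
line of the remove_from_queues trace.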