RE: 2.4.0 oops bdflush
> From: Stephen Clouse [mailto:[EMAIL PROTECTED]] > We have a development SMP machine which runs a myriad of > server applications for > our development purposes -- Apache, Oracle, several others. > Under 2.4.0 the > machine locks up, seemingly at random. Usually it simply > stops responding > without fanfare -- you can, oddly enough, switch consoles > with Alt+F?, but > typing gets no response and all network services have stopped > responding. I've seen exactly this same behavior, on an 8-way Xeon (Dell PowerEdge 8450), with 8GB RAM, but never with either 512MB or 1GB, running 20 instances of a copy-and-compare script using /usr/share/doc from Red Hat Linux 7 as the data source. I see you're using IDE disks, which makes me feel better, as I was testing the new megaraid driver. Magic sysrq works in my case. I've never gotten the oops though, I used magic sysrq to print the IP several times and then tried to look it up. For me, lockup happens in the first few minutes of running this test. I'm happy to try to reproduce it if anyone has suggestions. Thanks, Matt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
2.4.0 oops bdflush
We have a development SMP machine which runs a myriad of server applications for our development purposes -- Apache, Oracle, several others. Under 2.4.0 the machine locks up, seemingly at random. Usually it simply stops responding without fanfare -- you can, oddly enough, switch consoles with Alt+F?, but typing gets no response and all network services have stopped responding. However, on the most recent failure I was lucky enough to find that it had managed to spit out a kernel oops message before biting it, which I have (hopefully) decoded (properly): root@fs1:/usr/src/linux.2.4.0# ksymoops -v /usr/src/linux.2.4.0/vmlinux -m \ /usr/src/linux.2.4.0/System.map -o /lib/modules/2.4.0/ ksymoops 2.3.7 on i686 2.2.18. Options used -v /usr/src/linux.2.4.0/vmlinux (specified) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.0/ (specified) -m /usr/src/linux.2.4.0/System.map (specified) Error (regular_file): read_ksyms stat /proc/ksyms failed ksymoops: No such file or directory No modules in ksyms, skipping objects No ksyms, skipping lsmod Reading Oops report from the terminal invalid operand: CPU:0 EIP:0010:[] EFLAGS: 00010296 eax: 001c ebx: c1068518 ecx: edx: 0026 esi: c10684fc edi: 021c ebp: 0001 esp: c14f9fa4 ds: 0018 es: 0018 ss: 0018 Process bdflush (pid: 5, stackpage=c14f9000) Stack: c01f9865 c01f9a24 0299 c14f8000 c01fb96e 0008e000 0004 0040 0110 c01369c2 0007 00010f00 cff93f84 cff93fd0 0008e000 c0107507 cff93fbc cff93fbc cff93fbc Call Trace: [] [] Code: 0f 0b 83 c4 0c 90 8b 46 14 85 c0 75 19 68 99 02 00 00 68 24 invalid operand: CPU:0 EIP:0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010296 eax: 001c ebx: c1068518 ecx: edx: 0026 esi: c10684fc edi: 021c ebp: 0001 esp: c14f9fa4 ds: 0018 es: 0018 ss: 0018 Process bdflush (pid: 5, stackpage=c14f9000) Stack: c01f9865 c01f9a24 0299 c14f8000 c01fb96e 0008e000 0004 0040 0110 c01369c2 0007 00010f00 cff93f84 cff93fd0 0008e000 c0107507 cff93fbc cff93fbc cff93fbc Call Trace: [] [] Code: 0f 0b 83 c4 0c 90 8b 46 14 85 c0 75 19 68 99 02 00 00 68 24 >>EIP; c012c37e<= Trace; c01369c2 Trace; c0107507 Code; c012c37e <_EIP>: Code; c012c37e<= 0: 0f 0b ud2a <= Code; c012c380 2: 83 c4 0c addl $0xc,%esp Code; c012c383 5: 90nop Code; c012c384 6: 8b 46 14 movl 0x14(%esi),%eax Code; c012c387 9: 85 c0 testl %eax,%eax Code; c012c389 b: 75 19 jne26 <_EIP+0x26> c012c3a4 Code; c012c38b d: 68 99 02 00 00pushl $0x299 Code; c012c390 12: 68 24 00 00 00pushl $0x24 This machine has been running flawlessly on 2.2.18 for weeks now, which seems to preclude a hardware issue. And since I've been personally running 2.4.0 on my uniprocessor machine since day one without incident, I suspect some bizarre interaction in SMP-land. But I'm hardly a kernel programmer Unforunately I can't find exact specs on the machine; it's a Dell Precision 420, most likely built with the hardware du jour about six months ago. The config options used are below: CONFIG_X86=y CONFIG_ISA=y CONFIG_UID16=y CONFIG_M686FXSR=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_CMPXCHG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_L1_CACHE_SHIFT=5 CONFIG_X86_TSC=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_PGE=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_FXSR=y CONFIG_X86_XMM=y CONFIG_X86_MSR=y CONFIG_X86_CPUID=y CONFIG_NOHIGHMEM=y CONFIG_MTRR=y CONFIG_SMP=y CONFIG_HAVE_DEC_LOCK=y CONFIG_NET=y CONFIG_X86_IO_APIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_PCI=y CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_NAMES=y CONFIG_SYSVIPC=y CONFIG_BSD_PROCESS_ACCT=y CONFIG_SYSCTL=y CONFIG_KCORE_ELF=y CONFIG_BINFMT_AOUT=y CONFIG_BINFMT_ELF=y CONFIG_BINFMT_MISC=y CONFIG_BLK_DEV_FD=y CONFIG_PACKET=y CONFIG_PACKET_MMAP=y CONFIG_NETLINK=y CONFIG_FILTER=y CONFIG_UNIX=y CONFIG_INET=y CONFIG_SYN_COOKIES=y CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y CONFIG_BLK_DEV_IDEDISK=y CONFIG_BLK_DEV_IDECD=y CONFIG_BLK_DEV_IDETAPE=y CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y CONFIG_BLK_DEV_IDEDMA_PCI=y CONFIG_IDEDMA_PCI_AUTO=y CONFIG_BLK_DEV_IDEDMA=y CONFIG_BLK_DEV_PIIX=y CONFIG_PIIX_TUNING=y CONFIG_IDEDMA_AUTO=y CONFIG_BLK_DEV_IDE_MODES=y CONFIG_NETDEVICES=y CONFIG_NET_ETHERNET=y CONFIG_NET_VENDOR_3COM=y CONFIG_VORTEX=y CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_UNIX98_PTYS=y CONFIG_UNIX98_PTY_COUNT=256 CONFIG_MOUSE=y CONFIG_PSMOUSE=y CONFIG_RTC=y CONFIG_QUOTA=y CONFIG_FAT_FS=y CONFIG_MSDOS_FS=y CONFIG_VFAT_FS=y CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_MINIX_FS=y
2.4.0 oops bdflush
We have a development SMP machine which runs a myriad of server applications for our development purposes -- Apache, Oracle, several others. Under 2.4.0 the machine locks up, seemingly at random. Usually it simply stops responding without fanfare -- you can, oddly enough, switch consoles with Alt+F?, but typing gets no response and all network services have stopped responding. However, on the most recent failure I was lucky enough to find that it had managed to spit out a kernel oops message before biting it, which I have (hopefully) decoded (properly): root@fs1:/usr/src/linux.2.4.0# ksymoops -v /usr/src/linux.2.4.0/vmlinux -m \ /usr/src/linux.2.4.0/System.map -o /lib/modules/2.4.0/ ksymoops 2.3.7 on i686 2.2.18. Options used -v /usr/src/linux.2.4.0/vmlinux (specified) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.0/ (specified) -m /usr/src/linux.2.4.0/System.map (specified) Error (regular_file): read_ksyms stat /proc/ksyms failed ksymoops: No such file or directory No modules in ksyms, skipping objects No ksyms, skipping lsmod Reading Oops report from the terminal invalid operand: CPU:0 EIP:0010:[c012c37e] EFLAGS: 00010296 eax: 001c ebx: c1068518 ecx: edx: 0026 esi: c10684fc edi: 021c ebp: 0001 esp: c14f9fa4 ds: 0018 es: 0018 ss: 0018 Process bdflush (pid: 5, stackpage=c14f9000) Stack: c01f9865 c01f9a24 0299 c14f8000 c01fb96e 0008e000 0004 0040 0110 c01369c2 0007 00010f00 cff93f84 cff93fd0 0008e000 c0107507 cff93fbc cff93fbc cff93fbc Call Trace: [c01369c2] [c0107507] Code: 0f 0b 83 c4 0c 90 8b 46 14 85 c0 75 19 68 99 02 00 00 68 24 invalid operand: CPU:0 EIP:0010:[c012c37e] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010296 eax: 001c ebx: c1068518 ecx: edx: 0026 esi: c10684fc edi: 021c ebp: 0001 esp: c14f9fa4 ds: 0018 es: 0018 ss: 0018 Process bdflush (pid: 5, stackpage=c14f9000) Stack: c01f9865 c01f9a24 0299 c14f8000 c01fb96e 0008e000 0004 0040 0110 c01369c2 0007 00010f00 cff93f84 cff93fd0 0008e000 c0107507 cff93fbc cff93fbc cff93fbc Call Trace: [c01369c2] [c0107507] Code: 0f 0b 83 c4 0c 90 8b 46 14 85 c0 75 19 68 99 02 00 00 68 24 EIP; c012c37e page_launder+716/868 = Trace; c01369c2 bdflush+96/dc Trace; c0107507 kernel_thread+23/30 Code; c012c37e page_launder+716/868 _EIP: Code; c012c37e page_launder+716/868 = 0: 0f 0b ud2a = Code; c012c380 page_launder+718/868 2: 83 c4 0c addl $0xc,%esp Code; c012c383 page_launder+71b/868 5: 90nop Code; c012c384 page_launder+71c/868 6: 8b 46 14 movl 0x14(%esi),%eax Code; c012c387 page_launder+71f/868 9: 85 c0 testl %eax,%eax Code; c012c389 page_launder+721/868 b: 75 19 jne26 _EIP+0x26 c012c3a4 page_launder+73c/868 Code; c012c38b page_launder+723/868 d: 68 99 02 00 00pushl $0x299 Code; c012c390 page_launder+728/868 12: 68 24 00 00 00pushl $0x24 This machine has been running flawlessly on 2.2.18 for weeks now, which seems to preclude a hardware issue. And since I've been personally running 2.4.0 on my uniprocessor machine since day one without incident, I suspect some bizarre interaction in SMP-land. But I'm hardly a kernel programmer Unforunately I can't find exact specs on the machine; it's a Dell Precision 420, most likely built with the hardware du jour about six months ago. The config options used are below: CONFIG_X86=y CONFIG_ISA=y CONFIG_UID16=y CONFIG_M686FXSR=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_CMPXCHG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_L1_CACHE_SHIFT=5 CONFIG_X86_TSC=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_PGE=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_FXSR=y CONFIG_X86_XMM=y CONFIG_X86_MSR=y CONFIG_X86_CPUID=y CONFIG_NOHIGHMEM=y CONFIG_MTRR=y CONFIG_SMP=y CONFIG_HAVE_DEC_LOCK=y CONFIG_NET=y CONFIG_X86_IO_APIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_PCI=y CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_NAMES=y CONFIG_SYSVIPC=y CONFIG_BSD_PROCESS_ACCT=y CONFIG_SYSCTL=y CONFIG_KCORE_ELF=y CONFIG_BINFMT_AOUT=y CONFIG_BINFMT_ELF=y CONFIG_BINFMT_MISC=y CONFIG_BLK_DEV_FD=y CONFIG_PACKET=y CONFIG_PACKET_MMAP=y CONFIG_NETLINK=y CONFIG_FILTER=y CONFIG_UNIX=y CONFIG_INET=y CONFIG_SYN_COOKIES=y CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y CONFIG_BLK_DEV_IDEDISK=y CONFIG_BLK_DEV_IDECD=y CONFIG_BLK_DEV_IDETAPE=y CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y CONFIG_BLK_DEV_IDEDMA_PCI=y CONFIG_IDEDMA_PCI_AUTO=y CONFIG_BLK_DEV_IDEDMA=y CONFIG_BLK_DEV_PIIX=y CONFIG_PIIX_TUNING=y CONFIG_IDEDMA_AUTO=y CONFIG_BLK_DEV_IDE_MODES=y CONFIG_NETDEVICES=y CONFIG_NET_ETHERNET=y
RE: 2.4.0 oops bdflush
From: Stephen Clouse [mailto:[EMAIL PROTECTED]] We have a development SMP machine which runs a myriad of server applications for our development purposes -- Apache, Oracle, several others. Under 2.4.0 the machine locks up, seemingly at random. Usually it simply stops responding without fanfare -- you can, oddly enough, switch consoles with Alt+F?, but typing gets no response and all network services have stopped responding. I've seen exactly this same behavior, on an 8-way Xeon (Dell PowerEdge 8450), with 8GB RAM, but never with either 512MB or 1GB, running 20 instances of a copy-and-compare script using /usr/share/doc from Red Hat Linux 7 as the data source. I see you're using IDE disks, which makes me feel better, as I was testing the new megaraid driver. Magic sysrq works in my case. I've never gotten the oops though, I used magic sysrq to print the IP several times and then tried to look it up. For me, lockup happens in the first few minutes of running this test. I'm happy to try to reproduce it if anyone has suggestions. Thanks, Matt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/