RE: 2.4.0 oops bdflush

2001-01-13 Thread Matt_Domsch

> From: Stephen Clouse [mailto:[EMAIL PROTECTED]]
> We have a development SMP machine which runs a myriad of server
> applications for our development purposes -- Apache, Oracle, several
> others.  Under 2.4.0 the machine locks up, seemingly at random.  Usually
> it simply stops responding without fanfare -- you can, oddly enough,
> switch consoles with Alt+F?, but typing gets no response and all network
> services have stopped responding.

I've seen exactly this same behavior, on an 8-way Xeon (Dell PowerEdge
8450), with 8GB RAM, but never with either 512MB or 1GB, running 20
instances of a copy-and-compare script using /usr/share/doc from Red Hat
Linux 7 as the data source.  I see you're using IDE disks, which makes me
feel better, as I was testing the new megaraid driver.  Magic sysrq works in
my case.  I've never gotten the oops, though; I used magic sysrq to print the
IP several times and then tried to look it up.  For me, lockup happens in
the first few minutes of running this test.  I'm happy to try to reproduce
it if anyone has suggestions.
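
The copy-and-compare script itself is not shown in this thread. As a rough illustration only, one pass of such a test might look like the Python sketch below: copy a directory tree, then compare every file byte for byte. The function names `compare_trees` and `copy_and_compare` are hypothetical, and the demo runs on a throwaway tree rather than /usr/share/doc.

```python
import filecmp
import os
import shutil
import tempfile


def compare_trees(src_dir, dst_dir):
    """Return sorted relative paths that are missing from or differ in dst_dir."""
    mismatches = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src_path = os.path.join(root, name)
            if os.path.islink(src_path):
                continue  # symlinks are copied as links, so skip content checks
            rel_path = os.path.relpath(src_path, src_dir)
            dst_path = os.path.join(dst_dir, rel_path)
            # shallow=False forces a content comparison instead of trusting
            # os.stat() metadata alone.
            if not os.path.isfile(dst_path) or not filecmp.cmp(
                src_path, dst_path, shallow=False
            ):
                mismatches.append(rel_path)
    return sorted(mismatches)


def copy_and_compare(src_dir, work_dir):
    """Copy src_dir to work_dir/copy, then compare the copy with the source."""
    dst_dir = os.path.join(work_dir, "copy")
    shutil.copytree(src_dir, dst_dir, symlinks=True)
    return compare_trees(src_dir, dst_dir)


if __name__ == "__main__":
    # Demo on a throwaway tree; the test described above used /usr/share/doc.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as work:
        with open(os.path.join(src, "README"), "w") as handle:
            handle.write("sample data\n")
        print("mismatched files:", copy_and_compare(src, work))
```

To approximate the load described above, one would presumably start about 20 such processes at once, each with its own work directory, and treat any non-empty mismatch list as a failure.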

Thanks,
Matt






-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4.0 oops bdflush

2001-01-13 Thread Stephen Clouse

We have a development SMP machine which runs a myriad of server applications for
our development purposes -- Apache, Oracle, several others.  Under 2.4.0 the
machine locks up, seemingly at random.  Usually it simply stops responding
without fanfare -- you can, oddly enough, switch consoles with Alt+F?, but
typing gets no response and all network services have stopped
responding.  However, on the most recent failure I was lucky enough to find that
it had managed to spit out a kernel oops message before biting it, which I have 
(hopefully) decoded (properly):

root@fs1:/usr/src/linux.2.4.0# ksymoops -v /usr/src/linux.2.4.0/vmlinux -m \
 /usr/src/linux.2.4.0/System.map -o /lib/modules/2.4.0/
ksymoops 2.3.7 on i686 2.2.18.  Options used
 -v /usr/src/linux.2.4.0/vmlinux (specified)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0/ (specified)
 -m /usr/src/linux.2.4.0/System.map (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Reading Oops report from the terminal
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012c37e>]
EFLAGS: 00010296
eax: 0000001c   ebx: c1068518   ecx: 00000000   edx: 00000026
esi: c10684fc   edi: 0000021c   ebp: 00000001   esp: c14f9fa4
ds: 0018   es: 0018   ss: 0018
Process bdflush (pid: 5, stackpage=c14f9000)
Stack: c01f9865 c01f9a24 00000299 c14f8000 c01fb96e 00000000 0008e000 00000000
       00000004 00000040 00000000 00000110 c01369c2 00000007 00000000 00010f00
       cff93f84 cff93fd0 0008e000 c0107507 cff93fbc cff93fbc cff93fbc
Call Trace: [<c01369c2>] [<c0107507>]
Code: 0f 0b 83 c4 0c 90 8b 46 14 85 c0 75 19 68 99 02 00 00 68 24
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012c37e>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010296
eax: 0000001c   ebx: c1068518   ecx: 00000000   edx: 00000026
esi: c10684fc   edi: 0000021c   ebp: 00000001   esp: c14f9fa4
ds: 0018   es: 0018   ss: 0018
Process bdflush (pid: 5, stackpage=c14f9000)
Stack: c01f9865 c01f9a24 00000299 c14f8000 c01fb96e 00000000 0008e000 00000000
       00000004 00000040 00000000 00000110 c01369c2 00000007 00000000 00010f00
       cff93f84 cff93fd0 0008e000 c0107507 cff93fbc cff93fbc cff93fbc
Call Trace: [<c01369c2>] [<c0107507>]
Code: 0f 0b 83 c4 0c 90 8b 46 14 85 c0 75 19 68 99 02 00 00 68 24

>>EIP; c012c37e <page_launder+716/868>   <=====
Trace; c01369c2 <bdflush+96/dc>
Trace; c0107507 <kernel_thread+23/30>
Code;  c012c37e <page_launder+716/868>
00000000 <_EIP>:
Code;  c012c37e <page_launder+716/868>   <=====
   0:   0f 0b             ud2a      <=====
Code;  c012c380 <page_launder+718/868>
   2:   83 c4 0c          addl   $0xc,%esp
Code;  c012c383 <page_launder+71b/868>
   5:   90                nop
Code;  c012c384 <page_launder+71c/868>
   6:   8b 46 14          movl   0x14(%esi),%eax
Code;  c012c387 <page_launder+71f/868>
   9:   85 c0             testl  %eax,%eax
Code;  c012c389 <page_launder+721/868>
   b:   75 19             jne    26 <_EIP+0x26> c012c3a4 <page_launder+73c/868>
Code;  c012c38b <page_launder+723/868>
   d:   68 99 02 00 00    pushl  $0x299
Code;  c012c390 <page_launder+728/868>
  12:   68 24 00 00 00    pushl  $0x24
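
For readers unfamiliar with how ksymoops turns raw addresses into names: it looks up the nearest symbol at or below each address in System.map and prints the result as name+offset. The Python sketch below illustrates that lookup under stated assumptions. The three map entries are hypothetical: their start addresses are derived by subtracting the printed offsets from the traced addresses (for example 0xc012c37e - 0x716 for page_launder), not copied from this machine's System.map, and the `T` type letters are assumed. The helper names `load_symbols` and `resolve` are likewise made up for illustration.

```python
import bisect

# Hypothetical System.map excerpt. Start addresses are derived from the
# decoded trace (traced address minus printed offset), not from a real map.
SYSTEM_MAP = """\
c01074e4 T kernel_thread
c012bc68 T page_launder
c013692c T bdflush
"""


def load_symbols(text):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in text.splitlines():
        addr, _kind, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+offset' (hex offset) for the nearest symbol at or below address."""
    starts = [start for start, _name in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    start, name = symbols[index]
    return "%s+%x" % (name, address - start)


symbols = load_symbols(SYSTEM_MAP)
for addr in (0xC012C37E, 0xC01369C2, 0xC0107507):
    print("%08x %s" % (addr, resolve(symbols, addr)))
```

The sketch omits the /size part of the ksymoops notation, which would additionally require the start address of the following symbol.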

This machine has been running flawlessly on 2.2.18 for weeks now, which seems to
preclude a hardware issue.  And since I've been personally running 2.4.0 on my
uniprocessor machine since day one without incident, I suspect some bizarre
interaction in SMP-land.  But I'm hardly a kernel programmer.

Unfortunately I can't find exact specs on the machine; it's a Dell Precision 420,
most likely built with the hardware du jour about six months ago.  The config
options used are below:

CONFIG_X86=y
CONFIG_ISA=y
CONFIG_UID16=y
CONFIG_M686FXSR=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_PGE=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_FXSR=y
CONFIG_X86_XMM=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_NOHIGHMEM=y
CONFIG_MTRR=y
CONFIG_SMP=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_NET=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_BLK_DEV_FD=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK=y
CONFIG_FILTER=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_SYN_COOKIES=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDETAPE=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_PIIX=y
CONFIG_PIIX_TUNING=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_NETDEVICES=y
CONFIG_NET_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
CONFIG_MOUSE=y
CONFIG_PSMOUSE=y
CONFIG_RTC=y
CONFIG_QUOTA=y
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_MINIX_FS=y
