My apologies for the cross-posting, but this matter is urgent for me and no
one on the SMP mailing list has answered so far. I hope I will find somebody
on this list who can give me advice :)

Thanks!

---------- Forwarded Message ----------
Date: 10/05/00 00:19:53 +0200
From: Matthias Weidle <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: OOPS after upgrading CPU's ...

Hi there!

Something strange is going on here, and after checking every source of
information I could find (without success), I hope one of you may have the
answer :)

OK, here is the problem:

I'm running an SMP server machine (mostly doing file-server work) which ran
quite stably with two Celeron 400 CPUs. I got about 60 days of uptime
without problems, even under heavy load! A few weeks ago I decided to
upgrade the Celerons to some older P3s (the ones with 512 KB cache, not
Coppermine) and did not expect any complications. Since then, however, I
can't keep the machine up for more than a couple of days (depending on the
load). Sooner or later it locks up with the following kernel oops message:

ksymoops 2.3.4 on i686 2.2.15pre19ext3.  Options used
      -v /usr/src/linux/vmlinux (specified)
      -k /proc/ksyms (default)
      -l /proc/modules (default)
      -o /lib/modules/2.2.15pre19ext3/ (default)
      -m /usr/src/linux/System.map (default)

Warning (compare_ksyms_lsmod): module i2c-isa is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module i2c-piix4 is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module nfsd is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module w83781d is in lsmod but not in ksyms, probably no symbols exported
Unable to handle kernel NULL pointer dereference at virtual address 00000013
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[<c01100c5>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010006
eax: 00000013   ebx: 00000260   ecx: cc100480   edx: cbffa000
esi: cbffa000   edi: 00000013   ebp: cbffbf74   esp: cbffbf4c
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, process nr: 1, stackpage=cbffb000)
Stack: 00000013 c01104c1 00000013 cbffbf7c cbffa000 c0226020 c010b850 00000013
       cbffbf7c cbffa000 00000000 c010a328 cbffa000 cbffa000 cbffa000 cbffa000
       c0226020 00000000 00000080 00000018 cbff0018 ffffff13 c0107b15 00000010
Call Trace: [<c01104c1>] [<c010b850>] [<c010a328>] [<c0107b15>] [<c019c875>] [<c01166b7>]
Code: e0 28 21 c0 8b 04 85 e4 28 21 c0 83 f8 ff 74 53 bf 00 e0 ff

>>EIP; c01100c5 <mask_IO_APIC_irq+d/84>   <=====
Trace; c01104c1 <do_level_ioapic_IRQ+21/98>
Trace; c010b850 <do_IRQ+38/58>
Trace; c010a328 <common_interrupt+18/20>
Trace; c0107b15 <cpu_idle+3d/50>
Trace; c019c875 <vt_console_print+2fd/314>
Trace; c01166b7 <printk+177/184>
Code;  c01100c5 <mask_IO_APIC_irq+d/84>
00000000 <_EIP>:
Code;  c01100c5 <mask_IO_APIC_irq+d/84>   <=====
   0:   e0 28                     loopne 2a <_EIP+0x2a> c01100ef <mask_IO_APIC_irq+37/84>   <=====
Code;  c01100c7 <mask_IO_APIC_irq+f/84>
   2:   21 c0                     andl   %eax,%eax
Code;  c01100c9 <mask_IO_APIC_irq+11/84>
   4:   8b 04 85 e4 28 21 c0      movl   0xc02128e4(,%eax,4),%eax
Code;  c01100d0 <mask_IO_APIC_irq+18/84>
   b:   83 f8 ff                  cmpl   $0xffffffff,%eax
Code;  c01100d3 <mask_IO_APIC_irq+1b/84>
   e:   74 53                     je     63 <_EIP+0x63> c0110128 <mask_IO_APIC_irq+70/84>
Code;  c01100d5 <mask_IO_APIC_irq+1d/84>
  10:   bf 00 e0 ff 00            movl   $0xffe000,%edi

Kernel panic: Attempted to kill the idle task!

4 warnings issued.  Results may not be reliable.


There have been four or five lockups since the upgrade, and it was always
the same oops message.


For the record, some additional data about the server box:

Soltek SL-68A dual Slot 1 motherboard (with the latest H4 BIOS)
2 x P3-550 with 512 KB cache
Promise UDMA66 controller
Intel EtherExpress NIC
64 + 128 MB RAM (PC100)
6 HDDs (Maxtor and IBM drives)

Kernel: 2.2.15pre20 (that's pretty much 2.2.16, I guess)
+ IDE patch
+ ext3 patch
+ ppdd patch


If you need any additional data, please don't hesitate to contact me!

Is it really possible to break the stability of a box simply by upgrading
to a better CPU? My first idea was bad RAM, because it now runs at 100 MHz
(66 MHz with the Celerons). But then I realized that this would be pretty
unlikely, given that it is the same oops message every time.

Is there somebody out there who can help me?


Best regards,
Matt.



---------- End Forwarded Message ----------




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
