Is this a system you are just bringing up, or one that's been running
for a while? It really seems like memory corruption of some form.
I'd suggest checking the memory controller settings.

Also, what do you see if you disassemble the kernel image and look at
the addresses reported in NIP, C00DEE18 and C002CE68?

- k

Dear Kumar:
 
We have two systems, one based on an 8241 and one based on an 8541. The 8241
has been running for some time with Linux 2.4, and the 8541 is still being
brought up. Both are using the 2.6.17.11 kernel from kernel.org with
modifications for our hardware.
 
In the case of the 8241, I started with our 2.4 modifications, which were
originally based on the 8260, and ported them to 2.6. In the case of the 8541,
I started with the Embedded Planet 8555EP 2.6 kernel source and merged it into
the 2.6.17.11 tree.
 
I don't see this exception on the 8541, although extensive testing has not yet
been completed. The 8241 exhibits this exception on three different 8241
boards, so I don't suspect the hardware.
 
We are using the MontaVista toolchain and their root filesystem, including 'tar'
and 'cp', which are the programs that currently exhibit the fault.
 
Yesterday, when I saw a NIP of 0x900, I was ready to blame the interrupts not
being set up correctly, but after a few hours of going through that code, I am
now convinced the interrupts are set up correctly, so it is something more
subtle.
 
Certainly, memory corruption is the next thing to be concerned with. 
 
One thing that has concerned me a bit is that we have no swap space available
at all. This is an embedded system with 64 MB of RAM and JFFS2 on NAND flash,
with no swap partition.
 
I suspect auditing the MMU setup differences between the original 2.4 kernel 
and the new 2.6 kernel for the 8241 board is the next step.
 
The three exceptions I saw yesterday were 1) 0x900 in timer_interrupt,
2) C00DEE18 (inside the tar program), and 3) C002CE68 (in one of the kernel
routines).
 
I suspect the actual addresses are red herrings and that this exception can
occur at any address. That would certainly point to some sort of memory setup
issue.
 
Changing the Oops logic to print out the next instruction (the instruction
word at NIP) as well as the NIP itself might help me tell the difference
between what the program is trying to do and what it is actually doing.
 
Are there any other thoughts you might have on diagnosis techniques at this 
point?
 
Charles
 
 
_______________________________________________
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded
