Re: SA110/SA1100 possible bug or kernel bug? (long) ...

Anonymous Sat, 12 Jun 1999 07:05:09 -0700
Nicolas Pitre writes:
> lpn: memory violation at pc=0x020140c0, lr=0x020282f0 
>     (bad address=0x020140c0, code 1)
> pc : [<020140c0>]    lr : [<020282f0>]
> r7 : 01fa582c  r6 : 020f4090  r5 : 020adf88  r4 : 00001800

Well, looking at the register dump, the page fault handler has
not been called via the prefetch abort handler.  Also, according
to my ARM ARM documentation, the FAR is not updated on prefetch
aborts, but we know how reliable that is.

>  20140c0:       e5864000        str     r4, [r6]

When your hack is triggered, could you try calling
show_pte(current->mm, regs->ARM_pc) please?  This should tell
you what's in the page tables for the task.

Also, could you try to find out what the value is in memory
there when the handler is called?

> What is really weird about it is the fact that the faulty address is equal
> to the pc.  However, r6 which is used to do the str contains actually a
> good address and is quite different from the pc.

Is there any chance you could try modifying the instruction to
be pre-indexed instead of post-indexed?

> 1) In some situations, the CPU generates a data abort exception
> instead of a prefetch abort exception as it should be.  This
> would explain why the faulty address is equal to the pc.  And
> since this happens in the middle of a page and there is no way
> to jump exactly there from another page, this should happen right
> after a context switch.  However the data abort handler gets
> the offending memory address from the FAR register but the
> documentation says that it is used only for data abort exceptions.
> So is the FAR updated for prefetch abort exception too?  If not,
> this might not be a wrongly identified prefetch exception but
> really a data abort exception.  And since the data abort handler
> subtracts 8 from the pc instead of 4, the pc and faulting address
> shouldn't match.

A way of checking this would be to introduce a new field in the task
structure which contains the PC that the context switch switched to.
This can be found on the kernel stack, at stack_base+4084.  Then, when
the problem occurs, you can find out where the context switch returned
control to.
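
A minimal sketch of that bookkeeping in C, for illustration only.  It
assumes a 4096-byte kernel stack per task (the 4084 offset fits inside
that) and a 32-bit saved PC.  The names task_sketch, switch_pc,
read_saved_pc and record_switch_pc are hypothetical and are not the
kernel's real structures or functions.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: one 4 KB page of kernel stack per task. */
#define SKETCH_STACK_SIZE 4096u
/* Offset of the saved PC from the stack base, as quoted in the message. */
#define SAVED_PC_OFFSET   4084u

/* Hypothetical stand-in for the task structure plus its kernel stack. */
struct task_sketch {
    uint32_t switch_pc;                     /* hypothetical new field */
    unsigned char stack[SKETCH_STACK_SIZE]; /* stands in for the kernel stack */
};

/* Read the 32-bit word stored at stack_base + 4084. */
static uint32_t read_saved_pc(const unsigned char *stack_base)
{
    uint32_t pc;
    memcpy(&pc, stack_base + SAVED_PC_OFFSET, sizeof pc);
    return pc;
}

/* Record where the last context switch returned control to, so that a
 * later fault handler can compare it with the faulting pc. */
static void record_switch_pc(struct task_sketch *t)
{
    t->switch_pc = read_saved_pc(t->stack);
}
```

When the fault occurs, comparing the recorded value against the pc in
the register dump would show whether the task had just been switched
to at that address.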

> 2) In some situations, maybe when the process is restarted after
> a context switch or similar, the str opcode takes the pc register
> instead of the r6 register in this case to dereference the address
> to use for storing.  This would fault since the text segment is
> mapped read-only.  But here if the pc register was actually used
> it would have been 8 bytes ahead from the instruction's address,
> which isn't the case.

It would indeed fault, and the register dump does in fact
indicate a user mode store to the current PC location.
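
For reference, the encoding quoted above (e5864000) can be decoded by
hand to see which base register the instruction names.  The sketch
below pulls the fields out of an ARM single data transfer (LDR/STR)
word with an immediate offset.  The struct and function names are
made up for this example, and it only handles the immediate-offset
form.

```c
#include <stdint.h>

/* Fields of an ARM LDR/STR (single data transfer) instruction word. */
struct ldr_str_fields {
    unsigned is_load;    /* L bit (bit 20): 1 = load, 0 = store */
    unsigned pre_index;  /* P bit (bit 24) */
    unsigned writeback;  /* W bit (bit 21) */
    unsigned rn;         /* base register, bits 19-16 */
    unsigned rd;         /* source/destination register, bits 15-12 */
    unsigned imm12;      /* immediate offset, bits 11-0 (I bit clear) */
};

/* Extract the fields; assumes the word really is an LDR/STR with an
 * immediate offset (bits 27-26 == 01, bit 25 == 0). */
static struct ldr_str_fields decode_ldr_str(uint32_t insn)
{
    struct ldr_str_fields f;
    f.is_load   = (insn >> 20) & 1u;
    f.pre_index = (insn >> 24) & 1u;
    f.writeback = (insn >> 21) & 1u;
    f.rn        = (insn >> 16) & 0xfu;
    f.rd        = (insn >> 12) & 0xfu;
    f.imm12     = insn & 0xfffu;
    return f;
}
```

For 0xe5864000 this gives a store with rd = 4, rn = 6 and a zero
offset, matching the disassembly "str r4, [r6]".  The encoded base
register is r6, not r15 (pc).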

My `bug' on just one NetWinder (but not another) seemed to be
an apparently random pipeline error.  I never did get this resolved
by CCC/HCC/whoever it is, and it's still sitting around here.
Unfortunately, when I sent it back to them, they just tested it
with their stuff, and didn't find anything wrong.  Yet, the same
code running on two supposedly identical NetWinders caused one to
crash but not the other.  I'm not certain what I can do about this
NetWinder now - I now use it solely for testing kernels on, but
nothing else since it can't be trusted.
   _____
  |_____| ------------------------------------------------- ---+---+-
  |   |        Russell King       [EMAIL PROTECTED]      --- ---
  | | | |  http://www.arm.linux.org.uk/~rmk/armlinux.html    /  /  |
  | +-+-+                                                     --- -+-
  /   |               THE developer of ARM Linux              |+| /|\
 /  | | |                                                     ---  |
    +-+-+ -------------------------------------------------  /\\\  |