Philippe Gerum wrote:
On Thu, 2011-09-01 at 12:22 +1000, Tom Evans wrote:
This problem has probably been solved years ago, but Google and
searching this list didn't find me anything.
I'm running an old (2006) Linux 2.4 kernel with Xenomai 2.1 with the
Adeos patches on an MPC5200 (ppc).
Every now and then when I stress the system it crashes because
"ipd->irqs[irq].handler" is NULL for "irq == 1" (a valid irq on this
system) in this code:
<snip>
So it looks like the interrupt is happening in hardware and being queued
and THEN it is being deregistered (with the handler being set to zero in
ipipe_virtualize_irq()) and then it is being pulled from the pipe, run
and (usually) crashes.
<snip>
I'd be interested in any observations, comments or pointers to the "real
cause" and any other "real fixes".
ipipe_virtualize_irq() is an internal service which should be called for
unregistering an IRQ only after the source was shut at device level, and
possibly masked on the interrupt controller. It must be called with
interrupts enabled for the domain which owns the unregistered handler.
On uniprocessor systems, these two conditions are enough to make sure
that no IRQ is lingering in the interrupt log after the handler was
nullified.
I can't spot the routines appearing in the backtrace you sent in the
vanilla linux/xenomai code I have at hand, but if this is a real-time
CAN stack, you may want to check whether the device is properly quiesced
and the IRQ line masked prior to unregistering the interrupt in the
pipeline.
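The ordering described above can be illustrated with a small user-space
model. This is only a sketch, not I-pipe code: every td_* name is
hypothetical, the "pending" array merely stands in for the interrupt log,
and td_sync() is an assumed stand-in for the domain replaying its log
while interrupts are enabled.

```c
#include <stdbool.h>
#include <stddef.h>

#define TD_NR_IRQS 8

typedef void (*td_handler_t)(unsigned irq);

/* Hypothetical model of one domain's per-IRQ state. */
struct td_domain {
    td_handler_t handler[TD_NR_IRQS];  /* stands in for ipd->irqs[irq].handler */
    bool pending[TD_NR_IRQS];          /* stands in for the interrupt log */
    bool masked[TD_NR_IRQS];           /* line masked on the controller */
    bool device_enabled[TD_NR_IRQS];   /* source enabled at device level */
};

int td_hits;  /* counts handler invocations, for demonstration only */

void td_count_handler(unsigned irq) { (void)irq; td_hits++; }

/* The hardware raises a line; it is logged only if the device source
 * is enabled and the line is not masked. Returns true if logged. */
bool td_raise(struct td_domain *d, unsigned irq)
{
    if (!d->device_enabled[irq] || d->masked[irq])
        return false;
    d->pending[irq] = true;
    return true;
}

/* Replay the log. Returns how many logged IRQs had no handler,
 * which is the case that crashes in the report above. */
int td_sync(struct td_domain *d)
{
    int orphans = 0;
    for (unsigned irq = 0; irq < TD_NR_IRQS; irq++) {
        if (!d->pending[irq])
            continue;
        d->pending[irq] = false;
        if (d->handler[irq])
            d->handler[irq](irq);
        else
            orphans++;
    }
    return orphans;
}

/* Teardown in the order described: quiesce, mask, drain, then nullify. */
void td_unregister(struct td_domain *d, unsigned irq)
{
    d->device_enabled[irq] = false;  /* 1. shut the source at device level */
    d->masked[irq] = true;           /* 2. mask the line on the controller */
    td_sync(d);                      /* 3. log drained, old handler still set */
    d->handler[irq] = NULL;          /* 4. only now clear the handler */
}
```

In this model, an IRQ logged just before td_unregister() is still
delivered to the old handler, and nothing can be logged for that line
afterwards, so a later td_sync() finds no orphaned entry.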
Thanks for your prompt and detailed reply.
Yes, it is a real time CAN stack. It supports Philips SJA1000 CAN chips
on Peak Systems PCMCIA cards connected through TI PCI1520 PCI bridge
chips (using the "Yenta" drivers) through a Freescale MPC5200's PCI
interface. The four CAN chips and the PCMCIA Bridge all use a single
shared interrupt. The CAN chip interrupts are real-time, but if they
find none of the CAN chips are responsible the interrupt is handballed
(via XN_ISR_PROPAGATE) to the Linux-based PCMCIA code to see if it was a
card insert event.
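The shared-interrupt decision described above could be sketched roughly
as follows. This is a simplified user-space model, not the actual driver:
the sh_* names and the irq_pending flag are hypothetical, and
SH_ISR_PROPAGATE only plays the role that XN_ISR_PROPAGATE plays in the
real handler.

```c
#include <stdbool.h>
#include <stddef.h>

#define SH_NR_CAN_CHIPS 4

/* Hypothetical result codes for the model. */
enum sh_isr_result { SH_ISR_HANDLED, SH_ISR_PROPAGATE };

struct sh_can_chip {
    bool irq_pending;   /* stand-in for reading the chip's interrupt register */
    unsigned serviced;  /* how many interrupts this chip has had serviced */
};

/* Poll every chip on the shared line. If none of them raised the
 * interrupt, pass it down the pipeline so Linux can check the bridge. */
enum sh_isr_result sh_shared_isr(struct sh_can_chip chips[], size_t n)
{
    bool claimed = false;

    for (size_t i = 0; i < n; i++) {
        if (chips[i].irq_pending) {
            chips[i].irq_pending = false;  /* acknowledge at the device */
            chips[i].serviced++;
            claimed = true;
        }
    }
    return claimed ? SH_ISR_HANDLED : SH_ISR_PROPAGATE;
}
```

The model makes the difficulty visible: the real-time handler can only
vouch for the CAN chips, so it has no way to know that the bridge side
of the shared line is quiet.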
There's a lot to go wrong. Frankly it is amazing it works as well as it
does. I can't guarantee that "all interrupt sources are shut down" at
the time of the ipipe_virtualize_irq() because of the PCMCIA sharing.
I've found that checking for a null interrupt vector and ignoring it
solves my problem, and makes the code more robust against any other
corner cases.
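For readers who want to see the shape of that check, here is a minimal
sketch in plain C. It is not the actual patch: the guard_* names are
hypothetical, the slot array only models "ipd->irqs[irq].handler", and
the spurious counter is an added assumption to keep dropped IRQs visible.

```c
#include <stddef.h>

#define GUARD_NR_IRQS 8

typedef void (*guard_handler_t)(unsigned irq);

/* Hypothetical per-IRQ slot, modelling ipd->irqs[irq]. */
struct guard_irq_slot {
    guard_handler_t handler;
    unsigned spurious;  /* IRQs dropped because no handler was set */
};

int guard_calls;  /* counts handler invocations, for demonstration only */

void guard_count_handler(unsigned irq) { (void)irq; guard_calls++; }

/* Dispatch one logged IRQ. Returns 1 if a handler ran, 0 if the IRQ
 * was ignored because the handler had already been unregistered. */
int guard_dispatch(struct guard_irq_slot slots[], unsigned irq)
{
    if (irq >= GUARD_NR_IRQS)
        return 0;

    /* Read the pointer once and test the local copy. Real kernel code
     * would also need whatever locking or barriers the platform requires. */
    guard_handler_t h = slots[irq].handler;
    if (h == NULL) {
        slots[irq].spurious++;
        return 0;
    }
    h(irq);
    return 1;
}
```

Counting the ignored IRQs rather than dropping them silently makes it
possible to tell later how often the race is actually being hit.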
Tom