Philippe Gerum wrote:
On Thu, 2011-09-01 at 12:22 +1000, Tom Evans wrote:
This problem has probably been solved years ago, but Google and searching this list didn't find me anything.

I'm running an old (2006) Linux 2.4 kernel with Xenomai 2.1 and the Adeos patches on an MPC5200 (ppc).

Every now and then when I stress the system it crashes because "ipd->irqs[irq].handler" is NULL for "irq == 1" (a valid irq on this system) in this code:

<snip>

So it looks like the interrupt happens in hardware and is queued, THEN it is deregistered (with the handler being set to zero in ipipe_virtualize_irq()), and only then is it pulled from the pipe and run, which (usually) crashes.

<snip>

I'd be interested in any observations, comments or pointers to the "real cause" and any other "real fixes".

ipipe_virtualize_irq() is an internal service which should be called for
unregistering an IRQ only after the source was shut off at device level,
and possibly masked on the interrupt controller. It must be called with
interrupts enabled for the domain which owns the unregistered handler.
On uniprocessor systems, these two conditions are enough to make sure
that no IRQ is lingering in the interrupt log after the handler was
nullified.
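
The ordering described above can be sketched with a small user-space model. This is only an illustration under stated assumptions: every name in it (toy_domain, toy_device, toy_raise, toy_sync, toy_unregister) is hypothetical and none of it is Adeos or Xenomai code. It assumes that a domain with interrupts enabled plays its logged IRQs immediately, which is how the sketch models "no IRQ is lingering in the interrupt log".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_LOG_IRQS 8 /* hypothetical table size for this sketch */

typedef void (*toy_irq_handler_t)(unsigned irq);

/* Toy model of one pipeline domain; these are NOT Adeos structures. */
struct toy_domain {
    toy_irq_handler_t handler[TOY_LOG_IRQS];
    bool pending[TOY_LOG_IRQS]; /* IRQs logged but not yet delivered */
    bool stalled;               /* true while the domain has interrupts off */
};

/* Toy model of the interrupt source and its controller mask bit. */
struct toy_device {
    bool irq_source_enabled; /* device-level interrupt enable */
    bool line_masked;        /* mask bit at the interrupt controller */
};

static unsigned toy_delivered; /* counts handler invocations */

static void toy_count_handler(unsigned irq)
{
    (void)irq;
    toy_delivered++;
}

/* The hardware raises the line: logged only if enabled and unmasked. */
static void toy_raise(struct toy_domain *d, const struct toy_device *dev,
                      unsigned irq)
{
    assert(irq < TOY_LOG_IRQS);
    if (dev->irq_source_enabled && !dev->line_masked)
        d->pending[irq] = true;
}

/* Play the log. Returns how many logged IRQs had no handler left. */
static int toy_sync(struct toy_domain *d)
{
    int orphans = 0;

    if (d->stalled)
        return 0; /* nothing is delivered while interrupts are off */

    for (unsigned irq = 0; irq < TOY_LOG_IRQS; irq++) {
        if (!d->pending[irq])
            continue;
        d->pending[irq] = false;
        if (d->handler[irq] != NULL)
            d->handler[irq](irq);
        else
            orphans++;
    }
    return orphans;
}

/* Unregister in the order described: quiesce, mask, drain, then clear. */
static int toy_unregister(struct toy_domain *d, struct toy_device *dev,
                          unsigned irq)
{
    int orphans;

    assert(irq < TOY_LOG_IRQS);
    dev->irq_source_enabled = false; /* 1. shut the source at device level */
    dev->line_masked = true;         /* 2. mask the line on the controller */
    d->stalled = false;              /* 3. interrupts enabled for the domain */
    orphans = toy_sync(d);           /*    so anything already logged is played */
    d->handler[irq] = NULL;          /* 4. only now nullify the handler */
    return orphans;
}
```

In this model an IRQ logged before toy_unregister() is still delivered to the old handler, and nothing new can be logged afterwards because the source is off and the line is masked.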

I can't spot the routines appearing in the backtrace you sent in the
vanilla linux/xenomai code I have at hand, but if this is a real-time
CAN stack, you may want to check whether the device is properly quiesced
and the IRQ line masked prior to unregistering the interrupt in the
pipeline.

Thanks for your prompt and detailed reply.

Yes, it is a real-time CAN stack. It supports Philips SJA1000 CAN chips on Peak Systems PCMCIA cards connected through TI PCI1520 PCI bridge chips (using the "Yenta" drivers) through a Freescale MPC5200's PCI interface. The four CAN chips and the PCMCIA Bridge all use a single shared interrupt. The CAN chip interrupts are handled in real time, but if the handler finds that none of the CAN chips is responsible, the interrupt is handballed (via XN_ISR_PROPAGATE) to the Linux-based PCMCIA code to see if it was a card insert event.
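
The shared-interrupt arrangement described above might look roughly like the following. This is a sketch, not the actual driver: the toy_ names, the chip structure and the two result codes are hypothetical stand-ins (the real handler returns the nucleus codes, of which only XN_ISR_PROPAGATE is named in this thread), and the "pending" flag stands in for reading each SJA1000's interrupt status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_CAN_CHIPS 4 /* four CAN chips share the line */

/* Stand-ins for the handler's return codes; illustrative only. */
enum toy_isr_result { TOY_ISR_HANDLED, TOY_ISR_PROPAGATE };

struct toy_can_chip {
    bool irq_pending;  /* models the chip's interrupt status */
    unsigned serviced; /* how many interrupts this chip has had serviced */
};

/* Service one chip; returns true if it was asserting the shared line. */
static bool toy_can_service(struct toy_can_chip *chip)
{
    if (!chip->irq_pending)
        return false;
    chip->irq_pending = false;
    chip->serviced++;
    return true;
}

/*
 * Real-time handler for the shared line. Every chip is polled, since more
 * than one may be pending. If none claims the interrupt, it is passed down
 * the pipeline so the Linux PCMCIA code can check for a card event.
 */
static enum toy_isr_result toy_shared_isr(struct toy_can_chip chips[], size_t n)
{
    bool claimed = false;

    assert(chips != NULL);
    for (size_t i = 0; i < n; i++) {
        if (toy_can_service(&chips[i]))
            claimed = true;
    }
    return claimed ? TOY_ISR_HANDLED : TOY_ISR_PROPAGATE;
}
```

The point of the sketch is that the real-time side cannot quiesce the line on its own: the PCMCIA bridge can still raise it after the CAN chips have been shut down.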

There's a lot to go wrong. Frankly it is amazing it works as well as it does. I can't guarantee that "all interrupt sources are shut down" at the time of the ipipe_virtualize_irq() because of the PCMCIA sharing.

I've found that checking for a null interrupt vector and ignoring it solves my problem, and makes the code more robust against any other corner cases.
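
A minimal sketch of that kind of guard, written as a user-space model rather than as the actual patch: the slot table, toy_play_pending() and the counting handler are all hypothetical names, and the real code works on ipd->irqs[irq].handler inside the pipeline's dispatch loop. The idea is to read the handler pointer once, and to drop the logged IRQ if it has been unregistered in the meantime.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_IRQS 8 /* hypothetical table size for this sketch */

typedef void (*toy_handler_t)(unsigned irq);

/* Toy stand-in for one per-IRQ slot in a domain; not the Adeos layout. */
struct toy_irq_slot {
    toy_handler_t handler; /* NULL once the IRQ has been unregistered */
    unsigned pending;      /* number of logged, undelivered occurrences */
};

static unsigned toy_handled; /* counts handler invocations */

static void toy_counting_handler(unsigned irq)
{
    (void)irq;
    toy_handled++;
}

/*
 * Play all logged IRQs. An IRQ whose handler has been cleared since it
 * was logged is dropped instead of being called through a NULL pointer.
 * Returns the number of IRQs dropped that way.
 */
static unsigned toy_play_pending(struct toy_irq_slot slots[], size_t n)
{
    unsigned dropped = 0;

    assert(slots != NULL);
    for (size_t irq = 0; irq < n; irq++) {
        while (slots[irq].pending > 0) {
            /* Read the pointer once so the check and the call agree. */
            toy_handler_t h = slots[irq].handler;

            slots[irq].pending--;
            if (h == NULL) {
                dropped++; /* unregistered after being logged: ignore */
                continue;
            }
            h((unsigned)irq);
        }
    }
    return dropped;
}
```

Counting the dropped IRQs, rather than discarding them silently, would make it possible to see how often the race actually fires under stress.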

Tom

_______________________________________________
Adeos-main mailing list
[email protected]
https://mail.gna.org/listinfo/adeos-main
