On Wed, 2012-04-11 at 11:33 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-04-11 at 11:13 +1000, Benjamin Herrenschmidt wrote:
> > On Sat, 2012-04-07 at 14:27 +0200, Andreas Schwab wrote:
> > > Benjamin Herrenschmidt <b...@kernel.crashing.org> writes:
> > > 
> > > > It's arguable that this irq_set_irq_type(,NONE) shouln't be there but
> > > > still ... it's been around for ever and things worked :-) So something
> > > > -else- is causing the problem and I'd like to understand what exactly.
> > > 
> > > AFAICS before a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5
> > > irq_set_irq_type(,NONE) was actually a no-op.
> > 
> > So I'm still a bit baffled... ie, I understand some of what's happening
> > but not why it breaks things, I haven't yet managed to reproduce but I
> > haven't tried too hard just yet (was away from the HW) :
> 
> Allright, I have a repro-case, I'll dig.

Ok, so it's Grant's fault :-)

So basically, it's quite subtle and I'm only 99% sure of the details but
I believe what happens is:

 - The audio interrupts get virq 64 and 65 (so above NR_IRQS)

 - The reverse map isn't pre-filled at map time (we should probably do
it nowadays), we do it lazily so ...

 - The audio interrupt happens, it tries to pre-fill the reverse map
and ... fails (see below)

 - This cause irq_linear_revmap() to returns 0

 - This hits another bug in mpic where when that happens it doesn't EOI
the interrupt, which means the priority remains stuck high and we don't
get any other interrupt.

There's thus two bugs here:

 - We don't properly establish the reverse mapping for 64 and 65 (well,
not always, again see below)

 - We don't EOI an interrupt we can't reverse map (we should warn & EOI,
I'll send a patch for that).

The second problem is a bug we shouldn't hit if the first one didn't
happen, the first one comes from way irq_find_mapping() works. Basically
when the revmap entry is 0, we call it to find the mapping & populate
the revmap... and that fails.

That fails because it will only search up to irq_virq_count which is
statically initialized to NR_IRQS, which in our case is 64 so it won't
find our interrupts 64 and 65... 

The reason it works if you don't do the set_type(NONE) is a fluke, it
will crash as soon as you actually do audio:

The set_type(NONE) has the effect in mpic to configure the interrupt to
level low (default). This is also the idle state of the audio interrupt
(which is positive edge), and the MPIC appears to be latching it (yeah
odd, it does seem to latch level interrupts).

So as soon as the audio driver enables it, it fires, triggering the bug.
Without the set_type(NONE) the occurence of the bug is delayed to the
fist time you get a real audio interrupt.

Now what is the solution ? Well, there's several things I think we need
to do:

 - MPIC should properly EOI interrupts it can't revmap (I'll fix that)

 - MPIC should probably not do the set_type(NONE) on map, it's not
useful

 - However, we do have a risk of discrepency between the default trigger
type in the irq_desc vs. the default HW value at startup/map time, so we
do need to do something to reconcile them (that was the intend of NONE I
think)... without that, we risk having irq_create_of_mapping() not
setting the trigger because it thinks it thinks it's right... What do
you think is best here ? We can probably read the HW config and sync the
irq desc appropriately ...

 - The whole business with irq_virq_count needs fixing. Basically the
default value shouldn't be NR_IRQ. I suggest making it 0 and have all
the use sites do something like:

        max = irq_virq_count ? irq_virq_count : nr_irqs;

(Grant, can you take care of that ?)

 - We still need to clear up all the NR_IRQS occurences in arch/powerpc

Cheers,
Ben.


_______________________________________________
devicetree-discuss mailing list
devicetree-discuss@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/devicetree-discuss

Reply via email to