Re: [PATCH] x86_64 irq: check remote IRR bit before migrating level triggered irq (v2)
On Thu, May 31, 2007 at 07:29:43AM -0600, Eric W. Biederman wrote: > Andi Kleen <[EMAIL PROTECTED]> writes: > > > On Thursday 31 May 2007 13:34:21 Eric W. Biederman wrote: > >> > >> On x86_64 kernel, level triggered irq migration gets initiated in the > >> context > >> of that interrupt(after executing the irq handler) and following steps are > >> followed to do the irq migration. > > > > What's the confidence level in the patch? Is it a .22 candidate? > > Yes. > > I just ran it for a little while and nothing fell over, and level > triggered irqs continued to migrate. So it's not horribly broken. > > Outside of implementation errors the worst case that can happen with > this code is the current behavior, with a small slow down. > > So since I can not reproduce the original problem I would appreciate > an eyeball or two more just to make certain I reimplemented this > correctly and didn't do anything stupid. But I don't expect anything > to be found. The code is pretty straight forward. Yes. We want this to get included in .22. We did extensive testing with our patch before. We will also do more tests with this version, however. thanks, suresh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64 irq: check remote IRR bit before migrating level triggered irq (v3)
On Thu, May 31, 2007 at 07:50:58AM -0600, Eric W. Biederman wrote: > > On x86_64 kernel, level triggered irq migration gets initiated in the context > of that interrupt(after executing the irq handler) and following steps are > followed to do the irq migration. > > 1. mask IOAPIC RTE entry; // write to IOAPIC RTE > 2. EOI; // processor EOI write > 3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination and > // and interrupt vector due to per cpu vector > // allocation. > 4. unmask IOAPIC RTE entry; // write to IOAPIC RTE > > Because of the per cpu vector allocation in x86_64 kernels, when the irq > migrates to a different cpu, new vector(corresponding to the new cpu) will > get allocated. > > An EOI write to local APIC has a side effect of generating an EOI write > for level trigger interrupts (normally this is a broadcast to all IOAPICs). > The EOI broadcast generated as a side effect of EOI write to processor may > be delayed while the other IOAPIC writes (step 3 and 4) can go through. > > Normally, the EOI generated by local APIC for level trigger interrupt > contains vector number. The IOAPIC will take this vector number and > search the IOAPIC RTE entries for an entry with matching vector number and > clear the remote IRR bit (indicate EOI). However, if the vector number is > changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI > is received later. This will cause the remote IRR to get stuck causing the > interrupt hang (no more interrupt from this RTE). > > Current x86_64 kernel assumes that remote IRR bit is cleared by the time > IOAPIC RTE is reprogrammed. Fix this assumption by checking for remote IRR > bit and if it still set, delay the irq migration to the next interrupt > arrival event(hopefully, next time remote IRR bit will get cleared > before the IOAPIC RTE is reprogrammed). > > Initial analysis and patch from Nanhai. > > Clean up patch from Suresh. > > Rewritten to be less intrusive, and to contain a big fat comment by Eric. Acked-by: Suresh Siddha <[EMAIL PROTECTED]> Thanks Eric. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64 irq: check remote IRR bit before migrating level triggered irq (v3)
* Eric W. Biederman <[EMAIL PROTECTED]> wrote: > Initial analysis and patch from Nanhai. > > Clean up patch from Suresh. > > Rewritten to be less intrusive, and to contain a big fat comment by > Eric. thanks Erik! Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64 irq: check remote IRR bit before migrating level triggered irq (v2)
looks good to me: Acked-by: Ingo Molnar <[EMAIL PROTECTED]> with a few minor style nits: > +static int io_apic_level_ack_pending(unsigned int irq) > +{ > + struct irq_pin_list *entry; > + unsigned long flags; > + int pending = 0; > + spin_lock_irqsave(&ioapic_lock, flags); newline after variable sections please. > + entry = irq_2_pin + irq; > + for (;;) { > + unsigned int reg; > + int pin; > + pin = entry->pin; ditto. > + if (pin == -1) > + break; > + reg = io_apic_read(entry->apic, 0x10 + pin*2); > + /* Is the remote IRR bit set? */ > + pending |= (reg >> 14) & 1; > + if (!entry->next) > + break; > + entry = irq_2_pin + entry->next; > + } > + spin_unlock_irqrestore(&ioapic_lock, flags); > + return pending; optional: it looks a bit better with a newline before the 'return' statement. > +} > + > + and here there's one too much newline :-) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64 irq: check remote IRR bit before migrating level triggered irq (v2)
Andi Kleen <[EMAIL PROTECTED]> writes: > On Thursday 31 May 2007 13:34:21 Eric W. Biederman wrote: >> >> On x86_64 kernel, level triggered irq migration gets initiated in the context >> of that interrupt(after executing the irq handler) and following steps are >> followed to do the irq migration. > > What's the confidence level in the patch? Is it a .22 candidate? Yes. I just ran it for a little while and nothing fell over, and level triggered irqs continued to migrate. So it's not horribly broken. Outside of implementation errors the worst case that can happen with this code is the current behavior, with a small slow down. So since I can not reproduce the original problem I would appreciate an eyeball or two more just to make certain I reimplemented this correctly and didn't do anything stupid. But I don't expect anything to be found. The code is pretty straight forward. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64 irq: check remote IRR bit before migrating level triggered irq (v2)
On Thursday 31 May 2007 13:34:21 Eric W. Biederman wrote: > > On x86_64 kernel, level triggered irq migration gets initiated in the context > of that interrupt(after executing the irq handler) and following steps are > followed to do the irq migration. What's the confidence level in the patch? Is it a .22 candidate? -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
"Siddha, Suresh B" <[EMAIL PROTECTED]> writes: > Eric, > > On Fri, May 18, 2007 at 07:40:53AM -0700, Eric W. Biederman wrote: >> Still in any of those I don't see a problem with switching to edge >> triggered mode and then back again. Either Remote IRR will keep >> it's current state or it will be cleared. Remote IRR should not >> get set (when it was clear) by such a procedure because that >> would mess up the normal interrupt enable sequence that happens >> on boot. So I'm pretty certain toggling the edge bit is harmless >> and it may actually clear Remote IRR for us. > ... >> >> I think going more the way that this code has gone on arch/i386 with >> real functions is preferable. > > There is an issue with this suggestion. We have an inflight > EOI broadcast msg to this IOAPIC (that got delayed but still alive) and > that can cause problem if we use edge and back to level to reset remote IRR > bit. > > If the vector number stays same during irq migration and if we reset remote > IRR bit using the above method(edge and then back to level) during > irq migration, then we have a problem. A new interrupt arriving on a new > cpu will set the remote IRR bit and now the old inflight EOI broadcast > reaches IOAPIC RTE(resetting the remote IRR bit, because the vector in the > broadcast msg is same), while the kernel code still assumes that the remote > IRR bit is still set. This will lead to more problems and issues. Ok. I will agree that if the level->edge->level switch does not reset Remote_IRR and the EOI arrives in that window. We have another condition in which Remote_IRR can get stuck on. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
On Fri, May 18, 2007 at 12:02:16PM -0700, Eric W. Biederman wrote: > I will look closer but I do believe that from the ioapic to the cpu the 2.6.21 > code should be fairly robust with respect to inflight messages from the ioapic > to the local apics and the cpus. What I failed to consider were inflight > messages in the other direction arriving out of order. Yes. Inflight messages from cpu to ioapic was what I was referring to. thanks, suresh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
"Siddha, Suresh B" <[EMAIL PROTECTED]> writes: > On Fri, May 18, 2007 at 11:28:25AM -0700, Yinghai Lu wrote: >> On 5/18/07, Siddha, Suresh B <[EMAIL PROTECTED]> wrote: >> > >> > If the vector number stays same during irq migration and if we reset remote >> > IRR bit using the above method(edge and then back to level) during >> > irq migration, then we have a problem. A new interrupt arriving on a new >> > cpu will set the remote IRR bit and now the old inflight EOI broadcast >> > reaches IOAPIC RTE(resetting the remote IRR bit, because the vector in the >> > broadcast msg is same), while the kernel code still assumes that the remote >> > IRR bit is still set. This will lead to more problems and issues. >> >> coud add some line __assign_irq_vector. to make sure old_vector!=vector. > > hmm.. > what happens when there is second(and very quick) irq migration which brings > the > irq back to old cpu(or to a third cpu) with old vector. > > Point is, we are not taking care of the inflight messages(which can perhaps, > theoretically, can get delayed for long time) I will look closer but I do believe that from the ioapic to the cpu the 2.6.21 code should be fairly robust with respect to inflight messages from the ioapic to the local apics and the cpus. What I failed to consider were inflight messages in the other direction arriving out of order. Part of the problem in the general case is the only way you can tell an inflight message was transmitted is that a message that you can prove followed the first message arrives somewhere. I'm in the middle of tracking two other problems so I can't review this in detail until later today at the earliest. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
On Fri, May 18, 2007 at 11:28:25AM -0700, Yinghai Lu wrote: > On 5/18/07, Siddha, Suresh B <[EMAIL PROTECTED]> wrote: > > > > If the vector number stays same during irq migration and if we reset remote > > IRR bit using the above method(edge and then back to level) during > > irq migration, then we have a problem. A new interrupt arriving on a new > > cpu will set the remote IRR bit and now the old inflight EOI broadcast > > reaches IOAPIC RTE(resetting the remote IRR bit, because the vector in the > > broadcast msg is same), while the kernel code still assumes that the remote > > IRR bit is still set. This will lead to more problems and issues. > > coud add some line __assign_irq_vector. to make sure old_vector!=vector. hmm.. what happens when there is second(and very quick) irq migration which brings the irq back to old cpu(or to a third cpu) with old vector. Point is, we are not taking care of the inflight messages(which can perhaps, theoretically, can get delayed for long time) thanks, suresh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
On 5/18/07, Siddha, Suresh B <[EMAIL PROTECTED]> wrote: If the vector number stays same during irq migration and if we reset remote IRR bit using the above method(edge and then back to level) during irq migration, then we have a problem. A new interrupt arriving on a new cpu will set the remote IRR bit and now the old inflight EOI broadcast reaches IOAPIC RTE(resetting the remote IRR bit, because the vector in the broadcast msg is same), while the kernel code still assumes that the remote IRR bit is still set. This will lead to more problems and issues. coud add some line __assign_irq_vector. to make sure old_vector!=vector. YH - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
"Yinghai Lu" <[EMAIL PROTECTED]> writes: > On 5/18/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote: >> We can solve the problem without doing that, and keeping the same >> vector number during migration keeps x86 from scaling. > > I mean ioapic level irq couls be limited. new device could use MSI or > HT irq directly and less irq routing problem. Possibly. It really doesn't buy us anything until most irqs are MSI which they are not yet. >> Personally I would prefer to disallow irq migration. > ? typo? > For amd platform with different hypertransport chain on different > nodes, irq migration is needed. irqs not on cpu0 are needed. irq migration is less necessary, and I periodically think we are insane for supporting it. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
Eric, On Fri, May 18, 2007 at 07:40:53AM -0700, Eric W. Biederman wrote: > Still in any of those I don't see a problem with switching to edge > triggered mode and then back again. Either Remote IRR will keep > it's current state or it will be cleared. Remote IRR should not > get set (when it was clear) by such a procedure because that > would mess up the normal interrupt enable sequence that happens > on boot. So I'm pretty certain toggling the edge bit is harmless > and it may actually clear Remote IRR for us. ... > > I think going more the way that this code has gone on arch/i386 with > real functions is preferable. There is an issue with this suggestion. We have an inflight EOI broadcast msg to this IOAPIC (that got delayed but still alive) and that can cause problem if we use edge and back to level to reset remote IRR bit. If the vector number stays same during irq migration and if we reset remote IRR bit using the above method(edge and then back to level) during irq migration, then we have a problem. A new interrupt arriving on a new cpu will set the remote IRR bit and now the old inflight EOI broadcast reaches IOAPIC RTE(resetting the remote IRR bit, because the vector in the broadcast msg is same), while the kernel code still assumes that the remote IRR bit is still set. This will lead to more problems and issues. thanks, suresh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
On 5/18/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote: We can solve the problem without doing that, and keeping the same vector number during migration keeps x86 from scaling. I mean ioapic level irq couls be limited. new device could use MSI or HT irq directly and less irq routing problem. Personally I would prefer to disallow irq migration. ? typo? For amd platform with different hypertransport chain on different nodes, irq migration is needed. YH - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
"Yinghai Lu" <[EMAIL PROTECTED]> writes: > Eric, > > ioapic_level irq is limited, So if we keep vector number not changed > when imgration to other cpus. It that could help. We can solve the problem without doing that, and keeping the same vector number during migration keeps x86 from scaling. Personally I would prefer to disallow irq migration. > it will need modify a little with assign_irq_vector and > irq_complete_move/smp_irq_move_cleanup_interrupt. because it assume > vector must be changed. Yes it does. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
Eric, ioapic_level irq is limited, So if we keep vector number not changed when imgration to other cpus. It that could help. it will need modify a little with assign_irq_vector and irq_complete_move/smp_irq_move_cleanup_interrupt. because it assume vector must be changed. YH - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
"Siddha, Suresh B" <[EMAIL PROTECTED]> writes: > On Thu, May 17, 2007 at 05:30:13PM -0700, Eric W. Biederman wrote: >> So why does any of this matter? >> >> My memory says that the ioapic state for sending irqs gets reset when we >> unmask the irq. > > No. Atleast not on the platform we have tested. Ok. It simply have the test for an external interrupt getting reset not the delivery state machine. Otherwise you would not have a problem in the first place. >> If not I expect we can use the mask and edge, unmask and level >> work-around from arch/i386/ioapic.c. That looks like it would >> be less code, less error prone and easier to implement then the >> current work around. > > hmm. We have to check and see if it can help. But ioapic spec doesn't > say that the Remote IRR gets reset when the trigger mode changes. Not sure > if we can apply the logic widely. Worst case it is a noop, so wide deployment should be safe. The original definition of the Remote IRR is: > 14 Remote IRR--RO. This bit is used for level triggered >interrupts. Its meaning is undefined for edge triggered >interrupts. For level triggered interrupts, this bit is set to 1 >when local APIC(s) accept the level interrupt sent by the >IOAPIC. The Remote IRR bit is set to 0 when an EOI message with a >matching interrupt vector is received from a local APIC. We know it is not used to throttle edge triggered interrupts. The question is how does the hardware implementation make that come about? By ignoring Remote IRR when transmitting edge triggered interrupts? By insisting Remote IRR is clear when doing a mode transition? By clearing Remote IRR when doing a mode transition? At least historically there is some evidence that Intel's ioapics did the latter. I don't know if there is a defacto standard for how ioapic mode transitions happen. Still in any of those I don't see a problem with switching to edge triggered mode and then back again. Either Remote IRR will keep it's current state or it will be cleared. Remote IRR should not get set (when it was clear) by such a procedure because that would mess up the normal interrupt enable sequence that happens on boot. So I'm pretty certain toggling the edge bit is harmless and it may actually clear Remote IRR for us. This does seem to be a general applicable problem (at least in theory) because we have no useful ordering rules we can take advantage of so something that works across the board does make sense. With a little care I think we can safely send an ack to the new vector just in case this race triggers. Before we send an ipi to ourselves we need to ensure in the vector allocator that the vector is either just for level or just for edge triggered interrupts. Then we can just send an ipi to our current cpu ack it and we ensure the interrupt it acked. "Siddha, Suresh B" <[EMAIL PROTECTED]> writes: > On Thu, May 17, 2007 at 04:58:02PM -0700, Eric W. Biederman wrote: >> In essence this makes sense, and it may be the best work around for >> buggy hardware available. However I am not convinced that the remote >> IRR on ioapics works reliably enough to be used for anything. I >> tested this earlier and I could not successfully poll the remote irr >> bit to see if an ioapic had received an irq acknowledgement. Instead >> I locked up irq controllers. >> >> If remote IRR worked reliably I could have pulled the irq migration >> out of irq context. So this fix looks dubious to me. > > We are just reading IOAPIC RTE's. So not sure why this can lead to lock > up's. This patch is not reading any new HW registers. It just > checks for certain bit values from the register that the code reads > anyhow. > > Remote IRR should work reliably otherwise we will see lot more issues, > I guess. The test I performed was outside of interrupt context to mask an interrupt, poll Remote IRR until it was clear, and them migrate the interrupt. That does not work correctly. If that could be made to work correctly we would not need to interrupt migration in interrupt context. So while that is not conclusive evidence that Remote IRR is not a reliable indication of when it is safe to do things it is a strong suggestion that it is not a good indicator. Currently we don't look at Remote IRR so we have no evidence that it is good for anything. >> Why is the EOI delayed? Can we work around that? > > Under some stress conditions, EOI can get retried because of which > it is getting delayed. Ok. So that is why EOI is delayed. >> It would be nice if ioapics and local apics actually obeyed the pci >> ordering rules where a read would flush all traffic. And we do have a >> read in there. >> >> I'm assuming the symptom you are seeing on current kernels is that >> occasionally >> the irq gets stuck and never fires again? > > yes. > >> I'm not certain I like the patch either, but I need to look more closely. >> You are mixing changes to generic and arch specific code together. >>
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
Andi Kleen <[EMAIL PROTECTED]> writes: > On Friday 18 May 2007 01:03, Siddha, Suresh B wrote: > >> Normally, the EOI generated by local APIC for level trigger interrupt >> contains vector number. The IOAPIC will take this vector number and >> search the IOAPIC RTE entries for an entry with matching vector number and >> clear the remote IRR bit (indicate EOI). However, if the vector number is >> changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI >> is received later. This will cause the remote IRR to get stuck causing the >> interrupt hang (no more interrupt from this RTE). > > Does this happen often or did you only see it in some extreme or obscure > case? It is obscure. Someone may have a case that makes it easy to reproduce but it won't happen frequently. So we should have enough time to review and fix this carefully, instead of needing to rush a fix out. >> +/* >> + * If the EOI still didn't reach the RTE corresponding to the >> + * level triggered irq, postpone the irq migration to the next >> + * irq arrival event. >> + */ >> +if (pending_eoi(irq)) { >> +irq_desc[irq].status |= IRQ_MOVE_PENDING; >> +return; > > Other code seems to have similar problems, but we don't have any lock > protecting that bitmap against parallel updates outside the irq itself, don't > we? Perhaps it needs to be all set_bit() Huh? The irq lock is sufficient for this. This is not a code software problem, this is a software hardware interaction problem. This is a problem in that there is no way to guarantee that ioapics see a series of manipulations in the same order that the cpu issues those manipulations. In particular the ioapic is seeing the ack after we reprogram the ioapic, which causes the ioapic to not accept the ack (because it's vector number changed) and then the ioapic waits forever for an ack that isn't coming. The other half of the problem is pending_eoi() is testing a bit in the hardware that in my experience is not sufficient to guarantee that the hardware is in a state where action may be taken. This is a very delicate problem because there is little if any guaranteed ordering between cpu operations and ioapic operations, especially because there are two channels of communication the ``apic bus'' and the memory mapped ioapic registers. So what happens on one channel has no ordering requirements of any kind with what happens on the other channel. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
On Friday 18 May 2007 01:03, Siddha, Suresh B wrote: > Normally, the EOI generated by local APIC for level trigger interrupt > contains vector number. The IOAPIC will take this vector number and > search the IOAPIC RTE entries for an entry with matching vector number and > clear the remote IRR bit (indicate EOI). However, if the vector number is > changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI > is received later. This will cause the remote IRR to get stuck causing the > interrupt hang (no more interrupt from this RTE). Does this happen often or did you only see it in some extreme or obscure case? > + /* > + * If the EOI still didn't reach the RTE corresponding to the > + * level triggered irq, postpone the irq migration to the next > + * irq arrival event. > + */ > + if (pending_eoi(irq)) { > + irq_desc[irq].status |= IRQ_MOVE_PENDING; > + return; Other code seems to have similar problems, but we don't have any lock protecting that bitmap against parallel updates outside the irq itself, don't we? Perhaps it needs to be all set_bit() -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
On Thu, May 17, 2007 at 05:30:13PM -0700, Eric W. Biederman wrote: > So why does any of this matter? > > My memory says that the ioapic state for sending irqs gets reset when we > unmask the irq. No. Atleast not on the platform we have tested. > If not I expect we can use the mask and edge, unmask and level > work-around from arch/i386/ioapic.c. That looks like it would > be less code, less error prone and easier to implement then the > current work around. hmm. We have to check and see if it can help. But ioapic spec doesn't say that the Remote IRR gets reset when the trigger mode changes. Not sure if we can apply the logic widely. thanks, suresh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
On Thu, May 17, 2007 at 04:58:02PM -0700, Eric W. Biederman wrote: > In essence this makes sense, and it may be the best work around for > buggy hardware available. However I am not convinced that the remote > IRR on ioapics works reliably enough to be used for anything. I > tested this earlier and I could not successfully poll the remote irr > bit to see if an ioapic had received an irq acknowledgement. Instead > I locked up irq controllers. > > If remote IRR worked reliably I could have pulled the irq migration > out of irq context. So this fix looks dubious to me. We are just reading IOAPIC RTE's. So not sure why this can lead to lock up's. This patch is not reading any new HW registers. It just checks for certain bit values from the register that the code reads anyhow. Remote IRR should work reliably otherwise we will see lot more issues, I guess. > > Why is the EOI delayed? Can we work around that? Under some stress conditions, EOI can get retried because of which it is getting delayed. > > It would be nice if ioapics and local apics actually obeyed the pci > ordering rules where a read would flush all traffic. And we do have a > read in there. > > I'm assuming the symptom you are seeing on current kernels is that > occasionally > the irq gets stuck and never fires again? yes. > I'm not certain I like the patch either, but I need to look more closely. > You are mixing changes to generic and arch specific code together. > > I think pending_eoi should not try to reuse __DO_ACTION as that > helper bit of code does not seem appropriate. Wanted to minimize the code duplicacy and hence overloaded existing macros. > > It would probably be best if the pending_eoi check was in > ack_apic_level with the rest of the weird logic we are working around. Just wanted to make sure that we give some time for the EOI to actually reach the IO-APIC. > > Could we please have more detail on this hardware behavior. Why is > the EIO write write delayed? Why can't we just issue reads in > appropriate places to flush the write? Because the EOI to IOAPIC gets generated as a side effect of processor EOI. thanks, suresh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
"Siddha, Suresh B" <[EMAIL PROTECTED]> writes: > on x86_64 kernel, level triggered irq migration gets initiated in the context > of that interrupt(after executing the irq handler) and following steps are > followed to do the irq migration. > > 1. mask IOAPIC RTE entry; // write to IOAPIC RTE > 2. EOI; // processor EOI write > 3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination and > // and interrupt vector due to per cpu vector > // allocation. > 4. unmask IOAPIC RTE entry; // write to IOAPIC RTE > > Because of the per cpu vector allocation in x86_64 kernels, when the irq > migrates to a different cpu, new vector(corresponding to the new cpu) will > get allocated. > > An EOI write to local APIC has a side effect of generating an EOI write > for level trigger interrupts (normally this is a broadcast to all IOAPICs). > The EOI broadcast generated as a side effect of EOI write to processor may > be delayed while the other IOAPIC writes (step 3 and 4) can go through. > > Normally, the EOI generated by local APIC for level trigger interrupt > contains vector number. The IOAPIC will take this vector number and > search the IOAPIC RTE entries for an entry with matching vector number and > clear the remote IRR bit (indicate EOI). However, if the vector number is > changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI > is received later. This will cause the remote IRR to get stuck causing the > interrupt hang (no more interrupt from this RTE). > > Current x86_64 kernel assumes that remote IRR bit is cleared by the time > IOAPIC RTE is reprogrammed. Fix this assumption by checking for remote IRR > bit and if it still set, delay the irq migration to the next interrupt > arrival event(hopefully, next time remote IRR bit will get cleared > before the IOAPIC RTE is reprogrammed). > > Initial analysis and patch from Nanhai. So why does any of this matter? My memory says that the ioapic state for sending irqs gets reset when we unmask the irq. If not I expect we can use the mask and edge, unmask and level work-around from arch/i386/ioapic.c. That looks like it would be less code, less error prone and easier to implement then the current work around. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq
"Siddha, Suresh B" <[EMAIL PROTECTED]> writes: > on x86_64 kernel, level triggered irq migration gets initiated in the context > of that interrupt(after executing the irq handler) and following steps are > followed to do the irq migration. > > 1. mask IOAPIC RTE entry; // write to IOAPIC RTE > 2. EOI; // processor EOI write > 3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination and > // and interrupt vector due to per cpu vector > // allocation. > 4. unmask IOAPIC RTE entry; // write to IOAPIC RTE > > Because of the per cpu vector allocation in x86_64 kernels, when the irq > migrates to a different cpu, new vector(corresponding to the new cpu) will > get allocated. > > An EOI write to local APIC has a side effect of generating an EOI write > for level trigger interrupts (normally this is a broadcast to all IOAPICs). > The EOI broadcast generated as a side effect of EOI write to processor may > be delayed while the other IOAPIC writes (step 3 and 4) can go through. > > Normally, the EOI generated by local APIC for level trigger interrupt > contains vector number. The IOAPIC will take this vector number and > search the IOAPIC RTE entries for an entry with matching vector number and > clear the remote IRR bit (indicate EOI). However, if the vector number is > changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI > is received later. This will cause the remote IRR to get stuck causing the > interrupt hang (no more interrupt from this RTE). > > Current x86_64 kernel assumes that remote IRR bit is cleared by the time > IOAPIC RTE is reprogrammed. Fix this assumption by checking for remote IRR > bit and if it still set, delay the irq migration to the next interrupt > arrival event(hopefully, next time remote IRR bit will get cleared > before the IOAPIC RTE is reprogrammed). > > Initial analysis and patch from Nanhai. In essence this makes sense, and it may be the best work around for buggy hardware available. However I am not convinced that the remote IRR on ioapics works reliably enough to be used for anything. I tested this earlier and I could not successfully poll the remote irr bit to see if an ioapic had received an irq acknowledgement. Instead I locked up irq controllers. If remote IRR worked reliably I could have pulled the irq migration out of irq context. So this fix looks dubious to me. Why is the EOI delayed? Can we work around that? It would be nice if ioapics and local apics actually obeyed the pci ordering rules where a read would flush all traffic. And we do have a read in there. I'm assuming the symptom you are seeing on current kernels is that occasionally the irq gets stuck and never fires again? I'm not certain I like the patch either, but I need to look more closely. You are mixing changes to generic and arch specific code together. I think pending_eoi should not try to reuse __DO_ACTION as that helper bit of code does not seem appropriate. It would probably be best if the pending_eoi check was in ack_apic_level with the rest of the weird logic we are working around. Could we please have more detail on this hardware behavior. Why is the EIO write write delayed? Why can't we just issue reads in appropriate places to flush the write? I need to think about this some more to see if there are any other possible ways we could address this issue that would be more robust. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/