Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
> The version I wound up testing is below, and it doesn't work.
> I still get "No irq handler for vector" warnings as well as
> a couple of complaints from lock/irq debugging. The debugging
> doesn't worry me. The fact that I don't have a good way to ensure
> I have no more irqs in flight does.
>
> So unless someone can find a sure way to drain the irqs in flight,
> I can't migrate an irq from process context, and looking at irr and
> handling a pending irq appears required.

Bah. I had not taken into account that the local apic, despite being tightly coupled with the cpu, is for programming purposes an asynchronous device. If I want to give it time to react to something I need to read from it.

The routine below actually works. My remaining practical question is whether this can be done cleanly. Ingo's lock debugging dislikes this routine. By using raw_local_irq_enable I have avoided all but a message on the irq return path; I haven't quite worked out where that comes from. But this version feels like it could be done better (less inline? different helpers?), if someone has suggestions.

For interrupts coming through a sane interrupt controller, moving this into process context would certainly simplify things. For edge triggered interrupts coming through an io_apic I'm not at all certain what makes sense. When the routine below is used to ack an edge triggered interrupt it runs before the edge triggered interrupt handler, so losing an edge shouldn't happen (we haven't acknowledged the hardware yet), and even if we do the device driver gets to run at least once. So doing migration in the irq handler still looks like the best solution, even if it is ugly. As long as the little bit of stack overhead isn't a problem, I think enabling interrupts to clear out any pending irqs certainly looks simpler.

In another vein:
I went and looked through all of Intel's and AMD's public errata that I could find, and there weren't any associated with irr or isr, so I think my previous version of the code is still sane and not likely to break. I can improve it a little by getting the vector as "vector = ~get_irq_regs()->orig_rax;" instead of reading ISR. That still leaves reading the pending bit in IRR and the other funny tricks. I'm conflicted between the two approaches a little, because playing games with enabling interrupts in an interrupt handler seems to have some weird corner cases.

static void ack_apic(unsigned int irq)
{
#if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
	struct irq_desc *desc;

	desc = irq_desc + irq;
	if (likely(!(desc->status & IRQ_MOVE_PENDING)))
		goto simple;
	if (hardirq_count() != HARDIRQ_OFFSET)
		goto simple;

	desc->chip->mask(irq);
	ack_APIC_irq();

	/* Ensure all of the irq handlers for this irq have completed
	 * before we migrate it.
	 */
	spin_unlock(&desc->lock);
	raw_local_irq_enable();
	apic_read(APIC_ID);
	raw_local_irq_disable();
	spin_lock(&desc->lock);

	move_masked_irq(irq);
	desc->chip->unmask(irq);
	return;
simple:
#endif
	ack_APIC_irq();
}

BUG: at /home/eric/projects/linux/linux-2.6-devel/kernel/lockdep.c:1860 trace_hardirqs_on()

Call Trace:
 [] trace_hardirqs_on_thunk+0x35/0x37
 [] generic_delete_inode+0x0/0x13e
 [] restore_args+0x0/0x30
 [] generic_delete_inode+0x0/0x13e
 [] ack_apic+0x63/0x99
 [] ack_apic+0x5b/0x99
 [] handle_fasteoi_irq+0xc1/0xd1
 [] generic_delete_inode+0x0/0x13e
 [] do_IRQ+0x89/0xf3
 [] default_idle+0x35/0x51
 [] default_idle+0x0/0x51
 [] ret_from_intr+0x0/0xf
 [] generic_delete_inode+0x0/0x13e
 [] default_idle+0x0/0x51
 [] default_idle+0x35/0x51
 [] default_idle+0x37/0x51
 [] default_idle+0x35/0x51
 [] cpu_idle+0x56/0x75
 [] start_secondary+0x481/0x490

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please
read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
[EMAIL PROTECTED] (Eric W. Biederman) writes:

> Ingo Molnar <[EMAIL PROTECTED]> writes:
>
>> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>>
>>> Ingo would it be reasonable to get a wait queue so I can wait for an
>>> irq that needs the delayed disable action to actually become masked?
>>
>> that might make sense, but what will do the wakeup - incidental IRQ
>> arriving on the new CPU? Isn't that a bit risky - maybe the device won't
>> generate IRQs for a really long time.
>
> I still need to test this, but I believe I have found a simpler
> way to avoid irr problems during migration, and I believe the code
> works equally well with either edge or level triggered interrupts.
>
> The idea is this: Instead of trying to test for and handle when irr
> occurs, simply enable local interrupts after disabling and
> acknowledging the irq so that anything pending will be processed
> before we perform the migration operation.
>
> I don't think the edge case cares about the mask/ack order,
> but masking before acking appears important for the level triggered
> case, so we might as well use that order for both.
>
> Does this look like a sane way to handle this?

The version I wound up testing is below, and it doesn't work. I still get "No irq handler for vector" warnings as well as a couple of complaints from lock/irq debugging. The debugging doesn't worry me. The fact that I don't have a good way to ensure I have no more irqs in flight does.

So unless someone can find a sure way to drain the irqs in flight, I can't migrate an irq from process context, and looking at irr and handling a pending irq appears required.
Eric

static void ack_apic(unsigned int irq)
{
#if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
	if (unlikely((irq_desc[irq].status & IRQ_MOVE_PENDING) &&
		     (hardirq_count() == HARDIRQ_OFFSET))) {
		struct irq_desc *desc = irq_desc + irq;

		desc->chip->mask(irq);
		ack_APIC_irq();

		/* Ensure all of the irq handlers for this irq have completed
		 * before we migrate it.
		 */
		raw_local_irq_enable();
		cpu_relax();
		raw_local_irq_disable();
		synchronize_irq(irq);

		move_masked_irq(irq);
		desc->chip->unmask(irq);
		return;
	}
#endif
	ack_APIC_irq();
}
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
Ingo Molnar <[EMAIL PROTECTED]> writes:

> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
>> Ingo would it be reasonable to get a wait queue so I can wait for an
>> irq that needs the delayed disable action to actually become masked?
>
> that might make sense, but what will do the wakeup - incidental IRQ
> arriving on the new CPU? Isn't that a bit risky - maybe the device won't
> generate IRQs for a really long time.

I still need to test this, but I believe I have found a simpler way to avoid irr problems during migration, and I believe the code works equally well with either edge or level triggered interrupts.

The idea is this: Instead of trying to test for and handle when irr occurs, simply enable local interrupts after disabling and acknowledging the irq so that anything pending will be processed before we perform the migration operation.

I don't think the edge case cares about the mask/ack order, but masking before acking appears important for the level triggered case, so we might as well use that order for both.

Does this look like a sane way to handle this?

static void ack_apic(unsigned int irq)
{
	struct irq_desc *desc = irq_desc + irq;
	int do_unmask_irq = 0;

	if (unlikely((irq_desc[irq].status & IRQ_MOVE_PENDING) && !in_irq())) {
		do_unmask_irq = 1;
		desc->chip->mask(irq);
	}
	ack_APIC_irq();
	if (unlikely(do_unmask_irq)) {
		/* Don't let pending irqs accumulate */
		local_irq_enable();
		synchronize_irq(irq);
		move_masked_irq(irq);
		local_irq_disable();
		desc->chip->unmask(irq);
	}
}

Eric
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
Ingo Molnar <[EMAIL PROTECTED]> writes:

> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
>> Ingo would it be reasonable to get a wait queue so I can wait for an
>> irq that needs the delayed disable action to actually become masked?
>
> that might make sense, but what will do the wakeup - incidental IRQ
> arriving on the new CPU?

That is what I was thinking.

> Isn't that a bit risky - maybe the device won't
> generate IRQs for a really long time.

Well, this is in a user space context called from user space, and it exactly matches the semantics we have now. If we make it an interruptible sleep the user space process shouldn't block.

I guess the other thing to do is do it in a non-blocking fashion and just call schedule_work from the interrupt context when the irq is disabled. For i386 with its in-kernel irq scheduler that might be better. I think the nasty case is probably what we do when it is the timer interrupt we are dealing with.

Hmm. I think I should look up what the rules are for calling local_irq_enable when in interrupt context. That might be another way to satisfy this problem. If local irqs are enabled I don't have to worry about the irr register. You've got me brainstorming now.

Eric
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
* Eric W. Biederman <[EMAIL PROTECTED]> wrote:

> Ingo would it be reasonable to get a wait queue so I can wait for an
> irq that needs the delayed disable action to actually become masked?

that might make sense, but what will do the wakeup - incidental IRQ arriving on the new CPU? Isn't that a bit risky - maybe the device won't generate IRQs for a really long time.

	Ingo
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
Ingo, would it be reasonable to get a wait queue so I can wait for an irq that needs the delayed disable action to actually become masked? There are a lot of practical reasons why I cannot reprogram an unmasked irq. However, if we can wait, it should be possible to get all of the convoluted irq migration logic out of interrupt context and into process context.

If I know interrupts have been enabled on the cpus where an irq could be delivered after the irq has been masked, I know that the irq is not currently in progress. Therefore I know the irq will not be in progress again until the irq is unmasked. Once I know the irq will not be received again it is safe to modify the irq handling data structures and the interrupt controller.

I think the only generic piece missing from this equation is a wait queue so I can wait for the irq to become masked. Say a generic sleeping mask_irq() call?

Eric
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
"Luigi Genoni" <[EMAIL PROTECTED]> writes: > btw, I tested in on ( CPU and no way to reproduce the bug, (no messages > appeared and load average was quite as high as exspected). > Going to test an a 16 CPU (the same who triggered the bug at the beginning) > immediatelly when it will be free. Thanks. Hopefully that will confirm I have properly tracked this bug. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
Ingo, thanks for the review.

Ingo Molnar <[EMAIL PROTECTED]> writes:

> * Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
>> When making the interrupt vectors per cpu I failed to handle a case
>> during irq migration. If the same interrupt comes in while we are
>> servicing the irq but before we migrate it the pending bit in the
>> local apic IRR register will be set for that irq.
>
> hm. I think this needs more work. For example this part of the fix looks
> quite ugly to me:

I'm not at all certain I can make an ugly reality look beautiful.

>> +static unsigned apic_in_service_vector(void)
>> +{
>> +	unsigned isr, vector;
>> +	/* Figure out which vector we are servicing */
>> +	for (vector = FIRST_EXTERNAL_VECTOR; vector < FIRST_SYSTEM_VECTOR; vector += 32) {
>> +		isr = apic_read(APIC_ISR + ((vector/32) * 0x10));
>> +		if (isr)
>> +			break;
>> +	}
>> +	/* Find the low bits of the vector we are servicing */
>> +	vector += __ffs(isr);
>> +	return vector;
>
> so we read the hardware to figure out what the hell we are doing. The
> problem is - why doesn't the kernel know at this point what it is doing?
> It knows the CPU and it should know all the vector numbers. It also has
> an irq number.

Yes. And by adding a percpu global I can do this. I figured since this should be a rare case it would be equally simple to read this from the hardware, as it already has the information stored in roughly the form I would need to store it in. If there are errata in this area and I am likely to hit them, then it is probably a good idea to look at a different implementation.

>> +	/* If the irq we are servicing has moved and is not pending
>> +	 * free it's vector.
>> +	 */
>> +	irr = apic_read(APIC_IRR + ((vector/32) * 0x10));
>
> the IRR is quite fragile. It's a rarely used hardware field and it has
> erratums attached to it. Again, it seems like this function too tries to
> recover information it /should/ already have.
If I am servicing an interrupt and that same interrupt comes in again before I acknowledge it, how /should/ I know that? The only way I know to find that information is to ask the hardware.

>> @@ -1400,19 +1439,24 @@ static int ioapic_retrigger_irq(unsigned int irq)
>>
>>  static void ack_apic_edge(unsigned int irq)
>>  {
>> -	move_native_irq(irq);
>> +	if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
>> +		move_native_irq(irq);
>> +		apic_handle_pending_vector(apic_in_service_vector());
>> +	}
>>  	ack_APIC_irq();
>
> this looks a bit ugly and a bit fragile. We had a simple
> 'move_native_irq()' call for IRQ balancing, which is straightforward,
> but now we've got this complex condition open coded.

Well, the condition is testing a single bit, so I don't think it is that complex. Maybe taking it out of line will help, or maybe that will obscure things. I'm inclined to believe hiding the irq migration logic will obscure things and make it harder to debug.

Now, part of the reason I did it this way is that I have at least 3 set_affinity implementations, and this issue really has nothing to do with the external interrupt controller but everything to do with the cpu-local interrupt controller, so this did not seem like something that was reasonably buried in set_affinity.

> If then this should be done in some sort of helper - but even then, the
> whole approach looks a bit erroneous.
>
> To me the cleanest way to migrate an IRQ between two different vectors
> on two different CPUs would be to first mask the IRQ source in the PIC,
> then to do the migration /atomically/ (no intermediary state), and then
> unmask. If the PIC loses edges while masked then that's a problem of the
> irq chip implementation of the PIC: its ->set_affinity() method should
> refuse to migrate edge-triggered IRQs if it can lose edges while
> unmasked!
Ingo, I believe what you have described is essentially what we are doing before my patches, or what we were doing in even older versions that had other races and problems. To some extent I have inherited the current design, which mostly works.

The only known reliable way to block an edge triggered irq is to be servicing it. The practical problem there is that while we sit on an irq, the irq can come in again and queue up in irr. Which means that once we have updated the data structures, acked the irq, and returned, the irq will come in again at the old location, because the io_apic has a 1-deep queue for each irq.

Of course, for the irq controllers that can be masked safely, your proposed change to disable_irq is unfortunate. You appear to be lobbying for disabling the irq asynchronously to all of the irq reception activity. That would seem to me to require running on the cpu where the irq is currently programmed to be delivered, disabling local interrupts, as well as disabling the irq in the interrupt controller before reprogramming it. For the irqs that applies
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
* Eric W. Biederman <[EMAIL PROTECTED]> wrote:

> When making the interrupt vectors per cpu I failed to handle a case
> during irq migration. If the same interrupt comes in while we are
> servicing the irq but before we migrate it the pending bit in the
> local apic IRR register will be set for that irq.

hm. I think this needs more work. For example this part of the fix looks quite ugly to me:

> +static unsigned apic_in_service_vector(void)
> +{
> +	unsigned isr, vector;
> +	/* Figure out which vector we are servicing */
> +	for (vector = FIRST_EXTERNAL_VECTOR; vector < FIRST_SYSTEM_VECTOR; vector += 32) {
> +		isr = apic_read(APIC_ISR + ((vector/32) * 0x10));
> +		if (isr)
> +			break;
> +	}
> +	/* Find the low bits of the vector we are servicing */
> +	vector += __ffs(isr);
> +	return vector;

so we read the hardware to figure out what the hell we are doing. The problem is - why doesn't the kernel know at this point what it is doing? It knows the CPU and it should know all the vector numbers. It also has an irq number.

> +	/* If the irq we are servicing has moved and is not pending
> +	 * free it's vector.
> +	 */
> +	irr = apic_read(APIC_IRR + ((vector/32) * 0x10));

the IRR is quite fragile. It's a rarely used hardware field and it has erratums attached to it. Again, it seems like this function too tries to recover information it /should/ already have.

> @@ -1400,19 +1439,24 @@ static int ioapic_retrigger_irq(unsigned int irq)
>
>  static void ack_apic_edge(unsigned int irq)
>  {
> -	move_native_irq(irq);
> +	if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
> +		move_native_irq(irq);
> +		apic_handle_pending_vector(apic_in_service_vector());
> +	}
>  	ack_APIC_irq();

this looks a bit ugly and a bit fragile. We had a simple 'move_native_irq()' call for IRQ balancing, which is straightforward, but now we've got this complex condition open coded. If then this should be done in some sort of helper - but even then, the whole approach looks a bit erroneous.
To me the cleanest way to migrate an IRQ between two different vectors on two different CPUs would be to first mask the IRQ source in the PIC, then to do the migration /atomically/ (no intermediary state), and then unmask. If the PIC loses edges while masked then that's a problem of the irq chip implementation of the PIC: its ->set_affinity() method should refuse to migrate edge-triggered IRQs if it can lose edges while unmasked!

	Ingo
RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, February 05, 2007 3:03 PM

> Nope. irq routines are a stack. If apic_in_service_vector could return
> the wrong value, ack_APIC_irq(), which uses the same information, would
> acknowledge the wrong irq. If there was actually any danger of
> mis-computing that information I would just pass it from the interrupt
> service routine, stash it in a per cpu variable, and then read it out.
> But the apic already has registers doing that, so I was lazy and used
> what was available. It should be the common case that we need that
> information.

OK. I wonder if the current kernel supports different cpus handling irq requests for different devices at the same time.

YH
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
"Lu, Yinghai" <[EMAIL PROTECTED]> writes: > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Monday, February 05, 2007 12:37 PM > > >>The only corner case I can see that might potentially happen is >>"apic_in_service_vector() != irq_vector[irq]" and if that is the case >>we don't want to migrate, because the precondition that we are in the >>irq handler servicing the expected irq isn't true. > > Reuse vector could help in that case. A little but you are still on the wrong cpu, and that corner case might even be worth looking at on i386. I haven't assessed what might go wrong yet but the current strategy of migrating irqs is based in the interrupt service routine being where new instances of that irq will be delivered. What I do know is all of this is a very narrow window, and insanely hard to trigger with consequences that shouldn't hang the machine. The only reason we are even getting a reasonable number of irqs being in service and in the irr is because we are acking a level triggered vector used by two irqs and then the pending irq retriggers. Avoiding that case would certainly be an efficiency improvement. Basically I think it takes the alignment of several 1 in a million chances before the code I posted will have problems with this theoretical case. > In another case, if two irq are migrated from one cpu to another cpu. > ack_apic_edge for irq2 could use get apci_in_servier_vector for irq1, > and handle that to clear irr for irq1. instead of irq2. Nope. irq routines are a stack. if apic_in_service_vector could return the wrong value. ack_APIC_irq() which use the same information would acknowledge the wrong irq. If there was actually any danger of mis-computing that information I would just pass it from the interrupt service routine stash it in a per cpu variable and then read it out. But the apic already has registers doing that, so I was lazy and used what was available. It should be the common case that we need that information. 
Eric
RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, February 05, 2007 12:37 PM

> The only corner case I can see that might potentially happen is
> "apic_in_service_vector() != irq_vector[irq]" and if that is the case
> we don't want to migrate, because the precondition that we are in the
> irq handler servicing the expected irq isn't true.

Reuse vector could help in that case.

In another case, if two irqs are migrated from one cpu to another cpu, ack_apic_edge for irq2 could get apic_in_service_vector for irq1, and handle that to clear irr for irq1 instead of irq2.

YH
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
"Lu, Yinghai" <[EMAIL PROTECTED]> writes: > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] >>> How about let apic_hangle_pending_vector take the irq too? > >>We can't compute the vector by reading the hardware registers after >>we have acknowledged the irq. > >>I hope that was the answer you were looking for I'm not quite certain >>what you mean by take. > > I mean I don't see the point. We would have to miscompute the vector that we are servicing or improperly fill in vector_irq for that to be useful. The only corner case I can see that might potentially happen is "apic_in_service_vector() != irq_vector[irq]" and if that is the case we don't want to migrate, because the precondition that we are in the irq handler servicing the expected irq isn't true. Of course someone would have to set IRQ_MOVE_PENDING pretty fast for that case to hit, it's a tiny hole probably worth closing though. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]

>> How about letting apic_handle_pending_vector take the irq too?
>
> We can't compute the vector by reading the hardware registers after
> we have acknowledged the irq.
>
> I hope that was the answer you were looking for; I'm not quite certain
> what you mean by take.

I mean

+static void apic_handle_pending_vector(unsigned vector)
+{
+	unsigned irr;
+	int irq;
+
+	irq = __get_cpu_var(vector_irq)[vector];
+	if (irq >= 0)
+		return;
+
+	/* If the irq we are servicing has moved and is not pending
+	 * free it's vector.
+	 */

==>

+static void apic_handle_pending_vector(unsigned vector, unsigned irqx)
+{
+	unsigned irr;
+	int irq;
+
+	irq = __get_cpu_var(vector_irq)[vector];
+	if ((-irq) != irqx)
+		return;
+
+	/* If the irq we are servicing has moved and is not pending
+	 * free it's vector.
+	 */
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
"Lu, Yinghai" <[EMAIL PROTECTED]> writes: > Eric, > > How about let apic_hangle_pending_vector take the irq too? We can't compute the vector by reading the hardware registers after we have acknowledged the irq. I hope that was the answer you were looking for I'm not quite certain what you mean by take. > I wonder if there any chance that you have two IRQ_MOVE_PENDING? I'm not quite certain what you are asking on the face of it there is one bit per irq so it is impossible. I don't think we can get into any trouble we don't let a vector be reused on a cpu until we know it isn't in irr. So even if things come in highly out of order the should still function. I also have seen irqs migrate on nearly every irq received. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
Eric,

How about letting apic_handle_pending_vector take the irq too?

I wonder if there is any chance that you have two IRQ_MOVE_PENDING?

BTW, Andi, do you want to try "reuse vector when doing the irq-balance between CPUs"? If so, I will update the patch that you dropped some months ago.

YH
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
As I reported when I tested this patch, it works, but I could see an abnormally high load average while triggering the error message. Anyway, it is better to have a load average three or four times higher than what you would expect than a crash/reboot, isn't it? :)

Luigi Genoni

p.s. will test the other definitive patch on Monday on both the 8 and 16 CPU systems.

On Sat, 3 Feb 2007, Eric W. Biederman wrote:

Date: Sat, 03 Feb 2007 00:55:11 -0700
From: Eric W. Biederman <[EMAIL PROTECTED]>
To: Arjan van de Ven <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>, linux-kernel@vger.kernel.org, "Lu, Yinghai" <[EMAIL PROTECTED]>, Luigi Genoni <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>, Natalie Protasevich <[EMAIL PROTECTED]>, Andi Kleen <[EMAIL PROTECTED]>
Subject: Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
Resent-Date: Sat, 03 Feb 2007 09:05:10 +0100
Resent-From: <[EMAIL PROTECTED]>

Arjan van de Ven <[EMAIL PROTECTED]> writes:

>>> Once the migration operation is complete we know we will receive no
>>> more interrupts on this vector so the irq pending state for this irq
>>> will no longer be updated. If the irq is not pending and we are in the
>>> intermediate state we immediately free the vector, otherwise we free
>>> the vector in do_IRQ when the pending irq arrives.
>>
>> So is this a for-2.6.20 thing? The bug was present in 2.6.19, so I
>> assume it doesn't affect many people?
>
> I got a few reports of this; irqbalance may trigger this kernel bug it
> seems... I would suggest to consider this for 2.6.20 since it's a
> hard-hang case

Yes. The bug I fixed will not happen if you don't migrate irqs. At the very least we want the patch below (already in -mm) that makes it not a hard hang case.

Subject: [PATCH] x86_64: Survive having no irq mapping for a vector

Occasionally the kernel has bugs that result in no irq being found for a given cpu vector.
If we acknowledge the irq, the system has a good chance of continuing even though we dropped and missed an irq message. If we continue to simply print a message and drop the irq without acknowledging it, the system is likely to become non-responsive shortly thereafter.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/irq.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 0c06af6..648055a 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -120,9 +120,14 @@ asmlinkage unsigned int do_IRQ(struct pt_regs *regs)
 	if (likely(irq < NR_IRQS))
 		generic_handle_irq(irq);
-	else if (printk_ratelimit())
-		printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
-			__func__, smp_processor_id(), vector);
+	else {
+		if (!disable_apic)
+			ack_APIC_irq();
+
+		if (printk_ratelimit())
+			printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
+				__func__, smp_processor_id(), vector);
+	}
 
 	irq_exit();
--
1.4.4.1.g278f
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
On Saturday 03 February 2007 11:22, Eric W. Biederman wrote:

> Andi Kleen <[EMAIL PROTECTED]> writes:
>
> >> Once the migration operation is complete we know we will receive
> >> no more interrupts on this vector so the irq pending state for
> >> this irq will no longer be updated. If the irq is not pending and
> >> we are in the intermediate state we immediately free the vector,
> >> otherwise we free the vector in do_IRQ when the pending irq
> >> arrives.
> >
> > Ok for me, although the magic numbers are a little nasty.
>
> You must be talking about (vector/32) * 0x10.

No, I meant the -1s.

-Andi
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
Andi Kleen <[EMAIL PROTECTED]> writes:

>> Once the migration operation is complete we know we will receive
>> no more interrupts on this vector, so the irq pending state for
>> this irq will no longer be updated. If the irq is not pending and
>> we are in the intermediate state we immediately free the vector;
>> otherwise we free the vector in do_IRQ when the pending irq
>> arrives.
>
> Ok for me, although the magic numbers are a little nasty.

You must be talking about (vector/32)*0x10. The 32 is the number of bits per register and 0x10 is the gap between apic registers. I couldn't think of a better form. If someone can think of a better way it probably warrants a cleanup patch at some point.

> What about i386?

i386 does not handle this case, but since it still globally allocates all of its vectors and never changes its vectors during migration, it is totally harmless when an irq comes in on a cpu other than the one we are expecting it on.

Eric
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
> Once the migration operation is complete we know we will receive
> no more interrupts on this vector, so the irq pending state for
> this irq will no longer be updated. If the irq is not pending and
> we are in the intermediate state we immediately free the vector;
> otherwise we free the vector in do_IRQ when the pending irq
> arrives.

Ok for me, although the magic numbers are a little nasty.

What about i386?

-Andi
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
Arjan van de Ven <[EMAIL PROTECTED]> writes:

>> > Once the migration operation is complete we know we will receive
>> > no more interrupts on this vector, so the irq pending state for
>> > this irq will no longer be updated. If the irq is not pending and
>> > we are in the intermediate state we immediately free the vector;
>> > otherwise we free the vector in do_IRQ when the pending irq
>> > arrives.
>>
>> So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
>> I assume it doesn't affect many people?
>
> I got a few reports of this; irqbalance may trigger this kernel bug it
> seems... I would suggest to consider this for 2.6.20 since it's a
> hard-hang case

Yes. The bug I fixed will not happen if you don't migrate irqs.

At the very least we want the patch below (already in -mm) that makes it not a hard-hang case.

Subject: [PATCH] x86_64: Survive having no irq mapping for a vector

Occasionally the kernel has bugs that result in no irq being found for a given cpu vector. If we acknowledge the irq the system has a good chance of continuing even though we dropped and missed an irq message. If we continue to simply print a message and not acknowledge the irq, the system is likely to become non-responsive shortly thereafter.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/irq.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 0c06af6..648055a 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -120,9 +120,14 @@ asmlinkage unsigned int do_IRQ(struct pt_regs *regs)
 
 	if (likely(irq < NR_IRQS))
 		generic_handle_irq(irq);
-	else if (printk_ratelimit())
-		printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
-			__func__, smp_processor_id(), vector);
+	else {
+		if (!disable_apic)
+			ack_APIC_irq();
+
+		if (printk_ratelimit())
+			printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
+				__func__, smp_processor_id(), vector);
+	}
 
 	irq_exit();
-- 
1.4.4.1.g278f
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
> > Once the migration operation is complete we know we will receive
> > no more interrupts on this vector, so the irq pending state for
> > this irq will no longer be updated. If the irq is not pending and
> > we are in the intermediate state we immediately free the vector;
> > otherwise we free the vector in do_IRQ when the pending irq
> > arrives.
>
> So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
> I assume it doesn't affect many people?

I got a few reports of this; irqbalance may trigger this kernel bug it seems... I would suggest to consider this for 2.6.20 since it's a hard-hang case.
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
On Fri, 02 Feb 2007 18:39:15 -0700 [EMAIL PROTECTED] (Eric W. Biederman) wrote:

> Andrew Morton <[EMAIL PROTECTED]> writes:
>
> > So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
> > I assume it doesn't affect many people?
>
> If it's not too late, and this patch isn't too scary.
>
> It's a really rare set of circumstances that triggers it, but the
> possibility of being hit is pretty widespread: anything with more
> than one cpu and more than one irq could see this.
>
> The easiest way to trigger this is to have two level triggered irqs on
> two different cpus using the same vector. In that case if one acks
> its irq while the other irq is migrating to a different cpu, 2.6.19
> gets completely confused and stops handling interrupts properly.
>
> With my previous bug fix (not to drop the ack when we are confused)
> the machine will stay up, and that is obviously correct and can't
> affect anything else, so it is probably a candidate for the stable tree.
>
> With this fix everything just works.
>
> I don't know how common a legitimate case of the exact same irq
> going off twice in a row is, but that is a possibility as well,
> especially with edge triggered interrupts.
>
> Setting up the test scenario was a pain, but by extremely limiting
> my choice of vectors I was able to confirm I survived several hundred
> of these events within a couple of minutes, no problem.

OK, thanks. Let's await Andi's feedback.
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
Andrew Morton <[EMAIL PROTECTED]> writes:

> So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
> I assume it doesn't affect many people?

If it's not too late, and this patch isn't too scary.

It's a really rare set of circumstances that triggers it, but the possibility of being hit is pretty widespread: anything with more than one cpu and more than one irq could see this.

The easiest way to trigger this is to have two level triggered irqs on two different cpus using the same vector. In that case if one acks its irq while the other irq is migrating to a different cpu, 2.6.19 gets completely confused and stops handling interrupts properly.

With my previous bug fix (not to drop the ack when we are confused) the machine will stay up, and that is obviously correct and can't affect anything else, so it is probably a candidate for the stable tree.

With this fix everything just works.

I don't know how common a legitimate case of the exact same irq going off twice in a row is, but that is a possibility as well, especially with edge triggered interrupts.

Setting up the test scenario was a pain, but by extremely limiting my choice of vectors I was able to confirm I survived several hundred of these events within a couple of minutes, no problem.

Eric
Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
On Fri, 02 Feb 2007 17:35:31 -0700 [EMAIL PROTECTED] (Eric W. Biederman) wrote:

> When making the interrupt vectors per cpu I failed to handle a case
> during irq migration. If the same interrupt comes in while we are
> servicing the irq, but before we migrate it, the pending bit in the
> local apic IRR register will be set for that irq.
>
> After migrating the irq to another cpu and/or vector, the data
> structures will no longer be set up to handle this pending irq. Then,
> as soon as we return from servicing the irq we just migrated, we will
> get a nasty "No irq handler for vector" message.
>
> Since we do not disable irqs for edge triggered interrupts except
> in the smallest possible window during migration, I cannot avoid
> migrating an irq in the unlikely event that it becomes pending.
> This is because by the time the irq could no longer become pending
> I have already updated all of my data structures.
>
> Therefore this patch introduces an intermediate state that
> exists solely on the cpu where we are handling the irq during
> migration. The irq number is changed to negative in the
> vector_irq data structure.
>
> Once the migration operation is complete we know we will receive
> no more interrupts on this vector, so the irq pending state for
> this irq will no longer be updated. If the irq is not pending and
> we are in the intermediate state we immediately free the vector;
> otherwise we free the vector in do_IRQ when the pending irq
> arrives.

So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so I assume it doesn't affect many people?
[PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
When making the interrupt vectors per cpu I failed to handle a case during irq migration. If the same interrupt comes in while we are servicing the irq, but before we migrate it, the pending bit in the local apic IRR register will be set for that irq.

After migrating the irq to another cpu and/or vector, the data structures will no longer be set up to handle this pending irq. Then, as soon as we return from servicing the irq we just migrated, we will get a nasty "No irq handler for vector" message.

Since we do not disable irqs for edge triggered interrupts except in the smallest possible window during migration, I cannot avoid migrating an irq in the unlikely event that it becomes pending. This is because by the time the irq could no longer become pending I have already updated all of my data structures.

Therefore this patch introduces an intermediate state that exists solely on the cpu where we are handling the irq during migration. The irq number is changed to negative in the vector_irq data structure.

Once the migration operation is complete we know we will receive no more interrupts on this vector, so the irq pending state for this irq will no longer be updated. If the irq is not pending and we are in the intermediate state we immediately free the vector; otherwise we free the vector in do_IRQ when the pending irq arrives.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/io_apic.c |   56 ++++++++++++++++++++++++++++++++---
 arch/x86_64/kernel/irq.c     |   19 +++++++++++-
 2 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index d1fcd4b..ffcb5f6 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -730,9 +730,17 @@ next:
 	/* Found one! */
 	current_vector = vector;
 	current_offset = offset;
-	for_each_cpu_mask(old_cpu, old_mask)
-		per_cpu(vector_irq, old_cpu)[old_vector] = -1;
-	for_each_cpu_mask(new_cpu, new_mask)
+	for_each_cpu_mask(old_cpu, old_mask) {
+		int free = -1;
+		/* When migrating we need to preserve the old
+		 * vector until we have processed all of the
+		 * pending irqs.
+		 */
+		if (old_cpu == smp_processor_id())
+			free = -irq;
+		per_cpu(vector_irq, old_cpu)[old_vector] = free;
+	}
+	for_each_cpu_mask(new_cpu, new_mask)
 		per_cpu(vector_irq, new_cpu)[vector] = irq;
 	irq_vector[irq] = vector;
 	irq_domain[irq] = domain;
@@ -1389,6 +1397,37 @@ static int ioapic_retrigger_irq(unsigned int irq)
 	return 1;
 }
 
+static unsigned apic_in_service_vector(void)
+{
+	unsigned isr, vector;
+	/* Figure out which vector we are servicing */
+	for (vector = FIRST_EXTERNAL_VECTOR; vector < FIRST_SYSTEM_VECTOR; vector += 32) {
+		isr = apic_read(APIC_ISR + ((vector/32) * 0x10));
+		if (isr)
+			break;
+	}
+	/* Find the low bits of the vector we are servicing */
+	vector += __ffs(isr);
+	return vector;
+}
+
+static void apic_handle_pending_vector(unsigned vector)
+{
+	unsigned irr;
+	int irq;
+
+	irq = __get_cpu_var(vector_irq)[vector];
+	if (irq >= 0)
+		return;
+
+	/* If the irq we are servicing has moved and is not pending
+	 * free its vector.
+	 */
+	irr = apic_read(APIC_IRR + ((vector/32) * 0x10));
+	if (!(irr & (1 << (vector % 32))))
+		__get_cpu_var(vector_irq)[vector] = -1;
+}
+
 /*
  * Level and edge triggered IO-APIC interrupts need different handling,
  * so we use two separate IRQ descriptors. Edge triggered IRQs can be
@@ -1400,19 +1439,24 @@ static int ioapic_retrigger_irq(unsigned int irq)
 
 static void ack_apic_edge(unsigned int irq)
 {
-	move_native_irq(irq);
+	if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
+		move_native_irq(irq);
+		apic_handle_pending_vector(apic_in_service_vector());
+	}
 	ack_APIC_irq();
 }
 
 static void ack_apic_level(unsigned int irq)
 {
 	int do_unmask_irq = 0;
+	unsigned int vector = 0;
 
 #if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
 	/* If we are moving the irq we need to mask it */
 	if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
 		do_unmask_irq = 1;
 		mask_IO_APIC_irq(irq);
+		vector = apic_in_service_vector();
 	}
 #endif
@@ -1424,8 +1468,10 @@ static void ack_apic_level(unsigned int irq)
 	/* Now we can move and renable the irq */
 	move_masked_irq(irq);
-	if (unlikely(do_unmask_irq))
+	if