Re: [patch] genirq: temporary fix for level-triggered IRQ resend
> Hi,
>
> I see there is a bit of complaining on this original resend temporary patch. But, since it seems to do a good job for some people, here is my proposal to limit the 'range of fire' a little bit.
>
> Marcin and Jean-Baptiste: try to test this with 2.6.23-rc2, please.
> (Unless Ingo or Thomas have other plans with this problem?)

2.6.23-rc2 + this patch, and the box's cards are still networking after 20 hours.

RX bytes:452423991847 (421.3 GiB)  TX bytes:13464471620 (12.5 GiB)

still testing.

Jb
-
To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] genirq: temporary fix for level-triggered IRQ resend
Ingo Molnar wrote:
> Linus, with -rc2 approaching i think we should apply the minimal fix below to get Marcin's ne2k-pci networking back in working order. The WARN_ON_ONCE() will not prevent the system from working and it will be a reminder.
>
> a better workaround would be to inhibit the resent vector via the IO-APIC irqchip - but i'd still like to have the patch below because the ne2k driver _should_ be able to survive the spurious irq that happens. (even on Marcin's system that ne2k-pci irq line is shared with another networking card, so an irq could happen at any moment - it's just that with the delayed-disable logic it happens _all the time_.)

I get a warning on each boot now with this patch ..

[ 63.686613] WARNING: at kernel/irq/resend.c:70 check_irq_resend()
[ 63.686636] [c013c55c] check_irq_resend+0x8c/0xa0
[ 63.686653] [c013c15f] enable_irq+0xad/0xb3
[ 63.686662] [e886481e] vortex_timer+0x20c/0x3d5 [3c59x]
[ 63.686675] [c01164b9] scheduler_tick+0x154/0x273
[ 63.686685] [c012fed1] getnstimeofday+0x34/0xe3
[ 63.686697] [c0121f4a] run_timer_softirq+0x137/0x197
[ 63.686709] [e8864612] vortex_timer+0x0/0x3d5 [3c59x]
[ 63.686720] [c011ed09] __do_softirq+0x75/0xe1
[ 63.686729] [c011edac] do_softirq+0x37/0x3d
[ 63.686735] [c011ef85] irq_exit+0x7c/0x7e
[ 63.686740] [c010e013] smp_apic_timer_interrupt+0x59/0x84
[ 63.686751] [c0103428] apic_timer_interrupt+0x28/0x30
[ 63.686759] [c0101355] default_idle+0x0/0x3f
[ 63.686767] [c0101385] default_idle+0x30/0x3f
[ 63.686773] [c0100c19] cpu_idle+0x5e/0x8e
[ 63.686779] [c03fdc5f] start_kernel+0x2d7/0x368

That means ?:)

> Ingo

Gabriel
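Ingo's point that the ne2k driver should survive a spurious interrupt, all the more so on a line shared with another card, can be illustrated with a small userspace model. Everything here is an assumption made for illustration: the `model_*` names, the single `isr` status byte and the two-card line are hypothetical, and this is neither the kernel's `request_irq()`/`irqreturn_t` machinery nor the 8390 driver's actual handler. It only shows the general rule that a handler on a shared line checks its own device's status first and reports "not mine" when nothing is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED codes.
 * All model_* names are hypothetical and exist only in this sketch. */
typedef enum { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 } model_irqreturn_t;

/* One card on a shared line. isr models the card's own interrupt status
 * register: zero means this card did not raise the interrupt. */
struct model_nic {
	uint8_t isr;
	unsigned handled;   /* invocations where the card had work pending */
	unsigned spurious;  /* invocations where it had nothing pending */
};

/* A handler that tolerates spurious calls: it checks its own status first
 * and returns "not mine" instead of assuming the interrupt was its own. */
model_irqreturn_t model_nic_interrupt(struct model_nic *nic)
{
	assert(nic != NULL);
	if (nic->isr == 0) {
		nic->spurious++;
		return MODEL_IRQ_NONE;
	}
	nic->isr = 0;       /* acknowledge everything that was pending */
	nic->handled++;
	return MODEL_IRQ_HANDLED;
}

/* One interrupt on a line shared by two cards: every handler on the line
 * is called, and the interrupt counts as handled if any handler claims it. */
bool model_shared_line_fire(struct model_nic *a, struct model_nic *b)
{
	model_irqreturn_t ra = model_nic_interrupt(a);
	model_irqreturn_t rb = model_nic_interrupt(b);

	return ra == MODEL_IRQ_HANDLED || rb == MODEL_IRQ_HANDLED;
}
```

With two cards on one line, a call that finds the ne2k model idle only bumps its `spurious` counter and returns `MODEL_IRQ_NONE`, while the other card's handler claims the interrupt, so the line as a whole is still handled and nothing breaks.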
Re: [patch] genirq: temporary fix for level-triggered IRQ resend
* Gabriel C [EMAIL PROTECTED] wrote:

> I get a warning on each boot now with this patch ..
>
> [ 63.686613] WARNING: at kernel/irq/resend.c:70 check_irq_resend()
> [ 63.686636] [c013c55c] check_irq_resend+0x8c/0xa0
> [ 63.686653] [c013c15f] enable_irq+0xad/0xb3
> [ 63.686662] [e886481e] vortex_timer+0x20c/0x3d5 [3c59x]
> [ 63.686675] [c01164b9] scheduler_tick+0x154/0x273
> [ 63.686685] [c012fed1] getnstimeofday+0x34/0xe3
> [ 63.686697] [c0121f4a] run_timer_softirq+0x137/0x197
> [ 63.686709] [e8864612] vortex_timer+0x0/0x3d5 [3c59x]
> [ 63.686720] [c011ed09] __do_softirq+0x75/0xe1
> [ 63.686729] [c011edac] do_softirq+0x37/0x3d
> [ 63.686735] [c011ef85] irq_exit+0x7c/0x7e
> [ 63.686740] [c010e013] smp_apic_timer_interrupt+0x59/0x84
> [ 63.686751] [c0103428] apic_timer_interrupt+0x28/0x30
> [ 63.686759] [c0101355] default_idle+0x0/0x3f
> [ 63.686767] [c0101385] default_idle+0x30/0x3f
> [ 63.686773] [c0100c19] cpu_idle+0x5e/0x8e
> [ 63.686779] [c03fdc5f] start_kernel+0x2d7/0x368
>
> That means ?:)

if your network still works fine then you can ignore it :-) we are still trying to figure out what happens with ne2k-pci. The message will vanish soon.

Ingo
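The trace above runs from the 3c59x `vortex_timer()` through `enable_irq()` into `check_irq_resend()`, where the temporary patch's `WARN_ON_ONCE()` fires. The sketch below is a simplified userspace model of that path, written for illustration and not copied from `kernel/irq/resend.c`: the flag names, the `model_*` functions and the exact bookkeeping are assumptions. It shows the delayed-disable idea (disabling only records state; an interrupt arriving meanwhile is marked pending and masked), a software replay at enable time for edge-triggered lines, and, for level-triggered lines, no replay plus a warning that is printed only the first time, which fits a message that appears once per boot and is otherwise harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical status bits; they only loosely mirror the genirq flags. */
#define M_IRQ_DISABLED 0x1u
#define M_IRQ_PENDING  0x2u
#define M_IRQ_LEVEL    0x4u

struct model_irq_desc {
	unsigned status;
	unsigned depth;     /* nested disable count */
	bool masked;        /* masked at the (modelled) irqchip */
	unsigned resends;   /* software replays performed */
	unsigned handled;   /* times the handler ran */
};

unsigned model_warnings;

/* Warn-once helper in the spirit of WARN_ON_ONCE(): report the first hit only. */
void model_warn_once(const char *where)
{
	static bool warned;

	if (!warned) {
		warned = true;
		model_warnings++;
		fprintf(stderr, "WARNING: at %s\n", where);
	}
}

/* Delayed disable: only record the state, the line stays unmasked. */
void model_disable_irq(struct model_irq_desc *d)
{
	if (d->depth++ == 0)
		d->status |= M_IRQ_DISABLED;
}

/* An interrupt arriving while logically disabled is not handled: it is
 * remembered as pending and the line is masked at the chip. */
void model_irq_arrives(struct model_irq_desc *d)
{
	if (d->status & M_IRQ_DISABLED) {
		d->status |= M_IRQ_PENDING;
		d->masked = true;
		return;
	}
	d->handled++;
}

/* Unmask, then replay a pending interrupt in software for edge lines only. */
void model_check_irq_resend(struct model_irq_desc *d)
{
	d->masked = false;
	if (d->status & M_IRQ_LEVEL) {
		/* A level line re-raises itself if still asserted: no replay. */
		model_warn_once("model_check_irq_resend()");
		d->status &= ~M_IRQ_PENDING;
		return;
	}
	if (d->status & M_IRQ_PENDING) {
		d->status &= ~M_IRQ_PENDING;
		d->resends++;
		d->handled++;   /* the replay runs the handler right at enable time */
	}
}

void model_enable_irq(struct model_irq_desc *d)
{
	assert(d->depth > 0);   /* unbalanced enable */
	if (--d->depth == 0) {
		d->status &= ~M_IRQ_DISABLED;
		model_check_irq_resend(d);
	}
}
```

Calling `model_disable_irq()`, `model_irq_arrives()` and then `model_enable_irq()` on an edge-type descriptor replays the pending interrupt exactly at the enable point; repeated enables on a level-type descriptor never replay and bump `model_warnings` only once.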
Re: [patch] genirq: temporary fix for level-triggered IRQ resend
* Ingo Molnar [EMAIL PROTECTED] wrote:

> Linus, with -rc2 approaching i think we should apply the minimal fix below to get Marcin's ne2k-pci networking back in working order. The WARN_ON_ONCE() will not prevent the system from working and it will be a reminder.

there's one more test-patch that Marcin has not tested yet (see below) - perhaps a POST artifact in ne2k could explain this bug.

Ingo

-

* Alan Cox [EMAIL PROTECTED] wrote:

> Ok the logic behind the 8390 is very simple:

thanks for the explanation Alan! A few comments and a question:

> Things to know
> - IRQ delivery is asynchronous to the PCI bus
> - Blocking the local CPU IRQ via spin locks was too slow
> - The chip has register windows needing locking work
>
> So the path was once (I say once as people appear to have changed it in the meantime and it now looks rather bogus if the changes to use disable_irq_nosync_irqsave are disabling the local IRQ)
>
> Take the page lock
> Mask the IRQ on chip
> Disable the IRQ (but not mask locally- someone seems to have broken this with the lock validator stuff)
> [This must be _nosync as the page lock may otherwise deadlock us]

(side-note: you can ignore the lock validator stuff here, the validator changes are supposed to be a NOP in the !lockdep case. Local irqs will only be disabled if the validator is running. This could cause dropped serial irqs on very old boxes but i doubt anyone will want to run the validator on those.)

> Drop the page lock and turn IRQs back on
>
> At this point an existing IRQ may still be running but we can't get a new one
>
> Take the lock (so we know the IRQ has terminated) but don't mask the IRQs on the processor
> Set irqlock [for debug]
> Transmit (slow as )
> re-enable the IRQ
>
> We have to use disable_irq because otherwise you will get delayed interrupts on the APIC bus deadlocking the transmit path.
>
> Quite hairy but the chip simply wasn't designed for SMP and you can't even ACK an interrupt without risking corrupting other parallel activities on the chip.
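Alan's transmit sequence can be walked through step by step in a single-threaded userspace model. This is a sketch under stated assumptions: a `bool` stands in for the `page_lock` spinlock, the other flags stand in for the chip's interrupt mask and the `disable_irq_nosync()` state, and `model_start_xmit()` and the other `model_*` names are hypothetical, not the lib8390 code. Its only purpose is to make the ordering visible and to assert the property in question: while the slow transmit runs, the handler cannot be entered, even though CPU interrupts stay enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: page_lock_held stands in for the page_lock spinlock,
 * the other flags for the chip's IMR and the disable_irq() state. */
struct model_8390 {
	bool page_lock_held;
	bool chip_irq_masked;  /* chip interrupt mask cleared */
	bool line_disabled;    /* disable_irq_nosync() in effect */
	bool irqlock;          /* debug flag: transmit in progress */
	unsigned packets_sent;
};

void model_lock(struct model_8390 *ei)
{
	assert(!ei->page_lock_held);   /* a real spinlock would spin here */
	ei->page_lock_held = true;
}

void model_unlock(struct model_8390 *ei)
{
	assert(ei->page_lock_held);
	ei->page_lock_held = false;
}

/* Would the interrupt handler be entered right now? Only if the line is
 * enabled, and then it must never find a transmit in progress. */
bool model_irq_may_run(const struct model_8390 *ei)
{
	if (ei->line_disabled)
		return false;
	assert(!ei->irqlock);
	return true;
}

void model_start_xmit(struct model_8390 *ei)
{
	model_lock(ei);                 /* take the page lock */
	ei->chip_irq_masked = true;     /* mask the IRQ on the chip */
	ei->line_disabled = true;       /* disable the line, _nosync: no waiting */
	model_unlock(ei);               /* drop the page lock, CPU IRQs stay on */

	/* (real driver) a handler already running elsewhere could finish here */
	model_lock(ei);                 /* re-take: any earlier handler is done */
	ei->irqlock = true;             /* [for debug] */
	assert(!model_irq_may_run(ei)); /* the slow transmit cannot be re-entered */
	ei->packets_sent++;             /* stands in for the transmit itself */
	ei->irqlock = false;
	ei->chip_irq_masked = false;    /* turn chip interrupts back on */
	model_unlock(ei);
	ei->line_disabled = false;      /* re-enable the IRQ */
}
```

The model has no concurrency, so it cannot reproduce the race the second lock acquisition closes (a handler still finishing on another CPU); it only checks that each step leaves the state consistent and that the handler is locked out for the whole transmit.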
So the whole locking is to be able to keep irqs enabled for a long time, without risking entry of the same IRQ handler on this same CPU, correct?

Marcin's test results suggest that if an IRQ is resent right at the enable_irq() point [be that via the hw irq-resend mechanism or the sw irq-resend mechanism], the hang happens. In the previous 2.6.20 logic we'd not normally generate an IRQ at that point (because we masked the irq and the card itself deasserts the line so any level-triggered irq is now moot). Once Thomas hacked off this resend mechanism for level-triggered irqs, Marcin saw the hangs go away.

So it seems to me that maybe the driver could be surprised by these spurious interrupts that happen right after the enable_irq(). Does the patch below make any sense in your opinion?

Ingo

Index: linux/drivers/net/lib8390.c
===================================================================
--- linux.orig/drivers/net/lib8390.c
+++ linux/drivers/net/lib8390.c
@@ -375,6 +375,8 @@ static int ei_start_xmit(struct sk_buff
 	/* Turn 8390 interrupts back on. */
 	ei_local->irqlock = 0;
 	ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
+	/* force POST: */
+	ei_inb_p(e8390_base + EN0_IMR);
 	spin_unlock(&ei_local->page_lock);
 	enable_irq_lockdep_irqrestore(dev->irq, &flags);
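The test patch adds an `ei_inb_p()` read right after the `ei_outb_p()` that turns the chip's interrupts back on, commented `force POST`. Presumably this refers to PCI posted writes: a write may still sit in a bridge after the CPU moves on, while a later read from the same device does not complete until the earlier posted writes have reached it. The deliberately simplified model below tracks a single register, with hypothetical `model_outb()`/`model_inb()` helpers and an invented buffer depth; it does not describe any real bridge, it only illustrates why a read-back makes the mask update visible to the card before `enable_irq()` runs.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_POST_DEPTH 8   /* invented buffer depth, for illustration only */

/* One device register (think of the IMR) behind a posted-write buffer. */
struct model_bus {
	uint8_t dev_reg;                   /* value the card has actually seen */
	uint8_t posted[MODEL_POST_DEPTH];  /* writes still in flight */
	unsigned nposted;
};

/* Deliver every queued write to the device, oldest first. */
void model_drain(struct model_bus *bus)
{
	for (unsigned i = 0; i < bus->nposted; i++)
		bus->dev_reg = bus->posted[i];
	bus->nposted = 0;
}

/* Posted write: queued, and the caller continues before the card sees it. */
void model_outb(struct model_bus *bus, uint8_t val)
{
	if (bus->nposted == MODEL_POST_DEPTH)
		model_drain(bus);              /* buffer full: push out what is queued */
	assert(bus->nposted < MODEL_POST_DEPTH);
	bus->posted[bus->nposted++] = val;
}

/* Read: cannot complete ahead of earlier posted writes, so it drains them. */
uint8_t model_inb(struct model_bus *bus)
{
	model_drain(bus);
	return bus->dev_reg;
}
```

After `model_outb()` writes a new mask value, the device-side register still holds the old one; a following `model_inb()` drains the queue and returns the new value, which is the ordering the `force POST` read is meant to guarantee.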