Hi, Sorry for late response, I was trying to find some h/w so as to retest this before responding. (Which I wasn't able to find, and so haven't retested yet).
On Thu, Jan 19, 2006 at 05:23:07AM +0100, Andi Kleen was heard to remark: > > > > > +#ifdef XXX_CONFIG_IXGB_EEH_RECOVERY > > + if(unlikely(icr==EEH_IO_ERROR_VALUE(4))) { > > + if (eeh_slot_is_isolated (adapter->pdev)) > > + // disable_irq_nosync (adapter->pdev->irq); > > + return IRQ_NONE; /* Not our interrupt */ > > > So does the return belong below the first or the second if()? It certainly > looks weird. This is left-over crud (its not compiled) that I should have removed from the patch. I had left it in as a "note to self" because, at one point, it cured a hang. Background: (I think you get this but let me repeat): A variety of device drivers have a "design flaw" in that they will poll an interrupt status register in an infinite loop in the interrupt handler, waiting for a certain value to be set/cleared. This would be a deadly behaviour for any PCMCIA card that might get yanked at any time. Similarly, its a problem for PCI error recovery: when the PCI device is "frozen", all further reads return 0xffffff, and so the interrupt status won't ever change. With just the right timing, one hangs in a spinloop with interrupts disabled. :( > And returning IRQ_NONE looks wrong too - if means if > the hardware is broken and there is nobody else on the same interrupt > then the kernel will complain about buggy drivers, which is not true > here. Probably it needs a new IRQ_ERR return value or somesuch that > stops the complaining and acts otherwise like IRQ_NONE. Ah!! I was struggling with trying to figure out the "right" way of dealing with this case. I kept at it with various hacky attempts, none terribly satisfying. I'll take a look at the "new return code" idea. --linas - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html