On Tue, May 13, 2014 at 11:08:27AM -0400, Alan Stern wrote:
> Please CC: your patches to the maintainer of the driver you are
> changing.
>
> On Tue, 13 May 2014, Dr. Werner Fink wrote:
>
> > Hi,
> >
> > this bug hits my system now a long time. I had found e.g. this
> >
> > speedy kernel: [ 9575.033019] irq 16: nobody cared (try booting with the
> > "irqpoll" option)
> > speedy kernel: [ 9575.033022] Pid: 0, comm: swapper/0 Not tainted
> > 3.7.10-1.1-desktop #1
>
> The 3.7 kernel is fairly old. It's entirely possible that the problem
> has already been fixed in 3.14.
The patch I attached is against 3.14, and AFAICS the problem is likely not
fixed there. In the past I have reported this problem more than once and always
got the same answer: that the new kernel would not show it.
> > speedy kernel: [ 9575.033023] Call Trace:
> > speedy kernel: [ 9575.033031] [<ffffffff81004818>] dump_trace+0x88/0x300
> > speedy kernel: [ 9575.033035] [<ffffffff8158b033>] dump_stack+0x69/0x6f
> > speedy kernel: [ 9575.033038] [<ffffffff810d6c56>]
> > __report_bad_irq+0x36/0xe0
> > speedy kernel: [ 9575.033041] [<ffffffff810d7158>]
> > note_interrupt+0x1e8/0x240
> > speedy kernel: [ 9575.033045] [<ffffffff810d4772>]
> > handle_irq_event_percpu+0xc2/0x250
> > speedy kernel: [ 9575.033047] [<ffffffff810d4947>]
> > handle_irq_event+0x47/0x70
> > speedy kernel: [ 9575.033049] [<ffffffff810d7c50>]
> > handle_fasteoi_irq+0x60/0x100
> > speedy kernel: [ 9575.033051] [<ffffffff810046c8>] handle_irq+0x18/0x30
> > speedy kernel: [ 9575.033053] [<ffffffff810043a2>] do_IRQ+0x52/0xd0
> > speedy kernel: [ 9575.033056] [<ffffffff8159806d>]
> > common_interrupt+0x6d/0x6d
> > speedy kernel: [ 9575.033061] [<ffffffff8132018c>] intel_idle+0xec/0x160
> > speedy kernel: [ 9575.033064] [<ffffffff81452e0d>]
> > cpuidle_idle_call+0x9d/0x330
> > speedy kernel: [ 9575.033067] [<ffffffff8100be0a>] cpu_idle+0x6a/0xe0
> > speedy kernel: [ 9575.033071] [<ffffffff81ac8bc8>]
> > start_kernel+0x3b8/0x3c3
> > speedy kernel: [ 9575.033073] [<ffffffff81ac8436>]
> > x86_64_start_kernel+0x105/0x114
> > speedy kernel: [ 9575.033075] handlers:
> > speedy kernel: [ 9575.033077] [<ffffffff813f2220>] usb_hcd_irq
> > speedy kernel: [ 9575.033080] [<ffffffffa0282940>] rtl8139_interrupt
> > [8139too]
> > speedy kernel: [ 9575.033080] Disabling IRQ #16
> >
> > IRQ 16 is used by ehci_hcd:usb1 and eth1.
>
> How do you know that the problem was caused by ehci-hcd rather than
> 8139too? Or by some other piece of hardware entirely?
I've also seen this with another Ethernet card. And the status bit that is
set is always one that belongs to the USB controller.
> > Adding the "irqpoll" option to the kernels
> > command line had not helped. Therefore I had debugged this problem by
> > adding a printk()
> > debug line in the ehci_irq() function of drivers/usb/host/ehci-hcd.c. This
> > had shown
> > out that my USB controller causes STS_RECL (reclamation readonly status
> > bit) in the
> > IRQ status.
>
> What makes you think that STS_RECL is the cause of the problem? It is
> quite normal for STS_RECL to be set.
As described: the printk() shows exactly this bit.
> > After a while this had lead to the message in the subject with the side
> > effect that
> > networking becomes slow.
>
> How do you know that something else didn't cause the "nobody cared"
> error?
Yes.
> > From the debugging code I've evolved the attached patch. It is not perfect
> > as it
> > returns IRQ_NONE for the first time the STS_RECL status bit is found but it
> > does
> > its job.
>
> Please put your patches in the main email message; don't attach them.
> Now there's no easy way for me to include it in this reply.
>
> The patch is definitely wrong. It will never set spurious_recl,
> because the "if (unlikely(masked_status & STS_RECL))" test can't
> succeed unless spurious_recl has already been set.
OK ... I changed the patch because I had been told to do it this way.
In my original code I simply used
	masked_status = status & (INTR_MASK | STS_FLR | STS_RECL);

	/* Shared IRQ? */
	if (!masked_status || unlikely(ehci->rh_state == EHCI_RH_HALTED)) {
		spin_unlock_irqrestore(&ehci->lock, flags);
		printk("ehci_irq status: %#8.8x", status);
		return IRQ_NONE;
	}
and with this I can use my Ethernet card for more than 15 minutes. I added
this printk() line only after I had put some printk() lines into the Ethernet
driver to see what was wrong with the shared IRQ. Then I identified STS_RECL
from the printk() output above in my logs and OR'd STS_RECL into the masked
status bits. After this, all the problems were gone.
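
To make the effect of the mask concrete, here is a minimal user-space sketch
of the check above. It is not the driver code. The bit values are the ones I
believe include/linux/usb/ehci_def.h and ehci-hcd.c use, so treat them as an
assumption and verify them against your tree. The names model_ehci_irq(),
model_self_check() and enum model_irqreturn are hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* USBSTS bits; values assumed from ehci_def.h, please verify. */
#define STS_INT   (1u << 0)   /* transfer completion interrupt */
#define STS_ERR   (1u << 1)   /* transaction error */
#define STS_PCD   (1u << 2)   /* port change detect */
#define STS_FLR   (1u << 3)   /* frame list rollover */
#define STS_FATAL (1u << 4)   /* host system error */
#define STS_IAA   (1u << 5)   /* interrupt on async advance */
#define STS_RECL  (1u << 13)  /* reclamation (read-only status) */

/* Assumed to match the INTR_MASK definition in ehci-hcd.c. */
#define INTR_MASK (STS_IAA | STS_FATAL | STS_PCD | STS_ERR | STS_INT)

/* Hypothetical stand-in for the kernel's irqreturn_t. */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/*
 * Models only the "Shared IRQ?" decision: if no bit of interest is set
 * in the status register (or the root hub is halted), the handler says
 * the interrupt was not ours.
 */
enum model_irqreturn model_ehci_irq(uint32_t status, uint32_t mask, bool halted)
{
	uint32_t masked_status = status & mask;

	if (!masked_status || halted)
		return MODEL_IRQ_NONE;
	return MODEL_IRQ_HANDLED;
}

/* Small self-check of the two masks discussed in this thread. */
void model_self_check(void)
{
	uint32_t stock_mask = INTR_MASK | STS_FLR;
	uint32_t recl_mask  = INTR_MASK | STS_FLR | STS_RECL;

	/* Only STS_RECL set: the stock mask reports "not ours". */
	assert(model_ehci_irq(STS_RECL, stock_mask, false) == MODEL_IRQ_NONE);
	/* Same status with STS_RECL in the mask: the IRQ is claimed. */
	assert(model_ehci_irq(STS_RECL, recl_mask, false) == MODEL_IRQ_HANDLED);
}
```

With the stock mask, a status word that carries only STS_RECL yields IRQ_NONE.
If the other handler on the shared line also returns IRQ_NONE, the interrupt
counts as unhandled, which is what the "nobody cared" report is about. The
sketch shows only why the return value changes with the mask; it does not
show that STS_RECL is the real source of the interrupt.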
>
> Alan Stern
Werner
--
"Having a smoking section in a restaurant is like having
a peeing section in a swimming pool." -- Edward Burr
