Hello,

I posted the RFC below to the linux-can ML, but it might actually be more appropriate for netdev, since it involves NAPI. The ti_hecc driver lies a bit about the amount of work done in its poll function. Changing that (always looking at the work done instead of the hardware state) seems to solve the issue, but I don't clearly understand why that is, or why the original behaviour takes down the ethernet stack as well.
If someone has suggestions or ideas on what to check, please let me know.

Regards,
Jeroen

The mail sent to linux-can:

While updating to Linux 4.9.93, I ran into several problems with the ti_hecc driver. One of them is that under high load on both the CAN bus and ethernet, the ethernet connection stalls completely and does not recover. The patch below seems to fix that (the idea being to read as much as we can and always re-enable interrupts; it will poll again thereafter).

It would be appreciated if someone more familiar with this code could comment on it.

Thanks in advance,
Jeroen

--- a/drivers/net/can/ti_hecc.c
+++ b/drivers/net/can/ti_hecc.c
@@ -625,20 +625,16 @@ static int ti_hecc_rx_poll(struct napi_struct *napi, int quota)
 			spin_unlock_irqrestore(&priv->mbx_lock, flags);
 		} else if (priv->rx_next == HECC_MAX_TX_MBOX - 1) {
 			priv->rx_next = HECC_RX_FIRST_MBOX;
-			break;
 		}
 	}
 
 	/* Enable packet interrupt if all pkts are handled */
-	if (hecc_read(priv, HECC_CANRMP) == 0) {
+	if (num_pkts < quota) {
 		napi_complete(napi);
 		/* Re-enable RX mailbox interrupts */
 		mbx_mask = hecc_read(priv, HECC_CANMIM);
 		mbx_mask |= HECC_TX_MBOX_MASK;
 		hecc_write(priv, HECC_CANMIM, mbx_mask);
-	} else {
-		/* repoll is done only if whole budget is used */
-		num_pkts = quota;
 	}
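
For reference, the accounting rule the patch switches to can be modelled in userspace. This is only a sketch, not kernel code: struct fake_dev, fake_rx_poll() and poll_until_complete() are hypothetical names, and the two flags merely stand in for napi_complete() and the HECC_CANMIM write. It shows a poll routine that reports the real amount of work done and completes only when it used less than its quota; it does not try to explain the ethernet stall.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the controller: only a pending-frame counter. */
struct fake_dev {
	int pending;      /* frames waiting in the "hardware" mailboxes */
	bool irq_enabled; /* models the RX mailbox interrupt mask */
	bool scheduled;   /* true while the poller still owns the device */
};

/* Handle at most quota frames and return the number actually handled. */
static int fake_rx_poll(struct fake_dev *dev, int quota)
{
	int num_pkts = 0;

	while (num_pkts < quota && dev->pending > 0) {
		dev->pending--;
		num_pkts++;
	}

	/* Complete only if the budget was not used up, as in the patch. */
	if (num_pkts < quota) {
		dev->scheduled = false;  /* stands in for napi_complete() */
		dev->irq_enabled = true; /* stands in for re-enabling interrupts */
	}

	return num_pkts;
}

/* Call the poll routine until it completes; return the number of calls. */
static int poll_until_complete(struct fake_dev *dev, int quota)
{
	int polls = 0;

	assert(quota > 0);
	while (dev->scheduled) {
		fake_rx_poll(dev, quota);
		polls++;
	}

	return polls;
}
```

With 10 pending frames and a quota of 4, the model is polled three times (4, 4 and 2 frames) and re-enables the interrupt only on the last call, which is the only one that returns less than the quota.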