I encountered an issue with DPDK 2.1.0 which occasionally causes the link
status interrupt callback not to be called after the interface is started for
the first time. I traced the problem back to the function
eth_igb_link_update(), which is used to determine if the link has changed state
since the previous time it was called. It appears that this function can be
called simultaneously from two different threads:
(1) From the main application/configuration thread, via rte_eth_dev_start() -
pointed to by (*dev->dev_ops->link_update)
(2) From the eal interrupt thread, via eth_igb_interrupt_action(), to check if
the link state has transitioned up or down. The user callback is only executed
if the link has changed state.
The race condition manifests itself as follows:
- Main thread configures the interface with link status interrupt (LSI)
enabled, sets up the queues etc.
- Main thread calls rte_eth_dev_start. The interface is started and then we
call eth_igb_link_update()
- While in this call, the link goes up. Accordingly, we detect the
transition, and write the new link state (up) into the global rte_eth_dev struct
- The interrupt fires, which also drops into the eth_igb_link_update function,
finds that the global link status has already been set to up (no change)
- Therefore, the handler thinks the interrupt was spurious, and the callback
doesn't get called.
I suspect that rte_eth_dev_start shouldn't be checking the link state if
interrupts are enabled. Would someone mind taking a quick look at the patch
below?
Thanks!
Tim
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1300,7 +1300,7 @@ rte_eth_dev_start(uint8_t port_id)
rte_eth_dev_config_restore(port_id);
- if (dev->data->dev_conf.intr_conf.lsc != 0) {
+ if (dev->data->dev_conf.intr_conf.lsc == 0) {
FUNC_PTR_OR_ERR_RET(*dev->dev_ops->link_update, -ENOTSUP);
(*dev->dev_ops->link_update)(dev, 0);
}