Hi all,

I've been investigating a segmentation fault caused by incorrect TX queue setup in netdev-dpdk. It occurs in the following scenario.
I am running OVS with DPDK on a system with 72 cores (hyper-threading enabled) and an Intel XL710 network card. The default behavior in OVS when adding a DPDK physical port is to attempt to set up one TX queue for each core detected on the system, plus one more queue for non-PMD threads. In this case 73 TX queues are requested in total.

The standard behavior when initializing a DPDK port is to check the number of queues being requested against the maximum number of queues available on the device itself. This is done in dpdk_eth_dev_init() with the following code segment:

    ...
    rte_eth_dev_info_get(dev->port_id, &info);
    dev->up.n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
    dev->real_n_txq = MIN(info.max_tx_queues, dev->up.n_txq);

    diag = rte_eth_dev_configure(dev->port_id, dev->up.n_rxq, dev->real_n_txq,
                                 &port_conf);
    ...

The smaller of the two values is selected as the real number of TX queues that can be set up. This accommodates the situation where the system has more cores than the DPDK network device has TX queues.

This has worked fine with the previous generation of Intel interfaces such as the Intel 82599, but it does not work with the XL710. In DPDK the XL710 has a total of 316 TX queues that can be used. From the check above we would think we can allocate 73 of these without issue, but the 316 available queues are subdivided between different queue types. For a host DPDK application (in this case OVS) queues 1-64 inclusive can be used, while queues 65-96 are strictly for SRIOV TX queue use. The max_tx_queues check above identifies the total number of queues available (316), compares it to the number of queues being requested (73) and selects 73 as real_n_txq. But this is not the number of TX queues actually usable by OVS (64).

We can cause the switch to segfault by doing the following.

Add a DPDK physical port:

    sudo $OVS_DIR/utilities/ovs-vsctl add-br br0 -- set Bridge br0 datapath_type=netdev
    sudo $OVS_DIR/utilities/ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk

This outputs the following warning:

    ovs-vsctl: Error detected while setting up 'dpdk0'.  See ovs-vswitchd log for details.

Looking at the log we see:

    PMD: i40e_dev_tx_queue_setup(): Using simple tx path
    PMD: i40e_pf_get_vsi_by_qindex(): queue_idx out of range. VMDQ configured?
    2015-07-15T01:22:48Z|00019|dpdk|ERR|eth dev tx queue setup error -5
    2015-07-15T01:22:48Z|00020|dpif_netdev|ERR|dpdk0, cannot set multiq
    2015-07-15T01:22:48Z|00021|dpif|WARN|netdev@ovs-netdev: failed to add dpdk0 as port: Resource temporarily unavailable

This is as expected. The error is reported from dpdk_eth_dev_init() by the following code segment when it attempts to initialize the 65th queue:

    for (i = 0; i < dev->real_n_txq; i++) {
        diag = rte_eth_tx_queue_setup(dev->port_id, i, NIC_PORT_TX_Q_SIZE,
                                      dev->socket_id, NULL);
        if (diag) {
            VLOG_ERR("eth dev tx queue setup error %d", diag);
            return -diag;
        }
    }

Then add an internal port to the same bridge:

    sudo $OVS_DIR/utilities/ovs-vsctl add-port br0 testif1 -- set interface testif1 type=internal

I was surprised to see that after adding the internal port, the DPDK port that failed previously is now added as well. Is this expected behavior? Looking at the ovs-vswitchd log I can see that both the internal port and the DPDK port have port IDs now:
    2015-07-15T01:23:19Z|00024|bridge|INFO|bridge br0: added interface testif1 on port 1
    2015-07-15T01:23:19Z|00025|dpif_netdev|INFO|Created 1 pmd threads on numa node 0
    2015-07-15T01:23:19Z|00001|dpif_netdev(pmd40)|INFO|Core 0 processing port 'dpdk0'
    2015-07-15T01:23:19Z|00002|dpif_netdev(pmd40)|INFO|Core 0 processing port 'dpdk0'
    2015-07-15T01:23:19Z|00026|bridge|INFO|bridge br0: added interface dpdk0 on port 2
    2015-07-15T01:23:19Z|00027|bridge|INFO|bridge br0: using datapath ID 00006805ca2d3cb8

If we assign an IP address to the internal port we segfault the vswitch:

    sudo ip addr add 192.168.1.1/24 dev testif1

This is caused by the internal interface broadcasting an ICMPv6 neighbor solicitation message. The packet is copied from kernel-space memory to DPDK memory in the netdev_dpdk_send__() function. The issue is that the qid passed to netdev_dpdk_send__() is 72, so the packet is eventually transmitted with rte_eth_tx_burst() on TX queue 72. In DPDK, queue 72 on the XL710 is for SRIOV use only, so it was never initialized during the rte_eth_tx_queue_setup() loop above, and the switch segfaults when an attempt is made to access it.

In terms of a solution I would appreciate some feedback on what people think is the best approach.

Ideally DPDK could extend the number of sequential queues available to host DPDK applications. Previous-generation cards supported 128 TX queues usable by a host application, which is why this issue is not seen with them. This would not fix the immediate issue, though, and is more of a long-term solution; in the meantime it could be flagged in the documentation as a known issue/corner case that is not supported.

Alternatively, OVS could attempt to set up as many queues as possible on the DPDK device itself. If an error is detected, the appropriate fields such as dev->real_n_txq would have to be updated. In this case we would set up 64 of the requested 73 queues and log a warning to the user (a rough sketch of what I mean is at the end of this mail). However, there may be issues with how the PMD threads map to the correct TX queue IDs: when netdev_dpdk_send__() is called the qid is 72, and this value comes from dp_execute_cb(), where the tx_qid is taken from the dp_netdev_pmd_thread (a second sketch at the end of this mail shows a possible stop-gap for that part).

Any feedback would be appreciated.

Thanks,
Ian
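To make the "set up as many queues as possible" option a bit more concrete, here is a rough, untested sketch of what I have in mind for the TX queue setup in dpdk_eth_dev_init(). The helper name is made up; dev->real_n_txq, dev->up.n_rxq, NIC_PORT_TX_Q_SIZE and port_conf are the names from the snippets above, and RX queue setup (which would also have to be redone after each rte_eth_dev_configure() call) is omitted to keep it short:

    /* Sketch only: retry with fewer TX queues when rte_eth_tx_queue_setup()
     * fails, and record how many queues were really set up. */
    static int
    dpdk_eth_tx_queue_setup_all(struct netdev_dpdk *dev)
    {
        int n_txq = dev->real_n_txq;

        while (n_txq > 0) {
            int diag;
            int i;

            diag = rte_eth_dev_configure(dev->port_id, dev->up.n_rxq, n_txq,
                                         &port_conf);
            if (diag) {
                VLOG_ERR("eth dev configure error %d", diag);
                return -diag;
            }

            for (i = 0; i < n_txq; i++) {
                diag = rte_eth_tx_queue_setup(dev->port_id, i,
                                              NIC_PORT_TX_Q_SIZE,
                                              dev->socket_id, NULL);
                if (diag) {
                    break;
                }
            }

            if (i == n_txq) {
                /* All requested queues came up.  Record the real count so
                 * the rest of netdev-dpdk only hands out valid qids. */
                if (n_txq != dev->real_n_txq) {
                    VLOG_WARN("port %d: only %d of %d tx queues could be set up",
                              dev->port_id, n_txq, dev->real_n_txq);
                    dev->real_n_txq = n_txq;
                }
                return 0;
            }

            /* Queue i failed (on the XL710 this would be the 65th queue);
             * retry with only the queues that did succeed. */
            n_txq = i;
        }

        VLOG_ERR("port %d: could not set up any tx queue", dev->port_id);
        return -1;
    }

On the XL710 this would end up with real_n_txq == 64 after one retry, at the cost of an extra rte_eth_dev_configure() call during port initialization.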
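On the qid mapping question, one possible stop-gap (not a real fix) would be to fold the qid back into the configured range before it reaches rte_eth_tx_burst(). Again this is only a sketch with a made-up helper name, and it assumes real_n_txq reflects the number of queues that were actually set up, as in the sketch above:

    /* Sketch only: keep a tx qid inside the range of queues that were
     * actually set up before it is used in the send path, e.g. from
     * netdev_dpdk_send__().  With 73 threads and 64 queues this means
     * several threads share a TX queue, so the per-queue locking and
     * flushing in netdev-dpdk would have to cope with that. */
    static inline int
    netdev_dpdk_txq_clamp(const struct netdev_dpdk *dev, int qid)
    {
        if (OVS_UNLIKELY(dev->real_n_txq > 0 && qid >= dev->real_n_txq)) {
            qid %= dev->real_n_txq;   /* e.g. the non-PMD qid 72 becomes 8 */
        }
        return qid;
    }

A proper fix would probably need an explicit mapping from PMD/non-PMD thread to TX queue rather than a modulo like this, but it illustrates where the out-of-range qid would have to be caught.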