On 11.08.2017 16:11, Bodireddy, Bhanuprakash wrote:
>> On 09.08.2017 15:35, Bodireddy, Bhanuprakash wrote:
>>>>>
>>>>> +static int
>>>>> +netdev_dpdk_vhost_tx_burst(struct netdev_dpdk *dev, int qid)
>>>>> +{
>>>>> +    struct dpdk_tx_queue *txq = &dev->tx_q[qid];
>>>>> +    struct rte_mbuf **cur_pkts = (struct rte_mbuf **)txq->vhost_burst_pkts;
>>>>> +
>>>>> +    int tx_vid = netdev_dpdk_get_vid(dev);
>>>>> +    int tx_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
>>>>> +    uint32_t sent = 0;
>>>>> +    uint32_t retries = 0;
>>>>> +    uint32_t sum, total_pkts;
>>>>> +
>>>>> +    total_pkts = sum = txq->vhost_pkt_cnt;
>>>>> +    do {
>>>>> +        uint32_t ret;
>>>>> +        ret = rte_vhost_enqueue_burst(tx_vid, tx_qid, &cur_pkts[sent], sum);
>>>>> +        if (OVS_UNLIKELY(!ret)) {
>>>>> +            /* No packets enqueued - do not retry. */
>>>>> +            break;
>>>>> +        } else {
>>>>> +            /* Packets have been sent. */
>>>>> +            sent += ret;
>>>>> +
>>>>> +            /* 'sum' packets have to be retransmitted. */
>>>>> +            sum -= ret;
>>>>> +        }
>>>>> +    } while (sum && (retries++ < VHOST_ENQ_RETRY_NUM));
>>>>> +
>>>>> +    for (int i = 0; i < total_pkts; i++) {
>>>>> +        dp_packet_delete(txq->vhost_burst_pkts[i]);
>>>>> +    }
>>>>> +
>>>>> +    /* Reset pkt count. */
>>>>> +    txq->vhost_pkt_cnt = 0;
>>>>> +
>>>>> +    /* 'sum' refers to packets dropped. */
>>>>> +    return sum;
>>>>> +}
>>>>> +
>>>>> +/* Flush the txq if there are any packets available. */
>>>>> +static int
>>>>> +netdev_dpdk_vhost_txq_flush(struct netdev *netdev, int qid,
>>>>> +                            bool concurrent_txq OVS_UNUSED)
>>>>> +{
>>>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>>> +    struct dpdk_tx_queue *txq;
>>>>> +
>>>>> +    qid = dev->tx_q[qid % netdev->n_txq].map;
>>>>> +
>>>>> +    /* The qid may be disabled in the guest and has been set to
>>>>> +     * OVS_VHOST_QUEUE_DISABLED.
>>>>> +     */
>>>>> +    if (OVS_UNLIKELY(qid < 0)) {
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    txq = &dev->tx_q[qid];
>>>>> +    /* Increment the drop count and free the memory.
>>>>> +     */
>>>>> +    if (OVS_UNLIKELY(!is_vhost_running(dev) ||
>>>>> +                     !(dev->flags & NETDEV_UP))) {
>>>>> +
>>>>> +        if (txq->vhost_pkt_cnt) {
>>>>> +            rte_spinlock_lock(&dev->stats_lock);
>>>>> +            dev->stats.tx_dropped += txq->vhost_pkt_cnt;
>>>>> +            rte_spinlock_unlock(&dev->stats_lock);
>>>>> +
>>>>> +            for (int i = 0; i < txq->vhost_pkt_cnt; i++) {
>>>>> +                dp_packet_delete(txq->vhost_burst_pkts[i]);
>>>>
>>>> Spinlock (tx_lock) must be held here to avoid queue and mempool breakage.
>>>
>>> I think you are right. tx_lock might have to be acquired for freeing the packets.
>>
>> I think that 'vhost_pkt_cnt' reads and updates also should be protected to
>> avoid races.
>
> From the discussion in the thread
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337133.html,
> we are going to acquire tx_lock for updating the map and flushing the queue
> inside vring_state_changed().
>
> That triggers a deadlock in the flushing function, as we have already
> acquired the same lock in netdev_dpdk_vhost_txq_flush().
> The same problem applies to freeing the memory and protecting the updates to
> vhost_pkt_cnt.
>
>     if (OVS_LIKELY(txq->vhost_pkt_cnt)) {
>         rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
>         netdev_dpdk_vhost_tx_burst(dev, qid);
>         rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
>     }
>
> As the problem is triggered when the guest queues are enabled/disabled, with
> a small race window where packets can get enqueued into the queue just after
> the flush and before the map value is updated in the callback function
> (vring_state_changed()), how about this?
>
> Technically, as the queues are disabled, there is no point in flushing the
> packets, so let's free the packets and set txq->vhost_pkt_cnt in
> vring_state_changed() itself instead of calling flush().
Technically, the enabling case also should be handled, because while enabling
we're remapping the queue and, in some specific cases, I guess, the old queue
may not be used after remapping by the threads.

> vring_state_changed():
> ------------------------------------------------------
>     rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
>
>     mapped_qid = dev->tx_q[qid].map;
>     if (OVS_UNLIKELY(qid != mapped_qid)) {
>         rte_spinlock_lock(&dev->tx_q[mapped_qid].tx_lock);
>     }
>
>     if (enable) {
>         dev->tx_q[qid].map = qid;
>     } else {
>         struct dpdk_tx_queue *txq = &dev->tx_q[qid];
>         if (txq->vhost_pkt_cnt) {
>             rte_spinlock_lock(&dev->stats_lock);
>             dev->stats.tx_dropped += txq->vhost_pkt_cnt;
>             rte_spinlock_unlock(&dev->stats_lock);
>
>             for (int i = 0; i < txq->vhost_pkt_cnt; i++) {
>                 dp_packet_delete(txq->vhost_burst_pkts[i]);
>             }
>             txq->vhost_pkt_cnt = 0;
>         }
>
>         dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED;
>     }
> -------------------------------------------------------------------------
>
> Regards,
> Bhanuprakash.
>
>>
>>> ---------------------------------------------------------------------------
>>>     rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
>>>     for (int i = 0; i < txq->vhost_pkt_cnt; i++) {
>>>         dp_packet_delete(txq->vhost_burst_pkts[i]);
>>>     }
>>>     rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
>>>
>>> - Bhanuprakash

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev