On Fri, Jan 13, 2023 at 1:58 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 1/12/23 21:55, David Marchand wrote:
> > As Ilya reported, we have a ABBA deadlock between DPDK vq->access_lock
> > and OVS dev->mutex when OVS main thread refreshes statistics, while a
> > vring state change event is being processed for a same vhost port.
> >
> > To break from this situation, move vring state change notifications
> > handling from the vhost-events DPDK thread, back to the OVS main thread
> > using a lockless queue.
> >
> > Reported-at:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401101.html
> > Fixes: 3b29286db1c5 ("netdev-dpdk: Add per virtqueue statistics.")
> > Signed-off-by: David Marchand <david.march...@redhat.com>
> > ---
> >  lib/netdev-dpdk.c | 67 +++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 53 insertions(+), 14 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 5e2d64651d..7946e7f2df 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -49,6 +49,7 @@
> >  #include "dpif-netdev.h"
> >  #include "fatal-signal.h"
> >  #include "if-notifier.h"
> > +#include "mpsc-queue.h"
> >  #include "netdev-provider.h"
> >  #include "netdev-vport.h"
> >  #include "odp-util.h"
> > @@ -4255,30 +4256,37 @@ destroy_device(int vid)
> >      }
> >  }
> >
> > -static int
> > -vring_state_changed(int vid, uint16_t queue_id, int enable)
> > +static struct mpsc_queue vhost_state_change_queue
> > +    = MPSC_QUEUE_INITIALIZER(&vhost_state_change_queue);
> > +
> > +struct vhost_state_change {
> > +    struct mpsc_queue_node node;
> > +    char ifname[IF_NAME_SZ];
> > +    uint16_t queue_id;
> > +    int enable;
> > +};
> > +
> > +static void
> > +vring_state_changed__(struct vhost_state_change *sc)
> >  {
> >      struct netdev_dpdk *dev;
> >      bool exists = false;
> > -    int qid = queue_id / VIRTIO_QNUM;
> > -    bool is_rx = (queue_id % VIRTIO_QNUM) == VIRTIO_TXQ;
> > -    char ifname[IF_NAME_SZ];
> > -
> > -    rte_vhost_get_ifname(vid, ifname, sizeof ifname);
> > +    int qid = sc->queue_id / VIRTIO_QNUM;
> > +    bool is_rx = (sc->queue_id % VIRTIO_QNUM) == VIRTIO_TXQ;
> >
> >      ovs_mutex_lock(&dpdk_mutex);
> >      LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> >          ovs_mutex_lock(&dev->mutex);
> > -        if (nullable_string_is_equal(ifname, dev->vhost_id)) {
> > +        if (nullable_string_is_equal(sc->ifname, dev->vhost_id)) {
> >              if (is_rx) {
> >                  bool old_state = dev->vhost_rxq_enabled[qid];
> >
> > -                dev->vhost_rxq_enabled[qid] = enable != 0;
> > +                dev->vhost_rxq_enabled[qid] = sc->enable != 0;
> >                  if (old_state != dev->vhost_rxq_enabled[qid]) {
> >                      netdev_change_seq_changed(&dev->up);
> >                  }
> >              } else {
> > -                if (enable) {
> > +                if (sc->enable) {
> >                      dev->tx_q[qid].map = qid;
> >                  } else {
> >                      dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED;
> > @@ -4295,11 +4303,41 @@ vring_state_changed(int vid, uint16_t queue_id, int
> > enable)
> >
> >      if (exists) {
> >          VLOG_INFO("State of queue %d ( %s_qid %d ) of vhost device '%s' "
> > -                  "changed to \'%s\'", queue_id, is_rx == true ? "rx" :
> > "tx",
> > -                  qid, ifname, (enable == 1) ? "enabled" : "disabled");
> > +                  "changed to \'%s\'", sc->queue_id, is_rx ? "rx" : "tx",
> > +                  qid, sc->ifname, sc->enable == 1 ? "enabled" :
> > "disabled");
> >      } else {
> > -        VLOG_INFO("vHost Device '%s' not found", ifname);
> > -        return -1;
> > +        VLOG_INFO("vHost Device '%s' not found", sc->ifname);
> > +    }
> > +}
> > +
> > +static void
> > +netdev_dpdk_vhost_run(const struct netdev_class *netdev_class OVS_UNUSED)
> > +{
> > +    struct mpsc_queue_node *node;
> > +
> > +    mpsc_queue_acquire(&vhost_state_change_queue);
> > +    MPSC_QUEUE_FOR_EACH_POP (node, &vhost_state_change_queue) {
> > +        struct vhost_state_change *sc;
> > +
> > +        sc = CONTAINER_OF(node, struct vhost_state_change, node);
> > +        vring_state_changed__(sc);
> > +        free(sc);
> > +    }
> > +    mpsc_queue_release(&vhost_state_change_queue);
> > +}
> > +
> > +static int
> > +vring_state_changed(int vid, uint16_t queue_id, int enable)
> > +{
> > +    struct vhost_state_change *sc;
> > +
> > +    sc = xmalloc(sizeof *sc);
> > +    if (!rte_vhost_get_ifname(vid, sc->ifname, sizeof sc->ifname)) {
> > +        sc->queue_id = queue_id;
> > +        sc->enable = enable;
> > +        mpsc_queue_insert(&vhost_state_change_queue, &sc->node);
> > +    } else {
> > +        free(sc);
> >      }
> >
> >      return 0;
> > @@ -5766,6 +5804,7 @@ static const struct netdev_class dpdk_vhost_class = {
> >  static const struct netdev_class dpdk_vhost_client_class = {
> >      .type = "dpdkvhostuserclient",
> >      NETDEV_DPDK_CLASS_COMMON,
> > +    .run = netdev_dpdk_vhost_run,
>
> This part is also needed for the 'server' type.
>
> However, there is no guarantee that run() will be called in any reasonable
> time, because we don't have a wait() method that would schedule a wake up
> for the main thread. The main thread wakes up for other reasons like
> stats update, but that happens not that frequently and we can't rely on that
> anyway. For example, default stats update interval is 5 seconds.
>
> The problem is that we can't really iterate over devices and take dev->mutex
> in the context of a callback. Meaning we can't really implement a good
> wait() method that would trigger some seqno change that would wake up the
> main thread. We could trigger a global connectivity seqno, I guess. Not
> sure how safe it is, since sequence numbers use internal mutex.
>
> OTOH, maybe we can employ dpdk-watchdog thread to perform vhost housekeeping?
> Currently it's doing a very similar thing for physical ports, i.e. updates
> the link state periodically. It's sleep interval is a bit large though,
> but maybe we could tweak that, or update phy link states with the same
> interval as it is now, but check the vhost event queue a bit more frequently.
> Maybe even have something like a backoff in dp_netdev_flow_offload_main().
Having a dedicated thread seems safer, for the same reason Maxime mentioned
for the OVS main thread: vhost events may be triggered by a malicious guest,
and OVS would then be blind to some phy link updates.

Another thought on my patch... is it safe to have an unbounded queue with
regard to memory consumption? Again, a malicious guest could make OVS consume
a lot of memory (in my tests, OVS complained that the heap increased to 500M).
A rough sketch of one way to cap it follows below.


-- 
David Marchand
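
For the sake of discussion, a minimal way to bound the queue could look like
the sketch below. The cap value, the atomic counter and its name are made up
for the example and untested; it simply drops events once the cap is reached,
so a real version would still need a way to resynchronize the vring state
afterwards:

#include "ovs-atomic.h"

/* Arbitrary cap, for the sketch only. */
#define VHOST_STATE_CHANGE_QUEUE_MAX 1024

static atomic_count vhost_state_change_len = ATOMIC_COUNT_INIT(0);

static int
vring_state_changed(int vid, uint16_t queue_id, int enable)
{
    struct vhost_state_change *sc;

    /* Refuse to queue more than the cap, so a guest flooding vring state
     * changes cannot grow the heap without bound. */
    if (atomic_count_get(&vhost_state_change_len)
        >= VHOST_STATE_CHANGE_QUEUE_MAX) {
        return 0;
    }

    sc = xmalloc(sizeof *sc);
    if (!rte_vhost_get_ifname(vid, sc->ifname, sizeof sc->ifname)) {
        sc->queue_id = queue_id;
        sc->enable = enable;
        mpsc_queue_insert(&vhost_state_change_queue, &sc->node);
        atomic_count_inc(&vhost_state_change_len);
    } else {
        free(sc);
    }

    return 0;
}

netdev_dpdk_vhost_run() would then do atomic_count_dec(&vhost_state_change_len)
after each pop. Per-vring coalescing (keeping only the latest state for a
given ifname/queue_id) would bound memory without dropping anything, but needs
a lookup structure of its own.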