> I couldn't recreate the issue on x86 but after testing with vhostuser and
> vhostuserclient for a few scenarios such as client, server reconnect and
> multi-queue I didn't find any problems with this patch.
>
> Tested-by: Billy O'Mahony <billy.o.mah...@intel.com>
> Acked-by: Billy O'Mahony <billy.o.mah...@intel.com>
Thanks all for working on this. I've pushed this patch to the dpdk_merge
branch now: https://github.com/istokes/ovs/tree/dpdk_merge

Ian

> > -----Original Message-----
> > From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> > Sent: Friday, October 13, 2017 1:06 PM
> > To: O Mahony, Billy <billy.o.mah...@intel.com>; ovs-dev@openvswitch.org
> > Cc: Maxime Coquelin <maxime.coque...@redhat.com>; Heetae Ahn
> > <heetae82....@samsung.com>
> > Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Fix calling vhost API with
> > negative vid.
> >
> > On 13.10.2017 14:38, O Mahony, Billy wrote:
> > > Hi Ilya,
> > >
> > >> Issue can be reproduced by stopping DPDK application (testpmd)
> > >> inside guest while heavy traffic flows to this VM.
> > >
> > > I tried both quitting testpmd without stopping the forwarding task and
> > > simply killing testpmd, without crashing the vswitch in the host.
> > >
> > > What versions of dpdk are you using in the guest and host?
> >
> > Versions below, but I don't think that it's so important.
> >
> > Host: 17.05.2
> > Guest: 16.07-rc1
> >
> > > Are you using dpdkvhostuser or dpdkvhostuserclient type ports?
> >
> > dpdkvhostuserclient.
> >
> > The complete test scenario where I saw this behaviour was:
> >
> > 2 VMs with 4 queues per vhostuserclient port.
> > VM1 - OVS - VM2
> >
> > VM1 runs testpmd with --rxq=4 --txq=4 --nb-cores=4 --eth-peer=0,MAC2
> > --forward-mode=mac
> > VM2 runs testpmd with --rxq=4 --txq=4 --nb-cores=4 --eth-peer=0,MAC1
> > --forward-mode=txonly
> >
> > OVS with 8 pmd threads (1 core per queue).
> > action=NORMAL
> >
> > Steps:
> >
> > 1. Start testpmd in both VMs (non-interactive mode).
> > 2. Wait a while.
> > 3. Push <enter> in the VM1 console.
> >    --> OVS crashes during testpmd termination.
> >
> > The most important thing, I guess, is that I'm using an ARMv8 machine
> > for that. It could be harder to reproduce on an x86 system (I didn't
> > try).
> >
> > Best regards, Ilya Maximets.
> >
> > >
> > > Thanks,
> > > Billy.
> > >
> > >> -----Original Message-----
> > >> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > >> boun...@openvswitch.org] On Behalf Of Ilya Maximets
> > >> Sent: Friday, October 6, 2017 11:50 AM
> > >> To: ovs-dev@openvswitch.org
> > >> Cc: Ilya Maximets <i.maxim...@samsung.com>; Maxime Coquelin
> > >> <maxime.coque...@redhat.com>; Heetae Ahn <heetae82....@samsung.com>
> > >> Subject: [ovs-dev] [PATCH] netdev-dpdk: Fix calling vhost API with
> > >> negative vid.
> > >>
> > >> Currently, the rx and tx functions for vhost interfaces obtain 'vid'
> > >> twice: first inside 'is_vhost_running' to check the value, and a
> > >> second time in the enqueue/dequeue calls that send or receive
> > >> packets. The second result is never checked. If the vhost device is
> > >> destroyed between the check and the enqueue/dequeue, the DPDK API
> > >> is called with '-1' instead of a valid 'vid'. DPDK does not validate
> > >> the 'vid', so a random memory value is used as a pointer to an
> > >> internal device structure inside DPDK. Access through this pointer
> > >> leads to a segmentation fault. For example:
> > >>
> > >> |00503|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE
> > >> [New Thread 0x7fb6754910 (LWP 21246)]
> > >>
> > >> Program received signal SIGSEGV, Segmentation fault.
> > >> rte_vhost_enqueue_burst at lib/librte_vhost/virtio_net.c:630
> > >> 630         if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF))
> > >> (gdb) bt full
> > >> #0  rte_vhost_enqueue_burst at lib/librte_vhost/virtio_net.c:630
> > >>         dev = 0xffffffff
> > >> #1  __netdev_dpdk_vhost_send at lib/netdev-dpdk.c:1803
> > >>         tx_pkts = <optimized out>
> > >>         cur_pkts = 0x7f340084f0
> > >>         total_pkts = 32
> > >>         dropped = 0
> > >>         i = <optimized out>
> > >>         retries = 0
> > >>         ...
> > >> (gdb) p *((struct netdev_dpdk *) netdev)
> > >> $8 = { ... ,
> > >>       flags = (NETDEV_UP | NETDEV_PROMISC), ... ,
> > >>       vid = {v = -1},
> > >>       vhost_reconfigured = false, ... }
> > >>
> > >> Issue can be reproduced by stopping DPDK application (testpmd)
> > >> inside guest while heavy traffic flows to this VM.
> > >>
> > >> Fix that by obtaining and checking the 'vid' only once.
> > >>
> > >> CC: Ciara Loftus <ciara.lof...@intel.com>
> > >> Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07")
> > >> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
> > >> ---
> > >>  lib/netdev-dpdk.c | 14 +++++++-------
> > >>  1 file changed, 7 insertions(+), 7 deletions(-)
> > >>
> > >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > >> index c60f46f..bf30bb0 100644
> > >> --- a/lib/netdev-dpdk.c
> > >> +++ b/lib/netdev-dpdk.c
> > >> @@ -1637,18 +1637,18 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
> > >>                             struct dp_packet_batch *batch)
> > >>  {
> > >>      struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> > >> -    int qid = rxq->queue_id;
> > >>      struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev);
> > >>      uint16_t nb_rx = 0;
> > >>      uint16_t dropped = 0;
> > >> +    int qid = rxq->queue_id;
> > >> +    int vid = netdev_dpdk_get_vid(dev);
> > >>
> > >> -    if (OVS_UNLIKELY(!is_vhost_running(dev)
> > >> +    if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured
> > >>                       || !(dev->flags & NETDEV_UP))) {
> > >>          return EAGAIN;
> > >>      }
> > >>
> > >> -    nb_rx = rte_vhost_dequeue_burst(netdev_dpdk_get_vid(dev),
> > >> -                                    qid * VIRTIO_QNUM + VIRTIO_TXQ,
> > >> +    nb_rx = rte_vhost_dequeue_burst(vid, qid * VIRTIO_QNUM + VIRTIO_TXQ,
> > >>                                      dev->dpdk_mp->mp,
> > >>                                      (struct rte_mbuf **) batch->packets,
> > >>                                      NETDEV_MAX_BURST);
> > >> @@ -1783,10 +1783,11 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
> > >>      unsigned int total_pkts = cnt;
> > >>      unsigned int dropped = 0;
> > >>      int i, retries = 0;
> > >> +    int vid = netdev_dpdk_get_vid(dev);
> > >>
> > >>      qid = dev->tx_q[qid % netdev->n_txq].map;
> > >>
> > >> -    if (OVS_UNLIKELY(!is_vhost_running(dev) || qid < 0
> > >> +    if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured || qid < 0
> > >>                       || !(dev->flags & NETDEV_UP))) {
> > >>          rte_spinlock_lock(&dev->stats_lock);
> > >>          dev->stats.tx_dropped+= cnt;
> > >> @@ -1805,8 +1806,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
> > >>          int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
> > >>          unsigned int tx_pkts;
> > >>
> > >> -        tx_pkts = rte_vhost_enqueue_burst(netdev_dpdk_get_vid(dev),
> > >> -                                          vhost_qid, cur_pkts, cnt);
> > >> +        tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt);
> > >>          if (OVS_LIKELY(tx_pkts)) {
> > >>              /* Packets have been sent.*/
> > >>              cnt -= tx_pkts;
> > >> --
> > >> 2.7.4
> > >>
> > >> _______________________________________________
> > >> dev mailing list
> > >> d...@openvswitch.org
> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
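[Editor's note] The bug discussed above is a check-then-use race: the 'vid' is read once for validation and read again, unchecked, when calling into DPDK. The fix reads it once into a local and reuses that validated copy. Below is a minimal, self-contained sketch of that pattern; all names here (mock_dev, mock_enqueue_burst, mock_vhost_send) are hypothetical stand-ins, not OVS or DPDK symbols, and the sketch models only the vid check, not the device-lifetime synchronization the real code relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the device: 'vid' may be set to -1 by another thread at
 * any moment while the vhost device is being destroyed.  (The real OVS
 * field is read atomically.) */
struct mock_dev {
    int vid;                    /* -1 while no vhost device is attached */
    bool vhost_reconfigured;
};

/* Stand-in for rte_vhost_enqueue_burst(): real DPDK does not validate a
 * negative vid and would end up dereferencing a wild pointer. */
static int
mock_enqueue_burst(int vid, int cnt)
{
    assert(vid >= 0);           /* the fixed caller never lets -1 through */
    return cnt;                 /* pretend every packet was sent */
}

/* The fixed pattern: obtain and check 'vid' only once, then reuse the
 * same local copy for the burst call. */
static int
mock_vhost_send(struct mock_dev *dev, int cnt)
{
    int vid = dev->vid;                     /* single read */

    if (vid < 0 || !dev->vhost_reconfigured) {
        return 0;                           /* drop: device not ready */
    }
    /* Even if dev->vid flips to -1 right here, we keep using the
     * already-validated local copy instead of re-reading it. */
    return mock_enqueue_burst(vid, cnt);
}
```

The buggy shape would call `mock_enqueue_burst(dev->vid, cnt)` after the check, re-reading a value that may have changed; keeping the local copy closes that window.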