[dpdk-dev] [Q] l2fwd in examples directory
thanks bruce. I didn't know that PCI slots have direct socket affinity. Is it
static, or configurable through PCI configuration space? Well, my NUT, a
two-node NUMA system, always seems to return -1 when calling
rte_eth_dev_socket_id(portid), whether portid is 0, 1, or another value. I
would appreciate it if you could explain more about getting the affinity.

p.s. I'm using an Intel Xeon processor and a 1G NIC (82576).

On Fri, Oct 16, 2015 at 10:43 PM, Bruce Richardson <
bruce.richardson at intel.com> wrote:

> On Thu, Oct 15, 2015 at 11:08:57AM +0900, Moon-Sang Lee wrote:
> > There is code as below in examples/l2fwd/main.c and I think
> > rte_eth_dev_socket_id(portid) always returns -1 (SOCKET_ID_ANY) since
> > there is no association code between port and lcore in the example code.
>
> Can you perhaps clarify what you mean here. On modern NUMA systems, such
> as those from Intel :-), the PCI slots are directly connected to the CPU
> sockets, so the ethernet ports do indeed have a direct NUMA affinity. It's
> not something that the app needs to specify.
>
> /Bruce
>
> > (i.e. I need to find a matching lcore from lcore_queue_conf[] with
> > portid and call rte_lcore_to_socket_id(lcore_id).)
> >
> >     /* init one RX queue */
> >     fflush(stdout);
> >     ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
> >                                  rte_eth_dev_socket_id(portid),
> >                                  NULL,
> >                                  l2fwd_pktmbuf_pool);
> >     if (ret < 0)
> >         rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d, port=%u\n",
> >                  ret, (unsigned) portid);
> >
> > It works fine even though memory is allocated in a different NUMA node.
> > But I wonder whether there is a DPDK API that associates an lcore to a
> > port internally, such that rte_eth_devices[portid].pci_dev->numa_node
> > contains the proper node.
> >
> > --
> > Moon-Sang Lee, SW Engineer
> > Email: sang0627 at gmail.com
> > Wisdom begins in wonder. *Socrates*

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
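For what it's worth, a minimal sketch of the fallback an application can
apply when the port's affinity is unknown (the portid-to-lcore pairing here
is assumed, not l2fwd's actual code):

    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_memory.h>

    /* Sketch: pick the NUMA socket for a port's queue memory. Use the
     * port's own affinity when the PCI topology exposes it; otherwise
     * fall back to the socket of the lcore that will poll the queue. */
    static int
    port_socket_id(uint8_t portid, unsigned lcore_id)
    {
            int socket = rte_eth_dev_socket_id(portid);

            if (socket == SOCKET_ID_ANY)    /* -1: affinity unknown */
                    socket = (int)rte_lcore_to_socket_id(lcore_id);
            return socket;
    }

The result can then be passed as the socket_id argument of
rte_eth_rx_queue_setup() in place of the bare rte_eth_dev_socket_id() call.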
[dpdk-dev] [PATCH v2 0/7] virtio ring layout optimization and simple rx/tx processing
In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores. Take RX for
example: with the generic implementation, for each guest buffer,
 a) the virtio driver allocates a descriptor from the free descriptor list
 b) it modifies the entry of the avail ring to point to the allocated descriptor
 c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache
line from the virtio core, which is a heavy cost on current CPU
implementations.

The idea of this optimization is: allocate a fixed descriptor for each entry
of the avail ring, so the avail ring will always be the same during the run.
This removes the L1 cache-line transfer from the virtio core to the vhost
core for the avail ring. Besides, no descriptor free and allocation is
needed. Most importantly, this makes vector processing possible, to further
accelerate the processing.

This is the layout for the avail ring (take 256 ring entries for example),
with each entry pointing to the descriptor with the same index.

                      avail
                      idx
                       +
                       |
    +----+----+----+--+--+-----+-----+
    | 0  | 1  | 2  | ... | 254 | 255 |  avail ring
    +-+--+-+--+-+--+--+--+--+--+--+--+
      |    |    |     |     |     |
      v    v    v     |     v     v
    +-+--+-+--+-+--+--+--+--+--+--+--+
    | 0  | 1  | 2  | ... | 254 | 255 |  desc ring
    +----+----+----+--+--+-----+-----+
                      |
                      |
    +----+----+----+--+--+-----+-----+
    | 0  | 1  | 2  | ... | 254 | 255 |  used ring
    +----+----+----+-----+-----+-----+

This is the ring layout for TX. As we need one virtio header for each
xmit packet, we have 128 slots available.

                              ++
                              ||
                              ||
    +-----+-----+-----+------++------+-----+-----+-----+
    |  0  |  1  | ... | 127  || 128  | 129 | ... | 255 |  avail ring
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
       |     |            |  ||   |     |           |
       v     v            v  ||   v     v           v
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
    | 128 | 129 | ... | 255  || 128  | 129 | ... | 255 |  desc ring for virtio_net_hdr
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
       |     |            |  ||   |     |           |
       v     v            v  ||   v     v           v
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
    |  0  |  1  | ... | 127  ||  0  |  1  | ... | 127  |  desc ring for tx data
    +-----+-----+-----+------++------+-----+-----+-----+
                              ||
                              ||
                              ++

A performance boost can be observed only if the virtio backend isn't the
bottleneck, or in the VM2VM case. There are also several vhost optimization
patches to be submitted later.

Changes in v2:
- Remove the configure macro
- Enable simple RX/TX processing when the user specifies simple txq flags
- Reword some comments and commit messages

Huawei Xie (7):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: pick simple rx/tx func

 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  13 ++
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  53 ++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 ++++
 drivers/net/virtio/virtio_rxtx_simple.c | 403 ++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   5 +
 7 files changed, 517 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

--
1.8.1.4
[dpdk-dev] [PATCH v2 3/7] virtio: rx/tx ring layout optimization
In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores. Take RX for
example: with the generic implementation, for each guest buffer,
 a) the virtio driver allocates a descriptor from the free descriptor list
 b) it modifies the entry of the avail ring to point to the allocated descriptor
 c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache
line from the virtio core, which is a heavy cost on current CPU
implementations.

The idea of this optimization is: allocate a fixed descriptor for each entry
of the avail ring, so the avail ring will always be the same during the run.
This removes the L1 cache-line transfer from the virtio core to the vhost
core for the avail ring. Besides, no descriptor free and allocation is
needed. This also makes vector processing possible, to further accelerate
the processing.

This is the layout for the avail ring (take 256 ring entries for example),
with each entry pointing to the descriptor with the same index.

                      avail
                      idx
                       +
                       |
    +----+----+----+--+--+-----+-----+
    | 0  | 1  | 2  | ... | 254 | 255 |  avail ring
    +-+--+-+--+-+--+--+--+--+--+--+--+
      |    |    |     |     |     |
      v    v    v     |     v     v
    +-+--+-+--+-+--+--+--+--+--+--+--+
    | 0  | 1  | 2  | ... | 254 | 255 |  desc ring
    +----+----+----+--+--+-----+-----+
                      |
                      |
    +----+----+----+--+--+-----+-----+
    | 0  | 1  | 2  | ... | 254 | 255 |  used ring
    +----+----+----+-----+-----+-----+

This is the ring layout for TX. As we need one virtio header for each
xmit packet, we have 128 slots available.

                              ++
                              ||
                              ||
    +-----+-----+-----+------++------+-----+-----+-----+
    |  0  |  1  | ... | 127  || 128  | 129 | ... | 255 |  avail ring
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
       |     |            |  ||   |     |           |
       v     v            v  ||   v     v           v
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
    | 128 | 129 | ... | 255  || 128  | 129 | ... | 255 |  desc ring for virtio_net_hdr
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
       |     |            |  ||   |     |           |
       v     v            v  ||   v     v           v
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
    |  0  |  1  | ... | 127  ||  0  |  1  | ... | 127  |  desc ring for tx data
    +-----+-----+-----+------++------+-----+-----+-----+
                              ||
                              ||
                              ++

Signed-off-by: Huawei Xie
---
 drivers/net/virtio/virtio_rxtx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5c00e9d..7c82a6a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -302,6 +302,12 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		nbufs = 0;
 		error = ENOSPC;
 
+		if (use_simple_rxtx)
+			for (i = 0; i < vq->vq_nentries; i++) {
+				vq->vq_ring.avail->ring[i] = i;
+				vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+			}
+
 		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
 		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
 			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
@@ -332,6 +338,24 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
 			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
 	} else if (queue_type == VTNET_TQ) {
+		if (use_simple_rxtx) {
+			int mid_idx = vq->vq_nentries >> 1;
+			for (i = 0; i < mid_idx; i++) {
+				vq->vq_ring.avail->ring[i] = i + mid_idx;
+				vq->vq_ring.desc[i + mid_idx].next = i;
+				vq->vq_ring.desc[i + mid_idx].addr =
+					vq->virtio_net_hdr_mem +
+						mid_idx * vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].len =
+					vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].flags =
+					VRING_DESC_F_NEXT;
+				vq->vq_ring.desc[i].flags = 0;
+			}
+			for (i = mid_idx; i < vq->vq_nentries; i++)
+				vq->vq_ring.avail->ring[i] = i;
+		}
+
[dpdk-dev] [PATCH v2 4/7] virtio: fill RX avail ring with blank mbufs
fill avail ring with blank mbufs in virtio_dev_vring_start Signed-off-by: Huawei Xie --- drivers/net/virtio/Makefile | 2 +- drivers/net/virtio/virtio_rxtx.c| 6 ++- drivers/net/virtio/virtio_rxtx.h| 3 ++ drivers/net/virtio/virtio_rxtx_simple.c | 84 + 4 files changed, 92 insertions(+), 3 deletions(-) create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile index 930b60f..43835ba 100644 --- a/drivers/net/virtio/Makefile +++ b/drivers/net/virtio/Makefile @@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c - +SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c # this lib depends upon: DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 7c82a6a..5162ce6 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -320,8 +320,10 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type) /** * Enqueue allocated buffers* ***/ - error = virtqueue_enqueue_recv_refill(vq, m); - + if (use_simple_rxtx) + error = virtqueue_enqueue_recv_refill_simple(vq, m); + else + error = virtqueue_enqueue_recv_refill(vq, m); if (error) { rte_pktmbuf_free(m); break; diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h index a10aa69..7d2d8fe 100644 --- a/drivers/net/virtio/virtio_rxtx.h +++ b/drivers/net/virtio/virtio_rxtx.h @@ -32,3 +32,6 @@ */ #define RTE_PMD_VIRTIO_RX_MAX_BURST 64 + +int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, + struct rte_mbuf *m); diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c new file mode 100644 index 000..cac5b9f --- /dev/null +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -0,0 +1,84 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include + +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "virtio_logs.h" +#include "virtio_ethdev.h" +#include "virtqueue.h" +#include "virtio_rxtx.h" + +int __attribute__((cold)) +virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, + struct rte_mbuf *cookie) +{ + struct vq_desc_extra *dxp; + struct vring_desc *start_dp; + uint16_t desc_idx; + + desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1); + dxp = &vq->vq_descx[desc_idx]; + dxp->cookie = (void *)cookie; + vq->sw_ring[desc_idx] = cookie; + + start_dp = vq->vq_ring.desc; + start_dp[desc_idx
[dpdk-dev] [PATCH v2 1/7] virtio: add virtio_rxtx.h header file
Would move all rx/tx related code into this header file in future. Add RTE_VIRTIO_PMD_MAX_BURST. Signed-off-by: Huawei Xie --- drivers/net/virtio/virtio_ethdev.c | 1 + drivers/net/virtio/virtio_rxtx.c | 1 + drivers/net/virtio/virtio_rxtx.h | 34 ++ 3 files changed, 36 insertions(+) create mode 100644 drivers/net/virtio/virtio_rxtx.h diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 465d3cd..79a3640 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -61,6 +61,7 @@ #include "virtio_pci.h" #include "virtio_logs.h" #include "virtqueue.h" +#include "virtio_rxtx.h" static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev); diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index c5b53bb..9324f7f 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -54,6 +54,7 @@ #include "virtio_logs.h" #include "virtio_ethdev.h" #include "virtqueue.h" +#include "virtio_rxtx.h" #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len) diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h new file mode 100644 index 000..a10aa69 --- /dev/null +++ b/drivers/net/virtio/virtio_rxtx.h @@ -0,0 +1,34 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#define RTE_PMD_VIRTIO_RX_MAX_BURST 64 -- 1.8.1.4
[dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue
Add software RX ring in virtqueue. Add fake_mbuf in virtqueue for wraparound processing. Use global simple_rxtx to indicate whether simple rxtx is enabled Signed-off-by: Huawei Xie --- drivers/net/virtio/virtio_ethdev.c | 12 drivers/net/virtio/virtio_rxtx.c | 7 +++ drivers/net/virtio/virtqueue.h | 4 3 files changed, 23 insertions(+) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 79a3640..3b7b841 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -247,6 +247,9 @@ virtio_dev_queue_release(struct virtqueue *vq) { VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->queue_id); VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0); + if (vq->sw_ring) + rte_free(vq->sw_ring); + rte_free(vq); vq = NULL; } @@ -292,6 +295,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, dev->data->port_id, queue_idx); vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) + vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE); + vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring", + (RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) * + sizeof(vq->sw_ring[0]), RTE_CACHE_LINE_SIZE, socket_id); } else if (queue_type == VTNET_TQ) { snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d", dev->data->port_id, queue_idx); @@ -308,6 +314,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__); return (-ENOMEM); } + if (queue_type == VTNET_RQ && vq->sw_ring == NULL) { + PMD_INIT_LOG(ERR, "%s: Can not allocate RX soft ring", + __func__); + rte_free(vq); + return -ENOMEM; + } vq->hw = hw; vq->port_id = dev->data->port_id; diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 9324f7f..5c00e9d 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -62,6 +62,8 @@ #define VIRTIO_DUMP_PACKET(m, len) do { } while (0) #endif +static int use_simple_rxtx; + static void vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx) { @@ -299,6 +301,11 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type) /* Allocate blank mbufs for the each rx descriptor */ nbufs = 0; error = ENOSPC; + + memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf)); + for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++) + vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf; + while (!virtqueue_full(vq)) { m = rte_rxmbuf_alloc(vq->mpool); if (m == NULL) diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index 7789411..6a1ec48 100644 --- a/drivers/net/virtio/virtqueue.h +++ b/drivers/net/virtio/virtqueue.h @@ -190,6 +190,10 @@ struct virtqueue { uint16_t vq_avail_idx; phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */ + struct rte_mbuf **sw_ring; /**< RX software ring. */ + /* dummy mbuf, for wraparound when processing RX ring. */ + struct rte_mbuf fake_mbuf; + /* Statistics */ uint64_tpackets; uint64_tbytes; -- 1.8.1.4
[dpdk-dev] [PATCH v2 5/7] virtio: virtio vec rx
With fixed avail ring, we don't need to get desc idx from avail ring. virtio driver only has to deal with desc ring. This patch uses vector instruction to accelerate processing desc ring. Signed-off-by: Huawei Xie --- drivers/net/virtio/virtio_ethdev.h | 2 + drivers/net/virtio/virtio_rxtx.c| 3 + drivers/net/virtio/virtio_rxtx.h| 2 + drivers/net/virtio/virtio_rxtx_simple.c | 224 drivers/net/virtio/virtqueue.h | 1 + 5 files changed, 232 insertions(+) diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 9026d42..d7797ab 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, + uint16_t nb_pkts); /* * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 5162ce6..947fc46 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -432,6 +432,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, vq->mpool = mp; dev->data->rx_queues[queue_idx] = vq; + + virtio_rxq_vec_setup(vq); + return 0; } diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h index 7d2d8fe..831e492 100644 --- a/drivers/net/virtio/virtio_rxtx.h +++ b/drivers/net/virtio/virtio_rxtx.h @@ -33,5 +33,7 @@ #define RTE_PMD_VIRTIO_RX_MAX_BURST 64 +int virtio_rxq_vec_setup(struct virtqueue *rxq); + int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, struct rte_mbuf *m); diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index cac5b9f..ef17562 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -58,6 +58,10 @@ #include "virtqueue.h" #include "virtio_rxtx.h" +#define RTE_VIRTIO_VPMD_RX_BURST 32 +#define RTE_VIRTIO_DESC_PER_LOOP 8 +#define RTE_VIRTIO_VPMD_RX_REARM_THRESH RTE_VIRTIO_VPMD_RX_BURST + int __attribute__((cold)) virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, struct rte_mbuf *cookie) @@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, return 0; } + +static inline void +virtio_rxq_rearm_vec(struct virtqueue *rxvq) +{ + int i; + uint16_t desc_idx; + struct rte_mbuf **sw_ring; + struct vring_desc *start_dp; + int ret; + + desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1); + sw_ring = &rxvq->sw_ring[desc_idx]; + start_dp = &rxvq->vq_ring.desc[desc_idx]; + + ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring, + RTE_VIRTIO_VPMD_RX_REARM_THRESH); + if (unlikely(ret)) { + rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed += + RTE_VIRTIO_VPMD_RX_REARM_THRESH; + return; + } + + for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) { + uintptr_t p; + + p = (uintptr_t)&sw_ring[i]->rearm_data; + *(uint64_t *)p = rxvq->mbuf_initializer; + + start_dp[i].addr = + (uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr + + RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr)); + start_dp[i].len = sw_ring[i]->buf_len - + RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr); + } + + rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH; + rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH; + vq_update_avail_idx(rxvq); +} + +/* virtio vPMD receive routine, only accept(nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP) + * + * 
This routine is for non-mergable RX, one desc for each guest buffer. + * This routine is based on the RX ring layout optimization. Each entry in the + * avail ring points to the desc with the same index in the desc ring and this + * will never be changed in the driver. + * + * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet + */ +uint16_t +virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, + uint16_t nb_pkts) +{ + struct virtqueue *rxvq = rx_queue; + uint16_t nb_used; + uint16_t desc_idx; + struct vring_used_elem *rused; + struct rte_mbuf **sw_ring; + struct rte_mbuf **sw_ring_end; + uint16_t nb_pkts_received; + __m128i shuf_msk1, shuf_msk2, len_adjust; + + shuf_msk1 = _mm_set_epi8( + 0xFF, 0xFF, 0xFF, 0xFF, + 0xFF, 0xFF, /* vlan tci */ + 5, 4, /* dat len */ + 0xFF, 0xFF, 5, 4,
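For readers unfamiliar with the shuffle masks above, here is a standalone
demo of the technique (values illustrative; compile with -mssse3): a mask
byte of 0xFF zeroes the output byte, while mask bytes 4/5 copy the 16-bit
length field of a used-ring element into the positions that overlay the
mbuf's data_len/pkt_len. This is a sketch of the idea, not the driver code.

    #include <stdint.h>
    #include <stdio.h>
    #include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

    int main(void)
    {
            /* A fake used-ring element: bytes 4..5 hold a little-endian
             * 16-bit length (0x0abc here); the rest is don't-care. */
            uint8_t used[16] = {0};
            used[4] = 0xbc;
            used[5] = 0x0a;

            __m128i src = _mm_loadu_si128((const __m128i *)used);
            /* -1 (0xFF) zeroes an output byte; 4/5 broadcast the length
             * into both the pkt_len and data_len slots. */
            __m128i msk = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 5, 4,
                                       -1, -1, 5, 4, -1, -1, -1, -1);
            __m128i out = _mm_shuffle_epi8(src, msk);

            uint8_t res[16];
            _mm_storeu_si128((__m128i *)res, out);
            printf("data_len=0x%02x%02x pkt_len=0x%02x%02x\n",
                   res[9], res[8], res[5], res[4]);
            return 0;
    }

One shuffle thus fills several mbuf fields per descriptor without branches,
which is what lets the loop process RTE_VIRTIO_DESC_PER_LOOP entries at a
time.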
[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
bulk free of mbufs when clean used ring. shift operation of idx could be further saved if vq_free_cnt means free slots rather than free descriptors. Signed-off-by: Huawei Xie --- drivers/net/virtio/virtio_ethdev.h | 3 ++ drivers/net/virtio/virtio_rxtx_simple.c | 95 + 2 files changed, 98 insertions(+) diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index d7797ab..ae2d47d 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); +uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts); + /* * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us * frames larger than 1514 bytes. We do not yet support software LRO diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index ef17562..3339a24 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -288,6 +288,101 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, return nb_pkts_received; } +#define VIRTIO_TX_FREE_THRESH 32 +#define VIRTIO_TX_MAX_FREE_BUF_SZ 32 +#define VIRTIO_TX_FREE_NR 32 +/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift */ +static inline void __attribute__((always_inline)) +virtio_xmit_cleanup(struct virtqueue *vq) +{ + uint16_t i, desc_idx; + int nb_free = 0; + struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ]; + + desc_idx = (uint16_t)(vq->vq_used_cons_idx & + ((vq->vq_nentries >> 1) - 1)); + free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie; + nb_free = 1; + + for (i = 1; i < VIRTIO_TX_FREE_NR; i++) { + m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie; + if (likely(m->pool == free[0]->pool)) + free[nb_free++] = m; + else { + rte_mempool_put_bulk(free[0]->pool, (void **)free, + nb_free); + free[0] = m; + nb_free = 1; + } + } + + rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free); + vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR; + vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1); + + return; +} + +uint16_t +virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + struct virtqueue *txvq = tx_queue; + uint16_t nb_used; + uint16_t desc_idx; + struct vring_desc *start_dp; + uint16_t nb_tail, nb_commit; + int i; + uint16_t desc_idx_max = (txvq->vq_nentries >> 1) - 1; + + nb_used = VIRTQUEUE_NUSED(txvq); + rte_compiler_barrier(); + + nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1), nb_pkts); + desc_idx = (uint16_t) (txvq->vq_avail_idx & desc_idx_max); + start_dp = txvq->vq_ring.desc; + nb_tail = (uint16_t) (desc_idx_max + 1 - desc_idx); + + if (nb_used >= VIRTIO_TX_FREE_THRESH) + virtio_xmit_cleanup(tx_queue); + + if (nb_commit >= nb_tail) { + for (i = 0; i < nb_tail; i++) + txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i]; + for (i = 0; i < nb_tail; i++) { + start_dp[desc_idx].addr = + RTE_MBUF_DATA_DMA_ADDR(*tx_pkts); + start_dp[desc_idx].len = (*tx_pkts)->pkt_len; + tx_pkts++; + desc_idx++; + } + nb_commit -= nb_tail; + desc_idx = 0; + } + for (i = 0; i < nb_commit; i++) + txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i]; + for (i = 0; i < nb_commit; i++) { + start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts); + start_dp[desc_idx].len = (*tx_pkts)->pkt_len; + tx_pkts++; + desc_idx++; + } + + rte_compiler_barrier(); + + 
txvq->vq_free_cnt -= (uint16_t)(nb_pkts << 1); + txvq->vq_avail_idx += nb_pkts; + txvq->vq_ring.avail->idx = txvq->vq_avail_idx; + txvq->packets += nb_pkts; + + if (likely(nb_pkts)) { + if (unlikely(virtqueue_kick_prepare(txvq))) + virtqueue_notify(txvq); + } + + return nb_pkts; +} + int __attribute__((cold)) virtio_rxq_vec_setup(struct virtqueue *rxq) { -- 1.8.1.4
[dpdk-dev] [PATCH v2 7/7] virtio: pick simple rx/tx func
The simple rx/tx functions are enabled when the user specifies
single-segment, no-offload txq flags. Mergeable RX buffers must be disabled
to use simple rxtx.

Signed-off-by: Huawei Xie
---
 drivers/net/virtio/virtio_rxtx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 947fc46..71f8cd4 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -62,6 +62,10 @@
 #define VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+
+#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+	ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static int use_simple_rxtx;
 
 static void
@@ -471,6 +475,14 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -EINVAL;
 	}
 
+	/* Use simple rx/tx func if single segment and no offloads */
+	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS) {
+		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
+		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
+		dev->rx_pkt_burst = virtio_recv_pkts_vec;
+		use_simple_rxtx = 1;
+	}
+
 	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx,
 		vtpci_queue_idx, nb_desc, socket_id, &vq);
 	if (ret < 0) {

--
1.8.1.4
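From the application side, a minimal sketch of requesting this path at queue
setup time (port/queue ids and the descriptor count are illustrative):

    #include <rte_ethdev.h>

    /* Sketch: opt in to the simple path by declaring single-segment,
     * no-offload TX queues, matching VIRTIO_SIMPLE_FLAGS above. */
    static int
    setup_simple_txq(uint8_t port_id, uint16_t queue_id, unsigned socket_id)
    {
            struct rte_eth_txconf txconf = {
                    .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
                                 ETH_TXQ_FLAGS_NOOFFLOADS,
            };

            return rte_eth_tx_queue_setup(port_id, queue_id, 256,
                                          socket_id, &txconf);
    }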
[dpdk-dev] [PATCH] vhost-user: enable virtio 1.0
On Fri, Oct 16, 2015 at 02:52:30PM +0100, Bruce Richardson wrote:
> On Thu, Oct 15, 2015 at 04:18:59PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Oct 15, 2015 at 02:08:39PM +0300, Marcel Apfelbaum wrote:
> > > Make vhost-user virtio 1.0 compatible by adding it to the
> > > supported features and keeping the header length
> > > the same as for mergeable RX buffers.
> > >
> > > Signed-off-by: Marcel Apfelbaum
> >
> > Looks good to me
> >
> > Acked-by: Michael S. Tsirkin
> >
> > Just one question: dpdk is only supported on little-endian
> > platforms at the moment, right?
>
> A recent release added in support for PPC (patches supplied by IBM). For
> example, see:
> http://dpdk.org/browse/dpdk/commit/?id=704ba3770032c5a901719d3837845581d5a56b58
>
> /Bruce

This will require more work, then, as virtio 1.0 has a different endianness
from 0.9. It's up to you guys to decide whether correct BE support is now a
requirement for all new dpdk code. Let us know.

--
MST
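For background: virtio 1.0 fixes ring and config fields to little-endian,
while legacy 0.9 devices use guest-native byte order, so BE support means
swapping only for modern devices -- along the lines of this sketch (the
accessor name and the is_modern flag are assumptions, not existing code):

    #include <stdint.h>
    #include <rte_byteorder.h>

    /* Sketch: ring index fields are LE in virtio 1.0 but guest-native
     * on legacy devices, so a big-endian build must swap conditionally. */
    static inline uint16_t
    ring_idx_to_cpu(uint16_t raw, int is_modern)
    {
            return is_modern ? rte_le_to_cpu_16(raw) : raw;
    }

On little-endian hosts both branches compile to the same load, which is why
the issue is invisible on x86.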
[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
On Sun, 18 Oct 2015 14:29:03 +0800
Huawei Xie wrote:

> bulk free of mbufs when clean used ring.
> shift operation of idx could be further saved if vq_free_cnt means
> free slots rather than free descriptors.
>
> Signed-off-by: Huawei Xie

Did you measure this? I finished my transmit optimizations and they get a
25% performance improvement without any of these restrictions.
[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
On Sun, 18 Oct 2015 14:29:03 +0800 Huawei Xie wrote: > + > + for (i = 1; i < VIRTIO_TX_FREE_NR; i++) { > + m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie; > + if (likely(m->pool == free[0]->pool)) > + free[nb_free++] = m; > + else { > + rte_mempool_put_bulk(free[0]->pool, (void **)free, > + nb_free); > + free[0] = m; > + nb_free = 1; > + } > + } This assumes all transmits are from the same pool, which is not necessarily true.
[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
+static inline void __attribute__((always_inline)) +virtio_xmit_cleanup(struct virtqueue *vq) +{ Please don't use always inline, frustrating the compiler isn't going to help. + uint16_t i, desc_idx; + int nb_free = 0; + struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ]; + + desc_idx = (uint16_t)(vq->vq_used_cons_idx & + ((vq->vq_nentries >> 1) - 1)); + free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie; + nb_free = 1; + + for (i = 1; i < VIRTIO_TX_FREE_NR; i++) { + m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie; + if (likely(m->pool == free[0]->pool)) + free[nb_free++] = m; + else { + rte_mempool_put_bulk(free[0]->pool, (void **)free, + nb_free); + free[0] = m; + nb_free = 1; + } + } + + rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free); + vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR; + vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1); + + return; +} Don't add return; at end of void functions. It only clutters things for no reason.
[dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue
On Sun, 18 Oct 2015 14:28:59 +0800
Huawei Xie wrote:

> +	if (vq->sw_ring)
> +		rte_free(vq->sw_ring);
> +

Do not need to test for NULL before calling rte_free. Better to just rely
on the fact that rte_free(NULL) is documented to be ok (no operation).
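Applied to the hunk above, the suggested simplification is just (sketch):

    rte_free(vq->sw_ring);  /* rte_free(NULL) is a documented no-op */
    rte_free(vq);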
[dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
This is a tested version of the virtio Tx performance improvements that I posted earlier on the list, and described at the DPDK Userspace meeting in Dublin. Together they get a 25% performance improvement for both small packet and large multi-segment packet case when testing from DPDK guest application to Linux KVM host. Stephen Hemminger (5): virtio: clean up space checks on xmit virtio: don't use unlikely for normal tx stuff virtio: use indirect ring elements virtio: use any layout on transmit virtio: optimize transmit enqueue drivers/net/virtio/virtio_ethdev.c | 38 +++--- drivers/net/virtio/virtio_ethdev.h | 4 +- drivers/net/virtio/virtio_rxtx.c | 150 - drivers/net/virtio/virtqueue.h | 19 + 4 files changed, 130 insertions(+), 81 deletions(-) -- 2.1.4
[dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit
The space check for transmit ring only needs a single conditional. I.e only need to recheck for space if there was no space in first check. This can help performance and simplifies loop. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_rxtx.c | 66 1 file changed, 27 insertions(+), 39 deletions(-) diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index c5b53bb..5b50ed0 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -745,7 +745,6 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { struct virtqueue *txvq = tx_queue; - struct rte_mbuf *txm; uint16_t nb_used, nb_tx; int error; @@ -759,57 +758,46 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) if (likely(nb_used > txvq->vq_nentries - txvq->vq_free_thresh)) virtio_xmit_cleanup(txvq, nb_used); - nb_tx = 0; + for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) { + struct rte_mbuf *txm = tx_pkts[nb_tx]; + int need = txm->nb_segs - txvq->vq_free_cnt + 1; - while (nb_tx < nb_pkts) { - /* Need one more descriptor for virtio header. */ - int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1; - - /*Positive value indicates it need free vring descriptors */ + /* Positive value indicates it need free vring descriptors */ if (unlikely(need > 0)) { nb_used = VIRTQUEUE_NUSED(txvq); virtio_rmb(); need = RTE_MIN(need, (int)nb_used); virtio_xmit_cleanup(txvq, need); - need = (int)tx_pkts[nb_tx]->nb_segs - - txvq->vq_free_cnt + 1; - } - - /* -* Zero or negative value indicates it has enough free -* descriptors to use for transmitting. -*/ - if (likely(need <= 0)) { - txm = tx_pkts[nb_tx]; - - /* Do VLAN tag insertion */ - if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) { - error = rte_vlan_insert(&txm); - if (unlikely(error)) { - rte_pktmbuf_free(txm); - ++nb_tx; - continue; - } + need = txm->nb_segs - txvq->vq_free_cnt + 1; + if (unlikely(need > 0)) { + PMD_TX_LOG(ERR, + "No free tx descriptors to transmit"); + break; } + } - /* Enqueue Packet buffers */ - error = virtqueue_enqueue_xmit(txvq, txm); + /* Do VLAN tag insertion */ + if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) { + error = rte_vlan_insert(&txm); if (unlikely(error)) { - if (error == ENOSPC) - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0"); - else if (error == EMSGSIZE) - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1"); - else - PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error); - break; + rte_pktmbuf_free(txm); + continue; } - nb_tx++; - txvq->bytes += txm->pkt_len; - } else { - PMD_TX_LOG(ERR, "No free tx descriptors to transmit"); + } + + /* Enqueue Packet buffers */ + error = virtqueue_enqueue_xmit(txvq, txm); + if (unlikely(error)) { + if (error == ENOSPC) + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0"); + else if (error == EMSGSIZE) + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1"); + else + PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error); break; } + txvq->bytes += txm->pkt_len; } txvq->packets += nb_tx; -- 2.1.4
[dpdk-dev] [PATCH 2/5] virtio: don't use unlikely for normal tx stuff
Don't use unlikely() for VLAN or ring getting full. GCC will not optimize code in unlikely paths and since these can happen with normal code that can hurt performance. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_rxtx.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 5b50ed0..dbe6665 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -763,7 +763,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) int need = txm->nb_segs - txvq->vq_free_cnt + 1; /* Positive value indicates it need free vring descriptors */ - if (unlikely(need > 0)) { + if (need > 0) { nb_used = VIRTQUEUE_NUSED(txvq); virtio_rmb(); need = RTE_MIN(need, (int)nb_used); @@ -778,7 +778,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) } /* Do VLAN tag insertion */ - if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) { + if (txm->ol_flags & PKT_TX_VLAN_PKT) { error = rte_vlan_insert(&txm); if (unlikely(error)) { rte_pktmbuf_free(txm); @@ -798,10 +798,9 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) break; } txvq->bytes += txm->pkt_len; + ++txvq->packets; } - txvq->packets += nb_tx; - if (likely(nb_tx)) { vq_update_avail_idx(txvq); -- 2.1.4
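For reference, these hints expand to __builtin_expect -- roughly what
DPDK's rte_branch_prediction.h provides (paraphrased):

    /* A placement hint only: GCC moves the "unlikely" block out of the
     * fall-through path, which hurts when the branch is actually common. */
    #ifndef likely
    #define likely(x)   __builtin_expect((x), 1)
    #endif
    #ifndef unlikely
    #define unlikely(x) __builtin_expect((x), 0)
    #endif

A VLAN-tagged packet or a briefly full ring is normal operation, not an
error path, so marking those branches cold pessimizes the common case.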
[dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements
The virtio ring in QEMU/KVM is usually limited to 256 entries and the normal way that virtio driver was queuing mbufs required nsegs + 1 ring elements. By using the indirect ring element feature if available, each packet will take only one ring slot even for multi-segment packets. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_ethdev.c | 38 +-- drivers/net/virtio/virtio_ethdev.h | 3 +- drivers/net/virtio/virtio_rxtx.c | 62 +++--- drivers/net/virtio/virtqueue.h | 19 4 files changed, 94 insertions(+), 28 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 465d3cd..cfce4f0 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -357,27 +357,45 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, vq->virtio_net_hdr_mem = 0; if (queue_type == VTNET_TQ) { + const struct rte_memzone *hdr_mz; + struct virtio_tx_region *txr; + int i; + /* * For each xmit packet, allocate a virtio_net_hdr +* and indirect ring elements */ snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone", dev->data->port_id, queue_idx); - vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name, - vq_size * hw->vtnet_hdr_size, + hdr_mz = rte_memzone_reserve_aligned(vq_name, + vq_size * sizeof(*txr), socket_id, 0, RTE_CACHE_LINE_SIZE); - if (vq->virtio_net_hdr_mz == NULL) { + if (hdr_mz == NULL) { if (rte_errno == EEXIST) - vq->virtio_net_hdr_mz = - rte_memzone_lookup(vq_name); - if (vq->virtio_net_hdr_mz == NULL) { + hdr_mz = rte_memzone_lookup(vq_name); + if (hdr_mz == NULL) { rte_free(vq); return -ENOMEM; } } - vq->virtio_net_hdr_mem = - vq->virtio_net_hdr_mz->phys_addr; - memset(vq->virtio_net_hdr_mz->addr, 0, - vq_size * hw->vtnet_hdr_size); + vq->virtio_net_hdr_mz = hdr_mz; + vq->virtio_net_hdr_mem = hdr_mz->phys_addr; + + txr = hdr_mz->addr; + memset(txr, 0, vq_size * sizeof(*txr)); + for (i = 0; i < vq_size; i++) { + struct vring_desc *start_dp = txr[i].tx_indir; + + vring_desc_init(start_dp, VIRTIO_MAX_INDIRECT); + + /* first indirect descriptor is always the tx header */ + start_dp->addr = vq->virtio_net_hdr_mem + + i * sizeof(*txr) + + offsetof(struct virtio_tx_region, tx_hdr); + + start_dp->len = vq->hw->vtnet_hdr_size; + start_dp->flags = VRING_DESC_F_NEXT; + } } else if (queue_type == VTNET_CQ) { /* Allocate a page for control vq command, data and status */ snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone", diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 9026d42..07a9265 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -64,7 +64,8 @@ 1u << VIRTIO_NET_F_CTRL_VQ | \ 1u << VIRTIO_NET_F_CTRL_RX | \ 1u << VIRTIO_NET_F_CTRL_VLAN | \ -1u << VIRTIO_NET_F_MRG_RXBUF) +1u << VIRTIO_NET_F_MRG_RXBUF | \ +1u << VIRTIO_RING_F_INDIRECT_DESC) /* * CQ function prototype diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index dbe6665..f68ab8f 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -199,14 +199,15 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie) } static int -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie) +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, + int use_indirect) { struct vq_desc_extra *dxp; struct vring_desc *start_dp; uint16_t seg_num = cookie->nb_segs; - uint16_t needed = 1 + seg_num; + uint16_t needed = use_indirect ? 
1 : 1 + seg_num; uint16_t head_idx, idx; - uint16_t head_size = txvq->hw->vtnet_hdr_size; + unsigned long offs; if (unlikely(txvq->vq_free_cnt == 0)) return -ENOSPC; @@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
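The definition of struct virtio_tx_region, added to virtqueue.h by this
patch, falls in the truncated part of the diff. Judging from the visible
uses (txr[i].tx_indir and offsetof(struct virtio_tx_region, tx_hdr)), it
plausibly looks like this sketch; treat the exact layout as an assumption:

    /* Sketch of the per-slot region carved out of the header memzone:
     * the packet header plus a small indirect descriptor table whose
     * slot 0 is pre-pointed at that header. */
    struct virtio_tx_region {
            struct virtio_net_hdr_mrg_rxbuf tx_hdr;
            struct vring_desc tx_indir[VIRTIO_MAX_INDIRECT]
                    __attribute__((__aligned__(16)));
    };

With this layout a multi-segment packet consumes a single slot in the real
ring (flagged VRING_DESC_F_INDIRECT), while the header and data segments
chain inside tx_indir.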
[dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit
Virtio supports a feature that allows sender to put transmit header prepended to data. It requires that the mbuf be writeable, correct alignment, and the feature has been negotiatied. If all this works out, then it will be the optimum way to transmit a single segment packet. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_ethdev.h | 3 +- drivers/net/virtio/virtio_rxtx.c | 66 +++--- 2 files changed, 42 insertions(+), 27 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 07a9265..f260fbb 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -65,7 +65,8 @@ 1u << VIRTIO_NET_F_CTRL_RX | \ 1u << VIRTIO_NET_F_CTRL_VLAN | \ 1u << VIRTIO_NET_F_MRG_RXBUF | \ -1u << VIRTIO_RING_F_INDIRECT_DESC) +1u << VIRTIO_RING_F_INDIRECT_DESC| \ +1u << VIRTIO_F_ANY_LAYOUT) /* * CQ function prototype diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index f68ab8f..dbedcc3 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -200,13 +200,13 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie) static int virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, - int use_indirect) + uint16_t needed, int use_indirect, int can_push) { struct vq_desc_extra *dxp; struct vring_desc *start_dp; uint16_t seg_num = cookie->nb_segs; - uint16_t needed = use_indirect ? 1 : 1 + seg_num; uint16_t head_idx, idx; + uint16_t head_size = txvq->hw->vtnet_hdr_size; unsigned long offs; if (unlikely(txvq->vq_free_cnt == 0)) @@ -223,7 +223,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, dxp->ndescs = needed; start_dp = txvq->vq_ring.desc; - if (use_indirect) { + if (can_push) { + /* put on zero'd transmit header (no offloads) */ + void *hdr = rte_pktmbuf_prepend(cookie, head_size); + + memset(hdr, 0, head_size); + } else if (use_indirect) { struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr; @@ -235,7 +240,7 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, start_dp[idx].flags = VRING_DESC_F_INDIRECT; start_dp = txr[idx].tx_indir; - idx = 0; + idx = 1; } else { offs = idx * sizeof(struct virtio_tx_region) + offsetof(struct virtio_tx_region, tx_hdr); @@ -243,22 +248,19 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs; start_dp[idx].len = txvq->hw->vtnet_hdr_size; start_dp[idx].flags = VRING_DESC_F_NEXT; + idx = start_dp[idx].next; } - for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) { - idx = start_dp[idx].next; + while (cookie != NULL) { start_dp[idx].addr = RTE_MBUF_DATA_DMA_ADDR(cookie); start_dp[idx].len = cookie->data_len; - start_dp[idx].flags = VRING_DESC_F_NEXT; + start_dp[idx].flags = cookie->next ? 
VRING_DESC_F_NEXT : 0; cookie = cookie->next; + idx = start_dp[idx].next; } - start_dp[idx].flags &= ~VRING_DESC_F_NEXT; - if (use_indirect) idx = txvq->vq_ring.desc[head_idx].next; - else - idx = start_dp[idx].next; txvq->vq_desc_head_idx = idx; if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END) @@ -761,10 +763,13 @@ virtio_recv_mergeable_pkts(void *rx_queue, return nb_rx; } + uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { struct virtqueue *txvq = tx_queue; + struct virtio_hw *hw = txvq->hw; + uint16_t hdr_size = hw->vtnet_hdr_size; uint16_t nb_used, nb_tx; int error; @@ -780,14 +785,31 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) { struct rte_mbuf *txm = tx_pkts[nb_tx]; - int use_indirect, slots, need; + int can_push = 0, use_indirect = 0, slots, need; + + /* Do VLAN tag insertion */ + if (txm->ol_flags & PKT_TX_VLAN_PKT) { + error = rte_vlan_insert(&txm); + if (unlikely(error)) { + rte_pktmbuf_free(txm); + continue; + } + } - use_indirect = vtpci_with_feature(txvq->hw, -
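The hunk that sets can_push/use_indirect is cut off above; based on the
commit message, the selection predicate plausibly reads as follows (a
sketch -- the exact checks are assumed, not the author's verbatim code):

    /* Sketch: choose the cheapest enqueue layout for this mbuf.
     * can_push: prepend the header into the mbuf's own headroom
     * (needs ANY_LAYOUT, a single writable segment, enough headroom).
     * use_indirect: one indirect descriptor for the whole chain. */
    if (vtpci_with_feature(txvq->hw, VIRTIO_F_ANY_LAYOUT) &&
        rte_mbuf_refcnt_read(txm) == 1 &&   /* mbuf is writable */
        txm->nb_segs == 1 &&
        rte_pktmbuf_headroom(txm) >= hdr_size)
            can_push = 1;
    else if (vtpci_with_feature(txvq->hw, VIRTIO_RING_F_INDIRECT_DESC))
            use_indirect = 1;

When can_push holds, the packet needs no separate header descriptor at all,
which is why it is the preferred path for single-segment traffic.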
[dpdk-dev] [PATCH 5/5] virtio: optimize transmit enqueue
All the error checks in virtqueue_enqueue_xmit are already done by the caller. Therefore they can be removed to improve performance. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_rxtx.c | 23 ++- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index dbedcc3..8fa0dd7 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -198,7 +198,7 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie) return 0; } -static int +static inline void virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, uint16_t needed, int use_indirect, int can_push) { @@ -209,14 +209,7 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, uint16_t head_size = txvq->hw->vtnet_hdr_size; unsigned long offs; - if (unlikely(txvq->vq_free_cnt == 0)) - return -ENOSPC; - if (unlikely(txvq->vq_free_cnt < needed)) - return -EMSGSIZE; head_idx = txvq->vq_desc_head_idx; - if (unlikely(head_idx >= txvq->vq_nentries)) - return -EFAULT; - idx = head_idx; dxp = &txvq->vq_descx[idx]; dxp->cookie = (void *)cookie; @@ -267,8 +260,6 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, txvq->vq_desc_tail_idx = idx; txvq->vq_free_cnt = (uint16_t)(txvq->vq_free_cnt - needed); vq_update_avail_ring(txvq, head_idx); - - return 0; } static inline struct rte_mbuf * @@ -828,17 +819,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) } /* Enqueue Packet buffers */ - error = virtqueue_enqueue_xmit(txvq, txm, slots, - use_indirect, can_push); - if (unlikely(error)) { - if (error == ENOSPC) - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0"); - else if (error == EMSGSIZE) - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1"); - else - PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error); - break; - } + virtqueue_enqueue_xmit(txvq, txm, slots, use_indirect, can_push); txvq->bytes += txm->pkt_len; ++txvq->packets; } -- 2.1.4