[dpdk-dev] [Q] l2fwd in examples directory
thanks bruce. I didn't know that PCI slots have direct socket affinity. Is it
static, or configurable through PCI configuration space? Well, my NUT, a
two-node NUMA system, always seems to return -1 when calling
rte_eth_dev_socket_id(portid), whether portid is 0, 1, or another value. I
would appreciate it if you could explain more about getting the affinity.

p.s. I'm using an Intel Xeon processor and a 1G NIC (82576).

On Fri, Oct 16, 2015 at 10:43 PM, Bruce Richardson <
bruce.richardson at intel.com> wrote:

> On Thu, Oct 15, 2015 at 11:08:57AM +0900, Moon-Sang Lee wrote:
> > There is code as below in examples/l2fwd/main.c and I think
> > rte_eth_dev_socket_id(portid) always returns -1 (SOCKET_ID_ANY) since
> > there is no association code between port and lcore in the example code.
>
> Can you perhaps clarify what you mean here. On modern NUMA systems, such
> as those from Intel :-), the PCI slots are directly connected to the CPU
> sockets, so the ethernet ports do indeed have a direct NUMA affinity. It's
> not something that the app needs to specify.
>
> /Bruce
>
> > (i.e. I need to find a matching lcore from lcore_queue_conf[] with
> > portid and call rte_lcore_to_socket_id(lcore_id).)
> >
> >     /* init one RX queue */
> >     fflush(stdout);
> >     ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
> >                                  rte_eth_dev_socket_id(portid),
> >                                  NULL,
> >                                  l2fwd_pktmbuf_pool);
> >     if (ret < 0)
> >         rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d, port=%u\n",
> >                  ret, (unsigned) portid);
> >
> > It works fine even though memory is allocated in a different NUMA node.
> > But I wonder whether there is a DPDK API that associates an lcore to a
> > port internally, such that rte_eth_devices[portid].pci_dev->numa_node
> > contains the proper node.
> >
> > --
> > Moon-Sang Lee, SW Engineer
> > Email: sang0627 at gmail.com
> > Wisdom begins in wonder. *Socrates*

--
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*
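For what it's worth, a minimal sketch of the fallback an application can
apply when the port's affinity is unknown (the portid-to-lcore pairing here
is assumed, not l2fwd's actual code):

    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_memory.h>

    /* Sketch: pick the NUMA socket for a port's queue memory. Use the
     * port's own affinity when the PCI topology exposes it; otherwise
     * fall back to the socket of the lcore that will poll the queue. */
    static int
    port_socket_id(uint8_t portid, unsigned lcore_id)
    {
            int socket = rte_eth_dev_socket_id(portid);

            if (socket == SOCKET_ID_ANY)    /* -1: affinity unknown */
                    socket = (int)rte_lcore_to_socket_id(lcore_id);
            return socket;
    }

The result can then be passed as the socket_id argument of
rte_eth_rx_queue_setup() in place of the bare rte_eth_dev_socket_id() call.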
[dpdk-dev] [PATCH v2 0/7] virtio ring layout optimization and simple rx/tx processing
In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores. Take RX for
example: with the generic implementation, for each guest buffer,
 a) the virtio driver allocates a descriptor from the free descriptor list
 b) it modifies the entry of the avail ring to point to the allocated descriptor
 c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache
line from the virtio core, which is a heavy cost on current CPU
implementations.

The idea of this optimization is: allocate a fixed descriptor for each entry
of the avail ring, so the avail ring will always be the same during the run.
This removes the L1 cache-line transfer from the virtio core to the vhost
core for the avail ring. Besides, no descriptor free and allocation is
needed. Most importantly, this makes vector processing possible, to further
accelerate the processing.

This is the layout for the avail ring (take 256 ring entries for example),
with each entry pointing to the descriptor with the same index.

                      avail
                      idx
                       +
                       |
    +----+----+----+--+--+-----+-----+
    | 0  | 1  | 2  | ... | 254 | 255 |  avail ring
    +-+--+-+--+-+--+--+--+--+--+--+--+
      |    |    |     |     |     |
      v    v    v     |     v     v
    +-+--+-+--+-+--+--+--+--+--+--+--+
    | 0  | 1  | 2  | ... | 254 | 255 |  desc ring
    +----+----+----+--+--+-----+-----+
                      |
                      |
    +----+----+----+--+--+-----+-----+
    | 0  | 1  | 2  | ... | 254 | 255 |  used ring
    +----+----+----+-----+-----+-----+

This is the ring layout for TX. As we need one virtio header for each
xmit packet, we have 128 slots available.

                              ++
                              ||
                              ||
    +-----+-----+-----+------++------+-----+-----+-----+
    |  0  |  1  | ... | 127  || 128  | 129 | ... | 255 |  avail ring
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
       |     |            |  ||   |     |           |
       v     v            v  ||   v     v           v
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
    | 128 | 129 | ... | 255  || 128  | 129 | ... | 255 |  desc ring for virtio_net_hdr
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
       |     |            |  ||   |     |           |
       v     v            v  ||   v     v           v
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
    |  0  |  1  | ... | 127  ||  0  |  1  | ... | 127  |  desc ring for tx data
    +-----+-----+-----+------++------+-----+-----+-----+
                              ||
                              ||
                              ++

A performance boost can be observed only if the virtio backend isn't the
bottleneck, or in the VM2VM case. There are also several vhost optimization
patches to be submitted later.

Changes in v2:
- Remove the configure macro
- Enable simple RX/TX processing when the user specifies simple txq flags
- Reword some comments and commit messages

Huawei Xie (7):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: pick simple rx/tx func

 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  13 ++
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  53 ++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 ++++
 drivers/net/virtio/virtio_rxtx_simple.c | 403 ++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   5 +
 7 files changed, 517 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

--
1.8.1.4
[dpdk-dev] [PATCH v2 3/7] virtio: rx/tx ring layout optimization
In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores. Take RX for
example: with the generic implementation, for each guest buffer,
 a) the virtio driver allocates a descriptor from the free descriptor list
 b) it modifies the entry of the avail ring to point to the allocated descriptor
 c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache
line from the virtio core, which is a heavy cost on current CPU
implementations.

The idea of this optimization is: allocate a fixed descriptor for each entry
of the avail ring, so the avail ring will always be the same during the run.
This removes the L1 cache-line transfer from the virtio core to the vhost
core for the avail ring. Besides, no descriptor free and allocation is
needed. This also makes vector processing possible, to further accelerate
the processing.

This is the layout for the avail ring (take 256 ring entries for example),
with each entry pointing to the descriptor with the same index.

                      avail
                      idx
                       +
                       |
    +----+----+----+--+--+-----+-----+
    | 0  | 1  | 2  | ... | 254 | 255 |  avail ring
    +-+--+-+--+-+--+--+--+--+--+--+--+
      |    |    |     |     |     |
      v    v    v     |     v     v
    +-+--+-+--+-+--+--+--+--+--+--+--+
    | 0  | 1  | 2  | ... | 254 | 255 |  desc ring
    +----+----+----+--+--+-----+-----+
                      |
                      |
    +----+----+----+--+--+-----+-----+
    | 0  | 1  | 2  | ... | 254 | 255 |  used ring
    +----+----+----+-----+-----+-----+

This is the ring layout for TX. As we need one virtio header for each
xmit packet, we have 128 slots available.

                              ++
                              ||
                              ||
    +-----+-----+-----+------++------+-----+-----+-----+
    |  0  |  1  | ... | 127  || 128  | 129 | ... | 255 |  avail ring
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
       |     |            |  ||   |     |           |
       v     v            v  ||   v     v           v
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
    | 128 | 129 | ... | 255  || 128  | 129 | ... | 255 |  desc ring for virtio_net_hdr
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
       |     |            |  ||   |     |           |
       v     v            v  ||   v     v           v
    +--+--+--+--+-----+---+--++---+--+--+--+-----+--+--+
    |  0  |  1  | ... | 127  ||  0  |  1  | ... | 127  |  desc ring for tx data
    +-----+-----+-----+------++------+-----+-----+-----+
                              ||
                              ||
                              ++

Signed-off-by: Huawei Xie
---
 drivers/net/virtio/virtio_rxtx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5c00e9d..7c82a6a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -302,6 +302,12 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		nbufs = 0;
 		error = ENOSPC;
 
+		if (use_simple_rxtx)
+			for (i = 0; i < vq->vq_nentries; i++) {
+				vq->vq_ring.avail->ring[i] = i;
+				vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+			}
+
 		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
 		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
 			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
@@ -332,6 +338,24 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
 			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
 	} else if (queue_type == VTNET_TQ) {
+		if (use_simple_rxtx) {
+			int mid_idx = vq->vq_nentries >> 1;
+			for (i = 0; i < mid_idx; i++) {
+				vq->vq_ring.avail->ring[i] = i + mid_idx;
+				vq->vq_ring.desc[i + mid_idx].next = i;
+				vq->vq_ring.desc[i + mid_idx].addr =
+					vq->virtio_net_hdr_mem +
+						mid_idx * vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].len =
+					vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].flags =
+					VRING_DESC_F_NEXT;
+				vq->vq_ring.desc[i].flags = 0;
+			}
+			for (i = mid_idx; i < vq->vq_nentries; i++)
+				vq->vq_ring.avail->ring[i] = i;
+		}
+
[dpdk-dev] [PATCH v2 4/7] virtio: fill RX avail ring with blank mbufs
fill avail ring with blank mbufs in virtio_dev_vring_start Signed-off-by: Huawei Xie --- drivers/net/virtio/Makefile | 2 +- drivers/net/virtio/virtio_rxtx.c| 6 ++- drivers/net/virtio/virtio_rxtx.h| 3 ++ drivers/net/virtio/virtio_rxtx_simple.c | 84 + 4 files changed, 92 insertions(+), 3 deletions(-) create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile index 930b60f..43835ba 100644 --- a/drivers/net/virtio/Makefile +++ b/drivers/net/virtio/Makefile @@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c - +SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c # this lib depends upon: DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 7c82a6a..5162ce6 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -320,8 +320,10 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type) /** * Enqueue allocated buffers* ***/ - error = virtqueue_enqueue_recv_refill(vq, m); - + if (use_simple_rxtx) + error = virtqueue_enqueue_recv_refill_simple(vq, m); + else + error = virtqueue_enqueue_recv_refill(vq, m); if (error) { rte_pktmbuf_free(m); break; diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h index a10aa69..7d2d8fe 100644 --- a/drivers/net/virtio/virtio_rxtx.h +++ b/drivers/net/virtio/virtio_rxtx.h @@ -32,3 +32,6 @@ */ #define RTE_PMD_VIRTIO_RX_MAX_BURST 64 + +int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, + struct rte_mbuf *m); diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c new file mode 100644 index 000..cac5b9f --- /dev/null +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -0,0 +1,84 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include + +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "virtio_logs.h" +#include "virtio_ethdev.h" +#include "virtqueue.h" +#include "virtio_rxtx.h" + +int __attribute__((cold)) +virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, + struct rte_mbuf *cookie) +{ + struct vq_desc_extra *dxp; + struct vring_desc *start_dp; + uint16_t desc_idx; + + desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1); + dxp = &vq->vq_descx[desc_idx]; + dxp->cookie = (void *)cookie; + vq->sw_ring[desc_idx] = cookie; + + start_dp = vq->vq_ring.desc; + start_dp[desc_idx
[dpdk-dev] [PATCH v2 1/7] virtio: add virtio_rxtx.h header file
Would move all rx/tx related code into this header file in future. Add RTE_VIRTIO_PMD_MAX_BURST. Signed-off-by: Huawei Xie --- drivers/net/virtio/virtio_ethdev.c | 1 + drivers/net/virtio/virtio_rxtx.c | 1 + drivers/net/virtio/virtio_rxtx.h | 34 ++ 3 files changed, 36 insertions(+) create mode 100644 drivers/net/virtio/virtio_rxtx.h diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 465d3cd..79a3640 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -61,6 +61,7 @@ #include "virtio_pci.h" #include "virtio_logs.h" #include "virtqueue.h" +#include "virtio_rxtx.h" static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev); diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index c5b53bb..9324f7f 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -54,6 +54,7 @@ #include "virtio_logs.h" #include "virtio_ethdev.h" #include "virtqueue.h" +#include "virtio_rxtx.h" #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len) diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h new file mode 100644 index 000..a10aa69 --- /dev/null +++ b/drivers/net/virtio/virtio_rxtx.h @@ -0,0 +1,34 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#define RTE_PMD_VIRTIO_RX_MAX_BURST 64 -- 1.8.1.4
[dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue
Add software RX ring in virtqueue. Add fake_mbuf in virtqueue for wraparound processing. Use global simple_rxtx to indicate whether simple rxtx is enabled Signed-off-by: Huawei Xie --- drivers/net/virtio/virtio_ethdev.c | 12 drivers/net/virtio/virtio_rxtx.c | 7 +++ drivers/net/virtio/virtqueue.h | 4 3 files changed, 23 insertions(+) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 79a3640..3b7b841 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -247,6 +247,9 @@ virtio_dev_queue_release(struct virtqueue *vq) { VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->queue_id); VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0); + if (vq->sw_ring) + rte_free(vq->sw_ring); + rte_free(vq); vq = NULL; } @@ -292,6 +295,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, dev->data->port_id, queue_idx); vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) + vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE); + vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring", + (RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) * + sizeof(vq->sw_ring[0]), RTE_CACHE_LINE_SIZE, socket_id); } else if (queue_type == VTNET_TQ) { snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d", dev->data->port_id, queue_idx); @@ -308,6 +314,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__); return (-ENOMEM); } + if (queue_type == VTNET_RQ && vq->sw_ring == NULL) { + PMD_INIT_LOG(ERR, "%s: Can not allocate RX soft ring", + __func__); + rte_free(vq); + return -ENOMEM; + } vq->hw = hw; vq->port_id = dev->data->port_id; diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 9324f7f..5c00e9d 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -62,6 +62,8 @@ #define VIRTIO_DUMP_PACKET(m, len) do { } while (0) #endif +static int use_simple_rxtx; + static void vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx) { @@ -299,6 +301,11 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type) /* Allocate blank mbufs for the each rx descriptor */ nbufs = 0; error = ENOSPC; + + memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf)); + for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++) + vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf; + while (!virtqueue_full(vq)) { m = rte_rxmbuf_alloc(vq->mpool); if (m == NULL) diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index 7789411..6a1ec48 100644 --- a/drivers/net/virtio/virtqueue.h +++ b/drivers/net/virtio/virtqueue.h @@ -190,6 +190,10 @@ struct virtqueue { uint16_t vq_avail_idx; phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */ + struct rte_mbuf **sw_ring; /**< RX software ring. */ + /* dummy mbuf, for wraparound when processing RX ring. */ + struct rte_mbuf fake_mbuf; + /* Statistics */ uint64_tpackets; uint64_tbytes; -- 1.8.1.4
[dpdk-dev] [PATCH v2 5/7] virtio: virtio vec rx
With fixed avail ring, we don't need to get desc idx from avail ring. virtio driver only has to deal with desc ring. This patch uses vector instruction to accelerate processing desc ring. Signed-off-by: Huawei Xie --- drivers/net/virtio/virtio_ethdev.h | 2 + drivers/net/virtio/virtio_rxtx.c| 3 + drivers/net/virtio/virtio_rxtx.h| 2 + drivers/net/virtio/virtio_rxtx_simple.c | 224 drivers/net/virtio/virtqueue.h | 1 + 5 files changed, 232 insertions(+) diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 9026d42..d7797ab 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, + uint16_t nb_pkts); /* * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 5162ce6..947fc46 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -432,6 +432,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, vq->mpool = mp; dev->data->rx_queues[queue_idx] = vq; + + virtio_rxq_vec_setup(vq); + return 0; } diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h index 7d2d8fe..831e492 100644 --- a/drivers/net/virtio/virtio_rxtx.h +++ b/drivers/net/virtio/virtio_rxtx.h @@ -33,5 +33,7 @@ #define RTE_PMD_VIRTIO_RX_MAX_BURST 64 +int virtio_rxq_vec_setup(struct virtqueue *rxq); + int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, struct rte_mbuf *m); diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index cac5b9f..ef17562 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -58,6 +58,10 @@ #include "virtqueue.h" #include "virtio_rxtx.h" +#define RTE_VIRTIO_VPMD_RX_BURST 32 +#define RTE_VIRTIO_DESC_PER_LOOP 8 +#define RTE_VIRTIO_VPMD_RX_REARM_THRESH RTE_VIRTIO_VPMD_RX_BURST + int __attribute__((cold)) virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, struct rte_mbuf *cookie) @@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, return 0; } + +static inline void +virtio_rxq_rearm_vec(struct virtqueue *rxvq) +{ + int i; + uint16_t desc_idx; + struct rte_mbuf **sw_ring; + struct vring_desc *start_dp; + int ret; + + desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1); + sw_ring = &rxvq->sw_ring[desc_idx]; + start_dp = &rxvq->vq_ring.desc[desc_idx]; + + ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring, + RTE_VIRTIO_VPMD_RX_REARM_THRESH); + if (unlikely(ret)) { + rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed += + RTE_VIRTIO_VPMD_RX_REARM_THRESH; + return; + } + + for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) { + uintptr_t p; + + p = (uintptr_t)&sw_ring[i]->rearm_data; + *(uint64_t *)p = rxvq->mbuf_initializer; + + start_dp[i].addr = + (uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr + + RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr)); + start_dp[i].len = sw_ring[i]->buf_len - + RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr); + } + + rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH; + rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH; + vq_update_avail_idx(rxvq); +} + +/* virtio vPMD receive routine, only accept(nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP) + * + * 
This routine is for non-mergable RX, one desc for each guest buffer. + * This routine is based on the RX ring layout optimization. Each entry in the + * avail ring points to the desc with the same index in the desc ring and this + * will never be changed in the driver. + * + * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet + */ +uint16_t +virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, + uint16_t nb_pkts) +{ + struct virtqueue *rxvq = rx_queue; + uint16_t nb_used; + uint16_t desc_idx; + struct vring_used_elem *rused; + struct rte_mbuf **sw_ring; + struct rte_mbuf **sw_ring_end; + uint16_t nb_pkts_received; + __m128i shuf_msk1, shuf_msk2, len_adjust; + + shuf_msk1 = _mm_set_epi8( + 0xFF, 0xFF, 0xFF, 0xFF, + 0xFF, 0xFF, /* vlan tci */ + 5, 4, /* dat len */ + 0xFF, 0xFF, 5, 4,
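For readers unfamiliar with the shuffle masks above, here is a standalone
demo of the technique (values illustrative; compile with -mssse3): a mask
byte of 0xFF zeroes the output byte, while mask bytes 4/5 copy the 16-bit
length field of a used-ring element into the positions that overlay the
mbuf's data_len/pkt_len. This is a sketch of the idea, not the driver code.

    #include <stdint.h>
    #include <stdio.h>
    #include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

    int main(void)
    {
            /* A fake used-ring element: bytes 4..5 hold a little-endian
             * 16-bit length (0x0abc here); the rest is don't-care. */
            uint8_t used[16] = {0};
            used[4] = 0xbc;
            used[5] = 0x0a;

            __m128i src = _mm_loadu_si128((const __m128i *)used);
            /* -1 (0xFF) zeroes an output byte; 4/5 broadcast the length
             * into both the pkt_len and data_len slots. */
            __m128i msk = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 5, 4,
                                       -1, -1, 5, 4, -1, -1, -1, -1);
            __m128i out = _mm_shuffle_epi8(src, msk);

            uint8_t res[16];
            _mm_storeu_si128((__m128i *)res, out);
            printf("data_len=0x%02x%02x pkt_len=0x%02x%02x\n",
                   res[9], res[8], res[5], res[4]);
            return 0;
    }

One shuffle thus fills several mbuf fields per descriptor without branches,
which is what lets the loop process RTE_VIRTIO_DESC_PER_LOOP entries at a
time.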
[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
bulk free of mbufs when clean used ring. shift operation of idx could be further saved if vq_free_cnt means free slots rather than free descriptors. Signed-off-by: Huawei Xie --- drivers/net/virtio/virtio_ethdev.h | 3 ++ drivers/net/virtio/virtio_rxtx_simple.c | 95 + 2 files changed, 98 insertions(+) diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index d7797ab..ae2d47d 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); +uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts); + /* * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us * frames larger than 1514 bytes. We do not yet support software LRO diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index ef17562..3339a24 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -288,6 +288,101 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, return nb_pkts_received; } +#define VIRTIO_TX_FREE_THRESH 32 +#define VIRTIO_TX_MAX_FREE_BUF_SZ 32 +#define VIRTIO_TX_FREE_NR 32 +/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift */ +static inline void __attribute__((always_inline)) +virtio_xmit_cleanup(struct virtqueue *vq) +{ + uint16_t i, desc_idx; + int nb_free = 0; + struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ]; + + desc_idx = (uint16_t)(vq->vq_used_cons_idx & + ((vq->vq_nentries >> 1) - 1)); + free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie; + nb_free = 1; + + for (i = 1; i < VIRTIO_TX_FREE_NR; i++) { + m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie; + if (likely(m->pool == free[0]->pool)) + free[nb_free++] = m; + else { + rte_mempool_put_bulk(free[0]->pool, (void **)free, + nb_free); + free[0] = m; + nb_free = 1; + } + } + + rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free); + vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR; + vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1); + + return; +} + +uint16_t +virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + struct virtqueue *txvq = tx_queue; + uint16_t nb_used; + uint16_t desc_idx; + struct vring_desc *start_dp; + uint16_t nb_tail, nb_commit; + int i; + uint16_t desc_idx_max = (txvq->vq_nentries >> 1) - 1; + + nb_used = VIRTQUEUE_NUSED(txvq); + rte_compiler_barrier(); + + nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1), nb_pkts); + desc_idx = (uint16_t) (txvq->vq_avail_idx & desc_idx_max); + start_dp = txvq->vq_ring.desc; + nb_tail = (uint16_t) (desc_idx_max + 1 - desc_idx); + + if (nb_used >= VIRTIO_TX_FREE_THRESH) + virtio_xmit_cleanup(tx_queue); + + if (nb_commit >= nb_tail) { + for (i = 0; i < nb_tail; i++) + txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i]; + for (i = 0; i < nb_tail; i++) { + start_dp[desc_idx].addr = + RTE_MBUF_DATA_DMA_ADDR(*tx_pkts); + start_dp[desc_idx].len = (*tx_pkts)->pkt_len; + tx_pkts++; + desc_idx++; + } + nb_commit -= nb_tail; + desc_idx = 0; + } + for (i = 0; i < nb_commit; i++) + txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i]; + for (i = 0; i < nb_commit; i++) { + start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts); + start_dp[desc_idx].len = (*tx_pkts)->pkt_len; + tx_pkts++; + desc_idx++; + } + + rte_compiler_barrier(); + + 
txvq->vq_free_cnt -= (uint16_t)(nb_pkts << 1); + txvq->vq_avail_idx += nb_pkts; + txvq->vq_ring.avail->idx = txvq->vq_avail_idx; + txvq->packets += nb_pkts; + + if (likely(nb_pkts)) { + if (unlikely(virtqueue_kick_prepare(txvq))) + virtqueue_notify(txvq); + } + + return nb_pkts; +} + int __attribute__((cold)) virtio_rxq_vec_setup(struct virtqueue *rxq) { -- 1.8.1.4
[dpdk-dev] [PATCH v2 7/7] virtio: pick simple rx/tx func
The simple rx/tx functions are enabled when the user specifies
single-segment, no-offload txq flags. Mergeable RX buffers must be disabled
to use simple rxtx.

Signed-off-by: Huawei Xie
---
 drivers/net/virtio/virtio_rxtx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 947fc46..71f8cd4 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -62,6 +62,10 @@
 #define VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+
+#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+	ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static int use_simple_rxtx;
 
 static void
@@ -471,6 +475,14 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -EINVAL;
 	}
 
+	/* Use simple rx/tx func if single segment and no offloads */
+	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS) {
+		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
+		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
+		dev->rx_pkt_burst = virtio_recv_pkts_vec;
+		use_simple_rxtx = 1;
+	}
+
 	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx,
 		vtpci_queue_idx, nb_desc, socket_id, &vq);
 	if (ret < 0) {

--
1.8.1.4
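From the application side, a minimal sketch of requesting this path at queue
setup time (port/queue ids and the descriptor count are illustrative):

    #include <rte_ethdev.h>

    /* Sketch: opt in to the simple path by declaring single-segment,
     * no-offload TX queues, matching VIRTIO_SIMPLE_FLAGS above. */
    static int
    setup_simple_txq(uint8_t port_id, uint16_t queue_id, unsigned socket_id)
    {
            struct rte_eth_txconf txconf = {
                    .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
                                 ETH_TXQ_FLAGS_NOOFFLOADS,
            };

            return rte_eth_tx_queue_setup(port_id, queue_id, 256,
                                          socket_id, &txconf);
    }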
[dpdk-dev] [PATCH] vhost-user: enable virtio 1.0
On Fri, Oct 16, 2015 at 02:52:30PM +0100, Bruce Richardson wrote:
> On Thu, Oct 15, 2015 at 04:18:59PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Oct 15, 2015 at 02:08:39PM +0300, Marcel Apfelbaum wrote:
> > > Make vhost-user virtio 1.0 compatible by adding it to the
> > > supported features and keeping the header length
> > > the same as for mergeable RX buffers.
> > >
> > > Signed-off-by: Marcel Apfelbaum
> >
> > Looks good to me
> >
> > Acked-by: Michael S. Tsirkin
> >
> > Just one question: dpdk is only supported on little-endian
> > platforms at the moment, right?
>
> A recent release added in support for PPC (patches supplied by IBM). For
> example, see:
> http://dpdk.org/browse/dpdk/commit/?id=704ba3770032c5a901719d3837845581d5a56b58
>
> /Bruce

This will require more work, then, as virtio 1.0 has a different endianness
from 0.9. It's up to you guys to decide whether correct BE support is now a
requirement for all new dpdk code. Let us know.

--
MST
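For background: virtio 1.0 fixes ring and config fields to little-endian,
while legacy 0.9 devices use guest-native byte order, so BE support means
swapping only for modern devices -- along the lines of this sketch (the
accessor name and the is_modern flag are assumptions, not existing code):

    #include <stdint.h>
    #include <rte_byteorder.h>

    /* Sketch: ring index fields are LE in virtio 1.0 but guest-native
     * on legacy devices, so a big-endian build must swap conditionally. */
    static inline uint16_t
    ring_idx_to_cpu(uint16_t raw, int is_modern)
    {
            return is_modern ? rte_le_to_cpu_16(raw) : raw;
    }

On little-endian hosts both branches compile to the same load, which is why
the issue is invisible on x86.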
[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
On Sun, 18 Oct 2015 14:29:03 +0800
Huawei Xie wrote:

> bulk free of mbufs when clean used ring.
> shift operation of idx could be further saved if vq_free_cnt means
> free slots rather than free descriptors.
>
> Signed-off-by: Huawei Xie

Did you measure this? I finished my transmit optimizations and they get a
25% performance improvement without any of these restrictions.
[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
On Sun, 18 Oct 2015 14:29:03 +0800 Huawei Xie wrote: > + > + for (i = 1; i < VIRTIO_TX_FREE_NR; i++) { > + m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie; > + if (likely(m->pool == free[0]->pool)) > + free[nb_free++] = m; > + else { > + rte_mempool_put_bulk(free[0]->pool, (void **)free, > + nb_free); > + free[0] = m; > + nb_free = 1; > + } > + } This assumes all transmits are from the same pool, which is not necessarily true.
[dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
+static inline void __attribute__((always_inline)) +virtio_xmit_cleanup(struct virtqueue *vq) +{ Please don't use always inline, frustrating the compiler isn't going to help. + uint16_t i, desc_idx; + int nb_free = 0; + struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ]; + + desc_idx = (uint16_t)(vq->vq_used_cons_idx & + ((vq->vq_nentries >> 1) - 1)); + free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie; + nb_free = 1; + + for (i = 1; i < VIRTIO_TX_FREE_NR; i++) { + m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie; + if (likely(m->pool == free[0]->pool)) + free[nb_free++] = m; + else { + rte_mempool_put_bulk(free[0]->pool, (void **)free, + nb_free); + free[0] = m; + nb_free = 1; + } + } + + rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free); + vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR; + vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1); + + return; +} Don't add return; at end of void functions. It only clutters things for no reason.
[dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue
On Sun, 18 Oct 2015 14:28:59 +0800
Huawei Xie wrote:

> +	if (vq->sw_ring)
> +		rte_free(vq->sw_ring);
> +

Do not need to test for NULL before calling rte_free. Better to just rely
on the fact that rte_free(NULL) is documented to be ok (no operation).
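Applied to the hunk above, the suggested simplification is just (sketch):

    rte_free(vq->sw_ring);  /* rte_free(NULL) is a documented no-op */
    rte_free(vq);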
[dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
This is a tested version of the virtio Tx performance improvements that I posted earlier on the list, and described at the DPDK Userspace meeting in Dublin. Together they get a 25% performance improvement for both small packet and large multi-segment packet case when testing from DPDK guest application to Linux KVM host. Stephen Hemminger (5): virtio: clean up space checks on xmit virtio: don't use unlikely for normal tx stuff virtio: use indirect ring elements virtio: use any layout on transmit virtio: optimize transmit enqueue drivers/net/virtio/virtio_ethdev.c | 38 +++--- drivers/net/virtio/virtio_ethdev.h | 4 +- drivers/net/virtio/virtio_rxtx.c | 150 - drivers/net/virtio/virtqueue.h | 19 + 4 files changed, 130 insertions(+), 81 deletions(-) -- 2.1.4
[dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit
The space check for transmit ring only needs a single conditional. I.e only need to recheck for space if there was no space in first check. This can help performance and simplifies loop. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_rxtx.c | 66 1 file changed, 27 insertions(+), 39 deletions(-) diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index c5b53bb..5b50ed0 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -745,7 +745,6 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { struct virtqueue *txvq = tx_queue; - struct rte_mbuf *txm; uint16_t nb_used, nb_tx; int error; @@ -759,57 +758,46 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) if (likely(nb_used > txvq->vq_nentries - txvq->vq_free_thresh)) virtio_xmit_cleanup(txvq, nb_used); - nb_tx = 0; + for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) { + struct rte_mbuf *txm = tx_pkts[nb_tx]; + int need = txm->nb_segs - txvq->vq_free_cnt + 1; - while (nb_tx < nb_pkts) { - /* Need one more descriptor for virtio header. */ - int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1; - - /*Positive value indicates it need free vring descriptors */ + /* Positive value indicates it need free vring descriptors */ if (unlikely(need > 0)) { nb_used = VIRTQUEUE_NUSED(txvq); virtio_rmb(); need = RTE_MIN(need, (int)nb_used); virtio_xmit_cleanup(txvq, need); - need = (int)tx_pkts[nb_tx]->nb_segs - - txvq->vq_free_cnt + 1; - } - - /* -* Zero or negative value indicates it has enough free -* descriptors to use for transmitting. -*/ - if (likely(need <= 0)) { - txm = tx_pkts[nb_tx]; - - /* Do VLAN tag insertion */ - if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) { - error = rte_vlan_insert(&txm); - if (unlikely(error)) { - rte_pktmbuf_free(txm); - ++nb_tx; - continue; - } + need = txm->nb_segs - txvq->vq_free_cnt + 1; + if (unlikely(need > 0)) { + PMD_TX_LOG(ERR, + "No free tx descriptors to transmit"); + break; } + } - /* Enqueue Packet buffers */ - error = virtqueue_enqueue_xmit(txvq, txm); + /* Do VLAN tag insertion */ + if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) { + error = rte_vlan_insert(&txm); if (unlikely(error)) { - if (error == ENOSPC) - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0"); - else if (error == EMSGSIZE) - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1"); - else - PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error); - break; + rte_pktmbuf_free(txm); + continue; } - nb_tx++; - txvq->bytes += txm->pkt_len; - } else { - PMD_TX_LOG(ERR, "No free tx descriptors to transmit"); + } + + /* Enqueue Packet buffers */ + error = virtqueue_enqueue_xmit(txvq, txm); + if (unlikely(error)) { + if (error == ENOSPC) + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0"); + else if (error == EMSGSIZE) + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1"); + else + PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error); break; } + txvq->bytes += txm->pkt_len; } txvq->packets += nb_tx; -- 2.1.4
[dpdk-dev] [PATCH 2/5] virtio: don't use unlikely for normal tx stuff
Don't use unlikely() for VLAN or ring getting full. GCC will not optimize code in unlikely paths and since these can happen with normal code that can hurt performance. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_rxtx.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 5b50ed0..dbe6665 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -763,7 +763,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) int need = txm->nb_segs - txvq->vq_free_cnt + 1; /* Positive value indicates it need free vring descriptors */ - if (unlikely(need > 0)) { + if (need > 0) { nb_used = VIRTQUEUE_NUSED(txvq); virtio_rmb(); need = RTE_MIN(need, (int)nb_used); @@ -778,7 +778,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) } /* Do VLAN tag insertion */ - if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) { + if (txm->ol_flags & PKT_TX_VLAN_PKT) { error = rte_vlan_insert(&txm); if (unlikely(error)) { rte_pktmbuf_free(txm); @@ -798,10 +798,9 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) break; } txvq->bytes += txm->pkt_len; + ++txvq->packets; } - txvq->packets += nb_tx; - if (likely(nb_tx)) { vq_update_avail_idx(txvq); -- 2.1.4
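For reference, these hints expand to __builtin_expect -- roughly what
DPDK's rte_branch_prediction.h provides (paraphrased):

    /* A placement hint only: GCC moves the "unlikely" block out of the
     * fall-through path, which hurts when the branch is actually common. */
    #ifndef likely
    #define likely(x)   __builtin_expect((x), 1)
    #endif
    #ifndef unlikely
    #define unlikely(x) __builtin_expect((x), 0)
    #endif

A VLAN-tagged packet or a briefly full ring is normal operation, not an
error path, so marking those branches cold pessimizes the common case.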
[dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements
The virtio ring in QEMU/KVM is usually limited to 256 entries and the normal way that virtio driver was queuing mbufs required nsegs + 1 ring elements. By using the indirect ring element feature if available, each packet will take only one ring slot even for multi-segment packets. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_ethdev.c | 38 +-- drivers/net/virtio/virtio_ethdev.h | 3 +- drivers/net/virtio/virtio_rxtx.c | 62 +++--- drivers/net/virtio/virtqueue.h | 19 4 files changed, 94 insertions(+), 28 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 465d3cd..cfce4f0 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -357,27 +357,45 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, vq->virtio_net_hdr_mem = 0; if (queue_type == VTNET_TQ) { + const struct rte_memzone *hdr_mz; + struct virtio_tx_region *txr; + int i; + /* * For each xmit packet, allocate a virtio_net_hdr +* and indirect ring elements */ snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone", dev->data->port_id, queue_idx); - vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name, - vq_size * hw->vtnet_hdr_size, + hdr_mz = rte_memzone_reserve_aligned(vq_name, + vq_size * sizeof(*txr), socket_id, 0, RTE_CACHE_LINE_SIZE); - if (vq->virtio_net_hdr_mz == NULL) { + if (hdr_mz == NULL) { if (rte_errno == EEXIST) - vq->virtio_net_hdr_mz = - rte_memzone_lookup(vq_name); - if (vq->virtio_net_hdr_mz == NULL) { + hdr_mz = rte_memzone_lookup(vq_name); + if (hdr_mz == NULL) { rte_free(vq); return -ENOMEM; } } - vq->virtio_net_hdr_mem = - vq->virtio_net_hdr_mz->phys_addr; - memset(vq->virtio_net_hdr_mz->addr, 0, - vq_size * hw->vtnet_hdr_size); + vq->virtio_net_hdr_mz = hdr_mz; + vq->virtio_net_hdr_mem = hdr_mz->phys_addr; + + txr = hdr_mz->addr; + memset(txr, 0, vq_size * sizeof(*txr)); + for (i = 0; i < vq_size; i++) { + struct vring_desc *start_dp = txr[i].tx_indir; + + vring_desc_init(start_dp, VIRTIO_MAX_INDIRECT); + + /* first indirect descriptor is always the tx header */ + start_dp->addr = vq->virtio_net_hdr_mem + + i * sizeof(*txr) + + offsetof(struct virtio_tx_region, tx_hdr); + + start_dp->len = vq->hw->vtnet_hdr_size; + start_dp->flags = VRING_DESC_F_NEXT; + } } else if (queue_type == VTNET_CQ) { /* Allocate a page for control vq command, data and status */ snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone", diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 9026d42..07a9265 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -64,7 +64,8 @@ 1u << VIRTIO_NET_F_CTRL_VQ | \ 1u << VIRTIO_NET_F_CTRL_RX | \ 1u << VIRTIO_NET_F_CTRL_VLAN | \ -1u << VIRTIO_NET_F_MRG_RXBUF) +1u << VIRTIO_NET_F_MRG_RXBUF | \ +1u << VIRTIO_RING_F_INDIRECT_DESC) /* * CQ function prototype diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index dbe6665..f68ab8f 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -199,14 +199,15 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie) } static int -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie) +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, + int use_indirect) { struct vq_desc_extra *dxp; struct vring_desc *start_dp; uint16_t seg_num = cookie->nb_segs; - uint16_t needed = 1 + seg_num; + uint16_t needed = use_indirect ? 
1 : 1 + seg_num; uint16_t head_idx, idx; - uint16_t head_size = txvq->hw->vtnet_hdr_size; + unsigned long offs; if (unlikely(txvq->vq_free_cnt == 0)) return -ENOSPC; @@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
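The definition of struct virtio_tx_region, added to virtqueue.h by this
patch, falls in the truncated part of the diff. Judging from the visible
uses (txr[i].tx_indir and offsetof(struct virtio_tx_region, tx_hdr)), it
plausibly looks like this sketch; treat the exact layout as an assumption:

    /* Sketch of the per-slot region carved out of the header memzone:
     * the packet header plus a small indirect descriptor table whose
     * slot 0 is pre-pointed at that header. */
    struct virtio_tx_region {
            struct virtio_net_hdr_mrg_rxbuf tx_hdr;
            struct vring_desc tx_indir[VIRTIO_MAX_INDIRECT]
                    __attribute__((__aligned__(16)));
    };

With this layout a multi-segment packet consumes a single slot in the real
ring (flagged VRING_DESC_F_INDIRECT), while the header and data segments
chain inside tx_indir.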
[dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit
Virtio supports a feature that allows sender to put transmit header prepended to data. It requires that the mbuf be writeable, correct alignment, and the feature has been negotiatied. If all this works out, then it will be the optimum way to transmit a single segment packet. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_ethdev.h | 3 +- drivers/net/virtio/virtio_rxtx.c | 66 +++--- 2 files changed, 42 insertions(+), 27 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 07a9265..f260fbb 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -65,7 +65,8 @@ 1u << VIRTIO_NET_F_CTRL_RX | \ 1u << VIRTIO_NET_F_CTRL_VLAN | \ 1u << VIRTIO_NET_F_MRG_RXBUF | \ -1u << VIRTIO_RING_F_INDIRECT_DESC) +1u << VIRTIO_RING_F_INDIRECT_DESC| \ +1u << VIRTIO_F_ANY_LAYOUT) /* * CQ function prototype diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index f68ab8f..dbedcc3 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -200,13 +200,13 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie) static int virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, - int use_indirect) + uint16_t needed, int use_indirect, int can_push) { struct vq_desc_extra *dxp; struct vring_desc *start_dp; uint16_t seg_num = cookie->nb_segs; - uint16_t needed = use_indirect ? 1 : 1 + seg_num; uint16_t head_idx, idx; + uint16_t head_size = txvq->hw->vtnet_hdr_size; unsigned long offs; if (unlikely(txvq->vq_free_cnt == 0)) @@ -223,7 +223,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, dxp->ndescs = needed; start_dp = txvq->vq_ring.desc; - if (use_indirect) { + if (can_push) { + /* put on zero'd transmit header (no offloads) */ + void *hdr = rte_pktmbuf_prepend(cookie, head_size); + + memset(hdr, 0, head_size); + } else if (use_indirect) { struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr; @@ -235,7 +240,7 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, start_dp[idx].flags = VRING_DESC_F_INDIRECT; start_dp = txr[idx].tx_indir; - idx = 0; + idx = 1; } else { offs = idx * sizeof(struct virtio_tx_region) + offsetof(struct virtio_tx_region, tx_hdr); @@ -243,22 +248,19 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs; start_dp[idx].len = txvq->hw->vtnet_hdr_size; start_dp[idx].flags = VRING_DESC_F_NEXT; + idx = start_dp[idx].next; } - for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) { - idx = start_dp[idx].next; + while (cookie != NULL) { start_dp[idx].addr = RTE_MBUF_DATA_DMA_ADDR(cookie); start_dp[idx].len = cookie->data_len; - start_dp[idx].flags = VRING_DESC_F_NEXT; + start_dp[idx].flags = cookie->next ? 
VRING_DESC_F_NEXT : 0; cookie = cookie->next; + idx = start_dp[idx].next; } - start_dp[idx].flags &= ~VRING_DESC_F_NEXT; - if (use_indirect) idx = txvq->vq_ring.desc[head_idx].next; - else - idx = start_dp[idx].next; txvq->vq_desc_head_idx = idx; if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END) @@ -761,10 +763,13 @@ virtio_recv_mergeable_pkts(void *rx_queue, return nb_rx; } + uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { struct virtqueue *txvq = tx_queue; + struct virtio_hw *hw = txvq->hw; + uint16_t hdr_size = hw->vtnet_hdr_size; uint16_t nb_used, nb_tx; int error; @@ -780,14 +785,31 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) { struct rte_mbuf *txm = tx_pkts[nb_tx]; - int use_indirect, slots, need; + int can_push = 0, use_indirect = 0, slots, need; + + /* Do VLAN tag insertion */ + if (txm->ol_flags & PKT_TX_VLAN_PKT) { + error = rte_vlan_insert(&txm); + if (unlikely(error)) { + rte_pktmbuf_free(txm); + continue; + } + } - use_indirect = vtpci_with_feature(txvq->hw, -
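The hunk that sets can_push/use_indirect is cut off above; based on the
commit message, the selection predicate plausibly reads as follows (a
sketch -- the exact checks are assumed, not the author's verbatim code):

    /* Sketch: choose the cheapest enqueue layout for this mbuf.
     * can_push: prepend the header into the mbuf's own headroom
     * (needs ANY_LAYOUT, a single writable segment, enough headroom).
     * use_indirect: one indirect descriptor for the whole chain. */
    if (vtpci_with_feature(txvq->hw, VIRTIO_F_ANY_LAYOUT) &&
        rte_mbuf_refcnt_read(txm) == 1 &&   /* mbuf is writable */
        txm->nb_segs == 1 &&
        rte_pktmbuf_headroom(txm) >= hdr_size)
            can_push = 1;
    else if (vtpci_with_feature(txvq->hw, VIRTIO_RING_F_INDIRECT_DESC))
            use_indirect = 1;

When can_push holds, the packet needs no separate header descriptor at all,
which is why it is the preferred path for single-segment traffic.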
[dpdk-dev] [PATCH 5/5] virtio: optimize transmit enqueue
All the error checks in virtqueue_enqueue_xmit are already done by the caller. Therefore they can be removed to improve performance. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_rxtx.c | 23 ++- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index dbedcc3..8fa0dd7 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -198,7 +198,7 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie) return 0; } -static int +static inline void virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, uint16_t needed, int use_indirect, int can_push) { @@ -209,14 +209,7 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, uint16_t head_size = txvq->hw->vtnet_hdr_size; unsigned long offs; - if (unlikely(txvq->vq_free_cnt == 0)) - return -ENOSPC; - if (unlikely(txvq->vq_free_cnt < needed)) - return -EMSGSIZE; head_idx = txvq->vq_desc_head_idx; - if (unlikely(head_idx >= txvq->vq_nentries)) - return -EFAULT; - idx = head_idx; dxp = &txvq->vq_descx[idx]; dxp->cookie = (void *)cookie; @@ -267,8 +260,6 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, txvq->vq_desc_tail_idx = idx; txvq->vq_free_cnt = (uint16_t)(txvq->vq_free_cnt - needed); vq_update_avail_ring(txvq, head_idx); - - return 0; } static inline struct rte_mbuf * @@ -828,17 +819,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) } /* Enqueue Packet buffers */ - error = virtqueue_enqueue_xmit(txvq, txm, slots, - use_indirect, can_push); - if (unlikely(error)) { - if (error == ENOSPC) - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0"); - else if (error == EMSGSIZE) - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1"); - else - PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error); - break; - } + virtqueue_enqueue_xmit(txvq, txm, slots, use_indirect, can_push); txvq->bytes += txm->pkt_len; ++txvq->packets; } -- 2.1.4