[dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy

2016-10-09 Thread Yuanhan Liu
On Thu, Oct 06, 2016 at 02:37:27PM +, Xu, Qian Q wrote:
> this function copy_desc_to_mbuf has changed on the dpdk-next-virtio repo. 
> Based on current dpdk-next-virtio repo, the commit ID is as below: 
> commit b4f7b43cd9d3b6413f41221051d03a23bc5f5fbe
> Author: Zhiyong Yang 
> Date:   Thu Sep 29 20:35:49 2016 +0800
> 
> Then you will find the parameter "struct vhost_virtqueue *vq" is removed, so 
> if you apply your patch on that commit ID, the build will fail, since there is 
> no vq definition but we use it in the function. 
> Could you check? Thx. 

I knew that: a rebase is needed, and I have done the rebase (locally);
just haven't sent it out yet.

--yliu

> 
> == Build lib/librte_table
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c: In function 
> 'copy_desc_to_mbuf':
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c:745:21: error: 'vq' 
> undeclared (first use in this function)
>zmbuf = get_zmbuf(vq);
>  ^
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c:745:21: note: each 
> undeclared identifier is reported only once for each function it appears in
> /home/qxu10/dpdk-zero/mk/internal/rte.compile-pre.mk:138: recipe for target 
> 'virtio_net.o' failed
> make[5]: *** [virtio_net.o] Error 1
> /home/qxu10/dpdk-zero/mk/rte.subdir.mk:61: recipe for target 'librte_vhost' 
> failed
> make[4]: *** [librte_vhost] Error 2
> make[4]: *** Waiting for unfinished jobs
> 


[dpdk-dev] [PATCH] doc: announce ABI change for ethtool app enhance

2016-10-09 Thread Qiming Yang
This patch adds a notice that an ABI change is planned for the 17.02
release so that the ethtool app can get the NIC firmware version.

Signed-off-by: Qiming Yang 
---
 doc/guides/rel_notes/deprecation.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 845d2aa..60bd7ed 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -62,3 +62,7 @@ Deprecation Notices
 * API will change for ``rte_port_source_params`` and ``rte_port_sink_params``
   structures. The member ``file_name`` data type will be changed from
   ``char *`` to ``const char *``. This change targets release 16.11.
+
+* An ABI change is planned for 17.02: the ``rte_eth_dev_info`` structure
+  will be extended with a new member ``fw_version`` in order to store
+  the NIC firmware version.
-- 
2.7.4



[dpdk-dev] [PATCH v7] net/virtio: add set_mtu in virtio

2016-10-09 Thread Yuanhan Liu
On Thu, Sep 29, 2016 at 04:31:30PM -0400, Dey wrote:
>  /*
>   * dev_ops for virtio, bare necessities for basic operation
>   */
> @@ -677,7 +685,6 @@ static const struct eth_dev_ops virtio_eth_dev_ops = {
>   .allmulticast_enable = virtio_dev_allmulticast_enable,
>   .allmulticast_disable= virtio_dev_allmulticast_disable,
> + .mtu_set = virtio_mtu_set,
>   .dev_infos_get   = virtio_dev_info_get,
>   .stats_get   = virtio_dev_stats_get,
>   .xstats_get  = virtio_dev_xstats_get,
> -- 
> 2.9.3.windows.1

Your patch is malformed: I got an error while trying to apply it.

patch:  malformed patch at line 167:   * dev_ops for virtio,
bare necessities for basic operation

It seems the way you generated the patch was wrong.

Anyway, I applied it manually, with the "- frame_size" fix as well
as a few more minor coding style fixes.

So applied to dpdk-next-virtio.

--yliu


[dpdk-dev] [PATCH] net/virtio: add missing driver name

2016-10-09 Thread Yuanhan Liu
On Fri, Oct 07, 2016 at 06:57:41PM +0530, Shreyansh Jain wrote:
> On Friday 07 October 2016 06:33 PM, David Marchand wrote:
> >The driver name has been lost with the eal rework.
> >Restore it.
> >
> >Fixes: c830cb295411 ("drivers: use PCI registration macro")
> >
> >Signed-off-by: David Marchand 
> >---
> > drivers/net/virtio/virtio_ethdev.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> >diff --git a/drivers/net/virtio/virtio_ethdev.c 
> >b/drivers/net/virtio/virtio_ethdev.c
> >index b4dfc0a..809ebf7 100644
> >--- a/drivers/net/virtio/virtio_ethdev.c
> >+++ b/drivers/net/virtio/virtio_ethdev.c
> >@@ -1304,6 +1304,9 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
> >
> > static struct eth_driver rte_virtio_pmd = {
> > .pci_drv = {
> >+.driver = {
> >+.name = "net_virtio",
> >+},
> > .id_table = pci_id_virtio_map,
> > .drv_flags = RTE_PCI_DRV_DETACHABLE,
> > .probe = rte_eth_dev_pci_probe,
> >
> 
> Indeed. For MLX4/5 and Virtio, DRIVER_REGISTER_* is not being used which
> takes this 'name'. In this case, it was missed.
> 
> Acked-by: Shreyansh Jain 

Applied to dpdk-next-virtio.

Thanks.

--yliu


[dpdk-dev] [PATCH v3 0/7] vhost: add dequeue zero copy support

2016-10-09 Thread Yuanhan Liu
This patch set enables vhost dequeue zero copy. The majority of the work goes
to patch 4: "vhost: add dequeue zero copy".

The basic idea of dequeue zero copy is, instead of copying data from the
desc buf, here we let the mbuf reference the desc buf addr directly.

The major issue behind that is how and when to update the used ring.
You could check the commit log of patch 4 for more details.

Patch 5 introduces a new flag, RTE_VHOST_USER_DEQUEUE_ZERO_COPY, to enable
dequeue zero copy, which is disabled by default.

The performance gain is quite impressive. For a simple dequeue workload
(running rxonly in vhost-pmd and running txonly in guest testpmd), it yields
50+% performance boost for packet size 1500B. For VM2VM iperf test case,
it's even better: about 70% boost.

For small packets, the performance is worse (as expected, since the extra
overhead introduced by zero copy outweighs the benefit of saving a few
bytes of copy).

v3: - rebase: mainly for removing conflicts with the Tx indirect patch
- don't update last_used_idx twice for zero-copy mode
- handle two missing "Tx -> dequeue" renames in log and usage

v2: - renamed "tx zero copy" to "dequeue zero copy", to reduce confusion.
- handle the case that a desc buf might span 2 host phys pages
- use MAP_POPULATE to let kernel populate the page tables
- updated release note
- doc-ed the limitations for the vm2nic case
- merge 2 contiguous guest phys memory regions
- and a few more trivial changes; please see them in the corresponding
  patches

---
Yuanhan Liu (7):
  vhost: simplify memory regions handling
  vhost: get guest/host physical address mappings
  vhost: introduce last avail idx for dequeue
  vhost: add dequeue zero copy
  vhost: add a flag to enable dequeue zero copy
  examples/vhost: add an option to enable dequeue zero copy
  net/vhost: add an option to enable dequeue zero copy

 doc/guides/prog_guide/vhost_lib.rst|  35 +++-
 doc/guides/rel_notes/release_16_11.rst |  13 ++
 drivers/net/vhost/rte_eth_vhost.c  |  13 ++
 examples/vhost/main.c  |  19 +-
 lib/librte_vhost/rte_virtio_net.h  |   1 +
 lib/librte_vhost/socket.c  |   5 +
 lib/librte_vhost/vhost.c   |  12 ++
 lib/librte_vhost/vhost.h   | 102 ---
 lib/librte_vhost/vhost_user.c  | 315 ++---
 lib/librte_vhost/virtio_net.c  | 196 +---
 10 files changed, 549 insertions(+), 162 deletions(-)

-- 
1.9.0



[dpdk-dev] [PATCH v3 1/7] vhost: simplify memory regions handling

2016-10-09 Thread Yuanhan Liu
Due to historical reasons (vhost-cuse came before vhost-user), some
fields for maintaining the vhost-user memory mappings (such as the
mmapped address and size, with which we can then unmap on destroy) are
kept in the "orig_region_map" struct, a structure that is defined only
in the vhost-user source file.

The right way to go is to remove that structure and move all those
fields into the virtio_memory_region struct. We simply couldn't do that
before, because it would have broken the ABI.

Now, thanks to the ABI refactoring, it's no longer a blocking issue.
And here it goes: this patch removes orig_region_map and redefines
virtio_memory_region to include all the necessary info.

With that, we can simplify the guest/host address conversion a bit.

Signed-off-by: Yuanhan Liu 
Reviewed-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost.h  |  49 ++--
 lib/librte_vhost/vhost_user.c | 173 +-
 2 files changed, 91 insertions(+), 131 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index c2dfc3c..df2107b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -143,12 +143,14 @@ struct virtio_net {
  * Information relating to memory regions including offsets to
  * addresses in QEMUs memory file.
  */
-struct virtio_memory_regions {
-   uint64_t guest_phys_address;
-   uint64_t guest_phys_address_end;
-   uint64_t memory_size;
-   uint64_t userspace_address;
-   uint64_t address_offset;
+struct virtio_memory_region {
+   uint64_t guest_phys_addr;
+   uint64_t guest_user_addr;
+   uint64_t host_user_addr;
+   uint64_t size;
+   void *mmap_addr;
+   uint64_t mmap_size;
+   int fd;
 };


@@ -156,12 +158,8 @@ struct virtio_memory_regions {
  * Memory structure includes region and mapping information.
  */
 struct virtio_memory {
-   /* Base QEMU userspace address of the memory file. */
-   uint64_t base_address;
-   uint64_t mapped_address;
-   uint64_t mapped_size;
uint32_t nregions;
-   struct virtio_memory_regions regions[0];
+   struct virtio_memory_region regions[0];
 };


@@ -200,26 +198,23 @@ extern uint64_t VHOST_FEATURES;
 #define MAX_VHOST_DEVICE   1024
 extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];

-/**
- * Function to convert guest physical addresses to vhost virtual addresses.
- * This is used to convert guest virtio buffer addresses.
- */
+/* Convert guest physical Address to host virtual address */
 static inline uint64_t __attribute__((always_inline))
-gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
+gpa_to_vva(struct virtio_net *dev, uint64_t gpa)
 {
-   struct virtio_memory_regions *region;
-   uint32_t regionidx;
-   uint64_t vhost_va = 0;
-
-   for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-   region = &dev->mem->regions[regionidx];
-   if ((guest_pa >= region->guest_phys_address) &&
-   (guest_pa <= region->guest_phys_address_end)) {
-   vhost_va = region->address_offset + guest_pa;
-   break;
+   struct virtio_memory_region *reg;
+   uint32_t i;
+
+   for (i = 0; i < dev->mem->nregions; i++) {
+   reg = &dev->mem->regions[i];
+   if (gpa >= reg->guest_phys_addr &&
+   gpa <  reg->guest_phys_addr + reg->size) {
+   return gpa - reg->guest_phys_addr +
+  reg->host_user_addr;
}
}
-   return vhost_va;
+
+   return 0;
 }

 struct virtio_net_device_ops const *notify_ops;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index eee99e9..49585b8 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -74,18 +74,6 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
 };

-struct orig_region_map {
-   int fd;
-   uint64_t mapped_address;
-   uint64_t mapped_size;
-   uint64_t blksz;
-};
-
-#define orig_region(ptr, nregions) \
-   ((struct orig_region_map *)RTE_PTR_ADD((ptr), \
-   sizeof(struct virtio_memory) + \
-   sizeof(struct virtio_memory_regions) * (nregions)))
-
 static uint64_t
 get_blk_size(int fd)
 {
@@ -99,18 +87,17 @@ get_blk_size(int fd)
 static void
 free_mem_region(struct virtio_net *dev)
 {
-   struct orig_region_map *region;
-   unsigned int idx;
+   uint32_t i;
+   struct virtio_memory_region *reg;

if (!dev || !dev->mem)
return;

-   region = orig_region(dev->mem, dev->mem->nregions);
-   for (idx = 0; idx < dev->mem->nregions; idx++) {
-   if (region[idx].mapped_address) {
-   munmap((void *)(uintptr_t)region[idx].mapped_address,
-   region[idx].mapped_size);
-   close(r

[dpdk-dev] [PATCH v3 2/7] vhost: get guest/host physical address mappings

2016-10-09 Thread Yuanhan Liu
So that we can convert a guest physical address to a host physical
address, which will be used in the later Tx zero copy implementation.

MAP_POPULATE is set while mmapping guest memory regions, to make
sure the page tables are set up; then rte_mem_virt2phy() can
yield the proper physical address.
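The MAP_POPULATE idea above can be sketched with a plain mmap() call. This is a standalone illustration, not DPDK code: `map_populated` is a hypothetical helper name, and an anonymous mapping stands in for the guest memory fd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map a region with MAP_POPULATE so the kernel pre-faults the pages
 * and sets up the page tables at mmap time. A later virtual-to-physical
 * lookup (rte_mem_virt2phy() in DPDK) then finds a valid mapping
 * instead of a not-yet-faulted hole.
 */
static void *
map_populated(size_t size)
{
	void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;
	return addr;
}
```

The real code maps the guest memory file descriptor instead of an anonymous region; the MAP_POPULATE flag plays the same role in both cases.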

Signed-off-by: Yuanhan Liu 
Reviewed-by: Maxime Coquelin 
---

v2: - use the MAP_POPULATE option to make sure the page tables will
  already be set up while getting the phys address

- do a simple merge if the last 2 pages are contiguous

- dump guest pages only in debug mode
---
 lib/librte_vhost/vhost.h  |  30 +
 lib/librte_vhost/vhost_user.c | 100 +-
 2 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index df2107b..2d52987 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -114,6 +114,12 @@ struct vhost_virtqueue {
  #define VIRTIO_F_VERSION_1 32
 #endif

+struct guest_page {
+   uint64_t guest_phys_addr;
+   uint64_t host_phys_addr;
+   uint64_t size;
+};
+
 /**
  * Device structure contains all configuration information relating
  * to the device.
@@ -137,6 +143,10 @@ struct virtio_net {
uint64_tlog_addr;
struct ether_addr   mac;

+   uint32_tnr_guest_pages;
+   uint32_tmax_guest_pages;
+   struct guest_page   *guest_pages;
+
 } __rte_cache_aligned;

 /**
@@ -217,6 +227,26 @@ gpa_to_vva(struct virtio_net *dev, uint64_t gpa)
return 0;
 }

+/* Convert guest physical address to host physical address */
+static inline phys_addr_t __attribute__((always_inline))
+gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t size)
+{
+   uint32_t i;
+   struct guest_page *page;
+
+   for (i = 0; i < dev->nr_guest_pages; i++) {
+   page = &dev->guest_pages[i];
+
+   if (gpa >= page->guest_phys_addr &&
+   gpa + size < page->guest_phys_addr + page->size) {
+   return gpa - page->guest_phys_addr +
+  page->host_phys_addr;
+   }
+   }
+
+   return 0;
+}
+
 struct virtio_net_device_ops const *notify_ops;
 struct virtio_net *get_device(int vid);

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 49585b8..e651912 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -372,6 +372,91 @@ vhost_user_set_vring_base(struct virtio_net *dev,
return 0;
 }

+static void
+add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr,
+  uint64_t host_phys_addr, uint64_t size)
+{
+   struct guest_page *page, *last_page;
+
+   if (dev->nr_guest_pages == dev->max_guest_pages) {
+   dev->max_guest_pages *= 2;
+   dev->guest_pages = realloc(dev->guest_pages,
+   dev->max_guest_pages * sizeof(*page));
+   }
+
+   if (dev->nr_guest_pages > 0) {
+   last_page = &dev->guest_pages[dev->nr_guest_pages - 1];
+   /* merge if the two pages are continuous */
+   if (host_phys_addr == last_page->host_phys_addr +
+ last_page->size) {
+   last_page->size += size;
+   return;
+   }
+   }
+
+   page = &dev->guest_pages[dev->nr_guest_pages++];
+   page->guest_phys_addr = guest_phys_addr;
+   page->host_phys_addr  = host_phys_addr;
+   page->size = size;
+}
+
+static void
+add_guest_pages(struct virtio_net *dev, struct virtio_memory_region *reg,
+   uint64_t page_size)
+{
+   uint64_t reg_size = reg->size;
+   uint64_t host_user_addr  = reg->host_user_addr;
+   uint64_t guest_phys_addr = reg->guest_phys_addr;
+   uint64_t host_phys_addr;
+   uint64_t size;
+
+   host_phys_addr = rte_mem_virt2phy((void *)(uintptr_t)host_user_addr);
+   size = page_size - (guest_phys_addr & (page_size - 1));
+   size = RTE_MIN(size, reg_size);
+
+   add_one_guest_page(dev, guest_phys_addr, host_phys_addr, size);
+   host_user_addr  += size;
+   guest_phys_addr += size;
+   reg_size -= size;
+
+   while (reg_size > 0) {
+   host_phys_addr = rte_mem_virt2phy((void *)(uintptr_t)
+ host_user_addr);
+   add_one_guest_page(dev, guest_phys_addr, host_phys_addr,
+  page_size);
+
+   host_user_addr  += page_size;
+   guest_phys_addr += page_size;
+   reg_size -= page_size;
+   }
+}
+
+#ifdef RTE_LIBRTE_VHOST_DEBUG
+/* TODO: enable it only in debug mode? */
+static void
+dump_guest_pages(struct virtio_net *dev)
+{
+   uint32_t i;
+   struct guest_page *page;
+
+   for (i = 0; i < dev->nr_guest_pages; i++) {

[dpdk-dev] [PATCH v3 3/7] vhost: introduce last avail idx for dequeue

2016-10-09 Thread Yuanhan Liu
So far, we retrieve both the used ring and avail ring idx via the var
last_used_idx; that is not a problem, because the used ring is updated
immediately after those avail entries are consumed.

But that's no longer true when dequeue zero copy is enabled: the used
ring is updated only when the mbuf is consumed. Thus, we need another
var to note the last avail ring idx we have consumed.

Therefore, last_avail_idx is introduced.
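As a standalone illustration (a toy model with hypothetical names, not the actual vhost structs), the two indexes diverge under zero copy, so the number of free entries must be computed against the avail-side counter while the used-ring slot is still derived from the used-side one:

```c
#include <stdint.h>

/*
 * Toy model of the two ring counters. With zero copy, avail entries are
 * consumed ahead of the used ring being filled, so last_avail_idx can
 * run ahead of last_used_idx.
 */
struct vq_idx {
	uint16_t last_avail_idx; /* next avail entry to consume */
	uint16_t last_used_idx;  /* next used-ring slot to fill */
};

static uint16_t
free_entries(const struct vq_idx *vq, uint16_t avail_idx)
{
	/* unsigned 16-bit wrap-around makes this correct across idx wrap */
	return (uint16_t)(avail_idx - vq->last_avail_idx);
}
```

Computing free entries against last_used_idx instead would over-count whenever used-ring updates are deferred, which is exactly the zero-copy case.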

Signed-off-by: Yuanhan Liu 
Reviewed-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost.h  |  2 +-
 lib/librte_vhost/vhost_user.c |  6 --
 lib/librte_vhost/virtio_net.c | 19 +++
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 2d52987..8565fa1 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -70,7 +70,7 @@ struct vhost_virtqueue {
struct vring_used   *used;
uint32_tsize;

-   /* Last index used on the available ring */
+   uint16_tlast_avail_idx;
volatile uint16_t   last_used_idx;
 #define VIRTIO_INVALID_EVENTFD (-1)
 #define VIRTIO_UNINITIALIZED_EVENTFD   (-2)
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index e651912..a92377a 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -343,7 +343,8 @@ vhost_user_set_vring_addr(struct virtio_net *dev, struct 
vhost_vring_addr *addr)
"last_used_idx (%u) and vq->used->idx (%u) mismatches; "
"some packets maybe resent for Tx and dropped for Rx\n",
vq->last_used_idx, vq->used->idx);
-   vq->last_used_idx = vq->used->idx;
+   vq->last_used_idx  = vq->used->idx;
+   vq->last_avail_idx = vq->used->idx;
}

vq->log_guest_addr = addr->log_guest_addr;
@@ -367,7 +368,8 @@ static int
 vhost_user_set_vring_base(struct virtio_net *dev,
  struct vhost_vring_state *state)
 {
-   dev->virtqueue[state->index]->last_used_idx = state->num;
+   dev->virtqueue[state->index]->last_used_idx  = state->num;
+   dev->virtqueue[state->index]->last_avail_idx = state->num;

return 0;
 }
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index a59c39b..70301a5 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -851,16 +851,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
}
}

-   avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
-   free_entries = avail_idx - vq->last_used_idx;
+   free_entries = *((volatile uint16_t *)&vq->avail->idx) -
+   vq->last_avail_idx;
if (free_entries == 0)
goto out;

LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);

-   /* Prefetch available ring to retrieve head indexes. */
-   used_idx = vq->last_used_idx & (vq->size - 1);
-   rte_prefetch0(&vq->avail->ring[used_idx]);
+   /* Prefetch available and used ring */
+   avail_idx = vq->last_avail_idx & (vq->size - 1);
+   used_idx  = vq->last_used_idx  & (vq->size - 1);
+   rte_prefetch0(&vq->avail->ring[avail_idx]);
rte_prefetch0(&vq->used->ring[used_idx]);

count = RTE_MIN(count, MAX_PKT_BURST);
@@ -870,8 +871,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,

/* Retrieve all of the head indexes first to avoid caching issues. */
for (i = 0; i < count; i++) {
-   used_idx = (vq->last_used_idx + i) & (vq->size - 1);
-   desc_indexes[i] = vq->avail->ring[used_idx];
+   avail_idx = (vq->last_avail_idx + i) & (vq->size - 1);
+   used_idx  = (vq->last_used_idx  + i) & (vq->size - 1);
+   desc_indexes[i] = vq->avail->ring[avail_idx];

vq->used->ring[used_idx].id  = desc_indexes[i];
vq->used->ring[used_idx].len = 0;
@@ -921,7 +923,8 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
rte_smp_wmb();
rte_smp_rmb();
vq->used->idx += i;
-   vq->last_used_idx += i;
+   vq->last_avail_idx += i;
+   vq->last_used_idx  += i;
vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
sizeof(vq->used->idx));

-- 
1.9.0



[dpdk-dev] [PATCH v3 4/7] vhost: add dequeue zero copy

2016-10-09 Thread Yuanhan Liu
The basic idea of dequeue zero copy is, instead of copying data from
the desc buf, here we let the mbuf reference the desc buf addr directly.

Doing so, however, has one major issue: we can't update the used ring
at the end of rte_vhost_dequeue_burst. Because we don't do the copy
here, an update of the used ring would let the driver reclaim the
desc buf. As a result, DPDK might reference a stale memory region.

To update the used ring properly, this patch does several tricks:

- when an mbuf references a desc buf, its refcnt is incremented by 1.

  This pins the mbuf, so that an mbuf free from DPDK
  won't actually free it; instead, the refcnt is decremented by 1.

- We chain all those mbufs together (by tailq)

  And we check it every time at the rte_vhost_dequeue_burst entrance,
  to see if the mbuf is freed (when refcnt equals 1). If that
  happens, it means we are the last user of this mbuf and we are
  safe to update the used ring.

- "struct zcopy_mbuf" is introduced, to associate an mbuf with the
  right desc idx.

Dequeue zero copy is introduced for performance reasons, and some rough
tests show about a 50% performance boost for packet size 1500B. For small
packets (e.g. 64B), it actually slows down a bit (by up to
15%). That is expected, because this patch introduces some extra work,
and that outweighs the benefit of saving a few bytes of copy.
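The refcnt pinning and "last user" check described above can be modeled in a few lines. This is a toy sketch with hypothetical names, not the actual rte_mbuf API:

```c
#include <stdint.h>

/* Toy stand-in for an mbuf: only the reference counter matters here. */
struct toy_mbuf {
	uint16_t refcnt;
};

/* vhost pins the mbuf by holding an extra reference */
static void
zcopy_pin(struct toy_mbuf *m)
{
	m->refcnt++;
}

/* an app-side "free" then only drops one reference */
static void
toy_mbuf_free(struct toy_mbuf *m)
{
	m->refcnt--;
}

/*
 * vhost is the last user once refcnt drops back to 1; only then is it
 * safe to update the used ring and let the guest reclaim the desc buf.
 */
static int
zcopy_mbuf_reclaimable(const struct toy_mbuf *m)
{
	return m->refcnt == 1;
}
```

In the patch itself, this check runs over the tailq of zcopy_mbuf entries at each rte_vhost_dequeue_burst call.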

Signed-off-by: Yuanhan Liu 
Reviewed-by: Maxime Coquelin 
---

v3: - rebase on top of the indirect enabling patch

- don't update last_used_idx twice for zero-copy

- handle the missing "tx -> dequeue" rename in log message

v2: - rename "Tx zero copy" to "dequeue zero copy" to reduce confusion

- use unlikely/likely for the dequeue_zero_copy check, as it's not enabled
  by default, and it has some limitations in the vm2nic case.

- handle the case that a desc buf might span 2 host phys pages

- reset nr_zmbuf to 0 at set_vring_num

- set the zmbuf_size to vq->size, but not the double of it.
---
 lib/librte_vhost/vhost.c  |   2 +
 lib/librte_vhost/vhost.h  |  22 +-
 lib/librte_vhost/vhost_user.c |  42 +-
 lib/librte_vhost/virtio_net.c | 179 +-
 4 files changed, 224 insertions(+), 21 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 30bb0ce..dbf5d1b 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -142,6 +142,8 @@ init_vring_queue(struct vhost_virtqueue *vq, int qp_idx)
/* always set the default vq pair to enabled */
if (qp_idx == 0)
vq->enabled = 1;
+
+   TAILQ_INIT(&vq->zmbuf_list);
 }

 static void
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 8565fa1..be8a398 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 

@@ -61,6 +62,19 @@ struct buf_vector {
uint32_t desc_idx;
 };

+/*
+ * A structure to hold some fields needed in zero copy code path,
+ * mainly for associating an mbuf with the right desc_idx.
+ */
+struct zcopy_mbuf {
+   struct rte_mbuf *mbuf;
+   uint32_t desc_idx;
+   uint16_t in_use;
+
+   TAILQ_ENTRY(zcopy_mbuf) next;
+};
+TAILQ_HEAD(zcopy_mbuf_list, zcopy_mbuf);
+
 /**
  * Structure contains variables relevant to RX/TX virtqueues.
  */
@@ -85,6 +99,12 @@ struct vhost_virtqueue {

/* Physical address of used ring, for logging */
uint64_tlog_guest_addr;
+
+   uint16_tnr_zmbuf;
+   uint16_tzmbuf_size;
+   uint16_tlast_zmbuf_idx;
+   struct zcopy_mbuf   *zmbufs;
+   struct zcopy_mbuf_list  zmbuf_list;
 } __rte_cache_aligned;

 /* Old kernels have no such macro defined */
@@ -135,6 +155,7 @@ struct virtio_net {
/* to tell if we need broadcast rarp packet */
rte_atomic16_t  broadcast_rarp;
uint32_tvirt_qp_nb;
+   int dequeue_zero_copy;
struct vhost_virtqueue  *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
charifname[IF_NAME_SZ];
@@ -146,7 +167,6 @@ struct virtio_net {
uint32_tnr_guest_pages;
uint32_tmax_guest_pages;
struct guest_page   *guest_pages;
-
 } __rte_cache_aligned;

 /**
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index a92377a..3074227 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -180,7 +180,23 @@ static int
 vhost_user_set_vring_num(struct virtio_net *dev,
 struct vhost_vring_state *state)
 {
-   dev->virtqueue[state->index]->size = state->num;
+   struct vhost_virtqueue *vq = dev->virtqueue[state->index];
+
+   vq->size = state->num;
+
+   if (dev->dequeue_zero_copy) {

[dpdk-dev] [PATCH v3 5/7] vhost: add a flag to enable dequeue zero copy

2016-10-09 Thread Yuanhan Liu
Dequeue zero copy is disabled by default. This patch adds a new flag,
``RTE_VHOST_USER_DEQUEUE_ZERO_COPY``, to explicitly enable it.

Signed-off-by: Yuanhan Liu 
Reviewed-by: Maxime Coquelin 
---

v2: - update release log
- doc dequeue zero copy in detail
---
 doc/guides/prog_guide/vhost_lib.rst| 35 +-
 doc/guides/rel_notes/release_16_11.rst | 13 +
 lib/librte_vhost/rte_virtio_net.h  |  1 +
 lib/librte_vhost/socket.c  |  5 +
 lib/librte_vhost/vhost.c   | 10 ++
 lib/librte_vhost/vhost.h   |  1 +
 6 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst 
b/doc/guides/prog_guide/vhost_lib.rst
index 6b0c6b2..3fa9dd7 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -79,7 +79,7 @@ The following is an overview of the Vhost API functions:
   ``/dev/path`` character device file will be created. For vhost-user server
   mode, a Unix domain socket file ``path`` will be created.

-  Currently two flags are supported (these are valid for vhost-user only):
+  Currently supported flags are (these are valid for vhost-user only):

   - ``RTE_VHOST_USER_CLIENT``

@@ -97,6 +97,39 @@ The following is an overview of the Vhost API functions:
 This reconnect option is enabled by default. However, it can be turned off
 by setting this flag.

+  - ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY``
+
+Dequeue zero copy will be enabled when this flag is set. It is disabled by
+default.
+
+There are some things (including limitations) you might want to know while
+setting this flag:
+
+* zero copy is not good for small packets (typically for packet sizes below
+  512 bytes).
+
+* zero copy is really good for the VM2VM case. For iperf between two VMs, the
+  boost could be above 70% (when TSO is enabled).
+
+* for VM2NIC case, the ``nb_tx_desc`` has to be small enough: <= 64 if virtio
+  indirect feature is not enabled and <= 128 if it is enabled.
+
+  This is because when dequeue zero copy is enabled, guest Tx used vring will
+  be updated only when corresponding mbuf is freed. Thus, the nb_tx_desc
+  has to be small enough so that the PMD driver will run out of available
+  Tx descriptors and free mbufs timely. Otherwise, guest Tx vring would be
+  starved.
+
+* Guest memory should be backed by huge pages to achieve better
+  performance. Using 1G page size is the best.
+
+  When dequeue zero copy is enabled, the guest phys address and host phys
+  address mapping has to be established. Using non-huge pages means far
+  more page segments. To make it simple, DPDK vhost does a linear search
+  of those segments, thus the fewer the segments, the quicker we will get
+  the mapping. NOTE: we may speed it up by using radix tree searching in
+  the future.
+
 * ``rte_vhost_driver_session_start()``

   This function starts the vhost session loop to handle vhost messages. It
diff --git a/doc/guides/rel_notes/release_16_11.rst 
b/doc/guides/rel_notes/release_16_11.rst
index 905186a..2180d8d 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -36,6 +36,19 @@ New Features

  This section is a comment. Make sure to start the actual text at the 
margin.

+* **Added vhost-user dequeue zero copy support.**
+
+  The copy in the dequeue path is avoided, which is meant to improve performance.
+  In the VM2VM case, the boost is quite impressive: the bigger the packet size,
+  the bigger the performance boost you may get. For the VM2NIC case, however,
+  there are some limitations, and the boost is not as impressive as in the
+  VM2VM case. It may even drop quite a bit for small packets.
+
+  For such reasons, this feature is disabled by default. It can be enabled when
+  the ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` flag is given. Check the vhost
+  section of the programming guide for more information.
+
+
 * **Added vhost-user indirect descriptors support.**

   If indirect descriptor feature is negotiated, each packet sent by the guest
diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index a88aecd..c53ff64 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -53,6 +53,7 @@

 #define RTE_VHOST_USER_CLIENT  (1ULL << 0)
 #define RTE_VHOST_USER_NO_RECONNECT(1ULL << 1)
+#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY   (1ULL << 2)

 /* Enum for virtqueue management. */
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index bf03f84..967cb65 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -62,6 +62,7 @@ struct vhost_user_socket {
int connfd;
bool is_server;
bool reconnect;
+   bool dequeue_zero_copy;
 };

 struct vhost_user_connection {
@@ -203,6 +204,9 @@ vhost_user_add_connection(int f

[dpdk-dev] [PATCH v3 6/7] examples/vhost: add an option to enable dequeue zero copy

2016-10-09 Thread Yuanhan Liu
Add an option, --dequeue-zero-copy, to enable dequeue zero copy.

One thing worth noting while using dequeue zero copy is that nb_tx_desc
has to be small enough so that the eth driver will hit the mbuf free
threshold easily and thus free mbuf more frequently.

The reason behind that is, when dequeue zero copy is enabled, guest Tx
used vring will be updated only when the corresponding mbuf is freed. If mbufs
are not freed frequently, the guest Tx vring could be starved.

Signed-off-by: Yuanhan Liu 
Reviewed-by: Maxime Coquelin 
---

v3: - handle missing "Tx --> dequeue" renaming in usage
---
 examples/vhost/main.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 195b1db..91000e8 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -127,6 +127,7 @@ static uint32_t enable_tx_csum;
 static uint32_t enable_tso;

 static int client_mode;
+static int dequeue_zero_copy;

 /* Specify timeout (in useconds) between retries on RX. */
 static uint32_t burst_rx_delay_time = BURST_RX_WAIT_US;
@@ -294,6 +295,17 @@ port_init(uint8_t port)

rx_ring_size = RTE_TEST_RX_DESC_DEFAULT;
tx_ring_size = RTE_TEST_TX_DESC_DEFAULT;
+
+   /*
+* When dequeue zero copy is enabled, guest Tx used vring will be
+* updated only when corresponding mbuf is freed. Thus, the nb_tx_desc
+* (tx_ring_size here) must be small enough so that the driver will
+* hit the free threshold easily and free mbufs timely. Otherwise,
+* guest Tx vring would be starved.
+*/
+   if (dequeue_zero_copy)
+   tx_ring_size = 64;
+
tx_rings = (uint16_t)rte_lcore_count();

retval = validate_num_devices(MAX_DEVICES);
@@ -470,7 +482,8 @@ us_vhost_usage(const char *prgname)
"   --socket-file: The path of the socket file.\n"
"   --tx-csum [0|1] disable/enable TX checksum offload.\n"
"   --tso [0|1] disable/enable TCP segment offload.\n"
-   "   --client register a vhost-user socket as client 
mode.\n",
+   "   --client register a vhost-user socket as client mode.\n"
+   "   --dequeue-zero-copy enables dequeue zero copy\n",
   prgname);
 }

@@ -495,6 +508,7 @@ us_vhost_parse_args(int argc, char **argv)
{"tx-csum", required_argument, NULL, 0},
{"tso", required_argument, NULL, 0},
{"client", no_argument, &client_mode, 1},
+   {"dequeue-zero-copy", no_argument, &dequeue_zero_copy, 1},
{NULL, 0, 0, 0},
};

@@ -1501,6 +1515,9 @@ main(int argc, char *argv[])
if (client_mode)
flags |= RTE_VHOST_USER_CLIENT;

+   if (dequeue_zero_copy)
+   flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+
/* Register vhost user driver to handle vhost messages. */
for (i = 0; i < nb_sockets; i++) {
ret = rte_vhost_driver_register
-- 
1.9.0



[dpdk-dev] [PATCH v3 7/7] net/vhost: add an option to enable dequeue zero copy

2016-10-09 Thread Yuanhan Liu
Add an option, dequeue-zero-copy, to enable this feature in vhost-pmd.

Signed-off-by: Yuanhan Liu 
Reviewed-by: Maxime Coquelin 
---
 drivers/net/vhost/rte_eth_vhost.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index c1d09a0..86c7a4d 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -51,6 +51,7 @@
 #define ETH_VHOST_IFACE_ARG"iface"
 #define ETH_VHOST_QUEUES_ARG   "queues"
 #define ETH_VHOST_CLIENT_ARG   "client"
+#define ETH_VHOST_DEQUEUE_ZERO_COPY"dequeue-zero-copy"

 static const char *drivername = "VHOST PMD";

@@ -58,6 +59,7 @@ static const char *valid_arguments[] = {
ETH_VHOST_IFACE_ARG,
ETH_VHOST_QUEUES_ARG,
ETH_VHOST_CLIENT_ARG,
+   ETH_VHOST_DEQUEUE_ZERO_COPY,
NULL
 };

@@ -1132,6 +1134,7 @@ rte_pmd_vhost_probe(const char *name, const char *params)
uint16_t queues;
uint64_t flags = 0;
int client_mode = 0;
+   int dequeue_zero_copy = 0;

RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);

@@ -1168,6 +1171,16 @@ rte_pmd_vhost_probe(const char *name, const char *params)
flags |= RTE_VHOST_USER_CLIENT;
}

+   if (rte_kvargs_count(kvlist, ETH_VHOST_DEQUEUE_ZERO_COPY) == 1) {
+   ret = rte_kvargs_process(kvlist, ETH_VHOST_DEQUEUE_ZERO_COPY,
+&open_int, &dequeue_zero_copy);
+   if (ret < 0)
+   goto out_free;
+
+   if (dequeue_zero_copy)
+   flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+   }
+
eth_dev_vhost_create(name, iface_name, queues, rte_socket_id(), flags);

 out_free:
-- 
1.9.0
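As a usage sketch (the invocation details are assumed for illustration, not taken from the patch), the new key would be passed through the vhost PMD's vdev argument string alongside the existing iface/queues/client keys:

```shell
# hypothetical testpmd invocation; core mask, socket path and vdev name
# are examples only
testpmd -c 0x3 -n 4 \
    --vdev 'eth_vhost0,iface=/tmp/sock0,dequeue-zero-copy=1' -- -i
```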



[dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK

2016-10-09 Thread Jerin Jacob
On Fri, Oct 07, 2016 at 10:40:03AM +, Hemant Agrawal wrote:
> Hi Jerin/Narender,

Hi Hemant,

Thanks for the review.

> 
>   Thanks for the proposal and discussions. 

> 
>   I agree with many of the comment made by Narender.  Here are some 
> additional comments.
> 
> 1. rte_event_schedule - should support an option for bulk dequeue. The bulk
> size should be a property of the device, i.e. how much depth it can support.

OK. Will fix it in v2.

> 
> 2. The event schedule should also support an option to specify the amount of
> time it can wait. The implementation may only support a global setting
> (dequeue_wait_ns) for the wait time; it can treat any non-zero wait value as
> a request to wait.

OK. Will fix it in v2.

> 
> 3. rte_event_schedule_from_group - there should be one model. Push and pull
> may not work well together; at least a simultaneous mixed config will not
> work on the NXP hardware scheduler.

OK. Will remove Cavium specific "rte_event_schedule_from_group" API in v2.

> 
> 4. Priority of queues within the scheduling group? - Please keep in mind
> that some hardware supports intra-scheduler priority and some only supports
> intra-flow_queue priority within a scheduler instance. Events with the same
> flow id should have the same priority.

Will try to address some solution based on capability.

> 
> 5. w.r.t. flow_queue numbers in log2, I would prefer an absolute number.
> Not all systems have a large number of queues, so the design should take
> into account systems with fewer queues.

OK. Will fix it in v2.

> 
> Regards,
> Hemant
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Wednesday, October 05, 2016 12:55 PM
> > On Tue, Oct 04, 2016 at 09:49:52PM +, Vangati, Narender wrote:
> > > Hi Jerin,
> > 
> > Hi Narender,
> > 
> > Thanks for the comments. I agree with the proposed changes; I will address
> > these comments in v2.
> > 
> > /Jerin
> > 
> > 
> > >
> > >
> > >
> > > Here are some comments on the libeventdev RFC.
> > >
> > > These are collated thoughts after discussions with you & others to 
> > > understand
> > the concepts and rationale for the current proposal.
> > >
> > >
> > >
> > > 1. Concept of flow queues. This is better abstracted as flow ids and not
> > > as flow queues, which implies there is a queueing structure per flow. A s/w
> > implementation can do atomic load balancing on multiple flow ids more
> > efficiently than maintaining each event in a specific flow queue.
> > >
> > >
> > >
> > > 2. Scheduling group. A scheduling group is more a stream of events, so an
> > event queue might be a better abstraction.
> > >
> > >
> > >
> > > 3. An event queue should support the concept of max active atomic flows
> > (maximum number of active flows this queue can track at any given time) and
> > max active ordered sequences (maximum number of outstanding events waiting
> > to be egress reordered by this queue). This allows a scheduler 
> > implementation to
> > dimension/partition its resources among event queues.
> > >
> > >
> > >
> > > 4. An event queue should support concept of a single consumer. In an
> > application, a stream of events may need to be brought together to a single
> > core for some stages of processing, e.g. for TX at the end of the pipeline 
> > to
> > avoid NIC reordering of the packets. Having a 'single consumer' event queue 
> > for
> > that stage allows the intensive scheduling logic to be short circuited and 
> > can
> > improve throughput for s/w implementations.
> > >
> > >
> > >
> > > 5. Instead of tying eventdev access to an lcore, a higher level of 
> > > abstraction
> > called event port is needed which is the application i/f to the eventdev. 
> > Event
> > ports are connected to event queues and is the object the application uses 
> > to
> > dequeue and enqueue events. There can be more than one event port per lcore
> > allowing multiple lightweight threads to have their own i/f into eventdev, 
> > if the
> > implementation supports it. An event port abstraction also encapsulates
> > dequeue depth and enqueue depth for a scheduler implementations which can
> > schedule multiple events at a time and output events that can be buffered.
> > >
> > >
> > >
> > > 6. An event should support priority. Per event priority is useful for 
> > > segregating
> > high priority (control messages) traffic from low priority within the same 
> > flow.
> > This needs to be part of the event definition for implementations which 
> > support
> > it.
> > >
> > >
> > >
> > > 7. Event port to event queue servicing priority. This allows two event 
> > > ports to
> > connect to the same event queue with different priorities. For 
> > implementations
> > which support it, this allows a worker core to participate in two different
> > workflows with different priorities (workflow 1 needing 3.5 cores, workflow 
> > 2
> > needing 2.5 cores, and so on).
> > >
> > >
> > >
> >

[dpdk-dev] [PATCH 0/6] vhost: add Tx zero copy support

2016-10-09 Thread linhaifeng
On 2016/8/23 16:10, Yuanhan Liu wrote:
> The basic idea of Tx zero copy is, instead of copying data from the
> desc buf, here we let the mbuf reference the desc buf addr directly.

Is there a problem when pushing a VLAN tag onto an mbuf which references the
desc buf addr directly?
We know that if the guest uses virtio_net (kernel), the skb may have no headroom.



[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

2016-10-09 Thread Wang, Zhihong


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang, Zhihong
> Sent: Wednesday, September 28, 2016 12:45 AM
> To: Yuanhan Liu ; Jianbo Liu
> 
> Cc: Maxime Coquelin ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> 
> 
> 
> > -Original Message-
> > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> > Sent: Tuesday, September 27, 2016 6:21 PM
> > To: Jianbo Liu 
> > Cc: Wang, Zhihong ; Maxime Coquelin
> > ; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> >
> > On Thu, Sep 22, 2016 at 05:01:41PM +0800, Jianbo Liu wrote:
> > > On 22 September 2016 at 14:58, Wang, Zhihong wrote:
> > > >
> > > >
> > > >> -Original Message-
> > > >> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> > > >> Sent: Thursday, September 22, 2016 1:48 PM
> > > >> To: Yuanhan Liu 
> > > >> Cc: Wang, Zhihong ; Maxime Coquelin
> > > >> ; dev at dpdk.org
> > > >> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> > > >>
> > > >> On 22 September 2016 at 10:29, Yuanhan Liu wrote:
> > > >> > On Wed, Sep 21, 2016 at 08:54:11PM +0800, Jianbo Liu wrote:
> > > >> >> >> > My setup consists of one host running a guest.
> > > >> >> >> > The guest generates as many 64-byte packets as possible using
> > > >> >> >>
> > > >> >> >> Have you tested with other different packet size?
> > > >> >> >> My testing shows that performance is dropping when packet size
> > > >> >> >> is more than 256.
> > > >> >> >
> > > >> >> >
> > > >> >> > Hi Jianbo,
> > > >> >> >
> > > >> >> > Thanks for reporting this.
> > > >> >> >
> > > >> >> >  1. Are you running the vector frontend with mrg_rxbuf=off?
> > > >> >> >
> > > >> Yes, my testing is mrg_rxbuf=off, but not vector frontend PMD.
> > > >>
> > > >> >> >  2. Could you please specify what CPU you're running? Is it
> > > >> >> > Haswell or Ivy Bridge?
> > > >> >> >
> > > >> It's an ARM server.
> > > >>
> > > >> >> >  3. How many percentage of drop are you seeing?
> > > >> The testing result:
> > > >> size (bytes) improvement (%)
> > > >> 64     3.92
> > > >> 128   11.51
> > > >> 256   24.16
> > > >> 512  -13.79
> > > >> 1024 -22.51
> > > >> 1500 -12.22
> > > >> A correction is that performance is dropping if byte size is larger
> > > >> than 512.
> > > >
> > > >
> > > > Jianbo,
> > > >
> > > > Could you please verify whether this patch really causes enqueue perf
> > > > to drop?
> > > >
> > > > You can test the enqueue path only by setting the guest to do rxonly,
> > > > and compare the mpps via show port stats all in the guest.
> > > >
> > > >
> > > Tested with testpmd, host: txonly, guest: rxonly
> > > size (bytes) improvement (%)
> > > 64     4.12
> > > 128    6
> > > 256    2.65
> > > 512   -1.12
> > > 1024  -7.02
> >
> > There is a difference between Zhihong's code and the old I spotted in
> > the first time: Zhihong removed the avail_idx prefetch. I understand
> > the prefetch becomes a bit tricky when mrg-rx code path is considered;
> > thus, I didn't comment on that.
> >
> > That's one of the difference that, IMO, could drop a regression. I then
> > finally got a chance to add it back.
> >
> > A rough test shows it improves the performance of 1400B packet size greatly
> > in the "txonly in host and rxonly in guest" case: +33% is the number I get
> > with my test server (Ivybridge).
> 
> Thanks Yuanhan! I'll validate this on x86.

Hi Yuanhan,

Your code doesn't seem to perform correctly. I wrote a new version
of avail idx prefetch but didn't see any perf benefit.

To be honest I doubt the benefit of this idea. The previous mrg_off
code has this method but doesn't give any benefits.

Even if this is useful, the benefits should be more significant for
small packets; it's unlikely this simple idx prefetch could bring
over 30% perf gain for large packets like 1400B ones.

But if you really do work it out like that I'll be very glad to see.

Thanks
Zhihong

> 
> >
> > I guess this might/would help your case as well. Mind to have a test
> > and tell me the results?
> >
> > BTW, I made it in rush; I haven't tested the mrg-rx code path yet.
> >
> > Thanks.
> >
> > --yliu


[dpdk-dev] [PATCH v2] drivers: prefix driver REGISTER macro with RTE PMD

2016-10-09 Thread Shreyansh Jain
Hi Thomas,

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Sunday, October 09, 2016 1:52 AM
> To: Shreyansh Jain 
> Cc: david.marchand at 6wind.com; dev at dpdk.org; nhorman at tuxdriver.com
> Subject: Re: [PATCH v2] drivers: prefix driver REGISTER macro with RTE PMD
> 
> 2016-10-08 23:35, Shreyansh Jain:
> > --- a/mk/internal/rte.compile-pre.mk
> > +++ b/mk/internal/rte.compile-pre.mk
> > @@ -87,7 +87,8 @@ endif
> >  PMDINFO_GEN = $(RTE_SDK_BIN)/app/dpdk-pmdinfogen $@ $@.pmd.c
> >  PMDINFO_CC = $(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@.pmd.o $@.pmd.c
> >  PMDINFO_LD = $(CROSS)ld $(LDFLAGS) -r -o $@.o $@.pmd.o $@
> > -PMDINFO_TO_O = if grep -q 'DRIVER_REGISTER_.*(.*)' $<; then \
> > +PMDINFO_TO_O = if grep -E 'RTE_PMD_REGISTER_PCI\([0-9a-zA-Z,_\.
> ]+\)|RTE_PMD_REGISTER_VDEV\([0-9a-zA-Z,_\. ]+\)' $<;\
> > +   then \
> 
> I don't understand why you don't simply grep 'RTE_PMD_REGISTER_.*(' ?

Because I want to make sure that the grep matches only the DRIVER registration 
functions.
In case a new macro (or driver type) is added in future, this macro can be 
updated. This way we can reduce the probability of a faulty match.

Is there a problem with closest possible match?

-
Shreyansh


[dpdk-dev] [PATCH 0/6] vhost: add Tx zero copy support

2016-10-09 Thread Yuanhan Liu
On Mon, Aug 29, 2016 at 08:32:55AM +, Xu, Qian Q wrote:
> I just ran a PVP test: the NIC receives packets, then forwards them to the
> vhost PMD and virtio-user interface. I didn't see any performance gains in
> this scenario. For all packet sizes from 64B to 1518B, performance hasn't
> benefited from this patchset; in fact, it dropped a lot below 1280B and is
> similar at 1518B.

40G nic?

> The TX/RX desc setting is " txd=64, rxd=128"

Try it with "txd=128", you should be able to set that value since the
vhost Tx indirect patch is merged.

--yliu

> for the TX-zero-copy enabled case. For the TX-zero-copy disabled case, I just
> ran default testpmd (txd=512, rxd=128) without the patch.
> Could you help check the NIC2VM case?
> 
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yuanhan Liu
> Sent: Tuesday, August 23, 2016 4:11 PM
> To: dev at dpdk.org
> Cc: Maxime Coquelin ; Yuanhan Liu  at linux.intel.com>
> Subject: [dpdk-dev] [PATCH 0/6] vhost: add Tx zero copy support
> 
> This patch set enables vhost Tx zero copy. The majority work goes to patch 4: 
> vhost: add Tx zero copy.
> 
> The basic idea of Tx zero copy is, instead of copying data from the desc buf, 
> here we let the mbuf reference the desc buf addr directly.
> 
> The major issue behind that is how and when to update the used ring.
> You could check the commit log of patch 4 for more details.
> 
> Patch 5 introduces a new flag, RTE_VHOST_USER_TX_ZERO_COPY, to enable Tx zero 
> copy, which is disabled by default.
> 
> A few more TODOs are left, including handling a desc buf that spans two
> physical pages, updating the release note, etc. Those will be fixed in a
> later version. For now, here is a simple one that hopefully shows the idea
> clearly.
> 
> I did some quick tests, the performance gain is quite impressive.
> 
> For a simple dequeue workload (running rxonly in vhost-pmd and running txonly
> in guest testpmd), it yields 40+% performance boost for packet size 1400B.
> 
> For VM2VM iperf test case, it's even better: about 70% boost.
> 
> ---
> Yuanhan Liu (6):
>   vhost: simplify memory regions handling
>   vhost: get guest/host physical address mappings
>   vhost: introduce last avail idx for Tx
>   vhost: add Tx zero copy
>   vhost: add a flag to enable Tx zero copy
>   examples/vhost: add an option to enable Tx zero copy
> 
>  doc/guides/prog_guide/vhost_lib.rst |   7 +-
>  examples/vhost/main.c   |  19 ++-
>  lib/librte_vhost/rte_virtio_net.h   |   1 +
>  lib/librte_vhost/socket.c   |   5 +
>  lib/librte_vhost/vhost.c|  12 ++
>  lib/librte_vhost/vhost.h| 103 +
>  lib/librte_vhost/vhost_user.c   | 297 
> +++-
>  lib/librte_vhost/virtio_net.c   | 188 +++
>  8 files changed, 472 insertions(+), 160 deletions(-)
> 
> --
> 1.9.0


[dpdk-dev] [PATCH v2] drivers: prefix driver REGISTER macro with RTE PMD

2016-10-09 Thread Thomas Monjalon
2016-10-09 15:12, Shreyansh Jain:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2016-10-08 23:35, Shreyansh Jain:
> > > +PMDINFO_TO_O = if grep -E 'RTE_PMD_REGISTER_PCI\([0-9a-zA-Z,_\.
> > ]+\)|RTE_PMD_REGISTER_VDEV\([0-9a-zA-Z,_\. ]+\)' $<;\
> > > +   then \
> > 
> > I don't understand why you don't simply grep 'RTE_PMD_REGISTER_.*(' ?
> 
> Because I want to make sure that the grep matches only the DRIVER 
> registration functions.
> In case a new macro (or driver type) is added in future, this macro can be 
> updated. This way we can reduce the probability of a faulty match.
> 
> Is there a problem with closest possible match?

It is just long and useless. A macro starting with RTE_PMD_REGISTER_ must
be called from a PMD. What else?



[dpdk-dev] [PATCH v4 0/3] Add TSO on tunneling packet

2016-10-09 Thread Thomas Monjalon
2016-09-27 17:29, Ananyev, Konstantin:
> > > From: Tan, Jianfeng
> > > > Patch 1: mbuf: add Tx side tunneling type Patch 2: net/i40e: add TSO
> > > > support on tunneling packet Patch 3: app/testpmd: fix Tx offload on
> > > > tunneling packet
> 
> Acked-by: Konstantin Ananyev 
> I think you need to rebase your first one: mbuf: add Tx side tunneling type
> against the mainline.
> Also 3-rd one is v5 actually.

Applied (with 3/3 v5), thanks