[dpdk-dev] [PATCH v2 2/2] examples: new example: l2fwd-ethtool

2015-11-17 Thread Wang, Liang-min
Thomas,
Could you explain why this patch is put on RFC?

Thanks,
Larry

> -----Original Message-----
> From: Wang, Liang-min
> Sent: Wednesday, October 21, 2015 12:47 PM
> To: 'Thomas Monjalon'
> Cc: dev at dpdk.org; Andrew Harvey (agh) (agh at cisco.com)
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] examples: new example: l2fwd-ethtool
> 
> Thomas,
>   Let's put this patch on the defer list, because related work might
> take a different approach. Let's review only the makefile change (PATCH 1/2).
> I believe "export" is needed since the variable is shared by the whole build,
> but it might already be included through the mk file inclusion. Since Andy is
> on vacation, I am not sure whether he can comment on that.
> 
> Larry
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Wednesday, October 21, 2015 12:36 PM
> > To: Wang, Liang-min
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2 2/2] examples: new example: l2fwd-
> > ethtool
> >
> > 2015-07-23 11:00, Liang-Min Larry Wang:
> > >  examples/Makefile|1 +
> > >  examples/l2fwd-ethtool/Makefile  |   48 +
> > >  examples/l2fwd-ethtool/l2fwd-app/Makefile|   58 ++
> > >  examples/l2fwd-ethtool/l2fwd-app/main.c  | 1025 ++
> > >  examples/l2fwd-ethtool/l2fwd-app/netdev_api.h|  770
> > >  examples/l2fwd-ethtool/l2fwd-app/shared_fifo.h   |  159 
> > >  examples/l2fwd-ethtool/lib/Makefile  |   57 ++
> > >  examples/l2fwd-ethtool/lib/rte_ethtool.c |  336 +++
> > >  examples/l2fwd-ethtool/lib/rte_ethtool.h |  385 
> > >  examples/l2fwd-ethtool/nic-control/Makefile  |   55 ++
> > >  examples/l2fwd-ethtool/nic-control/nic_control.c |  614 +
> > >  11 files changed, 3508 insertions(+)
> >
> > This patch is huge.
> > Please split a bit.
> >
> > > --- a/examples/Makefile
> > > +++ b/examples/Makefile
> > > @@ -53,6 +53,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_KNI) += kni
> > >  DIRS-y += l2fwd
> > >  DIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += l2fwd-ivshmem
> > >  DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += l2fwd-jobstats
> > > +DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += l2fwd-ethtool
> > >  DIRS-y += l3fwd
> >
> > Please keep the alphabetical order.
> >
> > I do not plan to review it more.
> > If nobody complains, it means it's accepted.


[dpdk-dev] [RFC PATCH 2/2] lib/librte_eal: Remove unnecessary hugepage zero-filling

2015-11-17 Thread Zhihong Wang
The kernel fills newly allocated (huge) pages with zeros.
DPDK just has to touch the pages to trigger the allocation.

Signed-off-by: Zhihong Wang 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 0de75cd..af823dc 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -410,7 +410,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,

if (orig) {
hugepg_tbl[i].orig_va = virtaddr;
-   memset(virtaddr, 0, hugepage_sz);
+   memset(virtaddr, 0, 8);
}
else {
hugepg_tbl[i].final_va = virtaddr;
@@ -592,9 +592,6 @@ remap_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
}
}

-   /* zero out the whole segment */
-   memset(hugepg_tbl[page_idx].final_va, 0, total_size);
-
page_idx++;
}

-- 
2.5.0
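
Editor's sketch of the idea behind this patch, under stated assumptions: ordinary anonymous pages from mmap() stand in for hugepages, and the function name `touch_and_check_zero` is hypothetical. It touches only the first 8 bytes of each page, as the patched memset does, and then checks that the rest of every page already reads as zero. It does not exercise hugetlbfs at all, so it illustrates the kernel behaviour the commit message relies on rather than proving it for DPDK's mappings.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map npages anonymous pages, touch the first 8 bytes of each page
 * (mirroring memset(virtaddr, 0, 8) in the patch) and verify that every
 * byte reads as zero. Returns 0 on success, -1 on failure.
 * Assumption: ordinary pages stand in for hugepages here.
 */
int
touch_and_check_zero(size_t npages)
{
	long page_sz = sysconf(_SC_PAGESIZE);
	unsigned char *mem;
	size_t len, i;
	int ret = 0;

	if (page_sz <= 0 || npages == 0)
		return -1;
	len = npages * (size_t)page_sz;

	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	/* Touch each page: the write fault makes the kernel back it. */
	for (i = 0; i < npages; i++)
		memset(mem + i * (size_t)page_sz, 0, 8);

	/* The untouched remainder of each page must already be zero. */
	for (i = 0; i < len; i++) {
		if (mem[i] != 0) {
			ret = -1;
			break;
		}
	}

	munmap(mem, len);
	return ret;
}
```

Calling `touch_and_check_zero(16)` from a small main() is enough to try it; the cost of the touch loop is independent of the page size, which is where the saving in the patch comes from.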



[dpdk-dev] [RFC PATCH 1/2] lib/librte_eal: Reduce timer initialization time

2015-11-17 Thread Zhihong Wang
Changing the sleep from 1/2 second to 1/10 second doesn't compromise the
precision, and saving 4/10 of a second is worthwhile.

Signed-off-by: Zhihong Wang 
---
 lib/librte_eal/linuxapp/eal/eal_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_timer.c b/lib/librte_eal/linuxapp/eal/eal_timer.c
index e0642de..4de0353 100644
--- a/lib/librte_eal/linuxapp/eal/eal_timer.c
+++ b/lib/librte_eal/linuxapp/eal/eal_timer.c
@@ -271,7 +271,7 @@ get_tsc_freq(void)
 #ifdef CLOCK_MONOTONIC_RAW
 #define NS_PER_SEC 1E9

-   struct timespec sleeptime = {.tv_nsec = 5E8 }; /* 1/2 second */
+   struct timespec sleeptime = {.tv_nsec = 1E8 }; /* 1/10 second */

struct timespec t_start, t_end;
uint64_t tsc_hz;
-- 
2.5.0
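
Editor's sketch of why the sleep length mostly drops out of the result: the frequency is the counter delta scaled by the measured elapsed time, so a 1/10 second sample and a 1/2 second sample of the same counter give the same figure apart from measurement noise. The names `demo_ticks_per_sec` and `demo_calibrate` are hypothetical, and the counter is passed in as a function pointer instead of reading the TSC directly; this is not the code in eal_timer.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define DEMO_NS_PER_SEC 1000000000ULL

/* Scale a counter delta observed over elapsed_ns to ticks per second. */
uint64_t
demo_ticks_per_sec(uint64_t delta, uint64_t elapsed_ns)
{
	if (elapsed_ns == 0)
		return 0;
	/*
	 * Split the division so the second product stays below 2^64 for
	 * any sample shorter than roughly 18 seconds.
	 */
	return (delta / elapsed_ns) * DEMO_NS_PER_SEC +
	       (delta % elapsed_ns) * DEMO_NS_PER_SEC / elapsed_ns;
}

/*
 * Sample read_counter() around a sleep of sleep_ns nanoseconds and
 * return the estimated counter frequency in Hz, or 0 on error.
 */
uint64_t
demo_calibrate(uint64_t (*read_counter)(void), long sleep_ns)
{
	struct timespec sleeptime = { .tv_sec = 0, .tv_nsec = sleep_ns };
	struct timespec t_start, t_end;
	uint64_t start, end;
	int64_t ns;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &t_start) != 0)
		return 0;
	start = read_counter();
	nanosleep(&sleeptime, NULL);
	if (clock_gettime(CLOCK_MONOTONIC_RAW, &t_end) != 0)
		return 0;
	end = read_counter();

	/* Use the measured interval, not the requested one. */
	ns = (int64_t)(t_end.tv_sec - t_start.tv_sec) *
	     (int64_t)DEMO_NS_PER_SEC +
	     (int64_t)(t_end.tv_nsec - t_start.tv_nsec);
	if (ns <= 0)
		return 0;
	return demo_ticks_per_sec(end - start, (uint64_t)ns);
}
```

Because the elapsed time is measured rather than assumed, oversleeping does not bias the estimate; a shorter sample only gives timer noise a proportionally larger share of the interval.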



[dpdk-dev] [RFC PATCH 0/2] Reduce DPDK initialization time

2015-11-17 Thread Zhihong Wang
This RFC patch aims to reduce DPDK initialization time, which is important in
cases such as microservices.

Changes are:

1. Reduce timer initialization time

2. Remove unnecessary hugepage zero-filling operations

With this patch:

1. Timer initialization time can be reduced by 4/10 second

2. Memory initialization time can be reduced by nearly half

The 2nd topic has been brought up before in this thread:
http://dpdk.org/dev/patchwork/patch/4219/

Zhihong Wang (2):
  lib/librte_eal: Reduce timer initialization time
  lib/librte_eal: Remove unnecessary hugepage zero-filling

 lib/librte_eal/linuxapp/eal/eal_memory.c | 5 +
 lib/librte_eal/linuxapp/eal/eal_timer.c  | 2 +-
 2 files changed, 2 insertions(+), 5 deletions(-)

-- 
2.5.0



[dpdk-dev] [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD

2015-11-17 Thread Yuanhan Liu
On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
> These variables are needed to be able to manage one of virtio devices
> using both vhost library APIs and vhost PMD.
> For example, if vhost PMD uses current callback handler and private data
> provided by vhost library, A DPDK application that links vhost library
> cannot use some of vhost library APIs.

Can you be more specific about this?

--yliu


[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len

2015-11-17 Thread Rich Lane
On Tue, Nov 17, 2015 at 6:56 PM, Yuanhan Liu 
wrote:

> @@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t
> queue_id,
> goto merge_rx_exit;
> } else {
> update_secure_len(vq, res_cur_idx,
> &secure_len, &vec_idx);
> +   if (secure_len == 0)
> +   goto merge_rx_exit;
> res_cur_idx++;
> }
> } while (pkt_len > secure_len);
>

I think this needs to check whether secure_len was modified. secure_len is
read-write and could have a nonzero value going into the call. It could be
cleaner to give update_secure_len a return value saying whether it was able
to reserve any buffers.

Otherwise looks good, thanks!
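
Editor's sketch of the return-value variant suggested in this reply, on a simplified model: `struct demo_desc`, `DEMO_DESC_F_NEXT` and `demo_update_secure_len` are hypothetical stand-ins, not the librte_vhost definitions. The function reports whether this particular call reserved anything, so the caller does not have to infer it from a secure_len that may already be nonzero on entry.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DESC_F_NEXT 0x1 /* hypothetical flag: the chain continues */

struct demo_desc {
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Walk the descriptor chain starting at head and add each descriptor's
 * length to *secure_len. Returns 1 if this call reserved at least one
 * byte, 0 otherwise.
 */
int
demo_update_secure_len(const struct demo_desc *ring, uint16_t ring_size,
		       uint16_t head, uint32_t *secure_len)
{
	uint32_t added = 0;
	uint16_t idx = head;
	uint16_t steps;

	/* Bound the walk by ring_size so a corrupt chain cannot spin. */
	for (steps = 0; steps < ring_size; steps++) {
		if (idx >= ring_size)
			break;
		added += ring[idx].len;
		if (!(ring[idx].flags & DEMO_DESC_F_NEXT))
			break;
		idx = ring[idx].next;
	}

	*secure_len += added;
	return added != 0;
}
```

With this shape the caller's check becomes "did the call return 0" instead of "is secure_len equal to 0", which stays correct when secure_len carries a value from a previous iteration.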


[dpdk-dev] [PATCH v4 0/2] Add VHOST PMD

2015-11-17 Thread Yuanhan Liu
On Fri, Nov 13, 2015 at 03:50:16PM +0900, Tetsuya Mukawa wrote:
> On 2015/11/13 14:32, Yuanhan Liu wrote:
> > On Fri, Nov 13, 2015 at 02:20:29PM +0900, Tetsuya Mukawa wrote:
> >> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> >> of librte_vhost.
> >>
> >> * Known issue.
> >> We may see issues while handling RESET_OWNER message.
> >> These handlings are done in vhost library, so not a part of vhost PMD.
> >> So far, we are waiting for QEMU fixing.
> > Fix patches have already been applied. Please help test :)
> >
> > --yliu
> 
> Hi Yuanhan,
> 
> It seems there might be another issue related to "vq->callfd" in
> vhost library.
> We may miss something to handle the value correctly.
> 
> Anyway, here are steps.
> 1. Apply vhost PMD patch.
> (I guess you don't need it to reproduce the issue, but to reproduce it,
> using the PMD may be easy)
> 2. Start testpmd on host with vhost-user PMD.
> 3. Start QEMU with virtio-net device.
> 4. Login QEMU.
> 5. Bind the virtio-net device to igb_uio.
> 6. Start testpmd in QEMU.
> 7. Quit testpmd in QEMU.
> 8. Start testpmd again in QEMU.
> 
> It seems when last command is executed, testpmd on host doesn't receive
> SET_VRING_CALL message from QEMU.
> Because of this, testpmd on host assumes virtio-net device is not ready.
> (I confirmed that virtio_is_ready() failed on the host).
> 
> According to QEMU source code, SET_VRING_KICK will be called when
> virtqueue starts, but SET_VRING_CALL will be called when virtqueue is
> initialized.
> Not sure exactly, might be "vq->call" will be valid while connection is
> established?

Yes, it would be valid as long as we don't reset it from another
set_vring_call. So, we should not reset it on reset_device().

--yliu
> 
> Also I've found a workaround.
> Please execute after step7.
> 
> 8. Bind the virtio-net device to virtio-pci kernel driver.
> 9. Bind the virtio-net device to igb_uio.
> 10. Start testpmd in QEMU.
> 
> When step8 is executed, connection will be re-established, and testpmd
> on host will be able to receive SET_VRING_CALL.
> Then testpmd on host can start.
> 
> Thanks,
> Tetsuya


[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len

2015-11-17 Thread Yuanhan Liu
On Thu, Nov 12, 2015 at 01:46:03PM -0800, Rich Lane wrote:
> You can reproduce this with l2fwd and the vhost PMD.
> 
> You'll need this patch on top of the vhost PMD patches:
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx)
>                 return -1;
>
>         if (dev->flags & VIRTIO_DEV_RUNNING)
> -               notify_ops->destroy_device(dev);
> +               notify_destroy_device(dev);
>
>         cleanup_device(dev);
>         reset_device(dev);
> 
> 1. Start l2fwd on the host: l2fwd -l 0,1 --vdev eth_null --vdev
> eth_vhost0,iface=/run/vhost0.sock -- -p3
> 2. Start a VM using vhost-user and set up uio, hugepages, etc.
> 3. Start l2fwd inside the VM: l2fwd -l 0,1 --vdev eth_null -- -p3
> 4. Kill the l2fwd inside the VM with SIGINT.
> 5. Start l2fwd inside the VM.
> 6. l2fwd on the host crashes.
> 
> I found the source of the memory corruption by setting a watchpoint in
> gdb: watch -l rte_eth_devices[1].data->rx_queues

Rich,

Thanks for the detailed steps for reproducing this issue, and sorry for
being a bit late: I finally got the time to dig this issue today.

Put simply, the buffer overflow is not the root cause; the fact that "we do
not release resources on stop/exit" is.

Here is how the issue arises.  After step 4 (terminating l2fwd), neither
l2fwd nor the virtio PMD driver releases any resources. Hence,
l2fwd on the HOST does not notice the change and keeps trying to receive and
queue packets to the vhost dev. That is not an issue as long as we don't
start l2fwd again, for there are actually no packets to forward, and
rte_vhost_dequeue_burst returns from:

    avail_idx = *((volatile uint16_t *)&vq->avail->idx);

    /* If there are no available buffers then return. */
    if (vq->last_used_idx == avail_idx)
        return 0;

But at the init stage while starting l2fwd again (step 5), rte_eal_memory_init()
resets all hugepage memory to zero, which resets all vq->desc[] items
to zero, which in turn ends up with secure_len being set
to 0 on return.

(BTW, I'm not quite sure why resetting the hugepage memory inside the VM
would reset vq->desc.)

The vq desc reset results in a dead loop in virtio_dev_merge_rx(),
as update_secure_len() keeps setting secure_len to 0:

    do {
        avail_idx = *((volatile uint16_t *)&vq->avail->idx);
        if (unlikely(res_cur_idx == avail_idx)) {
            LOG_DEBUG(VHOST_DATA,
                "(%"PRIu64") Failed "
                "to get enough desc from "
                "vring\n",
                dev->device_fh);
            goto merge_rx_exit;
        } else {
            update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
            res_cur_idx++;
        }
    } while (pkt_len > secure_len);

The dead loop then makes vec_idx keep increasing until it overflows,
which quickly leads to the crash you saw.
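
Editor's sketch of the loop described above with explicit exits, as a toy model only: `demo_reserve_for_packet` and `DEMO_BUF_VECTOR_MAX` are hypothetical, and each available entry is assumed to use exactly one vector slot. It shows the two guards that keep a ring of zero-length descriptors from driving the vector index past its array.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BUF_VECTOR_MAX 8 /* hypothetical buffer-vector capacity */

/*
 * Reserve entries until their combined length covers pkt_len.
 * desc_len[i] is the usable length of available entry i.
 * Returns the number of vector slots used, or -1 if the ring runs out,
 * an entry contributes no space, or the vector would overflow.
 */
int
demo_reserve_for_packet(const uint32_t *desc_len, uint16_t nr_avail,
			uint32_t pkt_len)
{
	uint32_t secure_len = 0;
	uint16_t cur = 0;
	int vec_idx = 0;

	do {
		if (cur == nr_avail)
			return -1; /* not enough descriptors */
		if (vec_idx == DEMO_BUF_VECTOR_MAX)
			return -1; /* would write past the vector */
		if (desc_len[cur] == 0)
			return -1; /* no progress: zeroed descriptor */
		secure_len += desc_len[cur];
		vec_idx++;
		cur++;
	} while (pkt_len > secure_len);

	return vec_idx;
}
```

These guards only contain the symptom; they do not address the missing resource release that the message identifies as the root cause.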

So the following should resolve this issue in the right way (I
guess); so far it covers virtio-pmd and l2fwd only.

---
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 12fcc23..8d6bf56 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1507,9 +1507,12 @@ static void
 virtio_dev_stop(struct rte_eth_dev *dev)
 {
struct rte_eth_link link;
+   struct virtio_hw *hw = dev->data->dev_private;

PMD_INIT_LOG(DEBUG, "stop");

+   vtpci_reset(hw);
+
if (dev->data->dev_conf.intr_conf.lsc)
rte_intr_disable(&dev->pci_dev->intr_handle);

diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index 720fd5a..565f648 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -534,14 +535,40 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
}
 }

+static uint8_t nb_ports;
+static uint8_t nb_ports_available;
+
+/* When we receive a INT signal, unregister vhost driver */
+static void
+sigint_handler(__rte_unused int signum)
+{
+   uint8_t portid;
+
+   for (portid = 0; portid < nb_ports; portid++) {
+   /* skip ports that are not enabled */
+   if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
+   printf("Skipping disabled port %u\n", (unsigned) 
portid);
+   nb_ports_available--;
+   continue;
+   }
+
+   /* stopping port */
+ 
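
Editor's sketch of a related but different technique from the truncated patch above: the patch stops ports inside the signal handler, whereas this version only sets a flag in the handler and leaves the cleanup to the main loop. All names (`demo_force_quit`, `demo_install_sigint`, `demo_quit_requested`) are hypothetical, and no DPDK calls are made.

```c
#include <assert.h>
#include <signal.h>

/* Set from the handler, polled from the forwarding loop. */
static volatile sig_atomic_t demo_force_quit;

static void
demo_sigint_handler(int signum)
{
	(void)signum;
	demo_force_quit = 1; /* only async-signal-safe work here */
}

/* Install the handler; returns 0 on success, -1 on failure. */
int
demo_install_sigint(void)
{
	return signal(SIGINT, demo_sigint_handler) == SIG_ERR ? -1 : 0;
}

/* The main loop would poll this and stop its ports before exiting. */
int
demo_quit_requested(void)
{
	return demo_force_quit != 0;
}
```

Keeping the handler to a single flag write avoids calling printf() or driver code from signal context; the port-stopping loop from the patch would then run after the forwarding loop observes the flag.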

[dpdk-dev] [PATCH v4] doc: add nic performance guide on linux gsg

2015-11-17 Thread Qian Xu
Add a new guide doc as part of the Linux Getting Started Guide.

The document is a step-by-step guide on how to get high performance
with DPDK on an Intel platform.

It is designed for users who are not familiar with DPDK but would like
to get the best performance with NICs.

Signed-off-by: Qian Xu 

Changes in v4:
* Update some naming and wordings according to Thomas's comments.

Changes in v3:
* Refined the svg file.
* Made the perf guide more general, not specific with Intel NICs.
* Update BIOS settings.
* Update rst file format.
* Put it into linux_gsg folder.

Changes in v2:
* Created a svg file.
* Add one part about how to check memory channels by dmidecode.
* Add the command about how to check PCIe slot's speed.
* Some doc updates according to the comments.


diff --git a/doc/guides/linux_gsg/build_dpdk.rst 
b/doc/guides/linux_gsg/build_dpdk.rst
index 2680e66..014b52b 100644
--- a/doc/guides/linux_gsg/build_dpdk.rst
+++ b/doc/guides/linux_gsg/build_dpdk.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
@@ -28,6 +28,8 @@
 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

+.. _linux_gsg_compiling_dpdk:
+
 Compiling the DPDK Target from Source
 =

diff --git a/doc/guides/linux_gsg/img/intel_perf_test_setup.svg 
b/doc/guides/linux_gsg/img/intel_perf_test_setup.svg
new file mode 100644
index 000..31c60a6
--- /dev/null
+++ b/doc/guides/linux_gsg/img/intel_perf_test_setup.svg
@@ -0,0 +1,507 @@
+[SVG markup lost in the archive. Surviving text labels from the figure: IXIA; Port A (Dest MAC: Port 0, Dest IP: 2.1.1.1, Src IP: Random); Port B (Dest MAC: Port 1, Dest IP: 1.1.1.1, Src IP: Random); Flow 1; Flow 2; Intel XL710 40G Ethernet, Port 0 and Port X; Intel XL710 40G Ethernet, Port 1 and Port X; Forwarding: Port 0 to Port 1, Port 1 to Port 0; IA Platform (Socket 1).]
diff --git a/doc/guides/linux_gsg/index.rst b/doc/guides/linux_gsg/index.rst
index 89800cc..d68135b 100644
--- a/doc/guides/linux_gsg/index.rst
+++ b/doc/guides/linux_gsg/index.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
@@ -47,3 +47,4 @@ Contents
 build_sample_apps
 enable_func
 quick_start
+nic_perf_intel_platform
diff --git a/doc/guides/linux_gsg/nic_perf_intel_platform.rst 
b/doc/guides/linux_gsg/nic_perf_intel_platform.rst
new file mode 100644
index 000..67b49c6
--- /dev/null
+++ b/doc/guides/linux_gsg/nic_perf_intel_platform.rst
@@ -0,0 +1,260 @@
+How to get best performance with NICs on Intel platforms
+
+
+This document is a step-by-step guide for getting high performance from DPDK applications on Intel platforms.
+
+
+Hardware and Memory Requirements
+
+
+For best performance use an Intel Xeon class server system such as Ivy Bridge, Haswell or newer.
+
+Ensure that each memory channel has at least one memory DIMM inserted, and that the memory size for each is at least 4GB.
+**Note**: this has one of the most direct effects on performance.
+
+You can check the memory configuration using ``dmidecode`` as follows::
+
+  dmidecode -t memory | grep Locator
+
+  Locator: DIMM_A1
+  Bank Locator: NODE 1
+  Locator: DIMM_A2
+  Bank Locator: NODE 1
+  Locator: DIMM_B1
+  Bank Locator: NODE 1
+  Locator: DIMM_B2
+  Bank Locator: NODE 1
+  ...
+  Locator: DIMM_G1
+  Bank Locator: NODE 2
+  Locator: DIMM_G2
+  Bank Locator: NODE 2
+  Locator: DIMM_H1
+  Bank Locator: NODE 2
+  Locator: DIMM_H2
+  Bank 

[dpdk-dev] [PATCH] ACL: fix build for native-icc target on haswell fails

2015-11-17 Thread Konstantin Ananyev
On HSW box with icc 16.0.0 build for x86_64-default-linuxapp-icc fails with:
icc: command line warning #10120: overriding '-march=native' with '-msse4.1'
...
dpdk.org/x86_64-native-linuxapp-icc/include/rte_memcpy.h(96): error: identifier "__m256i" is undefined

The reason is that icc treats "-march=native ... -msse4.1"
differently than gcc and clang do.
For icc it means: override all flags enabled by
'-march=native' with '-msse4.1',
even though '-march=native' is a superset of '-msse4.1'.
To overcome the problem, add a check whether the SSE4.1 compilation flag is
already enabled. If it is, there is no need to add '-msse4.1'.
A similar change is made for the avx2 compilation option.

Fixes: 074f54ad03ee ("acl: fix build and runtime for default target")

Reported-by: Declan Doherty 
Reported-by: Sergio Gonzalez Monroy 
Signed-off-by: Konstantin Ananyev 
---
 lib/librte_acl/Makefile | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
index 7a1cf8a..ff63a0c 100644
--- a/lib/librte_acl/Makefile
+++ b/lib/librte_acl/Makefile
@@ -50,24 +50,35 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_sse.c

-CFLAGS_acl_run_sse.o += -msse4.1
+#check if flag for SSE4.1 is already on, if not set it up manually
+ifeq ($(findstring RTE_MACHINE_CPUFLAG_SSE4_1,$(CFLAGS)),)
+   CFLAGS_acl_run_sse.o += -msse4.1
+endif

 #
 # If the compiler supports AVX2 instructions,
 # then add support for AVX2 classify method.
 #

-CC_AVX2_SUPPORT=$(shell $(CC) -march=core-avx2 -dM -E - </dev/null 2>&1 | \
-grep -q AVX2 && echo 1)
+#check if flag for AVX2 is already on, if not set it up manually
+ifeq ($(findstring RTE_MACHINE_CPUFLAG_AVX2,$(CFLAGS)),RTE_MACHINE_CPUFLAG_AVX2)
+   CC_AVX2_SUPPORT=1
+else
+   CC_AVX2_SUPPORT=\
+   $(shell $(CC) -march=core-avx2 -dM -E - </dev/null 2>&1 | \
+   grep -q AVX2 && echo 1)
+   ifeq ($(CC_AVX2_SUPPORT), 1)
+   ifeq ($(CC), icc)
+   CFLAGS_acl_run_avx2.o += -march=core-avx2
+   else
+   CFLAGS_acl_run_avx2.o += -mavx2
+   endif
+   endif
+endif

 ifeq ($(CC_AVX2_SUPPORT), 1)
SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_avx2.c
CFLAGS_rte_acl.o += -DCC_AVX2_SUPPORT
-   ifeq ($(CC), icc)
-   CFLAGS_acl_run_avx2.o += -march=core-avx2
-   else
-   CFLAGS_acl_run_avx2.o += -mavx2
-   endif
 endif

 # install this header file
-- 
1.8.5.3
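
Editor's sketch of what the compiler probe in this Makefile looks for, seen from the C side: `-dM -E` dumps the compiler's predefined macros, and the instruction-set flags show up there as `__SSE4_1__` and `__AVX2__`. The two helper names are hypothetical; the sketch only reports how the translation unit itself was compiled.

```c
#include <assert.h>

/* 1 if this file was compiled with SSE4.1 enabled, else 0. */
int
demo_built_with_sse4_1(void)
{
#ifdef __SSE4_1__
	return 1;
#else
	return 0;
#endif
}

/* 1 if this file was compiled with AVX2 enabled, else 0. */
int
demo_built_with_avx2(void)
{
#ifdef __AVX2__
	return 1;
#else
	return 0;
#endif
}
```

Building the same file with and without `-mavx2` (or with `-march=native` on an AVX2 machine) flips the second result, which is a quick way to see which flags actually took effect for a given object file.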



[dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index

2015-11-17 Thread Jason Wang


On 11/17/2015 04:23 PM, Michael S. Tsirkin wrote:
> On Mon, Nov 16, 2015 at 02:20:57PM -0800, Flavio Leitner wrote:
>> > On Wed, Oct 28, 2015 at 11:12:25PM +0200, Michael S. Tsirkin wrote:
>>> > > On Wed, Oct 28, 2015 at 06:30:41PM -0200, Flavio Leitner wrote:
 > > > On Sat, Oct 24, 2015 at 08:47:10PM +0300, Michael S. Tsirkin wrote:
> > > > > On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
>> > > > > > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin 
>> > > > > > wrote:
>>> > > > > > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
 > > > > > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. 
 > > > > > > > Tsirkin wrote:
> > > > > > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu 
> > > > > > > > > wrote:
>>> > > > > > > > > > > Please note that for virtio devices, guest is 
>>> > > > > > > > > > > supposed to
>>> > > > > > > > > > > control the placement of incoming packets in RX 
>>> > > > > > > > > > > queues.
>> > > > > > > > > > 
>> > > > > > > > > > I may not follow you.
>> > > > > > > > > > 
>> > > > > > > > > > Enqueuing packets to a RX queue is done at vhost 
>> > > > > > > > > > lib, outside the
>> > > > > > > > > > guest, how could the guest take the control here?
>> > > > > > > > > > 
>> > > > > > > > > >--yliu
> > > > > > > > > 
> > > > > > > > > vhost should do what guest told it to.
> > > > > > > > > 
> > > > > > > > > See virtio spec:
> > > > > > > > >   5.1.6.5.5 Automatic receive steering in 
> > > > > > > > > multiqueue mode
 > > > > > > > 
 > > > > > > > Spec says:
 > > > > > > > 
 > > > > > > > After the driver transmitted a packet of a flow on 
 > > > > > > > transmitqX,
 > > > > > > > the device SHOULD cause incoming packets for that 
 > > > > > > > flow to be
 > > > > > > > steered to receiveqX.
 > > > > > > > 
 > > > > > > > 
 > > > > > > > Michael, I still have no idea how vhost could know the 
 > > > > > > > flow even
 > > > > > > > after discussion with Huawei. Could you be more specific 
 > > > > > > > about
 > > > > > > > this? Say, how could guest know that? And how could 
 > > > > > > > guest tell
 > > > > > > > vhost which RX is gonna to use?
 > > > > > > > 
 > > > > > > > Thanks.
 > > > > > > > 
 > > > > > > >  --yliu
>>> > > > > > > 
>>> > > > > > > I don't really understand the question.
>>> > > > > > > 
>>> > > > > > > When guests transmits a packet, it makes a decision
>>> > > > > > > about the flow to use, and maps that to a tx/rx pair of 
>>> > > > > > > queues.
>>> > > > > > > 
>>> > > > > > > It sends packets out on the tx queue and expects device to
>>> > > > > > > return packets from the same flow on the rx queue.
>> > > > > > 
>> > > > > > Why? I can understand that there should be a mapping between
>> > > > > > flows and queues in a way that there is no re-ordering, but
>> > > > > > I can't see the relation of receiving a flow with a TX queue.
>> > > > > > 
>> > > > > > fbl
> > > > > 
> > > > > That's the way virtio chose to program the rx steering logic.
> > > > > 
> > > > > It's low overhead (no special commands), and
> > > > > works well for TCP when user is an endpoint since rx and tx
> > > > > for tcp are generally tied (because of ack handling).
>> > 
>> > It is low overhead for the control plane, but not for the data plane.
> Well, there's zero data plane overhead within the guest.
> You can't go lower :)
>
> > > > > We can discuss other ways, e.g. special commands for guests to
> > > > > program steering.
> > > > > We'd have to first see some data showing the current scheme
> > > > > is problematic somehow.
>> > 
>> > The issue is that the spec assumes the packets are coming in
>> > a serialized way and the distribution will be made by vhost-user
>> > but that isn't necessarily true.
>> > 
> Making the distribution guest controlled is obviously the right
> thing to do if guest is the endpoint: we need guest scheduler to
> make the decisions, it's the only entity that knows
> how are tasks distributed across VCPUs.
>
> It's possible that this is not the right thing for when guest
> is just doing bridging between two VNICs:
> are you saying packets should just go from RX queue N
> on eth0 to TX queue N on eth1, making host make all
> the queue selection decisions?

The problem looks like current automatic steering policy is not flexible
for all kinds of workload in guest. So we can implement the feature of
ntuple filters and export the interfaces to let guest/drivers to decide.
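
Editor's sketch of the automatic steering rule quoted earlier in this thread ("after the driver transmitted a packet of a flow on transmitqX, the device SHOULD cause incoming packets for that flow to be steered to receiveqX"), as a toy flow table. Everything here is an assumption for illustration: the table size, the use of a precomputed flow hash, the fallback for unknown flows, and the `demo_steer_*` names. It is not how librte_vhost or QEMU implement it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FLOW_SLOTS 256 /* hypothetical table size */

struct demo_steer {
	uint16_t queue[DEMO_FLOW_SLOTS];
	uint8_t learned[DEMO_FLOW_SLOTS];
};

void
demo_steer_init(struct demo_steer *s)
{
	memset(s, 0, sizeof(*s));
}

/* Guest transmitted a packet of this flow on transmit queue txq. */
void
demo_steer_on_tx(struct demo_steer *s, uint32_t flow_hash, uint16_t txq)
{
	uint32_t slot = flow_hash % DEMO_FLOW_SLOTS;

	s->queue[slot] = txq;
	s->learned[slot] = 1;
}

/*
 * Pick the receive queue for an incoming packet of this flow.
 * nr_queues must be at least 1. Unknown flows fall back to a plain
 * hash spread, which is this sketch's choice, not the spec's.
 */
uint16_t
demo_steer_pick_rx(const struct demo_steer *s, uint32_t flow_hash,
		   uint16_t nr_queues)
{
	uint32_t slot = flow_hash % DEMO_FLOW_SLOTS;

	if (s->learned[slot] && s->queue[slot] < nr_queues)
		return s->queue[slot];
	return (uint16_t)(flow_hash % nr_queues);
}
```

The per-packet lookup on the receive side is the data-plane cost discussed in the thread; an ntuple-filter interface would replace the learning step with explicit rules from the guest.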

>
> This sounds reasonable. Since 

[dpdk-dev] DPDK Community Call - ARM Support

2015-11-17 Thread O'Driscoll, Tim
There's been a lot of activity on the mailing list recently on DPDK support for 
ARM. It's great to see the project being enhanced to embrace a new architecture.

We have seen some duplication of efforts on this, so we think it would make a 
good topic for a community call. This will give everybody a chance to share 
their plans so we can be clear on who's doing what and make sure that we avoid 
overlaps.

We'll host a community call on this next Tuesday (24th Nov) at 15:00-16:00 GMT. 
Details on the proposed agenda, the time in a couple of other timezones, and 
how to join the online meeting are included below.


Agenda:
ARMv7 & v8 ports:
- Summary of what's been submitted for 2.2 and what the remaining gaps are 
(Dave Hunt)
- Discussion on plans for further contributions in this area

External Memory Manager:
- Summary of our plans for DPDK 2.3 (Venky Venkatesan)
- Do others plan to do work in this area?

Other DPDK/ARM plans:
- Does anybody else have plans for ARM-related work in DPDK that they can share?


When: 
Tue, Nov 24, 2015 15:00 - 16:00 GMT
Tue, Nov 24, 2015 07:00 - 08:00 PST
Tue, Nov 24, 2015 10:00 - 11:00 EST
Tue, Nov 24, 2015 16:00 - 17:00 CET


Meeting Details:
You can join from your computer, tablet or smartphone: 
https://global.gotomeeting.com/join/535221101

You can also dial in using your phone. 

Access Code: 535-221-101 

Phone numbers:
United States: +1 (224) 501-3217   
Australia: +61 2 9087 3605   
Austria: +43 7 2088 1403   
Belgium: +32 (0) 28 93 7019   
Canada: +1 (647) 497-9351   
Denmark: +45 69 91 88 64   
Finland: +358 (0) 942 41 5781   
France: +33 (0) 182 880 458   
Germany: +49 (0) 692 5736 7210   
Ireland: +353 (0) 14 845 979   
Italy: +39 0 553 98 95 67   
Netherlands: +31 (0) 208 080 381   
New Zealand: +64 4 974 7214   
Norway: +47 21 03 58 98   
Spain: +34 955 32 0845   
Sweden: +46 (0) 853 527 836   
Switzerland: +41 (0) 435 0167 09   
United Kingdom: +44 (0) 330 221 0086


[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?

2015-11-17 Thread Thomas Monjalon
2015-11-17 08:56, Roger B. Melton:
> Hi David,  in-line -Roger
> 
> On 11/16/15 4:46 AM, David Marchand wrote:
> > Hello Roger,
> >
> > On Sun, Nov 15, 2015 at 3:45 PM, Roger B. Melton wrote:
> >
> > I like the "-b all" and "-w none" idea, but I think it might be
> > complicated to implement it the way we would need it to work.  The
> > existing -b and -w options  persist for the duration of the
> > application, and we would need the "-b all"/"-w none" to persist
> > only through rte_eal_init() time.  Otherwise our attempt to
> > attach a device at a later time would be blocked by the option.
> >
> > I agree, the black/white lists should only apply to initial scan.
> > I forgot about this problem ...
> > I had started some cleanup in the pci scan / attach code but this is 
> > too late for 2.2, I will post this in the next merge window.
> >
> >
> > Wouldn't it be simpler to have an option to disable the probe at
> > rte_eal_init() time?  Would that address the issue with
> > VFIO, preventing automatic attachment to devices while permitting
> > on-demand attach?
> >
> >
> > I suppose we can do this yes (I think Thomas once proposed off-list an 
> > option like --no-pci-scan).
> > Do you think you can send a patch ?
> 
> What about --no-pci-init-probe?  I know it's long, but it is more 
> descriptive of its purpose to disable only the init time pci probe.

Why not a "-b all"?
Making it work would also solve the case where you want to scan only part of
the devices and initialize the blacklisted ones later.


[dpdk-dev] [PATCH v7 01/10] ethdev: rename macros to have RTE_ prefix

2015-11-17 Thread Thomas Monjalon
2015-11-17 14:44, Declan Doherty:
> Hey Thomas,
> 
> this patch needs to be rebased due to the commit of Daniel's patch
> "ethdev: add ieee1588 functions for device clock time". Is it OK to just
> send an updated patch for this single patch, as it doesn't affect the
> other 9 patches in the series?

Yes, please use --in-reply-to to thread below v7 01/10 and change its state
in patchwork.
Thanks


[dpdk-dev] How to approach packet TX lockups

2015-11-17 Thread Ananyev, Konstantin
Hi Matt,

As I said, at least try to upgrade the contents of the shared code to the latest version.
In previous releases it was in lib/librte_pmd_ixgbe/ixgbe; it is now located at
drivers/net/ixgbe/.

> For reference, my transmit function is  rte_eth_tx_burst().
I meant which ixgbe TX function it points to: ixgbe_xmit_pkts() or
ixgbe_xmit_pkts_simple()?
For ixgbe_xmit_pkts_simple(), don't set tx_rs_thresh > 32;
for ixgbe_xmit_pkts(), the safest way is to set tx_rs_thresh=1.
Though as I understand from your previous mails, you already did that, and it
didn't help.
Konstantin
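
Editor's sketch of the two threshold rules mentioned in this thread, written as a standalone check: tx_rs_thresh should not exceed 32 on the simple TX path, and WTHRESH must be 0 when tx_rs_thresh is greater than 1 (the PMD panic message quoted further down). The structure and function names are hypothetical and this is not the ixgbe driver's validation code; the real driver may enforce further constraints.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the TX threshold settings under discussion. */
struct demo_tx_thresh {
	uint8_t pthresh;
	uint8_t hthresh;
	uint8_t wthresh;
	uint16_t tx_rs_thresh;
};

/*
 * Returns 0 if the settings respect the two rules from the thread,
 * -1 otherwise. simple_tx_path is nonzero when the simple transmit
 * function is in use.
 */
int
demo_check_tx_thresh(const struct demo_tx_thresh *t, int simple_tx_path)
{
	if (t->tx_rs_thresh > 1 && t->wthresh != 0)
		return -1; /* WTHRESH must be 0 if tx_rs_thresh > 1 */
	if (simple_tx_path && t->tx_rs_thresh > 32)
		return -1; /* simple path: keep tx_rs_thresh <= 32 */
	return 0;
}
```

The 1/1/32 setting with tx_rs_thresh lowered to 1, as tried later in the thread, passes this check, which matches the report that the panic went away once tx_rs_thresh was changed.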


From: Matt Laswell [mailto:lasw...@infiniteio.com]
Sent: Tuesday, November 17, 2015 3:05 PM
To: Ananyev, Konstantin
Cc: Stephen Hemminger; dev at dpdk.org
Subject: Re: [dpdk-dev] How to approach packet TX lockups

Hey Konstantin,

Moving from 1.6r2 to 2.2 is going to be a pretty significant change due to 
things like changes in the MBuf format, API differences, etc.  Even as an 
experiment, that's an awfully large change to absorb.  Is there a subset that 
you're referring to that could be more readily included without modifying so 
many touch points into DPDK?

For reference, my transmit function is  rte_eth_tx_burst().  It seems to 
reliably tell me that it has enqueued all of the packets that I gave it, 
however the stats from rte_eth_stats_get() indicate that no packets are 
actually being sent.

Thanks,

- Matt

On Tue, Nov 17, 2015 at 8:44 AM, Ananyev, Konstantin <konstantin.ananyev at intel.com> wrote:


> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matt Laswell
> Sent: Tuesday, November 17, 2015 2:24 PM
> To: Stephen Hemminger
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] How to approach packet TX lockups
>
> Yes, we're on 1.6r2.  That said, I've tried a number of different values
> for the thresholds without a lot of luck.  Setting wthresh/hthresh/pthresh
> to 0/0/32 or 0/0/0 doesn't appear to fix things.  And, as Matthew
> suggested, I'm pretty sure using 0 for the thresholds leads to auto-config
> by the driver.  I also tried 1/1/32, which required that I also change the
> rs_thresh value from 0 to 1 to work around a panic in PMD initialization
> ("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1").
>
> Any other suggestions?

It's not only the DPDK code that has changed since 1.6.
I am pretty sure that we have also had a new update of the shared code since then
(and, as I remember, probably more than one).
One suggestion would be to at least try to upgrade the shared code to the
latest.
Another one: even if you can't upgrade to 2.2 in your production environment,
it is probably worth doing that in some test environment and then checking
whether the problem persists.
If yes, then we'll need some guidance on how to reproduce it.

Another question: it is not clear which TX function you use.
Konstantin

>
> On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <
> stephen at networkplumber.org> wrote:
>
> > On Mon, 16 Nov 2015 18:49:15 -0600
> > Matt Laswell <laswell at infiniteio.com> wrote:
> >
> > > Hey Stephen,
> > >
> > > Thanks a lot; that's really useful information.  Unfortunately, I'm at a
> > > stage in our release cycle where upgrading to a new version of DPDK isn't
> > > feasible.  Any chance you (or others reading this) have a pointer to the
> > > relevant changes?  While I can't afford to upgrade DPDK entirely,
> > > backporting targeted fixes is more doable.
> > >
> > > Again, thanks.
> > >
> > > - Matt
> > >
> > >
> > > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> > > stephen at networkplumber.org> 
> > > wrote:
> > >
> > > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > > Matt Laswell <laswell at infiniteio.com> wrote:
> > > >
> > > > > Hey Folks,
> > > > >
> > > > > I sent this to the users email list, but I'm not sure how many people
> > > > > are actively reading that list at this point.  I'm dealing with a
> > > > > situation in which my application loses the ability to transmit
> > > > > packets out of a port during times of moderate stress.  I'd love to
> > > > > hear suggestions for how to approach this problem, as I'm a bit at a
> > > > > loss at the moment.
> > > > >
> > > > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on
> > > > > Haswell processors.  I'm using the 82599 controller, configured to
> > > > > spread packets across multiple queues.  Each queue is accessed by a
> > > > > different lcore in my application; there is therefore concurrent
> > > > > access to the controller, but not to any of the queues.  We're
> > > > > binding the ports to the igb_uio driver.  The symptoms I see are these:
> > > > >
> > > > >   - All transmit out of a particular port stops
> > > > >   - rte_eth_tx_burst() indicates that it is sending all of the packets
> > > > >

[dpdk-dev] [PATCH v7.1 01/10] ethdev: rename macros to have RTE_ prefix

2015-11-17 Thread Declan Doherty
The macros to check that the function pointers and port ids are valid
for an ethdev are potentially useful to have in a common header for
use with all PMDs. However, since they would then become externally
visible, we apply the RTE_ & RTE_ETH_ prefix to them as appropriate.

Signed-off-by: Declan Doherty 
Acked-by: Bruce Richardson 

---
 lib/librte_ether/rte_ethdev.c | 607 +-
 1 file changed, 304 insertions(+), 303 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index b19ac9a..71775dc 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -70,58 +70,59 @@
 #include "rte_ethdev.h"

 #ifdef RTE_LIBRTE_ETHDEV_DEBUG
-#define PMD_DEBUG_TRACE(fmt, args...) do {\
+#define RTE_PMD_DEBUG_TRACE(fmt, args...) do { \
RTE_LOG(ERR, PMD, "%s: " fmt, __func__, ## args); \
} while (0)
 #else
-#define PMD_DEBUG_TRACE(fmt, args...)
+#define RTE_PMD_DEBUG_TRACE(fmt, args...)
 #endif

 /* Macros for checking for restricting functions to primary instance only */
-#define PROC_PRIMARY_OR_ERR_RET(retval) do { \
+#define RTE_PROC_PRIMARY_OR_ERR_RET(retval) do { \
if (rte_eal_process_type() != RTE_PROC_PRIMARY) { \
-   PMD_DEBUG_TRACE("Cannot run in secondary processes\n"); \
+   RTE_PMD_DEBUG_TRACE("Cannot run in secondary processes\n"); \
return (retval); \
} \
 } while (0)

-#define PROC_PRIMARY_OR_RET() do { \
+#define RTE_PROC_PRIMARY_OR_RET() do { \
if (rte_eal_process_type() != RTE_PROC_PRIMARY) { \
-   PMD_DEBUG_TRACE("Cannot run in secondary processes\n"); \
+   RTE_PMD_DEBUG_TRACE("Cannot run in secondary processes\n"); \
return; \
} \
 } while (0)

 /* Macros to check for invalid function pointers in dev_ops structure */
-#define FUNC_PTR_OR_ERR_RET(func, retval) do { \
+#define RTE_FUNC_PTR_OR_ERR_RET(func, retval) do { \
if ((func) == NULL) { \
-   PMD_DEBUG_TRACE("Function not supported\n"); \
+   RTE_PMD_DEBUG_TRACE("Function not supported\n"); \
return (retval); \
} \
 } while (0)

-#define FUNC_PTR_OR_RET(func) do { \
+#define RTE_FUNC_PTR_OR_RET(func) do { \
if ((func) == NULL) { \
-   PMD_DEBUG_TRACE("Function not supported\n"); \
+   RTE_PMD_DEBUG_TRACE("Function not supported\n"); \
return; \
} \
 } while (0)

 /* Macros to check for valid port */
-#define VALID_PORTID_OR_ERR_RET(port_id, retval) do {  \
-   if (!rte_eth_dev_is_valid_port(port_id)) {  \
-   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
-   return retval;  \
-   }   \
+#define RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, retval) do { \
+   if (!rte_eth_dev_is_valid_port(port_id)) {  \
+   RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
+   return retval; \
+   } \
 } while (0)

-#define VALID_PORTID_OR_RET(port_id) do {  \
-   if (!rte_eth_dev_is_valid_port(port_id)) {  \
-   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
-   return; \
-   }   \
+#define RTE_ETH_VALID_PORTID_OR_RET(port_id) do { \
+   if (!rte_eth_dev_is_valid_port(port_id)) { \
+   RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
+   return; \
+   } \
 } while (0)

+
 static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
 struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
 static struct rte_eth_dev_data *rte_eth_dev_data;
@@ -244,7 +245,7 @@ rte_eth_dev_allocate(const char *name, enum rte_eth_dev_type type)

port_id = rte_eth_dev_find_free_port();
if (port_id == RTE_MAX_ETHPORTS) {
-   PMD_DEBUG_TRACE("Reached maximum number of Ethernet ports\n");
+   RTE_PMD_DEBUG_TRACE("Reached maximum number of Ethernet ports\n");
return NULL;
}

@@ -252,7 +253,7 @@ rte_eth_dev_allocate(const char *name, enum rte_eth_dev_type type)
rte_eth_dev_data_alloc();

if (rte_eth_dev_allocated(name) != NULL) {
-   PMD_DEBUG_TRACE("Ethernet Device with name %s already allocated!\n",
+   RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s already allocated!\n",
name);
return NULL;
}
@@ -339,7 +340,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
if (diag == 0)
return 0;

-   PMD_DEBUG_TRACE("driver %s: eth_dev_init(vendor_id=0x%u device_id=0x%x) failed\n",
+   RTE_PMD_DEBUG_TRACE("driver %s: eth_dev_init(vendor_id=0x%u device_id=0x%x) 

[dpdk-dev] [PATCH] app/test: fix memory_autotest integer overflow/wraparound

2015-11-17 Thread Sergio Gonzalez Monroy
memory_autotest loops infinitely when at least one of the memsegs
is bigger than 4GB.

The issue is the result of an integer overflow/wraparound of
the offset variable.

Fix it by using the correct type (size_t).

Signed-off-by: Sergio Gonzalez Monroy 
---
 app/test/test_memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/app/test/test_memory.c b/app/test/test_memory.c
index 02ef3cf..6816385 100644
--- a/app/test/test_memory.c
+++ b/app/test/test_memory.c
@@ -55,7 +55,8 @@ static int
 test_memory(void)
 {
uint64_t s;
-   unsigned i, j;
+   unsigned i;
+   size_t j;
const struct rte_memseg *mem;

/*
-- 
2.4.3
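
The wraparound described in the commit message above can be reproduced without
DPDK. What follows is only a minimal sketch under stated assumptions: the
helper names walk_u32 and walk_u64, the stride and the iteration cap are
invented for illustration and do not come from test_memory.c, and uint64_t
stands in for size_t (which is 64 bits wide on 64-bit builds) so that the
sketch behaves the same everywhere.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Walks offsets 0, stride, 2*stride, ... below len using a 32-bit offset,
 * the width `unsigned` has on common targets. Returns the number of
 * iterations, or -1 if the loop is still running after max_iters
 * iterations (the offset wrapped modulo 2^32 and cannot reach len).
 */
static long
walk_u32(uint64_t len, uint32_t stride, long max_iters)
{
	uint32_t off = 0;
	long n = 0;

	assert(stride != 0);
	while (off < len) {
		if (n == max_iters)
			return -1;
		off += stride; /* unsigned arithmetic wraps silently */
		n++;
	}
	return n;
}

/* Same walk with a 64-bit offset, standing in for size_t on 64-bit builds. */
static long
walk_u64(uint64_t len, uint64_t stride, long max_iters)
{
	uint64_t off = 0;
	long n = 0;

	assert(stride != 0);
	while (off < len) {
		if (n == max_iters)
			return -1;
		off += stride;
		n++;
	}
	return n;
}
```

With a hypothetical 5 GiB segment and a 1 GiB stride, the 32-bit walk cycles
through 0, 1 GiB, 2 GiB, 3 GiB and back to 0, so it never reaches the end; the
64-bit walk finishes. A segment that fits in 32 bits behaves the same in both.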



[dpdk-dev] [PATCH 1/1] mpipe: add missing version map for mpipe pmd driver

2015-11-17 Thread Zhigang Lu
Without it, a compile error occurs when CONFIG_RTE_BUILD_SHARED_LIB
is enabled.

Reported-by: Guo Xin 
Signed-off-by: Zhigang Lu 
---
 drivers/net/mpipe/rte_pmd_mpipe_version.map | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 drivers/net/mpipe/rte_pmd_mpipe_version.map

diff --git a/drivers/net/mpipe/rte_pmd_mpipe_version.map b/drivers/net/mpipe/rte_pmd_mpipe_version.map
new file mode 100644
index 000..ad607bb
--- /dev/null
+++ b/drivers/net/mpipe/rte_pmd_mpipe_version.map
@@ -0,0 +1,3 @@
+DPDK_2.2 {
+   local: *;
+};
-- 
2.1.2



[dpdk-dev] [PATCH] i40e: skip any phy config as a workaround

2015-11-17 Thread Helin Zhang
As the firmware does not support any link control from the software
driver side, any PHY configuration should be skipped as a workaround.
Otherwise the link might not come up again after rebinding to the
kernel driver.

Signed-off-by: Helin Zhang 
---
 drivers/net/i40e/i40e_ethdev.c | 61 +++---
 1 file changed, 9 insertions(+), 52 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 2c51a0b..f06c566 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1347,58 +1347,15 @@ i40e_parse_link_speed(uint16_t eth_link_speed)
 }

 static int
-i40e_phy_conf_link(struct i40e_hw *hw, uint8_t abilities, uint8_t force_speed)
-{
-   enum i40e_status_code status;
-   struct i40e_aq_get_phy_abilities_resp phy_ab;
-   struct i40e_aq_set_phy_config phy_conf;
-   const uint8_t mask = I40E_AQ_PHY_FLAG_PAUSE_TX |
-   I40E_AQ_PHY_FLAG_PAUSE_RX |
-   I40E_AQ_PHY_FLAG_LOW_POWER;
-   const uint8_t advt = I40E_LINK_SPEED_40GB |
-   I40E_LINK_SPEED_10GB |
-   I40E_LINK_SPEED_1GB |
-   I40E_LINK_SPEED_100MB;
-   int ret = -ENOTSUP;
-
-   /* Skip it on 40G interfaces, as a workaround for the link issue */
-   if (i40e_is_40G_device(hw->device_id))
-   return I40E_SUCCESS;
-
-   status = i40e_aq_get_phy_capabilities(hw, false, false, &phy_ab,
- NULL);
-   if (status)
-   return ret;
-
-   memset(&phy_conf, 0, sizeof(phy_conf));
-
-   /* bits 0-2 use the values from get_phy_abilities_resp */
-   abilities &= ~mask;
-   abilities |= phy_ab.abilities & mask;
-
-   /* update ablities and speed */
-   if (abilities & I40E_AQ_PHY_AN_ENABLED)
-   phy_conf.link_speed = advt;
-   else
-   phy_conf.link_speed = force_speed;
-
-   phy_conf.abilities = abilities;
-
-   /* use get_phy_abilities_resp value for the rest */
-   phy_conf.phy_type = phy_ab.phy_type;
-   phy_conf.eee_capability = phy_ab.eee_capability;
-   phy_conf.eeer = phy_ab.eeer_val;
-   phy_conf.low_power_ctrl = phy_ab.d3_lpan;
-
-   PMD_DRV_LOG(DEBUG, "\tCurrent: abilities %x, link_speed %x",
-   phy_ab.abilities, phy_ab.link_speed);
-   PMD_DRV_LOG(DEBUG, "\tConfig:  abilities %x, link_speed %x",
-   phy_conf.abilities, phy_conf.link_speed);
-
-   status = i40e_aq_set_phy_config(hw, &phy_conf, NULL);
-   if (status)
-   return ret;
-
+i40e_phy_conf_link(__rte_unused struct i40e_hw *hw,
+  __rte_unused uint8_t abilities,
+  __rte_unused uint8_t force_speed)
+{
+   /* Skip any phy config on both 10G and 40G interfaces, as a workaround
+* for the link control limitation of that all link control should be
+* handled by firmware. It should follow up if link control will be
+* opened to software driver in future firmware versions.
+*/
return I40E_SUCCESS;
 }

-- 
1.9.3



[dpdk-dev] [PATCH v3 2/2] examples: add pthread-shim in performance-thread sample app

2015-11-17 Thread Mcnamara, John


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of ibetts
> Sent: Tuesday, November 17, 2015 12:11 PM
> To: dev at dpdk.org
> Cc: Betts, Ian
> Subject: [dpdk-dev] [PATCH v3 2/2] examples: add pthread-shim in
> performance-thread sample app

Hi Ian,

For some reason it looks like [PATCH v3 1/2] didn't make it to the list.

John.
-- 



[dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index

2015-11-17 Thread Flavio Leitner
On Tue, Nov 17, 2015 at 10:23:38AM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 16, 2015 at 02:20:57PM -0800, Flavio Leitner wrote:
> > On Wed, Oct 28, 2015 at 11:12:25PM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Oct 28, 2015 at 06:30:41PM -0200, Flavio Leitner wrote:
> > > > On Sat, Oct 24, 2015 at 08:47:10PM +0300, Michael S. Tsirkin wrote:
> > > > > On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
> > > > > > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > > > > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > > > > > > control the placement of incoming packets in RX queues.
> > > > > > > > > > 
> > > > > > > > > > I may not follow you.
> > > > > > > > > > 
> > > > > > > > > > Enqueuing packets to a RX queue is done at vhost lib,
> > > > > > > > > > outside the guest, how could the guest take the control here?
> > > > > > > > > > 
> > > > > > > > > > --yliu
> > > > > > > > > 
> > > > > > > > > vhost should do what guest told it to.
> > > > > > > > > 
> > > > > > > > > See virtio spec:
> > > > > > > > >   5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > > > > > > 
> > > > > > > > Spec says:
> > > > > > > > 
> > > > > > > > After the driver transmitted a packet of a flow on transmitqX,
> > > > > > > > the device SHOULD cause incoming packets for that flow to be
> > > > > > > > steered to receiveqX.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Michael, I still have no idea how vhost could know the flow even
> > > > > > > > after discussion with Huawei. Could you be more specific about
> > > > > > > > this? Say, how could guest know that? And how could guest tell
> > > > > > > > vhost which RX queue it is going to use?
> > > > > > > > 
> > > > > > > > Thanks.
> > > > > > > > 
> > > > > > > > --yliu
> > > > > > > 
> > > > > > > I don't really understand the question.
> > > > > > > 
> > > > > > > When a guest transmits a packet, it makes a decision
> > > > > > > about the flow to use, and maps that to a tx/rx pair of queues.
> > > > > > > 
> > > > > > > It sends packets out on the tx queue and expects device to
> > > > > > > return packets from the same flow on the rx queue.
> > > > > > 
> > > > > > Why? I can understand that there should be a mapping between
> > > > > > flows and queues in a way that there is no re-ordering, but
> > > > > > I can't see the relation of receiving a flow with a TX queue.
> > > > > > 
> > > > > > fbl
> > > > > 
> > > > > That's the way virtio chose to program the rx steering logic.
> > > > > 
> > > > > It's low overhead (no special commands), and
> > > > > works well for TCP when user is an endpoint since rx and tx
> > > > > for tcp are generally tied (because of ack handling).
> > 
> > It is low overhead for the control plane, but not for the data plane.
> 
> Well, there's zero data plane overhead within the guest.
> You can't go lower :)

I agree, but I am talking about vhost-user or whatever means we use to
provide packets to the virtio backend. That will have to distribute
the packets according to the guest's mapping which is not zero overhead.


> > > > > We can discuss other ways, e.g. special commands for guests to
> > > > > program steering.
> > > > > We'd have to first see some data showing the current scheme
> > > > > is problematic somehow.
> > 
> > The issue is that the spec assumes the packets are coming in
> > a serialized way and the distribution will be made by vhost-user
> > but that isn't necessarily true.
> > 
> 
> Making the distribution guest controlled is obviously the right
> thing to do if guest is the endpoint: we need guest scheduler to
> make the decisions, it's the only entity that knows
> how are tasks distributed across VCPUs.

Again, I agree.  My point is that it could also allow no mapping,
or full freedom. I don't see that as an option now.

> It's possible that this is not the right thing for when guest
> is just doing bridging between two VNICs:
> are you saying packets should just go from RX queue N
> on eth0 to TX queue N on eth1, making host make all
> the queue selection decisions?

The idea is that the guest could TX on queue N and the host
would push packets from the same stream on RX queue Y. So, the
guest is free to send packets on any queue and the host is
free to send packets on any queue as long as both keep a stable
mapping to avoid re-ordering.

What if the guest is not trustable and the host has the requirement
to send priority packets to queue#0?  That is not possible if 
backend is forced to follow guest mapping.
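
To make the "stable mapping" idea above concrete, here is a minimal sketch of
a host-side choice that sends every packet of a flow to the same RX queue
without looking at the guest's TX queue. It is only an illustration under
stated assumptions: struct flow_key, fnv1a_u32 and flow_to_queue are
hypothetical names, the hash is plain 32-bit FNV-1a, and nothing here is
claimed to be how the vhost library or the virtio spec does steering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow identifier: the usual 5-tuple, in host byte order. */
struct flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t proto;
};

/* Folds the four bytes of v into a running 32-bit FNV-1a hash. */
static uint32_t
fnv1a_u32(uint32_t h, uint32_t v)
{
	int i;

	for (i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xffu;
		h *= 16777619u; /* FNV prime */
	}
	return h;
}

/*
 * Maps a flow to a queue index in [0, nb_queues). The result depends only
 * on the key and nb_queues, so packets of one flow cannot be re-ordered
 * across queues as long as nb_queues stays the same.
 */
static uint16_t
flow_to_queue(const struct flow_key *k, uint16_t nb_queues)
{
	uint32_t h = 2166136261u; /* FNV offset basis */

	assert(nb_queues > 0);
	h = fnv1a_u32(h, k->src_ip);
	h = fnv1a_u32(h, k->dst_ip);
	h = fnv1a_u32(h, ((uint32_t)k->src_port << 16) | k->dst_port);
	h = fnv1a_u32(h, k->proto);
	return (uint16_t)(h % nb_queues);
}
```

A host policy such as "priority traffic always goes to queue#0" could be
layered on top by checking the packet class before falling back to the hash;
that part is left out of the sketch.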

> This sounds reasonable. Since there's a mix of local and
> bridged traffic normally, does this mean we need
> a 

[dpdk-dev] [PATCH v7 01/10] ethdev: rename macros to have RTE_ prefix

2015-11-17 Thread Declan Doherty
On 13/11/15 18:58, Declan Doherty wrote:
> The macros to check that the function pointers and port ids are valid
> for an ethdev are potentially useful to have in a common header for
> use with all PMDs. However, since they would then become externally
> visible, we apply the RTE_ & RTE_ETH_ prefix to them as appropriate.
>
> Signed-off-by: Declan Doherty 
> Acked-by: Bruce Richardson 
>
> ---
>   lib/librte_ether/rte_ethdev.c | 595 +-
>   1 file changed, 298 insertions(+), 297 deletions(-)
> 
>

Hey Thomas,

This patch needs to be rebased because Daniel's patch
"ethdev: add ieee1588 functions for device clock time" has been committed.
Is it OK to send an updated version of just this patch, as it doesn't
affect the other 9 patches in the series?

Thanks
Declan


[dpdk-dev] How to approach packet TX lockups

2015-11-17 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matt Laswell
> Sent: Tuesday, November 17, 2015 2:24 PM
> To: Stephen Hemminger
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] How to approach packet TX lockups
> 
> Yes, we're on 1.6r2.  That said, I've tried a number of different values
> for the thresholds without a lot of luck.  Setting wthresh/hthresh/pthresh
> to 0/0/32 or 0/0/0 doesn't appear to fix things.  And, as Matthew
> suggested, I'm pretty sure using 0 for the thresholds leads to auto-config
> by the driver.  I also tried 1/1/32, which required that I also change the
> rs_thresh value from 0 to 1 to work around a panic in PMD initialization
> ("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1").
> 
> Any other suggestions?

It's not only the DPDK code that has changed since 1.6.
I am pretty sure that we have also had an update of the shared code since then
(and, as I remember, probably more than one).
One suggestion would be to at least try upgrading the shared code to the
latest version.
Another one: even if you can't upgrade to 2.2 in your production environment,
it is probably worth doing that in a test environment and then checking whether
the problem persists.
If it does, then we'll need some guidance on how to reproduce it.

Another question: it is not clear which TX function you use.
Konstantin

> 
> On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <
> stephen at networkplumber.org> wrote:
> 
> > On Mon, 16 Nov 2015 18:49:15 -0600
> > Matt Laswell  wrote:
> >
> > > Hey Stephen,
> > >
> > > Thanks a lot; that's really useful information.  Unfortunately, I'm at a
> > > stage in our release cycle where upgrading to a new version of DPDK isn't
> > > feasible.  Any chance you (or others reading this) have a pointer to the
> > > relevant changes?  While I can't afford to upgrade DPDK entirely,
> > > backporting targeted fixes is more doable.
> > >
> > > Again, thanks.
> > >
> > > - Matt
> > >
> > >
> > > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> > > stephen at networkplumber.org> wrote:
> > >
> > > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > > Matt Laswell  wrote:
> > > >
> > > > > Hey Folks,
> > > > >
> > > > > I sent this to the users email list, but I'm not sure how many people
> > > > > are actively reading that list at this point.  I'm dealing with a
> > > > > situation in which my application loses the ability to transmit
> > > > > packets out of a port during times of moderate stress.  I'd love to
> > > > > hear suggestions for how to approach this problem, as I'm a bit at a
> > > > > loss at the moment.
> > > > >
> > > > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on
> > > > > Haswell processors.  I'm using the 82599 controller, configured to
> > > > > spread packets across multiple queues.  Each queue is accessed by a
> > > > > different lcore in my application; there is therefore concurrent
> > > > > access to the controller, but not to any of the queues.  We're
> > > > > binding the ports to the igb_uio driver.  The symptoms I see are these:
> > > > >
> > > > >   - All transmit out of a particular port stops
> > > > >   - rte_eth_tx_burst() indicates that it is sending all of the
> > > > >     packets that I give to it
> > > > >   - rte_eth_stats_get() gives me stats indicating that no packets
> > > > >     are being sent on the affected port.  Also, no tx errors, and no
> > > > >     pause frames sent or received (opackets = 0, obytes = 0,
> > > > >     oerrors = 0, etc.)
> > > > >   - All other ports continue to work normally
> > > > >   - The affected port continues to receive packets without
> > > > >     problems; only TX is affected
> > > > >   - Resetting the port via rte_eth_dev_stop() and rte_eth_dev_start()
> > > > >     restores things and packets can flow again
> > > > >   - The problem is replicable on multiple devices, and doesn't
> > > > >     follow one particular port
> > > > >
> > > > > I've tried calling rte_mbuf_sanity_check() on all packets before
> > > > > sending them.  I've also instrumented my code to look for packets that
> > > > > have already been sent or freed, as well as cycles in chained packets
> > > > > being sent.  I also put a lock around all accesses to rte_eth* calls
> > > > > to synchronize access to the NIC.  Given some recent discussion here,
> > > > > I also tried changing the TX RS threshold from 0 to 32, 16, and 1.
> > > > > None of these strategies proved effective.
> > > > >
> > > > > Like I said at the top, I'm a little at a loss at this point.  If you
> > > > > were dealing with this set of symptoms, how would you proceed?
> > > > >
> > > >
> > > > I remember some issues with old DPDK 1.6 with some of the prefetch
> > > > thresholds on 82599. You would be better off going to a later DPDK
> > > > version.
> > > >
> >
> > I hope you are on 1.6.0r2 at least??
> >
> > 
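
The symptom reported in this thread (rte_eth_tx_burst() accepts packets while
the opackets counter from rte_eth_stats_get() stays flat) can at least be
detected automatically while the root cause is investigated. The following is
a rough sketch of such a check, not a fix and not DPDK code: struct
tx_watchdog and both helpers are hypothetical, and the caller is assumed to
feed in the return value of rte_eth_tx_burst() and the opackets value read
periodically from rte_eth_stats_get().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-port state for detecting a stalled TX path. */
struct tx_watchdog {
	uint64_t last_opackets; /* opackets seen at the previous sample */
	uint64_t accepted;      /* packets accepted since the previous sample */
	unsigned int stalled;   /* consecutive samples without TX progress */
};

/* Call after each burst with the number of packets the PMD accepted. */
static void
tx_watchdog_note_burst(struct tx_watchdog *wd, uint16_t nb_accepted)
{
	wd->accepted += nb_accepted;
}

/*
 * Call periodically with the current opackets counter. Returns 1 once the
 * port has accepted packets but shown no opackets progress for `limit`
 * consecutive samples, 0 otherwise. Idle intervals reset the count.
 */
static int
tx_watchdog_sample(struct tx_watchdog *wd, uint64_t opackets_now,
		   unsigned int limit)
{
	assert(limit > 0);
	if (wd->accepted > 0 && opackets_now == wd->last_opackets)
		wd->stalled++;
	else
		wd->stalled = 0;
	wd->last_opackets = opackets_now;
	wd->accepted = 0;
	return wd->stalled >= limit;
}
```

Requiring several consecutive stalled samples is a deliberate choice: packets
accepted just before a sample may not have been counted yet. When the check
fires, an application could log the state and, as reported above, recover with
rte_eth_dev_stop() followed by rte_eth_dev_start().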

[dpdk-dev] [PATCH 1/1] config/tile: disable KNI kmod option on tile

2015-11-17 Thread Zhigang Lu
Commit 36080ff96b0e causes a compile error on tile, as tile
does not support KNI, so disable CONFIG_RTE_KNI_KMOD there.

Fixes: 36080ff96b0e ("config: add KNI kmod option")

Reported-by: Guo Xin 
Signed-off-by: Zhigang Lu 
---
 config/defconfig_tile-tilegx-linuxapp-gcc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config/defconfig_tile-tilegx-linuxapp-gcc b/config/defconfig_tile-tilegx-linuxapp-gcc
index a5d8bd6..9df9d7f 100644
--- a/config/defconfig_tile-tilegx-linuxapp-gcc
+++ b/config/defconfig_tile-tilegx-linuxapp-gcc
@@ -51,6 +51,7 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
 CONFIG_RTE_EAL_IGB_UIO=n
 CONFIG_RTE_EAL_VFIO=n
 CONFIG_RTE_LIBRTE_KNI=n
+CONFIG_RTE_KNI_KMOD=n
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
 CONFIG_RTE_LIBRTE_IGB_PMD=n
 CONFIG_RTE_LIBRTE_EM_PMD=n
-- 
2.1.2



[dpdk-dev] [PATCH v4 2/2] ethdev: add sanity checks to functions

2015-11-17 Thread Bruce Richardson
The functions rte_eth_rx_queue_count and rte_eth_rx_descriptor_done are
supported by very few PMDs. Therefore, it is best to check for support
for the functions in the ethdev library, so as to avoid crashes at
run-time if the application uses those APIs. Similarly, the
port parameter should also be checked for validity.

Signed-off-by: Bruce Richardson 

---
 lib/librte_ether/rte_ethdev.h | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index a00cd46..028be59 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2533,16 +2533,16 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
  * @param queue_id
  *  The queue id on the specific port.
  * @return
- *  The number of used descriptors in the specific queue.
+ *  The number of used descriptors in the specific queue, or:
+ * (-EINVAL) if *port_id* is invalid
+ * (-ENOTSUP) if the device does not support this function
  */
-static inline uint32_t
+static inline int
 rte_eth_rx_queue_count(uint8_t port_id, uint16_t queue_id)
 {
struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, 0);
-#endif
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, -ENOTSUP);
 return (*dev->dev_ops->rx_queue_count)(dev, queue_id);
 }

@@ -2559,15 +2559,14 @@ rte_eth_rx_queue_count(uint8_t port_id, uint16_t queue_id)
  *  - (1) if the specific DD bit is set.
  *  - (0) if the specific DD bit is not set.
  *  - (-ENODEV) if *port_id* invalid.
+ *  - (-ENOTSUP) if the device does not support this function
  */
 static inline int
 rte_eth_rx_descriptor_done(uint8_t port_id, uint16_t queue_id, uint16_t offset)
 {
struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_done, -ENOTSUP);
-#endif
return (*dev->dev_ops->rx_descriptor_done)( \
dev->data->rx_queues[queue_id], offset);
 }
-- 
2.5.0
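
The checking pattern this patch enables unconditionally can be tried outside
DPDK. Below is a minimal stand-alone sketch, assuming a hypothetical device
table: every DEMO_/demo_ name is invented for illustration and only mirrors
the shape of RTE_ETH_VALID_PORTID_OR_ERR_RET and RTE_FUNC_PTR_OR_ERR_RET. It
also shows why the return type has to be int: a negative errno value cannot
be told apart from a count in a uint32_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 4 /* hypothetical table size */

/* Hypothetical stand-ins for struct rte_eth_dev and its dev_ops table. */
struct demo_dev_ops {
	int (*rx_queue_count)(uint16_t queue_id); /* NULL if unsupported */
};

struct demo_dev {
	int attached;
	struct demo_dev_ops ops;
};

static struct demo_dev demo_devices[DEMO_MAX_PORTS];

/* Same shape as the checks in the patch: return retval on failure. */
#define DEMO_VALID_PORTID_OR_ERR_RET(port_id, retval) do { \
	if ((port_id) >= DEMO_MAX_PORTS || !demo_devices[(port_id)].attached) \
		return (retval); \
} while (0)

#define DEMO_FUNC_PTR_OR_ERR_RET(func, retval) do { \
	if ((func) == NULL) \
		return (retval); \
} while (0)

/* Registers a device; ops may leave rx_queue_count set to NULL. */
static void
demo_attach(uint8_t port_id, struct demo_dev_ops ops)
{
	assert(port_id < DEMO_MAX_PORTS);
	demo_devices[port_id].attached = 1;
	demo_devices[port_id].ops = ops;
}

/* Returns the count from the driver, -EINVAL or -ENOTSUP. */
static inline int
demo_rx_queue_count(uint8_t port_id, uint16_t queue_id)
{
	struct demo_dev *dev;

	DEMO_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
	dev = &demo_devices[port_id];
	DEMO_FUNC_PTR_OR_ERR_RET(dev->ops.rx_queue_count, -ENOTSUP);
	return dev->ops.rx_queue_count(queue_id);
}

/* Example driver callback: pretends seven descriptors are in use. */
static int
demo_count_seven(uint16_t queue_id)
{
	(void)queue_id;
	return 7;
}
```

One difference from the patch is intentional: the sketch takes the device
pointer only after the port check has passed, so an out-of-range port_id is
never used as an array index.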



[dpdk-dev] [PATCH v4 1/2] ethdev: remove duplicated debug functions

2015-11-17 Thread Bruce Richardson
The functions for rx/tx burst, for rx_queue_count and descriptor_done in
the ethdev library all had two copies of the code. One copy in
rte_ethdev.h was inlined for performance, while a second was in
rte_ethdev.c for debugging purposes only. We can eliminate the second
copy of the functions by moving the additional debug checks into the
copies of the functions in the header file. [Any compilation for
debugging at optimization level 0 will not inline the function so the
result should be same as when the function was in the .c file.]

Signed-off-by: Bruce Richardson 
---
 lib/librte_ether/rte_ethdev.c | 64 ---
 lib/librte_ether/rte_ethdev.h | 59 ---
 2 files changed, 29 insertions(+), 94 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index f4648ac..739db81 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2439,70 +2439,6 @@ rte_eth_mirror_rule_reset(uint8_t port_id, uint8_t rule_id)
return (*dev->dev_ops->mirror_rule_reset)(dev, rule_id);
 }

-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-uint16_t
-rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
-struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
-{
-   struct rte_eth_dev *dev;
-
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-
-   dev = &rte_eth_devices[port_id];
-   RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, 0);
-   if (queue_id >= dev->data->nb_rx_queues) {
-   RTE_PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", queue_id);
-   return 0;
-   }
-   return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
-   rx_pkts, nb_pkts);
-}
-
-uint16_t
-rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
-struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
-{
-   struct rte_eth_dev *dev;
-
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-
-   dev = &rte_eth_devices[port_id];
-
-   RTE_FUNC_PTR_OR_ERR_RET(*dev->tx_pkt_burst, 0);
-   if (queue_id >= dev->data->nb_tx_queues) {
-   RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
-   return 0;
-   }
-   return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id],
-   tx_pkts, nb_pkts);
-}
-
-uint32_t
-rte_eth_rx_queue_count(uint8_t port_id, uint16_t queue_id)
-{
-   struct rte_eth_dev *dev;
-
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-
-   dev = &rte_eth_devices[port_id];
-   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, 0);
-   return (*dev->dev_ops->rx_queue_count)(dev, queue_id);
-}
-
-int
-rte_eth_rx_descriptor_done(uint8_t port_id, uint16_t queue_id, uint16_t offset)
-{
-   struct rte_eth_dev *dev;
-
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-   dev = &rte_eth_devices[port_id];
-   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_done, -ENOTSUP);
-   return (*dev->dev_ops->rx_descriptor_done)(dev->data->rx_queues[queue_id],
-  offset);
-}
-#endif
-
 int
 rte_eth_dev_callback_register(uint8_t port_id,
enum rte_eth_event_type event,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index b51b8aa..a00cd46 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2492,18 +2492,21 @@ extern int rte_eth_dev_set_vlan_pvid(uint8_t port_id, uint16_t pvid, int on);
  *   of pointers to *rte_mbuf* structures effectively supplied to the
  *   *rx_pkts* array.
  */
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-extern uint16_t rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
-struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
-#else
 static inline uint16_t
 rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
 {
-   struct rte_eth_dev *dev;
+   struct rte_eth_dev *dev = &rte_eth_devices[port_id];

-   dev = &rte_eth_devices[port_id];
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, 0);

+   if (queue_id >= dev->data->nb_rx_queues) {
+   RTE_PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", queue_id);
+   return 0;
+   }
+#endif
int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
rx_pkts, nb_pkts);

@@ -2521,7 +2524,6 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,

return nb_rx;
 }
-#endif

 /**
  * Get the number of used descriptors in a specific queue
@@ -2533,18 +2535,16 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
  * @return
  *  The number of used descriptors in the specific queue.
  */
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-extern uint32_t rte_eth_rx_queue_count(uint8_t port_id, uint16_t queue_id);
-#else
 

[dpdk-dev] [PATCH v4 0/2] ethdev: debug code cleanup

2015-11-17 Thread Bruce Richardson
This patchset performs two cleanups:
1. Four functions in ethdev.c which were enabled for debug only have been
  merged into their inlined header-file counterparts. This change required that
  a number of macros be renamed and moved to the header file too. The macro
  changes are in patches 1 & 2, and the elimination of the separate debug fns
  is in patch 3.
2. Checks for valid function pointers are added to the API calls for reading
  the descriptor ring count, and checking for a valid descriptor. This is
  because these functions are not implemented by most drivers, and so it's far
  safer to have the check.

NOTE: This patchset now depends upon the cryptodev patchset

---

V4 Changes:
* Originally this was a 4-patch set, but patches 1 and 2 duplicated changes
  being made in the patchset to add crypto device support. Therefore this set
  has been reduced to two patches to sit on top of that set.
* As suggested on-list, when adding checks for the function pointers being
  valid we can also add in the similarly lightweight checks for the port id
  being valid.

V3 Changes:
* Rebased to latest DPDK codebase
* Fixed checkpatch issues in patches 2 and 3.

V2 Changes:
* Rebased to latest DPDK codebase
* Changed type from uint32_t to int for the count function, on the basis of
feedback received.

Bruce Richardson (2):
  ethdev: remove duplicated debug functions
  ethdev: add sanity checks to functions

 lib/librte_ether/rte_ethdev.c | 64 ---
 lib/librte_ether/rte_ethdev.h | 62 -
 2 files changed, 30 insertions(+), 96 deletions(-)

-- 
2.5.0



[dpdk-dev] [PATCH v3 1/2] examples: add performance thread sample application

2015-11-17 Thread ibetts
From: Ian Betts 

This example comprises a layer 3 forwarding derivative intended to
facilitate characterization of performance with different
threading models, specifically:-

1. EAL threads running on different physical cores
2. EAL threads running on the same physical core
3. Lightweight threads running in an EAL thread

Purpose and justification

Since DPDK 2.0 it has been possible to assign multiple EAL threads to
a physical core (case 2 above).
Currently no example application has focused on demonstrating the
performance constraints of differing threading models.

Whilst purpose built applications that fully comprehend the DPDK
single threaded programming model will always yield superior
performance, the desire to preserve ROI in legacy code written for
multithreaded operating environments  makes lightweight threads
(case 3 above) worthy of consideration.

As well as aiding with legacy code reuse, it is anticipated that
lightweight threads will make it possible to scale a multithreaded
application with fine granularity allowing an application  to more
easily take advantage of headroom on EAL cores, or conversely occupy
more cores, as dictated by system load.

To explore performance with lightweight threads a simple cooperative
scheduler subsystem is being included in this example application.
If the expected benefits and use cases prove to be of value, it is
anticipated that this lightweight thread subsystem would become a
library in some future DPDK release.

Signed-off-by: Ian Betts 
---
 config/common_linuxapp |5 +
 config/defconfig_i686-native-linuxapp-gcc  |5 +
 config/defconfig_i686-native-linuxapp-icc  |5 +
 config/defconfig_x86_x32-native-linuxapp-gcc   |5 +
 doc/guides/rel_notes/release_2_2.rst   |   11 +
 .../sample_app_ug/img/performance_thread_1.svg |  799 +
 .../sample_app_ug/img/performance_thread_2.svg |  803 +
 doc/guides/sample_app_ug/index.rst |1 +
 doc/guides/sample_app_ug/performance_thread.rst| 1150 +++
 examples/Makefile  |1 +
 examples/performance-thread/Makefile   |   44 +
 .../performance-thread/common/arch/x86/atomic.h|   59 +
 examples/performance-thread/common/arch/x86/ctx.c  |   93 +
 examples/performance-thread/common/arch/x86/ctx.h  |   57 +
 examples/performance-thread/common/common.mk   |   40 +
 examples/performance-thread/common/lthread.c   |  546 +++
 examples/performance-thread/common/lthread.h   |   99 +
 examples/performance-thread/common/lthread_api.h   |  822 +
 examples/performance-thread/common/lthread_cond.c  |  240 ++
 examples/performance-thread/common/lthread_cond.h  |   77 +
 examples/performance-thread/common/lthread_diag.c  |  321 ++
 examples/performance-thread/common/lthread_diag.h  |  129 +
 .../performance-thread/common/lthread_diag_api.h   |  319 ++
 examples/performance-thread/common/lthread_int.h   |  212 ++
 examples/performance-thread/common/lthread_mutex.c |  255 ++
 examples/performance-thread/common/lthread_mutex.h |   52 +
 .../performance-thread/common/lthread_objcache.h   |  160 +
 examples/performance-thread/common/lthread_pool.h  |  333 ++
 examples/performance-thread/common/lthread_queue.h |  303 ++
 examples/performance-thread/common/lthread_sched.c |  598 
 examples/performance-thread/common/lthread_sched.h |  152 +
 examples/performance-thread/common/lthread_timer.h |   47 +
 examples/performance-thread/common/lthread_tls.c   |  242 ++
 examples/performance-thread/common/lthread_tls.h   |   64 +
 examples/performance-thread/l3fwd-thread/Makefile  |   57 +
 examples/performance-thread/l3fwd-thread/main.c| 3628 
 36 files changed, 11734 insertions(+)
 create mode 100644 doc/guides/sample_app_ug/img/performance_thread_1.svg
 create mode 100644 doc/guides/sample_app_ug/img/performance_thread_2.svg
 create mode 100644 doc/guides/sample_app_ug/performance_thread.rst
 create mode 100644 examples/performance-thread/Makefile
 create mode 100644 examples/performance-thread/common/arch/x86/atomic.h
 create mode 100644 examples/performance-thread/common/arch/x86/ctx.c
 create mode 100644 examples/performance-thread/common/arch/x86/ctx.h
 create mode 100644 examples/performance-thread/common/common.mk
 create mode 100644 examples/performance-thread/common/lthread.c
 create mode 100644 examples/performance-thread/common/lthread.h
 create mode 100644 examples/performance-thread/common/lthread_api.h
 create mode 100644 examples/performance-thread/common/lthread_cond.c
 create mode 100644 examples/performance-thread/common/lthread_cond.h
 create mode 100644 examples/performance-thread/common/lthread_diag.c
 create mode 100644 examples/performance-thread/common/lthread_diag.h
 create mode 100644 examples/performance-thread/common/lthread_diag_api.h
 create mode 100644 

[dpdk-dev] [PATCH] fm10k: fix a crash bug when quit from testpmd

2015-11-17 Thread Qiu, Michael
On 2015/11/12 12:58, Chen Jing D(Mark) wrote:
> From: "Chen Jing D(Mark)" 
>
> When the fm10k port is closed, both func tx_queue_clean() and
> fm10k_tx_queue_release_mbufs_vec() will try to release buffer in
> SW ring. The latter func won't do sanity check on those pointers
> and cause crash.
>
> The fix include 2 parts.
> 1. Remove Vector TX buffer release func since it can share the
>release functions with regular TX.
> 2. Add log to print out what actual Rx/Tx func is used.
>
> Signed-off-by: Chen Jing D(Mark) 
> ---

Acked-by: Michael Qiu 
>  drivers/net/fm10k/fm10k.h  |1 -
>  drivers/net/fm10k/fm10k_ethdev.c   |   17 -
>  drivers/net/fm10k/fm10k_rxtx_vec.c |   28 
>  3 files changed, 12 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> index 754aa6a..38d5489 100644
> --- a/drivers/net/fm10k/fm10k.h
> +++ b/drivers/net/fm10k/fm10k.h
> @@ -237,7 +237,6 @@ struct fm10k_tx_queue {
>  };
>  
>  struct fm10k_txq_ops {
> - void (*release_mbufs)(struct fm10k_tx_queue *txq);
>   void (*reset)(struct fm10k_tx_queue *txq);
>  };
>  
> diff --git a/drivers/net/fm10k/fm10k_ethdev.c 
> b/drivers/net/fm10k/fm10k_ethdev.c
> index cf7ada7..af7b0c2 100644
> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> @@ -386,7 +386,6 @@ fm10k_check_mq_mode(struct rte_eth_dev *dev)
>  }
>  
>  static const struct fm10k_txq_ops def_txq_ops = {
> - .release_mbufs = tx_queue_free,
>   .reset = tx_queue_reset,
>  };
>  
> @@ -1073,7 +1072,7 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
>   for (i = 0; i < dev->data->nb_tx_queues; i++) {
>   struct fm10k_tx_queue *txq = dev->data->tx_queues[i];
>  
> - txq->ops->release_mbufs(txq);
> + tx_queue_free(txq);
>   }
>   }
>  
> @@ -1793,7 +1792,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
> queue_id,
>   if (dev->data->tx_queues[queue_id] != NULL) {
>   struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];
>  
> - txq->ops->release_mbufs(txq);
> + tx_queue_free(txq);
>   dev->data->tx_queues[queue_id] = NULL;
>   }
>  
> @@ -1872,7 +1871,7 @@ fm10k_tx_queue_release(void *queue)
>   struct fm10k_tx_queue *q = queue;
>   PMD_INIT_FUNC_TRACE();
>  
> - q->ops->release_mbufs(q);
> + tx_queue_free(q);
>  }
>  
>  static int
> @@ -2439,13 +2438,16 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
>   }
>  
>   if (use_sse) {
> + PMD_INIT_LOG(ERR, "Use vector Tx func");
>   for (i = 0; i < dev->data->nb_tx_queues; i++) {
>   txq = dev->data->tx_queues[i];
>   fm10k_txq_vec_setup(txq);
>   }
>   dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
> - } else
> + } else {
>   dev->tx_pkt_burst = fm10k_xmit_pkts;
> + PMD_INIT_LOG(ERR, "Use regular Tx func");
> + }
>  }
>  
>  static void __attribute__((cold))
> @@ -2469,6 +2471,11 @@ fm10k_set_rx_function(struct rte_eth_dev *dev)
>   (dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
>   dev->rx_pkt_burst == fm10k_recv_pkts_vec);
>  
> + if (rx_using_sse)
> + PMD_INIT_LOG(ERR, "Use vector Rx func");
> + else
> + PMD_INIT_LOG(ERR, "Use regular Rx func");
> +
>   for (i = 0; i < dev->data->nb_rx_queues; i++) {
>   struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];
>  
> diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
> b/drivers/net/fm10k/fm10k_rxtx_vec.c
> index 06beca9..6042568 100644
> --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> @@ -45,8 +45,6 @@
>  #endif
>  
>  static void
> -fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
> -static void
>  fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
>  
>  /* Handling the offload flags (olflags) field takes computation
> @@ -634,7 +632,6 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
>  }
>  
>  static const struct fm10k_txq_ops vec_txq_ops = {
> - .release_mbufs = fm10k_tx_queue_release_mbufs_vec,
>   .reset = fm10k_reset_tx_queue,
>  };
>  
> @@ -795,31 +792,6 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf 
> **tx_pkts,
>  }
>  
>  static void __attribute__((cold))
> -fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
> -{
> - unsigned i;
> - const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
> -
> - if (txq->sw_ring == NULL || txq->nb_free == max_desc)
> - return;
> -
> - /* release the used mbufs in sw_ring */
> - for (i = txq->next_dd - (txq->rs_thresh - 1);
> -  i != txq->next_free;
> -  i = (i + 1) & max_desc)
> - rte_pktmbuf_free_seg(txq->sw_ring[i]);
> -
> - txq->nb_free = max_desc;
> -
> - /* reset 

[dpdk-dev] Recent changes related to interrupt thread

2015-11-17 Thread Ananyev, Konstantin



> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Monday, November 16, 2015 5:40 PM
> To: Ananyev, Konstantin
> Cc: Thomas Monjalon; dev at dpdk.org; Nirranjan Kirubaharan; Felix Marti; Kumar Sanghvi
> Subject: Re: [dpdk-dev] Recent changes related to interrupt thread
> 
> I was thinking of something like:
> 
> rte_intr_affinity(portid, queueid, lcoreid)
> 
> And per-lcore interrupt threads.

But it's probably too expensive to have an interrupt thread for each lcore.
Also, we now have the ability to run several lcores on one physical core.

Perhaps two new API functions:
one to create a new interrupt thread (so the user can create as many as he needs),
and a second to bind an interrupt to a particular interrupt thread?
In that case, if the user doesn't want to create extra interrupt threads at all
and just wants to call rte_epoll_wait() manually, he can do that too.

Konstantin

> 
> On Mon, Nov 16, 2015 at 9:19 AM, Ananyev, Konstantin  intel.com> wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
> > Sent: Monday, November 16, 2015 5:07 PM
> > To: Thomas Monjalon
> > Cc: dev at dpdk.org; Nirranjan Kirubaharan; Felix Marti; Kumar Sanghvi
> > Subject: Re: [dpdk-dev] Recent changes related to interrupt thread
> >
> > On Mon, 16 Nov 2015 14:48:42 +0100
> > Thomas Monjalon  wrote:
> >
> > > Hi,
> > >
> > > 2015-11-16 18:02, Rahul Lakkireddy:
> > > > Hi,
> > > >
> > > > I notice that the following changeset:
> > > >
> > > > Fixes: fd6949c55c9a ("eal: fix io permission for virtio interrupt
> > > > handler")
> > > >
> > > > has moved the initialization of the interrupt thread to after the master
> > > > > lcore has been initialized. However, this causes the interrupt thread
> > > > to _inherit_ the affinity of the master lcore. Hence, this seems to
> > > > make all interrupts to be handled by _only_ the master lcore. Because
> > > > of this change, it seems that now alarm interrupts would also be handled
> > > > by master lcore only, IIUC.
> > > >
> > > > We are seeing a performance regression for cxgbe PMD after this commit
> > > > since, cxgbe PMD relies on alarm to periodically transmit pending
> > > > coalesced packets.
> > > >
> > > > Also, this perf degradation is only seen if there's a queue allocated
> > > > > on the master lcore, such as in l3fwd app. If the master lcore has
> > > > been skipped, then no degradation in perf is seen since only the alarm
> > > > will run on the master lcore.
> > > >
> > > > So, is the change done to make all interrupts, including alarm
> > > > interrupts, be handled by _only_ the master lcore intended?
> > >
> > > No it was not intended. The idea was to inherit settings (iopl) from
> > > the device initialization into the interrupt thread.
> > > > Though a DPDK driver is not really supposed to rely on interrupt performance.
> > > So having interrupts managed on any core was more or less a side effect.
> > >
> > > > BTW, I have tried setting the affinity to all cpus instead in
> > > > eal_intr_init() and this seems to restore the perf back. Perhaps it's
> > > > better to move the master lcore initialization to after the interrupt
> > > > thread has been initialized as well? Thoughts?
> > >
> > > Yes, i think it's possible.
> > > We can also imagine a command line option to set the interrupt affinity
> > > with a default which mimics the old behaviour.
> > >
> > > In order to make this conversation clearer, and for later references,
> > > below is the DPDK init call tree:
> > >
> >
> > With the new interrupt mode, the interrupt thread needs some rework anyway.
> > Ideally, there would be multiple interrupt threads, one per core;
> > then use SMP affinity to align the MSI-x interrupt for the device queue
> > to run on the core that is processing that queue.
> >
> > This would require new API's to do SMP affinity, wrapper around /proc/irq
> > > and an API to tell DPDK which lcore is going to process an RX (and TX)
> > queue.
> There is no one to one mapping between lcore and device queue.
> Any lcore can do RX/TX on the device queue.
> > Of course it is preferable to do it from the core on the same socket, but not required.
> > You can even have multiple threads RX/TX from/to the same queue -
> as long as you provide some sync mechanism between them.
> Konstantin
> 
> >
> >



[dpdk-dev] URGENT please help. Issue on ixgbe_tx_free_bufs version 2.0.0

2015-11-17 Thread Bruce Richardson
On Sun, Nov 15, 2015 at 07:58:27PM -0300, Ariel Rodriguez wrote:
> Hi Bruce, I'm going to list the results after the tests.
> 
> I will start with the second hint you proposed:
> 
> 2) I upgraded our custom DPDK application to the latest DPDK code (2.1.0)
> and the issue is still there.
> 
> 1) I tested the load balancer app with the latest DPDK code (2.1.0) with the NIC
> 82599ES 10-Gigabit SFI/SFP+ with tapped traffic and the results are:
> 
>    a) Works fine after 6 hours of running. (Due to time constraints I can't wait
> longer, but the issue always happened before 5 hours of running, so I suppose
> we are fine in this test).
> 
>    b) I made a change to the load balancer code so that it behaves like our DPDK
> application in the workers code. This change just gives the
> workers code enough load (load in terms of core frequency) to make the rx
> core drop several packets because the ring between the workers and the rx core is full.
> (Our application drops several packets because the workers are not fast
> enough).
> 
>    In the last test, the segmentation fault arises on the same
> line that I previously reported.
> 
What is the workload you are putting into the worker core? Can you provide a
diff for the load balancer app that reproduces this issue, since from your
description the problem may be in the extra code added in.

/Bruce


[dpdk-dev] How to approach packet TX lockups

2015-11-17 Thread Matt Laswell
Thanks, I'll give that a try.

In my environment, I'm pretty sure we're using the fully-featured
ixgbe_xmit_pkts() and not _simple().   If setting rs_thresh=1 is safer,
I'll stick with that.

Again, thanks to all for the assistance.

- Matt
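
For anyone following along, the threshold constraints mentioned in this thread can be written down as a small check. This is only a sketch: `tx_thresh_cfg` and `tx_thresh_check()` are hypothetical names, not part of the ixgbe PMD, and the rules encoded are just the two stated in the mails quoted below (the PMD panic message about WTHRESH, and the advice not to exceed 32 on the simple path).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the values discussed in this thread. */
struct tx_thresh_cfg {
	uint16_t wthresh;    /* TX write-back threshold */
	uint16_t rs_thresh;  /* tx_rs_thresh */
	bool simple_tx_path; /* true if ixgbe_xmit_pkts_simple() is in use */
};

/* Returns 0 if the combination follows the advice in the thread, -1 otherwise. */
static int
tx_thresh_check(const struct tx_thresh_cfg *cfg)
{
	/* "TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1" */
	if (cfg->rs_thresh > 1 && cfg->wthresh != 0)
		return -1;
	/* Advice for the simple path: do not set tx_rs_thresh above 32. */
	if (cfg->simple_tx_path && cfg->rs_thresh > 32)
		return -1;
	return 0;
}
```

With this, 1/1/32 for wthresh/hthresh/pthresh passes only when rs_thresh is 1, which matches the workaround described in the quoted mail.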

On Tue, Nov 17, 2015 at 10:20 AM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

> Hi Matt,
>
>
>
> As I said, at least  try to upgrade contents of shared code to the latest
> one.
>
> In previous releases: lib/librte_pmd_ixgbe/ixgbe, now located at:
> drivers/net/ixgbe/.
>
>
>
> > For reference, my transmit function is  rte_eth_tx_burst().
>
> I meant what ixgbe TX function it points to: ixgbe_xmit_pkts or
> ixgbe_xmit_pkts_simple()?
>
> For ixgbe_xmit_pkts_simple() don't set tx_rs_thresh > 32,
>
> for ixgbe_xmit_pkts() the safest way is to set  tx_rs_thresh=1.
>
> Though as I understand from your previous mails, you already did that, and
> it didn't help.
>
> Konstantin
>
>
>
>
>
> *From:* Matt Laswell [mailto:laswell at infiniteio.com]
> *Sent:* Tuesday, November 17, 2015 3:05 PM
> *To:* Ananyev, Konstantin
> *Cc:* Stephen Hemminger; dev at dpdk.org
>
> *Subject:* Re: [dpdk-dev] How to approach packet TX lockups
>
>
>
> Hey Konstantin,
>
>
>
> Moving from 1.6r2 to 2.2 is going to be a pretty significant change due to
> things like changes in the MBuf format, API differences, etc.  Even as an
> experiment, that's an awfully large change to absorb.  Is there a subset
> that you're referring to that could be more readily included without
> modifying so many touch points into DPDK?
>
>
>
> For reference, my transmit function is  rte_eth_tx_burst().  It seems to
> reliably tell me that it has enqueued all of the packets that I gave it,
> however the stats from rte_eth_stats_get() indicate that no packets are
> actually being sent.
>
>
>
> Thanks,
>
>
>
> - Matt
>
>
>
> On Tue, Nov 17, 2015 at 8:44 AM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com> wrote:
>
>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matt Laswell
> > Sent: Tuesday, November 17, 2015 2:24 PM
> > To: Stephen Hemminger
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] How to approach packet TX lockups
> >
> > Yes, we're on 1.6r2.  That said, I've tried a number of different values
> > for the thresholds without a lot of luck.  Setting wthresh/hthresh/
> pthresh
> > to 0/0/32 or 0/0/0 doesn't appear to fix things.  And, as Matthew
> > suggested, I'm pretty sure using 0 for the thresholds leads to auto-
> config
> > by the driver.  I also tried 1/1/32, which required that I also change
> the
> > rs_thresh value from 0 to 1 to work around a panic in PMD initialization
> > ("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1").
> >
> > Any other suggestions?
>
> It's not only the DPDK code that has changed since 1.6.
> I am pretty sure that we also have a new update of shared code since then
> (and as I remember probably more than one).
> One suggestion would be to at least try to upgrade the shared code to the
> latest.
> Another one - even if you can't upgrade to 2.2 in your production
> environment,
> it is probably worth doing that in some test environment and then checking whether
> the problem persists.
> If yes, then we'll need some guidance on how to reproduce it.
>
> Another question: it is not clear which TX function you use.
> Konstantin
>
>
> >
> > On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <
> > stephen at networkplumber.org> wrote:
> >
> > > On Mon, 16 Nov 2015 18:49:15 -0600
> > > Matt Laswell  wrote:
> > >
> > > > Hey Stephen,
> > > >
> > > > Thanks a lot; that's really useful information.  Unfortunately, I'm
> at a
> > > > stage in our release cycle where upgrading to a new version of DPDK
> isn't
> > > > feasible.  Any chance you (or others reading this) has a pointer to
> the
> > > > relevant changes?  While I can't afford to upgrade DPDK entirely,
> > > > backporting targeted fixes is more doable.
> > > >
> > > > Again, thanks.
> > > >
> > > > - Matt
> > > >
> > > >
> > > > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> > > > stephen at networkplumber.org> wrote:
> > > >
> > > > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > > > Matt Laswell  wrote:
> > > > >
> > > > > > Hey Folks,
> > > > > >
> > > > > > I sent this to the users email list, but I'm not sure how many
> > > people are
> > > > > > actively reading that list at this point.  I'm dealing with a
> > > situation
> > > > > in
> > > > > > which my application loses the ability to transmit packets out
> of a
> > > port
> > > > > > during times of moderate stress.  I'd love to hear suggestions
> for
> > > how to
> > > > > > approach this problem, as I'm a bit at a loss at the moment.
> > > > > >
> > > > > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on
> > > Haswell
> > > > > > processors.  I'm using the 82599 controller, configured to spread
> > > packets
> > > > > > across multiple queues.  Each queue is accessed by a different
> lcore
> 

[dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index

2015-11-17 Thread Michael S. Tsirkin
On Mon, Nov 16, 2015 at 02:20:57PM -0800, Flavio Leitner wrote:
> On Wed, Oct 28, 2015 at 11:12:25PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Oct 28, 2015 at 06:30:41PM -0200, Flavio Leitner wrote:
> > > On Sat, Oct 24, 2015 at 08:47:10PM +0300, Michael S. Tsirkin wrote:
> > > > On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
> > > > > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > > > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > > > > > control the placement of incoming packets in RX queues.
> > > > > > > > > 
> > > > > > > > > I may not follow you.
> > > > > > > > > 
> > > > > > > > > Enqueuing packets to a RX queue is done at vhost lib, outside 
> > > > > > > > > the
> > > > > > > > > guest, how could the guest take the control here?
> > > > > > > > > 
> > > > > > > > >   --yliu
> > > > > > > > 
> > > > > > > > vhost should do what guest told it to.
> > > > > > > > 
> > > > > > > > See virtio spec:
> > > > > > > > 5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > > > > > 
> > > > > > > Spec says:
> > > > > > > 
> > > > > > > After the driver transmitted a packet of a flow on transmitqX,
> > > > > > > the device SHOULD cause incoming packets for that flow to be
> > > > > > > steered to receiveqX.
> > > > > > > 
> > > > > > > 
> > > > > > > Michael, I still have no idea how vhost could know the flow even
> > > > > > > after discussion with Huawei. Could you be more specific about
> > > > > > > this? Say, how could guest know that? And how could guest tell
> > > > > > > vhost which RX is gonna to use?
> > > > > > > 
> > > > > > > Thanks.
> > > > > > > 
> > > > > > >   --yliu
> > > > > > 
> > > > > > I don't really understand the question.
> > > > > > 
> > > > > > When guests transmits a packet, it makes a decision
> > > > > > about the flow to use, and maps that to a tx/rx pair of queues.
> > > > > > 
> > > > > > It sends packets out on the tx queue and expects device to
> > > > > > return packets from the same flow on the rx queue.
> > > > > 
> > > > > Why? I can understand that there should be a mapping between
> > > > > flows and queues in a way that there is no re-ordering, but
> > > > > I can't see the relation of receiving a flow with a TX queue.
> > > > > 
> > > > > fbl
> > > > 
> > > > That's the way virtio chose to program the rx steering logic.
> > > > 
> > > > It's low overhead (no special commands), and
> > > > works well for TCP when user is an endpoint since rx and tx
> > > > for tcp are generally tied (because of ack handling).
> 
> It is low overhead for the control plane, but not for the data plane.

Well, there's zero data plane overhead within the guest.
You can't go lower :)

> > > > We can discuss other ways, e.g. special commands for guests to
> > > > program steering.
> > > > We'd have to first see some data showing the current scheme
> > > > is problematic somehow.
> 
> The issue is that the spec assumes the packets are coming in
> a serialized way and the distribution will be made by vhost-user
> but that isn't necessarily true.
> 

Making the distribution guest controlled is obviously the right
thing to do if guest is the endpoint: we need guest scheduler to
make the decisions, it's the only entity that knows
how are tasks distributed across VCPUs.

It's possible that this is not the right thing for when guest
is just doing bridging between two VNICs:
are you saying packets should just go from RX queue N
on eth0 to TX queue N on eth1, making host make all
the queue selection decisions?

This sounds reasonable. Since there's a mix of local and
bridged traffic normally, does this mean we need
a per-packet flag that tells host to
ignore the packet for classification purposes?
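
To make the spec rule discussed above concrete ("after the driver transmitted a packet of a flow on transmitqX, the device SHOULD cause incoming packets for that flow to be steered to receiveqX"), here is a rough sketch of what a host-side steering table could look like. It is purely illustrative and not vhost code: `steer_table`, `steer_learn()` and `steer_lookup()` are hypothetical names, the flow hash is assumed to be computed elsewhere, and colliding flows simply share a slot.

```c
#include <stdint.h>

#define STEER_SLOTS 256u       /* hypothetical table size, power of two */
#define STEER_NONE  UINT16_MAX /* marks a slot with no learned queue */

struct steer_table {
	uint16_t rxq[STEER_SLOTS];
};

static void
steer_init(struct steer_table *t)
{
	uint32_t i;

	for (i = 0; i < STEER_SLOTS; i++)
		t->rxq[i] = STEER_NONE;
}

/* The guest sent a packet of this flow on transmitqX: remember X. */
static void
steer_learn(struct steer_table *t, uint32_t flow_hash, uint16_t txq)
{
	t->rxq[flow_hash & (STEER_SLOTS - 1)] = txq;
}

/* Pick the receive queue for an incoming packet; use fallback if unseen. */
static uint16_t
steer_lookup(const struct steer_table *t, uint32_t flow_hash, uint16_t fallback)
{
	uint16_t q = t->rxq[flow_hash & (STEER_SLOTS - 1)];

	return q == STEER_NONE ? fallback : q;
}
```

The learn step runs on the guest's transmit path and the lookup on the path into the guest, so any real implementation has to decide who owns the table and how the two sides synchronize, which is the data-plane cost Flavio is pointing at.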


> > > The issue is that the restriction imposes operations to be done in the
> > > data path.  For instance, Open vSwitch has N number of threads to manage
> > > X RX queues. We distribute them in round-robin fashion.  So, the thread
> > > polling one RX queue will do all the packet processing and push it to the
> > > TX queue of the other device (vhost-user or not) using the same 'id'.
> > > 
> > > Doing so we can avoid locking between threads and TX queues and any other
> > > extra computation while still keeping the packet ordering/distribution 
> > > fine.
> > > 
> > > However, if vhost-user has to send packets according with guest mapping,
> > > it will require locking between queues and additional operations to select
> > > the appropriate queue.  Those actions will cause performance issues.
> > 
> > You only need to send updates if guest moves a flow to another queue.
> > This is very rare since guest must avoid reordering.
> 
> OK, maybe I 

[dpdk-dev] [PATCH v2] rte_sched: release enqueued mbufs on rte_sched_port_free()

2015-11-17 Thread Simon Kagstrom
Otherwise mbufs will leak when the port is destroyed. The
rte_sched_port_qbase() and rte_sched_port_qsize() functions are used
in free now, so move them up.

Signed-off-by: Simon Kagstrom 
---
ChangeLog:

v2:
* Break long line in rte_sched_port_qbase()
* Provide some air after variable in rte_sched_port_free()
- I did not provide an API to free the buffers without freeing the
  port since I'm unsure how to manually flush the queue (without
  breaking the rest of the functionality!)

Sorry about the delay, I missed Stephen's review!

 lib/librte_sched/rte_sched.c | 46 
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9c9419d..c66415d 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -312,6 +312,24 @@ rte_sched_port_queues_per_port(struct rte_sched_port *port)
return RTE_SCHED_QUEUES_PER_PIPE * port->n_pipes_per_subport * 
port->n_subports_per_port;
 }

+static inline struct rte_mbuf **
+rte_sched_port_qbase(struct rte_sched_port *port, uint32_t qindex)
+{
+   uint32_t pindex = qindex >> 4;
+   uint32_t qpos = qindex & 0xF;
+
+   return (port->queue_array + pindex *
+   port->qsize_sum + port->qsize_add[qpos]);
+}
+
+static inline uint16_t
+rte_sched_port_qsize(struct rte_sched_port *port, uint32_t qindex)
+{
+   uint32_t tc = (qindex >> 2) & 0x3;
+
+   return port->qsize[tc];
+}
+
 static int
 rte_sched_port_check_params(struct rte_sched_port_params *params)
 {
@@ -717,11 +735,22 @@ rte_sched_port_config(struct rte_sched_port_params 
*params)
 void
 rte_sched_port_free(struct rte_sched_port *port)
 {
+   unsigned int queue;
+
/* Check user parameters */
if (port == NULL){
return;
}

+   /* Free enqueued mbufs */
+   for (queue = 0; queue < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; queue++) {
+   struct rte_mbuf **mbufs = rte_sched_port_qbase(port, queue);
+   unsigned int i;
+
+   for (i = 0; i < rte_sched_port_qsize(port, queue); i++)
+   rte_pktmbuf_free(mbufs[i]);
+   }
+
rte_bitmap_free(port->bmp);
rte_free(port);
 }
@@ -1032,23 +1061,6 @@ rte_sched_port_qindex(struct rte_sched_port *port, 
uint32_t subport, uint32_t pi
return result;
 }

-static inline struct rte_mbuf **
-rte_sched_port_qbase(struct rte_sched_port *port, uint32_t qindex)
-{
-   uint32_t pindex = qindex >> 4;
-   uint32_t qpos = qindex & 0xF;
-
-   return (port->queue_array + pindex * port->qsize_sum + 
port->qsize_add[qpos]);
-}
-
-static inline uint16_t
-rte_sched_port_qsize(struct rte_sched_port *port, uint32_t qindex)
-{
-   uint32_t tc = (qindex >> 2) & 0x3;
-
-   return port->qsize[tc];
-}
-
 #if RTE_SCHED_DEBUG

 static inline int
-- 
1.9.1



[dpdk-dev] How to approach packet TX lockups

2015-11-17 Thread Matt Laswell
Hey Konstantin,

Moving from 1.6r2 to 2.2 is going to be a pretty significant change due to
things like changes in the MBuf format, API differences, etc.  Even as an
experiment, that's an awfully large change to absorb.  Is there a subset
that you're referring to that could be more readily included without
modifying so many touch points into DPDK?

For reference, my transmit function is  rte_eth_tx_burst().  It seems to
reliably tell me that it has enqueued all of the packets that I gave it,
however the stats from rte_eth_stats_get() indicate that no packets are
actually being sent.

Thanks,

- Matt

On Tue, Nov 17, 2015 at 8:44 AM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matt Laswell
> > Sent: Tuesday, November 17, 2015 2:24 PM
> > To: Stephen Hemminger
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] How to approach packet TX lockups
> >
> > Yes, we're on 1.6r2.  That said, I've tried a number of different values
> > for the thresholds without a lot of luck.  Setting
> wthresh/hthresh/pthresh
> > to 0/0/32 or 0/0/0 doesn't appear to fix things.  And, as Matthew
> > suggested, I'm pretty sure using 0 for the thresholds leads to
> auto-config
> > by the driver.  I also tried 1/1/32, which required that I also change
> the
> > rs_thresh value from 0 to 1 to work around a panic in PMD initialization
> > ("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1").
> >
> > Any other suggestions?
>
> It's not only the DPDK code that has changed since 1.6.
> I am pretty sure that we also have a new update of shared code since then
> (and as I remember probably more than one).
> One suggestion would be to at least try to upgrade the shared code to the
> latest.
> Another one - even if you can't upgrade to 2.2 in your production
> environment,
> it is probably worth doing that in some test environment and then checking whether
> the problem persists.
> If yes, then we'll need some guidance on how to reproduce it.
>
> Another question: it is not clear which TX function you use.
> Konstantin
>
> >
> > On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <
> > stephen at networkplumber.org> wrote:
> >
> > > On Mon, 16 Nov 2015 18:49:15 -0600
> > > Matt Laswell  wrote:
> > >
> > > > Hey Stephen,
> > > >
> > > > Thanks a lot; that's really useful information.  Unfortunately, I'm
> at a
> > > > stage in our release cycle where upgrading to a new version of DPDK
> isn't
> > > > feasible.  Any chance you (or others reading this) has a pointer to
> the
> > > > relevant changes?  While I can't afford to upgrade DPDK entirely,
> > > > backporting targeted fixes is more doable.
> > > >
> > > > Again, thanks.
> > > >
> > > > - Matt
> > > >
> > > >
> > > > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> > > > stephen at networkplumber.org> wrote:
> > > >
> > > > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > > > Matt Laswell  wrote:
> > > > >
> > > > > > Hey Folks,
> > > > > >
> > > > > > I sent this to the users email list, but I'm not sure how many
> > > people are
> > > > > > actively reading that list at this point.  I'm dealing with a
> > > situation
> > > > > in
> > > > > > which my application loses the ability to transmit packets out
> of a
> > > port
> > > > > > during times of moderate stress.  I'd love to hear suggestions
> for
> > > how to
> > > > > > approach this problem, as I'm a bit at a loss at the moment.
> > > > > >
> > > > > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on
> > > Haswell
> > > > > > processors.  I'm using the 82599 controller, configured to spread
> > > packets
> > > > > > across multiple queues.  Each queue is accessed by a different
> lcore
> > > in
> > > > > my
> > > > > > application; there is therefore concurrent access to the
> controller,
> > > but
> > > > > > not to any of the queues.  We're binding the ports to the igb_uio
> > > driver.
> > > > > > The symptoms I see are these:
> > > > > >
> > > > > >
> > > > > >- All transmit out of a particular port stops
> > > > > >- rte_eth_tx_burst() indicates that it is sending all of the
> > > packets
> > > > > >that I give to it
> > > > > >- rte_eth_stats_get() gives me stats indicating that no
> packets
> > > are
> > > > > >being sent on the affected port.  Also, no tx errors, and no
> pause
> > > > > frames
> > > > > >sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
> > > > > >- All other ports continue to work normally
> > > > > >- The affected port continues to receive packets without
> problems;
> > > > > only
> > > > > >TX is affected
> > > > > >- Resetting the port via rte_eth_dev_stop() and
> > > rte_eth_dev_start()
> > > > > >restores things and packets can flow again
> > > > > >- The problem is replicable on multiple devices, and doesn't
> > > follow
> > > > > one
> > > > > >particular port
> > > > > >
> > > > > > I've tried calling 

[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?

2015-11-17 Thread Roger B. Melton
Hi David,  in-line -Roger

On 11/16/15 4:46 AM, David Marchand wrote:
> Hello Roger,
>
> On Sun, Nov 15, 2015 at 3:45 PM, Roger B. Melton  > wrote:
>
> I like the "-b all" and "-w none" idea, but I think it might be
> complicated to implement it the way we would need it to work.  The
> existing -b and -w options  persist for the duration of the
> application, and we would need the "-b all"/"-w none" to persist
> only through rte_eal_init() time.  Otherwise our attempt to
> attach a device at a later time would be blocked by the option.
>
>
> I agree, the black/white lists should only apply to initial scan.
> I forgot about this problem ...
> I had started some cleanup in the pci scan / attach code but this is 
> too late for 2.2, I will post this in the next merge window.
>
>
> Wouldn't it be simpler to have an option to disable the probe at
> rte_eal_init() time?  Would that address the issue with VFIO,
> preventing automatic attachment to devices while permitting
> on-demand attach?
>
>
> I suppose we can do this yes (I think Thomas once proposed off-list an 
> option like --no-pci-scan).
> Do you think you can send a patch ?

What about --no-pci-init-probe?  I know it's long, but it is more
descriptive of its purpose: to disable only the init-time PCI probe.

I have it coded, tested and ready.  I'm still working through internal
processes to allow me to submit patches, but I hope to have that
resolved in the next few weeks and at that time I can submit the patch.
>
>
> -- 
> David Marchand
>



[dpdk-dev] [PATCH] rte_sched: release enqueued mbufs on rte_sched_port_free()

2015-11-17 Thread Simon Kågström
On 2015-11-04 19:14, Stephen Hemminger wrote:
> On Wed, 28 Oct 2015 10:56:33 +0100
> Simon Kagstrom  wrote:
> 
>> Otherwise mbufs will leak when the port is destroyed. The
>> rte_sched_port_qbase() and rte_sched_port_qsize() functions are used
>> in free now, so move them up.
>>
>> Signed-off-by: Simon Kagstrom 
> 
> Overall it looks good, and fixes a long standing bug.
> Maybe good to expose it as a API function rte_sched_port_flush
> to allow use from applications.

I'm sorry, I missed this reply! I will fix the issues you point to and
repost.

// Simon



[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len

2015-11-17 Thread Rich Lane
On Tue, Nov 17, 2015 at 5:23 AM, Yuanhan Liu 
wrote:

> On Thu, Nov 12, 2015 at 01:46:03PM -0800, Rich Lane wrote:
> > You can reproduce this with l2fwd and the vhost PMD.
> >
> > You'll need this patch on top of the vhost PMD patches:
> > --- a/lib/librte_vhost/virtio-net.c
> > +++ b/lib/librte_vhost/virtio-net.c
> > @@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx)
> > return -1;
> >
> > if (dev->flags & VIRTIO_DEV_RUNNING)
> > -   notify_ops->destroy_device(dev);
> > +   notify_destroy_device(dev);
> >
> > cleanup_device(dev);
> > reset_device(dev);
> >
> > 1. Start l2fwd on the host: l2fwd -l 0,1 --vdev eth_null --vdev
> > eth_vhost0,iface=/run/vhost0.sock -- -p3
> > 2. Start a VM using vhost-user and set up uio, hugepages, etc.
> > 3. Start l2fwd inside the VM: l2fwd -l 0,1 --vdev eth_null -- -p3
> > 4. Kill the l2fwd inside the VM with SIGINT.
> > 5. Start l2fwd inside the VM.
> > 6. l2fwd on the host crashes.
> >
> > I found the source of the memory corruption by setting a watchpoint in
> > gdb: watch -l rte_eth_devices[1].data->rx_queues
>
> Rich,
>
> Thanks for the detailed steps for reproducing this issue, and sorry for
> being a bit late: I finally got the time to dig this issue today.
>
> Put simply, buffer overflow is not the root cause, but the fact that
> "we do not release resources on stop/exit" is.
>
> And here is how the issue arises.  After step 4 (terminating l2fwd),
> neither l2fwd nor the virtio PMD releases any resources. Hence,
> l2fwd on the host does not notice the change and keeps trying to
> receive and queue packets to the vhost dev. That is not an issue as
> long as we don't start l2fwd again, for there are actually no packets
> to forward, and rte_vhost_dequeue_burst returns from:
>
>     avail_idx = *((volatile uint16_t *)&vq->avail->idx);
>
>     /* If there are no available buffers then return. */
>     if (vq->last_used_idx == avail_idx)
>         return 0;
>
> But right at the init stage while starting l2fwd (step 5),
> rte_eal_memory_init() resets all huge page memory to zero, so all
> vq->desc[] items are reset to zero, which in turn ends up with
> secure_len being set to 0 on return.
>
> (BTW, I'm not quite sure why resetting the huge page memory inside the
> VM would result in vq->desc being reset.)
>
> The vq desc reset results in a dead loop in virtio_dev_merge_rx(),
> as update_secure_len() keeps setting secure_len to 0:
>
>     do {
>         avail_idx = *((volatile uint16_t *)&vq->avail->idx);
>         if (unlikely(res_cur_idx == avail_idx)) {
>             LOG_DEBUG(VHOST_DATA,
>                 "(%"PRIu64") Failed "
>                 "to get enough desc from "
>                 "vring\n",
>                 dev->device_fh);
>             goto merge_rx_exit;
>         } else {
>             update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
>             res_cur_idx++;
>         }
>     } while (pkt_len > secure_len);
>
> The dead loop makes vec_idx keep increasing, so it quickly overflows,
> leading in the end to the crash you saw.
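
The failure mode described above can be sketched in isolation. This is a minimal standalone model and not the vhost code: `struct toy_desc`, `gather_len()` and the `MAX_VECS` bound are hypothetical names chosen for illustration. It shows that when every descriptor reports a length of 0, a loop that only exits once enough length has been gathered never meets its exit condition, so an explicit bound on the vector index is what stops it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VECS 8  /* hypothetical capacity of the buffer-vector array */

/* Hypothetical stand-in for a vring descriptor: only the length matters. */
struct toy_desc {
	uint32_t len;
};

/*
 * Accumulate descriptor lengths until pkt_len is covered.
 * Returns the number of descriptors used, or -1 if the bound is hit
 * before enough length has been gathered (e.g. all lengths are 0).
 */
static int
gather_len(const struct toy_desc *desc, size_t n_desc, uint32_t pkt_len)
{
	uint32_t secure_len = 0;
	size_t vec_idx = 0;

	assert(desc != NULL);

	while (pkt_len > secure_len) {
		/* Without this check, zeroed descriptors keep the loop
		 * running and vec_idx walks past the end of the array. */
		if (vec_idx >= n_desc || vec_idx >= MAX_VECS)
			return -1;
		secure_len += desc[vec_idx].len;
		vec_idx++;
	}
	return (int)vec_idx;
}
```

With three descriptors of length 100, a 150-byte packet is covered after two of them; with all lengths zeroed, the function gives up at the bound instead of spinning.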
>
> So, the following should resolve this issue the right way (I
> guess); so far it covers virtio-pmd and l2fwd only.
>
> ---
> diff --git a/drivers/net/virtio/virtio_ethdev.c
> b/drivers/net/virtio/virtio_ethdev.c
> index 12fcc23..8d6bf56 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1507,9 +1507,12 @@ static void
>  virtio_dev_stop(struct rte_eth_dev *dev)
>  {
> struct rte_eth_link link;
> +   struct virtio_hw *hw = dev->data->dev_private;
>
> PMD_INIT_LOG(DEBUG, "stop");
>
> +   vtpci_reset(hw);
> +
> if (dev->data->dev_conf.intr_conf.lsc)
> rte_intr_disable(&dev->pci_dev->intr_handle);
>
> diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
> index 720fd5a..565f648 100644
> --- a/examples/l2fwd/main.c
> +++ b/examples/l2fwd/main.c
> @@ -44,6 +44,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -534,14 +535,40 @@ check_all_ports_link_status(uint8_t port_num,
> uint32_t port_mask)
> }
>  }
>
> +static uint8_t nb_ports;
> +static uint8_t nb_ports_available;
> +
> +/* When we receive a INT signal, unregister vhost driver */
> +static void
> +sigint_handler(__rte_unused int signum)
> +{
> +   uint8_t portid;
> +
> +   for (portid = 0; portid < nb_ports; portid++) {
> +   /* skip ports that are not enabled */
> 

[dpdk-dev] How to approach packet TX lockups

2015-11-17 Thread Matt Laswell
Yes, we're on 1.6r2.  That said, I've tried a number of different values
for the thresholds without a lot of luck.  Setting wthresh/hthresh/pthresh
to 0/0/32 or 0/0/0 doesn't appear to fix things.  And, as Matthew
suggested, I'm pretty sure using 0 for the thresholds leads to auto-config
by the driver.  I also tried 1/1/32, which required that I also change the
rs_thresh value from 0 to 1 to work around a panic in PMD initialization
("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1").

Any other suggestions?
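
The threshold rule behind the panic message quoted above can be sketched on its own. This is a minimal standalone model, not the ixgbe PMD code: `struct toy_tx_thresh`, `toy_tx_thresh_check()` and `TOY_DEFAULT_RS_THRESH` are hypothetical names, and treating an rs_thresh of 0 as "use a driver default of 32" is an assumption made for illustration, not something taken from the driver source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default used when rs_thresh is left at 0 (illustrative only). */
#define TOY_DEFAULT_RS_THRESH 32

/* Hypothetical mirror of the TX threshold knobs discussed in the thread. */
struct toy_tx_thresh {
	uint8_t pthresh;
	uint8_t hthresh;
	uint8_t wthresh;
	uint16_t rs_thresh;
};

/*
 * Returns 0 if the combination passes the rule from the panic message
 * ("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1"),
 * -1 otherwise.
 */
static int
toy_tx_thresh_check(const struct toy_tx_thresh *t)
{
	uint16_t rs;

	assert(t != NULL);

	/* 0 means "let the driver pick" in this model. */
	rs = t->rs_thresh ? t->rs_thresh : TOY_DEFAULT_RS_THRESH;
	if (rs > 1 && t->wthresh != 0)
		return -1;
	return 0;
}
```

In this model 1/1/32 with rs_thresh left at 0 is rejected, 1/1/32 with rs_thresh set to 1 is accepted, and 0/0/32 is accepted either way, which matches the behaviour reported above.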

On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Mon, 16 Nov 2015 18:49:15 -0600
> Matt Laswell  wrote:
>
> > Hey Stephen,
> >
> > Thanks a lot; that's really useful information.  Unfortunately, I'm at a
> > stage in our release cycle where upgrading to a new version of DPDK isn't
> > feasible.  Any chance you (or others reading this) has a pointer to the
> > relevant changes?  While I can't afford to upgrade DPDK entirely,
> > backporting targeted fixes is more doable.
> >
> > Again, thanks.
> >
> > - Matt
> >
> >
> > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> > stephen at networkplumber.org> wrote:
> >
> > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > Matt Laswell  wrote:
> > >
> > > > Hey Folks,
> > > >
> > > > I sent this to the users email list, but I'm not sure how many
> people are
> > > > actively reading that list at this point.  I'm dealing with a
> situation
> > > in
> > > > which my application loses the ability to transmit packets out of a
> port
> > > > during times of moderate stress.  I'd love to hear suggestions for
> how to
> > > > approach this problem, as I'm a bit at a loss at the moment.
> > > >
> > > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on
> Haswell
> > > > processors.  I'm using the 82599 controller, configured to spread
> packets
> > > > across multiple queues.  Each queue is accessed by a different lcore
> in
> > > my
> > > > application; there is therefore concurrent access to the controller,
> but
> > > > not to any of the queues.  We're binding the ports to the igb_uio
> driver.
> > > > The symptoms I see are these:
> > > >
> > > >
> > > >- All transmit out of a particular port stops
> > > >- rte_eth_tx_burst() indicates that it is sending all of the
> packets
> > > >that I give to it
> > > >- rte_eth_stats_get() gives me stats indicating that no packets
> are
> > > >being sent on the affected port.  Also, no tx errors, and no pause
> > > frames
> > > >sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
> > > >- All other ports continue to work normally
> > > >- The affected port continues to receive packets without problems;
> > > only
> > > >TX is affected
> > > >- Resetting the port via rte_eth_dev_stop() and
> rte_eth_dev_start()
> > > >restores things and packets can flow again
> > > >- The problem is replicable on multiple devices, and doesn't
> follow
> > > one
> > > >particular port
> > > >
> > > > I've tried calling rte_mbuf_sanity_check() on all packets before
> sending
> > > > them.  I've also instrumented my code to look for packets that have
> > > already
> > > > been sent or freed, as well as cycles in chained packets being
> sent.  I
> > > > also put a lock around all accesses to rte_eth* calls to synchronize
> > > access
> > > > to the NIC.  Given some recent discussion here, I also tried
> changing the
> > > > TX RS threshold from 0 to 32, 16, and 1.  None of these strategies
> proved
> > > > effective.
> > > >
> > > > Like I said at the top, I'm a little at a loss at this point.  If you
> > > were
> > > > dealing with this set of symptoms, how would you proceed?
> > > >
> > >
> > > I remember some issues with old DPDK 1.6 with some of the prefetch
> > > thresholds on 82599. You would be better off going to a later DPDK
> > > version.
> > >
>
> I hope you are on 1.6.0r2 at least??
>
> With older DPDK there was no way to get driver to tell you what the
> preferred settings were for pthresh/hthresh/wthresh. And the values
> in Intel sample applications were broken on some hardware.
>
> I remember reverse engineering the safe values from reading the Linux
> driver.
>
> The Linux driver is much better tested than the DPDK one...
> In the Linux driver, the Transmit Descriptor Controller (txdctl)
> is fixed at (for transmit)
>wthresh = 1
>hthresh = 1
>pthresh = 32
>
> The DPDK 2.2 driver uses:
> wthresh = 0
> hthresh = 0
> pthresh = 32
>
>
>
>
>
>
>


[dpdk-dev] [PATCH v4 2/2] ethdev: add sanity checks to functions

2015-11-17 Thread Stephen Hemminger
On Tue, 17 Nov 2015 12:21:07 +
Bruce Richardson  wrote:

> The functions rte_eth_rx_queue_count and rte_eth_descriptor_done are
> supported by very few PMDs. Therefore, it is best to check for support
> for the functions in the ethdev library, so as to avoid crashes
> at run-time if the application goes to use those APIs. Similarly, the
> port parameter should also be checked for validity.
> 
> Signed-off-by: Bruce Richardson 
> 
> ---
>  lib/librte_ether/rte_ethdev.h | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index a00cd46..028be59 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -2533,16 +2533,16 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
>   * @param queue_id
>   *  The queue id on the specific port.
>   * @return
> - *  The number of used descriptors in the specific queue.
> + *  The number of used descriptors in the specific queue, or:
> + * (-EINVAL) if *port_id* is invalid
> + * (-ENOTSUP) if the device does not support this function
>   */
> -static inline uint32_t
> +static inline int
>  rte_eth_rx_queue_count(uint8_t port_id, uint16_t queue_id)
>  {
>   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> -#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> - RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, 0);
> -#endif
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, -ENOTSUP);
>  return (*dev->dev_ops->rx_queue_count)(dev, queue_id);
>  }
>  
> @@ -2559,15 +2559,14 @@ rte_eth_rx_queue_count(uint8_t port_id, uint16_t 
> queue_id)
>   *  - (1) if the specific DD bit is set.
>   *  - (0) if the specific DD bit is not set.
>   *  - (-ENODEV) if *port_id* invalid.
> + *  - (-ENOTSUP) if the device does not support this function
>   */
>  static inline int
>  rte_eth_rx_descriptor_done(uint8_t port_id, uint16_t queue_id, uint16_t 
> offset)
>  {
>   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> -#ifdef RTE_LIBRTE_ETHDEV_DEBUG
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_done, -ENOTSUP);
> -#endif
>   return (*dev->dev_ops->rx_descriptor_done)( \
>   dev->data->rx_queues[queue_id], offset);
>  }

This breaks ABI since older applications built with debug will try
to find the shared library entry for the routine.
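
For reference, the checking pattern the patch makes unconditional can be sketched without any DPDK headers. This is a minimal standalone model: `struct toy_dev`, `toy_attach()` and `toy_rx_queue_count()` are hypothetical names, not the ethdev API. It shows why the return type has to become signed once negative error codes are returned for an invalid port or a missing driver callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PORTS 4

/* Hypothetical driver callback table with one optional operation. */
struct toy_dev_ops {
	int (*rx_queue_count)(uint16_t queue_id);
};

struct toy_dev {
	int attached;
	const struct toy_dev_ops *ops;
};

static struct toy_dev toy_devices[TOY_MAX_PORTS];

/* Example callback for a driver that does implement the operation. */
static int
toy_fixed_count(uint16_t queue_id)
{
	(void)queue_id;
	return 7;
}

static const struct toy_dev_ops toy_ops_full = { toy_fixed_count };
static const struct toy_dev_ops toy_ops_none = { NULL };

static void
toy_attach(uint8_t port_id, const struct toy_dev_ops *ops)
{
	assert(port_id < TOY_MAX_PORTS);
	toy_devices[port_id].attached = 1;
	toy_devices[port_id].ops = ops;
}

/* Signed return: a count on success, a negative errno value on failure. */
static int
toy_rx_queue_count(uint8_t port_id, uint16_t queue_id)
{
	const struct toy_dev *dev;

	if (port_id >= TOY_MAX_PORTS || !toy_devices[port_id].attached)
		return -EINVAL;
	dev = &toy_devices[port_id];
	if (dev->ops == NULL || dev->ops->rx_queue_count == NULL)
		return -ENOTSUP;
	return dev->ops->rx_queue_count(queue_id);
}
```

A caller has to test for a negative result before using the value as a count, which is a behaviour change for code written against the old unsigned return type.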


[dpdk-dev] [PATCH 1/2] ixgbe: fix vfio ioctl SET_IRQS error

2015-11-17 Thread Lu, Wenzhuo
Hi,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liu, Yong
> Sent: Friday, November 13, 2015 2:08 PM
> To: Liang, Cunming ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/2] ixgbe: fix vfio ioctl SET_IRQS error
> 
> Tested-by: Yong Liu 
> 
> > -Original Message-
> > From: Liang, Cunming
> > Sent: Friday, November 13, 2015 10:50 AM
> > To: dev at dpdk.org
> > Cc: Liu, Yong; Liang, Cunming
> > Subject: [PATCH 1/2] ixgbe: fix vfio ioctl SET_IRQS error
> >
> > The vector number may change during 'dev_start'. Before enabling a new
> > vector mapping, it's necessary to disable/unmap the previous setting.
> >
> > Fixes: 7ab8500037f6 ("ixgbe: fix VF start with PF stopped")
> >
> > Reported-by: Yong Liu 
> > Signed-off-by: Cunming Liang 
Acked-by: Wenzhuo Lu 


[dpdk-dev] [PATCH 2/2] igb: fix vfio ioctl SET_IRQS error

2015-11-17 Thread Lu, Wenzhuo
Hi,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> Sent: Friday, November 13, 2015 10:50 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 2/2] igb: fix vfio ioctl SET_IRQS error
> 
> The vector number may change during 'dev_start'. Before enabling a new
> vector mapping, it's necessary to disable/unmap the previous setting.
> 
> Fixes: fe685de2b1b6 ("igb: fix VF start with PF stopped")
> 
> Reported-by: Yong Liu 
> Signed-off-by: Cunming Liang 
Acked-by: Wenzhuo Lu 



[dpdk-dev] [PATCH 4/4] fm10k: remove crc size from all byte counters

2015-11-17 Thread Qiu, Michael
Hi, Harry

Have you ever tested this patch by yourself?

fm10k's stats should already remove the crc bytes by default.

After your patch is applied, if we send a packet without VLAN (64 bytes),
we expect to receive 60 bytes, but disappointingly only 56 bytes show
up in the system.
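
The arithmetic can be sketched as follows. This is a minimal standalone model, not fm10k driver code: `toy_strip_crc()` and `TOY_CRC_LEN` are hypothetical names. It assumes, as stated above, that the counter read from hardware already excludes the 4-byte CRC, so subtracting it a second time undercounts by 4 bytes per packet.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CRC_LEN 4u  /* Ethernet CRC (FCS) length in bytes */

/* Subtract one CRC per packet from a byte counter. */
static uint64_t
toy_strip_crc(uint64_t counted_bytes, uint64_t n_pkts)
{
	uint64_t crc_bytes = n_pkts * TOY_CRC_LEN;

	/* A counter smaller than the CRC total would wrap around. */
	assert(counted_bytes >= crc_bytes);
	return counted_bytes - crc_bytes;
}
```

For one 64-byte frame, stripping once gives 60; applying the same subtraction to the already-stripped value gives 56, the number reported above.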

Thanks,
Michael

On 2015/11/16 18:36, Harry van Haaren wrote:
> This patch removes the crc bytes from byte counter statistics.
>
> Signed-off-by: Harry van Haaren 
> ---
>  drivers/net/fm10k/fm10k_ethdev.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/fm10k/fm10k_ethdev.c 
> b/drivers/net/fm10k/fm10k_ethdev.c
> index 441f713..fdb2e81 100644
> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> @@ -1183,11 +1183,13 @@ fm10k_stats_get(struct rte_eth_dev *dev, struct 
> rte_eth_stats *stats)
>  
>   ipackets = opackets = ibytes = obytes = 0;
>   for (i = 0; (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) &&
> - (i < hw->mac.max_queues); ++i) {
> + (i < hw->mac.max_queues); ++i) {
>   stats->q_ipackets[i] = hw_stats->q[i].rx_packets.count;
>   stats->q_opackets[i] = hw_stats->q[i].tx_packets.count;
> - stats->q_ibytes[i]   = hw_stats->q[i].rx_bytes.count;
> - stats->q_obytes[i]   = hw_stats->q[i].tx_bytes.count;
> + stats->q_ibytes[i]   = hw_stats->q[i].rx_bytes.count -
> + (stats->q_ipackets[i] * 4);
> + stats->q_obytes[i]   = hw_stats->q[i].tx_bytes.count -
> + (stats->q_opackets[i] * 4);
>   ipackets += stats->q_ipackets[i];
>   opackets += stats->q_opackets[i];
>   ibytes   += stats->q_ibytes[i];