date:20151013

[dpdk-dev] DPDK User Space: Session on Useability and Ease of Use

2015-10-13 Thread Thomas Monjalon

Thanks John for the summary.

2015-10-13 16:36, Mcnamara, John:
>   - Move the EAL to the kernel.

Please explain what you mean here.
It's difficult to imagine.

> * Latest version of the docs. - Needs support from 6Wind.

OK
It's as simple as a git hook.
Is anybody willing to write and test one locally?
The script could be hosted in the future website git repo.

> * Distributed testing. - Needs support from Intel initially.
>   Some of this is already being rolled out on the
>   test-report at dpdk.org list for Intel hardware:
>   http://dpdk.org/ml/archives/test-report/. Other hardware
>   vendors could use the same automated test framework to host
>   something similar.

IBM was the first to start a daily compilation test.
Others are welcome.
We had a small session dedicated to this topic during the useability session.

> * Create a User Mailing List. - Needs support from 6Wind.

OK
Name: user at dpdk.org or other suggestion?

> * Make the dpdk.org website patchable. - Needs support from
>   6Wind.

OK
Needs a maintainer, a name for the git repo, the mailing list and the patchwork.

> * Add a Contributing Guide. - I will submit a doc patch.
> 
> * Add a README .txt or .1st to the root dir. - I will submit a
>   doc patch.

Thanks

> * Too much duplicated code in the PMDs. - Any volunteers to
>   refactor common PMD code up into the ethdev layer?

We need more details.
Please start a new thread or an RFC.

> * Logging and debugging via a secondary process. Any volunteers
>   to add a sample app that demonstrates the technique?

There is already one:
http://dpdk.org/browse/dpdk/tree/app/proc_info/main.c#n289

[dpdk-dev] Question about unsupported transceivers

2015-10-13 Thread Alex Forster

I believe I've discovered my problem: 
https://gist.github.com/AlexForster/0fb4699bcdf196cf5462

As mentioned previously, I have two X520-Q1 cards installed. It appears that 
initialization of the first card obeys allow_unsupported_sfp=1, but 
initialization of the second card does not.

Is this a bug, or is there a way to work around this that I'm not aware of?

Alex Forster

[dpdk-dev] [PATCH v5 resend 07/12] virtio: resolve for control queue

2015-10-13 Thread Yuanhan Liu

On Mon, Oct 12, 2015 at 10:58:17PM +0200, Steffen Bauch wrote:
> On 10/12/2015 10:39 AM, Yuanhan Liu wrote:
> >Hi,
> >
> >I just recognized that this dead loop is the same one that I have
> >experienced (see
> >http://dpdk.org/ml/archives/dev/2015-October/024737.html for
> >reference). Just applying the changes in this patch (only 07/12)
> >will not fix the dead loop at least in my setup.
> >Try to enable CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT, and dump more log?
> I enabled the additional debug output. First try was without any
> additional changes in master, but it blocked also. Second try was
> with
> 
> [dpdk-dev] [PATCH v6 06/13] virtio: read virtio_net_config correctly
> 
> applied, but same result.
> 
> If you want to recreate my setup, just follow instructions in
> 
> http://dpdk.org/ml/archives/dev/2015-October/024737.html
> 
> 
> vagrant@vagrant-ubuntu-vivid-64:~/dpdk$ git status
> On branch master
> Your branch is up-to-date with 'origin/master'.
> Changes not staged for commit:
>   (use "git add <file>..." to update what will be committed)
>   (use "git checkout -- <file>..." to discard changes in working directory)
> 
> modified:   config/defconfig_x86_64-native-linuxapp-gcc
> 
> ..

I don't have a clear clue there, but you could try Huawei's solution first.
It's likely that it will fix your problem.

If not, would you please try to reproduce it with qemu (you were using
virtualbox, right)?  And then dump the whole command line here so that I
can try to reproduce and debug it on my side. Sorry, but I don't use
virtualbox or vagrant.

--yliu

> 
> vagrant@vagrant-ubuntu-vivid-64:~/dpdk/x86_64-native-linuxapp-gcc/app$
> sudo ./testpmd -b :00:03.0 -c 3 -n 1 -- -i
> EAL: Detected lcore 0 as core 0 on socket 0
> EAL: Detected lcore 1 as core 1 on socket 0
> EAL: Support maximum 128 logical core(s) by configuration.
> EAL: Detected 2 lcore(s)
> EAL: VFIO modules not all loaded, skip VFIO support...
> EAL: Setting up physically contiguous memory...
> EAL: Ask a virtual area of 0x40 bytes
> EAL: Virtual area found at 0x7f2a3a80 (size = 0x40)
> EAL: Ask a virtual area of 0xe00 bytes
> EAL: Virtual area found at 0x7f2a2c60 (size = 0xe00)
> EAL: Ask a virtual area of 0x30c0 bytes
> EAL: Virtual area found at 0x7f29fb80 (size = 0x30c0)
> EAL: Ask a virtual area of 0x40 bytes
> EAL: Virtual area found at 0x7f29fb20 (size = 0x40)
> EAL: Ask a virtual area of 0xa0 bytes
> EAL: Virtual area found at 0x7f29fa60 (size = 0xa0)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7f29fa20 (size = 0x20)
> EAL: Requesting 512 pages of size 2MB from socket 0
> EAL: TSC frequency is ~2198491 KHz
> EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using
> unreliable clock cycles !
> EAL: Master lcore 0 is ready (tid=3c9938c0;cpuset=[0])
> EAL: lcore 1 is ready (tid=fa1ff700;cpuset=[1])
> EAL: PCI device :00:03.0 on NUMA socket -1
> EAL:   probe driver: 1af4:1000 rte_virtio_pmd
> EAL:   Device is blacklisted, not initializing
> EAL: PCI device :00:08.0 on NUMA socket -1
> EAL:   probe driver: 1af4:1000 rte_virtio_pmd
> PMD: parse_sysfs_value(): parse_sysfs_value(): cannot open sysfs
> value /sys/bus/pci/devices/:00:08.0/uio/uio0/portio/port0/size
> PMD: virtio_resource_init_by_uio(): virtio_resource_init_by_uio():
> cannot parse size
> PMD: virtio_resource_init_by_ioports(): PCI Port IO found
> start=0xd040 with size=0x20
> PMD: virtio_negotiate_features(): guest_features before negotiate = cf8020
> PMD: virtio_negotiate_features(): host_features before negotiate = 410fdda3
> PMD: virtio_negotiate_features(): features after negotiate = f8020
> PMD: eth_virtio_dev_init(): PORT MAC: 08:00:27:CC:DE:CD
> PMD: eth_virtio_dev_init(): VIRTIO_NET_F_MQ is not supported
> PMD: virtio_dev_cq_queue_setup():  >>
> PMD: virtio_dev_queue_setup(): selecting queue: 2
> PMD: virtio_dev_queue_setup(): vq_size: 16 nb_desc:0
> PMD: virtio_dev_queue_setup(): vring_size: 4228, rounded_vring_size: 8192
> PMD: virtio_dev_queue_setup(): vq->vq_ring_mem:  0x67b54000
> PMD: virtio_dev_queue_setup(): vq->vq_ring_virt_mem: 0x7f29fb354000
> PMD: eth_virtio_dev_init(): config->max_virtqueue_pairs=1
> PMD: eth_virtio_dev_init(): config->status=1
> PMD: eth_virtio_dev_init(): PORT MAC: 08:00:27:CC:DE:CD
> PMD: eth_virtio_dev_init(): hw->max_rx_queues=1 hw->max_tx_queues=1
> PMD: eth_virtio_dev_init(): port 0 vendorID=0x1af4 deviceID=0x1000
> PMD: virtio_dev_vring_start():  >>
> EAL: PCI device :00:09.0 on NUMA socket -1
> EAL:   probe driver: 1af4:1000 rte_virtio_pmd
> PMD: parse_sysfs_value(): parse_sysfs_value(): cannot open sysfs
> value /sys/bus/pci/devices/:00:09.0/uio/uio1/portio/port0/size
> PMD: virtio_resource_init_by_uio(): virtio_resource_init_by_uio():
> cannot parse size
> PMD: virtio_resource_init_by_ioports(): PCI Port IO found
> start=0xd060 with size=0x20
> PMD: virtio_negotiate_features(): guest_feature

[dpdk-dev] [PATCH] rte_alarm: modify it to make it not to be affected by discontinuous jumps in the system time

2015-10-13 Thread Stephen Hemminger

On Fri,  5 Jun 2015 10:46:36 +0800
Wen-Chi Yang  wrote:

> Both eal_alarm_callback() and rte_eal_alarm_set() use gettimeofday()
> to get the current time, and gettimeofday() is affected by jumps in
> the system time.
> 
> For example, use rte_eal_alarm_set() to set up an rte_alarm that will
> be triggered in the next second (current time + 1 second). The callback
> function of this rte_alarm sets up another rte_alarm that will be
> triggered one second later (current time + 2 seconds).
> If the system time is changed while the callback function is running,
> the rte_alarm functionality may not behave as expected.
> 
> Replacing gettimeofday() with clock_gettime(CLOCK_MONOTONIC_RAW, &now)
> avoids this problem.
> 
> Signed-off-by: Wen-Chi Yang 

Agreed, this should be applied.
Does the BSD version have the same problem?

Acked-by: Stephen Hemminger
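
As a rough illustration of the idea in the patch description above (this is
not the rte_alarm code), deadlines can be computed from a monotonic clock so
that changing the wall-clock time does not move them. The helper names
monotonic_us(), deadline_after_us() and deadline_expired() are hypothetical,
and the sketch assumes the portable POSIX CLOCK_MONOTONIC rather than the
Linux-specific CLOCK_MONOTONIC_RAW named in the patch:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in microseconds. */
static uint64_t monotonic_us(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is set */
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical helper: absolute deadline 'delay_us' microseconds from now. */
static uint64_t deadline_after_us(uint64_t delay_us)
{
    return monotonic_us() + delay_us;
}

/* Returns 1 once the monotonic clock has reached the deadline, else 0. */
static int deadline_expired(uint64_t deadline_us)
{
    return monotonic_us() >= deadline_us;
}
```

A callback that re-arms itself would call deadline_after_us() again; because
both the deadline and the comparison use the same monotonic source, a
settimeofday() or NTP step in between should not fire the alarm early or late.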

[dpdk-dev] [PATCH v3] Implement memcmp using Intel SIMD intrinsics.

2015-10-13 Thread Stephen Hemminger

On Mon, 18 May 2015 13:01:43 -0700
Ravi Kerur  wrote:

> This patch implements rte_memcmp using AVX/SSE intrinsics and uses
> librte_hash as the first candidate to use it.
> 
> Tested with the GCC (4.8.2) and Clang (3.4-1) compilers; both show better
> performance than memcmp on an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
> running Ubuntu 14.04 x86_64.
> 
> Changes in v3:
> Implement complete memcmp functionality.
> Implement functional and performance tests and add it to
> "make test" infrastructure code.
> 
> Changes in v2:
> Modified code to support only up to 64 bytes as that's the max bytes
> used by hash for comparison.
> 
> Changes in v1:
> Initial changes to support memcmp with support up to 128 bytes.
> 
> Signed-off-by: Ravi Kerur 

I think this idea is best taken to glibc, not here.
The issue is that GCC's default inline version of memcmp is bad, and that
is what needs to be fixed.

See the later discussion in the email thread with the GCC intrinsics developer.

[dpdk-dev] propose a solution for mapping same virtual address space to asymmetric processes

2015-10-13 Thread Nissim Nisimov

Hi Bruce,

Using "--base-virtaddr" requires knowing in advance the address at which the
huge pages should be mapped, and that address may vary between different uses
of the application.

We suggest a more generic solution which won't require any previous knowledge
and will be as "bullet proof" as possible.

Regards,
Nissim

On Oct 13, 2015 18:49, "Richardson, Bruce"  
wrote:


> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Nissim Nisimov
> Sent: Tuesday, October 13, 2015 4:40 PM
> To: 'dev at dpdk.org'
> Subject: [dpdk-dev] propose a solution for mapping same virtual address
> space to asymmetric processes
>
> Hi all,
>
> Below I suggest a modification to the initialization of the
> Environment Abstraction Layer (AKA EAL) so that it can allocate
> memory zones at the same virtual memory addresses even if the primary
> process is not similar to the secondary processes.
>
> Problem:
> The DPDK Primary/Secondary model requires that the exact same hugepage
> memory mappings be present in all applications.
> An issue may occur when the primary and secondary processes are not
> symmetric, in that their code differs significantly (for example, the
> primary process is a traffic distributor and the secondary is a worker).
> The result may be that a specific virtual address region in the first
> process won't be available in the second process.
>
>
> Suggested solution:
> Map all related rte and uio sections somewhere close to the end of the
> huge pages memory (that means rte_eal_memory_init() should be called
> before rte_config_init() in the primary process). According to our
> observations, allocating the above sections after the huge pages section
> is more likely to succeed (actually uio is already allocated after the
> huge pages area).
>
> It solved our problem when trying to work with a primary traffic
> distributor which is a very "light" process and a few secondary worker
> processes.
>
>
> Please share your thoughts on this before I submit our patch
> for review
>
> Thanks,
> Nissim

Hi,

out of interest, have you tried fixing the issue using the "--base-virtaddr" 
EAL flag to hint a base address to the primary process? It was put into the 
code some time ago to help solve exactly this problem.

/Bruce
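
To illustrate why the same virtual address may not be available in a second
process, here is a minimal sketch using plain mmap() (it does not use the EAL,
and the helper names map_at_hint() and range_is_free_at() are hypothetical).
Without MAP_FIXED the kernel treats the requested address only as a hint, so a
process whose address space is already occupied at that range gets a different
address back:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map 'len' bytes of anonymous memory, asking the
 * kernel to place it at 'hint'. Returns the address actually used, or
 * NULL on failure. The caller must compare the result with 'hint'.
 */
static void *map_at_hint(void *hint, size_t len)
{
    void *addr;

    assert(len > 0);
    addr = mmap(hint, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return addr == MAP_FAILED ? NULL : addr;
}

/*
 * Hypothetical helper: returns 1 if 'len' bytes could be mapped exactly
 * at 'hint' in this process, 0 otherwise. The probe mapping is always
 * released again.
 */
static int range_is_free_at(void *hint, size_t len)
{
    void *addr = map_at_hint(hint, len);

    if (addr == NULL)
        return 0;
    munmap(addr, len);
    return addr == hint;
}
```

A secondary process could run a probe like range_is_free_at() on the
primary's base address before attaching; when it returns 0, something in that
process (a library, heap or stack) already occupies the range.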

[dpdk-dev] Question about unsupported transceivers

2015-10-13 Thread Alex Forster

Hi everybody, apologies for coming to this list with a tech support question.

I'm completely stumped about using non-Intel transceivers with DPDK. testpmd is 
bailing here: PMD: eth_ixgbe_dev_init(): Unsupported SFP+ Module / PMD: 
eth_ixgbe_dev_init(): Hardware Initialization Failure: -19

My box is an x64 server running Debian 8 (Jessie) with two X520-Q1 cards using 
Finisar QSFP transceivers. Here are the things that I've tried so far, 
unsuccessfully-

  *   Added CONFIG_RTE_LIBRTE_IXGBE_ALLOW_UNSUPPORTED_SFP=y to 
config/defconfig_x86_64-native-linuxapp-gcc and rebuilt/reinstalled/rebooted
  *   Tried various incantations of modprobe/insmod with 
allow_unsupported_sfp=1 appended
  *   Added options ixgbe allow_unsupported_sfp=1 to /etc/modprobe.d/dpdk.conf 
and rebuilt the initrd

Can anybody lead me in the right direction here? It seems like a lot of the 
information floating around about this issue may be out of date.

Alex Forster

[dpdk-dev] [PATCH] eal/bsd: reinitialize optind and optreset to 1

2015-10-13 Thread Don Provan

Actually, this is a good opportunity to fix a bug that's been in this code 
forever: it shouldn't be resetting optind to some arbitrary value: it should be 
saving optind (and optarg and optopt) at the beginning, initializing optind to 
1 before calling getopt_long(), then restoring all the values after. (And, from 
what you're saying, optreset should be handled the same as optind.)

This avoids broken behavior if rte_eal_init() is called by code that's in the 
middle of using getopt() to parse its own unrelated argc/argv parameters. 
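
A minimal sketch of the save/initialize/restore pattern described above, not
the EAL code itself. The function count_verbose_flags() and its "-v" option
are made up for the example, and the optreset handling is an assumption
guarded to FreeBSD only. Note that getopt() also keeps private state (its
position inside a clustered option word) that callers cannot save, so this is
best-effort:

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical helper: parse a private argv with getopt_long() without
 * disturbing a caller that is in the middle of its own getopt() loop.
 * Returns the number of -v/--verbose flags, or -1 on an unknown option.
 */
static int count_verbose_flags(int argc, char **argv)
{
    static const struct option longopts[] = {
        { "verbose", no_argument, NULL, 'v' },
        { NULL, 0, NULL, 0 }
    };
    /* Save the caller's visible parser state. */
    const int saved_optind = optind;
    const int saved_optopt = optopt;
    const int saved_opterr = opterr;
    char *const saved_optarg = optarg;
    int verbose = 0;
    int opt;

    assert(argc >= 1 && argv != NULL);
    optind = 1; /* start scanning just after argv[0] */
    opterr = 0; /* keep this private parse quiet */
#ifdef __FreeBSD__
    optreset = 1; /* tell the BSD getopt that a new scan begins */
#endif
    while ((opt = getopt_long(argc, argv, "v", longopts, NULL)) != -1) {
        if (opt != 'v') {
            verbose = -1;
            break;
        }
        verbose++;
    }

    /* Restore what the caller had before we were called. */
    optind = saved_optind;
    optopt = saved_optopt;
    opterr = saved_opterr;
    optarg = saved_optarg;
    return verbose;
}
```

With this shape the caller's optind is the same before and after the call,
which is the property the current "optind = 0" reset does not provide.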

-don provan
dprovan at bivio.net

-----Original Message-----
From: Tiwei Bie [mailto:b...@mail.ustc.edu.cn] 
Sent: Tuesday, October 13, 2015 1:54 AM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH] eal/bsd: reinitialize optind and optreset to 1

The variable optind must be reinitialized to 1 in order to skip over argv[0] on
FreeBSD, because getopt() on FreeBSD returns -1 when it meets an argument
which doesn't start with '-'.

The variable optreset is provided on FreeBSD to indicate an additional set of
calls to getopt(). So, also reinitialize it to 1.

Signed-off-by: Tiwei Bie 
---
 lib/librte_eal/bsdapp/eal/eal.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c 
index 1b6f705..35feaee 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -334,7 +334,8 @@ eal_log_level_parse(int argc, char **argv)
break;
}

-   optind = 0; /* reset getopt lib */
+   optind = 1; /* reset getopt lib */
+   optreset = 1;
 }

 /* Parse the argument given in the command line of the application */
@@ -403,7 +404,8 @@ eal_parse_args(int argc, char **argv)
if (optind >= 0)
argv[optind-1] = prgname;
ret = optind-1;
-   optind = 0; /* reset getopt lib */
+   optind = 1; /* reset getopt lib */
+   optreset = 1;
return ret;
 }

--
2.6.0

[dpdk-dev] [PATCH v8 0/9] Dynamic memzones

2015-10-13 Thread Stephen Hemminger

On Tue, 14 Jul 2015 09:57:04 +0100
Sergio Gonzalez Monroy  wrote:

> Current implementation allows reserving/creating memzones but not the opposite
> (unreserve/free). This affects mempools and other memzone based objects.
> 
> From my point of view, implementing free functionality for memzones would look
> like malloc over memsegs.
> Thus, this approach moves malloc inside eal (which in turn removes a circular
> dependency), where malloc heaps are composed of memsegs.
> We keep both malloc and memzone APIs as they are, but memzones allocate their
> memory by calling malloc_heap_alloc.
> Some extra functionality is required in malloc to allow for boundary
> constrained memory requests.
> In summary, currently malloc is based on memzones, and with this approach
> memzones are based on malloc.
> 
> v8:
>  - Rebase against current HEAD to factor for changes made by new Tile-Gx arch

Following the kernel's rules, you need to fix the 32-bit build and resubmit
the whole series.

Thomas, this patchset should be marked "Changes requested" in patchwork.

[dpdk-dev] [PATCH v2] Clean up rte_memcpy.h file

2015-10-13 Thread Stephen Hemminger

On Mon, 20 Apr 2015 13:33:29 -0700
Ravi Kerur  wrote:

> Remove unnecessary type casting in functions.
> 
> Tested on Ubuntu (14.04 x86_64) with "make test".
> "make test" results match the results with baseline.
> "Memcpy perf" results match the results with baseline.
> 
> Signed-off-by: Ravi Kerur 

Getting rid of casts looks good.
My guess is no one reviewed it because no one is using rte_memcpy much.


Acked-by: Stephen Hemminger

[dpdk-dev] [PATCH 2/2] virtio: change io privilege level as early as possible

2015-10-13 Thread Stephen Hemminger

On Thu, 1 Oct 2015 07:25:45 -0400
Neil Horman  wrote:

> On Wed, Sep 30, 2015 at 05:37:05PM +0200, Thomas Monjalon wrote:
> > 2015-09-30 10:52, Neil Horman:
> > > On Wed, Sep 30, 2015 at 10:28:53AM +0200, David Marchand wrote:
> > > > On Tue, Sep 29, 2015 at 9:25 PM, Stephen Hemminger <
> > > > stephen at networkplumber.org> wrote:
> > > > 
> > > > > On Tue, 10 Mar 2015 09:14:28 -0400
> > > > > Neil Horman  wrote:
> > > > > > I don't see how this works for all cases.  The constructor is called
> > > > > > once when the library is first loaded.  What if you have multiple
> > > > > > independent (i.e. not forked children) processes that are using the
> > > > > > dpdk in parallel?  Only the process that triggered the library load
> > > > > > will have io permissions set appropriately.  I think what you need is
> > > > > > to have every application that expects to call through the transmit
> > > > > > path or poll the receive path call iopl, which I think speaks to
> > > > > > having this requirement documented, so each application can call
> > > > > > iopl prior to calling fork/daemonize/etc.
> > > > > >
> > > > >
> > > > > I am still seeing this problem with DPDK 2.0 and 2.1.
> > > > > It seems to me that doing the iopl init in eal_init is the only safe way.
> > > > > Other workaround is to have application calling iopl_init before eal_init
> > > > > but that kind of violates the current method of all things being
> > > > > initialized by eal_init
> > > > 
> > > > Putting it in the virtio pmd constructor is my preferred solution and we
> > > > don't need to pollute the eal for virtio (specific to x86, btw).
> > > 
> > > > Preferred solution or not, you can't just call iopl from the constructor,
> > > > because not all processes will get appropriate permissions.  It needs to
> > > > be called by every process.  What Stephen is saying is that your solution
> > > > has use cases for which it doesn't work, and that needs to be solved.
> > 
> > I think it may be solved by calling iopl in the constructor.
> > We just need an extra call in rte_virtio_pmd_init() to detect iopl failures.
> > We can also simply move rte_eal_intr_init() after rte_eal_dev_init().
> > Please read my previous post on this topic:
> > 
> > http://thread.gmane.org/gmane.comp.networking.dpdk.devel/14761/focus=22341
> > 
> > > About the multiprocess case, I don't see the problem as the RX/TX and
> > > interrupt threads are forked in the rte_eal_init() context which should
> > > call iopl even in secondary processes.
> > 
> 
> I'm not talking about secondary processes here (i.e. processes forked from a
> parent that was the process which initialized the dpdk).  I'm referring to two
> completely independent processes, both of which link to and use the dpdk.
> 
> Though I think we're saying the same thing.  When you say 'constructor' above,
> you don't mean 'constructor' in the strict sense, but rather the pmd init
> routine (the one called from rte_eal_vdev_init and rte_eal_dev_init).  If this
> is the case, then yes, that works fine, since each process linking to the DPDK
> will enter those routines and call iopl.  In fact, if that's the case, then no
> call is needed in the constructor at all.

I think this patch should be rebased and resubmitted for 2.2.

It fixes a real problem (virtio link state). The driver has moved to a
different directory, and the patch could be redone to minimize changes.

[dpdk-dev] [PATCH 1/2] eal/linux: move plugin load to very start of eal init

2015-10-13 Thread Stephen Hemminger

On Tue, 10 Mar 2015 06:55:41 -0400
Neil Horman  wrote:

> On Tue, Mar 10, 2015 at 10:08:24AM +0100, David Marchand wrote:
> > Hello Neil,
> > 
> > On Mon, Mar 9, 2015 at 4:21 PM, Neil Horman  
> > wrote:
> > 
> > > On Mon, Mar 09, 2015 at 03:56:38PM +0100, David Marchand wrote:
> > > > Loading shared libraries should be done at the very start of eal init so
> > > > that the code statically built in dpdk and the code loaded from shared
> > > > objects is handled (almost) the same way wrt to call to rte_eal_init().
> > > > The only thing that must be done before is filling the solib_list which
> > > > is done by eal_parse_args().
> > > >
> > >
> > >
> > > I don't see anything explicitly wrong with this, but at the same time it
> > > doesn't seem to fix anything.  Is there a particular bug that you're fixing
> > > in relation to your cover letter here?  Or is there some expectation that
> > > PMD's loaded in this fashion expect the dpdk to be completely uninitialized?
> > > That would seem like a strange operational requirement to me.
> > >
> > >
> > 
> > Well, at first, I wanted to fix the virtio pmd init issue (iopl() not
> > called at the right place wrt to other pthreads created in rte_eal_init()).
> Ah, this is what you were addressing:
> http://dpdk.org/ml/archives/dev/2015-March/014765.html
> 
> > With next patch, this issue is fixed for statically builtin virtio pmd, but
> > for virtio pmd as a shared object, the dlopen comes too late.
> > So, yes, I moved the dlopen() for this reason.
> > 
> But this doesn't do anything to help you.  The goal, according to the above
> thread, is to initialize the pmd earlier so that you can call iopl prior to
> doing any forks (so that io privileges are inherited).  But both static and
> dynamic pmd have constructors that just register their driver structures.  No
> initialization happens until rte_eal_dev_init is called.  So this movement does
> nothing to change the time any given drivers init routine is called.
> 
> > From a more general point of view, since we support both static and dso
> > pmds, I would say that this is more logical to have dlopen comes very
> > early, since static code is "loaded" even earlier : if the current pmds
> > needed more than just register to the driver list, they would already have
> > triggered segfaults and/or bugs.
> > 
> No, not really.  I suppose it doesn't hurt anything, but moving this earlier
> in a function doesn't really buy you anything, as statically allocated pmds are
> called by the gcc start code prior to an applications main routine running, so
> we're never actually going to get close to parity there, nor do we need to,
> because the actual init happens at rte_eal_dev_init, which is in parity for
> both static and dynamic drivers.
> 
> > 
> > I know this change comes really late for 2.0.
> > I am open to other ideas but I don't want to see more #ifdef 
> > in eal.c (especially for a pmd), this is nonsense.
> > 
> > I would say that at least the patch 2 is needed for 2.0 : it fixes the
> > static case, but without patch 1 virtio pmd triggers a segfault on
> > interrupt receipt when built as a dso.
> > 
> The static case suffers from problems as well I think, in that it's possible to
> architect multiple processes that are not started from fork that use the same
> pmd, which would create the same issue.  I think a better course of action
> would be to document the need for an application to call iopl before
> rte_eal_init.
> 

Given all this, I recommend that Thomas not apply this patch.
Please resubmit if there is a real problem with drivers (something in tree).
There are enough other bugs to fix without chasing ghosts.

[dpdk-dev] [PATCH] eal/bsd: reinitialize optind and optreset to 1

2015-10-13 Thread Tiwei Bie

The variable optind must be reinitialized to 1 in order to skip over
argv[0] on FreeBSD, because getopt() on FreeBSD returns -1 when
it meets an argument which doesn't start with '-'.

The variable optreset is provided on FreeBSD to indicate an additional
set of calls to getopt(). So, also reinitialize it to 1.

Signed-off-by: Tiwei Bie 
---
 lib/librte_eal/bsdapp/eal/eal.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 1b6f705..35feaee 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -334,7 +334,8 @@ eal_log_level_parse(int argc, char **argv)
break;
}

-   optind = 0; /* reset getopt lib */
+   optind = 1; /* reset getopt lib */
+   optreset = 1;
 }

 /* Parse the argument given in the command line of the application */
@@ -403,7 +404,8 @@ eal_parse_args(int argc, char **argv)
if (optind >= 0)
argv[optind-1] = prgname;
ret = optind-1;
-   optind = 0; /* reset getopt lib */
+   optind = 1; /* reset getopt lib */
+   optreset = 1;
return ret;
 }

-- 
2.6.0

[dpdk-dev] [PATCH] Found a bug related to getopt() in eal/bsd module

2015-10-13 Thread Tiwei Bie

I found a bug when trying to make my DPDK application work on FreeBSD.
The variable optind must be reinitialized to 1 on FreeBSD to skip over
argv[0], because getopt() on FreeBSD returns -1 when it meets an
argument which doesn't start with '-'. This behaviour is implemented
by lines 13-17:

01 /*
02  * getopt --
03  *   Parse argc/argv argument vector.
04  */
05 int
06 getopt(int nargc, char * const nargv[], const char *ostr)
07 {
08     static char *place = EMSG;  /* option letter processing */
09     char *oli;                  /* option letter list index */
10
11     if (optreset || *place == 0) {  /* update scanning pointer */
12         optreset = 0;
13         place = nargv[optind];
14         if (optind >= nargc || *place++ != '-') {
15             /* Argument is absent or is not an option */
16             place = EMSG;
17             return (-1);
18         }
19         ..
20     }
21     ..
22 }

The variable optreset is also provided on FreeBSD to indicate an
additional set of calls to getopt(). So, also reinitialize it to 1.

References:

1. https://svnweb.freebsd.org/base/head/lib/libc/stdlib/getopt.c?view=markup#l70
2. 
https://www.freebsd.org/cgi/man.cgi?query=getopt&apropos=0&sektion=3&manpath=FreeBSD+11-current&arch=default&format=html

Tiwei Bie (1):
  eal/bsd: reinitialize optind and optreset to 1

 lib/librte_eal/bsdapp/eal/eal.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

-- 
2.6.0

[dpdk-dev] DPDK User Space: Session on Useability and Ease of Use

2015-10-13 Thread Mcnamara, John

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas F Herbert
> Sent: Thursday, October 8, 2015 7:30 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] DPDK User Space: Session on Useability and Ease of Use
> 
> All:
> 
> Captured white board notes from Jim McNamara's session on Useability and
> Ease of Use at DPDK User Space today are here:
> 
> http://people.redhat.com/therbert/Useability_and_Ease_of_Use_DPDK_User_Space/


Hi Tom,

Thanks for that. Here is my summary of the "Usability and Ease of
Use" session from memory and notes. Corrections and additions welcome.

* Latest version of the DPDK docs

  - (From an earlier session). Add a doc/latest build of the docs
to dpdk.org.

* PMD lite

  - Do we need a lighter PMD model? Perhaps based on the Mellanox
model.

  - Vincent suggested we could remove 90% of the code. I'll leave
Vincent to explain this one.

* OvS issues with Usability

  - Discussion of the DPDK usability issues highlighted on the
OvS mailing list.

  - http://openvswitch.org/pipermail/dev/2015-August/058814.html

* Distributed testing

  - There should be some form of distributed testing so patches
can be tested on OSes and hardware that a dev doesn't have.

  - Suggestion to have an Open Lab/Pod similar to the OPNFV model
that participants can use.

* Debuggability

  - User expectation for tools like tcpdump, ip, ipconfig to work
with DPDK bound NICs.

  - Easier in a pipeline application where debug is added as a
pipeline stage.

  - Maybe add debug hooks via rx/tx callbacks.

  - Add/extend a solution based on KNI.

  - Use systemd naming algorithm for KNI.

* Create a User Mailing List

  - An observation was made that the dev at dpdk.org list was very
developer orientated and patch heavy.

  - Suggestion to add a user at dpdk.org mailing list for people
with issues or subjects that aren't development related.

  - This seems easy to implement. It may not be well supported,
however, resulting in users cross-posting to dev at dpdk.org.

  - Probably worth trying anyway.

* Make the dpdk.org website patchable

  - There are already plans to host the dpdk.org code in a git
repo.

* Add a Contributing Guide.

  - We are at the stage where we need one.

  - Suggestion to just use the Kernel guide.

  - Tailor it for DPDK.

  - Also explain the review process, acks, nacks, etc.

* Add a README .txt or .1st to the root dir.

  - This could just include links to the getting started guides
and other docs. Either to the online docs or how to build the
local html versions.

* EAL annoyances

  - Move the EAL to the kernel.

  - Have more/better/all default options. EAL figures out its own
requirements.

  - Have a default for -n.

* Hugepage consumption

  - Do not allow DPDK applications to grab all available hugepages.

  - Issues with running DPDK apps in tandem with other hugepage
hungry apps such as Java/Eclipse.

* rte_malloc()

  - Don't use rte_malloc() for non critical objects where
malloc() would do.

  - Suggestion to allow the type of required memory to be
specified by rte_malloc()-like function.

* The Build system

  - Make install needs to be improved. Doesn't do what the user expects.

  - Use autotools and configure. (There were some objections that
this may not be an improvement.)

  - Use kconfig.

  - Keep going with what we have now until it gets too unwieldy
and needs to be changed. Then use kconfig.

  - Add better support for cross compilation. Useful for arm target.

* Should DPDK applications be running as root

  - Clearly not a great option.

  - Currently required due to kernel.

* Mempool debugging

  - We need better tools to debug memory leaks in the mempools.

  - Suggestion to do this via a valgrind plugin.

* Kernel management of drivers

* Too much duplicated code in the PMDs

  - Duplicated code has crept in organically as PMDs have been
added.

  - Should be moved up to the ethdev level

* Logging and debugging via a secondary process

  - Not a well known technique but very useful/powerful

* Run DPDK as a daemon.

* Issues with config files

  - Too many options turned off by default: code paths don't get
compiled/tested.

* More sample apps

  - Some more examples of using secondary processes.



Of these, the following could be addressed in the near term:


* Latest version of the docs. - Needs support from 6Wind.

* Distributed testing. - Needs support from Intel initially.
  Some of this is already being rolled out on the
  test-report at dpdk.org list for Intel hardware:
  http://dpdk.org/ml/archives/test-report/. Other hardware
  vendors could use the same automated test framework to host
  something similar.

* Debuggability. - Need some volunteers or workable suggestions.

* Create a User Mailing List. - Needs support from 6Wind.

* Make the dpdk.org website patchable. - Needs support from
  6Wind.

* Add a Cont

[dpdk-dev] [PATCH 2/4] rte_ring: store memzone pointer inside ring

2015-10-13 Thread Olivier MATZ

Hi Bruce,

On 09/30/2015 02:12 PM, Bruce Richardson wrote:
> Add a new field to the rte_ring structure to store the memzone pointer which
> contains the ring. For rings created using rte_ring_create(), the field will
> be set automatically.
> 
> This new field will allow users of the ring to query the numa node a ring is
> allocated on, or to get the physical address of the ring, if so needed.
> 
> The rte_ring structure will also maintain ABI compatibility, as the
> structure members, after the new one, are set to be cache line aligned,
> so leaving a space.
> 
> Signed-off-by: Bruce Richardson 

Acked-by: Olivier Matz

[dpdk-dev] [PATCH] crc: deinline crc functions

2015-10-13 Thread Richardson, Bruce

> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Friday, October 2, 2015 12:38 AM
> To: Richardson, Bruce; De Lara Guarch, Pablo
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [PATCH] crc: deinline crc functions
> 
> Because the CRC functions are inline and defined purely in the header
> file, every component that uses these functions gets its own copy
> of the software CRC table which is a big space waster.
> 
> Just deinline, which gives better long-term ABI stability anyway.
> 
> Signed-off-by: Stephen Hemminger 

While I think it's a good idea to de-inline the functions that
do the calculations using the lookup tables, I think the
functions that consist of a single assembly instruction should be
kept as inline. 

/Bruce

[dpdk-dev] [PATCH v4] ring: add function to free a ring

2015-10-13 Thread Olivier MATZ

Hi Pablo,

On 10/02/2015 05:53 PM, Pablo de Lara wrote:
> From: "De Lara Guarch, Pablo" 
> 
> When creating a ring, a memzone is created to allocate it in memory,
> but the ring could not be freed, as memzones could not be.
> 
> Since memzones can be freed now, then rings can be as well,
> taking into account if they were initialized using pre-allocated memory
> (in which case, memory should be freed externally) or using
> rte_memzone_reserve (with rte_ring_create), freeing the memory with
> rte_memzone_free.
> 
> Signed-off-by: Pablo de Lara 

Acked-by: Olivier Matz

[dpdk-dev] [PATCH] eal/bsd: reinitialize optind and optreset to 1

2015-10-13 Thread Bruce Richardson

On Tue, Oct 13, 2015 at 04:54:06PM +0800, Tiwei Bie wrote:
> The variable optind must be reinitialized to 1 in order to skip over
> argv[0] on FreeBSD, because getopt() on FreeBSD returns -1 when
> it meets an argument which doesn't start with '-'.
> 
> The variable optreset is provided on FreeBSD to indicate an additional
> set of calls to getopt(). So, also reinitialize it to 1.
> 
> Signed-off-by: Tiwei Bie 

Acked-by: Bruce Richardson

[dpdk-dev] [PATCH] ethdev: remove the imissed deprecation tag

2015-10-13 Thread Olivier MATZ

Hi Maryam,

On 09/30/2015 10:20 AM, Maryam Tahhan wrote:
> Remove the deprecation tag and notice for imissed.
> 
> Signed-off-by: Maryam Tahhan 
> ---
>  doc/guides/rel_notes/deprecation.rst | 2 +-
>  lib/librte_ether/rte_ethdev.h| 3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)

Could you please add some more details about why it is finally
kept? I think it could be helpful for people that did not follow
the thread http://dpdk.org/dev/patchwork/patch/6410/

You can also reference the commit id of the patch that
introduced the deprecation notice.

It could also be a good occasion to recall the definition of
imissed: the number of packets dropped by hardware because the software
does not poll fast enough (= queue full).

Thanks!
Olivier

[dpdk-dev] propose a solution for mapping same virtual address space to asymmetric processes

2015-10-13 Thread Richardson, Bruce



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Nissim Nisimov
> Sent: Tuesday, October 13, 2015 4:40 PM
> To: 'dev at dpdk.org'
> Subject: [dpdk-dev] propose a solution for mapping same virtual address
> space to asymmetric processes
> 
> Hi all,
> 
> The below will try to suggest a modification to the initialization of
> Environment Abstraction Layer (AKA EAL) so it will be able to allocate
> memory zones from same virtual memory addresses even if the primary
> process is not similar to the secondary processes.
> 
> Problem:
> The DPDK Primary/Secondary model requires that the exact same hugepage
> memory mappings be present in all applications.
> An issue may occur when the Primary and secondary processes are not
> symmetric in such way that the code has big differences (for example,
> Primary process is a traffic distributer and secondary is a worker).
> The result may be that specific virtual address region in the first
> process won't be available in the second process.
> 
> 
> Suggested solution:
> Map all related rte and uio sections somewhere close to the end of the huge
> pages memory (this means rte_eal_memory_init() should be called before
> rte_config_init() in the primary process). According to our observations,
> allocation is more likely to succeed when the above sections are placed
> after the huge pages section (uio is actually already allocated after the
> huge pages area).
> 
> It solved our problem when trying to work with a primary traffic
> distributer which is a very "light" process and few secondary worker
> processes.
> 
> 
> Please share your thoughts on this before I will try to commit our patch
> for review
> 
> Thanks,
> Nissim

Hi,

out of interest, have you tried fixing the issue using the "--base-virtaddr" 
EAL flag to hint a base address to the primary process? It was put into the 
code some time ago to help solve exactly this problem.

/Bruce
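
For context, the general idea of hinting a mapping address can be sketched
with plain mmap(). This is only an illustration of the mechanism on a POSIX
system that provides MAP_ANONYMOUS; it is not EAL's mapping code, and
reserve_at_hint() is a hypothetical name. Without MAP_FIXED the kernel is
free to ignore the hint, which is why two differently shaped processes may
still end up with different layouts.

```c
#define _DEFAULT_SOURCE	/* glibc: expose MAP_ANONYMOUS in strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: reserve 'len' bytes of address space, asking the
 * kernel to place the region at 'hint' if that range is free. Returns the
 * address actually chosen (which may differ from the hint), or NULL on
 * failure.
 */
static void *
reserve_at_hint(void *hint, size_t len)
{
	void *addr;

	assert(len > 0);

	/* PROT_NONE: reserve the range only, nothing is readable yet */
	addr = mmap(hint, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return addr == MAP_FAILED ? NULL : addr;
}
```

A caller that needs the same address in a second process has to compare the
returned pointer with the hint and treat a mismatch as a failure.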

[dpdk-dev] [PATCH 2/2] igb: fix VF statistic wraparound handling macro

2015-10-13 Thread Roger B. Melton

ack

On 10/12/15 12:45 PM, Harry van Haaren wrote:
> Fix a misinterpretation of VF statistic macro in e1000/igb.
>
> Signed-off-by: Harry van Haaren 
> ---
>   drivers/net/e1000/igb_ethdev.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
> index 848ef6e..e4911fc 100644
> --- a/drivers/net/e1000/igb_ethdev.c
> +++ b/drivers/net/e1000/igb_ethdev.c
> @@ -246,11 +246,10 @@ static void eth_igb_configure_msix_intr(struct 
> rte_eth_dev *dev);
>   #define UPDATE_VF_STAT(reg, last, cur)\
>   { \
>   u32 latest = E1000_READ_REG(hw, reg); \
> - cur += latest - last; \
> + cur += (latest-last) & UINT_MAX;  \
>   last = latest;\
>   }
>   
> -
>   #define IGB_FC_PAUSE_TIME 0x0680
>   #define IGB_LINK_UPDATE_CHECK_TIMEOUT  90  /* 9s */
>   #define IGB_LINK_UPDATE_CHECK_INTERVAL 100 /* ms */

-- 

|Roger B. Melton|  |  Cisco Systems  |
|CPP Software  :|::|: 7100 Kit Creek Rd  |
|+1.919.476.2332 phone:|||:  :|||:RTP, NC 27709-4987 |
|+1.919.392.1094 fax   .:|||:..:|||:. rmelton at cisco.com  |
||
| This email may contain confidential and privileged material for the|
| sole use of the intended recipient. Any review, use, distribution  |
| or disclosure by others is strictly prohibited. If you are not the |
| intended recipient (or authorized to receive for the recipient),   |
| please contact the sender by reply email and delete all copies of  |
| this message.  |
||
| For corporate legal information go to: |
| http://www.cisco.com/web/about/doing_business/legal/cri/index.html |
|__ http://www.cisco.com |

[dpdk-dev] [PATCH 1/2] ixgbe: fix VF statistic wraparound handling macro

2015-10-13 Thread Roger B. Melton

ack

On 10/12/15 12:45 PM, Harry van Haaren wrote:
> Fix a misinterpretation of VF stats in ixgbe
>
> Signed-off-by: Harry van Haaren 
> ---
>   drivers/net/ixgbe/ixgbe_ethdev.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c 
> b/drivers/net/ixgbe/ixgbe_ethdev.c
> index ec2918c..86dcd87 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -329,10 +329,10 @@ static int ixgbe_timesync_read_tx_timestamp(struct 
> rte_eth_dev *dev,
>   /*
>* Define VF Stats MACRO for Non "cleared on read" register
>*/
> -#define UPDATE_VF_STAT(reg, last, cur)   \
> +#define UPDATE_VF_STAT(reg, last, cur)  \
>   {   \
>   uint32_t latest = IXGBE_READ_REG(hw, reg);  \
> - cur += latest - last;   \
> + cur += (latest-last) & UINT_MAX;\
>   last = latest;  \
>   }
>   


[dpdk-dev] [PATCH 1/2] ixgbe: fix VF statistic wraparound handling macro

2015-10-13 Thread Roger B. Melton

Agreed, this handles the off by one error on wrap around and should be 
faster.

-Roger

On 10/12/15 11:41 AM, Alexander Duyck wrote:
> On 10/12/2015 06:33 AM, Harry van Haaren wrote:
>> Fix a misinterpretation of VF stats in ixgbe
>>
>> Signed-off-by: Harry van Haaren 
>> ---
>>   drivers/net/ixgbe/ixgbe_ethdev.c | 8 ++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c 
>> b/drivers/net/ixgbe/ixgbe_ethdev.c
>> index ec2918c..d226e8d 100644
>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
>> @@ -329,10 +329,14 @@ static int 
>> ixgbe_timesync_read_tx_timestamp(struct rte_eth_dev *dev,
>>   /*
>>* Define VF Stats MACRO for Non "cleared on read" register
>>*/
>> -#define UPDATE_VF_STAT(reg, last, cur)\
>> +#define UPDATE_VF_STAT(reg, last, cur) \
>> { \
>>   uint32_t latest = IXGBE_READ_REG(hw, reg);  \
>> -cur += latest - last;   \
>> +if(likely(latest > last)) { \
>> +cur += latest - last;   \
>> +} else {\
>> +cur += (UINT_MAX - last) + latest;  \
>> +}   \
>>   last = latest;  \
>>   }
>
> From what I can tell your math is adding an off by one error.  You 
> should probably be using UINT_MAX as a mask for the result, not as a 
> part of the calculation itself.
>
> So the correct way to compute this would be "cur += (latest - last) & 
> UINT_MAX".  Also the mask approach should be faster as it avoids any 
> conditional jumps.
>
> - Alex
> .
>


[dpdk-dev] propose a solution for mapping same virtual address space to asymmetric processes

2015-10-13 Thread Nissim Nisimov

Hi all,

The below will try to suggest a modification to the initialization of 
Environment Abstraction Layer (AKA EAL) so it will be able to allocate memory 
zones from same virtual memory addresses even if the primary process is not 
similar to the secondary processes.

Problem:
The DPDK Primary/Secondary model requires that the exact same hugepage memory 
mappings be present in all applications.
An issue may occur when the Primary and secondary processes are not symmetric 
in such way that the code has big differences (for example, Primary process is 
a traffic distributer and secondary is a worker).
The result may be that specific virtual address region in the first process 
won't be available in the second process.


Suggested solution:
Map all related rte and uio sections somewhere close to the end of the huge
pages memory (this means rte_eal_memory_init() should be called before
rte_config_init() in the primary process).
According to our observations, allocation is more likely to succeed when the
above sections are placed after the huge pages section (uio is actually
already allocated after the huge pages area).

It solved our problem when trying to work with a primary traffic distributer 
which is a very "light" process and few secondary worker processes.


Please share your thoughts on this before I will try to commit our patch for 
review

Thanks,
Nissim

[dpdk-dev] [PATCH] examples/vmdq: Fix the core dump issue when mem_pool is more than 34

2015-10-13 Thread Xutao Sun

Macro MAX_QUEUES was defined as 128, which in theory allows only 16 mem_pools.
Running vmdq_app with more than 34 mem_pools
causes a core dump.
Changing MAX_QUEUES to 1024 solves this issue.

Signed-off-by: Xutao Sun 
---
 examples/vmdq/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/vmdq/main.c b/examples/vmdq/main.c
index a142d49..b463cfb 100644
--- a/examples/vmdq/main.c
+++ b/examples/vmdq/main.c
@@ -69,7 +69,7 @@
 #include 
 #include 

-#define MAX_QUEUES 128
+#define MAX_QUEUES 1024
 /*
  * For 10 GbE, 128 queues require roughly
  * 128*512 (RX/TX_queue_nb * RX/TX_ring_descriptors_nb) per port.
-- 
1.9.3

[dpdk-dev] [PATCH] Add error message when trying to use make option T= during build/clean

2015-10-13 Thread Olivier MATZ

Hi Francesco,

On 09/29/2015 06:04 PM, Francesco Montorsi wrote:
> From: Francesco Montorsi 
> 
> ---
>  mk/rte.sdkbuild.mk | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mk/rte.sdkbuild.mk b/mk/rte.sdkbuild.mk
> index 38ec7bd..013aa89 100644
> --- a/mk/rte.sdkbuild.mk
> +++ b/mk/rte.sdkbuild.mk
> @@ -29,6 +29,12 @@
>  #   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>  #   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>  
> +ifdef T
> +  ifeq ("$(origin T)", "command line")
> +$(error "Cannot use T= with a build/clean target")
> +  endif
> +endif
> +
>  # If DESTDIR variable is given, install binary dpdk

I tested this patch but it breaks the "make install" command:

  $ make install T=x86_64-native-linuxapp-gcc
  make[5]: Nothing to be done for 'depdirs'.
  Configuration done
  rte.sdkbuild.mk:34: *** "Cannot use T= with a build/clean target".

As the T= argument is given as a command line variable, it is
propagated to the "$(MAKE) all" in rte.sdkinstall.mk.
So I think it's better to keep the current code as is, except if
you have a better idea.

Regards,
Olivier

[dpdk-dev] [PATCH 3/3] i40evf: add support of AQ based RSS config

2015-10-13 Thread Helin Zhang

It supports both Admin queue based and directly writing registers
based RSS hash key and lookup table configuration, as X722 supports
AQ based configuration.

Signed-off-by: Helin Zhang 
---
 drivers/net/i40e/i40e_ethdev.h|   3 +
 drivers/net/i40e/i40e_ethdev_vf.c | 230 --
 2 files changed, 173 insertions(+), 60 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 57366ac..a8d8cac 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -449,6 +449,7 @@ struct i40e_vf {
struct i40e_virtchnl_vf_resource *vf_res; /* All VSIs */
struct i40e_virtchnl_vsi_resource *vsi_res; /* LAN VSI */
struct i40e_vsi vsi;
+   uint64_t flags;
 };

 /*
@@ -541,6 +542,8 @@ i40e_get_vsi_from_adapter(struct i40e_adapter *adapter)
(&(((struct i40e_vsi *)vsi)->adapter->hw))
 #define I40E_VSI_TO_PF(vsi) \
(&(((struct i40e_vsi *)vsi)->adapter->pf))
+#define I40E_VSI_TO_VF(vsi) \
+   (&(((struct i40e_vsi *)vsi)->adapter->vf))
 #define I40E_VSI_TO_DEV_DATA(vsi) \
(((struct i40e_vsi *)vsi)->adapter->pf.dev_data)
 #define I40E_VSI_TO_ETH_DEV(vsi) \
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index b694400..02ee87b 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1126,9 +1126,12 @@ i40evf_init_vf(struct rte_eth_dev *dev)
goto err_alloc;
}

+   if (hw->mac.type == I40E_MAC_X722_VF)
+   vf->flags = I40E_FLAG_RSS_AQ_CAPABLE;
vf->vsi.vsi_id = vf->vsi_res->vsi_id;
vf->vsi.type = vf->vsi_res->vsi_type;
vf->vsi.nb_qps = vf->vsi_res->num_queue_pairs;
+   vf->vsi.adapter = I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);

/* check mac addr, if it's not valid, genrate one */
if (I40E_SUCCESS != i40e_validate_mac_addr(\
@@ -1778,15 +1781,71 @@ i40evf_dev_close(struct rte_eth_dev *dev)
 }

 static int
+i40evf_get_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size)
+{
+   struct i40e_vf *vf = I40E_VSI_TO_VF(vsi);
+   struct i40e_hw *hw = I40E_VSI_TO_HW(vsi);
+   int ret;
+
+   if (!lut)
+   return -EINVAL;
+
+   if (vf->flags & I40E_FLAG_RSS_AQ_CAPABLE) {
+   ret = i40e_aq_get_rss_lut(hw, vsi->vsi_id, FALSE,
+ lut, lut_size);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Failed to get RSS lookup table");
+   return ret;
+   }
+   } else {
+   uint32_t *lut_dw = (uint32_t *)lut;
+   uint16_t i, lut_size_dw = lut_size / 4;
+
+   for (i = 0; i < lut_size_dw; i++)
+   lut_dw[i] = I40E_READ_REG(hw, I40E_VFQF_HLUT(i));
+   }
+
+   return 0;
+}
+
+static int
+i40evf_set_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size)
+{
+   struct i40e_vf *vf = I40E_VSI_TO_VF(vsi);
+   struct i40e_hw *hw = I40E_VSI_TO_HW(vsi);
+   int ret;
+
+   if (!vsi || !lut)
+   return -EINVAL;
+
+   if (vf->flags & I40E_FLAG_RSS_AQ_CAPABLE) {
+   ret = i40e_aq_set_rss_lut(hw, vsi->vsi_id, FALSE,
+ lut, lut_size);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Failed to set RSS lookup table");
+   return ret;
+   }
+   } else {
+   uint32_t *lut_dw = (uint32_t *)lut;
+   uint16_t i, lut_size_dw = lut_size / 4;
+
+   for (i = 0; i < lut_size_dw; i++)
+   I40E_WRITE_REG(hw, I40E_VFQF_HLUT(i), lut_dw[i]);
+   I40EVF_WRITE_FLUSH(hw);
+   }
+
+   return 0;
+}
+
+static int
 i40evf_dev_rss_reta_update(struct rte_eth_dev *dev,
   struct rte_eth_rss_reta_entry64 *reta_conf,
   uint16_t reta_size)
 {
-   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   uint32_t lut, l;
-   uint16_t i, j;
-   uint16_t idx, shift;
-   uint8_t mask;
+   struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
+   uint8_t *lut;
+   uint16_t i, idx, shift;
+   int ret;

if (reta_size != ETH_RSS_RETA_SIZE_64) {
PMD_DRV_LOG(ERR, "The size of hash lookup table configured "
@@ -1795,29 +1854,26 @@ i40evf_dev_rss_reta_update(struct rte_eth_dev *dev,
return -EINVAL;
}

-   for (i = 0; i < reta_size; i += I40E_4_BIT_WIDTH) {
+   lut = rte_zmalloc("i40e_rss_lut", reta_size, 0);
+   if (!lut) {
+   PMD_DRV_LOG(ERR, "No memory can be allocated");
+   return -ENOMEM;
+   }
+   ret = i40evf_get_rss_lut(&vf->vsi, lut, reta_size);
+   if (ret)
+   goto out;
+   for (i = 0; i < reta_size; i++) {
idx = i / RTE_RETA

[dpdk-dev] [PATCH 2/3] i40e: add support of AQ based RSS config

2015-10-13 Thread Helin Zhang

It supports both Admin queue based and directly writing registers
based RSS hash key and lookup table configuration, as X722 supports
AQ based configuration.

Signed-off-by: Helin Zhang 
---
 drivers/net/i40e/i40e_ethdev.c | 229 ++---
 drivers/net/i40e/i40e_ethdev.h |   4 +-
 2 files changed, 173 insertions(+), 60 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 2dd9fdc..0f4ef5b 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1994,16 +1994,72 @@ i40e_mac_filter_handle(struct rte_eth_dev *dev, enum 
rte_filter_op filter_op,
 }

 static int
+i40e_get_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size)
+{
+   struct i40e_pf *pf = I40E_VSI_TO_PF(vsi);
+   struct i40e_hw *hw = I40E_VSI_TO_HW(vsi);
+   int ret;
+
+   if (!lut)
+   return -EINVAL;
+
+   if (pf->flags & I40E_FLAG_RSS_AQ_CAPABLE) {
+   ret = i40e_aq_get_rss_lut(hw, vsi->vsi_id, TRUE,
+ lut, lut_size);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Failed to get RSS lookup table");
+   return ret;
+   }
+   } else {
+   uint32_t *lut_dw = (uint32_t *)lut;
+   uint16_t i, lut_size_dw = lut_size / 4;
+
+   for (i = 0; i < lut_size_dw; i++)
+   lut_dw[i] = I40E_READ_REG(hw, I40E_PFQF_HLUT(i));
+   }
+
+   return 0;
+}
+
+static int
+i40e_set_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size)
+{
+   struct i40e_pf *pf = I40E_VSI_TO_PF(vsi);
+   struct i40e_hw *hw = I40E_VSI_TO_HW(vsi);
+   int ret;
+
+   if (!vsi || !lut)
+   return -EINVAL;
+
+   if (pf->flags & I40E_FLAG_RSS_AQ_CAPABLE) {
+   ret = i40e_aq_set_rss_lut(hw, vsi->vsi_id, TRUE,
+ lut, lut_size);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Failed to set RSS lookup table");
+   return ret;
+   }
+   } else {
+   uint32_t *lut_dw = (uint32_t *)lut;
+   uint16_t i, lut_size_dw = lut_size / 4;
+
+   for (i = 0; i < lut_size_dw; i++)
+   I40E_WRITE_REG(hw, I40E_PFQF_HLUT(i), lut_dw[i]);
+   I40E_WRITE_FLUSH(hw);
+   }
+
+   return 0;
+}
+
+static int
 i40e_dev_rss_reta_update(struct rte_eth_dev *dev,
 struct rte_eth_rss_reta_entry64 *reta_conf,
 uint16_t reta_size)
 {
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   uint32_t lut, l;
-   uint16_t i, j, lut_size = pf->hash_lut_size;
+   uint16_t i, lut_size = pf->hash_lut_size;
uint16_t idx, shift;
-   uint8_t mask;
+   uint8_t *lut;
+   int ret;

if (reta_size != lut_size ||
reta_size > ETH_RSS_RETA_SIZE_512) {
@@ -2013,28 +2069,26 @@ i40e_dev_rss_reta_update(struct rte_eth_dev *dev,
return -EINVAL;
}

-   for (i = 0; i < reta_size; i += I40E_4_BIT_WIDTH) {
+   lut = rte_zmalloc("i40e_rss_lut", reta_size, 0);
+   if (!lut) {
+   PMD_DRV_LOG(ERR, "No memory can be allocated");
+   return -ENOMEM;
+   }
+   ret = i40e_get_rss_lut(pf->main_vsi, lut, reta_size);
+   if (ret)
+   goto out;
+   for (i = 0; i < reta_size; i++) {
idx = i / RTE_RETA_GROUP_SIZE;
shift = i % RTE_RETA_GROUP_SIZE;
-   mask = (uint8_t)((reta_conf[idx].mask >> shift) &
-   I40E_4_BIT_MASK);
-   if (!mask)
-   continue;
-   if (mask == I40E_4_BIT_MASK)
-   l = 0;
-   else
-   l = I40E_READ_REG(hw, I40E_PFQF_HLUT(i >> 2));
-   for (j = 0, lut = 0; j < I40E_4_BIT_WIDTH; j++) {
-   if (mask & (0x1 << j))
-   lut |= reta_conf[idx].reta[shift + j] <<
-   (CHAR_BIT * j);
-   else
-   lut |= l & (I40E_8_BIT_MASK << (CHAR_BIT * j));
-   }
-   I40E_WRITE_REG(hw, I40E_PFQF_HLUT(i >> 2), lut);
+   if (reta_conf[idx].mask & (1ULL << shift))
+   lut[i] = reta_conf[idx].reta[shift];
}
+   ret = i40e_set_rss_lut(pf->main_vsi, lut, reta_size);

-   return 0;
+out:
+   rte_free(lut);
+
+   return ret;
 }

 static int
@@ -2043,11 +2097,10 @@ i40e_dev_rss_reta_query(struct rte_eth_dev *dev,
uint16_t reta_size)
 {
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-   struc

[dpdk-dev] [PATCH 1/3] i40e: add support of X722 and its A0 hardware

2015-10-13 Thread Helin Zhang

In order to provide users early access of X722 and its A0 hardware,
new device IDs are added, and also compilation with those support
in base driver is enabled.

Signed-off-by: Helin Zhang 
---
 drivers/net/i40e/Makefile   |  1 +
 lib/librte_eal/common/include/rte_pci_dev_ids.h | 14 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 55b7d31..5744c0d 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -38,6 +38,7 @@ LIB = librte_pmd_i40e.a

 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -DPF_DRIVER -DVF_DRIVER -DINTEGRATED_VF
+CFLAGS += -DX722_SUPPORT -DX722_A0_SUPPORT

 EXPORT_MAP := rte_pmd_i40e_version.map

diff --git a/lib/librte_eal/common/include/rte_pci_dev_ids.h 
b/lib/librte_eal/common/include/rte_pci_dev_ids.h
index 265e66c..fb29650 100644
--- a/lib/librte_eal/common/include/rte_pci_dev_ids.h
+++ b/lib/librte_eal/common/include/rte_pci_dev_ids.h
@@ -4,7 +4,7 @@
  *
  *   GPL LICENSE SUMMARY
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *
  *   This program is free software; you can redistribute it and/or modify
  *   it under the terms of version 2 of the GNU General Public License as
@@ -499,6 +499,10 @@ RTE_PCI_DEV_ID_DECL_IXGBE(PCI_VENDOR_ID_INTEL, 
IXGBE_DEV_ID_82599_BYPASS)
 #define I40E_DEV_ID_20G_KR2 0x1587
 #define I40E_DEV_ID_20G_KR2_A   0x1588
 #define I40E_DEV_ID_10G_BASE_T4 0x1589
+#define I40E_DEV_ID_X722_A0 0x374C
+#define I40E_DEV_ID_SFP_X722 0x37D0
+#define I40E_DEV_ID_1G_BASE_T_X722  0x37D1
+#define I40E_DEV_ID_10G_BASE_T_X722 0x37D2

 RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_SFP_XL710)
 RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_QEMU)
@@ -512,6 +516,10 @@ RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, 
I40E_DEV_ID_10G_BASE_T)
 RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_20G_KR2)
 RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_20G_KR2_A)
 RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_10G_BASE_T4)
+RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_X722_A0)
+RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_SFP_X722)
+RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_1G_BASE_T_X722)
+RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_10G_BASE_T_X722)

 /*** Physical FM10K devices from fm10k_type.h ***/

@@ -555,9 +563,13 @@ RTE_PCI_DEV_ID_DECL_IXGBEVF(PCI_VENDOR_ID_INTEL, 
IXGBE_DEV_ID_X550EM_X_VF_HV)

 #define I40E_DEV_ID_VF  0x154C
 #define I40E_DEV_ID_VF_HV   0x1571
+#define I40E_DEV_ID_X722_VF 0x37CD
+#define I40E_DEV_ID_X722_VF_HV  0x37D9

 RTE_PCI_DEV_ID_DECL_I40EVF(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_VF)
 RTE_PCI_DEV_ID_DECL_I40EVF(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_VF_HV)
+RTE_PCI_DEV_ID_DECL_I40EVF(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_X722_VF)
+RTE_PCI_DEV_ID_DECL_I40EVF(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_X722_VF_HV)

 /** Virtio devices from virtio.h **/

-- 
1.9.3

[dpdk-dev] [PATCH 0/3] add i40e series x722 support

2015-10-13 Thread Helin Zhang

It supports i40e series x722 and its A0 hardware for early access.

Helin Zhang (3):
  i40e: add support of X722 and its A0 hardware
  i40e: add support of AQ based RSS config
  i40evf: add support of AQ based RSS config

 drivers/net/i40e/Makefile   |   1 +
 drivers/net/i40e/i40e_ethdev.c  | 229 +--
 drivers/net/i40e/i40e_ethdev.h  |   7 +-
 drivers/net/i40e/i40e_ethdev_vf.c   | 230 +---
 lib/librte_eal/common/include/rte_pci_dev_ids.h |  14 +-
 5 files changed, 360 insertions(+), 121 deletions(-)

-- 
1.9.3

[dpdk-dev] [PATCH v2] i40e: Add a workaround to drop flow control frames from VFs

2015-10-13 Thread Jingjing Wu

This patch adds a workaround to drop flow control frames from being
transmitted from VSIs.
With this patch in place a malicious VF cannot send flow control or PFC
packets out on the wire.

Signed-off-by: Jingjing Wu 
---
v2:
 - reword comments

 drivers/net/i40e/i40e_ethdev.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 2dd9fdc..3d19f42 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -382,6 +382,30 @@ static inline void i40e_flex_payload_reg_init(struct 
i40e_hw *hw)
I40E_WRITE_REG(hw, I40E_GLQF_PIT(17), 0x7440);
 }

+#define I40E_FLOW_CONTROL_ETHERTYPE  0x8808
+
+/*
+ * Add an ethertype filter to drop all flow control frames transmitted
+ * from VSIs.
+*/
+static void
+i40e_add_tx_flow_control_drop_filter(struct i40e_pf *pf)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   uint16_t flags = I40E_AQC_ADD_CONTROL_PACKET_FLAGS_IGNORE_MAC |
+   I40E_AQC_ADD_CONTROL_PACKET_FLAGS_DROP |
+   I40E_AQC_ADD_CONTROL_PACKET_FLAGS_TX;
+   int ret;
+
+   ret = i40e_aq_add_rem_control_packet_filter(hw, NULL,
+   I40E_FLOW_CONTROL_ETHERTYPE, flags,
+   pf->main_vsi_seid, 0,
+   TRUE, NULL, NULL);
+   if (ret)
+   PMD_INIT_LOG(ERR, "Failed to add filter to drop flow control "
+ " frames from VSIs.");
+}
+
 static int
 eth_i40e_dev_init(struct rte_eth_dev *dev)
 {
@@ -584,6 +608,12 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)

/* enable uio intr after callback register */
rte_intr_enable(&(pci_dev->intr_handle));
+   /*
+* Add an ethertype filter to drop all flow control frames transmitted
+* from VSIs. By doing so, we stop VFs from sending out PAUSE or PFC
+* frames to the wire.
+*/
+   i40e_add_tx_flow_control_drop_filter(pf);

/* initialize mirror rule list */
TAILQ_INIT(&pf->mirror_list);
-- 
2.4.0

[dpdk-dev] [PATCH v3] mbuf/ip_frag: Move mbuf chaining to common code

2015-10-13 Thread Thomas Monjalon

2015-10-13 14:50, Simon Kagstrom:
> Ping?

OK you apply the ping method we have just talked about :)
To make it really effective, you should have these headers:
To: Olivier Matz
Cc: dev at dpdk.org
Indeed, as the mbuf maintainer, he's the target of your ping.

And to make it clear, the title should be
mbuf: move chaining from ip_frag library
(note the lowercase in "move")

Now you know how to do it so you can spread the word when someone forgets
this advice.
Thanks

[dpdk-dev] [PATCH v3] mbuf/ip_frag: Move mbuf chaining to common code

2015-10-13 Thread Olivier MATZ

Hi Simon,

On 09/07/2015 02:50 PM, Simon Kagstrom wrote:
> Chaining/segmenting mbufs can be useful in many places, so make it
> global.
> 
> Signed-off-by: Simon Kagstrom 
> Signed-off-by: Johan Faltstrom 
> 
> [...]
> 
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1775,6 +1775,40 @@ static inline int rte_pktmbuf_is_contiguous(const 
> struct rte_mbuf *m)
>  }
>  
>  /**
> + * Chain an mbuf to another, thereby creating a segmented packet.
> + *
> + * Note: The implementation will do a linear walk over the segments to find
> + * the tail entry. For cases when there are many segments, it's better to
> + * chain the entries manually.
> + *
> + * @param head the head of the mbuf chain (the first packet)
> + * @param tail the mbuf to put last in the chain
> + *
> + * @return 0 on success, -EOVERFLOW if the chain is full (256 entries)
> + */

Small nit about the API comment, it should be:

@param head
  The head of the mbuf chain (the first packet).
...

(note the uppercase and the dot at the end, see the other functions
in the file)

I know Thomas usually fixes this kind of stuff when he pushes
the patches, but it's better if we can spare him this load :)


Regards,
Olivier
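
For readers following the thread, the behaviour described in the API
comment (linear walk to the tail, error when the chain would be too long)
can be sketched with a simplified stand-in for struct rte_mbuf. The struct,
the 8-bit segment counter and the name seg_chain() are assumptions made for
this illustration and are not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct rte_mbuf; only the fields used here. */
struct seg {
	struct seg *next;	/* next segment, NULL for the last one */
	uint8_t nb_segs;	/* segment count, meaningful in the head */
	uint32_t pkt_len;	/* total length, meaningful in the head */
	uint16_t data_len;	/* length of this segment */
};

/*
 * Append the chain starting at 'tail' to the chain starting at 'head'.
 * Returns 0 on success, -EOVERFLOW if the segment counter would overflow.
 */
static int
seg_chain(struct seg *head, struct seg *tail)
{
	struct seg *cur;

	assert(head != NULL && tail != NULL);

	if ((unsigned int)head->nb_segs + tail->nb_segs > UINT8_MAX)
		return -EOVERFLOW;

	/* Linear walk to the current last segment. */
	for (cur = head; cur->next != NULL; cur = cur->next)
		;
	cur->next = tail;

	/* The head carries the totals for the whole packet. */
	head->nb_segs = (uint8_t)(head->nb_segs + tail->nb_segs);
	head->pkt_len += tail->pkt_len;
	return 0;
}
```

The walk to the tail is what makes repeated chaining quadratic for long
packets, which is the case the API comment warns about.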

[dpdk-dev] [PATCH v2] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-10-13 Thread Tom Ghyselinck

Hi Simon,

I'm looking forward to this patch since we also build the DPDK for
kernel versions which differ from the host kernel.

IMHO, the check for "Ubuntu" also needs to change, since this is also a
"host system" check instead of a "target system" check.

I.e., a user may want to build for Ubuntu on a non-Ubuntu system and
vice versa.

With best regards,
Tom.

-- 
Tom Ghyselinck 
Excentis N.V.

On di, 2015-10-13 at 14:50 +0200, Simon Kagstrom wrote:
> Ping?
> 
> // Simon
> 
> On Thu, 20 Aug 2015 08:51:06 +0200
> Simon Kagstrom  wrote:
> 
> > > /proc/version_signature is the version for the host machine, but in
> > > e.g., chroots, this does not necessarily match the one that DPDK is
> > > built for. DPDK will then build for the wrong kernel version - that of
> > > the server, and not that installed in the (build) chroot.
> > > 
> > > The patch uses utsrelease.h from the kernel sources instead and
> > > fakes the upload version.
> > > 
> > > Tested on a server with Ubuntu 12.04, building in a chroot for
> > > Ubuntu 14.04.
> > 
> > Signed-off-by: Simon Kagstrom 
> > Signed-off-by: Johan Faltstrom 
> > ---
> > ChangeLog:
> > 
> > v2: Improve description and motivation for the patch.
> > 
> >  lib/librte_eal/linuxapp/kni/Makefile | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/librte_eal/linuxapp/kni/Makefile
> > b/lib/librte_eal/linuxapp/kni/Makefile
> > index fb673d9..ac99d3f 100644
> > --- a/lib/librte_eal/linuxapp/kni/Makefile
> > +++ b/lib/librte_eal/linuxapp/kni/Makefile
> > @@ -44,10 +44,10 @@ MODULE_CFLAGS += -I$(RTE_OUTPUT)/include 
> > -I$(SRCDIR)/ethtool/ixgbe -I$(SRCDIR)/e
> >  MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h
> >  MODULE_CFLAGS += -Wall -Werror
> >  
> > -ifeq ($(shell test -f /proc/version_signature && lsb_release -si
> > 2>/dev/null),Ubuntu)
> > +ifeq ($(shell lsb_release -si 2>/dev/null),Ubuntu)
> >  MODULE_CFLAGS += -DUBUNTU_RELEASE_CODE=$(shell lsb_release -sr |
> > tr -d .)
> > -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2
> > /proc/version_signature | \
> > -cut -d'~' -f1 | cut -d- -f1,2 | tr .-
> > $(comma))
> > +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE
> > $(RTE_KERNELDIR)/include/generated/utsrelease.h \
> > +| cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1)
> >  MODULE_CFLAGS += 
> > -D"UBUNTU_KERNEL_CODE=UBUNTU_KERNEL_VERSION($(UBUNTU_KERNEL_CODE))"
> >  endif
> >  
>

[dpdk-dev] IXGBE RX packet loss with 5+ cores

2015-10-13 Thread Bruce Richardson

On Mon, Oct 12, 2015 at 10:18:30PM -0700, Stephen Hemminger wrote:
> On Tue, 13 Oct 2015 02:57:46 +
> "Sanford, Robert"  wrote:
> 
> > I'm hoping that someone (perhaps at Intel) can help us understand
> > an IXGBE RX packet loss issue we're able to reproduce with testpmd.
> > 
> > We run testpmd with various numbers of cores. We offer line-rate
> > traffic (~14.88 Mpps) to one ethernet port, and forward all received
> > packets via the second port.
> > 
> > When we configure 1, 2, 3, or 4 cores (per port, with same number RX
> > queues per port), there is no RX packet loss. When we configure 5 or
> > more cores, we observe the following packet loss (approximate):
> >  5 cores - 3% loss
> >  6 cores - 7% loss
> >  7 cores - 11% loss
> >  8 cores - 15% loss
> >  9 cores - 18% loss
> > 
> > All of the "lost" packets are accounted for in the device's Rx Missed
> > Packets Count register (RXMPC[0]). Quoting the datasheet:
> >  "Packets are missed when the receive FIFO has insufficient space to
> >  store the incoming packet. This might be caused due to insufficient
> >  buffers allocated, or because there is insufficient bandwidth on the
> >  IO bus."
> > 
> > RXMPC, and our use of API rx_descriptor_done to verify that we don't
> > run out of mbufs (discussed below), lead us to theorize that packet
> > loss occurs because the device is unable to DMA all packets from its
> > internal packet buffer (512 KB, reported by register RXPBSIZE[0])
> > before overrun.
> > 
> > Questions
> > =========
> > 1. The 82599 device supports up to 128 queues. Why do we see trouble
> > with as few as 5 queues? What could limit the system (and one port
> > controlled by 5+ cores) from receiving at line-rate without loss?
> > 
> > 2. As far as we can tell, the RX path only touches the device
> > registers when it updates a Receive Descriptor Tail register (RDT[n]),
> > roughly every rx_free_thresh packets. Is there a big difference
> > between one core doing this and N cores doing it 1/N as often?
> > 
> > 3. Do CPU reads/writes from/to device registers have a higher priority
> > than device reads/writes from/to memory? Could the former transactions
> > (CPU <-> device) significantly impede the latter (device <-> RAM)?
> > 
> > Thanks in advance for any help you can provide.
> 
> As you add cores, there is more traffic on the PCI bus from each core
> polling. There is a fixed number of PCI bus transactions per second possible.
> Each core is increasing the number of useless (empty) transactions.
> Why do you think adding more cores will help?
>
The polling for packets by the core should not be using PCI bandwidth directly,
as the ixgbe driver (and other drivers) checks for the DD bit being set on the
descriptor in memory/cache. However, using an increased number of queues can
use PCI bandwidth in other ways. For instance, with more queues you reduce the
amount of descriptor coalescing that can be done by the NIC, so that instead of
having a single transaction of 4 descriptors to one queue, the NIC may instead
have to do 4 transactions, each writing 1 descriptor to a different queue. This
is possibly why sending all traffic to a single queue works OK: the polling on
the other queues is still being done, but has little effect.

Regards,
/Bruce
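
To illustrate the point above that RX polling reads the DD bit from the
descriptor in host memory rather than from the device, here is a minimal
sketch in plain C. Everything in it is hypothetical (the struct layout,
the names demo_rx_desc, demo_rx_burst and DEMO_STAT_DD, the ring size);
it is not the ixgbe driver's code. The tail_writes counter merely stands
in for the occasional tail register update every free_thresh packets that
the original question mentions.

```c
#include <stdint.h>

#define DEMO_RING_SIZE 8u          /* hypothetical; power of two */
#define DEMO_STAT_DD   0x01u       /* hypothetical "descriptor done" flag */

/* Hypothetical write-back descriptor living in host memory. */
struct demo_rx_desc {
	volatile uint32_t status;  /* device would set DEMO_STAT_DD via DMA */
	uint16_t length;
};

struct demo_rx_queue {
	struct demo_rx_desc ring[DEMO_RING_SIZE];
	uint32_t next;             /* next descriptor to check */
	uint32_t tail_writes;      /* counts simulated tail register writes */
	uint32_t free_thresh;      /* update the tail every free_thresh packets */
	uint32_t held;             /* descriptors consumed since last update */
};

/*
 * Poll up to max descriptors. Only host memory is read here; the only
 * simulated device access is the occasional tail update.
 */
static uint32_t
demo_rx_burst(struct demo_rx_queue *q, uint32_t max)
{
	uint32_t n = 0;

	while (n < max) {
		struct demo_rx_desc *d = &q->ring[q->next];

		if ((d->status & DEMO_STAT_DD) == 0)
			break;             /* nothing more written back yet */

		d->status = 0;             /* hand the descriptor back */
		q->next = (q->next + 1) & (DEMO_RING_SIZE - 1);
		n++;

		if (++q->held >= q->free_thresh) {
			q->tail_writes++;  /* stands in for one MMIO write */
			q->held = 0;
		}
	}
	return n;
}
```

Polling an empty queue in this sketch costs a memory (or cache) read and
no simulated device access at all, which is the behaviour described above.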

[dpdk-dev] [PATCH v2 8/8] librte_table: modify release notes and deprecation notice

2015-10-13 Thread Jasvinder Singh

From: Fan Zhang 

The LIBABIVER number is incremented. The release notes are
updated and the deprecation notice is removed.

Signed-off-by: Fan Zhang 
---
 doc/guides/rel_notes/deprecation.rst | 3 ---
 doc/guides/rel_notes/release_2_2.rst | 4 +++-
 lib/librte_table/Makefile| 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index fa55117..06e0078 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -56,9 +56,6 @@ Deprecation Notices
 * librte_table: New functions for table entry bulk add/delete will be added
   to the table operations structure.

-* librte_table hash: Key mask parameter will be added to the hash table
-  parameter structure for 8-byte key and 16-byte key extendible bucket and
-  LRU tables.

 * librte_pipeline: The prototype for the pipeline input port, output port
   and table action handlers will be updated:
diff --git a/doc/guides/rel_notes/release_2_2.rst 
b/doc/guides/rel_notes/release_2_2.rst
index 5687676..30197ec 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -98,6 +98,8 @@ ABI Changes

 * The LPM structure is changed. The deprecated field mem_location is removed.

+* Key mask parameter is added to the hash table parameter structure for
+  8-byte key and 16-byte key extendible bucket and LRU tables.

 Shared Library Versions
 ---
@@ -130,6 +132,6 @@ The libraries prepended with a plus sign were incremented 
in this version.
  librte_reorder.so.1
  librte_ring.so.1
  librte_sched.so.1
- librte_table.so.1
+ librte_table.so.2
  librte_timer.so.1
  librte_vhost.so.1
diff --git a/lib/librte_table/Makefile b/lib/librte_table/Makefile
index c5b3eaf..7f02af3 100644
--- a/lib/librte_table/Makefile
+++ b/lib/librte_table/Makefile
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)

 EXPORT_MAP := rte_table_version.map

-LIBABIVER := 1
+LIBABIVER := 2

 #
 # all source are stored in SRCS-y
-- 
2.1.0

[dpdk-dev] [PATCH v2 7/8] example/ip_pipeline/pipeline: update flow_classification pipeline

2015-10-13 Thread Jasvinder Singh

From: Fan Zhang 

This patch updates the flow_classification pipeline for the key_mask
parameter added to the 8/16-byte key hash parameters. The update gives
the user an optional key_mask configuration item that is applied to the
packets.

Signed-off-by: Fan Zhang 
---
 .../pipeline/pipeline_flow_classification_be.c | 56 --
 1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c 
b/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c
index 06a648d..e22f96f 100644
--- a/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c
+++ b/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "pipeline_flow_classification_be.h"
 #include "hash_func.h"
@@ -49,6 +50,7 @@ struct pipeline_flow_classification {
uint32_t key_offset;
uint32_t key_size;
uint32_t hash_offset;
+   uint8_t *key_mask;
 } __rte_cache_aligned;

 static void *
@@ -125,8 +127,12 @@ pipeline_fc_parse_args(struct pipeline_flow_classification 
*p,
uint32_t key_offset_present = 0;
uint32_t key_size_present = 0;
uint32_t hash_offset_present = 0;
+   uint32_t key_mask_present = 0;

uint32_t i;
+   char *key_mask_str = NULL;
+
+   p->hash_offset = 0;

for (i = 0; i < params->n_args; i++) {
char *arg_name = params->args_name[i];
@@ -171,6 +177,20 @@ pipeline_fc_parse_args(struct pipeline_flow_classification 
*p,
continue;
}

+   /* key_mask */
+   if (strcmp(arg_name, "key_mask") == 0) {
+   if (key_mask_present)
+   return -1;
+
+   key_mask_str = strdup(arg_value);
+   if (key_mask_str == NULL)
+   return -1;
+
+   key_mask_present = 1;
+
+   continue;
+   }
+
/* hash_offset */
if (strcmp(arg_name, "hash_offset") == 0) {
if (hash_offset_present)
@@ -189,10 +209,23 @@ pipeline_fc_parse_args(struct 
pipeline_flow_classification *p,
/* Check that mandatory arguments are present */
if ((n_flows_present == 0) ||
(key_offset_present == 0) ||
-   (key_size_present == 0) ||
-   (hash_offset_present == 0))
+   (key_size_present == 0))
return -1;

+   if (key_mask_present) {
+   p->key_mask = rte_malloc(NULL, p->key_size, 0);
+   if (p->key_mask == NULL)
+   return -1;
+
+   if (parse_hex_string(key_mask_str, p->key_mask, &p->key_size)
+   != 0) {
+   rte_free(p->key_mask);
+   return -1;
+   }
+
+   free(key_mask_str);
+   }
+
return 0;
 }

@@ -297,6 +330,7 @@ static void *pipeline_fc_init(struct pipeline_params 
*params,
.signature_offset = p_fc->hash_offset,
.key_offset = p_fc->key_offset,
.f_hash = hash_func[(p_fc->key_size / 8) - 1],
+   .key_mask = p_fc->key_mask,
.seed = 0,
};

@@ -307,6 +341,7 @@ static void *pipeline_fc_init(struct pipeline_params 
*params,
.signature_offset = p_fc->hash_offset,
.key_offset = p_fc->key_offset,
.f_hash = hash_func[(p_fc->key_size / 8) - 1],
+   .key_mask = p_fc->key_mask,
.seed = 0,
};

@@ -336,12 +371,25 @@ static void *pipeline_fc_init(struct pipeline_params 
*params,

switch (p_fc->key_size) {
case 8:
-   table_params.ops = &rte_table_hash_key8_lru_ops;
+   if (p_fc->hash_offset != 0) {
+   table_params.ops =
+   &rte_table_hash_key8_ext_ops;
+   } else {
+   table_params.ops =
+   &rte_table_hash_key8_ext_dosig_ops;
+   }
table_params.arg_create = &table_hash_key8_params;
break;
+   break;

case 16:
-   table_params.ops = &rte_table_hash_key16_ext_ops;
+   if (p_fc->hash_offset != 0) {
+   table_params.ops =
+   &rte_table_hash_key16_ext_ops;
+   } else {
+   table_params.ops =
+   &rte_table_hash_key16_ext_dosig_ops;
+   }
table_p

[dpdk-dev] [PATCH v2 6/8] example/ip_pipeline: add parse_hex_string for internal use

2015-10-13 Thread Jasvinder Singh

From: Fan Zhang 

This patch adds the parse_hex_string function to parse a hex string into
a uint8_t array.

Signed-off-by: Fan Zhang 
---
 examples/ip_pipeline/config_parse.c | 70 +
 examples/ip_pipeline/pipeline.h |  4 +++
 2 files changed, 74 insertions(+)

diff --git a/examples/ip_pipeline/config_parse.c 
b/examples/ip_pipeline/config_parse.c
index c9b78f9..d7ee707 100644
--- a/examples/ip_pipeline/config_parse.c
+++ b/examples/ip_pipeline/config_parse.c
@@ -455,6 +455,76 @@ parse_pipeline_core(uint32_t *socket,
return 0;
 }

+static uint32_t
+get_hex_val(char c)
+{
+   switch (c) {
+   case '0':
+   case '1':
+   case '2':
+   case '3':
+   case '4':
+   case '5':
+   case '6':
+   case '7':
+   case '8':
+   case '9':
+   return c - '0';
+   case 'A':
+   case 'B':
+   case 'C':
+   case 'D':
+   case 'E':
+   case 'F':
+   return c - 'A' + 10;
+   case 'a':
+   case 'b':
+   case 'c':
+   case 'd':
+   case 'e':
+   case 'f':
+   return c - 'a' + 10;
+   default:
+   return 0;
+   }
+}
+
+int
+parse_hex_string(char *src, uint8_t *dst, uint32_t *size)
+{
+   char *c;
+   uint32_t len, i;
+
+   /* Check input parameters */
+   if ((src == NULL) ||
+   (dst == NULL) ||
+   (size == NULL) ||
+   (*size == 0))
+   return -1;
+
+   len = strlen(src);
+   if (((len & 3) != 0) ||
+   (len > (*size) * 2))
+   return -1;
+   *size = len / 2;
+
+   for (c = src; *c != 0; c++) {
+   if ((((*c) >= '0') && ((*c) <= '9')) ||
+   (((*c) >= 'A') && ((*c) <= 'F')) ||
+   (((*c) >= 'a') && ((*c) <= 'f')))
+   continue;
+
+   return -1;
+   }
+
+   /* Convert chars to bytes */
+   for (i = 0; i < *size; i++)
+   dst[i] = get_hex_val(src[2 * i]) * 16 +
+   get_hex_val(src[2 * i + 1]);
+
+   return 0;
+}
+
 static size_t
 skip_digits(const char *src)
 {
diff --git a/examples/ip_pipeline/pipeline.h b/examples/ip_pipeline/pipeline.h
index b9a56ea..4063594 100644
--- a/examples/ip_pipeline/pipeline.h
+++ b/examples/ip_pipeline/pipeline.h
@@ -84,4 +84,8 @@ pipeline_type_cmds_count(struct pipeline_type *ptype)
return n_cmds;
 }

+/* Parse hex string to uint8_t array */
+int
+parse_hex_string(char *src, uint8_t *dst, uint32_t *size);
+
 #endif
-- 
2.1.0
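
For readers who want to try the hex-string conversion outside the
ip_pipeline example, below is a standalone sketch of the same idea. It is
not the patch's code: the names demo_hex_val and demo_parse_hex are
hypothetical, it uses isxdigit() for validation, and it accepts any even
string length, whereas the patch above checks (len & 3). The size
parameter follows the patch's convention: capacity on entry, number of
bytes written on success.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static uint8_t
demo_hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return (uint8_t)(c - '0');
	return (uint8_t)(tolower((unsigned char)c) - 'a' + 10);
}

/*
 * Convert a hex string into bytes.
 * On entry *size is the capacity of dst; on success it is set to the
 * number of bytes written. Returns 0 on success, -1 on bad input.
 */
static int
demo_parse_hex(const char *src, uint8_t *dst, uint32_t *size)
{
	size_t len, i;

	if (src == NULL || dst == NULL || size == NULL || *size == 0)
		return -1;

	len = strlen(src);
	if (len == 0 || (len & 1) != 0 || len > (size_t)*size * 2)
		return -1;

	/* Reject anything that is not a hex digit before converting. */
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)src[i]))
			return -1;

	/* Two characters make one byte, high nibble first. */
	for (i = 0; i < len / 2; i++)
		dst[i] = (uint8_t)(demo_hex_val(src[2 * i]) * 16 +
				   demo_hex_val(src[2 * i + 1]));

	*size = (uint32_t)(len / 2);
	return 0;
}
```

With an 8-byte buffer, the string "00ffAB10" would yield the four bytes
0x00 0xff 0xab 0x10, while "0g" or a string longer than twice the buffer
capacity would be rejected.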

[dpdk-dev] [PATCH v2 5/8] app/test-pipeline: modify pipeline test

2015-10-13 Thread Jasvinder Singh

From: Fan Zhang 

Test-pipeline has been updated to work with the added key_mask parameter
for 8-byte key extendible bucket and LRU tables.

Signed-off-by: Fan Zhang 
---
 app/test-pipeline/pipeline_hash.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/app/test-pipeline/pipeline_hash.c 
b/app/test-pipeline/pipeline_hash.c
index 548615f..dda0d4d 100644
--- a/app/test-pipeline/pipeline_hash.c
+++ b/app/test-pipeline/pipeline_hash.c
@@ -216,6 +216,7 @@ app_main_loop_worker_pipeline_hash(void) {
.n_entries_ext = 1 << 23,
.signature_offset = 0,
.key_offset = 32,
+   .key_mask = NULL,
.f_hash = test_hash,
.seed = 0,
};
@@ -240,6 +241,7 @@ app_main_loop_worker_pipeline_hash(void) {
.n_entries = 1 << 24,
.signature_offset = 0,
.key_offset = 32,
+   .key_mask = NULL,
.f_hash = test_hash,
.seed = 0,
};
@@ -267,6 +269,7 @@ app_main_loop_worker_pipeline_hash(void) {
.key_offset = 32,
.f_hash = test_hash,
.seed = 0,
+   .key_mask = NULL,
};

struct rte_pipeline_table_params table_params = {
@@ -291,6 +294,7 @@ app_main_loop_worker_pipeline_hash(void) {
.key_offset = 32,
.f_hash = test_hash,
.seed = 0,
+   .key_mask = NULL
};

struct rte_pipeline_table_params table_params = {
-- 
2.1.0

[dpdk-dev] [PATCH v2 4/8] app/test: modify app/test_table_combined and app/test_table_tables

2015-10-13 Thread Jasvinder Singh

From: Fan Zhang 

Tests have been updated to work with the added key_mask parameter for
8-byte key extendible bucket and LRU tables.

Signed-off-by: Fan Zhang 
---
 app/test/test_table_combined.c | 4 
 app/test/test_table_tables.c   | 6 --
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/app/test/test_table_combined.c b/app/test/test_table_combined.c
index dd09da5..359ac45 100644
--- a/app/test/test_table_combined.c
+++ b/app/test/test_table_combined.c
@@ -419,6 +419,7 @@ test_table_hash8lru(void)
.seed = 0,
.signature_offset = 0,
.key_offset = 32,
+   .key_mask = NULL,
};

uint8_t key8lru[8];
@@ -477,6 +478,7 @@ test_table_hash16lru(void)
.seed = 0,
.signature_offset = 0,
.key_offset = 32,
+   .key_mask = NULL,
};

uint8_t key16lru[16];
@@ -594,6 +596,7 @@ test_table_hash8ext(void)
.seed = 0,
.signature_offset = 0,
.key_offset = 32,
+   .key_mask = NULL,
};

uint8_t key8ext[8];
@@ -660,6 +663,7 @@ test_table_hash16ext(void)
.seed = 0,
.signature_offset = 0,
.key_offset = 32,
+   .key_mask = NULL,
};

uint8_t key16ext[16];
diff --git a/app/test/test_table_tables.c b/app/test/test_table_tables.c
index 566964b..cc222f1 100644
--- a/app/test/test_table_tables.c
+++ b/app/test/test_table_tables.c
@@ -651,7 +651,8 @@ test_table_hash_lru_generic(struct rte_table_ops *ops)
.f_hash = pipeline_test_hash,
.seed = 0,
.signature_offset = 1,
-   .key_offset = 32
+   .key_offset = 32,
+   .key_mask = NULL,
};

hash_params.n_entries = 0;
@@ -766,7 +767,8 @@ test_table_hash_ext_generic(struct rte_table_ops *ops)
.f_hash = pipeline_test_hash,
.seed = 0,
.signature_offset = 1,
-   .key_offset = 32
+   .key_offset = 32,
+   .key_mask = NULL,
};

hash_params.n_entries = 0;
-- 
2.1.0

[dpdk-dev] [PATCH v2 3/8] librte_table: add 16 byte hash table operations with computed lookup

2015-10-13 Thread Jasvinder Singh

From: Fan Zhang 

This patch adds hash table operations for key signature computed on
lookup ("do-sig") for LRU hash tables and extendible bucket hash tables.

Signed-off-by: Fan Zhang 
---
 lib/librte_table/rte_table_hash.h   |   8 +
 lib/librte_table/rte_table_hash_key16.c | 358 +++-
 2 files changed, 363 insertions(+), 3 deletions(-)

diff --git a/lib/librte_table/rte_table_hash.h 
b/lib/librte_table/rte_table_hash.h
index e2c60e1..9d17516 100644
--- a/lib/librte_table/rte_table_hash.h
+++ b/lib/librte_table/rte_table_hash.h
@@ -271,6 +271,10 @@ struct rte_table_hash_key16_lru_params {
 /** LRU hash table operations for pre-computed key signature */
 extern struct rte_table_ops rte_table_hash_key16_lru_ops;

+/** LRU hash table operations for key signature computed on lookup
+("do-sig") */
+extern struct rte_table_ops rte_table_hash_key16_lru_dosig_ops;
+
 /** Extendible bucket hash table parameters */
 struct rte_table_hash_key16_ext_params {
/** Maximum number of entries (and keys) in the table */
@@ -301,6 +305,10 @@ struct rte_table_hash_key16_ext_params {
 /** Extendible bucket operations for pre-computed key signature */
 extern struct rte_table_ops rte_table_hash_key16_ext_ops;

+/** Extendible bucket hash table operations for key signature computed on
+lookup ("do-sig") */
+extern struct rte_table_ops rte_table_hash_key16_ext_dosig_ops;
+
 /**
  * 32-byte key hash tables
  *
diff --git a/lib/librte_table/rte_table_hash_key16.c 
b/lib/librte_table/rte_table_hash_key16.c
index ffd3249..427b534 100644
--- a/lib/librte_table/rte_table_hash_key16.c
+++ b/lib/librte_table/rte_table_hash_key16.c
@@ -620,6 +620,27 @@ rte_table_hash_entry_delete_key16_ext(
rte_prefetch0((void *)(((uintptr_t) bucket1) + RTE_CACHE_LINE_SIZE));\
 }

+#define lookup1_stage1_dosig(mbuf1, bucket1, f)\
+{  \
+   uint64_t *key;  \
+   uint64_t signature = 0; \
+   uint32_t bucket_index;  \
+   uint64_t hash_key_buffer[2];\
+   \
+   key = RTE_MBUF_METADATA_UINT64_PTR(mbuf1, f->key_offset);\
+   \
+   hash_key_buffer[0] = key[0] & f->key_mask[0];   \
+   hash_key_buffer[1] = key[1] & f->key_mask[1];   \
+   signature = f->f_hash(hash_key_buffer,  \
+   RTE_TABLE_HASH_KEY_SIZE, f->seed);  \
+   \
+   bucket_index = signature & (f->n_buckets - 1);  \
+   bucket1 = (struct rte_bucket_4_16 *)\
+   &f->memory[bucket_index * f->bucket_size];  \
+   rte_prefetch0(bucket1); \
+   rte_prefetch0((void *)(((uintptr_t) bucket1) + RTE_CACHE_LINE_SIZE));\
+}
+
 #define lookup1_stage2_lru(pkt2_index, mbuf2, bucket2, \
pkts_mask_out, entries, f)  \
 {  \
@@ -769,6 +790,36 @@ rte_table_hash_entry_delete_key16_ext(
rte_prefetch0((void *)(((uintptr_t) bucket11) + RTE_CACHE_LINE_SIZE));\
 }

+#define lookup2_stage1_dosig(mbuf10, mbuf11, bucket10, bucket11, f)\
+{  \
+   uint64_t *key10, *key11;\
+   uint64_t hash_offset_buffer[2]; \
+   uint64_t signature10, signature11;  \
+   uint32_t bucket10_index, bucket11_index;\
+   \
+   key10 = RTE_MBUF_METADATA_UINT64_PTR(mbuf10, f->key_offset);\
+   hash_offset_buffer[0] = key10[0] & f->key_mask[0];  \
+   hash_offset_buffer[1] = key10[1] & f->key_mask[1];  \
+   signature10 = f->f_hash(hash_offset_buffer, \
+   RTE_TABLE_HASH_KEY_SIZE, f->seed);\
+   bucket10_index = signature10 & (f->n_buckets - 1);  \
+   bucket10 = (struct rte_bucket_4_16 *)   \
+   &f->memory[bucket10_index * f->bucket_size];\
+   rte_prefetch0(bucket10);\
+   rte_prefetch0((void *)(((uintptr_t) bucket10) + RTE_CACHE_LINE_SIZE));\
+   \
+   key11 = RTE_MBUF_METADATA_UINT64_PTR(mbuf11, f->key_offset);\
+   hash_offset_buffer[0] = key11[0] & f->key_mask[0];  \
+   hash_offset_buffer[1] = key11[1] & f->key_mask[1];  \
+   signature11 = f->f_hash(hash_offset_buffer, \
+   RTE_TABLE_HASH_KEY_SIZE, f->seed);\
+   bucket11_index = signature11 & (f
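
The "do-sig" stage in the macros above can be read as three steps: AND
the key with the mask, hash the masked key, and pick a bucket with a
power-of-two mask on the signature. The sketch below restates those steps
as a plain function. The names (demo_table, demo_hash, demo_bucket_index)
and the toy XOR hash are assumptions made for illustration; only the
f_hash(key, size, seed) call shape and the "signature & (n_buckets - 1)"
indexing are taken from the patch.

```c
#include <stdint.h>

#define DEMO_KEY_SIZE 16

/* Hypothetical callback type, modeled on f_hash(key, size, seed). */
typedef uint64_t (*demo_hash_fn)(const void *key, uint32_t key_size,
				 uint64_t seed);

struct demo_table {
	uint32_t n_buckets;        /* must be a power of two */
	uint64_t key_mask[2];
	uint64_t seed;
	demo_hash_fn f_hash;
};

/* Toy hash for illustration only: XOR of both key words and the seed. */
static uint64_t
demo_hash(const void *key, uint32_t key_size, uint64_t seed)
{
	const uint64_t *k = key;

	(void)key_size;
	return k[0] ^ k[1] ^ seed;
}

/* Compute the bucket index for a 16-byte key at lookup time. */
static uint32_t
demo_bucket_index(const struct demo_table *t, const uint64_t key[2])
{
	uint64_t masked[2];
	uint64_t signature;

	/* Step 1: drop the bits the mask says to ignore. */
	masked[0] = key[0] & t->key_mask[0];
	masked[1] = key[1] & t->key_mask[1];

	/* Step 2: hash the masked key, not the raw one. */
	signature = t->f_hash(masked, DEMO_KEY_SIZE, t->seed);

	/* Step 3: cheap modulo, valid because n_buckets is 2^k. */
	return (uint32_t)(signature & (t->n_buckets - 1));
}
```

Because the hash is computed over the masked key, two packets that differ
only in masked-out bits land in the same bucket.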

[dpdk-dev] [PATCH v2 2/8] librte_table: add key_mask parameter to 16-byte key hash parameters

2015-10-13 Thread Jasvinder Singh

From: Fan Zhang 

This patch relates to the ABI change proposed for librte_table. The
key_mask parameter is added for 16-byte key extendible bucket and LRU
tables.

Signed-off-by: Fan Zhang 
---
 lib/librte_table/rte_table_hash.h   |  6 
 lib/librte_table/rte_table_hash_key16.c | 53 -
 2 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/lib/librte_table/rte_table_hash.h 
b/lib/librte_table/rte_table_hash.h
index ef65355..e2c60e1 100644
--- a/lib/librte_table/rte_table_hash.h
+++ b/lib/librte_table/rte_table_hash.h
@@ -263,6 +263,9 @@ struct rte_table_hash_key16_lru_params {

/** Byte offset within packet meta-data where the key is located */
uint32_t key_offset;
+
+   /** Bit-mask to be AND-ed to the key on lookup */
+   uint8_t *key_mask;
 };

 /** LRU hash table operations for pre-computed key signature */
@@ -290,6 +293,9 @@ struct rte_table_hash_key16_ext_params {

/** Byte offset within packet meta-data where the key is located */
uint32_t key_offset;
+
+   /** Bit-mask to be AND-ed to the key on lookup */
+   uint8_t *key_mask;
 };

 /** Extendible bucket operations for pre-computed key signature */
diff --git a/lib/librte_table/rte_table_hash_key16.c 
b/lib/librte_table/rte_table_hash_key16.c
index f6a3306..ffd3249 100644
--- a/lib/librte_table/rte_table_hash_key16.c
+++ b/lib/librte_table/rte_table_hash_key16.c
@@ -85,6 +85,7 @@ struct rte_table_hash {
uint32_t bucket_size;
uint32_t signature_offset;
uint32_t key_offset;
+   uint64_t key_mask[2];
rte_table_hash_op_hash f_hash;
uint64_t seed;

@@ -164,6 +165,14 @@ rte_table_hash_create_key16_lru(void *params,
f->f_hash = p->f_hash;
f->seed = p->seed;

+   if (p->key_mask != NULL) {
+   f->key_mask[0] = ((uint64_t *)p->key_mask)[0];
+   f->key_mask[1] = ((uint64_t *)p->key_mask)[1];
+   } else {
+   f->key_mask[0] = 0xFFFFFFFFFFFFFFFFLLU;
+   f->key_mask[1] = 0xFFFFFFFFFFFFFFFFLLU;
+   }
+
for (i = 0; i < n_buckets; i++) {
struct rte_bucket_4_16 *bucket;

@@ -384,6 +393,14 @@ rte_table_hash_create_key16_ext(void *params,
for (i = 0; i < n_buckets_ext; i++)
f->stack[i] = i;

+   if (p->key_mask != NULL) {
+   f->key_mask[0] = (((uint64_t *)p->key_mask)[0]);
+   f->key_mask[1] = (((uint64_t *)p->key_mask)[1]);
+   } else {
+   f->key_mask[0] = 0xFFFFFFFFFFFFFFFFLLU;
+   f->key_mask[1] = 0xFFFFFFFFFFFFFFFFLLU;
+   }
+
return f;
 }

@@ -609,11 +626,14 @@ rte_table_hash_entry_delete_key16_ext(
void *a;\
uint64_t pkt_mask;  \
uint64_t *key;  \
+   uint64_t hash_key_buffer[2];\
uint32_t pos;   \
\
key = RTE_MBUF_METADATA_UINT64_PTR(mbuf2, f->key_offset);\
+   hash_key_buffer[0] = key[0] & f->key_mask[0];   \
+   hash_key_buffer[1] = key[1] & f->key_mask[1];   \
\
-   lookup_key16_cmp(key, bucket2, pos);\
+   lookup_key16_cmp(hash_key_buffer, bucket2, pos);\
\
pkt_mask = (bucket2->signature[pos] & 1LLU) << pkt2_index;\
pkts_mask_out |= pkt_mask;  \
@@ -631,11 +651,14 @@ rte_table_hash_entry_delete_key16_ext(
void *a;\
uint64_t pkt_mask, bucket_mask; \
uint64_t *key;  \
+   uint64_t hash_key_buffer[2];\
uint32_t pos;   \
\
key = RTE_MBUF_METADATA_UINT64_PTR(mbuf2, f->key_offset);\
+   hash_key_buffer[0] = key[0] & f->key_mask[0];   \
+   hash_key_buffer[1] = key[1] & f->key_mask[1];   \
\
-   lookup_key16_cmp(key, bucket2, pos);\
+   lookup_key16_cmp(hash_key_buffer, bucket2, pos);\
\
pkt_mask = (bucket2->signature[pos] & 1LLU) << pkt2_index;\
pkts_mask_out |= pkt_mask;  \
@@ -658,12 +681,15 @@ rte_table_hash_entry_delete_key16_ext(
void *a;\
uint64_t pkt_mask, bucket_mask; \
uint64_t *key;  \
+   uint64_t hash_key_buffer[2];

[dpdk-dev] [PATCH v2 1/8] librte_table: add key_mask parameter to 8-byte key hash parameters

2015-10-13 Thread Jasvinder Singh

From: Fan Zhang 

This patch relates to the ABI change proposed for librte_table. The
key_mask parameter is added for 8-byte key extendible bucket and LRU
tables.

Signed-off-by: Fan Zhang 
---
 lib/librte_table/rte_table_hash.h  |  6 
 lib/librte_table/rte_table_hash_key8.c | 54 +++---
 2 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/lib/librte_table/rte_table_hash.h 
b/lib/librte_table/rte_table_hash.h
index 9181942..ef65355 100644
--- a/lib/librte_table/rte_table_hash.h
+++ b/lib/librte_table/rte_table_hash.h
@@ -196,6 +196,9 @@ struct rte_table_hash_key8_lru_params {

/** Byte offset within packet meta-data where the key is located */
uint32_t key_offset;
+
+   /** Bit-mask to be AND-ed to the key on lookup */
+   uint8_t *key_mask;
 };

 /** LRU hash table operations for pre-computed key signature */
@@ -226,6 +229,9 @@ struct rte_table_hash_key8_ext_params {

/** Byte offset within packet meta-data where the key is located */
uint32_t key_offset;
+
+   /** Bit-mask to be AND-ed to the key on lookup */
+   uint8_t *key_mask;
 };

 /** Extendible bucket hash table operations for pre-computed key signature */
diff --git a/lib/librte_table/rte_table_hash_key8.c 
b/lib/librte_table/rte_table_hash_key8.c
index b351a49..ccb20cf 100644
--- a/lib/librte_table/rte_table_hash_key8.c
+++ b/lib/librte_table/rte_table_hash_key8.c
@@ -82,6 +82,7 @@ struct rte_table_hash {
uint32_t bucket_size;
uint32_t signature_offset;
uint32_t key_offset;
+   uint64_t key_mask;
rte_table_hash_op_hash f_hash;
uint64_t seed;

@@ -160,6 +161,11 @@ rte_table_hash_create_key8_lru(void *params, int 
socket_id, uint32_t entry_size)
f->f_hash = p->f_hash;
f->seed = p->seed;

+   if (p->key_mask != NULL)
+   f->key_mask = ((uint64_t *)p->key_mask)[0];
+   else
+   f->key_mask = 0xFFFFFFFFFFFFFFFFLLU;
+
for (i = 0; i < n_buckets; i++) {
struct rte_bucket_4_8 *bucket;

@@ -372,6 +378,11 @@ rte_table_hash_create_key8_ext(void *params, int 
socket_id, uint32_t entry_size)
f->stack = (uint32_t *)
&f->memory[(n_buckets + n_buckets_ext) * f->bucket_size];

+   if (p->key_mask != NULL)
+   f->key_mask = ((uint64_t *)p->key_mask)[0];
+   else
+   f->key_mask = 0xFFFFFFFFFFFFFFFFLLU;
+
for (i = 0; i < n_buckets_ext; i++)
f->stack[i] = i;

@@ -586,9 +597,12 @@ rte_table_hash_entry_delete_key8_ext(
uint64_t *key;  \
uint64_t signature; \
uint32_t bucket_index;  \
+   uint64_t hash_key_buffer;   \
\
key = RTE_MBUF_METADATA_UINT64_PTR(mbuf1, f->key_offset);\
-   signature = f->f_hash(key, RTE_TABLE_HASH_KEY_SIZE, f->seed);\
+   hash_key_buffer = *key & f->key_mask;   \
+   signature = f->f_hash(&hash_key_buffer, \
+   RTE_TABLE_HASH_KEY_SIZE, f->seed);  \
bucket_index = signature & (f->n_buckets - 1);  \
bucket1 = (struct rte_bucket_4_8 *) \
&f->memory[bucket_index * f->bucket_size];  \
@@ -602,10 +616,12 @@ rte_table_hash_entry_delete_key8_ext(
uint64_t pkt_mask;  \
uint64_t *key;  \
uint32_t pos;   \
+   uint64_t hash_key_buffer;   \
\
key = RTE_MBUF_METADATA_UINT64_PTR(mbuf2, f->key_offset);\
+   hash_key_buffer = key[0] & f->key_mask; \
\
-   lookup_key8_cmp(key, bucket2, pos); \
+   lookup_key8_cmp((&hash_key_buffer), bucket2, pos);  \
\
pkt_mask = ((bucket2->signature >> pos) & 1LLU) << pkt2_index;\
pkts_mask_out |= pkt_mask;  \
@@ -624,10 +640,12 @@ rte_table_hash_entry_delete_key8_ext(
uint64_t pkt_mask, bucket_mask; \
uint64_t *key;  \
uint32_t pos;   \
+   uint64_t hash_key_buffer;   \
\
key = RTE_MBUF_METADATA_UINT64_PTR(mbuf2, f->key_offset);\
+   hash_key_buffer = *key & f->key_mask;   \
\
-   lookup_key8_cmp(key, bucket2,

[dpdk-dev] [PATCH v2 0/8] librte_table: add key_mask parameter to 8-byte key

2015-10-13 Thread Jasvinder Singh

From: Fan Zhang 


This patchset relates to the ABI change announced for librte_table.
A key_mask parameter has been added to the hash table parameter
structure for 8-byte key and 16-byte key extendible bucket
and LRU tables.

v2:
*change in release note.

Acked-by: Cristian Dumitrescu 

Fan Zhang (8):
  librte_table: add key_mask parameter to 8-byte key hash parameters
  librte_table: add key_mask parameter to 16-byte key hash parameters
  librte_table: add 16 byte hash table operations with computed lookup
  app/test: modify app/test_table_combined and app/test_table_tables
  app/test-pipeline: modify pipeline test
  example/ip_pipeline: add parse_hex_string for internal use
  example/ip_pipeline/pipeline: update flow_classification pipeline
  librte_table: modify release notes and deprecation notice

 app/test-pipeline/pipeline_hash.c  |   4 +
 app/test/test_table_combined.c |   4 +
 app/test/test_table_tables.c   |   6 +-
 doc/guides/rel_notes/deprecation.rst   |   3 -
 doc/guides/rel_notes/release_2_2.rst   |   4 +-
 examples/ip_pipeline/config_parse.c|  70 
 examples/ip_pipeline/pipeline.h|   4 +
 .../pipeline/pipeline_flow_classification_be.c |  56 ++-
 lib/librte_table/Makefile  |   2 +-
 lib/librte_table/rte_table_hash.h  |  20 +
 lib/librte_table/rte_table_hash_key16.c| 411 -
 lib/librte_table/rte_table_hash_key8.c |  54 ++-
 12 files changed, 607 insertions(+), 31 deletions(-)

-- 
2.1.0
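
As a rough illustration of what the key_mask parameter means for a
16-byte key, here is a small standalone sketch. The helper names
(demo_load_key_mask16, demo_masked_key16_equal) are hypothetical and not
part of librte_table. Two things are assumptions of this sketch rather
than facts about the patches: the mask bytes are copied with memcpy()
instead of a pointer cast, so an unaligned byte array is handled safely,
and the comparison masks both operands, whereas a table would normally
store keys that are already masked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Load a 16-byte mask into two 64-bit words.
 * A NULL mask means "no masking", i.e. all bits set, so that
 * AND-ing a key with the mask leaves the key unchanged.
 */
static void
demo_load_key_mask16(uint64_t out[2], const uint8_t *key_mask)
{
	if (key_mask != NULL) {
		memcpy(out, key_mask, 2 * sizeof(uint64_t));
	} else {
		out[0] = UINT64_MAX;
		out[1] = UINT64_MAX;
	}
}

/* Compare two 16-byte keys under the mask; returns 1 when they match. */
static int
demo_masked_key16_equal(const uint64_t mask[2],
			const uint64_t a[2], const uint64_t b[2])
{
	/* XOR exposes differing bits; the mask hides the ignored ones. */
	return (((a[0] ^ b[0]) & mask[0]) |
		((a[1] ^ b[1]) & mask[1])) == 0;
}
```

With an all-ones mask the comparison is an exact 16-byte match; clearing
mask bits turns the corresponding key bits into "don't care" bits.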

[dpdk-dev] [PATCH] mk: Quote $(KERNELCC) to allow ccache builds

2015-10-13 Thread Simon Kågström

On 2015-10-13 14:45, Thomas Monjalon wrote:
> 2015-10-13 14:39, Simon Kågström:
>> Two of the patches (this one included) I have outstanding are build fixes
>> for use in our build environment, so it would be nice to get them upstreamed.
> 
> Waiting for integration of your patches, maybe you have some free time to
> help other developers by making reviews ;)

Waiting for integration is not my only work-task :-)

Anyway, my knowledge of DPDK is too superficial to chime in with
relevant comments in most cases, but I'll comment if I feel I can
contribute.

// Simon

[dpdk-dev] [PATCH v2] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-10-13 Thread Simon Kagstrom

Ping?

// Simon

On Thu, 20 Aug 2015 08:51:06 +0200
Simon Kagstrom  wrote:

> /proc/version_signature is the version for the host machine, but in
> e.g. chroots, this does not necessarily match the kernel that DPDK is
> built for. DPDK will then build for the wrong kernel version: that of
> the server, and not that installed in the (build) chroot.
> 
> The patch uses utsrelease.h from the kernel sources instead and fakes
> the upload version.
> 
> Tested on a server with Ubuntu 12.04, building in a chroot for Ubuntu
> 14.04.
> 
> Signed-off-by: Simon Kagstrom 
> Signed-off-by: Johan Faltstrom 
> ---
> ChangeLog:
> 
> v2: Improve description and motivation for the patch.
> 
>  lib/librte_eal/linuxapp/kni/Makefile | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/kni/Makefile 
> b/lib/librte_eal/linuxapp/kni/Makefile
> index fb673d9..ac99d3f 100644
> --- a/lib/librte_eal/linuxapp/kni/Makefile
> +++ b/lib/librte_eal/linuxapp/kni/Makefile
> @@ -44,10 +44,10 @@ MODULE_CFLAGS += -I$(RTE_OUTPUT)/include 
> -I$(SRCDIR)/ethtool/ixgbe -I$(SRCDIR)/e
>  MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h
>  MODULE_CFLAGS += -Wall -Werror
>  
> -ifeq ($(shell test -f /proc/version_signature && lsb_release -si 
> 2>/dev/null),Ubuntu)
> +ifeq ($(shell lsb_release -si 2>/dev/null),Ubuntu)
>  MODULE_CFLAGS += -DUBUNTU_RELEASE_CODE=$(shell lsb_release -sr | tr -d .)
> -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2 /proc/version_signature | \
> -cut -d'~' -f1 | cut -d- -f1,2 | tr .- $(comma))
> +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE 
> $(RTE_KERNELDIR)/include/generated/utsrelease.h \
> +  | cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1)
>  MODULE_CFLAGS += 
> -D"UBUNTU_KERNEL_CODE=UBUNTU_KERNEL_VERSION($(UBUNTU_KERNEL_CODE))"
>  endif
>
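
To make the effect of the grep/cut/tr pipeline in the patch above easier
to follow, the sketch below performs the same string transformation in C.
The function name demo_ubuntu_kernel_code is hypothetical, and the sample
release string "3.13.0-24-generic" mentioned afterwards is only an
assumed example of what UTS_RELEASE might contain; the real value comes
from the kernel's utsrelease.h.

```c
#include <stddef.h>
#include <string.h>

/*
 * Turn a release string into the comma-separated form used for
 * UBUNTU_KERNEL_CODE: keep everything before the second '-',
 * replace '.' and '-' with ',', then append ",1" as the fixed
 * (faked) upload number.
 * Returns 0 on success, -1 if out is too small.
 */
static int
demo_ubuntu_kernel_code(const char *release, char *out, size_t out_len)
{
	size_t i, n = 0;
	int dashes = 0;

	for (i = 0; release[i] != '\0'; i++) {
		char c = release[i];

		if (c == '-' && ++dashes == 2)
			break;             /* same cut as "cut -d- -f1,2" */
		if (c == '.' || c == '-')
			c = ',';           /* same mapping as "tr .- ," */
		if (n + 1 >= out_len)
			return -1;
		out[n++] = c;
	}

	if (n + 3 > out_len)
		return -1;
	memcpy(out + n, ",1", 3);          /* copies the terminating NUL too */
	return 0;
}
```

For the assumed input "3.13.0-24-generic" this gives "3,13,0,24,1",
which is the argument list the Makefile passes to UBUNTU_KERNEL_VERSION().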

[dpdk-dev] [PATCH v3] mbuf/ip_frag: Move mbuf chaining to common code

2015-10-13 Thread Simon Kagstrom

Ping?

// Simon

On Mon, 7 Sep 2015 14:50:09 +0200
Simon Kagstrom  wrote:

> Chaining/segmenting mbufs can be useful in many places, so make it
> global.
> 
> Signed-off-by: Simon Kagstrom 
> Signed-off-by: Johan Faltstrom 
> ---
> ChangeLog:
> v2:
>   * Check for nb_segs byte overflow (Olivier MATZ)
>   * Don't reset nb_segs in tail (Olivier MATZ)
> v3:
>   * Describe performance implications of linear search
>   * Correct check-for-out-of-bounds (Konstantin Ananyev)
> 
>  lib/librte_ip_frag/ip_frag_common.h  | 23 -
>  lib/librte_ip_frag/rte_ipv4_reassembly.c |  7 +--
>  lib/librte_ip_frag/rte_ipv6_reassembly.c |  7 +--
>  lib/librte_mbuf/rte_mbuf.h   | 34 
> 
>  4 files changed, 44 insertions(+), 27 deletions(-)
> 
> diff --git a/lib/librte_ip_frag/ip_frag_common.h 
> b/lib/librte_ip_frag/ip_frag_common.h
> index 6b2acee..cde6ed4 100644
> --- a/lib/librte_ip_frag/ip_frag_common.h
> +++ b/lib/librte_ip_frag/ip_frag_common.h
> @@ -166,27 +166,4 @@ ip_frag_reset(struct ip_frag_pkt *fp, uint64_t tms)
>   fp->frags[IP_FIRST_FRAG_IDX] = zero_frag;
>  }
>  
> -/* chain two mbufs */
> -static inline void
> -ip_frag_chain(struct rte_mbuf *mn, struct rte_mbuf *mp)
> -{
> - struct rte_mbuf *ms;
> -
> - /* adjust start of the last fragment data. */
> - rte_pktmbuf_adj(mp, (uint16_t)(mp->l2_len + mp->l3_len));
> -
> - /* chain two fragments. */
> - ms = rte_pktmbuf_lastseg(mn);
> - ms->next = mp;
> -
> - /* accumulate number of segments and total length. */
> - mn->nb_segs = (uint8_t)(mn->nb_segs + mp->nb_segs);
> - mn->pkt_len += mp->pkt_len;
> -
> - /* reset pkt_len and nb_segs for chained fragment. */
> - mp->pkt_len = mp->data_len;
> - mp->nb_segs = 1;
> -}
> -
> -
>  #endif /* _IP_FRAG_COMMON_H_ */
> diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c 
> b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> index 5d24843..26d07f9 100644
> --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> @@ -63,7 +63,9 @@ ipv4_frag_reassemble(const struct ip_frag_pkt *fp)
>   /* previous fragment found. */
>   if(fp->frags[i].ofs + fp->frags[i].len == ofs) {
>  
> - ip_frag_chain(fp->frags[i].mb, m);
> + /* adjust start of the last fragment data. */
> + rte_pktmbuf_adj(m, (uint16_t)(m->l2_len + m->l3_len));
> + rte_pktmbuf_chain(fp->frags[i].mb, m);
>  
>   /* update our last fragment and offset. */
>   m = fp->frags[i].mb;
> @@ -78,7 +80,8 @@ ipv4_frag_reassemble(const struct ip_frag_pkt *fp)
>   }
>  
>   /* chain with the first fragment. */
> - ip_frag_chain(fp->frags[IP_FIRST_FRAG_IDX].mb, m);
> + rte_pktmbuf_adj(m, (uint16_t)(m->l2_len + m->l3_len));
> + rte_pktmbuf_chain(fp->frags[IP_FIRST_FRAG_IDX].mb, m);
>   m = fp->frags[IP_FIRST_FRAG_IDX].mb;
>  
>   /* update mbuf fields for reassembled packet. */
> diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c 
> b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> index 1f1c172..5969b4a 100644
> --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c
> +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> @@ -86,7 +86,9 @@ ipv6_frag_reassemble(const struct ip_frag_pkt *fp)
>   /* previous fragment found. */
>   if (fp->frags[i].ofs + fp->frags[i].len == ofs) {
>  
> - ip_frag_chain(fp->frags[i].mb, m);
> + /* adjust start of the last fragment data. */
> + rte_pktmbuf_adj(m, (uint16_t)(m->l2_len + m->l3_len));
> + rte_pktmbuf_chain(fp->frags[i].mb, m);
>  
>   /* update our last fragment and offset. */
>   m = fp->frags[i].mb;
> @@ -101,7 +103,8 @@ ipv6_frag_reassemble(const struct ip_frag_pkt *fp)
>   }
>  
>   /* chain with the first fragment. */
> - ip_frag_chain(fp->frags[IP_FIRST_FRAG_IDX].mb, m);
> + rte_pktmbuf_adj(m, (uint16_t)(m->l2_len + m->l3_len));
> + rte_pktmbuf_chain(fp->frags[IP_FIRST_FRAG_IDX].mb, m);
>   m = fp->frags[IP_FIRST_FRAG_IDX].mb;
>  
>   /* update mbuf fields for reassembled packet. */
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index d7c9030..f1f1400 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1775,6 +1775,40 @@ static inline int rte_pktmbuf_is_contiguous(const 
> struct rte_mbuf *m)
>  }
>  
>  /**
> + * Chain an mbuf to another, thereby creating a segmented packet.
> + *
> + * Note: The implementation will do a linear walk over the segments to find
> + * the tail entry. For cases when there are many segments, it's better to
> + * chain the entries manually.
> + *
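
The chaining described in the quoted patch can be sketched outside DPDK with a toy segment type. This is only an illustration under stated assumptions: `struct seg`, `seg_last()` and `seg_chain()` are hypothetical stand-ins for `struct rte_mbuf`, `rte_pktmbuf_lastseg()` and the proposed `rte_pktmbuf_chain()`, and the `-EOVERFLOW` return value is an assumption about how the nb_segs overflow check from the v2 changelog might be reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rte_mbuf. */
struct seg {
    struct seg *next;   /* next segment in the chain, or NULL */
    uint32_t pkt_len;   /* total packet length, meaningful in the head */
    uint16_t data_len;  /* bytes held by this segment */
    uint8_t nb_segs;    /* number of segments, meaningful in the head */
};

/* Linear walk to the last segment; cost grows with the chain length. */
static struct seg *seg_last(struct seg *s)
{
    while (s->next != NULL)
        s = s->next;
    return s;
}

/* Append the chain starting at tail to the chain starting at head. */
static int seg_chain(struct seg *head, struct seg *tail)
{
    assert(head != NULL && tail != NULL);

    /* nb_segs is a single byte, so refuse chains that would overflow it. */
    if ((unsigned int)head->nb_segs + tail->nb_segs > UINT8_MAX)
        return -EOVERFLOW;

    seg_last(head)->next = tail;

    /* Accumulate segment count and total length in the head. */
    head->nb_segs = (uint8_t)(head->nb_segs + tail->nb_segs);
    head->pkt_len += tail->pkt_len;

    /* pkt_len is only meaningful in the head; nb_segs of tail is kept. */
    tail->pkt_len = tail->data_len;
    return 0;
}
```

Because `seg_last()` walks the whole chain on every call, building a long packet one segment at a time is quadratic in the number of segments, which is the performance caveat the doc comment in the patch points out.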

[dpdk-dev] [PATCH v2] devargs: add blacklisting by linux interface name

2015-10-13 Thread Olivier MATZ

Hi Chas,

On 10/05/2015 05:26 PM, Chas Williams wrote:
> If a system is using deterministic interface names, it may be easier in
> some cases to use the interface name to blacklist an interface.
> 
> Signed-off-by: Chas Williams <3chas3 at gmail.com>
> ---
>  app/test/test_devargs.c |  2 ++
>  lib/librte_eal/common/eal_common_devargs.c  |  9 +++--
>  lib/librte_eal/common/eal_common_options.c  |  2 +-
>  lib/librte_eal/common/eal_common_pci.c  | 10 --
>  lib/librte_eal/common/include/rte_devargs.h |  2 ++
>  lib/librte_eal/common/include/rte_pci.h |  1 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c   | 15 +++
>  7 files changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/app/test/test_devargs.c b/app/test/test_devargs.c
> index f7fc59c..27855ff 100644

> 
> [...]
> 

> @@ -352,6 +354,19 @@ pci_scan_one(const char *dirname, uint16_t domain, 
> uint8_t bus,
>   return -1;
>   }
>  
> + /* get network interface name */
> + snprintf(filename, sizeof(filename), "%s/net", dirname);
> + dir = opendir(filename);
> + if (dir) {
> + while ((e = readdir(dir)) != NULL) {
> + if (e->d_name[0] == '.')
> + continue;
> +
> + strncpy(dev->name, e->d_name, sizeof(dev->name));
> + }
> + closedir(dir);
> + }
> +
>   if (!ret) {
>   if (!strcmp(driver, "vfio-pci"))
>   dev->kdrv = RTE_KDRV_VFIO;
> 

For PCI devices that have several interfaces (I think it's the case for
some Mellanox boards), maybe we should not store the interface name?

Another small comment about the strncpy(): it's maybe safer to ensure
that dev->name is properly nul-terminated.
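
A minimal sketch of the nul-termination point, assuming a fixed-size destination buffer like dev->name. The helper name `copy_ifname()` is hypothetical and not part of the patch; it only shows the usual pattern of copying at most size - 1 bytes and writing the terminator explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst (capacity dst_len bytes) and
 * guarantee that dst ends with a nul byte even when src is too long.
 */
static void copy_ifname(char *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL && src != NULL && dst_len > 0);

    /* strncpy() does not terminate dst when src has >= n characters. */
    strncpy(dst, src, dst_len - 1);
    dst[dst_len - 1] = '\0';
}
```

With an 8-byte buffer, a longer interface name such as "enp130s0f0" is truncated to 7 characters plus the terminator instead of leaving the buffer unterminated.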

Regards,
Olivier

[dpdk-dev] IXGBE RX packet loss with 5+ cores

2015-10-13 Thread Sanford, Robert


>>> [Robert:]
>>> 1. The 82599 device supports up to 128 queues. Why do we see trouble
>>> with as few as 5 queues? What could limit the system (and one port
>>> controlled by 5+ cores) from receiving at line-rate without loss?
>>>
>>> 2. As far as we can tell, the RX path only touches the device
>>> registers when it updates a Receive Descriptor Tail register (RDT[n]),
>>> roughly every rx_free_thresh packets. Is there a big difference
>>> between one core doing this and N cores doing it 1/N as often?

>>[Stephen:]
>>As you add cores, there is more traffic on the PCI bus from each core
>>polling. There is a fixed number of PCI bus transactions per second
>>possible.
>>Each core is increasing the number of useless (empty) transactions.

>[Bruce:]
>The polling for packets by the core should not be using PCI bandwidth
>directly, as the ixgbe driver (and other drivers) check for the DD bit
>being set on the descriptor in memory/cache.

I was preparing to reply with the same point.
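
To make that point concrete, here is a toy polling loop that only reads descriptors from host memory. Everything in it is an assumption for illustration: `struct rx_desc`, `struct rx_ring`, `DESC_STAT_DD` and `rx_poll()` are hypothetical simplifications, not the ixgbe driver's real structures or code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_STAT_DD 0x01u  /* hypothetical "descriptor done" flag */

/* Hypothetical descriptor as written back by the NIC into host memory. */
struct rx_desc {
    volatile uint32_t status;
    uint16_t length;
};

struct rx_ring {
    struct rx_desc *descs;
    uint16_t size;
    uint16_t next;  /* index of the next descriptor to check */
};

/*
 * Harvest up to max_pkts completed descriptors. The loop only reads and
 * writes host memory; an empty poll costs a cache/memory read of one
 * descriptor and no device register access.
 */
static uint16_t rx_poll(struct rx_ring *r, uint16_t max_pkts,
                        uint16_t *lengths)
{
    uint16_t n = 0;

    assert(r != NULL && r->size > 0 && lengths != NULL);

    while (n < max_pkts) {
        struct rx_desc *d = &r->descs[r->next];

        if (!(d->status & DESC_STAT_DD))
            break;  /* nothing more has been written back yet */

        lengths[n++] = d->length;
        d->status = 0;  /* hand the slot back for reuse */
        r->next = (uint16_t)((r->next + 1) % r->size);
    }
    return n;
}
```

In this model the only device access left on the RX path is the occasional tail register update mentioned in question 2, which the sketch leaves out.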

>>[Stephen:] Why do you think adding more cores will help?

We're using run-to-completion and sometimes spend too many cycles per packet.
We realize that we need to move to an io+workers model, but wanted a better
understanding of the dynamics involved here.



>[Bruce:] However, using an increased number of queues can use PCI
>bandwidth in other ways, for instance, with more queues you reduce the
>amount of descriptor coalescing that can be done by the NICs, so that
>instead of having a single transaction of 4 descriptors to one queue, the
>NIC may instead have to do 4 transactions each writing 1 descriptor to 4
>different queues. This is possibly why sending all traffic to a single
>queue works ok - the polling on the other queues is still being done, but
>has little effect.

Brilliant! This idea did not occur to me.



--
Thanks guys,
Robert

[dpdk-dev] [PATCH] mk: Quote $(KERNELCC) to allow ccache builds

2015-10-13 Thread Thomas Monjalon

2015-10-13 14:39, Simon Kågström:
> On 2015-10-13 14:26, Olivier MATZ wrote:
> > There is the patchwork tool:
> > http://dpdk.org/dev/patchwork/project/dpdk/list/
> 
> Thanks! I knew about it, but forgot to look there. It would be nice to
> have tags such as for-v2.2 or candidate-v2.2, like you can have on GitHub,
> to see more easily where patches are going, but perhaps patchwork doesn't
> work that way.

No, it's not needed currently because every patch on this mailing list
targets integration into the main branch for the next release.
Exceptions must be notified.

> Is the process to ping old patches like this on the mailing list?

Yes, it is the responsibility of the developer and the maintainer to get
reviews. Please check in the MAINTAINERS file who to contact.

> Two of the patches (this one included) I have outstanding are build fixes
> for use in our build environment, so it would be nice to get them upstreamed.

While waiting for your patches to be integrated, maybe you have some free
time to help other developers by doing reviews ;)

Thanks

[dpdk-dev] [PATCH] mk: Quote $(KERNELCC) to allow ccache builds

2015-10-13 Thread Simon Kågström

On 2015-10-13 14:26, Olivier MATZ wrote:
> Sorry for not having answered before.

Thanks for looking at it now though!

>> This is one of three outstanding DPDK patches of mine that haven't seen
>> any activity in a while. Is there a list of pending patches somewhere
>> to monitor activity?
> 
> There is the patchwork tool:
> http://dpdk.org/dev/patchwork/project/dpdk/list/

Thanks! I knew about it, but forgot to look there. It would be nice to
have tags such as for-v2.2 or candidate-v2.2, like you can have on GitHub,
to see more easily where patches are going, but perhaps patchwork doesn't
work that way.

Is the process to ping old patches like this on the mailing list? Two of
the patches (this one included) I have outstanding are build fixes for
use in our build environment, so it would be nice to get them upstreamed.

// Simon

[dpdk-dev] [PATCH] mk: Quote $(KERNELCC) to allow ccache builds

2015-10-13 Thread Olivier MATZ

Hi Simon,

Sorry for not having answered before.

On 10/13/2015 02:10 PM, Simon Kagstrom wrote:
> Ping?
> 
> This is one of three outstanding DPDK patches of mine that haven't seen
> any activity in a while. Is there a list of pending patches somewhere
> to monitor activity?

There is the patchwork tool:
http://dpdk.org/dev/patchwork/project/dpdk/list/

>> Otherwise building with KERNELCC="ccache gcc" will fail:
>>
>>  == Build lib/librte_eal/linuxapp/igb_uio
>>  /usr/src/linux-headers-3.13.0-63-generic/arch/x86/Makefile:98: stack protector enabled but no compiler support
>>  /usr/src/linux-headers-3.13.0-63-generic/arch/x86/Makefile:113: CONFIG_X86_X32 enabled but no binutils support
>>  ccache: invalid option -- 'p'
>>  Usage:
>>  ccache [options]
>>  ccache compiler [compiler options]
>>  compiler [compiler options]  (via symbolic link)
>>
>>  Options:
>>  -c, --cleanup delete old files and recalculate size counters
>>(normally not needed as this is done automatically)
>>  -C, --clear   clear the cache completely
>>  -F, --max-files=N set maximum number of files in cache to N (use 0 for
>>no limit)
>>  -M, --max-size=SIZE   set maximum size of cache to SIZE (use 0 for no
>>limit; available suffixes: G, M and K; default
>>suffix: G)
>>  -s, --show-stats  show statistics summary
>>  -z, --zero-stats  zero statistics counters
>>
>>  -h, --help    print this help text
>>  -V, --version print version and copyright information
>>
>> Signed-off-by: Simon Kagstrom 

Acked-by: Olivier Matz

[dpdk-dev] [PATCH] eal: fix C++ build (struct member: virtual)

2015-10-13 Thread David Marchand

On Tue, Oct 13, 2015 at 11:13 AM, David Marchand 
wrote:

> Hello Christoph,
>
> On Tue, Oct 13, 2015 at 11:10 AM, Christoph Gysin <
> christoph.gysin at gmail.com> wrote:
>
>>
>> Is there anything I can do to help getting this merged?
>>
>
> This is ok for me, cc-ing Thomas.
>

Thought I already did, but just in case,
Acked-by: David Marchand 

-- 
David Marchand

[dpdk-dev] [PATCH] mk: Quote $(KERNELCC) to allow ccache builds

2015-10-13 Thread Simon Kagstrom

Ping?

This is one of three outstanding DPDK patches of mine that haven't seen
any activity in a while. Is there a list of pending patches somewhere
to monitor activity?

// Simon

On Thu, 24 Sep 2015 09:43:28 +0200
Simon Kagstrom  wrote:

> Otherwise building with KERNELCC="ccache gcc" will fail:
> 
>  == Build lib/librte_eal/linuxapp/igb_uio
>  /usr/src/linux-headers-3.13.0-63-generic/arch/x86/Makefile:98: stack protector enabled but no compiler support
>  /usr/src/linux-headers-3.13.0-63-generic/arch/x86/Makefile:113: CONFIG_X86_X32 enabled but no binutils support
>  ccache: invalid option -- 'p'
>  Usage:
>  ccache [options]
>  ccache compiler [compiler options]
>  compiler [compiler options]  (via symbolic link)
> 
>  Options:
>  -c, --cleanup delete old files and recalculate size counters
>(normally not needed as this is done automatically)
>  -C, --clear   clear the cache completely
>  -F, --max-files=N set maximum number of files in cache to N (use 0 for
>no limit)
>  -M, --max-size=SIZE   set maximum size of cache to SIZE (use 0 for no
>limit; available suffixes: G, M and K; default
>suffix: G)
>  -s, --show-stats  show statistics summary
>  -z, --zero-stats  zero statistics counters
> 
>  -h, --help    print this help text
>  -V, --version print version and copyright information
> 
> Signed-off-by: Simon Kagstrom 
> ---
>  mk/rte.module.mk | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mk/rte.module.mk b/mk/rte.module.mk
> index 7bf77c1..53ed4fe 100644
> --- a/mk/rte.module.mk
> +++ b/mk/rte.module.mk
> @@ -78,7 +78,7 @@ build: _postbuild
>  $(MODULE).ko: $(SRCS_LINKS)
>   @if [ ! -f $(notdir Makefile) ]; then ln -nfs $(SRCDIR)/Makefile . ; fi
>   @$(MAKE) -C $(RTE_KERNELDIR) M=$(CURDIR) O=$(RTE_KERNELDIR) \
> - CC=$(KERNELCC) CROSS_COMPILE=$(CROSS) V=$(if $V,1,0)
> + CC="$(KERNELCC)" CROSS_COMPILE=$(CROSS) V=$(if $V,1,0)
>  
>  # install module in $(RTE_OUTPUT)/kmod
>  $(RTE_OUTPUT)/kmod/$(MODULE).ko: $(MODULE).ko

[dpdk-dev] Question about unsupported transceivers

2015-10-13 Thread Alexander Duyck

On 10/13/2015 11:57 AM, Alex Forster wrote:
> I believe I've discovered my problem: 
> https://gist.github.com/AlexForster/0fb4699bcdf196cf5462
>
> As mentioned previously, I have two X520-Q1 cards installed. It appears that 
> initialization of the first card obeys allow_unsupported_sfp=1, but 
> initialization of the second card does not.
>
> Is this a bug, or is there a way to work around this that I'm not aware of?
>
> Alex Forster

If you are using Intel's out-of-tree ixgbe driver I believe the module 
parameters are comma separated with one index per port.  So if you have 
two ports you should be passing "allow_unsupported_sfp=1,1", and for 4 
you would need four '1's.

- Alex

[dpdk-dev] IXGBE RX packet loss with 5+ cores

2015-10-13 Thread Alexander Duyck

On 10/13/2015 07:47 AM, Sanford, Robert wrote:
>>>> [Robert:]
>>>> 1. The 82599 device supports up to 128 queues. Why do we see trouble
>>>> with as few as 5 queues? What could limit the system (and one port
>>>> controlled by 5+ cores) from receiving at line-rate without loss?
>>>>
>>>> 2. As far as we can tell, the RX path only touches the device
>>>> registers when it updates a Receive Descriptor Tail register (RDT[n]),
>>>> roughly every rx_free_thresh packets. Is there a big difference
>>>> between one core doing this and N cores doing it 1/N as often?
>>> [Stephen:]
>>> As you add cores, there is more traffic on the PCI bus from each core
>>> polling. There is a fixed number of PCI bus transactions per second
>>> possible.
>>> Each core is increasing the number of useless (empty) transactions.
>> [Bruce:]
>> The polling for packets by the core should not be using PCI bandwidth
>> directly, as the ixgbe driver (and other drivers) check for the DD bit
>> being set on the descriptor in memory/cache.
> I was preparing to reply with the same point.
>
>>> [Stephen:] Why do you think adding more cores will help?
> We're using run-to-completion and sometimes spend too many cycles per packet.
> We realize that we need to move to an io+workers model, but wanted a better
> understanding of the dynamics involved here.
>
>
>
>> [Bruce:] However, using an increased number of queues can use PCI
>> bandwidth in other ways, for instance, with more queues you reduce the
>> amount of descriptor coalescing that can be done by the NICs, so that
>> instead of having a single transaction of 4 descriptors to one queue, the
>> NIC may instead have to do 4 transactions each writing 1 descriptor to 4
>> different queues. This is possibly why sending all traffic to a single
>> queue works ok - the polling on the other queues is still being done, but
>> has little effect.
> Brilliant! This idea did not occur to me.

You can actually make the throughput regression disappear by altering 
the traffic pattern you are testing with.  In the past I have found that 
sending traffic in bursts where 4 frames belong to the same queue before 
moving to the next one essentially eliminated the dropped packets due to 
PCIe bandwidth limitations.  The trick is you need to have the Rx 
descriptor processing work in batches so that you can get multiple 
descriptors processed for each PCIe read/write.
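
The effect of the traffic pattern can be illustrated with a deliberately crude counting model. It assumes, purely for illustration, that the NIC can merge up to `batch` consecutive descriptors into one PCIe write only when they target the same queue; real hardware behaviour is more involved, and `count_writebacks()` is a hypothetical helper, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model: count descriptor write-back transactions for a sequence of
 * packets, where queue_ids[i] is the RX queue of packet i. Consecutive
 * packets for the same queue share one transaction, up to batch
 * descriptors per transaction.
 */
static size_t count_writebacks(const uint8_t *queue_ids, size_t n,
                               size_t batch)
{
    size_t txns = 0;
    size_t run = 0;  /* descriptors already in the current transaction */

    assert(batch > 0);

    for (size_t i = 0; i < n; i++) {
        /* Start a new transaction on a queue change or a full batch. */
        if (i == 0 || queue_ids[i] != queue_ids[i - 1] || run == batch) {
            txns++;
            run = 0;
        }
        run++;
    }
    return txns;
}
```

Under this model, 16 packets spread round-robin over 4 queues cost 16 transactions, while the same 16 packets sent as bursts of 4 per queue cost 4, which matches the direction of the observation above.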

- Alex

[dpdk-dev] [PATCH] eal: fix C++ build (struct member: virtual)

2015-10-13 Thread Christoph Gysin

Hi David

Is there anything I can do to help getting this merged?

Thanks,
Chris

On Mon, Oct 5, 2015 at 12:44 PM, Dumitrescu, Cristian
 wrote:
>
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Christoph Gysin
>> Sent: Tuesday, September 29, 2015 7:53 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH] eal: fix C++ build (struct member: virtual)
>>
>> 'virtual' is a keyword and can't be used if the code is to compile with
>> C++ compilers.
>>
>> If rte_devargs.h was included in C++ code, compilation with clang++
>> failed with an error. g++ did not fail, but only because of a bug
>> that treats it as an anonymous struct with a decl-specifier which it
>> ignores.
>>
>> This simply renames the member to 'virt'.
>>
>> Signed-off-by: Christoph Gysin 
>> ---
>
> Acked-by: Cristian Dumitrescu 
>
> Christoph, please also copy David Marchand for further updates of this patch, 
> as David is the maintainer of this component. Whenever in doubt about the 
> maintainer, you can check file MAINTAINERS from DPDK root folder.
>
> Regards,
> Cristian
>
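
The rename described in the quoted commit message can be shown with a cut-down structure. This is a hypothetical simplification, not the real rte_devargs definition: `struct dev_args`, `enum dev_kind` and `dev_args_set_virt()` are invented for illustration. The point is only that a member called `virt` is accepted by both C and C++ compilers, whereas `virtual` is a reserved word in C++.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum dev_kind { DEV_PCI, DEV_VIRT };

/* Hypothetical, simplified device-argument record. */
struct dev_args {
    enum dev_kind kind;
    union {
        struct {
            unsigned int bus;
            unsigned int devid;
        } pci;
        struct {
            char drv_name[32];
        } virt;  /* naming this member "virtual" would break C++ builds */
    };
};

/* Fill in the virtual-device variant with a bounded, terminated copy. */
static void dev_args_set_virt(struct dev_args *d, const char *name)
{
    assert(d != NULL && name != NULL);
    d->kind = DEV_VIRT;
    snprintf(d->virt.drv_name, sizeof(d->virt.drv_name), "%s", name);
}
```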



-- 
echo mailto: NOSPAM !#$.'<*>'|sed 's. ..'|tr "<*> !#:2" org at fr33z3

[dpdk-dev] dpdk 2.1.0: 40gig ports link is down

2015-10-13 Thread Shaham Fridenberg

Updating the firmware using the tool did solve the problem! Now link is up.

Thanks a lot for your time and help Stephen! :)

-Original Message-
From: Stephen Hemminger [mailto:step...@networkplumber.org] 
Sent: Monday, October 12, 2015 8:14 PM
To: Shaham Fridenberg
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] dpdk 2.1.0: 40gig ports link is down

On Mon, 12 Oct 2015 17:04:01 +
Shaham Fridenberg  wrote:

> Hey Stephen,
> 
> Thanks for your help.
> 
> I tried updating i40e driver to the latest version (from 1.0.11-k to 
> 1.3.39.1) but it didn't help.
> 
> By 'Compile i40e with DEBUG flag' you mean adding "CONFIG_RTE_LOG_LEVEL=8" to 
> defconfig_x86_64-wsm-linuxapp-gcc (assuming I'm compiling for westmere)?
> 
> Also, is there any log file generated in that case?
> 
> Thanks,
> Shaham

I was thinking of the firmware update tool, which you need to run on Linux
(without DPDK):
https://downloadcenter.intel.com/download/24769/NVM-Update-Utility-for-Intel-Ethernet-Converged-Network-Adapter-XL710-X710-Series

In current DPDK, the config stuff is in common_linuxapp

[dpdk-dev] DPDK hash function related question

2015-10-13 Thread Dumitrescu, Cristian



> -Original Message-
> From: Yeddula, Avinash [mailto:ayeddula at ciena.com]
> Sent: Monday, October 12, 2015 6:03 PM
> To: Dumitrescu, Cristian; dev at dpdk.org; Bly, Mike
> Subject: RE: DPDK hash function related question
> 
> Hi Cristian,
> I have configured the hash function and it compiles, but with warnings,
> because the librte_hash and librte_table hash function types are 32-bit
> and 64-bit respectively.
> 
> librte_hash library :
> /** Type of function that can be used for calculating the hash value. */
> typedef uint32_t (*rte_hash_function)(const void *key, uint32_t key_len,
> uint32_t init_val);
> 
> librte_table library:
> typedef uint64_t (*rte_table_hash_op_hash) (void *key,uint32_t
> key_size, uint64_t seed);
> 
> I could use one of these hash functions. This is one option, but our first
> priority is to use the CRC hash or the cuckoo hash.
> https://github.com/scylladb/dpdk/blob/master/examples/ip_pipeline/pipeline/hash_func.h
> 
> We do not want to have those warnings in our code. What do you suggest?

Would function pointer conversion work?
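
One alternative to converting the pointer, sketched here under stated assumptions, is a thin wrapper that has the 64-bit signature and calls the 32-bit function. The two typedefs mirror the prototypes quoted above; `demo_hash32()` is a hypothetical FNV-1a stand-in for a 32-bit hash such as rte_hash_crc, and `demo_hash64_adapter()` is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shapes of the two callback types quoted in this thread. */
typedef uint32_t (*hash32_fn)(const void *key, uint32_t key_len,
                              uint32_t init_val);
typedef uint64_t (*hash64_fn)(void *key, uint32_t key_size, uint64_t seed);

/* Hypothetical 32-bit hash (FNV-1a), standing in for the real one. */
static uint32_t demo_hash32(const void *key, uint32_t key_len,
                            uint32_t init_val)
{
    const uint8_t *p = key;
    uint32_t h = 2166136261u ^ init_val;

    assert(key != NULL || key_len == 0);

    for (uint32_t i = 0; i < key_len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Adapter with the 64-bit signature. The seed is truncated to 32 bits and
 * the 32-bit result is zero-extended, so no function pointer cast is
 * needed and the call goes through a matching prototype.
 */
static uint64_t demo_hash64_adapter(void *key, uint32_t key_size,
                                    uint64_t seed)
{
    return (uint64_t)demo_hash32(key, key_size, (uint32_t)seed);
}
```

The adapter can then be assigned to a field of the 64-bit callback type without a cast. Calling a function through a pointer of an incompatible type is undefined behaviour in C, which is the usual argument for a wrapper over a cast.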

> 
> Thanks
> -Avinash
> 
> -Original Message-
> From: Dumitrescu, Cristian [mailto:cristian.dumitrescu at intel.com]
> Sent: Tuesday, September 22, 2015 3:05 AM
> To: Yeddula, Avinash; dev at dpdk.org; Bly, Mike
> Subject: RE: DPDK hash function related question
> 
> Hi Avinash,
> 
> Yes, the hash function is configurable.
> 
> Are you using a DPDK release older than 2.1? In DPDK we moved away from
> test_hash to CRC-based hashes. Please take a look at DPDK release 2.1
> examples/ip_pipeline application: in pipeline_flow_classification_be.c, we
> use CRC-based hash functions defined in file hash_func.h from the same
> folder.
> 
> Regards,
> Cristian
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yeddula, Avinash
> > Sent: Tuesday, September 22, 2015 1:34 AM
> > To: dev at dpdk.org; Bly, Mike
> > Subject: [dpdk-dev] DPDK hash function related question
> >
> > Hello All,
> >
> > I'm using the DPDK extensible bucket hash in the rte_table library of the
> > packet framework. My question is related to the actual hash function that
> > computes the hash signature.
> >
> > All the available examples have initialized it to test_hash. I do not see
> > any hash function available in the rte_table library that computes the
> > actual signature.
> >
> >
> >
> > struct rte_table_hash_ext_params   hash_table_params = {
> >
> > .key_size = TABLE_ENTRY_KEY_SIZE,
> >
> > .n_keys = TABLE_MAX_SIZE,
> >
> > .n_buckets = TABLE_MAX_BUCKET_COUNT,
> >
> > .n_buckets_ext = TABLE_MAX_EXT_BUCKET_COUNT,
> >
> > .f_hash = test_hash,
> >
> > .seed = 0,
> >
> > .signature_offset = 0,
> >
> > .key_offset = __builtin_offsetof(struct metadata_t, tbl_key),
> >
> > };
> >
> >
> >
> > So, I wanted to use hash functions from DPDK rte_hash library. This is
> > what I'm doing and looking at the code this looks ok to me.
> >
> > I'm at least a week or 2 away from testing this part of the code. I
> > wanted to confirm that, there is no fundamental flaw in using the DPDK
> > rte_hash library and rte_table library like this. Could someone confirm
> > this, please?
> >
> >
> >
> > #define DEFAULT_HASH_FUNC rte_hash_crc
> >
> >
> >
> > struct rte_table_hash_ext_params   hash_table_params = {
> >
> > .key_size = TABLE_ENTRY_KEY_SIZE,
> >
> > .n_keys = TABLE_MAX_SIZE,
> >
> > .n_buckets = TABLE_MAX_BUCKET_COUNT,
> >
> > .n_buckets_ext = TABLE_MAX_EXT_BUCKET_COUNT,
> >
> > .f_hash = DEFAULT_HASH_FUNC ,
> >
> > .seed = 0,
> >
> > .signature_offset = 0,
> >
> > .key_offset = __builtin_offsetof(struct metadata_t, tbl_key),
> >
> > };
> >
> >
> >
> > Thanks
> >
> > -Avinash
> >
>

[dpdk-dev] [testpmd] enable lsc to avoid TX stall, TX stall happened in following sequence start show port info 0

2015-10-13 Thread Jiuling Bie

Hi Pablo,

The issue is related to certain NICs. I observed it on an Intel 82577LM
(em). When the lsc interrupt is disabled, "show port info" reads the PHY
registers to get the link status, and that caused TX to stop. I don't have
other NICs, so I am not sure whether it is a common issue.

Regards,
Jiuling

On Tue, Oct 13, 2015 at 5:07 AM, De Lara Guarch, Pablo <
pablo.de.lara.guarch at intel.com> wrote:

> Hi Jiuling,
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jiuling Bie
> > Sent: Wednesday, October 07, 2015 5:54 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [testpmd] enable lsc to avoid TX stall, TX stall
> happened
> > in following sequence start show port info 0
> >
> > ---
> >  app/test-pmd/testpmd.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> > index 386bf84..45adefa 100644
> > --- a/app/test-pmd/testpmd.c
> > +++ b/app/test-pmd/testpmd.c
> > @@ -1779,6 +1779,7 @@ init_port_config(void)
> >   port = &ports[pid];
> >   port->dev_conf.rxmode = rx_mode;
> >   port->dev_conf.fdir_conf = fdir_conf;
> > + port->dev_conf.intr_conf.lsc = 1;
> >   if (nb_rxq > 1) {
> >   port->dev_conf.rx_adv_conf.rss_conf.rss_key =
> > NULL;
> >   port->dev_conf.rx_adv_conf.rss_conf.rss_hf =
> > rss_hf;
> > --
> > 1.9.1
>
> Several things about your patch:
> -  It looks like this is your first patch (plus the other one you sent a
> few minutes later): take a look at http://dpdk.org/dev
> - You forgot to sign off your patches (use --signoff with git commit)
> - The title of this patch is too long, shorten it and include more
> information in the body of the commit message.
> - I don't know what this patch is trying to solve exactly. It looks like
> you are saying that there is a bug
>   that makes TX stop when you run the following commands:
>testpmd> start
>testpmd> show port info 0
>
> I don't see such a bug. Could you explain the steps to reproduce the
> issue in more detail?
>
> Thanks,
> Pablo
>
>

[dpdk-dev] [PATCH] eal: fix C++ build (struct member: virtual)

2015-10-13 Thread David Marchand

Hello Christoph,

On Tue, Oct 13, 2015 at 11:10 AM, Christoph Gysin  wrote:

>
> Is there anything I can do to help getting this merged?
>

This is ok for me, cc-ing Thomas.


-- 
David Marchand

[dpdk-dev] [PATCH] librte_eal: Fix wrong header file for old gcc version

2015-10-13 Thread Bruce Richardson

On Mon, Aug 24, 2015 at 05:22:57PM +0800, Michael Qiu wrote:
> For __SSE3__, the corresponding header file should be pmmintrin.h,
> tmmintrin.h works for __SSSE3__.
> 
> Signed-off-by: Michael Qiu 

Since this is a bug fix, it should probably have a "Fixes:" line in the commit message.
Otherwise the change looks ok.
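
For reference, the header pairing that the fix relies on can be sketched as follows. The guards use the compiler-defined __SSE3__ and __SSSE3__ macros mentioned in the patch; `hsum4()` is a hypothetical helper that uses one SSE3 intrinsic from pmmintrin.h when available and plain C otherwise.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE3__)
#include <pmmintrin.h>  /* SSE3 intrinsics, e.g. _mm_hadd_ps */
#endif
#if defined(__SSSE3__)
#include <tmmintrin.h>  /* SSSE3 intrinsics, e.g. _mm_shuffle_epi8 */
#endif

/* Sum four floats, using the SSE3 horizontal add when it is enabled. */
static float hsum4(const float v[4])
{
    assert(v != NULL);
#if defined(__SSE3__)
    __m128 x = _mm_loadu_ps(v);

    x = _mm_hadd_ps(x, x);  /* (v0+v1, v2+v3, v0+v1, v2+v3) */
    x = _mm_hadd_ps(x, x);  /* every lane now holds the full sum */
    return _mm_cvtss_f32(x);
#else
    return v[0] + v[1] + v[2] + v[3];
#endif
}
```

Building with -msse3 exercises the intrinsic path; without it the scalar fallback is compiled, and neither header is pulled in.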

Acked-by: Bruce Richardson

[dpdk-dev] [PATCH v3 5/5] doc: modify release notes and deprecation notice for table and pipeline

2015-10-13 Thread Marcin Kerlin

The LIBABIVER number is incremented for the table and pipeline libraries.
The release notes are updated and the deprecation notice is removed.

Signed-off-by: Maciej Gajdzica 
Acked-by: Cristian Dumitrescu 
---
 doc/guides/rel_notes/deprecation.rst | 3 ---
 doc/guides/rel_notes/release_2_2.rst | 6 --
 lib/librte_pipeline/Makefile | 2 +-
 lib/librte_pipeline/rte_pipeline_version.map | 8 
 lib/librte_table/Makefile| 2 +-
 5 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index fa55117..2bf2df4 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -53,9 +53,6 @@ Deprecation Notices
 * librte_table LPM: A new parameter to hold the table name will be added to
   the LPM table parameter structure.

-* librte_table: New functions for table entry bulk add/delete will be added
-  to the table operations structure.
-
 * librte_table hash: Key mask parameter will be added to the hash table
   parameter structure for 8-byte key and 16-byte key extendible bucket and
   LRU tables.
diff --git a/doc/guides/rel_notes/release_2_2.rst 
b/doc/guides/rel_notes/release_2_2.rst
index 5687676..b46d2ae 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -98,6 +98,8 @@ ABI Changes

 * The LPM structure is changed. The deprecated field mem_location is removed.

+* Added functions add/delete bulk to table and pipeline libraries.
+

 Shared Library Versions
 ---
@@ -122,7 +124,7 @@ The libraries prepended with a plus sign were incremented 
in this version.
+ librte_mbuf.so.2
  librte_mempool.so.1
  librte_meter.so.1
- librte_pipeline.so.1
+   + librte_pipeline.so.2
  librte_pmd_bond.so.1
+ librte_pmd_ring.so.2
  librte_port.so.1
@@ -130,6 +132,6 @@ The libraries prepended with a plus sign were incremented 
in this version.
  librte_reorder.so.1
  librte_ring.so.1
  librte_sched.so.1
- librte_table.so.1
+   + librte_table.so.2
  librte_timer.so.1
  librte_vhost.so.1
diff --git a/lib/librte_pipeline/Makefile b/lib/librte_pipeline/Makefile
index 15e406b..1166d3c 100644
--- a/lib/librte_pipeline/Makefile
+++ b/lib/librte_pipeline/Makefile
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)

 EXPORT_MAP := rte_pipeline_version.map

-LIBABIVER := 1
+LIBABIVER := 2

 #
 # all source are stored in SRCS-y
diff --git a/lib/librte_pipeline/rte_pipeline_version.map 
b/lib/librte_pipeline/rte_pipeline_version.map
index 8f25d0f..4cc86f6 100644
--- a/lib/librte_pipeline/rte_pipeline_version.map
+++ b/lib/librte_pipeline/rte_pipeline_version.map
@@ -29,3 +29,11 @@ DPDK_2.1 {
rte_pipeline_table_stats_read;

 } DPDK_2.0;
+
+DPDK_2.2 {
+   global:
+
+   rte_pipeline_table_entry_add_bulk;
+   rte_pipeline_table_entry_delete_bulk;
+
+} DPDK_2.1;
diff --git a/lib/librte_table/Makefile b/lib/librte_table/Makefile
index c5b3eaf..7f02af3 100644
--- a/lib/librte_table/Makefile
+++ b/lib/librte_table/Makefile
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)

 EXPORT_MAP := rte_table_version.map

-LIBABIVER := 1
+LIBABIVER := 2

 #
 # all source are stored in SRCS-y
-- 
1.9.1


[dpdk-dev] [PATCH v3 4/5] ip_pipeline: added cli commands for bulk add/delete to firewall pipeline

2015-10-13 Thread Marcin Kerlin

Added two new CLI commands to the firewall pipeline. The bulk add and
bulk delete commands take as argument a file with rules to add/delete.
The file is parsed, and then the rules are passed to backend functions
which add/delete records from the pipeline tables.

Signed-off-by: Maciej Gajdzica 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/pipeline/pipeline_firewall.c  | 858 +
 examples/ip_pipeline/pipeline/pipeline_firewall.h  |  14 +
 .../ip_pipeline/pipeline/pipeline_firewall_be.c| 157 
 .../ip_pipeline/pipeline/pipeline_firewall_be.h|  38 +
 4 files changed, 1067 insertions(+)

diff --git a/examples/ip_pipeline/pipeline/pipeline_firewall.c 
b/examples/ip_pipeline/pipeline/pipeline_firewall.c
index f6924ab..4137923 100644
--- a/examples/ip_pipeline/pipeline/pipeline_firewall.c
+++ b/examples/ip_pipeline/pipeline/pipeline_firewall.c
@@ -51,6 +51,8 @@
 #include "pipeline_common_fe.h"
 #include "pipeline_firewall.h"

+#define BUF_SIZE   1024
+
 struct app_pipeline_firewall_rule {
struct pipeline_firewall_key key;
int32_t priority;
@@ -73,6 +75,18 @@ struct app_pipeline_firewall {
void *default_rule_entry_ptr;
 };

+struct app_pipeline_add_bulk_params {
+   struct pipeline_firewall_key *keys;
+   uint32_t n_keys;
+   uint32_t *priorities;
+   uint32_t *port_ids;
+};
+
+struct app_pipeline_del_bulk_params {
+   struct pipeline_firewall_key *keys;
+   uint32_t n_keys;
+};
+
 static void
 print_firewall_ipv4_rule(struct app_pipeline_firewall_rule *rule)
 {
@@ -256,6 +270,358 @@ app_pipeline_firewall_key_check_and_normalize(struct 
pipeline_firewall_key *key)
}
 }

+static int
+app_pipeline_add_bulk_parse_file(char *filename,
+   struct app_pipeline_add_bulk_params *params)
+{
+   FILE *f;
+   char file_buf[BUF_SIZE];
+   uint32_t i;
+   int status = 0;
+
+   f = fopen(filename, "r");
+   if (f == NULL)
+   return -1;
+
+   params->n_keys = 0;
+   while (fgets(file_buf, BUF_SIZE, f) != NULL)
+   params->n_keys++;
+   rewind(f);
+
+   if (params->n_keys == 0) {
+   status = -1;
+   goto end;
+   }
+
+   params->keys = rte_malloc(NULL,
+   params->n_keys * sizeof(struct pipeline_firewall_key),
+   RTE_CACHE_LINE_SIZE);
+   if (params->keys == NULL) {
+   status = -1;
+   goto end;
+   }
+
+   params->priorities = rte_malloc(NULL,
+   params->n_keys * sizeof(uint32_t),
+   RTE_CACHE_LINE_SIZE);
+   if (params->priorities == NULL) {
+   status = -1;
+   goto end;
+   }
+
+   params->port_ids = rte_malloc(NULL,
+   params->n_keys * sizeof(uint32_t),
+   RTE_CACHE_LINE_SIZE);
+   if (params->port_ids == NULL) {
+   status = -1;
+   goto end;
+   }
+
+   i = 0;
+   while (fgets(file_buf, BUF_SIZE, f) != NULL) {
+   char *str;
+
+   str = strtok(file_buf, " ");
+   if (str == NULL) {
+   status = -1;
+   goto end;
+   }
+   params->priorities[i] = atoi(str);
+
+   str = strtok(NULL, " .");
+   if (str == NULL) {
+   status = -1;
+   goto end;
+   }
+   params->keys[i].key.ipv4_5tuple.src_ip = atoi(str)<<24;
+
+   str = strtok(NULL, " .");
+   if (str == NULL) {
+   status = -1;
+   goto end;
+   }
+   params->keys[i].key.ipv4_5tuple.src_ip |= atoi(str)<<16;
+
+   str = strtok(NULL, " .");
+   if (str == NULL) {
+   status = -1;
+   goto end;
+   }
+   params->keys[i].key.ipv4_5tuple.src_ip |= atoi(str)<<8;
+
+   str = strtok(NULL, " .");
+   if (str == NULL) {
+   status = -1;
+   goto end;
+   }
+   params->keys[i].key.ipv4_5tuple.src_ip |= atoi(str);
+
+   str = strtok(NULL, " ");
+   if (str == NULL) {
+   status = -1;
+   goto end;
+   }
+   params->keys[i].key.ipv4_5tuple.src_ip_mask = atoi(str);
+
+   str = strtok(NULL, " .");
+   if (str == NULL) {
+   status = -1;
+   goto end;
+   }
+   params->keys[i].key.ipv4_5tuple.dst_ip = atoi(str)<<24;
+
+   str = strtok(NULL, " .");
+   if (str == NULL) {
+   status = -1;
+   goto end;
+   }
+   params->keys[i].key.ipv4_5tuple.dst_ip |= ato
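
[Editor's note] The parser in this patch builds each address with expressions such as atoi(str)<<24. atoi() reports no errors, and shifting a signed int left so that the result no longer fits (any first octet of 128 or more) is undefined behaviour in C. A minimal sketch of a stricter alternative follows, assuming a hypothetical helper named parse_ipv4_dotted() that is not part of the patch and that takes the whole dotted-quad string rather than strtok() tokens:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse "a.b.c.d" into a
 * host-order uint32_t. Returns 0 on success, -1 on malformed input.
 */
static int
parse_ipv4_dotted(const char *s, uint32_t *out)
{
	uint32_t ip = 0;
	int i;

	assert(s != NULL && out != NULL);

	for (i = 0; i < 4; i++) {
		char *end;
		unsigned long octet;

		/* strtoul() would accept a sign or leading spaces; require a digit. */
		if (*s < '0' || *s > '9')
			return -1;

		octet = strtoul(s, &end, 10);
		if (octet > 255)
			return -1;

		/* Octets are separated by '.'; the last one must end the string. */
		if (*end != (i < 3 ? '.' : '\0'))
			return -1;

		/* Unsigned arithmetic, so the shift is always well defined. */
		ip = (ip << 8) | (uint32_t)octet;
		s = end + 1;
	}

	*out = ip;
	return 0;
}
```

Keeping the accumulator unsigned avoids the undefined shift, and checking the terminator after each octet rejects lines with missing or extra fields instead of silently storing a partial address.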

[dpdk-dev] [PATCH v3 3/5] test_table: added check for bulk add/delete to acl table unit test

2015-10-13 Thread Marcin Kerlin

Added a check for bulk add and bulk delete to the ACL table unit test.

Signed-off-by: Maciej Gajdzica 
Acked-by: Cristian Dumitrescu 
---
 app/test/test_table_acl.c | 166 ++
 1 file changed, 166 insertions(+)

diff --git a/app/test/test_table_acl.c b/app/test/test_table_acl.c
index e4e9b9c..fe8e545 100644
--- a/app/test/test_table_acl.c
+++ b/app/test/test_table_acl.c
@@ -253,6 +253,94 @@ parse_cb_ipv4_rule(char *str, struct rte_table_acl_rule_add_params *v)
return 0;
 }

+static int
+parse_cb_ipv4_rule_del(char *str, struct rte_table_acl_rule_delete_params *v)
+{
+   int i, rc;
+   char *s, *sp, *in[CB_FLD_NUM];
+   static const char *dlm = " \t\n";
+
+   /*
+   ** Skip leading '@'
+   */
+   if (strchr(str, '@') != str)
+   return -EINVAL;
+
+   s = str + 1;
+
+   /*
+   * Populate the 'in' array with the location of each
+   * field in the string we're parsing
+   */
+   for (i = 0; i != DIM(in); i++) {
+   in[i] = strtok_r(s, dlm, &sp);
+   if (in[i] == NULL)
+   return -EINVAL;
+   s = NULL;
+   }
+
+   /* Parse x.x.x.x/x */
+   rc = parse_ipv4_net(in[CB_FLD_SRC_ADDR],
+   &v->field_value[SRC_FIELD_IPV4].value.u32,
+   &v->field_value[SRC_FIELD_IPV4].mask_range.u32);
+   if (rc != 0) {
+   RTE_LOG(ERR, PIPELINE, "failed to read src address/mask: %s\n",
+   in[CB_FLD_SRC_ADDR]);
+   return rc;
+   }
+
+   printf("V=%u, mask=%u\n", v->field_value[SRC_FIELD_IPV4].value.u32,
+   v->field_value[SRC_FIELD_IPV4].mask_range.u32);
+
+   /* Parse x.x.x.x/x */
+   rc = parse_ipv4_net(in[CB_FLD_DST_ADDR],
+   &v->field_value[DST_FIELD_IPV4].value.u32,
+   &v->field_value[DST_FIELD_IPV4].mask_range.u32);
+   if (rc != 0) {
+   RTE_LOG(ERR, PIPELINE, "failed to read dest address/mask: %s\n",
+   in[CB_FLD_DST_ADDR]);
+   return rc;
+   }
+
+   printf("V=%u, mask=%u\n", v->field_value[DST_FIELD_IPV4].value.u32,
+   v->field_value[DST_FIELD_IPV4].mask_range.u32);
+   /* Parse n:n */
+   rc = parse_port_range(in[CB_FLD_SRC_PORT_RANGE],
+   &v->field_value[SRCP_FIELD_IPV4].value.u16,
+   &v->field_value[SRCP_FIELD_IPV4].mask_range.u16);
+   if (rc != 0) {
+   RTE_LOG(ERR, PIPELINE, "failed to read source port range: %s\n",
+   in[CB_FLD_SRC_PORT_RANGE]);
+   return rc;
+   }
+
+   printf("V=%u, mask=%u\n", v->field_value[SRCP_FIELD_IPV4].value.u16,
+   v->field_value[SRCP_FIELD_IPV4].mask_range.u16);
+   /* Parse n:n */
+   rc = parse_port_range(in[CB_FLD_DST_PORT_RANGE],
+   &v->field_value[DSTP_FIELD_IPV4].value.u16,
+   &v->field_value[DSTP_FIELD_IPV4].mask_range.u16);
+   if (rc != 0) {
+   RTE_LOG(ERR, PIPELINE, "failed to read dest port range: %s\n",
+   in[CB_FLD_DST_PORT_RANGE]);
+   return rc;
+   }
+
+   printf("V=%u, mask=%u\n", v->field_value[DSTP_FIELD_IPV4].value.u16,
+   v->field_value[DSTP_FIELD_IPV4].mask_range.u16);
+   /* parse 0/0xnn */
+   GET_CB_FIELD(in[CB_FLD_PROTO],
+   v->field_value[PROTO_FIELD_IPV4].value.u8,
+   0, UINT8_MAX, '/');
+   GET_CB_FIELD(in[CB_FLD_PROTO],
+   v->field_value[PROTO_FIELD_IPV4].mask_range.u8,
+   0, UINT8_MAX, 0);
+
+   printf("V=%u, mask=%u\n",
+   (unsigned int)v->field_value[PROTO_FIELD_IPV4].value.u8,
+   v->field_value[PROTO_FIELD_IPV4].mask_range.u8);
+   return 0;
+}

 /*
  * The format for these rules DO NOT need the port ranges to be
@@ -393,6 +481,84 @@ setup_acl_pipeline(void)
}
}

+   /* Add bulk entries to tables */
+   for (i = 0; i < N_PORTS; i++) {
+   struct rte_table_acl_rule_add_params keys[5];
+   struct rte_pipeline_table_entry entries[5];
+   struct rte_table_acl_rule_add_params *key_array[5];
+   struct rte_pipeline_table_entry *table_entries[5];
+   int key_found[5];
+   struct rte_pipeline_table_entry *table_entries_ptr[5];
+   struct rte_pipeline_table_entry entries_ptr[5];
+
+   parser = parse_cb_ipv4_rule;
+   for (n = 0; n < 5; n++) {
+   memset(&keys[n], 0, sizeof(struct rte_table_acl_rule_add_params));
+   key_array[n] = &keys[n];
+
+   snprintf(line, sizeof(line), "%s", lines[n]);
+   printf("PARSING [%s]\n", line);
+
+   ret = parser(line, &keys[n]);
+   if (ret != 0) {
+   RTE_LOG(ERR, PIPELINE,
+

[dpdk-dev] [PATCH v3 2/5] pipeline: added bulk add/delete functions for table

2015-10-13 Thread Marcin Kerlin

Added functions for adding/deleting multiple records to a table owned by
a pipeline.

Signed-off-by: Maciej Gajdzica 
Signed-off-by: Marcin Kerlin 
Acked-by: Cristian Dumitrescu 
---
 lib/librte_pipeline/rte_pipeline.c | 106 +
 lib/librte_pipeline/rte_pipeline.h |  64 ++
 2 files changed, 170 insertions(+)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index bd700d2..56022f4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -587,6 +587,112 @@ rte_pipeline_table_entry_delete(struct rte_pipeline *p,
return (table->ops.f_delete)(table->h_table, key, key_found, entry);
 }

+int rte_pipeline_table_entry_add_bulk(struct rte_pipeline *p,
+   uint32_t table_id,
+   void **keys,
+   struct rte_pipeline_table_entry **entries,
+   uint32_t n_keys,
+   int *key_found,
+   struct rte_pipeline_table_entry **entries_ptr)
+{
+   struct rte_table *table;
+   uint32_t i;
+
+   /* Check input arguments */
+   if (p == NULL) {
+   RTE_LOG(ERR, PIPELINE, "%s: pipeline parameter is NULL\n",
+   __func__);
+   return -EINVAL;
+   }
+
+   if (keys == NULL) {
+   RTE_LOG(ERR, PIPELINE, "%s: keys parameter is NULL\n", __func__);
+   return -EINVAL;
+   }
+
+   if (entries == NULL) {
+   RTE_LOG(ERR, PIPELINE, "%s: entries parameter is NULL\n",
+   __func__);
+   return -EINVAL;
+   }
+
+   if (table_id >= p->num_tables) {
+   RTE_LOG(ERR, PIPELINE,
+   "%s: table_id %d out of range\n", __func__, table_id);
+   return -EINVAL;
+   }
+
+   table = &p->tables[table_id];
+
+   if (table->ops.f_add_bulk == NULL) {
+   RTE_LOG(ERR, PIPELINE, "%s: f_add_bulk function pointer NULL\n",
+   __func__);
+   return -EINVAL;
+   }
+
+   for (i = 0; i < n_keys; i++) {
+   if ((entries[i]->action == RTE_PIPELINE_ACTION_TABLE) &&
+   table->table_next_id_valid &&
+   (entries[i]->table_id != table->table_next_id)) {
+   RTE_LOG(ERR, PIPELINE,
+   "%s: Tree-like topologies not allowed\n", __func__);
+   return -EINVAL;
+   }
+   }
+
+   /* Add entry */
+   for (i = 0; i < n_keys; i++) {
+   if ((entries[i]->action == RTE_PIPELINE_ACTION_TABLE) &&
+   (table->table_next_id_valid == 0)) {
+   table->table_next_id = entries[i]->table_id;
+   table->table_next_id_valid = 1;
+   }
+   }
+
+   return (table->ops.f_add_bulk)(table->h_table, keys, (void **) entries,
+   n_keys, key_found, (void **) entries_ptr);
+}
+
+int rte_pipeline_table_entry_delete_bulk(struct rte_pipeline *p,
+   uint32_t table_id,
+   void **keys,
+   uint32_t n_keys,
+   int *key_found,
+   struct rte_pipeline_table_entry **entries)
+{
+   struct rte_table *table;
+
+   /* Check input arguments */
+   if (p == NULL) {
+   RTE_LOG(ERR, PIPELINE, "%s: pipeline parameter NULL\n",
+   __func__);
+   return -EINVAL;
+   }
+
+   if (keys == NULL) {
+   RTE_LOG(ERR, PIPELINE, "%s: key parameter is NULL\n",
+   __func__);
+   return -EINVAL;
+   }
+
+   if (table_id >= p->num_tables) {
+   RTE_LOG(ERR, PIPELINE,
+   "%s: table_id %d out of range\n", __func__, table_id);
+   return -EINVAL;
+   }
+
+   table = &p->tables[table_id];
+
+   if (table->ops.f_delete_bulk == NULL) {
+   RTE_LOG(ERR, PIPELINE,
+   "%s: f_delete_bulk function pointer NULL\n", __func__);
+   return -EINVAL;
+   }
+
+   return (table->ops.f_delete_bulk)(table->h_table, keys, n_keys, key_found,
+   (void **) entries);
+}
+
 /*
  * Port
  *
diff --git a/lib/librte_pipeline/rte_pipeline.h b/lib/librte_pipeline/rte_pipeline.h
index 59e0710..5459324 100644
--- a/lib/librte_pipeline/rte_pipeline.h
+++ b/lib/librte_pipeline/rte_pipeline.h
@@ -466,6 +466,70 @@ int rte_pipeline_table_entry_delete(struct rte_pipeline *p,
struct rte_pipeline_table_entry *entry);

 /**
+ * Pipeline table entry add bulk
+ *
+ * @param p
+ *   Handle to pipeline instance
+ * @param table_id
+ *   Table ID (returned by previous invocation of pipeline table create)
+ * @param keys
+ *   Array containing table entry keys
+ * @param entries
+ *   Array containing new contents for every table entry identified by key
+ * @param n_keys
+ *   Number of keys to add
+ * @param key_found
+ *   On successful invocation, key_found for

[dpdk-dev] [PATCH v3 1/5] table: added bulk add/delete functions for table

2015-10-13 Thread Marcin Kerlin

New function prototypes for bulk add/delete are added to the table API. The
new functions allow adding/deleting multiple records with a single function
call. For now these functions are implemented only for the ACL table. For
other tables the function pointers are set to NULL.

Signed-off-by: Maciej Gajdzica 
Acked-by: Cristian Dumitrescu 
---
 lib/librte_table/rte_table.h|  85 -
 lib/librte_table/rte_table_acl.c| 309 
 lib/librte_table/rte_table_array.c  |   2 +
 lib/librte_table/rte_table_hash_ext.c   |   4 +
 lib/librte_table/rte_table_hash_key16.c |   4 +
 lib/librte_table/rte_table_hash_key32.c |   4 +
 lib/librte_table/rte_table_hash_key8.c  |   8 +
 lib/librte_table/rte_table_hash_lru.c   |   4 +
 lib/librte_table/rte_table_lpm.c|   2 +
 lib/librte_table/rte_table_lpm_ipv6.c   |   2 +
 lib/librte_table/rte_table_stub.c   |   2 +
 11 files changed, 420 insertions(+), 6 deletions(-)

diff --git a/lib/librte_table/rte_table.h b/lib/librte_table/rte_table.h
index c13d40d..720514e 100644
--- a/lib/librte_table/rte_table.h
+++ b/lib/librte_table/rte_table.h
@@ -154,6 +154,77 @@ typedef int (*rte_table_op_entry_delete)(
void *entry);

 /**
+ * Lookup table entry add bulk
+ *
+ * @param table
+ *   Handle to lookup table instance
+ * @param key
+ *   Array containing lookup keys
+ * @param entries
+ *   Array containing data to be associated with each key. Every item in the
+ *   array has to point to a valid memory buffer where the first entry_size
+ *   bytes (table create parameter) are populated with the data.
+ * @param n_keys
+ *   Number of keys to add
+ * @param key_found
+ *   After successful invocation, key_found for every item in the array is set
+ *   to a value different than 0 if the current key is already present in the
+ *   table and to 0 if not. This pointer has to be set to a valid memory
+ *   location before the table entry add function is called.
+ * @param entries_ptr
+ *   After successful invocation, array *entries_ptr stores the handle to the
+ *   table entry containing the data associated with every key. This handle can
+ *   be used to perform further read-write accesses to this entry. This handle
+ *   is valid until the key is deleted from the table or the same key is
+ *   re-added to the table, typically to associate it with different data. This
+ *   pointer has to be set to a valid memory location before the function is
+ *   called.
+ * @return
+ *   0 on success, error code otherwise
+ */
+typedef int (*rte_table_op_entry_add_bulk)(
+   void *table,
+   void **keys,
+   void **entries,
+   uint32_t n_keys,
+   int *key_found,
+   void **entries_ptr);
+
+/**
+ * Lookup table entry delete bulk
+ *
+ * @param table
+ *   Handle to lookup table instance
+ * @param key
+ *   Array containing lookup keys
+ * @param n_keys
+ *   Number of keys to delete
+ * @param key_found
+ *   After successful invocation, key_found for every item in the array is set
+ *   to a value different than 0 if the current key was present in the table
+ *   before the delete operation was performed and to 0 if not. This pointer
+ *   has to be set to a valid memory location before the table entry delete
+ *   function is called.
+ * @param entries
+ *   If entries pointer is NULL, this pointer is ignored for every entry found.
+ *   Else, after successful invocation, if specific key is found in the table
+ *   (key_found is different than 0 for this item after function call is
+ *   completed) and item of entry array points to a valid buffer (entry is set
+ *   to a value different than NULL before the function is called), then the
+ *   first entry_size bytes (table create parameter) in *entry store a copy of
+ *   table entry that contained the data associated with the current key before
+ *   the key was deleted.
+ * @return
+ *   0 on success, error code otherwise
+ */
+typedef int (*rte_table_op_entry_delete_bulk)(
+   void *table,
+   void **keys,
+   uint32_t n_keys,
+   int *key_found,
+   void **entries);
+
+/**
  * Lookup table lookup
  *
  * @param table
@@ -213,12 +284,14 @@ typedef int (*rte_table_op_stats_read)(

 /** Lookup table interface defining the lookup table operation */
 struct rte_table_ops {
-   rte_table_op_create f_create;   /**< Create */
-   rte_table_op_free f_free;   /**< Free */
-   rte_table_op_entry_add f_add;   /**< Entry add */
-   rte_table_op_entry_delete f_delete; /**< Entry delete */
-   rte_table_op_lookup f_lookup;   /**< Lookup */
-   rte_table_op_stats_read f_stats;/**< Stats */
+   rte_table_op_create f_create; /**< Create */
+   rte_table_op_free f_free; /**< Free */
+   rte_table_op_entry_add f_add; /**< Entry add */
+   rte_table_op_entry_delete f_delete;   /**< Entry delete */
+   rte_table_op_entry_add_bulk f_

[dpdk-dev] [PATCH v3 0/5] pipeline: add bulk add/delete functions for table

2015-10-13 Thread Marcin Kerlin

This patch adds bulk add/delete functions for tables used by pipelines. It
allows adding/deleting many rules to/from pipeline tables in one function call.
It is particularly useful for the firewall pipeline, which uses an ACL table.
After every add or delete the table is rebuilt, which leads to very long run
times when adding/deleting many entries one at a time.
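
[Editor's note] The saving described above can be sketched with a toy model. Everything below is hypothetical (toy_table, toy_add, toy_add_bulk); it is not the librte_table ACL code, only an illustration of doing the expensive rebuild once per call instead of once per rule:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RULES 64

/* Toy stand-in for a table whose lookup structure is rebuilt after changes. */
struct toy_table {
	uint32_t rules[TOY_MAX_RULES];
	uint32_t n_rules;
	uint32_t n_rebuilds;	/* how many times the expensive step ran */
};

/* Placeholder for the expensive rebuild step. */
static void
toy_rebuild(struct toy_table *t)
{
	t->n_rebuilds++;
}

/* Single add: one rebuild per rule. */
static int
toy_add(struct toy_table *t, uint32_t rule)
{
	assert(t != NULL);

	if (t->n_rules >= TOY_MAX_RULES)
		return -1;

	t->rules[t->n_rules++] = rule;
	toy_rebuild(t);
	return 0;
}

/* Bulk add: stage every rule first, then rebuild once. */
static int
toy_add_bulk(struct toy_table *t, const uint32_t *rules, uint32_t n)
{
	uint32_t i;

	assert(t != NULL && rules != NULL);

	if (n > TOY_MAX_RULES - t->n_rules)
		return -1;

	for (i = 0; i < n; i++)
		t->rules[t->n_rules++] = rules[i];

	toy_rebuild(t);
	return 0;
}
```

In this model, adding five rules with toy_add() runs the rebuild five times, while one toy_add_bulk() call with the same five rules runs it once.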

v2:
* Incremented the LIBABIVER number
* Updated release notes
* Removed deprecation announce

v3:
* Updated a Doxygen comment

Acked-by: Cristian Dumitrescu 

Maciej Gajdzica (5):
  table: added bulk add/delete functions for table
  pipeline: added bulk add/delete functions for table
  test_table: added check for bulk add/delete to acl table unit test
  ip_pipline: added cli commands for bulk   add/delete to firewall
pipeline
  doc: modify release notes and deprecation notice for table and
pipeline

 app/test/test_table_acl.c  | 166 
 doc/guides/rel_notes/deprecation.rst   |   3 -
 doc/guides/rel_notes/release_2_2.rst   |   6 +-
 examples/ip_pipeline/pipeline/pipeline_firewall.c  | 858 +
 examples/ip_pipeline/pipeline/pipeline_firewall.h  |  14 +
 .../ip_pipeline/pipeline/pipeline_firewall_be.c| 157 
 .../ip_pipeline/pipeline/pipeline_firewall_be.h|  38 +
 lib/librte_pipeline/Makefile   |   2 +-
 lib/librte_pipeline/rte_pipeline.c | 106 +++
 lib/librte_pipeline/rte_pipeline.h |  64 ++
 lib/librte_pipeline/rte_pipeline_version.map   |   8 +
 lib/librte_table/Makefile  |   2 +-
 lib/librte_table/rte_table.h   |  85 +-
 lib/librte_table/rte_table_acl.c   | 309 
 lib/librte_table/rte_table_array.c |   2 +
 lib/librte_table/rte_table_hash_ext.c  |   4 +
 lib/librte_table/rte_table_hash_key16.c|   4 +
 lib/librte_table/rte_table_hash_key32.c|   4 +
 lib/librte_table/rte_table_hash_key8.c |   8 +
 lib/librte_table/rte_table_hash_lru.c  |   4 +
 lib/librte_table/rte_table_lpm.c   |   2 +
 lib/librte_table/rte_table_lpm_ipv6.c  |   2 +
 lib/librte_table/rte_table_stub.c  |   2 +
 23 files changed, 1837 insertions(+), 13 deletions(-)

-- 
1.9.1


[dpdk-dev] [testpmd] enable lsc to avoid TX stall, TX stall happened in following sequence start show port info 0

2015-10-13 Thread De Lara Guarch, Pablo

Hi Jiuling,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jiuling Bie
> Sent: Wednesday, October 07, 2015 5:54 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [testpmd] enable lsc to avoid TX stall, TX stall happened
> in following sequence start show port info 0
> 
> ---
>  app/test-pmd/testpmd.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 386bf84..45adefa 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1779,6 +1779,7 @@ init_port_config(void)
>   port = &ports[pid];
>   port->dev_conf.rxmode = rx_mode;
>   port->dev_conf.fdir_conf = fdir_conf;
> + port->dev_conf.intr_conf.lsc = 1;
>   if (nb_rxq > 1) {
>   port->dev_conf.rx_adv_conf.rss_conf.rss_key =
> NULL;
>   port->dev_conf.rx_adv_conf.rss_conf.rss_hf =
> rss_hf;
> --
> 1.9.1

Several things about your patch:
- It looks like this is your first patch (plus the other one you sent a few
minutes later): take a look at http://dpdk.org/dev
- You forgot to sign off your patches (use --signoff with git commit)
- The title of this patch is too long; shorten it and put the details
in the body of the commit message.
- I don't know exactly what this patch is trying to solve. It looks like you
are saying that there is a bug
  that makes TX stop when you run the following commands:
   testpmd> start
   testpmd> show port info 0

I don't see such a bug. Could you explain the steps to reproduce the issue in
more detail?

Thanks,
Pablo

[dpdk-dev] [PATCH] ethdev: remove the imissed deprecation tag

2015-10-13 Thread Stephen Hemminger

On Wed, 30 Sep 2015 09:20:56 +0100
Maryam Tahhan  wrote:

> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index fa06554..78bd94d 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -194,8 +194,7 @@ struct rte_eth_stats {
>   uint64_t opackets;  /**< Total number of successfully transmitted 
> packets.*/
>   uint64_t ibytes;/**< Total number of successfully received bytes. */
>   uint64_t obytes;/**< Total number of successfully transmitted 
> bytes. */
> - uint64_t imissed;
> - /**< Deprecated; Total of RX missed packets (e.g full FIFO). */

If you want to deprecate a structure field, it works better to mark
it with __attribute__((deprecated)); that way every use of that field in
code will be flagged.

Comments are advisory only and often never spotted.
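
[Editor's note] A minimal sketch of that suggestion, assuming GCC or Clang. The structure below is a hypothetical, cut-down stand-in and not the real rte_eth_stats layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stats structure for illustration only. */
struct example_stats {
	uint64_t ipackets;	/* received packets */
	uint64_t opackets;	/* transmitted packets */
	/* Code that reads or writes this field gets a compiler warning. */
	uint64_t imissed __attribute__((deprecated));
};

/* Uses only the non-deprecated fields, so it compiles without warnings. */
static uint64_t
example_total_packets(const struct example_stats *s)
{
	assert(s != NULL);
	return s->ipackets + s->opackets;
}
```

Any function that touched s->imissed would trigger a deprecation warning at that line, which is the flagging that a comment cannot provide.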

[dpdk-dev] [PATCH v2 1/2] fm10k: enable TSO support

2015-10-13 Thread Qiu, Michael

On 2015/10/12 14:38, Wang Xiao W wrote:
> This patch enables fm10k TSO feature for both non-tunneling packet
> and tunneling packet.
>
> Signed-off-by: Wang Xiao W 
> ---

Acked-by: Michael Qiu

[dpdk-dev] IXGBE RX packet loss with 5+ cores

2015-10-13 Thread Venkatesan, Venky

On 10/13/2015 7:47 AM, Sanford, Robert wrote:
 [Robert:]
 1. The 82599 device supports up to 128 queues. Why do we see trouble
 with as few as 5 queues? What could limit the system (and one port
 controlled by 5+ cores) from receiving at line-rate without loss?

 2. As far as we can tell, the RX path only touches the device
 registers when it updates a Receive Descriptor Tail register (RDT[n]),
 roughly every rx_free_thresh packets. Is there a big difference
 between one core doing this and N cores doing it 1/N as often?
>>> [Stephen:]
>>> As you add cores, there is more traffic on the PCI bus from each core
>>> polling. There is a fixed number of PCI bus transactions per second
>>> possible.
>>> Each core is increasing the number of useless (empty) transactions.
>> [Bruce:]
>> The polling for packets by the core should not be using PCI bandwidth
>> directly,
>> as the ixgbe driver (and other drivers) check for the DD bit being set on
>> the
>> descriptor in memory/cache.
> I was preparing to reply with the same point.
>
>>> [Stephen:] Why do you think adding more cores will help?
> We're using run-to-completion and sometimes spend too many cycles per pkt.
> We realize that we need to move to io+workers model, but wanted a better
> understanding of the dynamics involved here.
>
>> [Bruce:] However, using an increased number of queues can
>> use PCI bandwidth in other ways, for instance, with more queues you
>> reduce the
>> amount of descriptor coalescing that can be done by the NICs, so that
>> instead of
>> having a single transaction of 4 descriptors to one queue, the NIC may
>> instead
>> have to do 4 transactions each writing 1 descriptor to 4 different
>> queues. This
>> is possibly why sending all traffic to a single queue works ok - the
>> polling on
>> the other queues is still being done, but has little effect.
> Brilliant! This idea did not occur to me.
To add a little more detail - this ends up being both a bandwidth and a 
transaction bottleneck. Not only do you add an increased transaction 
count, you also add a huge amount of bandwidth overhead (each 16 byte 
descriptor is preceded by a PCI-E TLP which is about the same size). So 
what ends up happening in the case where the incoming packets are 
bifurcated to different queues (1 per queue) is that you have 2x the 
number of transactions (1 for the packet and one for the descriptor) and 
then we essentially double the bandwidth used because you now have the 
TLP overhead per descriptor write.
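
[Editor's note] The arithmetic behind this paragraph can be sketched as follows. The function name and the 16-byte TLP overhead are assumptions taken from the "about the same size" remark above, and the sketch also assumes that a coalesced batch fits in a single TLP:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: bytes on the link to write back n_desc descriptors
 * when the NIC coalesces up to `batch` descriptors per PCIe transaction.
 * tlp_overhead is the assumed per-transaction header cost in bytes.
 */
static uint64_t
desc_writeback_bytes(uint32_t n_desc, uint32_t batch,
	uint32_t desc_size, uint32_t tlp_overhead)
{
	uint64_t n_tlps;

	assert(batch > 0);

	/* Number of transactions, rounded up. */
	n_tlps = ((uint64_t)n_desc + batch - 1) / batch;

	return n_tlps * tlp_overhead + (uint64_t)n_desc * desc_size;
}
```

With 16-byte descriptors and a 16-byte overhead, four descriptors written in one transaction cost 80 bytes, while the same four written one per transaction cost 128 bytes: each lone descriptor write carries 32 bytes for 16 bytes of payload.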

There is a second issue that also pops up when coalescing breaks down - 
testpmd essentially in iofwd mode simply transmits the number of packets 
it receives (i.e. Rx (n) -> Tx (n)). This means that the transmit side 
also suffers from writing one descriptor at a time for output (i.e. when 
the NIC pulls a descriptor cache line to transmit, it finds 1 valid 
descriptor). When a second descriptor is transmitted on the same it will 
again pull and find only one valid descriptor. That is another 2x 
increase in transaction count as well as PCI-E TLP overhead.

The third hit actually comes from the transmit side when transmitting 
one packet at a time. The last part of the transmit process is a MMIO 
write to the tail pointer. This is a costly operation (since it is an 
un-cacheable memory operation) in terms of cycles, not to mention again 
with heavy PCI-E overhead (TLP + 4 byte write) and increased transaction 
counts on PCI-E.

Hope that explains all the touch-points as to why you see the drop off 
in performance you see.
>
>
>
> --
> Thanks guys,
> Robert
>

[dpdk-dev] Host kernel panic when running ixgbe NIC in pci passthrough

2015-10-13 Thread Kyle Larose

Hello,

I have a system using dpdk 1.8 with 82599ES ixgbe NICs. These are
provided to a virtual guest via pci passthrough. Our dpdk application
on the guest takes control of the NICs using igb_uio.

On certain systems, under conditions we have not yet figured out,
sending traffic causes the host to kernel panic. It looks like a pci
device is reporting a fatal error.

From the error, the issue looks to be either the bridge connected to
the ixgbe, or the ixgbe itself; I cannot decipher the message beyond
that.

This has happened on three different machines, so I do not think it is
bad hardware.

I was wondering if anybody has run into this before, and if they have
any solutions. I tried searching the mailing list, but couldn't find
anything related.


[3108395.524535] {1}[Hardware Error]: Hardware error from APEI Generic
Hardware Error Source: 3
[3108395.533959] {1}[Hardware Error]: APEI generic hardware error status
[3108395.541149] {1}[Hardware Error]: severity: 1, fatal
[3108395.546785] {1}[Hardware Error]: section: 0, severity: 1, fatal
[3108395.553586] {1}[Hardware Error]: flags: 0x01
[3108395.558543] {1}[Hardware Error]: primary
[3108395.563113] {1}[Hardware Error]: section_type: PCIe error
[3108395.569332] {1}[Hardware Error]: port_type: 6, downstream switch port
[3108395.576715] {1}[Hardware Error]: version: 1.16
[3108395.581866] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[3108395.588763] {1}[Hardware Error]: device_id: :05:01.0
[3108395.594886] {1}[Hardware Error]: slot: 0
[3108395.599455] {1}[Hardware Error]: secondary_bus: 0x06
[3108395.605189] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8724
[3108395.612572] {1}[Hardware Error]: class_code: 000406
[3108395.618208] {1}[Hardware Error]: bridge: secondary_status:
0x, control: 0x0003
[3108395.626853] {1}[Hardware Error]: section: 1, severity: 1, fatal
[3108395.633653] {1}[Hardware Error]: flags: 0x01
[3108395.638611] {1}[Hardware Error]: primary
[3108395.643179] {1}[Hardware Error]: section_type: PCIe error
[3108395.649396] {1}[Hardware Error]: port_type: 6, downstream switch port
[3108395.656778] {1}[Hardware Error]: version: 1.16
[3108395.661930] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[3108395.668829] {1}[Hardware Error]: device_id: :05:09.0
[3108395.674951] {1}[Hardware Error]: slot: 0
[3108395.679521] {1}[Hardware Error]: secondary_bus: 0x09
[3108395.685254] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8724
[3108395.692636] {1}[Hardware Error]: class_code: 000406
[3108395.698272] {1}[Hardware Error]: bridge: secondary_status:
0x, control: 0x0003
[3108395.706915] Kernel panic - not syncing: Fatal hardware error!

:05:01.0 is a PLX pci bridge. It has two ixgbe NICs connected to
it. Likewise with :05:09.0.

Here is the boot cmdline on the host (we're using iommu):

BOOT_IMAGE=/vmlinuz-3.10.0-123.el7.x86_64
root=UUID=57d79ff0-1152-46fb-a619-b2a102de3d5f ro
console=ttyS0,115200n8 vconsole.font=latarcyrheb-sun16
crashkernel=auto rd.lvm.lv=VolGrp/Vol1 rd.lvm.lv=VolGrp/Vol0
vconsole.keymap=us LANG=en_US.UTF-8 intel_iommu=on

Any help would be greatly appreciated.

Thanks,

Kyle

[dpdk-dev] DPDK hash function related question

2015-10-13 Thread De Lara Guarch, Pablo

Hi Avinash,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yeddula, Avinash
> Sent: Monday, October 12, 2015 6:03 PM
> To: Dumitrescu, Cristian; dev at dpdk.org; Bly, Mike
> Subject: Re: [dpdk-dev] DPDK hash function related question
> 
> Hi Cristian,
> I have configured the hash function and it compiles fine, but with warnings,
> since librte_hash vs librte_table is 32-bit vs 64-bit.
> 
> librte_hash library :
> /** Type of function that can be used for calculating the hash value. */
> typedef uint32_t (*rte_hash_function)(const void *key, uint32_t key_len,
> uint32_t init_val);
> 
> librte_table library:
> typedef uint64_t (*rte_table_hash_op_hash) (void *key,uint32_t
> key_size, uint64_t seed);
> 
> I could use one of these hash functions. This is one option, but our first
> priority is to use crc hash or cuckoo hash.

Mind that cuckoo hash is not a hash function, but a method for resolving 
collisions in a hash table.
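
[Editor's note] One way to bridge the two signatures quoted above is a thin wrapper that makes the conversions explicit. This is a sketch under assumptions: the typedefs are local copies of the quoted prototypes, and example_hash32() is a hypothetical FNV-1a style stand-in for whichever 32-bit hash is actually used:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the two function signatures quoted in this thread. */
typedef uint32_t (*hash32_fn)(const void *key, uint32_t key_len,
	uint32_t init_val);
typedef uint64_t (*table_hash_fn)(void *key, uint32_t key_size,
	uint64_t seed);

/* Hypothetical stand-in 32-bit hash (FNV-1a style); swap in the real one. */
static uint32_t
example_hash32(const void *key, uint32_t key_len, uint32_t init_val)
{
	const uint8_t *p = key;
	uint32_t h = 2166136261u ^ init_val;
	uint32_t i;

	for (i = 0; i < key_len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Wrapper with the 64-bit table signature around the 32-bit hash. */
static uint64_t
example_table_hash(void *key, uint32_t key_size, uint64_t seed)
{
	assert(key != NULL);

	/* The seed is truncated and the result widened, both explicitly. */
	return (uint64_t)example_hash32(key, key_size, (uint32_t)seed);
}
```

Assigning example_table_hash to the table's hash callback then needs no cast. Note that only the low 32 bits of the result carry information, which may or may not matter for a given table type.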

> https://github.com/scylladb/dpdk/blob/master/examples/ip_pipeline/pipeli
> ne/hash_func.h
> 
> We do not want to have those warnings in our code. What do you suggest?
> 
> Thanks
> -Avinash
> 
> -Original Message-
> From: Dumitrescu, Cristian [mailto:cristian.dumitrescu at intel.com]
> Sent: Tuesday, September 22, 2015 3:05 AM
> To: Yeddula, Avinash; dev at dpdk.org; Bly, Mike
> Subject: RE: DPDK hash function related question
> 
> Hi Avinash,
> 
> Yes, the hash function is configurable.
> 
> Are you using a DPDK release older than 2.1? In DPDK we moved away from
> test_hash to CRC-based hashes. Please take a look at DPDK release 2.1
> examples/ip_pipeline application: in pipeline_flow_classification_be.c, we
> use CRC-based hash functions defined in file hash_func.h from the same
> folder.
> 
> Regards,
> Cristian
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yeddula, Avinash
> > Sent: Tuesday, September 22, 2015 1:34 AM
> > To: dev at dpdk.org; Bly, Mike
> > Subject: [dpdk-dev] DPDK hash function related question
> >
> > Hello All,
> >
> > I'm using the DPDK extensible bucket hash in the rte_table library of packet
> > framework. My question is related to the actual hash function that
> > computes the hash signature.
> >
> > All the available examples have initialized it to test_hash.   I do not see 
> > any
> > hash function available in rte_table library , that computes the
> > actual signature
> >
> >
> >
> > struct rte_table_hash_ext_params   hash_table_params = {
> >
> > .key_size = TABLE_ENTRY_KEY_SIZE,
> >
> > .n_keys = TABLE_MAX_SIZE,
> >
> > .n_buckets = TABLE_MAX_BUCKET_COUNT,
> >
> > .n_buckets_ext = TABLE_MAX_EXT_BUCKET_COUNT,
> >
> > .f_hash = test_hash,
> >
> > .seed = 0,
> >
> > .signature_offset = 0,
> >
> > .key_offset = __builtin_offsetof(struct metadata_t, tbl_key),
> >
> > };
> >
> >
> >
> > So, I wanted to use hash functions from DPDK rte_hash library. This is
> > what I'm doing and looking at the code this looks ok to me.
> >
> > I'm at least a week or 2 away from testing this part of the code. I
> > wanted to confirm that, there is no fundamental flaw in using the DPDK
> > rte_hash library and rte_table library like this. Could someone confirm this
> please ?
> >
> >
> >
> > #define DEFAULT_HASH_FUNC rte_hash_crc
> >
> >
> >
> > struct rte_table_hash_ext_params   hash_table_params = {
> >
> > .key_size = TABLE_ENTRY_KEY_SIZE,
> >
> > .n_keys = TABLE_MAX_SIZE,
> >
> > .n_buckets = TABLE_MAX_BUCKET_COUNT,
> >
> > .n_buckets_ext = TABLE_MAX_EXT_BUCKET_COUNT,
> >
> > .f_hash = DEFAULT_HASH_FUNC ,
> >
> > .seed = 0,
> >
> > .signature_offset = 0,
> >
> > .key_offset = __builtin_offsetof(struct metadata_t, tbl_key),
> >
> > };
> >
> >
> >
> > Thanks
> >
> > -Avinash
> >
>

[dpdk-dev] [PATCH] examples/vmdq: Fix the core dump issue when mem_pool is more than 34

2015-10-13 Thread De Lara Guarch, Pablo

Hi Xutao,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xutao Sun
> Sent: Tuesday, October 13, 2015 8:29 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] examples/vmdq: Fix the core dump issue when
> mem_pool is more than 34
> 
> Macro MAX_QUEUES was defined to 128, only allow 16 mem_pools in
> theory.
> When running vmdq_app with more than 34 mem_pools,
> it will cause the core_dump issue.
> Change MAX_QUEUES to 1024 will solve this issue.
> 
> Signed-off-by: Xutao Sun 
> ---
>  examples/vmdq/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/examples/vmdq/main.c b/examples/vmdq/main.c
> index a142d49..b463cfb 100644
> --- a/examples/vmdq/main.c
> +++ b/examples/vmdq/main.c
> @@ -69,7 +69,7 @@
>  #include 
>  #include 
> 
> -#define MAX_QUEUES 128
> +#define MAX_QUEUES 1024
>  /*
>   * For 10 GbE, 128 queues require roughly
>   * 128*512 (RX/TX_queue_nb * RX/TX_ring_descriptors_nb) per port.
> --
> 1.9.3

Just for clarification, when you say mem_pools, do you mean vmdq pools?
Also, if you are going to increase MAX_QUEUES, shouldn't you increase
NUM_MBUFS_PER_PORT as well?
Looking at the comment below, it looks like the number of mbufs is
calculated from the number of queues.
Plus, I assume 128 is the maximum number of queues per port, and as far
as I know, only Fortville supports 256 as the maximum.

Thanks,
Pablo
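
The sizing concern in the reply above can be made concrete with a little arithmetic. The sketch below assumes the rule of thumb from the comment quoted in the patch (queue count multiplied by ring descriptors per queue, 128*512); the function name mbufs_per_port is hypothetical and is not part of examples/vmdq/main.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rough mbuf requirement for one port, assuming one
 * mbuf per ring descriptor per queue (queue_nb * ring_descriptors_nb, as
 * in the comment quoted in the patch). */
uint32_t
mbufs_per_port(uint32_t nb_queues, uint32_t nb_desc_per_queue)
{
	assert(nb_queues > 0 && nb_desc_per_queue > 0);
	/* Refuse inputs whose product would overflow 32 bits. */
	assert(nb_queues <= UINT32_MAX / nb_desc_per_queue);
	return nb_queues * nb_desc_per_queue;
}
```

Under this rule, 128 queues with 512 descriptors need 65536 mbufs, while 1024 queues need 524288, so a pool sized for 128 queues would be eight times too small if all 1024 queues were actually set up.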

[dpdk-dev] [PATCH] testpmd: modify the mac of csum forwarding

2015-10-13 Thread Qiu, Michael

Hi, Thomas

Any comments on this patch? Is it suitable for DPDK?

Thanks,
Michael
On 2015/8/26 14:12, Liu, Jijiang wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael Qiu
>> Sent: Friday, August 07, 2015 11:29 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH] testpmd: modify the mac of csum forwarding
>>
>> For some Ethernet switches, such as Intel RRC, all packets forwarded by
>> DPDK are dropped on the switch side, so the packet generator never
>> receives them.
>>
>> Signed-off-by: Michael Qiu 
>> ---
>>  app/test-pmd/csumonly.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
>> index 1bf3485..bf8af1d 100644
>> --- a/app/test-pmd/csumonly.c
>> +++ b/app/test-pmd/csumonly.c
>> @@ -550,6 +550,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>>   * and inner headers */
>>
>>  eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
>> +ether_addr_copy(&peer_eth_addrs[fs->peer_addr],
>> +&eth_hdr->d_addr);
>> +ether_addr_copy(&ports[fs->tx_port].eth_addr,
>> +&eth_hdr->s_addr);
>>  parse_ethernet(eth_hdr, &info);
>>  l3_hdr = (char *)eth_hdr + info.l2_len;
>>
>> --
>> 1.9.3
> The change will affect the csum fwd performance.
> But I also think the change is necessary, or we cannot use csumonly fwd mode 
> in guest?
>
> Acked-by: Jijiang Liu 
>
>

[dpdk-dev] [PATCH] librte_eal: Fix wrong header file for old gcc version

2015-10-13 Thread Qiu, Michael

Hi, all

Any comments on this?

Thanks,
Michael
On 2015/9/25 10:56, Qiu, Michael wrote:
> On 2015/9/7 22:46, Thomas Monjalon wrote:
>> 2015-08-24 17:22, Michael Qiu:
>>> For __SSE3__, the corresponding header file should be pmmintrin.h,
>>> tmmintrin.h works for __SSSE3__.
>> Please could you better explain the difference and what is exactly the bug
>> being fixed?
> It should solve this issue:
>
> [dpdk-dev] DPDK 2.1.0 build error: inlining failed in call to always_inline
>
> /usr/lib/gcc/x86_64-redhat-linux/4.9.2/include/tmmintrin.h:185:1: error:
> inlining failed in call to always_inline '_mm_alignr_epi8': target
> specific option mismatch
>  _mm_alignr_epi8(__m128i __X, __m128i __Y, const int __N)
>  ^
> The AMD cpu flags:
>
> flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
> fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
> nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm
> cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
> osvw ibs skinit wdt cpb hw_pstate npt lbrv svm_lock nrip_sa
>
>
> "_mm_alignr_epi8" only works with SSSE3 or later,
> but this AMD CPU does not support that. The function was wrongly
> called because of the wrong header file.
>
> Thanks,
> Michael 
>
>
>> Thanks
>>
>>
>
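
The header distinction discussed in this thread can be illustrated with a minimal sketch. It assumes a GCC-compatible compiler that defines __SSE3__ and __SSSE3__ when the corresponding instruction sets are enabled; the names simd_level and shift_in_4_bytes are hypothetical and are not taken from librte_eal.

```c
/* pmmintrin.h declares the SSE3 intrinsics; tmmintrin.h declares the SSSE3
 * ones, including _mm_alignr_epi8. Include each header only when the
 * compiler has the matching instruction set enabled. */
#ifdef __SSE3__
#include <pmmintrin.h>
#endif
#ifdef __SSSE3__
#include <tmmintrin.h>
#endif

/* Hypothetical helper: 2 if the build targets SSSE3, 1 if only SSE3,
 * 0 if neither is enabled. */
int
simd_level(void)
{
#if defined(__SSSE3__)
	return 2;
#elif defined(__SSE3__)
	return 1;
#else
	return 0;
#endif
}

#ifdef __SSSE3__
/* Compiled only when SSSE3 is enabled, so the always_inline intrinsic is
 * never instantiated for a target that lacks the instruction. */
__m128i
shift_in_4_bytes(__m128i hi, __m128i lo)
{
	return _mm_alignr_epi8(hi, lo, 4);
}
#endif
```

On a CPU like the AMD part listed above, which reports pni (SSE3) but not ssse3, a native build would be expected to take the middle branch, and the _mm_alignr_epi8 wrapper would not be compiled at all.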

[dpdk-dev] IXGBE RX packet loss with 5+ cores

2015-10-13 Thread Sanford, Robert

I'm hoping that someone (perhaps at Intel) can help us understand
an IXGBE RX packet loss issue we're able to reproduce with testpmd.

We run testpmd with various numbers of cores. We offer line-rate
traffic (~14.88 Mpps) to one ethernet port, and forward all received
packets via the second port.
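
For reference, the ~14.88 Mpps figure is presumably the 10 Gb/s line rate for minimum-size (64-byte) frames. A minimal sketch of that calculation follows; the function name line_rate_pps is hypothetical, and the 20 bytes of per-frame wire overhead are the standard Ethernet preamble, start-of-frame delimiter, and inter-frame gap.

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame bytes on the wire that are not part of the Ethernet frame:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20

/* Packets per second a link can carry for a given frame size. The frame
 * size includes the 4-byte FCS, e.g. 64 for minimum-size frames. */
uint64_t
line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	assert(frame_bytes > 0);
	return link_bps / (((uint64_t)frame_bytes + WIRE_OVERHEAD_BYTES) * 8);
}
```

For a 10 Gb/s link and 64-byte frames this gives 10,000,000,000 / 672, or about 14.88 million packets per second.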

When we configure 1, 2, 3, or 4 cores (per port, with the same number of
RX queues per port), there is no RX packet loss. When we configure 5 or
more cores, we observe the following packet loss (approximate):
 5 cores - 3% loss
 6 cores - 7% loss
 7 cores - 11% loss
 8 cores - 15% loss
 9 cores - 18% loss

All of the "lost" packets are accounted for in the device's Rx Missed
Packets Count register (RXMPC[0]). Quoting the datasheet:
 "Packets are missed when the receive FIFO has insufficient space to
 store the incoming packet. This might be caused due to insufficient
 buffers allocated, or because there is insufficient bandwidth on the
 IO bus."

RXMPC, and our use of API rx_descriptor_done to verify that we don't
run out of mbufs (discussed below), lead us to theorize that packet
loss occurs because the device is unable to DMA all packets from its
internal packet buffer (512 KB, reported by register RXPBSIZE[0])
before overrun.

Questions
=========
1. The 82599 device supports up to 128 queues. Why do we see trouble
with as few as 5 queues? What could limit the system (and one port
controlled by 5+ cores) from receiving at line-rate without loss?

2. As far as we can tell, the RX path only touches the device
registers when it updates a Receive Descriptor Tail register (RDT[n]),
roughly every rx_free_thresh packets. Is there a big difference
between one core doing this and N cores doing it 1/N as often?

3. Do CPU reads/writes from/to device registers have a higher priority
than device reads/writes from/to memory? Could the former transactions
(CPU <-> device) significantly impede the latter (device <-> RAM)?
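
To put question 2 in numbers, here is a rough sketch of how often each queue would write its tail register. It is only a model: it assumes traffic is spread evenly over the queues and that every queue writes RDT exactly once per rx_free_thresh packets. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each queue writes its RDT once per rx_free_thresh
 * received packets, and traffic is spread evenly over the queues. */
uint64_t
rdt_writes_per_sec_per_queue(uint64_t total_pps, uint32_t nb_queues,
			     uint32_t rx_free_thresh)
{
	assert(nb_queues > 0 && rx_free_thresh > 0);
	return total_pps / nb_queues / rx_free_thresh;
}
```

With 14,880,952 packets per second and rx_free_thresh = 512 (the --rxfreet value used in the test), one queue would write RDT about 29,064 times per second, and each of five queues about 5,812 times per second. Under this model the total number of register writes is nearly the same in both cases; only their distribution across cores changes.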

Thanks in advance for any help you can provide.



Testpmd Command Line
====================

Here is an example of how we run testpmd:

# socket 0 lcores: 0-7, 16-23
N_QUEUES=5
N_CORES=10

./testpmd -c 0x003e013e -n 2 \
 --pci-whitelist "01:00.0" --pci-whitelist "01:00.1" \
 --master-lcore 8 -- \
 --interactive --portmask=0x3 --numa --socket-num=0 --auto-start \
 --coremask=0x003e003e \
 --rxd=4096 --txd=4096 --rxfreet=512 --txfreet=512 \
 --burst=128 --mbcache=256 \
 --nb-cores=$N_CORES --rxq=$N_QUEUES --txq=$N_QUEUES
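
The hexadecimal core masks in the command above are easier to check once decoded into lcore ids. The following sketch does that; the helper names coremask_count and coremask_lcores are hypothetical and are not part of testpmd or the EAL.

```c
#include <assert.h>
#include <stdint.h>

/* Count the lcores selected by a core mask (one bit per lcore). */
unsigned
coremask_count(uint64_t mask)
{
	unsigned n = 0;

	for (; mask != 0; mask &= mask - 1)	/* clear the lowest set bit */
		n++;
	return n;
}

/* Write the selected lcore ids into ids[] (capacity max_ids) in ascending
 * order; returns how many ids were written. */
unsigned
coremask_lcores(uint64_t mask, unsigned *ids, unsigned max_ids)
{
	unsigned n = 0;

	assert(ids != 0);
	for (unsigned bit = 0; bit < 64 && n < max_ids; bit++)
		if (mask & (UINT64_C(1) << bit))
			ids[n++] = bit;
	return n;
}
```

Decoding -c 0x003e013e yields lcores 1-5, 8, and 17-21 (eleven lcores, including master lcore 8), and --coremask=0x003e003e yields lcores 1-5 and 17-21, which matches N_CORES=10.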


Test machines
=============
* We performed most testing on a system with two E5-2640 v3
(Haswell 2.6 GHz 8 cores) CPUs, 64 GB 1866 MHz RAM, TYAN S7076 mobo.
* We obtained similar results on a system with two E5-2698 v3
(Haswell 2.3 GHz 16 cores) CPUs, 64 GB 2133 MHz RAM, Dell R730.
* DPDK 2.1.0, Linux 2.6.32-504.23.4

Intel 10GB adapters
===================
All ethernet adapters are 82599_SFP_SF2, vendor 8086, device 154D,
svendor 8086, sdevice 7B11.


Other Details and Ideas we tried
================================

* Make sure that all cores, memory, and ethernet ports in use are on
the same NUMA socket.

* Modify testpmd to insert CPU delays in the forwarding loop, to
target some average number of RX packets that we reap per rx_pkt_burst
(e.g., 75% of burst).

* We configured the RSS redirection table such that all packets go to
one RX queue. In this case, there was NO packet loss (with any number
of RX cores), as the ethernet and core activity is very similar to
using only one RX core.

* When rx_pkt_burst returns a full burst, look at the subsequent RX
descriptors, using a binary search of calls to rx_descriptor_done, to
see whether the RX desc array is close to running out of new buffers.
The answer was: No, none of the RX queues has more than 100 additional
packets "done" (when testing with 5+ cores).

* Increase testpmd config params, e.g., --rxd, --rxfreet, --burst,
--mbcache, etc. These result in very small improvements, i.e., slight
reduction of packet loss.
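
The binary search over rx_descriptor_done mentioned in the list above can be sketched as follows. This is not the code used in the test: the descriptor query is replaced by a mock (mock_descriptor_done) so the example stands alone, and it assumes that completed descriptors form a contiguous run starting at the next descriptor to be read.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the driver query. This mock queue simply records how many
 * descriptors, counted from the next one to be read, are already done. */
struct mock_rxq {
	uint16_t nb_done;	/* offsets 0 .. nb_done-1 are done */
};

int
mock_descriptor_done(const struct mock_rxq *q, uint16_t offset)
{
	return offset < q->nb_done;
}

/* Count completed descriptors among the first nb_desc offsets, assuming
 * done descriptors form a contiguous prefix. Uses O(log nb_desc) queries
 * instead of a linear scan. */
uint16_t
count_done(const struct mock_rxq *q, uint16_t nb_desc)
{
	uint16_t lo = 0, hi = nb_desc;	/* the answer lies in [lo, hi] */

	assert(q != 0);
	while (lo < hi) {
		uint16_t mid = (uint16_t)(lo + (hi - lo) / 2);

		if (mock_descriptor_done(q, mid))
			lo = (uint16_t)(mid + 1);	/* 0 .. mid are done */
		else
			hi = mid;	/* first not-done is at or before mid */
	}
	return lo;
}
```

With a 4096-entry ring, this needs about 12 queries per check, which keeps the measurement cheap enough to run inside the forwarding loop.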


Other Observations
==================
* Some IXGBE RX/TX code paths do not follow (my interpretation of) the
documented semantics of the rx/tx packet burst APIs. For example,
invoke rx_pkt_burst with nb_pkts=64, and it returns 32, even when more
RX packets are available, because the code path is optimized to handle
a burst of 32. The same thing may be true in the tx_pkt_burst code
path.

To allow us to run testpmd with --burst greater than 32, we worked
around these limitations by wrapping the calls to rx_pkt_burst and
tx_pkt_burst with do-whiles that continue while rx/tx burst returns
32 and we have not yet satisfied the desired burst count.
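
A minimal sketch of that do-while workaround follows. The receive function is a mock that caps each call at 32 packets, standing in for the driver path described above; the names mock_rx_burst and rx_burst_full are hypothetical. In real code the wrapper would also advance the mbuf array pointer by the running total on each call.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_MAX_PER_CALL 32	/* per-call cap described in the post */

/* Mock port: only tracks how many packets are waiting to be received. */
struct mock_port {
	uint32_t pending;
};

/* Mock receive function: returns at most 32 packets per call, even if
 * more were requested and more are pending. */
uint16_t
mock_rx_burst(struct mock_port *p, uint16_t nb_pkts)
{
	uint16_t n = nb_pkts;

	if (n > MOCK_MAX_PER_CALL)
		n = MOCK_MAX_PER_CALL;
	if (n > p->pending)
		n = (uint16_t)p->pending;
	p->pending -= n;
	return n;
}

/* Keep calling while the driver returns exactly its per-call cap and the
 * requested burst has not yet been satisfied. */
uint16_t
rx_burst_full(struct mock_port *p, uint16_t nb_pkts)
{
	uint16_t total = 0, n;

	assert(p != 0);
	do {
		n = mock_rx_burst(p, (uint16_t)(nb_pkts - total));
		total = (uint16_t)(total + n);
	} while (n == MOCK_MAX_PER_CALL && total < nb_pkts);

	return total;
}
```

A short return (fewer than 32 packets) is treated as "queue drained", so the loop never spins on an empty queue.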

The point here is that IXGBE's rx/tx packet burst API behavior is
misleading! The application developer should not need to know that
certain drivers or driver paths do not always complete an entire
burst, even though they could have.

* We naïvely believed that if a run-to-completion model uses too
many cycles per packet, we could just spread it over more cores.
If there is some inherent lim
