[dpdk-dev] DPDK User Space: Session onUseability and Ease of Use
Thanks John for the summary.

2015-10-13 16:36, Mcnamara, John:
> - Move the EAL to the kernel.

Please explain what you mean here. It's difficult to imagine.

> * Latest version of the docs. - Needs support from 6Wind.

OK
It's as simple as a git hook. Anybody to write and test one locally?
The script could be hosted in the future website git repo.

> * Distributed testing. - Needs support from Intel initially.
>   Some of this is already being rolled out on the
>   test-report at dpdk.org list for Intel hardware:
>   http://dpdk.org/ml/archives/test-report/. Other hardware
>   vendors could use the same automated test framework to host
>   something similar.

IBM has first started a daily compilation test. Others are welcome.
We had a small session dedicated to this topic during the useability session.

> * Create a User Mailing List. - Needs support from 6Wind.

OK
Name: user at dpdk.org or other suggestion?

> * Make the dpdk.org website patchable. - Needs support from
>   6Wind.

OK
Needs a maintainer, a name for the git repo, the mailing list and the patchwork.

> * Add a Contributing Guide. - I will submit a doc patch.
>
> * Add a README .txt or .1st to the root dir. - I will submit a
>   doc patch.

Thanks

> * Too much duplicated code in the PMDs. - Any volunteers to
>   refactor common PMD code up into the ethdev layer?

We need more details. Please start a new thread or a RFC.

> * Logging and debugging via a secondary process. Any volunteers
>   to add a sample app that demonstrates the technique?

There is already one:
http://dpdk.org/browse/dpdk/tree/app/proc_info/main.c#n289
[dpdk-dev] Question about unsupported transceivers
I believe I've discovered my problem: https://gist.github.com/AlexForster/0fb4699bcdf196cf5462 As mentioned previously, I have two X520-Q1 cards installed. It appears that initialization of the first card obeys allow_unsupported_sfp=1, but initialization of the second card does not. Is this a bug, or is there a way to work around this that I'm not aware of? Alex Forster
[dpdk-dev] [PATCH v5 resend 07/12] virtio: resolve for control queue
On Mon, Oct 12, 2015 at 10:58:17PM +0200, Steffen Bauch wrote: > On 10/12/2015 10:39 AM, Yuanhan Liu wrote: > >Hi, > > > >I just recognized that this dead loop is the same one that I have > >experienced (see > >http://dpdk.org/ml/archives/dev/2015-October/024737.html for > >reference). Just applying the changes in this patch (only 07/12) > >will not fix the dead loop at least in my setup. > >Try to enable CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT, and dump more log? > I enabled the additional debug output. First try was without any > additional changes in master, but it blocked also. Second try was > with > > [dpdk-dev] [PATCH v6 06/13] virtio: read virtio_net_config correctly > > applied, but same result. > > If you want to recreate my setup, just follow instructions in > > http://dpdk.org/ml/archives/dev/2015-October/024737.html > > > vagrant at vagrant-ubuntu-vivid-64:~/dpdk$ git status > On branch master > Your branch is up-to-date with 'origin/master'. > Changes not staged for commit: > (use "git add ..." to update what will be committed) > (use "git checkout -- ..." to discard changes in working directory) > > modified: config/defconfig_x86_64-native-linuxapp-gcc > > .. Don't have clear clue there. But you could try Huawei's solution first. It's likely that it will fix your problem. If not, would you please try to reproduce it with qemu (you were using virtualbox, right)? And then dump the whoe command line here so that I can try to reproduce and debug it on my side. Sorry that I don't use virtualbox, as well as vagrant. --yliu > > vagrant at vagrant-ubuntu-vivid-64:~/dpdk/x86_64-native-linuxapp-gcc/app$ > sudo ./testpmd -b :00:03.0 -c 3 -n 1 -- -i > EAL: Detected lcore 0 as core 0 on socket 0 > EAL: Detected lcore 1 as core 1 on socket 0 > EAL: Support maximum 128 logical core(s) by configuration. > EAL: Detected 2 lcore(s) > EAL: VFIO modules not all loaded, skip VFIO support... > EAL: Setting up physically contiguous memory... 
> EAL: Ask a virtual area of 0x40 bytes > EAL: Virtual area found at 0x7f2a3a80 (size = 0x40) > EAL: Ask a virtual area of 0xe00 bytes > EAL: Virtual area found at 0x7f2a2c60 (size = 0xe00) > EAL: Ask a virtual area of 0x30c0 bytes > EAL: Virtual area found at 0x7f29fb80 (size = 0x30c0) > EAL: Ask a virtual area of 0x40 bytes > EAL: Virtual area found at 0x7f29fb20 (size = 0x40) > EAL: Ask a virtual area of 0xa0 bytes > EAL: Virtual area found at 0x7f29fa60 (size = 0xa0) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7f29fa20 (size = 0x20) > EAL: Requesting 512 pages of size 2MB from socket 0 > EAL: TSC frequency is ~2198491 KHz > EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using > unreliable clock cycles ! > EAL: Master lcore 0 is ready (tid=3c9938c0;cpuset=[0]) > EAL: lcore 1 is ready (tid=fa1ff700;cpuset=[1]) > EAL: PCI device :00:03.0 on NUMA socket -1 > EAL: probe driver: 1af4:1000 rte_virtio_pmd > EAL: Device is blacklisted, not initializing > EAL: PCI device :00:08.0 on NUMA socket -1 > EAL: probe driver: 1af4:1000 rte_virtio_pmd > PMD: parse_sysfs_value(): parse_sysfs_value(): cannot open sysfs > value /sys/bus/pci/devices/:00:08.0/uio/uio0/portio/port0/size > PMD: virtio_resource_init_by_uio(): virtio_resource_init_by_uio(): > cannot parse size > PMD: virtio_resource_init_by_ioports(): PCI Port IO found > start=0xd040 with size=0x20 > PMD: virtio_negotiate_features(): guest_features before negotiate = cf8020 > PMD: virtio_negotiate_features(): host_features before negotiate = 410fdda3 > PMD: virtio_negotiate_features(): features after negotiate = f8020 > PMD: eth_virtio_dev_init(): PORT MAC: 08:00:27:CC:DE:CD > PMD: eth_virtio_dev_init(): VIRTIO_NET_F_MQ is not supported > PMD: virtio_dev_cq_queue_setup(): >> > PMD: virtio_dev_queue_setup(): selecting queue: 2 > PMD: virtio_dev_queue_setup(): vq_size: 16 nb_desc:0 > PMD: virtio_dev_queue_setup(): vring_size: 4228, rounded_vring_size: 8192 > PMD: 
virtio_dev_queue_setup(): vq->vq_ring_mem: 0x67b54000 > PMD: virtio_dev_queue_setup(): vq->vq_ring_virt_mem: 0x7f29fb354000 > PMD: eth_virtio_dev_init(): config->max_virtqueue_pairs=1 > PMD: eth_virtio_dev_init(): config->status=1 > PMD: eth_virtio_dev_init(): PORT MAC: 08:00:27:CC:DE:CD > PMD: eth_virtio_dev_init(): hw->max_rx_queues=1 hw->max_tx_queues=1 > PMD: eth_virtio_dev_init(): port 0 vendorID=0x1af4 deviceID=0x1000 > PMD: virtio_dev_vring_start(): >> > EAL: PCI device :00:09.0 on NUMA socket -1 > EAL: probe driver: 1af4:1000 rte_virtio_pmd > PMD: parse_sysfs_value(): parse_sysfs_value(): cannot open sysfs > value /sys/bus/pci/devices/:00:09.0/uio/uio1/portio/port0/size > PMD: virtio_resource_init_by_uio(): virtio_resource_init_by_uio(): > cannot parse size > PMD: virtio_resource_init_by_ioports(): PCI Port IO found > start=0xd060 with size=0x20 > PMD: virtio_negotiate_features(): guest_feature
[dpdk-dev] [PATCH] rte_alarm: modify it to make it not to be affected by discontinuous jumps in the system time
On Fri, 5 Jun 2015 10:46:36 +0800
Wen-Chi Yang wrote:

> Due to eal_alarm_callback() and rte_eal_alarm_set() use gettimeofday()
> to get the current time, and gettimeofday() is affected by jumps.
>
> For example, set up a rte_alarm which will be triggered next second (
> current time + 1 second) by rte_eal_alarm_set(). And the callback
> function of this rte_alarm sets up another rte_alarm which will be
> triggered next second (current time + 2 second).
> Once we change the system time when the callback function is triggered,
> it is possible that rte alarm functionalities work out of expectation.
>
> Replace gettimeofday() with clock_gettime(CLOCK_MONOTONIC_RAW, &now)
> could avoid this phenomenon.
>
> Signed-off-by: Wen-Chi Yang

Agreed, this should be applied.
Does the BSD version have the same problem?

Acked-by: Stephen Hemminger
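
A minimal sketch of the idea behind the patch, not the rte_alarm code itself: deadlines are computed from a monotonic clock, so a change to the wall-clock time cannot move them. The helper names (now_us, alarm_deadline_us, alarm_expired) are hypothetical, and since CLOCK_MONOTONIC_RAW is Linux-specific the sketch assumes a fallback to CLOCK_MONOTONIC elsewhere.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Prefer the raw monotonic clock where it exists (Linux); otherwise use
 * the portable POSIX monotonic clock. Neither follows wall-clock changes. */
#ifdef CLOCK_MONOTONIC_RAW
#define ALARM_CLOCK CLOCK_MONOTONIC_RAW
#else
#define ALARM_CLOCK CLOCK_MONOTONIC
#endif

/* Hypothetical helper: current monotonic time in microseconds. */
static uint64_t
now_us(void)
{
	struct timespec ts;

	clock_gettime(ALARM_CLOCK, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical helper: absolute deadline for an alarm due in delay_us. */
static uint64_t
alarm_deadline_us(uint64_t delay_us)
{
	return now_us() + delay_us;
}

/* Returns 1 once the deadline has passed, 0 otherwise. */
static int
alarm_expired(uint64_t deadline_us)
{
	return now_us() >= deadline_us;
}
```

Because both the deadline and the later comparison read the same monotonic clock, setting the system time between the two calls has no effect on when the alarm is considered due.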
[dpdk-dev] [PATCH v3] Implement memcmp using Intel SIMD instrinsics.
On Mon, 18 May 2015 13:01:43 -0700
Ravi Kerur wrote:

> This patch implements memcmp and use librte_hash as the first candidate
> to use rte_memcmp which is implemented using AVX/SSE intrinsics.
>
> Tested with GCC(4.8.2) and Clang(3.4-1) compilers and both tests show better
> performance on Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04
> x86_64 shows when compared to memcmp.
>
> Changes in v3:
> Implement complete memcmp functionality.
> Implement functional and performance tests and add it to
> "make test" infrastructure code.
>
> Changes in v2:
> Modified code to support only up to 64 bytes as that's the max bytes
> used by hash for comparison.
>
> Changes in v1:
> Initial changes to support memcmp with support up to 128 bytes.
>
> Signed-off-by: Ravi Kerur

I think this idea is best taken over to glibc, not here.
The issue is that GCC's default inline version of memcmp is bad, and that is
what needs to be fixed. See the later discussion in this email thread with
the GCC intrinsic developer.
[dpdk-dev] propose a solution for mapping same virtual address space to asymmetric processes
Hi Bruce,

Using "--base-virtaddr" requires knowing in advance the address at which
the huge pages should be mapped, and that address might vary between
different uses of the application.

We suggest a more generic solution which won't require any previous
knowledge and will be as "bullet proof" as possible.

Regards,
Nissim

On Oct 13, 2015 18:49, "Richardson, Bruce" wrote:

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Nissim Nisimov
> Sent: Tuesday, October 13, 2015 4:40 PM
> To: 'dev at dpdk.org'
> Subject: [dpdk-dev] propose a solution for mapping same virtual address
> space to asymmetric processes
>
> Hi all,
>
> The below will try to suggest a modification to the initialization of
> Environment Abstraction Layer (AKA EAL) so it will be able to allocate
> memory zones from same virtual memory addresses even if the primary
> process is not similar to the secondary processes.
>
> Problem:
> The DPDK Primary/Secondary model requires that the exact same hugepage
> memory mappings be present in all applications.
> An issue may occur when the Primary and secondary processes are not
> symmetric in such way that the code has big differences (for example,
> Primary process is a traffic distributor and secondary is a worker).
> The result may be that specific virtual address region in the first
> process won't be available in the second process.
>
>
> Suggested solution:
> Map all related rte and uio sections somewhere close to the end of huge
> pages memory (that means rte_eal_memory_init() should be called before
> rte_config_init() in primary process). According to our observations there
> is a higher probability of success when allocating the above sections
> after huge pages section (actually uio is already allocated after the huge
> pages area)
>
> It solved our problem when trying to work with a primary traffic
> distributor which is a very "light" process and few secondary worker
> processes.
>
>
> Please share your thoughts on this before I will try to commit our patch
> for review
>
> Thanks,
> Nissim

Hi, out of interest, have you tried fixing the issue using the
"--base-virtaddr" EAL flag to hint a base address to the primary process? It
was put into the code some time ago to help solve exactly this problem.

/Bruce
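
The underlying problem can be illustrated outside DPDK with plain mmap(). This is only a sketch under stated assumptions: can_map_at is a hypothetical helper, and the real EAL mapping logic is more involved. It asks the kernel for an anonymous mapping at a hinted address without MAP_FIXED and reports whether the hint was honoured. If the range is already occupied in the calling process, as can happen in a secondary process whose layout differs from the primary's, the kernel picks another address and the check fails.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON	/* BSD spelling */
#endif

/*
 * Hypothetical helper: ask the kernel for an anonymous mapping at 'hint'
 * WITHOUT MAP_FIXED. Returns 1 if the mapping landed exactly at the hint,
 * 0 if the kernel chose another address or the mapping failed. Nothing
 * stays mapped afterwards in either case.
 */
static int
can_map_at(void *hint, size_t len)
{
	void *addr = mmap(hint, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (addr == MAP_FAILED)
		return 0;
	munmap(addr, len);
	return addr == hint;
}
```

A base-address hint and the proposed reordering both aim at the same goal: choosing a region that is likely to be free in every process, so that a check like this one succeeds in the secondary as well.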
[dpdk-dev] Question about unsupported transceivers
Hi everybody, apologies for coming to this list with a tech support question.
I'm completely stumped about using non-Intel transceivers with DPDK.

testpmd is bailing here:

PMD: eth_ixgbe_dev_init(): Unsupported SFP+ Module
PMD: eth_ixgbe_dev_init(): Hardware Initialization Failure: -19

My box is an x64 server running Debian 8 (Jessie) with two X520-Q1 cards
using Finisar QSFP transceivers.

Here are the things that I've tried so far, unsuccessfully:

* Added CONFIG_RTE_LIBRTE_IXGBE_ALLOW_UNSUPPORTED_SFP=y to
  config/defconfig_x86_64-native-linuxapp-gcc and rebuilt/reinstalled/rebooted
* Tried various incantations of modprobe/insmod with
  allow_unsupported_sfp=1 appended
* Added "options ixgbe allow_unsupported_sfp=1" to
  /etc/modprobe.d/dpdk.conf and rebuilt the initrd

Can anybody lead me in the right direction here? It seems like a lot of the
information floating around about this issue may be out of date.

Alex Forster
[dpdk-dev] [PATCH] eal/bsd: reinitialize optind and optreset to 1
Actually, this is a good opportunity to fix a bug that's been in this code
forever: it shouldn't be resetting optind to some arbitrary value: it should
be saving optind (and optarg and optopt) at the beginning, initializing
optind to 1 before calling getopt_long(), then restoring all the values
after. (And, from what you're saying, optreset should be handled the same as
optind.)

This avoids broken behavior if rte_eal_init() is called by code that's in
the middle of using getopt() to parse its own unrelated argc/argv parameters.

-don provan
dprovan at bivio.net

-----Original Message-----
From: Tiwei Bie [mailto:b...@mail.ustc.edu.cn]
Sent: Tuesday, October 13, 2015 1:54 AM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH] eal/bsd: reinitialize optind and optreset to 1

The variable optind must be reinitialized to 1 in order to skip over
argv[0] on FreeBSD. Because getopt() on FreeBSD will return -1 when
it meets an argument which doesn't start with '-'.

The variable optreset is provided on FreeBSD to indicate the additional
set of calls to getopt(). So, also reinitialize it to 1.

Signed-off-by: Tiwei Bie
---
 lib/librte_eal/bsdapp/eal/eal.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 1b6f705..35feaee 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -334,7 +334,8 @@ eal_log_level_parse(int argc, char **argv)
 			break;
 	}
 
-	optind = 0; /* reset getopt lib */
+	optind = 1; /* reset getopt lib */
+	optreset = 1;
 }
 
 /* Parse the argument given in the command line of the application */
@@ -403,7 +404,8 @@ eal_parse_args(int argc, char **argv)
 	if (optind >= 0)
 		argv[optind-1] = prgname;
 	ret = optind-1;
-	optind = 0; /* reset getopt lib */
+	optind = 1; /* reset getopt lib */
+	optreset = 1;
 
 	return ret;
 }
--
2.6.0
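
The save-and-restore approach suggested in this reply could look roughly like the sketch below. It is not the EAL code: count_verbose_flags is a hypothetical stand-in for a library-side parser, and the optreset handling is guarded for FreeBSD only. One caveat to keep in mind is that getopt() implementations also keep private scanning state that cannot be saved this way, so the sketch assumes it is called between the caller's own getopt() calls, not in the middle of a clustered option such as "-abc".

```c
#include <getopt.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a library-side parser: counts "-v" flags in
 * its own argv while leaving the caller's getopt() globals untouched.
 */
static int
count_verbose_flags(int argc, char **argv)
{
	/* Save the caller's visible getopt state. */
	const int saved_optind = optind;
	const int saved_optopt = optopt;
	char *const saved_optarg = optarg;
	int count = 0;
	int opt;

	optind = 1;		/* start scanning right after argv[0] */
#if defined(__FreeBSD__)
	optreset = 1;		/* tell FreeBSD's getopt a new scan begins */
#endif

	while ((opt = getopt(argc, argv, "v")) != -1) {
		if (opt == 'v')
			count++;
	}

	/* Restore the caller's state so its own scan can continue. */
	optind = saved_optind;
	optopt = saved_optopt;
	optarg = saved_optarg;

	return count;
}
```

With this pattern the library never leaves optind at an arbitrary value, which is the behaviour the reply objects to in the existing code.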
[dpdk-dev] [PATCH v8 0/9] Dynamic memzones
On Tue, 14 Jul 2015 09:57:04 +0100
Sergio Gonzalez Monroy wrote:

> Current implementation allows reserving/creating memzones but not the opposite
> (unreserve/free). This affects mempools and other memzone based objects.
>
> From my point of view, implementing free functionality for memzones would look
> like malloc over memsegs.
> Thus, this approach moves malloc inside eal (which in turn removes a circular
> dependency), where malloc heaps are composed of memsegs.
> We keep both malloc and memzone APIs as they are, but memzones allocate its
> memory by calling malloc_heap_alloc.
> Some extra functionality is required in malloc to allow for boundary
> constrained memory requests.
> In summary, currently malloc is based on memzones, and with this approach
> memzones are based on malloc.
>
> v8:
> - Rebase against current HEAD to factor for changes made by new Tile-Gx arch

Following the same rules as the kernel: you need to fix the 32-bit build
and resubmit the whole series.

Thomas, this patchset should be marked "Changes requested" in patchwork.
[dpdk-dev] [PATCH v2] Clean up rte_memcpy.h file
On Mon, 20 Apr 2015 13:33:29 -0700 Ravi Kerur wrote: > Remove unnecessary type casting in functions. > > Tested on Ubuntu (14.04 x86_64) with "make test". > "make test" results match the results with baseline. > "Memcpy perf" results match the results with baseline. > > Signed-off-by: Ravi Kerur Getting rid of casts looks good. My guess is no one reviewed it because no one is using rte_memcpy much.. Acked-by: Stephen Hemminger
[dpdk-dev] [PATCH 2/2] virtio: change io privilege level as early as possible
On Thu, 1 Oct 2015 07:25:45 -0400 Neil Horman wrote: > On Wed, Sep 30, 2015 at 05:37:05PM +0200, Thomas Monjalon wrote: > > 2015-09-30 10:52, Neil Horman: > > > On Wed, Sep 30, 2015 at 10:28:53AM +0200, David Marchand wrote: > > > > On Tue, Sep 29, 2015 at 9:25 PM, Stephen Hemminger < > > > > stephen at networkplumber.org> wrote: > > > > > > > > > On Tue, 10 Mar 2015 09:14:28 -0400 > > > > > Neil Horman wrote: > > > > > > I don't see how this works for all cases. The constructor is called > > > > > once when > > > > > > the library is first loaded. What if you have multiple independent > > > > > (i.e. not > > > > > > forked children) processes that are using the dpdk in parallel? > > > > > > Only the > > > > > > process that triggered the library load will have io permissions set > > > > > > appropriately. I think what you need is to have every application > > > > > > that > > > > > expects > > > > > > to call through the transmit path or poll the receive path call > > > > > > iopl, > > > > > which I > > > > > > think speaks to having this requirement documented, so each > > > > > > application > > > > > can call > > > > > > iopl prior to calling fork/daemonize/etc. > > > > > > > > > > > > > > > > I am still seeing this problem with DPDK 2.0 and 2.1. > > > > > It seems to me that doing the iopl init in eal_init is the only safe > > > > > way. > > > > > Other workaround is to have application calling iopl_init before > > > > > eal_init > > > > > but that kind of violates the current method of all things being > > > > > initialized by eal_init > > > > > > > > Putting it in the virtio pmd constructor is my preferred solution and we > > > > don't need to pollute the eal for virtio (specific to x86, btw). > > > > > > Preferred solution or not, you can't just call iopl from the constructor, > > > because not all process will get appropriate permissions. It needs to be > > > called > > > by every process. 
> > > What Stephen is saying is that your solution has use cases
> > > for which it doesn't work, and that needs to be solved.
> >
> > I think it may be solved by calling iopl in the constructor.
> > We just need an extra call in rte_virtio_pmd_init() to detect iopl failures.
> > We can also simply move rte_eal_intr_init() after rte_eal_dev_init().
> > Please read my previous post on this topic:
> >
> > http://thread.gmane.org/gmane.comp.networking.dpdk.devel/14761/focus=22341
> >
> > About the multiprocess case, I don't see the problem as the RX/TX and
> > interrupt threads are forked in the rte_eal_init() context which should
> > call iopl even in secondary processes.
> >
>
> I'm not talking about secondary processes here (i.e. processes forked from a
> parent that was the process which initialized the dpdk). I'm referring to two
> completely independent processes, both of which link to and use the dpdk.
>
> Though I think we're saying the same thing. When you say 'constructor' above,
> you don't mean 'constructor' in the strict sense, but rather the pmd init
> routine (the one called from rte_eal_vdev_init and rte_eal_dev_init). If this
> is the case, then yes, that works fine, since each process linking to the DPDK
> will enter those routines and call iopl. In fact, if that's the case, then no
> call is needed in the constructor at all.

I think this patch should be rebased and resubmitted for 2.2.
It fixes a real problem (virtio link state). The driver changed directory
and the patch could be redone to minimize changes.
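
For reference, the application-side workaround mentioned in this thread (calling iopl before EAL initialization, so that it happens in every process) could be sketched as follows. raise_io_privilege is a hypothetical helper, not a DPDK API; iopl() itself is the Linux call on x86 and needs suitable privileges, so the sketch assumes it returns an errno value instead of aborting.

```c
#include <errno.h>

#if defined(__linux__) && (defined(__x86_64__) || defined(__i386__))
#include <sys/io.h>
#define HAVE_IOPL 1
#else
#define HAVE_IOPL 0
#endif

/*
 * Hypothetical helper: raise the I/O privilege level of the caller so
 * that port I/O instructions are permitted. Returns 0 on success or an
 * errno value (for example EPERM without privileges, ENOSYS where
 * iopl() is not available).
 */
static int
raise_io_privilege(void)
{
#if HAVE_IOPL
	if (iopl(3) != 0)
		return errno;
	return 0;
#else
	return ENOSYS;
#endif
}
```

Following the discussion above, an application would call such a helper at the top of main(), before rte_eal_init() creates its threads, and treat a non-zero result as "virtio port I/O will not work".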
[dpdk-dev] [PATCH 1/2] eal/linux: move plugin load to very start of eal init
On Tue, 10 Mar 2015 06:55:41 -0400 Neil Horman wrote: > On Tue, Mar 10, 2015 at 10:08:24AM +0100, David Marchand wrote: > > Hello Neil, > > > > On Mon, Mar 9, 2015 at 4:21 PM, Neil Horman > > wrote: > > > > > On Mon, Mar 09, 2015 at 03:56:38PM +0100, David Marchand wrote: > > > > Loading shared libraries should be done at the very start of eal init so > > > that > > > > the code statically built in dpdk and the code loaded from shared > > > objects is > > > > handled (almost) the same way wrt to call to rte_eal_init(). > > > > The only thing that must be done before is filling the solib_list which > > > is done > > > > by eal_parse_args(). > > > > > > > > > > > > > I don't see anything explicitly wrong with this, but at the same time it > > > doesn't > > > seem to fix anything. Is there a particular bug that you're fixing in > > > relation > > > to your cover letter here? Or is there some expectation that PMD's loaded > > > in > > > this fashion expect the dpdk to be completely uninitalized? That would > > > seem > > > like a strange operational requirement to me. > > > > > > > Well, at first, I wanted to fix the virtio pmd init issue (iopl() not > > called at the right place wrt to other pthreads created in rte_eal_init()). > Ah, this is what you were addressing: > http://dpdk.org/ml/archives/dev/2015-March/014765.html > > > With next patch, this issue is fixed for statically builtin virtio pmd, but > > for virtio pmd as a shared object, the dlopen comes too late. > > So, yes, I moved the dlopen() for this reason. > > > But this doesn't do anything to help you. The goal, according to the above > thread, is to initalize the pmd earlier so that you can call iopl prior to > doing > any forks (so that io privlidges are inherited). But both static and dynamic > pmd have constructors that just register their driver structures. No > initalization happens until rte_eal_dev_init is called. 
> So this movement does nothing to change the time any given drivers init
> routine is called.
>
> > From a more general point of view, since we support both static and dso
> > pmds, I would say that this is more logical to have dlopen comes very
> > early, since static code is "loaded" even earlier : if the current pmds
> > needed more than just register to the driver list, they would already have
> > triggered segfaults and/or bugs.
> >
> No, not really. I suppose it doesn't hurt anything, but moving this earlier in
> a function doesn't really buy you anything, as statically allocate pmds are
> called by the gcc start code prior to an applications main routine running, so
> we're never actually going to get close to parity there, nor do we need to,
> because the actual init happens at rte_eal_dev_init, which is in parity for
> both static and dynamic drivers.
>
> >
> > I know this change comes really late for 2.0.
> > I am open to other ideas but I don't want to see more #ifdef
> > in eal.c (especially for a pmd), this is a non sense.
> >
> > I would say that at least the patch 2 is needed for 2.0 : it fixes the
> > static case, but without patch 1 virtio pmd triggers a segfault on
> > interrupt receipt when built as a dso.
> >
> The static case suffers from problems as well I think, in that its possible to
> architect multiple processes that are not started from fork that use the same
> pmd, which would create the same issue. I think a better course of action
> would be to document the need for an application to call iopl before
> rte_eal_init.

Given all this, I recommend that Thomas not apply this patch.
Please resubmit if there is a real problem with drivers (something in tree).
There are enough other bugs to fix without chasing ghosts.
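
The distinction Neil draws here, registration in a constructor versus real initialization in a later explicit step, can be shown with a small standalone sketch. All names (toy_driver, toy_driver_register, toy_dev_init_all) are hypothetical and much simpler than the DPDK structures; the sketch assumes a GCC or Clang compatible compiler for the constructor attribute.

```c
#include <stddef.h>

/* Hypothetical, simplified driver record; not the real DPDK structure. */
struct toy_driver {
	const char *name;
	int (*init)(void);
	int initialized;
	struct toy_driver *next;
};

static struct toy_driver *driver_list;

/* Registration only links the record into a list; nothing is set up. */
static void
toy_driver_register(struct toy_driver *drv)
{
	drv->next = driver_list;
	driver_list = drv;
}

static int
toy_virtio_init(void)
{
	/* Real device setup (such as an iopl call) would happen here. */
	return 0;
}

static struct toy_driver toy_virtio = {
	.name = "toy_virtio",
	.init = toy_virtio_init,
};

/* Runs before main() for static code, or at dlopen() time for a DSO. */
__attribute__((constructor))
static void
toy_virtio_register(void)
{
	toy_driver_register(&toy_virtio);
}

/* Stand-in for the later, explicit device-init step. */
static int
toy_dev_init_all(void)
{
	int count = 0;

	for (struct toy_driver *d = driver_list; d != NULL; d = d->next) {
		if (d->init() == 0) {
			d->initialized = 1;
			count++;
		}
	}
	return count;
}
```

Whether the constructor runs before main() or at dlopen() time, the driver's init routine runs only when the explicit init step is called, which is why moving the dlopen() earlier does not change when a driver is initialized.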
[dpdk-dev] [PATCH] eal/bsd: reinitialize optind and optreset to 1
The variable optind must be reinitialized to 1 in order to skip over
argv[0] on FreeBSD. Because getopt() on FreeBSD will return -1 when
it meets an argument which doesn't start with '-'.

The variable optreset is provided on FreeBSD to indicate the additional
set of calls to getopt(). So, also reinitialize it to 1.

Signed-off-by: Tiwei Bie
---
 lib/librte_eal/bsdapp/eal/eal.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 1b6f705..35feaee 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -334,7 +334,8 @@ eal_log_level_parse(int argc, char **argv)
 			break;
 	}
 
-	optind = 0; /* reset getopt lib */
+	optind = 1; /* reset getopt lib */
+	optreset = 1;
 }
 
 /* Parse the argument given in the command line of the application */
@@ -403,7 +404,8 @@ eal_parse_args(int argc, char **argv)
 	if (optind >= 0)
 		argv[optind-1] = prgname;
 	ret = optind-1;
-	optind = 0; /* reset getopt lib */
+	optind = 1; /* reset getopt lib */
+	optreset = 1;
 
 	return ret;
 }
--
2.6.0
[dpdk-dev] [PATCH] Found a bug related to getopt() in eal/bsd module
I found a bug when trying to make my DPDK application work on FreeBSD.

The variable optind must be reinitialized to 1 on FreeBSD to skip over
argv[0], because getopt() on FreeBSD returns -1 when it meets an argument
which doesn't start with '-'. This behaviour is implemented by lines 13-17
of the listing below:

01 /*
02  * getopt --
03  *	Parse argc/argv argument vector.
04  */
05 int
06 getopt(int nargc, char * const nargv[], const char *ostr)
07 {
08 	static char *place = EMSG;	/* option letter processing */
09 	char *oli;			/* option letter list index */
10
11 	if (optreset || *place == 0) {	/* update scanning pointer */
12 		optreset = 0;
13 		place = nargv[optind];
14 		if (optind >= nargc || *place++ != '-') {
15 			/* Argument is absent or is not an option */
16 			place = EMSG;
17 			return (-1);
18 		}
19 		..
20 	}
21 	..
22 }

The variable optreset is also provided on FreeBSD to indicate the additional
set of calls to getopt(). So, also reinitialize it to 1.

References:

1. https://svnweb.freebsd.org/base/head/lib/libc/stdlib/getopt.c?view=markup#l70
2. https://www.freebsd.org/cgi/man.cgi?query=getopt&apropos=0&sektion=3&manpath=FreeBSD+11-current&arch=default&format=html

Tiwei Bie (1):
  eal/bsd: reinitialize optind and optreset to 1

 lib/librte_eal/bsdapp/eal/eal.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--
2.6.0
[dpdk-dev] DPDK User Space: Session onUseability and Ease of Use
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas F Herbert
> Sent: Thursday, October 8, 2015 7:30 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] DPDK User Space: Session onUseability and Ease of Use
>
> All:
>
> Captured white board notes from Jim McNamara's session on Useability and
> Ease of Use at DPDK User Space today are here:
>
> http://people.redhat.com/therbert/Useability_and_Ease_of_Use_DPDK_User_Space/

Hi Tom,

Thanks for that.

Here is my summary of the "Usability and Ease of Use" session from memory
and notes. Corrections and additions welcome.

* Latest version of the DPDK docs
  - (From an earlier session). Add a doc/latest build of the docs to
    dpdk.org.

* PMD lite
  - Do we need a lighter PMD model? Perhaps based on the Mellanox model.
  - Vincent suggested we could remove 90% of the code. I'll leave Vincent
    to explain this one.

* OvS issues with Usability
  - Discussion of the DPDK usability issues highlighted on the OvS mailing
    list.
  - http://openvswitch.org/pipermail/dev/2015-August/058814.html

* Distributed testing
  - There should be some form of distributed testing so patches can be
    tested on OSes and hardware that a dev doesn't have.
  - Suggestion to have an Open Lab/Pod similar to the OPNFV model that
    participants can use.

* Debuggability
  - User expectation for tools like tcpdump, ip, ipconfig to work with DPDK
    bound NICs.
  - Easier in a pipeline application where debug is added as a pipeline
    stage.
  - Maybe add debug hooks via rx/tx callbacks.
  - Add/extend a solution based on KNI.
  - Use systemd naming algorithm for KNI.

* Create a User Mailing List
  - An observation was made that the dev at dpdk.org list was very
    developer orientated and patch heavy.
  - Suggestion to add a user at dpdk.org mailing list for people with
    issues or subjects that aren't development related.
  - This seems easy to implement. It may not be well supported however,
    resulting in users cross posting into dev at dpdk.org.
  - Probably worth trying anyway.

* Make the dpdk.org website patchable
  - There are already plans to host the dpdk.org code in a git repo.

* Add a Contributing Guide.
  - We are at the stage where we need one.
  - Suggestion to just use the Kernel guide.
  - Tailor it for DPDK.
  - Also explain the review process, acks, nacks, etc.

* Add a README .txt or .1st to the root dir.
  - This could just include links to the getting started guides and other
    docs. Either to the online docs or how to build the local html
    versions.

* EAL annoyances
  - Move the EAL to the kernel.
  - Have more/better/all default options. EAL figures out its own
    requirements.
  - Have a default for -n.

* Hugepage consumption
  - Do not allow DPDK applications to grab all available hugepages.
  - Issues with running DPDK apps in tandem with other hugepage hungry apps
    such as Java/Eclipse.

* rte_malloc()
  - Don't use rte_malloc() for non critical objects where malloc() would
    do.
  - Suggestion to allow the type of required memory to be specified by
    rte_malloc()-like function.

* The Build system
  - Make install needs to be improved. Doesn't do what the user expects.
  - Use autotools and configure. (There were some objections that this may
    not be an improvement.)
  - Use kconfig.
  - Keep going with what we have now until it gets too unwieldy and needs
    to be changed. Then use kconfig.
  - Add better support for cross compilation. Useful for arm target.

* Should DPDK applications be running as root
  - Clearly not a great option.
  - Currently required due to kernel.

* Mempool debugging
  - We need better tools to debug memory leaks in the mempools.
  - Suggestion to do this via a valgrind plugin.

* Kernel management of drivers

* Too much duplicated code in the PMDs
  - Duplicated code has crept in organically as PMDs have been added.
  - Should be moved up to the ethdev level

* Logging and debugging via a secondary process
  - Not a well known technique but very useful/powerful

* Run DPDK as a daemon.

* Issues with config files
  - Too many options turned off by default: code paths don't get
    compiled/tested.

* More sample apps
  - Some more examples of using secondary processes.

Of these, the following could be addressed in the near term:

* Latest version of the docs. - Needs support from 6Wind.

* Distributed testing. - Needs support from Intel initially.
  Some of this is already being rolled out on the
  test-report at dpdk.org list for Intel hardware:
  http://dpdk.org/ml/archives/test-report/. Other hardware
  vendors could use the same automated test framework to host
  something similar.

* Debuggability. - Need some volunteers or workable suggestions.

* Create a User Mailing List. - Needs support from 6Wind.

* Make the dpdk.org website patchable. - Needs support from
  6Wind.

* Add a Cont
[dpdk-dev] [PATCH 2/4] rte_ring: store memzone pointer inside ring
Hi Bruce, On 09/30/2015 02:12 PM, Bruce Richardson wrote: > Add a new field to the rte_ring structure to store the memzone pointer which > contains the ring. For rings created using rte_ring_create(), the field will > be set automatically. > > This new field will allow users of the ring to query the numa node a ring is > allocated on, or to get the physical address of the ring, if so needed. > > The rte_ring structure will also maintain ABI compatibility, as the > structure members, after the new one, are set to be cache line aligned, > so leaving a space. > > Signed-off-by: Bruce Richardson Acked-by: Olivier Matz
[dpdk-dev] [PATCH] crc: deinline crc functions
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Friday, October 2, 2015 12:38 AM
> To: Richardson, Bruce; De Lara Guarch, Pablo
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [PATCH] crc: deinline crc functions
>
> Because the CRC functions are inline and defined purely in the header
> file, every component that uses these functions gets its own copy
> of the software CRC table, which is a big waste of space.
>
> Just deinline, which gives better long term ABI stability anyway.
>
> Signed-off-by: Stephen Hemminger

While I think it's a good idea to de-inline the functions that do the
calculations using the lookup tables, I think the functions that consist of
a single assembly instruction should be kept as inline.

/Bruce
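
To make the space argument concrete, here is a minimal sketch of a de-inlined, table-driven CRC. It is not the rte_hash_crc code: toy_crc32c is a hypothetical name, the table is built lazily at first use (which is not thread-safe) where a real library would more likely ship a precomputed constant table, and the sketch uses the common CRC-32C parameters, which need not match DPDK's functions bit for bit.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One table for the whole program: it lives in a single .c file instead
 * of being duplicated by every file that includes an inline header
 * version of the function.
 */
static uint32_t crc32c_table[256];
static int crc32c_table_ready;

static void
crc32c_build_table(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		/* Reflected Castagnoli polynomial, one bit at a time. */
		for (int bit = 0; bit < 8; bit++)
			c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
		crc32c_table[i] = c;
	}
	crc32c_table_ready = 1;
}

/* Non-inline entry point; a header would carry only this prototype. */
uint32_t
toy_crc32c(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	if (!crc32c_table_ready)
		crc32c_build_table();
	while (len--)
		crc = crc32c_table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
	return crc ^ 0xFFFFFFFFu;
}
```

A variant that maps to a single hardware CRC instruction has no table to duplicate, which is the case Bruce suggests keeping inline.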
[dpdk-dev] [PATCH v4] ring: add function to free a ring
Hi Pablo, On 10/02/2015 05:53 PM, Pablo de Lara wrote: > From: "De Lara Guarch, Pablo" > > When creating a ring, a memzone is created to allocate it in memory, > but the ring could not be freed, as memzones could not be. > > Since memzones can be freed now, then rings can be as well, > taking into account if they were initialized using pre-allocated memory > (in which case, memory should be freed externally) or using > rte_memzone_reserve > (with rte_ring_create), freeing the memory with rte_memzone_free. > > Signed-off-by: Pablo de Lara Acked-by: Olivier Matz
[dpdk-dev] [PATCH] eal/bsd: reinitialize optind and optreset to 1
On Tue, Oct 13, 2015 at 04:54:06PM +0800, Tiwei Bie wrote: > The variable optind must be reinitialized to 1 in order to skip over > argv[0] on FreeBSD. Because getopt() on FreeBSD will return -1 when > it meets an argument which doesn't start with '-'. > > The variable optreset is provided on FreeBSD to indicate the additional > set of calls to getopt(). So, also reinitialize it to 1. > > Signed-off-by: Tiwei Bie Acked-by: Bruce Richardson
[dpdk-dev] [PATCH] ethdev: remove the imissed deprecation tag
Hi Maryam, On 09/30/2015 10:20 AM, Maryam Tahhan wrote: > Remove the deprecation tag and notice for imissed. > > Signed-off-by: Maryam Tahhan > --- > doc/guides/rel_notes/deprecation.rst | 2 +- > lib/librte_ether/rte_ethdev.h| 3 +-- > 2 files changed, 2 insertions(+), 3 deletions(-) Could you please add some more details about why it is finally kept? I think it could be helpful for people who did not follow the thread http://dpdk.org/dev/patchwork/patch/6410/ You can also reference the commit id of the patch that introduced the deprecation notice. It would also be a good occasion to restate the definition of imissed: the number of packets dropped by hardware because the software does not poll fast enough (= queue full) Thanks! Olivier
[dpdk-dev] propose a solution for mapping same virtual address space to asymmetric processes
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Nissim Nisimov > Sent: Tuesday, October 13, 2015 4:40 PM > To: 'dev at dpdk.org' > Subject: [dpdk-dev] propose a solution for mapping same virtual address > space to asymmetric processes > > Hi all, > > The below will try to suggest a modification to the initialization of > Environment Abstraction Layer (AKA EAL) so it will be able to allocate > memory zones from same virtual memory addresses even if the primary > process is not similar to the secondary processes. > > Problem: > The DPDK Primary/Secondary model requires that the exact same hugepage > memory mappings be present in all applications. > An issue may occur when the Primary and secondary processes are not > symmetric in such way that the code has big differences (for example, > Primary process is a traffic distributer and secondary is a worker). > The result may be that specific virtual address region in the first > process won't be available in the second process. > > > Suggested solution: > Map all related rte and uio sections somewhere close to the end of huge > pages memory (that mean rte_eal_memory_init() should be called before > rte_config_init() in primary process) According to our observations there > will be more probability to success when allocating the above sections > after huge pages section (actually uio is already allocated after the huge > pages area) > > It solved our problem when trying to work with a primary traffic > distributer which is a very "light" process and few secondary worker > processes. > > > Please share your thoughts on this before I will try to commit our patch > for review > > Thanks, > Nissim Hi, out of interest, have you tried fixing the issue using the "--base-virtaddr" EAL flag to hint a base address to the primary process? It was put into the code some time ago to help solve exactly this problem. /Bruce
[dpdk-dev] [PATCH 2/2] igb: fix VF statistic wraparound handling macro
ack On 10/12/15 12:45 PM, Harry van Haaren wrote: > Fix a misinterpreatation of VF statistic macro in e1000/igb. > > Signed-off-by: Harry van Haaren > --- > drivers/net/e1000/igb_ethdev.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c > index 848ef6e..e4911fc 100644 > --- a/drivers/net/e1000/igb_ethdev.c > +++ b/drivers/net/e1000/igb_ethdev.c > @@ -246,11 +246,10 @@ static void eth_igb_configure_msix_intr(struct > rte_eth_dev *dev); > #define UPDATE_VF_STAT(reg, last, cur)\ > { \ > u32 latest = E1000_READ_REG(hw, reg); \ > - cur += latest - last; \ > + cur += (latest-last) & UINT_MAX; \ > last = latest;\ > } > > - > #define IGB_FC_PAUSE_TIME 0x0680 > #define IGB_LINK_UPDATE_CHECK_TIMEOUT 90 /* 9s */ > #define IGB_LINK_UPDATE_CHECK_INTERVAL 100 /* ms */ -- |Roger B. Melton| | Cisco Systems | |CPP Software :|::|: 7100 Kit Creek Rd | |+1.919.476.2332 phone:|||: :|||:RTP, NC 27709-4987 | |+1.919.392.1094 fax .:|||:..:|||:. rmelton at cisco.com | || | This email may contain confidential and privileged material for the| | sole use of the intended recipient. Any review, use, distribution | | or disclosure by others is strictly prohibited. If you are not the | | intended recipient (or authorized to receive for the recipient), | | please contact the sender by reply email and delete all copies of | | this message. | || | For corporate legal information go to: | | http://www.cisco.com/web/about/doing_business/legal/cri/index.html | |__ http://www.cisco.com |
[dpdk-dev] [PATCH 1/2] ixgbe: fix VF statistic wraparound handling macro
ack On 10/12/15 12:45 PM, Harry van Haaren wrote: > Fix a misinterpretation of VF stats in ixgbe > > Signed-off-by: Harry van Haaren > --- > drivers/net/ixgbe/ixgbe_ethdev.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > b/drivers/net/ixgbe/ixgbe_ethdev.c > index ec2918c..86dcd87 100644 > --- a/drivers/net/ixgbe/ixgbe_ethdev.c > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > @@ -329,10 +329,10 @@ static int ixgbe_timesync_read_tx_timestamp(struct > rte_eth_dev *dev, > /* >* Define VF Stats MACRO for Non "cleared on read" register >*/ > -#define UPDATE_VF_STAT(reg, last, cur) \ > +#define UPDATE_VF_STAT(reg, last, cur) \ > { \ > uint32_t latest = IXGBE_READ_REG(hw, reg); \ > - cur += latest - last; \ > + cur += (latest-last) & UINT_MAX;\ > last = latest; \ > } > -- Roger B. Melton, Cisco Systems
[dpdk-dev] [PATCH 1/2] ixgbe: fix VF statistic wraparound handling macro
Agreed, this handles the off by one error on wrap around and should be faster. -Roger On 10/12/15 11:41 AM, Alexander Duyck wrote: > On 10/12/2015 06:33 AM, Harry van Haaren wrote: >> Fix a misinterpretation of VF stats in ixgbe >> >> Signed-off-by: Harry van Haaren >> --- >> drivers/net/ixgbe/ixgbe_ethdev.c | 8 ++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c >> b/drivers/net/ixgbe/ixgbe_ethdev.c >> index ec2918c..d226e8d 100644 >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c >> @@ -329,10 +329,14 @@ static int >> ixgbe_timesync_read_tx_timestamp(struct rte_eth_dev *dev, >> /* >>* Define VF Stats MACRO for Non "cleared on read" register >>*/ >> -#define UPDATE_VF_STAT(reg, last, cur)\ >> +#define UPDATE_VF_STAT(reg, last, cur) \ >> { \ >> uint32_t latest = IXGBE_READ_REG(hw, reg); \ >> -cur += latest - last; \ >> +if(likely(latest > last)) { \ >> +cur += latest - last; \ >> +} else {\ >> +cur += (UINT_MAX - last) + latest; \ >> +} \ >> last = latest; \ >> } > > From what I can tell your math is adding an off by one error. You > should probably be using UINT_MAX as a mask for the result, not as a > part of the calculation itself. > > So the correct way to compute this would be "cur += (latest - last) & > UINT_MAX". Also the mask approach should be faster as it avoids any > conditional jumps. > > - Alex -- Roger B. Melton, Cisco Systems
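The difference between the two versions of the macro is easiest to see at the wrap point itself. The sketch below restates both update rules as plain functions. The names delta_branch() and delta_masked() are made up for illustration and are not in the driver, and the code assumes a 32-bit unsigned int so that UINT_MAX is 0xFFFFFFFF.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Branch version from the first revision of the patch. */
static uint64_t delta_branch(uint32_t last, uint32_t latest)
{
    if (latest > last)
        return (uint64_t)latest - last;
    /* Counts the distance to UINT_MAX, not to the wrap at 2^32. */
    return (uint64_t)(UINT_MAX - last) + latest;
}

/* Masked version from the review: the difference modulo 2^32. */
static uint64_t delta_masked(uint32_t last, uint32_t latest)
{
    /* Assumption stated above: unsigned int is 32 bits wide. */
    assert(UINT_MAX == 0xFFFFFFFFu);

    /* Subtract in 64 bits, then keep only the low 32 bits. */
    return ((uint64_t)latest - (uint64_t)last) & UINT_MAX;
}
```

With last = UINT_MAX and latest = 0 exactly one event was counted: delta_masked() returns 1, while delta_branch() returns 0. When the register has not moved at all (last == latest), the branch version falls into its else arm and adds UINT_MAX, whereas the masked version adds 0.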
[dpdk-dev] propose a solution for mapping same virtual address space to asymmetric processes
Hi all, The below will try to suggest a modification to the initialization of Environment Abstraction Layer (AKA EAL) so it will be able to allocate memory zones from same virtual memory addresses even if the primary process is not similar to the secondary processes. Problem: The DPDK Primary/Secondary model requires that the exact same hugepage memory mappings be present in all applications. An issue may occur when the Primary and secondary processes are not symmetric in such way that the code has big differences (for example, Primary process is a traffic distributer and secondary is a worker). The result may be that specific virtual address region in the first process won't be available in the second process. Suggested solution: Map all related rte and uio sections somewhere close to the end of huge pages memory (that mean rte_eal_memory_init() should be called before rte_config_init() in primary process) According to our observations there will be more probability to success when allocating the above sections after huge pages section (actually uio is already allocated after the huge pages area) It solved our problem when trying to work with a primary traffic distributer which is a very "light" process and few secondary worker processes. Please share your thoughts on this before I will try to commit our patch for review Thanks, Nissim
[dpdk-dev] [PATCH] examples/vmdq: Fix the core dump issue when mem_pool is more than 34
Macro MAX_QUEUES was defined to 128, only allow 16 mem_pools in theory. When running vmdq_app with more than 34 mem_pools, it will cause the core_dump issue. Change MAX_QUEUES to 1024 will solve this issue. Signed-off-by: Xutao Sun --- examples/vmdq/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/vmdq/main.c b/examples/vmdq/main.c index a142d49..b463cfb 100644 --- a/examples/vmdq/main.c +++ b/examples/vmdq/main.c @@ -69,7 +69,7 @@ #include #include -#define MAX_QUEUES 128 +#define MAX_QUEUES 1024 /* * For 10 GbE, 128 queues require roughly * 128*512 (RX/TX_queue_nb * RX/TX_ring_descriptors_nb) per port. -- 1.9.3
[dpdk-dev] [PATCH] Add error message when trying to use make option T= during build/clean
Hi Francesco, On 09/29/2015 06:04 PM, Francesco Montorsi wrote: > From: Francesco Montorsi > > --- > mk/rte.sdkbuild.mk | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/mk/rte.sdkbuild.mk b/mk/rte.sdkbuild.mk > index 38ec7bd..013aa89 100644 > --- a/mk/rte.sdkbuild.mk > +++ b/mk/rte.sdkbuild.mk > @@ -29,6 +29,12 @@ > # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > +ifdef T > + ifeq ("$(origin T)", "command line") > +$(error "Cannot use T= with a build/clean target") > + endif > +endif > + > # If DESTDIR variable is given, install binary dpdk I tested this patch but it breaks the "make install" command: $ make install T=x86_64-native-linuxapp-gcc make[5]: Nothing to be done for 'depdirs'. Configuration done rte.sdkbuild.mk:34: *** "Cannot use T= with a build/clean target". As the T= argument is given as a command line variable, it is propagated to the "$(MAKE) all" in rte.sdkinstall.mk. So I think it's better to keep the current code as is, except if you have a better idea. Regards, Olivier
[dpdk-dev] [PATCH 3/3] i40evf: add support of AQ based RSS config
It supports both Admin queue based and directly writing registers based RSS hash key and lookup table configuration, as X722 supports AQ based configuration. Signed-off-by: Helin Zhang --- drivers/net/i40e/i40e_ethdev.h| 3 + drivers/net/i40e/i40e_ethdev_vf.c | 230 -- 2 files changed, 173 insertions(+), 60 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h index 57366ac..a8d8cac 100644 --- a/drivers/net/i40e/i40e_ethdev.h +++ b/drivers/net/i40e/i40e_ethdev.h @@ -449,6 +449,7 @@ struct i40e_vf { struct i40e_virtchnl_vf_resource *vf_res; /* All VSIs */ struct i40e_virtchnl_vsi_resource *vsi_res; /* LAN VSI */ struct i40e_vsi vsi; + uint64_t flags; }; /* @@ -541,6 +542,8 @@ i40e_get_vsi_from_adapter(struct i40e_adapter *adapter) (&(((struct i40e_vsi *)vsi)->adapter->hw)) #define I40E_VSI_TO_PF(vsi) \ (&(((struct i40e_vsi *)vsi)->adapter->pf)) +#define I40E_VSI_TO_VF(vsi) \ + (&(((struct i40e_vsi *)vsi)->adapter->vf)) #define I40E_VSI_TO_DEV_DATA(vsi) \ (((struct i40e_vsi *)vsi)->adapter->pf.dev_data) #define I40E_VSI_TO_ETH_DEV(vsi) \ diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c index b694400..02ee87b 100644 --- a/drivers/net/i40e/i40e_ethdev_vf.c +++ b/drivers/net/i40e/i40e_ethdev_vf.c @@ -1126,9 +1126,12 @@ i40evf_init_vf(struct rte_eth_dev *dev) goto err_alloc; } + if (hw->mac.type == I40E_MAC_X722_VF) + vf->flags = I40E_FLAG_RSS_AQ_CAPABLE; vf->vsi.vsi_id = vf->vsi_res->vsi_id; vf->vsi.type = vf->vsi_res->vsi_type; vf->vsi.nb_qps = vf->vsi_res->num_queue_pairs; + vf->vsi.adapter = I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private); /* check mac addr, if it's not valid, genrate one */ if (I40E_SUCCESS != i40e_validate_mac_addr(\ @@ -1778,15 +1781,71 @@ i40evf_dev_close(struct rte_eth_dev *dev) } static int +i40evf_get_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size) +{ + struct i40e_vf *vf = I40E_VSI_TO_VF(vsi); + struct i40e_hw *hw = I40E_VSI_TO_HW(vsi); + int ret; + + if 
(!lut) + return -EINVAL; + + if (vf->flags & I40E_FLAG_RSS_AQ_CAPABLE) { + ret = i40e_aq_get_rss_lut(hw, vsi->vsi_id, FALSE, + lut, lut_size); + if (ret) { + PMD_DRV_LOG(ERR, "Failed to get RSS lookup table"); + return ret; + } + } else { + uint32_t *lut_dw = (uint32_t *)lut; + uint16_t i, lut_size_dw = lut_size / 4; + + for (i = 0; i < lut_size_dw; i++) + lut_dw[i] = I40E_READ_REG(hw, I40E_VFQF_HLUT(i)); + } + + return 0; +} + +static int +i40evf_set_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size) +{ + struct i40e_vf *vf = I40E_VSI_TO_VF(vsi); + struct i40e_hw *hw = I40E_VSI_TO_HW(vsi); + int ret; + + if (!vsi || !lut) + return -EINVAL; + + if (vf->flags & I40E_FLAG_RSS_AQ_CAPABLE) { + ret = i40e_aq_set_rss_lut(hw, vsi->vsi_id, FALSE, + lut, lut_size); + if (ret) { + PMD_DRV_LOG(ERR, "Failed to set RSS lookup table"); + return ret; + } + } else { + uint32_t *lut_dw = (uint32_t *)lut; + uint16_t i, lut_size_dw = lut_size / 4; + + for (i = 0; i < lut_size_dw; i++) + I40E_WRITE_REG(hw, I40E_VFQF_HLUT(i), lut_dw[i]); + I40EVF_WRITE_FLUSH(hw); + } + + return 0; +} + +static int i40evf_dev_rss_reta_update(struct rte_eth_dev *dev, struct rte_eth_rss_reta_entry64 *reta_conf, uint16_t reta_size) { - struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); - uint32_t lut, l; - uint16_t i, j; - uint16_t idx, shift; - uint8_t mask; + struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private); + uint8_t *lut; + uint16_t i, idx, shift; + int ret; if (reta_size != ETH_RSS_RETA_SIZE_64) { PMD_DRV_LOG(ERR, "The size of hash lookup table configured " @@ -1795,29 +1854,26 @@ i40evf_dev_rss_reta_update(struct rte_eth_dev *dev, return -EINVAL; } - for (i = 0; i < reta_size; i += I40E_4_BIT_WIDTH) { + lut = rte_zmalloc("i40e_rss_lut", reta_size, 0); + if (!lut) { + PMD_DRV_LOG(ERR, "No memory can be allocated"); + return -ENOMEM; + } + ret = i40evf_get_rss_lut(&vf->vsi, lut, reta_size); + if (ret) + goto out; + for (i = 0; i < reta_size; i++) { idx 
= i / RTE_RETA
[dpdk-dev] [PATCH 2/3] i40e: add support of AQ based RSS config
It supports both Admin queue based and directly writing registers based RSS hash key and lookup table configuration, as X722 supports AQ based configuration. Signed-off-by: Helin Zhang --- drivers/net/i40e/i40e_ethdev.c | 229 ++--- drivers/net/i40e/i40e_ethdev.h | 4 +- 2 files changed, 173 insertions(+), 60 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 2dd9fdc..0f4ef5b 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -1994,16 +1994,72 @@ i40e_mac_filter_handle(struct rte_eth_dev *dev, enum rte_filter_op filter_op, } static int +i40e_get_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size) +{ + struct i40e_pf *pf = I40E_VSI_TO_PF(vsi); + struct i40e_hw *hw = I40E_VSI_TO_HW(vsi); + int ret; + + if (!lut) + return -EINVAL; + + if (pf->flags & I40E_FLAG_RSS_AQ_CAPABLE) { + ret = i40e_aq_get_rss_lut(hw, vsi->vsi_id, TRUE, + lut, lut_size); + if (ret) { + PMD_DRV_LOG(ERR, "Failed to get RSS lookup table"); + return ret; + } + } else { + uint32_t *lut_dw = (uint32_t *)lut; + uint16_t i, lut_size_dw = lut_size / 4; + + for (i = 0; i < lut_size_dw; i++) + lut_dw[i] = I40E_READ_REG(hw, I40E_PFQF_HLUT(i)); + } + + return 0; +} + +static int +i40e_set_rss_lut(struct i40e_vsi *vsi, uint8_t *lut, uint16_t lut_size) +{ + struct i40e_pf *pf = I40E_VSI_TO_PF(vsi); + struct i40e_hw *hw = I40E_VSI_TO_HW(vsi); + int ret; + + if (!vsi || !lut) + return -EINVAL; + + if (pf->flags & I40E_FLAG_RSS_AQ_CAPABLE) { + ret = i40e_aq_set_rss_lut(hw, vsi->vsi_id, TRUE, + lut, lut_size); + if (ret) { + PMD_DRV_LOG(ERR, "Failed to set RSS lookup table"); + return ret; + } + } else { + uint32_t *lut_dw = (uint32_t *)lut; + uint16_t i, lut_size_dw = lut_size / 4; + + for (i = 0; i < lut_size_dw; i++) + I40E_WRITE_REG(hw, I40E_PFQF_HLUT(i), lut_dw[i]); + I40E_WRITE_FLUSH(hw); + } + + return 0; +} + +static int i40e_dev_rss_reta_update(struct rte_eth_dev *dev, struct rte_eth_rss_reta_entry64 *reta_conf, 
uint16_t reta_size) { struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); - struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); - uint32_t lut, l; - uint16_t i, j, lut_size = pf->hash_lut_size; + uint16_t i, lut_size = pf->hash_lut_size; uint16_t idx, shift; - uint8_t mask; + uint8_t *lut; + int ret; if (reta_size != lut_size || reta_size > ETH_RSS_RETA_SIZE_512) { @@ -2013,28 +2069,26 @@ i40e_dev_rss_reta_update(struct rte_eth_dev *dev, return -EINVAL; } - for (i = 0; i < reta_size; i += I40E_4_BIT_WIDTH) { + lut = rte_zmalloc("i40e_rss_lut", reta_size, 0); + if (!lut) { + PMD_DRV_LOG(ERR, "No memory can be allocated"); + return -ENOMEM; + } + ret = i40e_get_rss_lut(pf->main_vsi, lut, reta_size); + if (ret) + goto out; + for (i = 0; i < reta_size; i++) { idx = i / RTE_RETA_GROUP_SIZE; shift = i % RTE_RETA_GROUP_SIZE; - mask = (uint8_t)((reta_conf[idx].mask >> shift) & - I40E_4_BIT_MASK); - if (!mask) - continue; - if (mask == I40E_4_BIT_MASK) - l = 0; - else - l = I40E_READ_REG(hw, I40E_PFQF_HLUT(i >> 2)); - for (j = 0, lut = 0; j < I40E_4_BIT_WIDTH; j++) { - if (mask & (0x1 << j)) - lut |= reta_conf[idx].reta[shift + j] << - (CHAR_BIT * j); - else - lut |= l & (I40E_8_BIT_MASK << (CHAR_BIT * j)); - } - I40E_WRITE_REG(hw, I40E_PFQF_HLUT(i >> 2), lut); + if (reta_conf[idx].mask & (1ULL << shift)) + lut[i] = reta_conf[idx].reta[shift]; } + ret = i40e_set_rss_lut(pf->main_vsi, lut, reta_size); - return 0; +out: + rte_free(lut); + + return ret; } static int @@ -2043,11 +2097,10 @@ i40e_dev_rss_reta_query(struct rte_eth_dev *dev, uint16_t reta_size) { struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); - struc
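The rewritten update loop in this patch treats the redirection table as one byte per entry and consults the 64-bit mask one bit at a time. A stand-alone sketch of just that step follows. struct reta_entry64 and GROUP_SIZE are simplified stand-ins, assumed to mirror rte_eth_rss_reta_entry64 and RTE_RETA_GROUP_SIZE; they are not the real definitions, and apply_reta() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP_SIZE 64 /* assumed to match RTE_RETA_GROUP_SIZE */

/* Hypothetical stand-in for struct rte_eth_rss_reta_entry64. */
struct reta_entry64 {
    uint64_t mask;             /* bit n set: reta[n] should be applied */
    uint8_t  reta[GROUP_SIZE]; /* queue index for each table slot */
};

/* Copy only the masked entries of conf[] into the byte-wide table lut[]. */
static void apply_reta(uint8_t *lut, uint16_t lut_size,
                       const struct reta_entry64 *conf)
{
    uint16_t i;

    assert(lut != NULL && conf != NULL);

    for (i = 0; i < lut_size; i++) {
        uint16_t idx = i / GROUP_SIZE;   /* which 64-entry group */
        uint16_t shift = i % GROUP_SIZE; /* slot inside that group */

        /* Entries whose mask bit is clear keep their current value. */
        if (conf[idx].mask & (1ULL << shift))
            lut[i] = conf[idx].reta[shift];
    }
}
```

Because unmasked slots are left untouched, the table has to be read back from the hardware before this loop runs and written out afterwards, which is what the get/set helper pair in the patch is for.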
[dpdk-dev] [PATCH 1/3] i40e: add support of X722 and its A0 hardware
In order to provide users early access of X722 and its A0 hardware, new device IDs are added, and also compilation with those support in base driver is enabled. Signed-off-by: Helin Zhang --- drivers/net/i40e/Makefile | 1 + lib/librte_eal/common/include/rte_pci_dev_ids.h | 14 +- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile index 55b7d31..5744c0d 100644 --- a/drivers/net/i40e/Makefile +++ b/drivers/net/i40e/Makefile @@ -38,6 +38,7 @@ LIB = librte_pmd_i40e.a CFLAGS += -O3 CFLAGS += $(WERROR_FLAGS) -DPF_DRIVER -DVF_DRIVER -DINTEGRATED_VF +CFLAGS += -DX722_SUPPORT -DX722_A0_SUPPORT EXPORT_MAP := rte_pmd_i40e_version.map diff --git a/lib/librte_eal/common/include/rte_pci_dev_ids.h b/lib/librte_eal/common/include/rte_pci_dev_ids.h index 265e66c..fb29650 100644 --- a/lib/librte_eal/common/include/rte_pci_dev_ids.h +++ b/lib/librte_eal/common/include/rte_pci_dev_ids.h @@ -4,7 +4,7 @@ * * GPL LICENSE SUMMARY * - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. 
* * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as @@ -499,6 +499,10 @@ RTE_PCI_DEV_ID_DECL_IXGBE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82599_BYPASS) #define I40E_DEV_ID_20G_KR2 0x1587 #define I40E_DEV_ID_20G_KR2_A 0x1588 #define I40E_DEV_ID_10G_BASE_T4 0x1589 +#define I40E_DEV_ID_X722_A0 0x374C +#define I40E_DEV_ID_SFP_X7220x37D0 +#define I40E_DEV_ID_1G_BASE_T_X722 0x37D1 +#define I40E_DEV_ID_10G_BASE_T_X722 0x37D2 RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_SFP_XL710) RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_QEMU) @@ -512,6 +516,10 @@ RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_10G_BASE_T) RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_20G_KR2) RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_20G_KR2_A) RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_10G_BASE_T4) +RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_X722_A0) +RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_SFP_X722) +RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_1G_BASE_T_X722) +RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_10G_BASE_T_X722) /*** Physical FM10K devices from fm10k_type.h ***/ @@ -555,9 +563,13 @@ RTE_PCI_DEV_ID_DECL_IXGBEVF(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_X550EM_X_VF_HV) #define I40E_DEV_ID_VF 0x154C #define I40E_DEV_ID_VF_HV 0x1571 +#define I40E_DEV_ID_X722_VF 0x37CD +#define I40E_DEV_ID_X722_VF_HV 0x37D9 RTE_PCI_DEV_ID_DECL_I40EVF(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_VF) RTE_PCI_DEV_ID_DECL_I40EVF(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_VF_HV) +RTE_PCI_DEV_ID_DECL_I40EVF(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_X722_VF) +RTE_PCI_DEV_ID_DECL_I40EVF(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_X722_VF_HV) /** Virtio devices from virtio.h **/ -- 1.9.3
[dpdk-dev] [PATCH 0/3] add i40e series x722 support
It supports i40e series x722 and its A0 hardware for early access. Helin Zhang (3): i40e: add support of X722 and its A0 hardware i40e: add support of AQ based RSS config i40evf: add support of AQ based RSS config drivers/net/i40e/Makefile | 1 + drivers/net/i40e/i40e_ethdev.c | 229 +-- drivers/net/i40e/i40e_ethdev.h | 7 +- drivers/net/i40e/i40e_ethdev_vf.c | 230 +--- lib/librte_eal/common/include/rte_pci_dev_ids.h | 14 +- 5 files changed, 360 insertions(+), 121 deletions(-) -- 1.9.3
[dpdk-dev] [PATCH v2] i40e: Add a workaround to drop flow control frames from VFs
This patch adds a workaround to drop flow control frames from being transmitted from VSIs. With this patch in place a malicious VF cannot send flow control or PFC packets out on the wire. Signed-off-by: Jingjing Wu --- v2: - reword comments drivers/net/i40e/i40e_ethdev.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 2dd9fdc..3d19f42 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -382,6 +382,30 @@ static inline void i40e_flex_payload_reg_init(struct i40e_hw *hw) I40E_WRITE_REG(hw, I40E_GLQF_PIT(17), 0x7440); } +#define I40E_FLOW_CONTROL_ETHERTYPE 0x8808 + +/* + * Add a ethertype filter to drop all flow control frames transimited + * from VSIs. +*/ +static void +i40e_add_tx_flow_control_drop_filter(struct i40e_pf *pf) +{ + struct i40e_hw *hw = I40E_PF_TO_HW(pf); + uint16_t flags = I40E_AQC_ADD_CONTROL_PACKET_FLAGS_IGNORE_MAC | + I40E_AQC_ADD_CONTROL_PACKET_FLAGS_DROP | + I40E_AQC_ADD_CONTROL_PACKET_FLAGS_TX; + int ret; + + ret = i40e_aq_add_rem_control_packet_filter(hw, NULL, + I40E_FLOW_CONTROL_ETHERTYPE, flags, + pf->main_vsi_seid, 0, + TRUE, NULL, NULL); + if (ret) + PMD_INIT_LOG(ERR, "Failed to add filter to drop flow control " + " frames from VSIs."); +} + static int eth_i40e_dev_init(struct rte_eth_dev *dev) { @@ -584,6 +608,12 @@ eth_i40e_dev_init(struct rte_eth_dev *dev) /* enable uio intr after callback register */ rte_intr_enable(&(pci_dev->intr_handle)); + /* +* Add an ethertype filter to drop all flow control frames transimited +* from VSIs. By doing so, we stop VF from sening out PAUSE or PFC +* frames to wire. +*/ + i40e_add_tx_flow_control_drop_filter(pf); /* initialize mirror rule list */ TAILQ_INIT(&pf->mirror_list); -- 2.4.0
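For readers unfamiliar with what an ethertype filter keys on, here is a software-only sketch of the match condition. It is not how the i40e hardware filter is implemented; is_flow_control_frame() is a hypothetical helper, and the sketch assumes an untagged Ethernet frame where the EtherType sits in bytes 12 and 13 in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLOW_CONTROL_ETHERTYPE 0x8808 /* same value as in the patch */

/* Return 1 if an untagged Ethernet frame carries the MAC control EtherType. */
static int is_flow_control_frame(const uint8_t *frame, size_t len)
{
    uint16_t ethertype;

    assert(frame != NULL);

    /* Need dst MAC (6) + src MAC (6) + EtherType (2) bytes at least. */
    if (len < 14)
        return 0;

    /* EtherType is big-endian on the wire. */
    ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    return ethertype == FLOW_CONTROL_ETHERTYPE;
}
```

PAUSE and PFC frames share this EtherType and differ only in the opcode that follows it, which is why a single filter on 0x8808 covers both.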
[dpdk-dev] [PATCH v3] mbuf/ip_frag: Move mbuf chaining to common code
2015-10-13 14:50, Simon Kagstrom: > Ping? OK, you are applying the ping method we have just talked about :) To make it really effective, you should have these headers: To: Olivier Matz Cc: dev at dpdk.org Indeed, as the mbuf maintainer, he's the target of your ping. And to make it clear, the title should be mbuf: move chaining from ip_frag library (note the lowercase in "move") Now you know how to do it, so you can spread the word when someone forgets this advice. Thanks
[dpdk-dev] [PATCH v3] mbuf/ip_frag: Move mbuf chaining to common code
Hi Simon, On 09/07/2015 02:50 PM, Simon Kagstrom wrote: > Chaining/segmenting mbufs can be useful in many places, so make it > global. > > Signed-off-by: Simon Kagstrom > Signed-off-by: Johan Faltstrom > > [...] > > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -1775,6 +1775,40 @@ static inline int rte_pktmbuf_is_contiguous(const > struct rte_mbuf *m) > } > > /** > + * Chain an mbuf to another, thereby creating a segmented packet. > + * > + * Note: The implementation will do a linear walk over the segments to find > + * the tail entry. For cases when there are many segments, it's better to > + * chain the entries manually. > + * > + * @param head the head of the mbuf chain (the first packet) > + * @param tail the mbuf to put last in the chain > + * > + * @return 0 on success, -EOVERFLOW if the chain is full (256 entries) > + */ Small nit about the API comment, it should be: @param head The head of the mbuf chain (the first packet). ... (note the uppercase and the dot at the end, see the other functions in the file) I know Thomas usually fixes this kind of stuff when he pushes the patches, but it's better if we can avoid him this load :) Regards, Olivier
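The note in the API comment about the linear walk can be made concrete with a small sketch. It uses a stripped-down, hypothetical struct seg instead of the real struct rte_mbuf, and seg_chain() is a made-up name; the 8-bit segment counter is an assumption chosen to match the 256-entry limit mentioned in the comment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down segment; not the real struct rte_mbuf. */
struct seg {
    struct seg *next;  /* next segment, NULL at the tail */
    uint8_t nb_segs;   /* segments in the chain (valid in the head) */
    uint32_t pkt_len;  /* total packet length (valid in the head) */
    uint16_t data_len; /* bytes held by this segment */
};

static int seg_chain(struct seg *head, struct seg *tail)
{
    struct seg *cur;

    assert(head != NULL && tail != NULL);

    /* nb_segs is 8 bits wide here, so the sum must stay below 256. */
    if ((unsigned)head->nb_segs + tail->nb_segs >= 256)
        return -EOVERFLOW;

    /* Linear walk to the current last segment. */
    for (cur = head; cur->next != NULL; cur = cur->next)
        ;
    cur->next = tail;

    /* Only the head carries the totals for the whole chain. */
    head->nb_segs = (uint8_t)(head->nb_segs + tail->nb_segs);
    head->pkt_len += tail->pkt_len;
    tail->pkt_len = tail->data_len; /* tail is no longer a packet head */
    return 0;
}
```

Each call walks the whole chain, so appending n segments one by one costs on the order of n squared steps. A caller that builds long chains can keep its own pointer to the last segment and link entries manually, as the comment suggests.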
[dpdk-dev] [PATCH v2] kni: Use utsrelease.h to determine Ubuntu kernel version
Hi Simon, I'm looking forward to this patch since we also build the DPDK for kernel versions which differ from the host kernel. IMHO, the check for "Ubuntu" also needs change since this is also a "host system" check instead of "target system" check. I.e. A user may want to build for Ubuntu on a non-Ubuntu system and vice versa. With best regards, Tom. -- Tom Ghyselinck Excentis N.V. On di, 2015-10-13 at 14:50 +0200, Simon Kagstrom wrote: > Ping? > > // Simon > > On Thu, 20 Aug 2015 08:51:06 +0200 > Simon Kagstrom wrote: > > > /proc/version_signature is the version for the host machine, but in > > e.g., chroots, this does not necessarily match that DPDK is built > > for. DPDK will then build for the wrong kernel version - that of > > the > > server, and not that installed in the (build) chroot. > > > > The patch uses utsrelease.h from the kernel sources instead and > > fakes > > the upload version. > > > > Tested on a server with Ubuntu 12.04, building in a chroot for > > Ubuntu > > 14.04. > > > > Signed-off-by: Simon Kagstrom > > Signed-off-by: Johan Faltstrom > > --- > > ChangeLog: > > > > v2: Improve description and motivation for the patch. > > > > lib/librte_eal/linuxapp/kni/Makefile | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/lib/librte_eal/linuxapp/kni/Makefile > > b/lib/librte_eal/linuxapp/kni/Makefile > > index fb673d9..ac99d3f 100644 > > --- a/lib/librte_eal/linuxapp/kni/Makefile > > +++ b/lib/librte_eal/linuxapp/kni/Makefile > > @@ -44,10 +44,10 @@ MODULE_CFLAGS += -I$(RTE_OUTPUT)/include > > -I$(SRCDIR)/ethtool/ixgbe -I$(SRCDIR)/e > > MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h > > MODULE_CFLAGS += -Wall -Werror > > > > -ifeq ($(shell test -f /proc/version_signature && lsb_release -si > > 2>/dev/null),Ubuntu) > > +ifeq ($(shell lsb_release -si 2>/dev/null),Ubuntu) > > MODULE_CFLAGS += -DUBUNTU_RELEASE_CODE=$(shell lsb_release -sr | > > tr -d .) 
> > -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2 > > /proc/version_signature | \ > > -cut -d'~' -f1 | cut -d- -f1,2 | tr .- > > $(comma)) > > +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE > > $(RTE_KERNELDIR)/include/generated/utsrelease.h \ > > +| cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1) > > MODULE_CFLAGS += > > -D"UBUNTU_KERNEL_CODE=UBUNTU_KERNEL_VERSION($(UBUNTU_KERNEL_CODE))" > > endif > > >
[dpdk-dev] IXGBE RX packet loss with 5+ cores
On Mon, Oct 12, 2015 at 10:18:30PM -0700, Stephen Hemminger wrote: > On Tue, 13 Oct 2015 02:57:46 + > "Sanford, Robert" wrote: > > > I'm hoping that someone (perhaps at Intel) can help us understand > > an IXGBE RX packet loss issue we're able to reproduce with testpmd. > > > > We run testpmd with various numbers of cores. We offer line-rate > > traffic (~14.88 Mpps) to one ethernet port, and forward all received > > packets via the second port. > > > > When we configure 1, 2, 3, or 4 cores (per port, with same number RX > > queues per port), there is no RX packet loss. When we configure 5 or > > more cores, we observe the following packet loss (approximate): > > 5 cores - 3% loss > > 6 cores - 7% loss > > 7 cores - 11% loss > > 8 cores - 15% loss > > 9 cores - 18% loss > > > > All of the "lost" packets are accounted for in the device's Rx Missed > > Packets Count register (RXMPC[0]). Quoting the datasheet: > > "Packets are missed when the receive FIFO has insufficient space to > > store the incoming packet. This might be caused due to insufficient > > buffers allocated, or because there is insufficient bandwidth on the > > IO bus." > > > > RXMPC, and our use of API rx_descriptor_done to verify that we don't > > run out of mbufs (discussed below), lead us to theorize that packet > > loss occurs because the device is unable to DMA all packets from its > > internal packet buffer (512 KB, reported by register RXPBSIZE[0]) > > before overrun. > > > > Questions > > = > > 1. The 82599 device supports up to 128 queues. Why do we see trouble > > with as few as 5 queues? What could limit the system (and one port > > controlled by 5+ cores) from receiving at line-rate without loss? > > > > 2. As far as we can tell, the RX path only touches the device > > registers when it updates a Receive Descriptor Tail register (RDT[n]), > > roughly every rx_free_thresh packets. Is there a big difference > > between one core doing this and N cores doing it 1/N as often? > > > > 3. 
Do CPU reads/writes from/to device registers have a higher priority > > than device reads/writes from/to memory? Could the former transactions > > (CPU <-> device) significantly impede the latter (device <-> RAM)? > > > > Thanks in advance for any help you can provide. > > As you add cores, there is more traffic on the PCI bus from each core > polling. There is a fix number of PCI bus transactions per second possible. > Each core is increasing the number of useless (empty) transactions. > Why do you think adding more cores will help? > The polling for packets by the core should not be using PCI bandwidth directly, as the ixgbe driver (and other drivers) check for the DD bit being set on the descriptor in memory/cache. However, using an increased number of queues can use PCI bandwidth in other ways, for instance, with more queues you reduce the amount of descriptor coalescing that can be done by the NICs, so that instead of having a single transaction of 4 descriptors to one queue, the NIC may instead have to do 4 transactions each writing 1 descriptor to 4 different queues. This is possibly why sending all traffic to a single queue works ok - the polling on the other queues is still being done, but has little effect. Regards, /Bruce
[dpdk-dev] [PATCH v2 8/8] librte_table: modify release notes and deprecation notice
From: Fan Zhang The LIBABIVER number is incremented. The release notes is updated and the deprecation announce is removed. Signed-off-by: Fan Zhang --- doc/guides/rel_notes/deprecation.rst | 3 --- doc/guides/rel_notes/release_2_2.rst | 4 +++- lib/librte_table/Makefile| 2 +- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index fa55117..06e0078 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -56,9 +56,6 @@ Deprecation Notices * librte_table: New functions for table entry bulk add/delete will be added to the table operations structure. -* librte_table hash: Key mask parameter will be added to the hash table - parameter structure for 8-byte key and 16-byte key extendible bucket and - LRU tables. * librte_pipeline: The prototype for the pipeline input port, output port and table action handlers will be updated: diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst index 5687676..30197ec 100644 --- a/doc/guides/rel_notes/release_2_2.rst +++ b/doc/guides/rel_notes/release_2_2.rst @@ -98,6 +98,8 @@ ABI Changes * The LPM structure is changed. The deprecated field mem_location is removed. +* Key mask parameter is added to the hash table parameter structure for + 8-byte key and 16-byte key extendible bucket and LRU tables. Shared Library Versions --- @@ -130,6 +132,6 @@ The libraries prepended with a plus sign were incremented in this version. librte_reorder.so.1 librte_ring.so.1 librte_sched.so.1 - librte_table.so.1 + librte_table.so.2 librte_timer.so.1 librte_vhost.so.1 diff --git a/lib/librte_table/Makefile b/lib/librte_table/Makefile index c5b3eaf..7f02af3 100644 --- a/lib/librte_table/Makefile +++ b/lib/librte_table/Makefile @@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS) EXPORT_MAP := rte_table_version.map -LIBABIVER := 1 +LIBABIVER := 2 # # all source are stored in SRCS-y -- 2.1.0
[dpdk-dev] [PATCH v2 7/8] example/ip_pipeline/pipeline: update flow_classification pipeline
From: Fan Zhang This patch updates the flow_classification pipeline for added key_mask parameter in 8/16-byte key hash parameters. The update provides user optional key_mask configuration item applying to the packets. Signed-off-by: Fan Zhang --- .../pipeline/pipeline_flow_classification_be.c | 56 -- 1 file changed, 52 insertions(+), 4 deletions(-) diff --git a/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c b/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c index 06a648d..e22f96f 100644 --- a/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c +++ b/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c @@ -37,6 +37,7 @@ #include #include #include +#include #include "pipeline_flow_classification_be.h" #include "hash_func.h" @@ -49,6 +50,7 @@ struct pipeline_flow_classification { uint32_t key_offset; uint32_t key_size; uint32_t hash_offset; + uint8_t *key_mask; } __rte_cache_aligned; static void * @@ -125,8 +127,12 @@ pipeline_fc_parse_args(struct pipeline_flow_classification *p, uint32_t key_offset_present = 0; uint32_t key_size_present = 0; uint32_t hash_offset_present = 0; + uint32_t key_mask_present = 0; uint32_t i; + char *key_mask_str = NULL; + + p->hash_offset = 0; for (i = 0; i < params->n_args; i++) { char *arg_name = params->args_name[i]; @@ -171,6 +177,20 @@ pipeline_fc_parse_args(struct pipeline_flow_classification *p, continue; } + /* key_mask */ + if (strcmp(arg_name, "key_mask") == 0) { + if (key_mask_present) + return -1; + + key_mask_str = strdup(arg_value); + if (key_mask_str == NULL) + return -1; + + key_mask_present = 1; + + continue; + } + /* hash_offset */ if (strcmp(arg_name, "hash_offset") == 0) { if (hash_offset_present) @@ -189,10 +209,23 @@ pipeline_fc_parse_args(struct pipeline_flow_classification *p, /* Check that mandatory arguments are present */ if ((n_flows_present == 0) || (key_offset_present == 0) || - (key_size_present == 0) || - (hash_offset_present == 0)) + (key_size_present 
== 0)) return -1; + if (key_mask_present) { + p->key_mask = rte_malloc(NULL, p->key_size, 0); + if (p->key_mask == NULL) + return -1; + + if (parse_hex_string(key_mask_str, p->key_mask, &p->key_size) + != 0) { + free(p->key_mask); + return -1; + } + + free(key_mask_str); + } + return 0; } @@ -297,6 +330,7 @@ static void *pipeline_fc_init(struct pipeline_params *params, .signature_offset = p_fc->hash_offset, .key_offset = p_fc->key_offset, .f_hash = hash_func[(p_fc->key_size / 8) - 1], + .key_mask = p_fc->key_mask, .seed = 0, }; @@ -307,6 +341,7 @@ static void *pipeline_fc_init(struct pipeline_params *params, .signature_offset = p_fc->hash_offset, .key_offset = p_fc->key_offset, .f_hash = hash_func[(p_fc->key_size / 8) - 1], + .key_mask = p_fc->key_mask, .seed = 0, }; @@ -336,12 +371,25 @@ static void *pipeline_fc_init(struct pipeline_params *params, switch (p_fc->key_size) { case 8: - table_params.ops = &rte_table_hash_key8_lru_ops; + if (p_fc->hash_offset != 0) { + table_params.ops = + &rte_table_hash_key8_ext_ops; + } else { + table_params.ops = + &rte_table_hash_key8_ext_dosig_ops; + } table_params.arg_create = &table_hash_key8_params; break; + break; case 16: - table_params.ops = &rte_table_hash_key16_ext_ops; + if (p_fc->hash_offset != 0) { + table_params.ops = + &rte_table_hash_key16_ext_ops; + } else { + table_params.ops = + &rte_table_hash_key16_ext_dosig_ops; + } table_p
[dpdk-dev] [PATCH v2 6/8] example/ip_pipeline: add parse_hex_string for internal use
From: Fan Zhang This patch adds a parse_hex_string function that parses a hex string into a uint8_t array. Signed-off-by: Fan Zhang --- examples/ip_pipeline/config_parse.c | 70 + examples/ip_pipeline/pipeline.h | 4 +++ 2 files changed, 74 insertions(+) diff --git a/examples/ip_pipeline/config_parse.c b/examples/ip_pipeline/config_parse.c index c9b78f9..d7ee707 100644 --- a/examples/ip_pipeline/config_parse.c +++ b/examples/ip_pipeline/config_parse.c @@ -455,6 +455,76 @@ parse_pipeline_core(uint32_t *socket, return 0; } +static uint32_t +get_hex_val(char c) +{ + switch (c) { + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + return c - '0'; + case 'A': + case 'B': + case 'C': + case 'D': + case 'E': + case 'F': + return c - 'A' + 10; + case 'a': + case 'b': + case 'c': + case 'd': + case 'e': + case 'f': + return c - 'a' + 10; + default: + return 0; + } +} + +int +parse_hex_string(char *src, uint8_t *dst, uint32_t *size) +{ + char *c; + uint32_t len, i; + + /* Check input parameters */ + if ((src == NULL) || + (dst == NULL) || + (size == NULL) || + (*size == 0)) + return -1; + + len = strlen(src); + if (((len & 3) != 0) || + (len > (*size) * 2)) + return -1; + *size = len / 2; + + for (c = src; *c != 0; c++) { + if ((((*c) >= '0') && ((*c) <= '9')) || + (((*c) >= 'A') && ((*c) <= 'F')) || + (((*c) >= 'a') && ((*c) <= 'f'))) + continue; + + return -1; + } + + /* Convert chars to bytes */ + for (i = 0; i < *size; i++) + dst[i] = get_hex_val(src[2 * i]) * 16 + + get_hex_val(src[2 * i + 1]); + + return 0; +} + static size_t skip_digits(const char *src) { diff --git a/examples/ip_pipeline/pipeline.h b/examples/ip_pipeline/pipeline.h index b9a56ea..4063594 100644 --- a/examples/ip_pipeline/pipeline.h +++ b/examples/ip_pipeline/pipeline.h @@ -84,4 +84,8 @@ pipeline_type_cmds_count(struct pipeline_type *ptype) return n_cmds; } +/* Parse hex string to uint8_t array */ +int +parse_hex_string(char *src, uint8_t *dst, 
uint32_t *size); + #endif -- 2.1.0
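For readers who want to try the hex-to-bytes conversion outside the DPDK tree, here is a minimal standalone sketch in plain C. It is not the code from the patch: the names hex_digit_value and hex_to_bytes are hypothetical, and it assumes that any even-length string is acceptable, whereas the patch's (len & 3) check only accepts lengths that are a multiple of four characters. It keeps the same contract for size: capacity of dst on entry, number of bytes written on success.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map one hex digit to its value, or -1 if c is not a hex digit. */
static int
hex_digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper (not the patch's parse_hex_string).
 * On entry *size is the capacity of dst in bytes; on success it is set
 * to the number of bytes written. Returns 0 on success, -1 on bad input.
 * On failure dst may have been partially written.
 */
int
hex_to_bytes(const char *src, uint8_t *dst, uint32_t *size)
{
	size_t len, i;

	if (src == NULL || dst == NULL || size == NULL || *size == 0)
		return -1;

	len = strlen(src);
	/* Assumption: any non-empty, even-length string that fits is valid. */
	if (len == 0 || (len & 1) != 0 || len / 2 > *size)
		return -1;

	for (i = 0; i < len / 2; i++) {
		int hi = hex_digit_value(src[2 * i]);
		int lo = hex_digit_value(src[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		dst[i] = (uint8_t)(hi * 16 + lo);
	}

	*size = (uint32_t)(len / 2);
	return 0;
}
```

Validating each digit while converting avoids a separate checking pass and means an invalid character can never be silently decoded as zero.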
[dpdk-dev] [PATCH v2 5/8] app/test-pipeline: modify pipeline test
From: Fan Zhang Test-pipeline has been updated to work with the added key_mask parameter for 8-byte key extendible bucket and LRU tables. Signed-off-by: Fan Zhang --- app/test-pipeline/pipeline_hash.c | 4 1 file changed, 4 insertions(+) diff --git a/app/test-pipeline/pipeline_hash.c b/app/test-pipeline/pipeline_hash.c index 548615f..dda0d4d 100644 --- a/app/test-pipeline/pipeline_hash.c +++ b/app/test-pipeline/pipeline_hash.c @@ -216,6 +216,7 @@ app_main_loop_worker_pipeline_hash(void) { .n_entries_ext = 1 << 23, .signature_offset = 0, .key_offset = 32, + .key_mask = NULL, .f_hash = test_hash, .seed = 0, }; @@ -240,6 +241,7 @@ app_main_loop_worker_pipeline_hash(void) { .n_entries = 1 << 24, .signature_offset = 0, .key_offset = 32, + .key_mask = NULL, .f_hash = test_hash, .seed = 0, }; @@ -267,6 +269,7 @@ app_main_loop_worker_pipeline_hash(void) { .key_offset = 32, .f_hash = test_hash, .seed = 0, + .key_mask = NULL, }; struct rte_pipeline_table_params table_params = { @@ -291,6 +294,7 @@ app_main_loop_worker_pipeline_hash(void) { .key_offset = 32, .f_hash = test_hash, .seed = 0, + .key_mask = NULL }; struct rte_pipeline_table_params table_params = { -- 2.1.0
[dpdk-dev] [PATCH v2 4/8] app/test: modify app/test_table_combined and app/test_table_tables
From: Fan Zhang Tests have been updated to work on added key_mask parameter for 8-byte key extendible bucket and LRU tables. Signed-off-by: Fan Zhang --- app/test/test_table_combined.c | 4 app/test/test_table_tables.c | 6 -- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/app/test/test_table_combined.c b/app/test/test_table_combined.c index dd09da5..359ac45 100644 --- a/app/test/test_table_combined.c +++ b/app/test/test_table_combined.c @@ -419,6 +419,7 @@ test_table_hash8lru(void) .seed = 0, .signature_offset = 0, .key_offset = 32, + .key_mask = NULL, }; uint8_t key8lru[8]; @@ -477,6 +478,7 @@ test_table_hash16lru(void) .seed = 0, .signature_offset = 0, .key_offset = 32, + .key_mask = NULL, }; uint8_t key16lru[16]; @@ -594,6 +596,7 @@ test_table_hash8ext(void) .seed = 0, .signature_offset = 0, .key_offset = 32, + .key_mask = NULL, }; uint8_t key8ext[8]; @@ -660,6 +663,7 @@ test_table_hash16ext(void) .seed = 0, .signature_offset = 0, .key_offset = 32, + .key_mask = NULL, }; uint8_t key16ext[16]; diff --git a/app/test/test_table_tables.c b/app/test/test_table_tables.c index 566964b..cc222f1 100644 --- a/app/test/test_table_tables.c +++ b/app/test/test_table_tables.c @@ -651,7 +651,8 @@ test_table_hash_lru_generic(struct rte_table_ops *ops) .f_hash = pipeline_test_hash, .seed = 0, .signature_offset = 1, - .key_offset = 32 + .key_offset = 32, + .key_mask = NULL, }; hash_params.n_entries = 0; @@ -766,7 +767,8 @@ test_table_hash_ext_generic(struct rte_table_ops *ops) .f_hash = pipeline_test_hash, .seed = 0, .signature_offset = 1, - .key_offset = 32 + .key_offset = 32, + .key_mask = NULL, }; hash_params.n_entries = 0; -- 2.1.0
[dpdk-dev] [PATCH v2 3/8] librte_table: add 16 byte hash table operations with computed lookup
From: Fan Zhang This patch is to adding hash table operations for key signature computed on lookup ("do-sig") for LRU hash tables and Extendible buckets. Signed-off-by: Fan Zhang --- lib/librte_table/rte_table_hash.h | 8 + lib/librte_table/rte_table_hash_key16.c | 358 +++- 2 files changed, 363 insertions(+), 3 deletions(-) diff --git a/lib/librte_table/rte_table_hash.h b/lib/librte_table/rte_table_hash.h index e2c60e1..9d17516 100644 --- a/lib/librte_table/rte_table_hash.h +++ b/lib/librte_table/rte_table_hash.h @@ -271,6 +271,10 @@ struct rte_table_hash_key16_lru_params { /** LRU hash table operations for pre-computed key signature */ extern struct rte_table_ops rte_table_hash_key16_lru_ops; +/** LRU hash table operations for key signature computed on lookup +("do-sig") */ +extern struct rte_table_ops rte_table_hash_key16_lru_dosig_ops; + /** Extendible bucket hash table parameters */ struct rte_table_hash_key16_ext_params { /** Maximum number of entries (and keys) in the table */ @@ -301,6 +305,10 @@ struct rte_table_hash_key16_ext_params { /** Extendible bucket operations for pre-computed key signature */ extern struct rte_table_ops rte_table_hash_key16_ext_ops; +/** Extendible bucket hash table operations for key signature computed on +lookup ("do-sig") */ +extern struct rte_table_ops rte_table_hash_key16_ext_dosig_ops; + /** * 32-byte key hash tables * diff --git a/lib/librte_table/rte_table_hash_key16.c b/lib/librte_table/rte_table_hash_key16.c index ffd3249..427b534 100644 --- a/lib/librte_table/rte_table_hash_key16.c +++ b/lib/librte_table/rte_table_hash_key16.c @@ -620,6 +620,27 @@ rte_table_hash_entry_delete_key16_ext( rte_prefetch0((void *)(((uintptr_t) bucket1) + RTE_CACHE_LINE_SIZE));\ } +#define lookup1_stage1_dosig(mbuf1, bucket1, f)\ +{ \ + uint64_t *key; \ + uint64_t signature = 0; \ + uint32_t bucket_index; \ + uint64_t hash_key_buffer[2];\ + \ + key = RTE_MBUF_METADATA_UINT64_PTR(mbuf1, f->key_offset);\ + \ + hash_key_buffer[0] = key[0] & 
f->key_mask[0]; \ + hash_key_buffer[1] = key[1] & f->key_mask[1]; \ + signature = f->f_hash(hash_key_buffer, \ + RTE_TABLE_HASH_KEY_SIZE, f->seed); \ + \ + bucket_index = signature & (f->n_buckets - 1); \ + bucket1 = (struct rte_bucket_4_16 *)\ + &f->memory[bucket_index * f->bucket_size]; \ + rte_prefetch0(bucket1); \ + rte_prefetch0((void *)(((uintptr_t) bucket1) + RTE_CACHE_LINE_SIZE));\ +} + #define lookup1_stage2_lru(pkt2_index, mbuf2, bucket2, \ pkts_mask_out, entries, f) \ { \ @@ -769,6 +790,36 @@ rte_table_hash_entry_delete_key16_ext( rte_prefetch0((void *)(((uintptr_t) bucket11) + RTE_CACHE_LINE_SIZE));\ } +#define lookup2_stage1_dosig(mbuf10, mbuf11, bucket10, bucket11, f)\ +{ \ + uint64_t *key10, *key11;\ + uint64_t hash_offset_buffer[2]; \ + uint64_t signature10, signature11; \ + uint32_t bucket10_index, bucket11_index;\ + \ + key10 = RTE_MBUF_METADATA_UINT64_PTR(mbuf10, f->key_offset);\ + hash_offset_buffer[0] = key10[0] & f->key_mask[0]; \ + hash_offset_buffer[1] = key10[1] & f->key_mask[1]; \ + signature10 = f->f_hash(hash_offset_buffer, \ + RTE_TABLE_HASH_KEY_SIZE, f->seed);\ + bucket10_index = signature10 & (f->n_buckets - 1); \ + bucket10 = (struct rte_bucket_4_16 *) \ + &f->memory[bucket10_index * f->bucket_size];\ + rte_prefetch0(bucket10);\ + rte_prefetch0((void *)(((uintptr_t) bucket10) + RTE_CACHE_LINE_SIZE));\ + \ + key11 = RTE_MBUF_METADATA_UINT64_PTR(mbuf11, f->key_offset);\ + hash_offset_buffer[0] = key11[0] & f->key_mask[0]; \ + hash_offset_buffer[1] = key11[1] & f->key_mask[1]; \ + signature11 = f->f_hash(hash_offset_buffer, \ + RTE_TABLE_HASH_KEY_SIZE, f->seed);\ + bucket11_index = signature11 & (f
[dpdk-dev] [PATCH v2 2/8] librte_table: add key_mask parameter to 16-byte key hash parameters
From: Fan Zhang This patch relates to ABI change proposed for librte_table. key_mask parameter is added for 16-byte key extendible bucket and LRU tables. Signed-off-by: Fan Zhang --- lib/librte_table/rte_table_hash.h | 6 lib/librte_table/rte_table_hash_key16.c | 53 - 2 files changed, 52 insertions(+), 7 deletions(-) diff --git a/lib/librte_table/rte_table_hash.h b/lib/librte_table/rte_table_hash.h index ef65355..e2c60e1 100644 --- a/lib/librte_table/rte_table_hash.h +++ b/lib/librte_table/rte_table_hash.h @@ -263,6 +263,9 @@ struct rte_table_hash_key16_lru_params { /** Byte offset within packet meta-data where the key is located */ uint32_t key_offset; + + /** Bit-mask to be AND-ed to the key on lookup */ + uint8_t *key_mask; }; /** LRU hash table operations for pre-computed key signature */ @@ -290,6 +293,9 @@ struct rte_table_hash_key16_ext_params { /** Byte offset within packet meta-data where the key is located */ uint32_t key_offset; + + /** Bit-mask to be AND-ed to the key on lookup */ + uint8_t *key_mask; }; /** Extendible bucket operations for pre-computed key signature */ diff --git a/lib/librte_table/rte_table_hash_key16.c b/lib/librte_table/rte_table_hash_key16.c index f6a3306..ffd3249 100644 --- a/lib/librte_table/rte_table_hash_key16.c +++ b/lib/librte_table/rte_table_hash_key16.c @@ -85,6 +85,7 @@ struct rte_table_hash { uint32_t bucket_size; uint32_t signature_offset; uint32_t key_offset; + uint64_t key_mask[2]; rte_table_hash_op_hash f_hash; uint64_t seed; @@ -164,6 +165,14 @@ rte_table_hash_create_key16_lru(void *params, f->f_hash = p->f_hash; f->seed = p->seed; + if (p->key_mask != NULL) { + f->key_mask[0] = ((uint64_t *)p->key_mask)[0]; + f->key_mask[1] = ((uint64_t *)p->key_mask)[1]; + } else { + f->key_mask[0] = 0xLLU; + f->key_mask[1] = 0xLLU; + } + for (i = 0; i < n_buckets; i++) { struct rte_bucket_4_16 *bucket; @@ -384,6 +393,14 @@ rte_table_hash_create_key16_ext(void *params, for (i = 0; i < n_buckets_ext; i++) f->stack[i] = i; + if 
(p->key_mask != NULL) { + f->key_mask[0] = (((uint64_t *)p->key_mask)[0]); + f->key_mask[1] = (((uint64_t *)p->key_mask)[1]); + } else { + f->key_mask[0] = 0xLLU; + f->key_mask[1] = 0xLLU; + } + return f; } @@ -609,11 +626,14 @@ rte_table_hash_entry_delete_key16_ext( void *a;\ uint64_t pkt_mask; \ uint64_t *key; \ + uint64_t hash_key_buffer[2];\ uint32_t pos; \ \ key = RTE_MBUF_METADATA_UINT64_PTR(mbuf2, f->key_offset);\ + hash_key_buffer[0] = key[0] & f->key_mask[0]; \ + hash_key_buffer[1] = key[1] & f->key_mask[1]; \ \ - lookup_key16_cmp(key, bucket2, pos);\ + lookup_key16_cmp(hash_key_buffer, bucket2, pos);\ \ pkt_mask = (bucket2->signature[pos] & 1LLU) << pkt2_index;\ pkts_mask_out |= pkt_mask; \ @@ -631,11 +651,14 @@ rte_table_hash_entry_delete_key16_ext( void *a;\ uint64_t pkt_mask, bucket_mask; \ uint64_t *key; \ + uint64_t hash_key_buffer[2];\ uint32_t pos; \ \ key = RTE_MBUF_METADATA_UINT64_PTR(mbuf2, f->key_offset);\ + hash_key_buffer[0] = key[0] & f->key_mask[0]; \ + hash_key_buffer[1] = key[1] & f->key_mask[1]; \ \ - lookup_key16_cmp(key, bucket2, pos);\ + lookup_key16_cmp(hash_key_buffer, bucket2, pos);\ \ pkt_mask = (bucket2->signature[pos] & 1LLU) << pkt2_index;\ pkts_mask_out |= pkt_mask; \ @@ -658,12 +681,15 @@ rte_table_hash_entry_delete_key16_ext( void *a;\ uint64_t pkt_mask, bucket_mask; \ uint64_t *key; \ + uint64_t hash_key_buffer[2];
[dpdk-dev] [PATCH v2 1/8] librte_table: add key_mask parameter to 8-byte key hash parameters
From: Fan Zhang This patch relates to ABI change proposed for librte_table. key_mask parameter is added for 8-byte key extendible bucket and LRU tables. Signed-off-by: Fan Zhang --- lib/librte_table/rte_table_hash.h | 6 lib/librte_table/rte_table_hash_key8.c | 54 +++--- 2 files changed, 50 insertions(+), 10 deletions(-) diff --git a/lib/librte_table/rte_table_hash.h b/lib/librte_table/rte_table_hash.h index 9181942..ef65355 100644 --- a/lib/librte_table/rte_table_hash.h +++ b/lib/librte_table/rte_table_hash.h @@ -196,6 +196,9 @@ struct rte_table_hash_key8_lru_params { /** Byte offset within packet meta-data where the key is located */ uint32_t key_offset; + + /** Bit-mask to be AND-ed to the key on lookup */ + uint8_t *key_mask; }; /** LRU hash table operations for pre-computed key signature */ @@ -226,6 +229,9 @@ struct rte_table_hash_key8_ext_params { /** Byte offset within packet meta-data where the key is located */ uint32_t key_offset; + + /** Bit-mask to be AND-ed to the key on lookup */ + uint8_t *key_mask; }; /** Extendible bucket hash table operations for pre-computed key signature */ diff --git a/lib/librte_table/rte_table_hash_key8.c b/lib/librte_table/rte_table_hash_key8.c index b351a49..ccb20cf 100644 --- a/lib/librte_table/rte_table_hash_key8.c +++ b/lib/librte_table/rte_table_hash_key8.c @@ -82,6 +82,7 @@ struct rte_table_hash { uint32_t bucket_size; uint32_t signature_offset; uint32_t key_offset; + uint64_t key_mask; rte_table_hash_op_hash f_hash; uint64_t seed; @@ -160,6 +161,11 @@ rte_table_hash_create_key8_lru(void *params, int socket_id, uint32_t entry_size) f->f_hash = p->f_hash; f->seed = p->seed; + if (p->key_mask != NULL) + f->key_mask = ((uint64_t *)p->key_mask)[0]; + else + f->key_mask = 0xLLU; + for (i = 0; i < n_buckets; i++) { struct rte_bucket_4_8 *bucket; @@ -372,6 +378,11 @@ rte_table_hash_create_key8_ext(void *params, int socket_id, uint32_t entry_size) f->stack = (uint32_t *) &f->memory[(n_buckets + n_buckets_ext) * 
f->bucket_size]; + if (p->key_mask != NULL) + f->key_mask = ((uint64_t *)p->key_mask)[0]; + else + f->key_mask = 0xLLU; + for (i = 0; i < n_buckets_ext; i++) f->stack[i] = i; @@ -586,9 +597,12 @@ rte_table_hash_entry_delete_key8_ext( uint64_t *key; \ uint64_t signature; \ uint32_t bucket_index; \ + uint64_t hash_key_buffer; \ \ key = RTE_MBUF_METADATA_UINT64_PTR(mbuf1, f->key_offset);\ - signature = f->f_hash(key, RTE_TABLE_HASH_KEY_SIZE, f->seed);\ + hash_key_buffer = *key & f->key_mask; \ + signature = f->f_hash(&hash_key_buffer, \ + RTE_TABLE_HASH_KEY_SIZE, f->seed); \ bucket_index = signature & (f->n_buckets - 1); \ bucket1 = (struct rte_bucket_4_8 *) \ &f->memory[bucket_index * f->bucket_size]; \ @@ -602,10 +616,12 @@ rte_table_hash_entry_delete_key8_ext( uint64_t pkt_mask; \ uint64_t *key; \ uint32_t pos; \ + uint64_t hash_key_buffer; \ \ key = RTE_MBUF_METADATA_UINT64_PTR(mbuf2, f->key_offset);\ + hash_key_buffer = key[0] & f->key_mask; \ \ - lookup_key8_cmp(key, bucket2, pos); \ + lookup_key8_cmp((&hash_key_buffer), bucket2, pos); \ \ pkt_mask = ((bucket2->signature >> pos) & 1LLU) << pkt2_index;\ pkts_mask_out |= pkt_mask; \ @@ -624,10 +640,12 @@ rte_table_hash_entry_delete_key8_ext( uint64_t pkt_mask, bucket_mask; \ uint64_t *key; \ uint32_t pos; \ + uint64_t hash_key_buffer; \ \ key = RTE_MBUF_METADATA_UINT64_PTR(mbuf2, f->key_offset);\ + hash_key_buffer = *key & f->key_mask; \ \ - lookup_key8_cmp(key, bucket2,
[dpdk-dev] [PATCH v2 0/8] librte_table: add key_mask parameter to 8-byte key
From: Fan Zhang This patchset implements the ABI change announced for librte_table. A key_mask parameter has been added to the hash table parameter structures for 8-byte key and 16-byte key extendible bucket and LRU tables. v2: * change in release notes. Acked-by: Cristian Dumitrescu Fan Zhang (8): librte_table: add key_mask parameter to 8-byte key hash parameters librte_table: add key_mask parameter to 16-byte key hash parameters librte_table: add 16 byte hash table operations with computed lookup app/test: modify app/test_table_combined and app/test_table_tables app/test-pipeline: modify pipeline test example/ip_pipeline: add parse_hex_string for internal use example/ip_pipeline/pipeline: update flow_classification pipeline librte_table: modify release notes and deprecation notice app/test-pipeline/pipeline_hash.c | 4 + app/test/test_table_combined.c | 4 + app/test/test_table_tables.c | 6 +- doc/guides/rel_notes/deprecation.rst | 3 - doc/guides/rel_notes/release_2_2.rst | 4 +- examples/ip_pipeline/config_parse.c| 70 examples/ip_pipeline/pipeline.h| 4 + .../pipeline/pipeline_flow_classification_be.c | 56 ++- lib/librte_table/Makefile | 2 +- lib/librte_table/rte_table_hash.h | 20 + lib/librte_table/rte_table_hash_key16.c| 411 - lib/librte_table/rte_table_hash_key8.c | 54 ++- 12 files changed, 607 insertions(+), 31 deletions(-) -- 2.1.0
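As a rough illustration of what the key_mask parameter does, the sketch below ANDs an 8-byte key with a mask before the key is used, so that two keys differing only in masked-out bytes become identical. It is a standalone model, not librte_table code: load_key8_mask and mask_key8 are hypothetical names, and treating a NULL mask as "all bits significant" is an assumption based on the NULL branch in the patches and on the tests passing .key_mask = NULL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: load an 8-byte mask from a byte array.
 * Assumption: a NULL mask means every bit of the key is significant.
 */
uint64_t
load_key8_mask(const uint8_t *key_mask)
{
	uint64_t mask = UINT64_MAX;

	if (key_mask != NULL)
		memcpy(&mask, key_mask, sizeof(mask));
	return mask;
}

/*
 * Hypothetical helper: return the 8-byte key with the mask applied.
 * memcpy is used so the key pointer does not need to be 8-byte aligned.
 */
uint64_t
mask_key8(const uint8_t *key, uint64_t mask)
{
	uint64_t k;

	memcpy(&k, key, sizeof(k));
	return k & mask;
}
```

Because the mask and the key are loaded the same way, the result does not depend on host byte order: byte i of the mask always applies to byte i of the key. The masked value is what would then be hashed and compared against stored keys.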
[dpdk-dev] [PATCH] mk: Quote $(KERNELCC) to allow ccache builds
On 2015-10-13 14:45, Thomas Monjalon wrote: > 2015-10-13 14:39, Simon Kågström: >> Two of the patches (this one included) I have outstanding are build fixes >> for use in our build environment, so it would be nice to them upstreamed. > > Waiting for integration of your patches, maybe you have some free time to > help other developers by making reviews ;) Waiting for integration is not my only work-task :-) Anyway, my knowledge of DPDK is too superficial to chime in with relevant comments in most cases, but I'll comment if I feel I can contribute. // Simon
[dpdk-dev] [PATCH v2] kni: Use utsrelease.h to determine Ubuntu kernel version
Ping? // Simon On Thu, 20 Aug 2015 08:51:06 +0200 Simon Kagstrom wrote: > /proc/version_signature is the version for the host machine, but in > e.g., chroots, this does not necessarily match that DPDK is built > for. DPDK will then build for the wrong kernel version - that of the > server, and not that installed in the (build) chroot. > > The patch uses utsrelease.h from the kernel sources instead and fakes > the upload version. > > Tested on a server with Ubuntu 12.04, building in a chroot for Ubuntu > 14.04. > > Signed-off-by: Simon Kagstrom > Signed-off-by: Johan Faltstrom > --- > ChangeLog: > > v2: Improve description and motivation for the patch. > > lib/librte_eal/linuxapp/kni/Makefile | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/lib/librte_eal/linuxapp/kni/Makefile > b/lib/librte_eal/linuxapp/kni/Makefile > index fb673d9..ac99d3f 100644 > --- a/lib/librte_eal/linuxapp/kni/Makefile > +++ b/lib/librte_eal/linuxapp/kni/Makefile > @@ -44,10 +44,10 @@ MODULE_CFLAGS += -I$(RTE_OUTPUT)/include > -I$(SRCDIR)/ethtool/ixgbe -I$(SRCDIR)/e > MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h > MODULE_CFLAGS += -Wall -Werror > > -ifeq ($(shell test -f /proc/version_signature && lsb_release -si > 2>/dev/null),Ubuntu) > +ifeq ($(shell lsb_release -si 2>/dev/null),Ubuntu) > MODULE_CFLAGS += -DUBUNTU_RELEASE_CODE=$(shell lsb_release -sr | tr -d .) > -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2 /proc/version_signature | \ > -cut -d'~' -f1 | cut -d- -f1,2 | tr .- $(comma)) > +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE > $(RTE_KERNELDIR)/include/generated/utsrelease.h \ > + | cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1) > MODULE_CFLAGS += > -D"UBUNTU_KERNEL_CODE=UBUNTU_KERNEL_VERSION($(UBUNTU_KERNEL_CODE))" > endif >
[dpdk-dev] [PATCH v3] mbuf/ip_frag: Move mbuf chaining to common code
Ping? // Simon On Mon, 7 Sep 2015 14:50:09 +0200 Simon Kagstrom wrote: > Chaining/segmenting mbufs can be useful in many places, so make it > global. > > Signed-off-by: Simon Kagstrom > Signed-off-by: Johan Faltstrom > --- > ChangeLog: > v2: > * Check for nb_segs byte overflow (Olivier MATZ) > * Don't reset nb_segs in tail (Olivier MATZ) > v3: > * Describe performance implications of linear search > * Correct check-for-out-of-bounds (Konstantin Ananyev) > > lib/librte_ip_frag/ip_frag_common.h | 23 - > lib/librte_ip_frag/rte_ipv4_reassembly.c | 7 +-- > lib/librte_ip_frag/rte_ipv6_reassembly.c | 7 +-- > lib/librte_mbuf/rte_mbuf.h | 34 > > 4 files changed, 44 insertions(+), 27 deletions(-) > > diff --git a/lib/librte_ip_frag/ip_frag_common.h > b/lib/librte_ip_frag/ip_frag_common.h > index 6b2acee..cde6ed4 100644 > --- a/lib/librte_ip_frag/ip_frag_common.h > +++ b/lib/librte_ip_frag/ip_frag_common.h > @@ -166,27 +166,4 @@ ip_frag_reset(struct ip_frag_pkt *fp, uint64_t tms) > fp->frags[IP_FIRST_FRAG_IDX] = zero_frag; > } > > -/* chain two mbufs */ > -static inline void > -ip_frag_chain(struct rte_mbuf *mn, struct rte_mbuf *mp) > -{ > - struct rte_mbuf *ms; > - > - /* adjust start of the last fragment data. */ > - rte_pktmbuf_adj(mp, (uint16_t)(mp->l2_len + mp->l3_len)); > - > - /* chain two fragments. */ > - ms = rte_pktmbuf_lastseg(mn); > - ms->next = mp; > - > - /* accumulate number of segments and total length. */ > - mn->nb_segs = (uint8_t)(mn->nb_segs + mp->nb_segs); > - mn->pkt_len += mp->pkt_len; > - > - /* reset pkt_len and nb_segs for chained fragment. 
*/ > - mp->pkt_len = mp->data_len; > - mp->nb_segs = 1; > -} > - > - > #endif /* _IP_FRAG_COMMON_H_ */ > diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c > b/lib/librte_ip_frag/rte_ipv4_reassembly.c > index 5d24843..26d07f9 100644 > --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c > +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c > @@ -63,7 +63,9 @@ ipv4_frag_reassemble(const struct ip_frag_pkt *fp) > /* previous fragment found. */ > if(fp->frags[i].ofs + fp->frags[i].len == ofs) { > > - ip_frag_chain(fp->frags[i].mb, m); > + /* adjust start of the last fragment data. */ > + rte_pktmbuf_adj(m, (uint16_t)(m->l2_len + > m->l3_len)); > + rte_pktmbuf_chain(fp->frags[i].mb, m); > > /* update our last fragment and offset. */ > m = fp->frags[i].mb; > @@ -78,7 +80,8 @@ ipv4_frag_reassemble(const struct ip_frag_pkt *fp) > } > > /* chain with the first fragment. */ > - ip_frag_chain(fp->frags[IP_FIRST_FRAG_IDX].mb, m); > + rte_pktmbuf_adj(m, (uint16_t)(m->l2_len + m->l3_len)); > + rte_pktmbuf_chain(fp->frags[IP_FIRST_FRAG_IDX].mb, m); > m = fp->frags[IP_FIRST_FRAG_IDX].mb; > > /* update mbuf fields for reassembled packet. */ > diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c > b/lib/librte_ip_frag/rte_ipv6_reassembly.c > index 1f1c172..5969b4a 100644 > --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c > +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c > @@ -86,7 +86,9 @@ ipv6_frag_reassemble(const struct ip_frag_pkt *fp) > /* previous fragment found. */ > if (fp->frags[i].ofs + fp->frags[i].len == ofs) { > > - ip_frag_chain(fp->frags[i].mb, m); > + /* adjust start of the last fragment data. */ > + rte_pktmbuf_adj(m, (uint16_t)(m->l2_len + > m->l3_len)); > + rte_pktmbuf_chain(fp->frags[i].mb, m); > > /* update our last fragment and offset. */ > m = fp->frags[i].mb; > @@ -101,7 +103,8 @@ ipv6_frag_reassemble(const struct ip_frag_pkt *fp) > } > > /* chain with the first fragment. 
*/ > - ip_frag_chain(fp->frags[IP_FIRST_FRAG_IDX].mb, m); > + rte_pktmbuf_adj(m, (uint16_t)(m->l2_len + m->l3_len)); > + rte_pktmbuf_chain(fp->frags[IP_FIRST_FRAG_IDX].mb, m); > m = fp->frags[IP_FIRST_FRAG_IDX].mb; > > /* update mbuf fields for reassembled packet. */ > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index d7c9030..f1f1400 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -1775,6 +1775,40 @@ static inline int rte_pktmbuf_is_contiguous(const > struct rte_mbuf *m) > } > > /** > + * Chain an mbuf to another, thereby creating a segmented packet. > + * > + * Note: The implementation will do a linear walk over the segments to find > + * the tail entry. For cases when there are many segments, it's better to > + * chain the entries manually. > + *
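The chaining logic discussed in this patch can be modelled in a few lines of plain C. The sketch below is a toy, not the real rte_pktmbuf_chain: struct seg is a hypothetical stand-in for struct rte_mbuf, and the error value -1 is an assumption. It follows what the quoted text does describe: a linear walk to the tail segment, a check that nb_segs does not overflow its byte, accumulation of nb_segs and pkt_len in the head, and a reset of pkt_len (but not nb_segs) in the appended segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a packet segment; NOT the real struct rte_mbuf. */
struct seg {
	struct seg *next;   /* next segment, or NULL for the last one */
	uint32_t pkt_len;   /* total packet length, meaningful in the head */
	uint16_t data_len;  /* bytes of data held by this segment */
	uint8_t nb_segs;    /* number of segments, meaningful in the head */
};

/* Linear walk to the last segment, as the patch's doc comment warns. */
static struct seg *
seg_last(struct seg *s)
{
	while (s->next != NULL)
		s = s->next;
	return s;
}

/*
 * Append the chain starting at tail to the chain starting at head.
 * Returns 0 on success, -1 if the segment count would not fit in the
 * one-byte nb_segs field (the return value is an assumption).
 */
int
seg_chain(struct seg *head, struct seg *tail)
{
	if ((unsigned int)head->nb_segs + tail->nb_segs > UINT8_MAX)
		return -1;

	seg_last(head)->next = tail;

	/* Accumulate segment count and total length in the head. */
	head->nb_segs = (uint8_t)(head->nb_segs + tail->nb_segs);
	head->pkt_len += tail->pkt_len;

	/* pkt_len only describes a whole packet in the head segment. */
	tail->pkt_len = tail->data_len;
	return 0;
}
```

The walk in seg_last is what makes repeated chaining onto a long packet quadratic, which is why the doc comment suggests linking segments manually when there are many of them.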
[dpdk-dev] [PATCH v2] devargs: add blacklisting by linux interface name
Hi Chas, On 10/05/2015 05:26 PM, Chas Williams wrote: > If a system is using deterministic interface names, it may be easier in > some cases to use the interface name to blacklist an interface. > > Signed-off-by: Chas Williams <3chas3 at gmail.com> > --- > app/test/test_devargs.c | 2 ++ > lib/librte_eal/common/eal_common_devargs.c | 9 +++-- > lib/librte_eal/common/eal_common_options.c | 2 +- > lib/librte_eal/common/eal_common_pci.c | 10 -- > lib/librte_eal/common/include/rte_devargs.h | 2 ++ > lib/librte_eal/common/include/rte_pci.h | 1 + > lib/librte_eal/linuxapp/eal/eal_pci.c | 15 +++ > 7 files changed, 36 insertions(+), 5 deletions(-) > > diff --git a/app/test/test_devargs.c b/app/test/test_devargs.c > index f7fc59c..27855ff 100644 > > [...] > > @@ -352,6 +354,19 @@ pci_scan_one(const char *dirname, uint16_t domain, > uint8_t bus, > return -1; > } > > + /* get network interface name */ > + snprintf(filename, sizeof(filename), "%s/net", dirname); > + dir = opendir(filename); > + if (dir) { > + while ((e = readdir(dir)) != NULL) { > + if (e->d_name[0] == '.') > + continue; > + > + strncpy(dev->name, e->d_name, sizeof(dev->name)); > + } > + closedir(dir); > + } > + > if (!ret) { > if (!strcmp(driver, "vfio-pci")) > dev->kdrv = RTE_KDRV_VFIO; > For PCI devices that have several interfaces (I think it's the case for some Mellanox boards), maybe we should not store the interface name? Another small comment about the strncpy(): it's maybe safer to ensure that dev->name is properly nul-terminated. Regards, Olivier
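To illustrate the nul-termination remark: strncpy() leaves the destination unterminated when the source is at least as long as the buffer, whereas snprintf() always terminates it when the buffer size is non-zero. The sketch below is standalone C; copy_ifname is a hypothetical helper, not part of the patch or of the EAL, and reporting truncation with -1 is an assumption about what a caller would want.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy an interface name into a fixed-size buffer.
 * The destination is always nul-terminated when dst_len > 0.
 * Returns 0 if the whole name fit, -1 on bad arguments or truncation.
 */
int
copy_ifname(char *dst, size_t dst_len, const char *src)
{
	int n;

	if (dst == NULL || src == NULL || dst_len == 0)
		return -1;

	/* snprintf writes at most dst_len - 1 characters plus the '\0'. */
	n = snprintf(dst, dst_len, "%s", src);
	if (n < 0 || (size_t)n >= dst_len)
		return -1;
	return 0;
}
```

With a helper like this, the loop in the patch could also detect over-long names instead of silently storing a truncated one.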
[dpdk-dev] IXGBE RX packet loss with 5+ cores
>>> [Robert:] >>> 1. The 82599 device supports up to 128 queues. Why do we see trouble >>> with as few as 5 queues? What could limit the system (and one port >>> controlled by 5+ cores) from receiving at line-rate without loss? >>> >>> 2. As far as we can tell, the RX path only touches the device >>> registers when it updates a Receive Descriptor Tail register (RDT[n]), >>> roughly every rx_free_thresh packets. Is there a big difference >>> between one core doing this and N cores doing it 1/N as often? >>[Stephen:] >>As you add cores, there is more traffic on the PCI bus from each core >>polling. There is a fix number of PCI bus transactions per second >>possible. >>Each core is increasing the number of useless (empty) transactions. >[Bruce:] >The polling for packets by the core should not be using PCI bandwidth >directly, >as the ixgbe driver (and other drivers) check for the DD bit being set on >the >descriptor in memory/cache. I was preparing to reply with the same point. >>[Stephen:] Why do you think adding more cores will help? We're using run-to-completion and sometimes spend too many cycles per pkt. We realize that we need to move to io+workers model, but wanted a better understanding of the dynamics involved here. >[Bruce:] However, using an increased number of queues can >use PCI bandwidth in other ways, for instance, with more queues you >reduce the >amount of descriptor coalescing that can be done by the NICs, so that >instead of >having a single transaction of 4 descriptors to one queue, the NIC may >instead >have to do 4 transactions each writing 1 descriptor to 4 different >queues. This >is possibly why sending all traffic to a single queue works ok - the >polling on >the other queues is still being done, but has little effect. Brilliant! This idea did not occur to me. -- Thanks guys, Robert
[dpdk-dev] [PATCH] mk: Quote $(KERNELCC) to allow ccache builds
2015-10-13 14:39, Simon Kagstrom: > On 2015-10-13 14:26, Olivier MATZ wrote: > > There is the patchwork tool: > > http://dpdk.org/dev/patchwork/project/dpdk/list/ > > Thanks! I knew about it, but forgot to look there. It would be nice to > have tags to signify e.g., for-v2.2, candidate-v2.2 etc. like you can > have on GitHub, to make it easier to see where patches are going, but perhaps > patchwork doesn't work that way. No, it's not needed currently, because every patch on this mailing list targets integration into the main branch for the next release. Exceptions must be flagged explicitly. > Is the process to ping old patches like this on the mailing list? Yes, it is the responsibility of the developer and the maintainer to get reviews. Please check the MAINTAINERS file to see who to contact. > Two of the patches (this one included) I have outstanding are build fixes > for use in our build environment, so it would be nice to get them upstreamed. While waiting for your patches to be integrated, maybe you have some free time to help other developers by reviewing their patches ;) Thanks
[dpdk-dev] [PATCH] mk: Quote $(KERNELCC) to allow ccache builds
On 2015-10-13 14:26, Olivier MATZ wrote: > Sorry for not having answered before. Thanks for looking at it now though! >> This is one of three outstanding DPDK patches I have which hasn't seen >> any activity in a while. Is there a list of pending applies somewhere >> to monitor activity? > > There is the patchwork tool: > http://dpdk.org/dev/patchwork/project/dpdk/list/ Thanks! I knew about it, but forgot to look there. It would be nice to have tags to signify e.g., for-v2.2, candidate-v2.2 etc. like you can have on GitHub, to make it easier to see where patches are going, but perhaps patchwork doesn't work that way. Is the process to ping old patches like this on the mailing list? Two of the patches (this one included) I have outstanding are build fixes for use in our build environment, so it would be nice to get them upstreamed. // Simon
[dpdk-dev] [PATCH] mk: Quote $(KERNELCC) to allow ccache builds
Hi Simon, Sorry for not having answered before. On 10/13/2015 02:10 PM, Simon Kagstrom wrote: > Ping? > > This is one of three outstanding DPDK patches I have which hasn't seen > any activity in a while. Is there a list of pending applies somewhere > to monitor activity? There is the patchwork tool: http://dpdk.org/dev/patchwork/project/dpdk/list/ >> Otherwise building with KERNELCC="ccache gcc" will fail: >> >> == Build lib/librte_eal/linuxapp/igb_uio >> /usr/src/linux-headers-3.13.0-63-generic/arch/x86/Makefile:98: stack >> protector enabled but no compiler support >> /usr/src/linux-headers-3.13.0-63-generic/arch/x86/Makefile:113: >> CONFIG_X86_X32 enabled but no binutils support >> ccache: invalid option -- 'p' >> Usage: >> ccache [options] >> ccache compiler [compiler options] >> compiler [compiler options] (via symbolic link) >> >> Options: >> -c, --cleanup delete old files and recalculate size counters >> (normally not needed as this is done automatically) >> -C, --clear clear the cache completely >> -F, --max-files=N set maximum number of files in cache to N (use 0 for >> no limit) >> -M, --max-size=SIZE set maximum size of cache to SIZE (use 0 for no >> limit; available suffixes: G, M and K; default >> suffix: G) >> -s, --show-stats show statistics summary >> -z, --zero-stats zero statistics counters >> >> -h, --help print this help text >> -V, --version print version and copyright information >> >> Signed-off-by: Simon Kagstrom Acked-by: Olivier Matz
[dpdk-dev] [PATCH] eal: fix C++ build (struct member: virtual)
On Tue, Oct 13, 2015 at 11:13 AM, David Marchand wrote: > Hello Christoph, > > On Tue, Oct 13, 2015 at 11:10 AM, Christoph Gysin < > christoph.gysin at gmail.com> wrote: > >> >> Is there anything I can do to help getting this merged? >> > > This is ok for me, cc-ing Thomas. > Thought I already did, but just in case, Acked-by: David Marchand -- David Marchand
[dpdk-dev] [PATCH] mk: Quote $(KERNELCC) to allow ccache builds
Ping? This is one of three outstanding DPDK patches I have which hasn't seen any activity in a while. Is there a list of pending applies somewhere to monitor activity? // Simon On Thu, 24 Sep 2015 09:43:28 +0200 Simon Kagstrom wrote: > Otherwise building with KERNELCC="ccache gcc" will fail: > > == Build lib/librte_eal/linuxapp/igb_uio > /usr/src/linux-headers-3.13.0-63-generic/arch/x86/Makefile:98: stack > protector enabled but no compiler support > /usr/src/linux-headers-3.13.0-63-generic/arch/x86/Makefile:113: > CONFIG_X86_X32 enabled but no binutils support > ccache: invalid option -- 'p' > Usage: > ccache [options] > ccache compiler [compiler options] > compiler [compiler options] (via symbolic link) > > Options: > -c, --cleanup delete old files and recalculate size counters > (normally not needed as this is done automatically) > -C, --clear clear the cache completely > -F, --max-files=N set maximum number of files in cache to N (use 0 for > no limit) > -M, --max-size=SIZE set maximum size of cache to SIZE (use 0 for no > limit; available suffixes: G, M and K; default > suffix: G) > -s, --show-stats show statistics summary > -z, --zero-stats zero statistics counters > > -h, --help print this help text > -V, --version print version and copyright information > > Signed-off-by: Simon Kagstrom > --- > mk/rte.module.mk | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mk/rte.module.mk b/mk/rte.module.mk > index 7bf77c1..53ed4fe 100644 > --- a/mk/rte.module.mk > +++ b/mk/rte.module.mk > @@ -78,7 +78,7 @@ build: _postbuild > $(MODULE).ko: $(SRCS_LINKS) > @if [ ! -f $(notdir Makefile) ]; then ln -nfs $(SRCDIR)/Makefile . ; fi > @$(MAKE) -C $(RTE_KERNELDIR) M=$(CURDIR) O=$(RTE_KERNELDIR) \ > - CC=$(KERNELCC) CROSS_COMPILE=$(CROSS) V=$(if $V,1,0) > + CC="$(KERNELCC)" CROSS_COMPILE=$(CROSS) V=$(if $V,1,0) > > # install module in $(RTE_OUTPUT)/kmod > $(RTE_OUTPUT)/kmod/$(MODULE).ko: $(MODULE).ko
[dpdk-dev] Question about unsupported transceivers
On 10/13/2015 11:57 AM, Alex Forster wrote: > I believe I've discovered my problem: > https://gist.github.com/AlexForster/0fb4699bcdf196cf5462 > > As mentioned previously, I have two X520-Q1 cards installed. It appears that > initialization of the first card obeys allow_unsupported_sfp=1, but > initialization of the second card does not. > > Is this a bug, or is there a way to work around this that I'm not aware of? > > Alex Forster If you are using Intel's out-of-tree ixgbe driver I believe the module parameters are comma separated with one index per port. So if you have two ports you should be passing "allow_unsupported_sfp=1,1", and for 4 you would need four '1's. - Alex
[dpdk-dev] IXGBE RX packet loss with 5+ cores
On 10/13/2015 07:47 AM, Sanford, Robert wrote: [Robert:] 1. The 82599 device supports up to 128 queues. Why do we see trouble with as few as 5 queues? What could limit the system (and one port controlled by 5+ cores) from receiving at line-rate without loss? 2. As far as we can tell, the RX path only touches the device registers when it updates a Receive Descriptor Tail register (RDT[n]), roughly every rx_free_thresh packets. Is there a big difference between one core doing this and N cores doing it 1/N as often? >>> [Stephen:] >>> As you add cores, there is more traffic on the PCI bus from each core >>> polling. There is a fix number of PCI bus transactions per second >>> possible. >>> Each core is increasing the number of useless (empty) transactions. >> [Bruce:] >> The polling for packets by the core should not be using PCI bandwidth >> directly, >> as the ixgbe driver (and other drivers) check for the DD bit being set on >> the >> descriptor in memory/cache. > I was preparing to reply with the same point. > >>> [Stephen:] Why do you think adding more cores will help? > We're using run-to-completion and sometimes spend too many cycles per pkt. > We realize that we need to move to io+workers model, but wanted a better > understanding of the dynamics involved here. > > > >> [Bruce:] However, using an increased number of queues can >> use PCI bandwidth in other ways, for instance, with more queues you >> reduce the >> amount of descriptor coalescing that can be done by the NICs, so that >> instead of >> having a single transaction of 4 descriptors to one queue, the NIC may >> instead >> have to do 4 transactions each writing 1 descriptor to 4 different >> queues. This >> is possibly why sending all traffic to a single queue works ok - the >> polling on >> the other queues is still being done, but has little effect. > Brilliant! This idea did not occur to me. 
You can actually make the throughput regression disappear by altering the traffic pattern you are testing with. In the past I have found that sending traffic in bursts where 4 frames belong to the same queue before moving to the next one essentially eliminated the dropped packets due to PCIe bandwidth limitations. The trick is you need to have the Rx descriptor processing work in batches so that you can get multiple descriptors processed for each PCIe read/write. - Alex
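The effect of the traffic pattern can be sketched with a toy model. The assumption here, which is a simplification and not a description of how the 82599 actually schedules descriptor write-back, is that consecutive packets landing on the same queue can share one PCIe write, up to max_batch descriptors per write. The function name count_writebacks() and its parameters are invented for this illustration.

```c
#include <stddef.h>

/*
 * Toy model: count descriptor write-back transactions for a sequence of
 * packets, where queue_of_pkt[i] is the RX queue chosen for packet i.
 * Consecutive packets for the same queue share one transaction, up to
 * max_batch descriptors per transaction.
 */
static size_t
count_writebacks(const unsigned int *queue_of_pkt, size_t n_pkts,
		 size_t max_batch)
{
	size_t txns = 0;	/* transactions issued so far */
	size_t run = 0;		/* descriptors in the current transaction */
	size_t i;

	if (queue_of_pkt == NULL || max_batch == 0)
		return 0;

	for (i = 0; i < n_pkts; i++) {
		if (i > 0 && queue_of_pkt[i] == queue_of_pkt[i - 1] &&
		    run < max_batch) {
			run++;		/* coalesce into the open transaction */
		} else {
			txns++;		/* queue changed or batch is full */
			run = 1;
		}
	}
	return txns;
}
```

Under this model, 16 packets spread round-robin over 4 queues cost 16 transactions, while the same 16 packets sent as bursts of 4 per queue cost 4, which matches the direction of the observation above.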
[dpdk-dev] [PATCH] eal: fix C++ build (struct member: virtual)
Hi David Is there anything I can do to help getting this merged? Thanks, Chris On Mon, Oct 5, 2015 at 12:44 PM, Dumitrescu, Cristian wrote: > > >> -Original Message- >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Christoph Gysin >> Sent: Tuesday, September 29, 2015 7:53 AM >> To: dev at dpdk.org >> Subject: [dpdk-dev] [PATCH] eal: fix C++ build (struct member: virtual) >> >> 'virtual' is a keyword and can't be used if the code is to compile with >> C++ compilers. >> >> If rte_devargs.h was included in C++ code, compilation with clang++ >> failed with an error. g++ did not fail, but only because of a bug >> that treats it as an anonymous struct with a decl-specifier which it >> ignores. >> >> This simply renames the member to 'virt'. >> >> Signed-off-by: Christoph Gysin >> --- > > Acked-by: Cristian Dumitrescu > > Christoph, please also copy David Marchand for further updates of this patch, > as David is the maintainer of this component. Whenever in doubt about the > maintainer, you can check file MAINTAINERS from DPDK root folder. > > Regards, > Cristian > -- echo mailto: NOSPAM !#$.'<*>'|sed 's. ..'|tr "<*> !#:2" org at fr33z3
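As a rough illustration of the rename described in the quoted patch: a header that is meant to be included from both C and C++ cannot use 'virtual' as a member name, because it is a reserved word in C++. The struct below is a simplified, hypothetical stand-in (devargs_sketch, DRV_NAME_LEN and devargs_set_virt_name() are invented names), not the actual layout in rte_devargs.h.

```c
#include <stdio.h>
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical buffer size for the driver name. */
#define DRV_NAME_LEN 32

/* Simplified stand-in for a devargs-like structure. */
struct devargs_sketch {
	/* Named 'virt' rather than 'virtual', which is a C++ keyword. */
	struct {
		char drv_name[DRV_NAME_LEN];
	} virt;
};

/* Store a driver name; returns 0 on success, -1 if it did not fit. */
static inline int
devargs_set_virt_name(struct devargs_sketch *da, const char *name)
{
	int n = snprintf(da->virt.drv_name, sizeof(da->virt.drv_name),
			 "%s", name);

	return (n < 0 || (size_t)n >= sizeof(da->virt.drv_name)) ? -1 : 0;
}

#ifdef __cplusplus
}
#endif
```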
[dpdk-dev] dpdk 2.1.0: 40gig ports link is down
Updating the firmware using the tool did solve the problem! Now link is up. Thanks a lot for your time and help Stephen! :) -Original Message- From: Stephen Hemminger [mailto:step...@networkplumber.org] Sent: Monday, October 12, 2015 8:14 PM To: Shaham Fridenberg Cc: dev at dpdk.org Subject: Re: [dpdk-dev] dpdk 2.1.0: 40gig ports link is down On Mon, 12 Oct 2015 17:04:01 + Shaham Fridenberg wrote: > Hey Stephen, > > Thanks for your help. > > I tried updating i40e driver to the latest version (from 1.0.11-k to > 1.3.39.1) but it didn't help. > > By 'Compile i40e with DEBUG flag' you mean adding "CONFIG_RTE_LOG_LEVEL=8" to > defconfig_x86_64-wsm-linuxapp-gcc (assuming I'm compiling for westmere)? > > Also, is there any log file generated in that case? > > Thanks, > Shaham I was thinking of firmware update tool, which you need to run on Linux (w/o DPDK) https://downloadcenter.intel.com/download/24769/NVM-Update-Utility-for-Intel-Ethernet-Converged-Network-Adapter-XL710-X710-Series In current DPDK, the config stuff is in common_linuxapp
[dpdk-dev] DPDK hash function related question
> -Original Message- > From: Yeddula, Avinash [mailto:ayeddula at ciena.com] > Sent: Monday, October 12, 2015 6:03 PM > To: Dumitrescu, Cristian; dev at dpdk.org; Bly, Mike > Subject: RE: DPDK hash function related question > > Hi Cristian, > I have configured the hash function and it compiles fine, but with warnings, > because the librte_hash and librte_table hash function types differ (32-bit vs 64-bit). > > librte_hash library : > /** Type of function that can be used for calculating the hash value. */ > typedef uint32_t (*rte_hash_function)(const void *key, uint32_t key_len, > uint32_t init_val); > > librte_table library: > typedef uint64_t (*rte_table_hash_op_hash) (void *key,uint32_t > key_size, uint64_t seed); > > I could use one of these hash functions. This is one option, but our first > priority is to use crc hash or cuckoo hash. > https://github.com/scylladb/dpdk/blob/master/examples/ip_pipeline/pipeline/hash_func.h > > We do not want to have those warnings in our code. What do you suggest? Would function pointer conversion work? > > Thanks > -Avinash > > -Original Message- > From: Dumitrescu, Cristian [mailto:cristian.dumitrescu at intel.com] > Sent: Tuesday, September 22, 2015 3:05 AM > To: Yeddula, Avinash; dev at dpdk.org; Bly, Mike > Subject: RE: DPDK hash function related question > > Hi Avinash, > > Yes, the hash function is configurable. > > Are you using a DPDK release older than 2.1? In DPDK we moved away from > test_hash to CRC-based hashes. Please take a look at DPDK release 2.1 > examples/ip_pipeline application: in pipeline_flow_classification_be.c, we > use CRC-based hash functions defined in file hash_func.h from the same > folder.
> > Regards, > Cristian > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yeddula, Avinash > > Sent: Tuesday, September 22, 2015 1:34 AM > > To: dev at dpdk.org; Bly, Mike > > Subject: [dpdk-dev] DPDK hash function related question > > > > Hello All, > > > > I'm DPDK extensible bucket hash in the rte_table library of packet > > framework. My question is related to the actual hash function that > > computes the hash signature. > > > > All the available examples have initialized it to test_hash. I do not see > > any > > hash function available in rte_table library , that computes the > > actual signature > > > > > > > > struct rte_table_hash_ext_params hash_table_params = { > > > > .key_size = TABLE_ENTRY_KEY_SIZE, > > > > .n_keys = TABLE_MAX_SIZE, > > > > .n_buckets = TABLE_MAX_BUCKET_COUNT, > > > > .n_buckets_ext = TABLE_MAX_EXT_BUCKET_COUNT, > > > > .f_hash = test_hash, > > > > .seed = 0, > > > > .signature_offset = 0; > > > > .key_offset = __builtin_offsetof(struct metadata_t, tbl_key), > > > > }; > > > > > > > > So, I wanted to use hash functions from DPDK rte_hash library. This is > > what I'm doing and looking at the code this looks ok to me. > > > > I'm at least a week or 2 away from testing this part of the code. I > > wanted to confirm that, there is no fundamental flaw in using the DPDK > > rte_hash library and rte_table library like this. Could someone confirm this > please ? > > > > > > > > #define DEFAULT_HASH_FUNC rte_hash_crc > > > > > > > > struct rte_table_hash_ext_params hash_table_params = { > > > > .key_size = TABLE_ENTRY_KEY_SIZE, > > > > .n_keys = TABLE_MAX_SIZE, > > > > .n_buckets = TABLE_MAX_BUCKET_COUNT, > > > > .n_buckets_ext = TABLE_MAX_EXT_BUCKET_COUNT, > > > > .f_hash = DEFAULT_HASH_FUNC , > > > > .seed = 0, > > > > .signature_offset = 0; > > > > .key_offset = __builtin_offsetof(struct metadata_t, tbl_key), > > > > }; > > > > > > > > Thanks > > > > -Avinash > > >
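On the 32-bit vs 64-bit signature mismatch discussed in this thread: one alternative to converting the function pointer is a thin wrapper with the 64-bit signature that calls the 32-bit hash. Calling a function through a pointer of an incompatible type is undefined behaviour in C, so a wrapper is the more conservative choice. The sketch below is self-contained and makes several assumptions: table_hash_fn mirrors the rte_table_hash_op_hash typedef quoted above, and hash32_fnv1a() is a hypothetical stand-in for a librte_hash-style function such as rte_hash_crc, used only so the example builds without DPDK.

```c
#include <stdint.h>
#include <stddef.h>

/* Mirrors the librte_table callback type quoted in the thread. */
typedef uint64_t (*table_hash_fn)(void *key, uint32_t key_size, uint64_t seed);

/*
 * Hypothetical 32-bit hash with a librte_hash-style signature (FNV-1a).
 * In a real application this would be the existing 32-bit hash function.
 */
static uint32_t
hash32_fnv1a(const void *key, uint32_t key_len, uint32_t init_val)
{
	const uint8_t *p = key;
	uint32_t h = 2166136261u ^ init_val;
	uint32_t i;

	for (i = 0; i < key_len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Wrapper with the 64-bit signature. The seed is narrowed to 32 bits and
 * the 32-bit result is widened, so no function pointer cast is needed.
 */
static uint64_t
table_hash_wrapper(void *key, uint32_t key_size, uint64_t seed)
{
	return (uint64_t)hash32_fnv1a(key, key_size, (uint32_t)seed);
}
```

The wrapper can then be assigned to a field of the 64-bit callback type (for example .f_hash = table_hash_wrapper) without a cast and without compiler warnings. Note that only the low 32 bits of the returned signature carry information in this sketch.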
[dpdk-dev] [testpmd] enable lsc to avoid TX stall, TX stall happened in following sequence start show port info 0
Hi Pablo, The issue is specific to certain NICs. I observed it on the Intel 82577LM (em). When the lsc interrupt is disabled, show port info reads the PHY registers to get the link status, and that causes TX to stop. I don't have other NICs, so I'm not sure whether it is a common issue. Regards, Jiuling On Tue, Oct 13, 2015 at 5:07 AM, De Lara Guarch, Pablo < pablo.de.lara.guarch at intel.com> wrote: > Hi Jiuling, > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jiuling Bie > > Sent: Wednesday, October 07, 2015 5:54 PM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [testpmd] enable lsc to avoid TX stall, TX stall > happened > > in following sequence start show port info 0 > > > > --- > > app/test-pmd/testpmd.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c > > index 386bf84..45adefa 100644 > > --- a/app/test-pmd/testpmd.c > > +++ b/app/test-pmd/testpmd.c > > @@ -1779,6 +1779,7 @@ init_port_config(void) > > port = &ports[pid]; > > port->dev_conf.rxmode = rx_mode; > > port->dev_conf.fdir_conf = fdir_conf; > > + port->dev_conf.intr_conf.lsc = 1; > > if (nb_rxq > 1) { > > port->dev_conf.rx_adv_conf.rss_conf.rss_key = > > NULL; > > port->dev_conf.rx_adv_conf.rss_conf.rss_hf = > > rss_hf; > > -- > > 1.9.1 > > Several things about your patch: > - It looks like this is your first patch (plus the other one you sent a > few minutes later): take a look at http://dpdk.org/dev > - You forgot to sign off your patches (use --signoff with git commit) > - The title of this patch is too long, shorten it and include more > information in the body of the commit message. > - I don't know what this patch is trying to solve exactly. It looks like > you are saying that there is a bug > that makes TX stop when you run the following commands: > testpmd> start > testpmd> show port info 0 > > I don't see such a bug, could you explain better the steps to reproduce the > issue? > > Thanks, > Pablo > >
[dpdk-dev] [PATCH] eal: fix C++ build (struct member: virtual)
Hello Christoph, On Tue, Oct 13, 2015 at 11:10 AM, Christoph Gysin wrote: > > Is there anything I can do to help getting this merged? > This is ok for me, cc-ing Thomas. -- David Marchand
[dpdk-dev] [PATCH] librte_eal: Fix wrong header file for old gcc version
On Mon, Aug 24, 2015 at 05:22:57PM +0800, Michael Qiu wrote: > For __SSE3__, the corresponding header file should be pmmintrin.h, > tmmintrin.h works for __SSSE3__. > > Signed-off-by: Michael Qiu Since this is a bug-fix, it should probably have a fixes line in the commit. Otherwise the change looks ok. Acked-by: Bruce Richardson
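The header/macro pairing described in the patch can be sketched as follows: pmmintrin.h goes with __SSE3__ and tmmintrin.h with __SSSE3__. This is a standalone illustration, not the code from the patch. The hsum4() helper is a hypothetical example that uses an SSE3 intrinsic when available and falls back to scalar code otherwise, so it should build on any target.

```c
/* Include the intrinsics header that matches each feature macro. */
#ifdef __SSE3__
#include <pmmintrin.h>	/* SSE3 intrinsics, e.g. _mm_hadd_ps */
#endif
#ifdef __SSSE3__
#include <tmmintrin.h>	/* SSSE3 intrinsics, e.g. _mm_shuffle_epi8 */
#endif

/* Sum four floats, using the SSE3 horizontal add when it is available. */
static float
hsum4(const float v[4])
{
#ifdef __SSE3__
	__m128 x = _mm_loadu_ps(v);

	x = _mm_hadd_ps(x, x);	/* [v0+v1, v2+v3, v0+v1, v2+v3] */
	x = _mm_hadd_ps(x, x);	/* every lane now holds the full sum */
	return _mm_cvtss_f32(x);
#else
	return v[0] + v[1] + v[2] + v[3];
#endif
}
```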
[dpdk-dev] [PATCH v3 5/5] doc: modify release notes and deprecation notice for table and pipeline
The LIBABIVER number is incremented for the table and pipeline libraries. The release notes are updated and the deprecation notice is removed. Signed-off-by: Maciej Gajdzica Acked-by: Cristian Dumitrescu --- doc/guides/rel_notes/deprecation.rst | 3 --- doc/guides/rel_notes/release_2_2.rst | 6 -- lib/librte_pipeline/Makefile | 2 +- lib/librte_pipeline/rte_pipeline_version.map | 8 lib/librte_table/Makefile| 2 +- 5 files changed, 14 insertions(+), 7 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index fa55117..2bf2df4 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -53,9 +53,6 @@ Deprecation Notices * librte_table LPM: A new parameter to hold the table name will be added to the LPM table parameter structure. -* librte_table: New functions for table entry bulk add/delete will be added - to the table operations structure. - * librte_table hash: Key mask parameter will be added to the hash table parameter structure for 8-byte key and 16-byte key extendible bucket and LRU tables. diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst index 5687676..b46d2ae 100644 --- a/doc/guides/rel_notes/release_2_2.rst +++ b/doc/guides/rel_notes/release_2_2.rst @@ -98,6 +98,8 @@ ABI Changes * The LPM structure is changed. The deprecated field mem_location is removed. +* Added functions add/delete bulk to table and pipeline libraries. + Shared Library Versions --- @@ -122,7 +124,7 @@ The libraries prepended with a plus sign were incremented in this version. + librte_mbuf.so.2 librte_mempool.so.1 librte_meter.so.1 - librte_pipeline.so.1 + + librte_pipeline.so.2 librte_pmd_bond.so.1 + librte_pmd_ring.so.2 librte_port.so.1 @@ -130,6 +132,6 @@ The libraries prepended with a plus sign were incremented in this version.
librte_reorder.so.1 librte_ring.so.1 librte_sched.so.1 - librte_table.so.1 + + librte_table.so.2 librte_timer.so.1 librte_vhost.so.1 diff --git a/lib/librte_pipeline/Makefile b/lib/librte_pipeline/Makefile index 15e406b..1166d3c 100644 --- a/lib/librte_pipeline/Makefile +++ b/lib/librte_pipeline/Makefile @@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS) EXPORT_MAP := rte_pipeline_version.map -LIBABIVER := 1 +LIBABIVER := 2 # # all source are stored in SRCS-y diff --git a/lib/librte_pipeline/rte_pipeline_version.map b/lib/librte_pipeline/rte_pipeline_version.map index 8f25d0f..4cc86f6 100644 --- a/lib/librte_pipeline/rte_pipeline_version.map +++ b/lib/librte_pipeline/rte_pipeline_version.map @@ -29,3 +29,11 @@ DPDK_2.1 { rte_pipeline_table_stats_read; } DPDK_2.0; + +DPDK_2.2 { + global: + + rte_pipeline_table_entry_add_bulk; + rte_pipeline_table_entry_delete_bulk; + +} DPDK_2.1; diff --git a/lib/librte_table/Makefile b/lib/librte_table/Makefile index c5b3eaf..7f02af3 100644 --- a/lib/librte_table/Makefile +++ b/lib/librte_table/Makefile @@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS) EXPORT_MAP := rte_table_version.map -LIBABIVER := 1 +LIBABIVER := 2 # # all source are stored in SRCS-y -- 1.9.1
[dpdk-dev] [PATCH v3 4/5] ip_pipline: added cli commands for bulk add/delete to firewall pipeline
Added two new cli commands to firewall pipeline. Commands bulk add and bulk delete takes as argument a file with rules to add/delete. The file is parsed, and then rules are passed to backend functions which add/delete records from pipeline tables. Signed-off-by: Maciej Gajdzica Acked-by: Cristian Dumitrescu --- examples/ip_pipeline/pipeline/pipeline_firewall.c | 858 + examples/ip_pipeline/pipeline/pipeline_firewall.h | 14 + .../ip_pipeline/pipeline/pipeline_firewall_be.c| 157 .../ip_pipeline/pipeline/pipeline_firewall_be.h| 38 + 4 files changed, 1067 insertions(+) diff --git a/examples/ip_pipeline/pipeline/pipeline_firewall.c b/examples/ip_pipeline/pipeline/pipeline_firewall.c index f6924ab..4137923 100644 --- a/examples/ip_pipeline/pipeline/pipeline_firewall.c +++ b/examples/ip_pipeline/pipeline/pipeline_firewall.c @@ -51,6 +51,8 @@ #include "pipeline_common_fe.h" #include "pipeline_firewall.h" +#define BUF_SIZE 1024 + struct app_pipeline_firewall_rule { struct pipeline_firewall_key key; int32_t priority; @@ -73,6 +75,18 @@ struct app_pipeline_firewall { void *default_rule_entry_ptr; }; +struct app_pipeline_add_bulk_params { + struct pipeline_firewall_key *keys; + uint32_t n_keys; + uint32_t *priorities; + uint32_t *port_ids; +}; + +struct app_pipeline_del_bulk_params { + struct pipeline_firewall_key *keys; + uint32_t n_keys; +}; + static void print_firewall_ipv4_rule(struct app_pipeline_firewall_rule *rule) { @@ -256,6 +270,358 @@ app_pipeline_firewall_key_check_and_normalize(struct pipeline_firewall_key *key) } } +static int +app_pipeline_add_bulk_parse_file(char *filename, + struct app_pipeline_add_bulk_params *params) +{ + FILE *f; + char file_buf[BUF_SIZE]; + uint32_t i; + int status = 0; + + f = fopen(filename, "r"); + if (f == NULL) + return -1; + + params->n_keys = 0; + while (fgets(file_buf, BUF_SIZE, f) != NULL) + params->n_keys++; + rewind(f); + + if (params->n_keys == 0) { + status = -1; + goto end; + } + + params->keys = rte_malloc(NULL, + 
params->n_keys * sizeof(struct pipeline_firewall_key), + RTE_CACHE_LINE_SIZE); + if (params->keys == NULL) { + status = -1; + goto end; + } + + params->priorities = rte_malloc(NULL, + params->n_keys * sizeof(uint32_t), + RTE_CACHE_LINE_SIZE); + if (params->priorities == NULL) { + status = -1; + goto end; + } + + params->port_ids = rte_malloc(NULL, + params->n_keys * sizeof(uint32_t), + RTE_CACHE_LINE_SIZE); + if (params->port_ids == NULL) { + status = -1; + goto end; + } + + i = 0; + while (fgets(file_buf, BUF_SIZE, f) != NULL) { + char *str; + + str = strtok(file_buf, " "); + if (str == NULL) { + status = -1; + goto end; + } + params->priorities[i] = atoi(str); + + str = strtok(NULL, " ."); + if (str == NULL) { + status = -1; + goto end; + } + params->keys[i].key.ipv4_5tuple.src_ip = atoi(str)<<24; + + str = strtok(NULL, " ."); + if (str == NULL) { + status = -1; + goto end; + } + params->keys[i].key.ipv4_5tuple.src_ip |= atoi(str)<<16; + + str = strtok(NULL, " ."); + if (str == NULL) { + status = -1; + goto end; + } + params->keys[i].key.ipv4_5tuple.src_ip |= atoi(str)<<8; + + str = strtok(NULL, " ."); + if (str == NULL) { + status = -1; + goto end; + } + params->keys[i].key.ipv4_5tuple.src_ip |= atoi(str); + + str = strtok(NULL, " "); + if (str == NULL) { + status = -1; + goto end; + } + params->keys[i].key.ipv4_5tuple.src_ip_mask = atoi(str); + + str = strtok(NULL, " ."); + if (str == NULL) { + status = -1; + goto end; + } + params->keys[i].key.ipv4_5tuple.dst_ip = atoi(str)<<24; + + str = strtok(NULL, " ."); + if (str == NULL) { + status = -1; + goto end; + } + params->keys[i].key.ipv4_5tuple.dst_ip |= ato
[dpdk-dev] [PATCH v3 3/5] test_table: added check for bulk add/delete to acl table unit test
Added to acl table unit test check for bulk add and bulk delete. Signed-off-by: Maciej Gajdzica Acked-by: Cristian Dumitrescu --- app/test/test_table_acl.c | 166 ++ 1 file changed, 166 insertions(+) diff --git a/app/test/test_table_acl.c b/app/test/test_table_acl.c index e4e9b9c..fe8e545 100644 --- a/app/test/test_table_acl.c +++ b/app/test/test_table_acl.c @@ -253,6 +253,94 @@ parse_cb_ipv4_rule(char *str, struct rte_table_acl_rule_add_params *v) return 0; } +static int +parse_cb_ipv4_rule_del(char *str, struct rte_table_acl_rule_delete_params *v) +{ + int i, rc; + char *s, *sp, *in[CB_FLD_NUM]; + static const char *dlm = " \t\n"; + + /* + ** Skip leading '@' + */ + if (strchr(str, '@') != str) + return -EINVAL; + + s = str + 1; + + /* + * Populate the 'in' array with the location of each + * field in the string we're parsing + */ + for (i = 0; i != DIM(in); i++) { + in[i] = strtok_r(s, dlm, &sp); + if (in[i] == NULL) + return -EINVAL; + s = NULL; + } + + /* Parse x.x.x.x/x */ + rc = parse_ipv4_net(in[CB_FLD_SRC_ADDR], + &v->field_value[SRC_FIELD_IPV4].value.u32, + &v->field_value[SRC_FIELD_IPV4].mask_range.u32); + if (rc != 0) { + RTE_LOG(ERR, PIPELINE, "failed to read src address/mask: %s\n", + in[CB_FLD_SRC_ADDR]); + return rc; + } + + printf("V=%u, mask=%u\n", v->field_value[SRC_FIELD_IPV4].value.u32, + v->field_value[SRC_FIELD_IPV4].mask_range.u32); + + /* Parse x.x.x.x/x */ + rc = parse_ipv4_net(in[CB_FLD_DST_ADDR], + &v->field_value[DST_FIELD_IPV4].value.u32, + &v->field_value[DST_FIELD_IPV4].mask_range.u32); + if (rc != 0) { + RTE_LOG(ERR, PIPELINE, "failed to read dest address/mask: %s\n", + in[CB_FLD_DST_ADDR]); + return rc; + } + + printf("V=%u, mask=%u\n", v->field_value[DST_FIELD_IPV4].value.u32, + v->field_value[DST_FIELD_IPV4].mask_range.u32); + /* Parse n:n */ + rc = parse_port_range(in[CB_FLD_SRC_PORT_RANGE], + &v->field_value[SRCP_FIELD_IPV4].value.u16, + &v->field_value[SRCP_FIELD_IPV4].mask_range.u16); + if (rc != 0) { + RTE_LOG(ERR, PIPELINE, 
"failed to read source port range: %s\n", + in[CB_FLD_SRC_PORT_RANGE]); + return rc; + } + + printf("V=%u, mask=%u\n", v->field_value[SRCP_FIELD_IPV4].value.u16, + v->field_value[SRCP_FIELD_IPV4].mask_range.u16); + /* Parse n:n */ + rc = parse_port_range(in[CB_FLD_DST_PORT_RANGE], + &v->field_value[DSTP_FIELD_IPV4].value.u16, + &v->field_value[DSTP_FIELD_IPV4].mask_range.u16); + if (rc != 0) { + RTE_LOG(ERR, PIPELINE, "failed to read dest port range: %s\n", + in[CB_FLD_DST_PORT_RANGE]); + return rc; + } + + printf("V=%u, mask=%u\n", v->field_value[DSTP_FIELD_IPV4].value.u16, + v->field_value[DSTP_FIELD_IPV4].mask_range.u16); + /* parse 0/0xnn */ + GET_CB_FIELD(in[CB_FLD_PROTO], + v->field_value[PROTO_FIELD_IPV4].value.u8, + 0, UINT8_MAX, '/'); + GET_CB_FIELD(in[CB_FLD_PROTO], + v->field_value[PROTO_FIELD_IPV4].mask_range.u8, + 0, UINT8_MAX, 0); + + printf("V=%u, mask=%u\n", + (unsigned int)v->field_value[PROTO_FIELD_IPV4].value.u8, + v->field_value[PROTO_FIELD_IPV4].mask_range.u8); + return 0; +} /* * The format for these rules DO NOT need the port ranges to be @@ -393,6 +481,84 @@ setup_acl_pipeline(void) } } + /* Add bulk entries to tables */ + for (i = 0; i < N_PORTS; i++) { + struct rte_table_acl_rule_add_params keys[5]; + struct rte_pipeline_table_entry entries[5]; + struct rte_table_acl_rule_add_params *key_array[5]; + struct rte_pipeline_table_entry *table_entries[5]; + int key_found[5]; + struct rte_pipeline_table_entry *table_entries_ptr[5]; + struct rte_pipeline_table_entry entries_ptr[5]; + + parser = parse_cb_ipv4_rule; + for (n = 0; n < 5; n++) { + memset(&keys[n], 0, sizeof(struct rte_table_acl_rule_add_params)); + key_array[n] = &keys[n]; + + snprintf(line, sizeof(line), "%s", lines[n]); + printf("PARSING [%s]\n", line); + + ret = parser(line, &keys[n]); + if (ret != 0) { + RTE_LOG(ERR, PIPELINE, +
[dpdk-dev] [PATCH v3 2/5] pipeline: added bulk add/delete functions for table
Added functions for adding/deleting multiple records to table owned by pipeline. Signed-off-by: Maciej Gajdzica Signed-off-by: Marcin Kerlin Acked-by: Cristian Dumitrescu --- lib/librte_pipeline/rte_pipeline.c | 106 + lib/librte_pipeline/rte_pipeline.h | 64 ++ 2 files changed, 170 insertions(+) diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c index bd700d2..56022f4 100644 --- a/lib/librte_pipeline/rte_pipeline.c +++ b/lib/librte_pipeline/rte_pipeline.c @@ -587,6 +587,112 @@ rte_pipeline_table_entry_delete(struct rte_pipeline *p, return (table->ops.f_delete)(table->h_table, key, key_found, entry); } +int rte_pipeline_table_entry_add_bulk(struct rte_pipeline *p, + uint32_t table_id, + void **keys, + struct rte_pipeline_table_entry **entries, + uint32_t n_keys, + int *key_found, + struct rte_pipeline_table_entry **entries_ptr) +{ + struct rte_table *table; + uint32_t i; + + /* Check input arguments */ + if (p == NULL) { + RTE_LOG(ERR, PIPELINE, "%s: pipeline parameter is NULL\n", + __func__); + return -EINVAL; + } + + if (keys == NULL) { + RTE_LOG(ERR, PIPELINE, "%s: keys parameter is NULL\n", __func__); + return -EINVAL; + } + + if (entries == NULL) { + RTE_LOG(ERR, PIPELINE, "%s: entries parameter is NULL\n", + __func__); + return -EINVAL; + } + + if (table_id >= p->num_tables) { + RTE_LOG(ERR, PIPELINE, + "%s: table_id %d out of range\n", __func__, table_id); + return -EINVAL; + } + + table = &p->tables[table_id]; + + if (table->ops.f_add_bulk == NULL) { + RTE_LOG(ERR, PIPELINE, "%s: f_add_bulk function pointer NULL\n", + __func__); + return -EINVAL; + } + + for (i = 0; i < n_keys; i++) { + if ((entries[i]->action == RTE_PIPELINE_ACTION_TABLE) && + table->table_next_id_valid && + (entries[i]->table_id != table->table_next_id)) { + RTE_LOG(ERR, PIPELINE, + "%s: Tree-like topologies not allowed\n", __func__); + return -EINVAL; + } + } + + /* Add entry */ + for (i = 0; i < n_keys; i++) { + if ((entries[i]->action == 
RTE_PIPELINE_ACTION_TABLE) && + (table->table_next_id_valid == 0)) { + table->table_next_id = entries[i]->table_id; + table->table_next_id_valid = 1; + } + } + + return (table->ops.f_add_bulk)(table->h_table, keys, (void **) entries, + n_keys, key_found, (void **) entries_ptr); +} + +int rte_pipeline_table_entry_delete_bulk(struct rte_pipeline *p, + uint32_t table_id, + void **keys, + uint32_t n_keys, + int *key_found, + struct rte_pipeline_table_entry **entries) +{ + struct rte_table *table; + + /* Check input arguments */ + if (p == NULL) { + RTE_LOG(ERR, PIPELINE, "%s: pipeline parameter NULL\n", + __func__); + return -EINVAL; + } + + if (keys == NULL) { + RTE_LOG(ERR, PIPELINE, "%s: key parameter is NULL\n", + __func__); + return -EINVAL; + } + + if (table_id >= p->num_tables) { + RTE_LOG(ERR, PIPELINE, + "%s: table_id %d out of range\n", __func__, table_id); + return -EINVAL; + } + + table = &p->tables[table_id]; + + if (table->ops.f_delete_bulk == NULL) { + RTE_LOG(ERR, PIPELINE, + "%s: f_delete function pointer NULL\n", __func__); + return -EINVAL; + } + + return (table->ops.f_delete_bulk)(table->h_table, keys, n_keys, key_found, + (void **) entries); +} + /* * Port * diff --git a/lib/librte_pipeline/rte_pipeline.h b/lib/librte_pipeline/rte_pipeline.h index 59e0710..5459324 100644 --- a/lib/librte_pipeline/rte_pipeline.h +++ b/lib/librte_pipeline/rte_pipeline.h @@ -466,6 +466,70 @@ int rte_pipeline_table_entry_delete(struct rte_pipeline *p, struct rte_pipeline_table_entry *entry); /** + * Pipeline table entry add bulk + * + * @param p + * Handle to pipeline instance + * @param table_id + * Table ID (returned by previous invocation of pipeline table create) + * @param keys + * Array containing table entry keys + * @param entries + * Array containung new contents for every table entry identified by key + * @param n_keys + * Number of keys to add + * @param key_found + * On successful invocation, key_found for
[dpdk-dev] [PATCH v3 1/5] table: added bulk add/delete functions for table
New functions prototypes for bulk add/delete added to table API. New functions allows adding/deleting multiple records with single function call. For now those functions are implemented only for ACL table. For other tables these function pointers are set to NULL. Signed-off-by: Maciej Gajdzica Acked-by: Cristian Dumitrescu --- lib/librte_table/rte_table.h| 85 - lib/librte_table/rte_table_acl.c| 309 lib/librte_table/rte_table_array.c | 2 + lib/librte_table/rte_table_hash_ext.c | 4 + lib/librte_table/rte_table_hash_key16.c | 4 + lib/librte_table/rte_table_hash_key32.c | 4 + lib/librte_table/rte_table_hash_key8.c | 8 + lib/librte_table/rte_table_hash_lru.c | 4 + lib/librte_table/rte_table_lpm.c| 2 + lib/librte_table/rte_table_lpm_ipv6.c | 2 + lib/librte_table/rte_table_stub.c | 2 + 11 files changed, 420 insertions(+), 6 deletions(-) diff --git a/lib/librte_table/rte_table.h b/lib/librte_table/rte_table.h index c13d40d..720514e 100644 --- a/lib/librte_table/rte_table.h +++ b/lib/librte_table/rte_table.h @@ -154,6 +154,77 @@ typedef int (*rte_table_op_entry_delete)( void *entry); /** + * Lookup table entry add bulk + * + * @param table + * Handle to lookup table instance + * @param key + * Array containing lookup keys + * @param entries + * Array containing data to be associated with each key. Every item in the + * array has to point to a valid memory buffer where the first entry_size + * bytes (table create parameter) are populated with the data. + * @param n_keys + * Number of keys to add + * @param key_found + * After successful invocation, key_found for every item in the array is set + * to a value different than 0 if the current key is already present in the + * table and to 0 if not. This pointer has to be set to a valid memory + * location before the table entry add function is called. + * @param entries_ptr + * After successful invocation, array *entries_ptr stores the handle to the + * table entry containing the data associated with every key. 
This handle can + * be used to perform further read-write accesses to this entry. This handle + * is valid until the key is deleted from the table or the same key is + * re-added to the table, typically to associate it with different data. This + * pointer has to be set to a valid memory location before the function is + * called. + * @return + * 0 on success, error code otherwise + */ +typedef int (*rte_table_op_entry_add_bulk)( + void *table, + void **keys, + void **entries, + uint32_t n_keys, + int *key_found, + void **entries_ptr); + +/** + * Lookup table entry delete bulk + * + * @param table + * Handle to lookup table instance + * @param key + * Array containing lookup keys + * @param n_keys + * Number of keys to delete + * @param key_found + * After successful invocation, key_found for every item in the array is set + * to a value different than 0if the current key was present in the table + * before the delete operation was performed and to 0 if not. This pointer + * has to be set to a valid memory location before the table entry delete + * function is called. + * @param entries + * If entries pointer is NULL, this pointer is ignored for every entry found. + * Else, after successful invocation, if specific key is found in the table + * (key_found is different than 0 for this item after function call is + * completed) and item of entry array points to a valid buffer (entry is set + * to a value different than NULL before the function is called), then the + * first entry_size bytes (table create parameter) in *entry store a copy of + * table entry that contained the data associated with the current key before + * the key was deleted. 
+ * @return + * 0 on success, error code otherwise + */ +typedef int (*rte_table_op_entry_delete_bulk)( + void *table, + void **keys, + uint32_t n_keys, + int *key_found, + void **entries); + +/** * Lookup table lookup * * @param table @@ -213,12 +284,14 @@ typedef int (*rte_table_op_stats_read)( /** Lookup table interface defining the lookup table operation */ struct rte_table_ops { - rte_table_op_create f_create; /**< Create */ - rte_table_op_free f_free; /**< Free */ - rte_table_op_entry_add f_add; /**< Entry add */ - rte_table_op_entry_delete f_delete; /**< Entry delete */ - rte_table_op_lookup f_lookup; /**< Lookup */ - rte_table_op_stats_read f_stats;/**< Stats */ + rte_table_op_create f_create; /**< Create */ + rte_table_op_free f_free; /**< Free */ + rte_table_op_entry_add f_add; /**< Entry add */ + rte_table_op_entry_delete f_delete; /**< Entry delete */ + rte_table_op_entry_add_bulk f_
[dpdk-dev] [PATCH v3 0/5] pipeline: add bulk add/delete functions for table
This patch adds bulk add/delete functions for tables used by pipelines. It allows adding/deleting many rules to pipeline tables in one function call. It is particularly useful for the firewall pipeline, which uses an ACL table: after every add or delete the table is rebuilt, which leads to very long times when adding/deleting many entries.

v2:
* Incremented the LIBABIVER number
* Updated release notes
* Removed deprecation announce

v3:
* Updated a Doxygen comment

Acked-by: Cristian Dumitrescu

Maciej Gajdzica (5):
  table: added bulk add/delete functions for table
  pipeline: added bulk add/delete functions for table
  test_table: added check for bulk add/delete to acl table unit test
  ip_pipline: added cli commands for bulk add/delete to firewall pipeline
  doc: modify release notes and deprecation notice for table and pipeline

 app/test/test_table_acl.c | 166
 doc/guides/rel_notes/deprecation.rst | 3 -
 doc/guides/rel_notes/release_2_2.rst | 6 +-
 examples/ip_pipeline/pipeline/pipeline_firewall.c | 858 +
 examples/ip_pipeline/pipeline/pipeline_firewall.h | 14 +
 .../ip_pipeline/pipeline/pipeline_firewall_be.c | 157
 .../ip_pipeline/pipeline/pipeline_firewall_be.h | 38 +
 lib/librte_pipeline/Makefile | 2 +-
 lib/librte_pipeline/rte_pipeline.c | 106 +++
 lib/librte_pipeline/rte_pipeline.h | 64 ++
 lib/librte_pipeline/rte_pipeline_version.map | 8 +
 lib/librte_table/Makefile | 2 +-
 lib/librte_table/rte_table.h | 85 +-
 lib/librte_table/rte_table_acl.c | 309
 lib/librte_table/rte_table_array.c | 2 +
 lib/librte_table/rte_table_hash_ext.c | 4 +
 lib/librte_table/rte_table_hash_key16.c | 4 +
 lib/librte_table/rte_table_hash_key32.c | 4 +
 lib/librte_table/rte_table_hash_key8.c | 8 +
 lib/librte_table/rte_table_hash_lru.c | 4 +
 lib/librte_table/rte_table_lpm.c | 2 +
 lib/librte_table/rte_table_lpm_ipv6.c | 2 +
 lib/librte_table/rte_table_stub.c | 2 +
 23 files changed, 1837 insertions(+), 13 deletions(-)

--
1.9.1
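The motivation in the cover letter (one rebuild per add or delete versus one rebuild per batch) can be illustrated with a toy model. This is only a sketch: `toy_table`, `toy_add`, `toy_add_bulk` and `toy_rebuild` are made-up names for the example and do not reflect how the librte_table ACL table is actually implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy table for illustration; NOT the librte_table ACL code. */
#define TOY_MAX_RULES 64

struct toy_table {
	uint32_t rules[TOY_MAX_RULES];
	size_t n_rules;
	unsigned int n_rebuilds; /* times the lookup structure was rebuilt */
};

/* Stand-in for the expensive rebuild that follows a rule set change. */
static void toy_rebuild(struct toy_table *t)
{
	t->n_rebuilds++;
}

/* Single add: every call pays for one rebuild. Returns 0, or -1 if full. */
static int toy_add(struct toy_table *t, uint32_t rule)
{
	if (t->n_rules >= TOY_MAX_RULES)
		return -1;
	t->rules[t->n_rules++] = rule;
	toy_rebuild(t);
	return 0;
}

/* Bulk add: store all rules first, then rebuild once for the whole batch. */
static int toy_add_bulk(struct toy_table *t, const uint32_t *rules, size_t n)
{
	size_t i;

	if (n > TOY_MAX_RULES - t->n_rules)
		return -1;
	for (i = 0; i < n; i++)
		t->rules[t->n_rules++] = rules[i];
	toy_rebuild(t);
	return 0;
}
```

Adding n rules one at a time costs n rebuilds in this model, while the bulk variant costs one, which is the saving the patch set is after.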
[dpdk-dev] [testpmd] enable lsc to avoid TX stall, TX stall happened in following sequence start show port info 0
Hi Jiuling, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jiuling Bie > Sent: Wednesday, October 07, 2015 5:54 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [testpmd] enable lsc to avoid TX stall, TX stall happened > in following sequence start show port info 0 > > --- > app/test-pmd/testpmd.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c > index 386bf84..45adefa 100644 > --- a/app/test-pmd/testpmd.c > +++ b/app/test-pmd/testpmd.c > @@ -1779,6 +1779,7 @@ init_port_config(void) > port = &ports[pid]; > port->dev_conf.rxmode = rx_mode; > port->dev_conf.fdir_conf = fdir_conf; > + port->dev_conf.intr_conf.lsc = 1; > if (nb_rxq > 1) { > port->dev_conf.rx_adv_conf.rss_conf.rss_key = > NULL; > port->dev_conf.rx_adv_conf.rss_conf.rss_hf = > rss_hf; > -- > 1.9.1 Several things about your patch: - It looks like this is your first patch (plus the other one you sent a few minutes later): take a look at http://dpdk.org/dev - You forgot to sign off your patches (use --signoff with git commit) - The title of this patch is too long, shorten it and include more information in the body of the commit message. - I don't know what this patch is trying to solve exactly. It looks like you are saying that there is a bug that makes TX stop when you run the following commands: testpmd> start testpmd> show port info 0 I don't see such bug, could you explain better the steps to reproduce the issue? Thanks, Pablo
[dpdk-dev] [PATCH] ethdev: remove the imissed deprecation tag
On Wed, 30 Sep 2015 09:20:56 +0100 Maryam Tahhan wrote: > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h > index fa06554..78bd94d 100644 > --- a/lib/librte_ether/rte_ethdev.h > +++ b/lib/librte_ether/rte_ethdev.h > @@ -194,8 +194,7 @@ struct rte_eth_stats { > uint64_t opackets; /**< Total number of successfully transmitted > packets.*/ > uint64_t ibytes;/**< Total number of successfully received bytes. */ > uint64_t obytes;/**< Total number of successfully transmitted > bytes. */ > - uint64_t imissed; > - /**< Deprecated; Total of RX missed packets (e.g full FIFO). */ If you want to deprecate a structure field, it works better to mark it with __attribute__((deprecated)) that way all use of that field in code will be flagged. Comments are advisory only and often never spotted.
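A minimal sketch of that suggestion, assuming GCC or Clang. The struct `demo_stats` and the helper `demo_total_rx` are invented for the example and are not the real `rte_eth_stats` definition:

```c
#include <stdint.h>

/* Hypothetical stats struct for illustration; not the real rte_eth_stats. */
struct demo_stats {
	uint64_t ipackets; /* successfully received packets */
	/* Any code that reads or writes this field gets a compiler warning. */
	uint64_t imissed __attribute__((deprecated));
};

/* Touches only the non-deprecated field, so it compiles without warnings. */
static uint64_t demo_total_rx(const struct demo_stats *s)
{
	return s->ipackets;
}
```

With the attribute in place, an expression such as `s->imissed` is flagged by `-Wdeprecated-declarations` at every use, which a comment alone cannot do.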
[dpdk-dev] [PATCH v2 1/2] fm10k: enable TSO support
On 2015/10/12 14:38, Wang Xiao W wrote: > This patch enables fm10k TSO feature for both non-tunneling packet > and tunneling packet. > > Signed-off-by: Wang Xiao W > --- Acked-by: Michael Qiu
[dpdk-dev] IXGBE RX packet loss with 5+ cores
On 10/13/2015 7:47 AM, Sanford, Robert wrote:
>>>> [Robert:] 1. The 82599 device supports up to 128 queues. Why do we see trouble with as few as 5 queues? What could limit the system (and one port controlled by 5+ cores) from receiving at line-rate without loss?
>>>> 2. As far as we can tell, the RX path only touches the device registers when it updates a Receive Descriptor Tail register (RDT[n]), roughly every rx_free_thresh packets. Is there a big difference between one core doing this and N cores doing it 1/N as often?
>>> [Stephen:] As you add cores, there is more traffic on the PCI bus from each core polling. There is a fixed number of PCI bus transactions per second possible. Each core is increasing the number of useless (empty) transactions.
>> [Bruce:] The polling for packets by the core should not be using PCI bandwidth directly, as the ixgbe driver (and other drivers) check for the DD bit being set on the descriptor in memory/cache.
> I was preparing to reply with the same point.
>
>>> [Stephen:] Why do you think adding more cores will help?
> We're using run-to-completion and sometimes spend too many cycles per pkt. We realize that we need to move to io+workers model, but wanted a better understanding of the dynamics involved here.
>
>> [Bruce:] However, using an increased number of queues can use PCI bandwidth in other ways, for instance, with more queues you reduce the amount of descriptor coalescing that can be done by the NICs, so that instead of having a single transaction of 4 descriptors to one queue, the NIC may instead have to do 4 transactions each writing 1 descriptor to 4 different queues. This is possibly why sending all traffic to a single queue works ok - the polling on the other queues is still being done, but has little effect.
> Brilliant! This idea did not occur to me.

To add a little more detail - this ends up being both a bandwidth and a transaction bottleneck. Not only do you add an increased transaction count, you also add a huge amount of bandwidth overhead (each 16 byte descriptor is preceded by a PCI-E TLP which is about the same size). So what ends up happening in the case where the incoming packets are bifurcated to different queues (1 per queue) is that you have 2x the number of transactions (1 for the packet and one for the descriptor) and then we essentially double the bandwidth used because you now have the TLP overhead per descriptor write.

There is a second issue that also pops up when coalescing breaks down - testpmd essentially in iofwd mode simply transmits the number of packets it receives (i.e. Rx (n) -> Tx (n)). This means that the transmit side also suffers from writing one descriptor at a time for output (i.e. when the NIC pulls a descriptor cache line to transmit, it finds 1 valid descriptor). When a second descriptor is transmitted on the same queue, it will again pull and find only one valid descriptor. That is another 2x increase in transaction count as well as PCI-E TLP overhead.

The third hit actually comes from the transmit side when transmitting one packet at a time. The last part of the transmit process is an MMIO write to the tail pointer. This is a costly operation (since it is an uncacheable memory operation) in terms of cycles, not to mention again with heavy PCI-E overhead (TLP + 4 byte write) and increased transaction counts on PCI-E.

Hope that explains all the touch-points as to why you see the drop off in performance you see.

> --
> Thanks guys,
> Robert
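The descriptor write-back cost described above can be put into rough numbers. This is a back-of-the-envelope sketch only: the 16-byte descriptor size comes from the message, while `TLP_OVERHEAD_BYTES` is an assumption taken from "about the same size" and `writeback_bytes` is a made-up helper, not driver code.

```c
#include <stdint.h>

#define DESC_BYTES 16u          /* descriptor size stated in the message */
#define TLP_OVERHEAD_BYTES 16u  /* ASSUMPTION: "about the same size" as a descriptor */

/*
 * Bytes on the PCI-E link to write back n_desc descriptors when they are
 * spread over n_transactions writes: each write pays one TLP overhead.
 */
static uint32_t writeback_bytes(uint32_t n_desc, uint32_t n_transactions)
{
	return n_transactions * TLP_OVERHEAD_BYTES + n_desc * DESC_BYTES;
}
```

Under this assumption, 4 descriptors coalesced into one write cost `writeback_bytes(4, 1)` = 80 bytes, while 4 separate single-descriptor writes cost `writeback_bytes(4, 4)` = 128 bytes, on top of the fourfold increase in transaction count.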
[dpdk-dev] Host kernel panic when running ixgbe NIC in pci passthrough
Hello,

I have a system using dpdk 1.8 with 82599ES ixgbe NICs. These are provided to a virtual guest via pci passthrough. Our dpdk application on the guest takes control of the NICs using igb_uio. On certain systems, under conditions we have not yet figured out, sending traffic causes the host to kernel panic. It looks like a pci device is reporting a fatal error. From the error, the issue looks to be either the bridge connected to the ixgbe, or the ixgbe itself; I cannot decipher the message beyond that. This has happened on three different machines, so I do not think it is bad hardware.

I was wondering if anybody has run into this before, and if they have any solutions. I tried searching the mailing list, but couldn't find anything related.

[3108395.524535] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
[3108395.533959] {1}[Hardware Error]: APEI generic hardware error status
[3108395.541149] {1}[Hardware Error]: severity: 1, fatal
[3108395.546785] {1}[Hardware Error]: section: 0, severity: 1, fatal
[3108395.553586] {1}[Hardware Error]: flags: 0x01
[3108395.558543] {1}[Hardware Error]: primary
[3108395.563113] {1}[Hardware Error]: section_type: PCIe error
[3108395.569332] {1}[Hardware Error]: port_type: 6, downstream switch port
[3108395.576715] {1}[Hardware Error]: version: 1.16
[3108395.581866] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[3108395.588763] {1}[Hardware Error]: device_id: :05:01.0
[3108395.594886] {1}[Hardware Error]: slot: 0
[3108395.599455] {1}[Hardware Error]: secondary_bus: 0x06
[3108395.605189] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8724
[3108395.612572] {1}[Hardware Error]: class_code: 000406
[3108395.618208] {1}[Hardware Error]: bridge: secondary_status: 0x, control: 0x0003
[3108395.626853] {1}[Hardware Error]: section: 1, severity: 1, fatal
[3108395.633653] {1}[Hardware Error]: flags: 0x01
[3108395.638611] {1}[Hardware Error]: primary
[3108395.643179] {1}[Hardware Error]: section_type: PCIe error
[3108395.649396] {1}[Hardware Error]: port_type: 6, downstream switch port
[3108395.656778] {1}[Hardware Error]: version: 1.16
[3108395.661930] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[3108395.668829] {1}[Hardware Error]: device_id: :05:09.0
[3108395.674951] {1}[Hardware Error]: slot: 0
[3108395.679521] {1}[Hardware Error]: secondary_bus: 0x09
[3108395.685254] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8724
[3108395.692636] {1}[Hardware Error]: class_code: 000406
[3108395.698272] {1}[Hardware Error]: bridge: secondary_status: 0x, control: 0x0003
[3108395.706915] Kernel panic - not syncing: Fatal hardware error!

:05:01.0 is a PLX pci bridge. It has two ixgbe NICs connected to it. Likewise with :05:09.0.

Here is the boot cmdline on the host (we're using iommu):

BOOT_IMAGE=/vmlinuz-3.10.0-123.el7.x86_64 root=UUID=57d79ff0-1152-46fb-a619-b2a102de3d5f ro console=ttyS0,115200n8 vconsole.font=latarcyrheb-sun16 crashkernel=auto rd.lvm.lv=VolGrp/Vol1 rd.lvm.lv=VolGrp/Vol0 vconsole.keymap=us LANG=en_US.UTF-8 intel_iommu=on

Any help would be greatly appreciated.

Thanks,
Kyle
[dpdk-dev] DPDK hash function related question
Hi Avinash, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yeddula, Avinash > Sent: Monday, October 12, 2015 6:03 PM > To: Dumitrescu, Cristian; dev at dpdk.org; Bly, Mike > Subject: Re: [dpdk-dev] DPDK hash function related question > > Hi Cristian, > I have configured the hash function and it compile fine with "warnings". Since > librte_hash vs librte_table is 32bit vs 64bit. > > librte_hash library : > /** Type of function that can be used for calculating the hash value. */ > typedef uint32_t (*rte_hash_function)(const void *key, uint32_t key_len, > uint32_t init_val); > > librte_table library: > typedef uint64_t (*rte_table_hash_op_hash) (void *key,uint32_t > key_size, uint64_t seed); > > I could use one of these hash functions. This is one option, but our first > priority is to use crc hash or cukoo hash. Mind that cuckoo hash is not a hash function, but a method for resolving collisions in a hash table. > https://github.com/scylladb/dpdk/blob/master/examples/ip_pipeline/pipeli > ne/hash_func.h > > We do not want to have those warning in our code. What do you suggest ? > > Thanks > -Avinash > > -Original Message- > From: Dumitrescu, Cristian [mailto:cristian.dumitrescu at intel.com] > Sent: Tuesday, September 22, 2015 3:05 AM > To: Yeddula, Avinash; dev at dpdk.org; Bly, Mike > Subject: RE: DPDK hash function related question > > Hi Avinash, > > Yes, the hash function is configurable. > > Are you using a DPDK release older than 2.1? In DPDK we moved away from > test_hash to CRC-based hashes. Please take a look at DPDK release 2.1 > examples/ip_pipeline application: in pipeline_flow_classification_be.c, we > use CRC-based hash functions defined in file hash_func.h from the same > folder. 
> > Regards, > Cristian > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yeddula, Avinash > > Sent: Tuesday, September 22, 2015 1:34 AM > > To: dev at dpdk.org; Bly, Mike > > Subject: [dpdk-dev] DPDK hash function related question > > > > Hello All, > > > > I'm DPDK extensible bucket hash in the rte_table library of packet > > framework. My question is related to the actual hash function that > > computes the hash signature. > > > > All the available examples have initialized it to test_hash. I do not see > > any > > hash function available in rte_table library , that computes the > > actual signature > > > > > > > > struct rte_table_hash_ext_params hash_table_params = { > > > > .key_size = TABLE_ENTRY_KEY_SIZE, > > > > .n_keys = TABLE_MAX_SIZE, > > > > .n_buckets = TABLE_MAX_BUCKET_COUNT, > > > > .n_buckets_ext = TABLE_MAX_EXT_BUCKET_COUNT, > > > > .f_hash = test_hash, > > > > .seed = 0, > > > > .signature_offset = 0; > > > > .key_offset = __builtin_offsetof(struct metadata_t, tbl_key), > > > > }; > > > > > > > > So, I wanted to use hash functions from DPDK rte_hash library. This is > > what I'm doing and looking at the code this looks ok to me. > > > > I'm at least a week or 2 away from testing this part of the code. I > > wanted to confirm that, there is no fundamental flaw in using the DPDK > > rte_hash library and rte_table library like this. Could someone confirm this > please ? > > > > > > > > #define DEFAULT_HASH_FUNC rte_hash_crc > > > > > > > > struct rte_table_hash_ext_params hash_table_params = { > > > > .key_size = TABLE_ENTRY_KEY_SIZE, > > > > .n_keys = TABLE_MAX_SIZE, > > > > .n_buckets = TABLE_MAX_BUCKET_COUNT, > > > > .n_buckets_ext = TABLE_MAX_EXT_BUCKET_COUNT, > > > > .f_hash = DEFAULT_HASH_FUNC , > > > > .seed = 0, > > > > .signature_offset = 0; > > > > .key_offset = __builtin_offsetof(struct metadata_t, tbl_key), > > > > }; > > > > > > > > Thanks > > > > -Avinash > > >
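One way to get rid of the prototype-mismatch warnings discussed in this thread is a thin adapter with the 64-bit table prototype that forwards to a 32-bit hash. The sketch below is self-contained and makes some assumptions: `table_hash_op_hash_t` is a local mirror of the librte_table typedef quoted above, `demo_hash32` is a stand-in (an FNV-1a variant) with the librte_hash-style prototype rather than `rte_hash_crc` itself, and `table_hash_adapter` is a made-up name.

```c
#include <stdint.h>

/* Local mirror of the 64-bit prototype quoted from librte_table. */
typedef uint64_t (*table_hash_op_hash_t)(void *key, uint32_t key_size,
					 uint64_t seed);

/*
 * Stand-in 32-bit hash with the librte_hash-style prototype.
 * Hypothetical: FNV-1a is used only to keep the example self-contained.
 */
static uint32_t demo_hash32(const void *key, uint32_t key_len,
			    uint32_t init_val)
{
	const uint8_t *p = key;
	uint32_t h = 2166136261u ^ init_val;
	uint32_t i;

	for (i = 0; i < key_len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Adapter: matches the 64-bit table prototype, forwards to the 32-bit hash. */
static uint64_t table_hash_adapter(void *key, uint32_t key_size, uint64_t seed)
{
	/* The seed is truncated to 32 bits; the result is zero-extended. */
	return (uint64_t)demo_hash32(key, key_size, (uint32_t)seed);
}
```

The adapter can then be assigned to a function pointer of the table type without a cast, so the compiler has nothing to warn about. Whether a 32-bit signature widened to 64 bits is good enough for a given table type is a separate question that this sketch does not answer.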
[dpdk-dev] [PATCH] examples/vmdq: Fix the core dump issue when mem_pool is more than 34
Hi Xutao, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xutao Sun > Sent: Tuesday, October 13, 2015 8:29 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH] examples/vmdq: Fix the core dump issue when > mem_pool is more than 34 > > Macro MAX_QUEUES was defined to 128, only allow 16 mem_pools in > theory. > When running vmdq_app with more than 34 mem_pools, > it will cause the core_dump issue. > Change MAX_QUEUES to 1024 will solve this issue. > > Signed-off-by: Xutao Sun > --- > examples/vmdq/main.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/examples/vmdq/main.c b/examples/vmdq/main.c > index a142d49..b463cfb 100644 > --- a/examples/vmdq/main.c > +++ b/examples/vmdq/main.c > @@ -69,7 +69,7 @@ > #include > #include > > -#define MAX_QUEUES 128 > +#define MAX_QUEUES 1024 > /* > * For 10 GbE, 128 queues require roughly > * 128*512 (RX/TX_queue_nb * RX/TX_ring_descriptors_nb) per port. > -- > 1.9.3 Just for clarification, when you say mem_pools, do you mean vmdq pools? Also, if you are going to increase MAX_QUEUES, shouldn't you increase the NUM_MBUFS_PER_PORT? Looking at the comment below, looks like there is a calculation of number of mbufs based on number of queues. Plus, I assume 128 is the maximum number of queues per port, and as far as I know, only Fortville supports 256 as maximum. Thanks, Pablo
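The sizing concern can be put in numbers using only the formula quoted in the patch comment (queues times ring descriptors, per port). This is a sketch: `mbufs_per_port` is a made-up helper, and the real NUM_MBUFS_PER_PORT definition in examples/vmdq/main.c may include further terms.

```c
#include <stdint.h>

/*
 * Rough per-port mbuf requirement following the comment in the patch:
 * number of queues multiplied by the number of ring descriptors.
 */
static uint32_t mbufs_per_port(uint32_t n_queues, uint32_t ring_descriptors)
{
	return n_queues * ring_descriptors;
}
```

With 512 descriptors per ring, 128 queues give 65536 mbufs and 1024 queues give 524288, eight times as many, so raising MAX_QUEUES alone changes the memory budget considerably.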
[dpdk-dev] [PATCH] testpmd: modify the mac of csum forwarding
Hi, Thomas

Any comments on this patch? Is it suitable for DPDK?

Thanks,
Michael

On 2015/8/26 14:12, Liu, Jijiang wrote:
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael Qiu
>> Sent: Friday, August 07, 2015 11:29 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH] testpmd: modify the mac of csum forwarding
>>
>> For some ethnet-switch like intel RRC, all the packet forwarded out by DPDK will be dropped in switch side, so the packet generator will never receive the packet.
>>
>> Signed-off-by: Michael Qiu
>> ---
>> app/test-pmd/csumonly.c | 4
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
>> index 1bf3485..bf8af1d 100644
>> --- a/app/test-pmd/csumonly.c
>> +++ b/app/test-pmd/csumonly.c
>> @@ -550,6 +550,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>> * and inner headers */
>>
>> eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
>> +ether_addr_copy(&peer_eth_addrs[fs->peer_addr],
>> +&eth_hdr->d_addr);
>> +ether_addr_copy(&ports[fs->tx_port].eth_addr,
>> +&eth_hdr->s_addr);
>> parse_ethernet(eth_hdr, &info);
>> l3_hdr = (char *)eth_hdr + info.l2_len;
>>
>> --
>> 1.9.3
> The change will affect on the csum fwd performance.
> But I also think the change is necessary, or we cannot use csumonly fwd mode in guest?
>
> Acked-by: Jijiang Liu
[dpdk-dev] [PATCH] librte_eal: Fix wrong header file for old gcc version
Hi, all

Any comments on this?

Thanks,
Michael

On 2015/9/25 10:56, Qiu, Michael wrote:
> On 2015/9/7 22:46, Thomas Monjalon wrote:
>> 2015-08-24 17:22, Michael Qiu:
>>> For __SSE3__, the corresponding header file should be pmmintrin.h,
>>> tmmintrin.h works for __SSSE3__.
>> Please could you better explain the difference and what is exactly the bug being fixed?
> It should solve this issue:
>
> [dpdk-dev] DPDK 2.1.0 build error: inlining failed in call to always_inline
>
> /usr/lib/gcc/x86_64-redhat-linux/4.9.2/include/tmmintrin.h:185:1: error: inlining failed in call to always_inline '_mm_alignr_epi8': target specific option mismatch
> _mm_alignr_epi8(__m128i __X, __m128i __Y, const int __N)
> ^
> The AMD cpu flags:
>
> flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt cpb hw_pstate npt lbrv svm_lock nrip_sa
>
> "_mm_alignr_epi8" only works for ssse3 or upper, but this AMD CPU does not support that. This function has been wrongly called, because the wrong header file.
>
> Thanks,
> Michael
>
>> Thanks
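The header/feature pairing under discussion can be sketched with preprocessor guards. This is an illustration under assumptions, not the actual librte_eal change: `DEMO_SIMD_HEADER` and `demo_simd_header` are made-up names that simply report which branch was compiled.

```c
#include <stddef.h>

#if defined(__SSSE3__)
#include <tmmintrin.h> /* SSSE3 intrinsics, e.g. _mm_alignr_epi8 */
#define DEMO_SIMD_HEADER "tmmintrin.h"
#elif defined(__SSE3__)
#include <pmmintrin.h> /* SSE3 intrinsics only; no _mm_alignr_epi8 here */
#define DEMO_SIMD_HEADER "pmmintrin.h"
#else
#define DEMO_SIMD_HEADER "none"
#endif

/* Reports which intrinsics header this translation unit was built with. */
static const char *demo_simd_header(void)
{
	return DEMO_SIMD_HEADER;
}
```

On a CPU like the AMD part above, which advertises pni (SSE3) but not ssse3, a build targeting that CPU would take the `pmmintrin.h` branch, and code using `_mm_alignr_epi8` would need its own `__SSSE3__` guard.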
[dpdk-dev] IXGBE RX packet loss with 5+ cores
I'm hoping that someone (perhaps at Intel) can help us understand an IXGBE RX packet loss issue we're able to reproduce with testpmd.

We run testpmd with various numbers of cores. We offer line-rate traffic (~14.88 Mpps) to one ethernet port, and forward all received packets via the second port.

When we configure 1, 2, 3, or 4 cores (per port, with same number RX queues per port), there is no RX packet loss. When we configure 5 or more cores, we observe the following packet loss (approximate):
 5 cores - 3% loss
 6 cores - 7% loss
 7 cores - 11% loss
 8 cores - 15% loss
 9 cores - 18% loss

All of the "lost" packets are accounted for in the device's Rx Missed Packets Count register (RXMPC[0]). Quoting the datasheet:
 "Packets are missed when the receive FIFO has insufficient space to store the incoming packet. This might be caused due to insufficient buffers allocated, or because there is insufficient bandwidth on the IO bus."

RXMPC, and our use of API rx_descriptor_done to verify that we don't run out of mbufs (discussed below), lead us to theorize that packet loss occurs because the device is unable to DMA all packets from its internal packet buffer (512 KB, reported by register RXPBSIZE[0]) before overrun.

Questions
=========
1. The 82599 device supports up to 128 queues. Why do we see trouble with as few as 5 queues? What could limit the system (and one port controlled by 5+ cores) from receiving at line-rate without loss?

2. As far as we can tell, the RX path only touches the device registers when it updates a Receive Descriptor Tail register (RDT[n]), roughly every rx_free_thresh packets. Is there a big difference between one core doing this and N cores doing it 1/N as often?

3. Do CPU reads/writes from/to device registers have a higher priority than device reads/writes from/to memory? Could the former transactions (CPU <-> device) significantly impede the latter (device <-> RAM)?

Thanks in advance for any help you can provide.

Testpmd Command Line
====================
Here is an example of how we run testpmd:

# socket 0 lcores: 0-7, 16-23
N_QUEUES=5
N_CORES=10
./testpmd -c 0x003e013e -n 2 \
 --pci-whitelist "01:00.0" --pci-whitelist "01:00.1" \
 --master-lcore 8 -- \
 --interactive --portmask=0x3 --numa --socket-num=0 --auto-start \
 --coremask=0x003e003e \
 --rxd=4096 --txd=4096 --rxfreet=512 --txfreet=512 \
 --burst=128 --mbcache=256 \
 --nb-cores=$N_CORES --rxq=$N_QUEUES --txq=$N_QUEUES

Test machines
=============
* We performed most testing on a system with two E5-2640 v3 (Haswell 2.6 GHz 8 cores) CPUs, 64 GB 1866 MHz RAM, TYAN S7076 mobo.
* We obtained similar results on a system with two E5-2698 v3 (Haswell 2.3 GHz 16 cores) CPUs, 64 GB 2133 MHz RAM, Dell R730.
* DPDK 2.1.0, Linux 2.6.32-504.23.4

Intel 10GB adapters
===================
All ethernet adapters are 82599_SFP_SF2, vendor 8086, device 154D, svendor 8086, sdevice 7B11.

Other Details and Ideas we tried
================================
* Make sure that all cores, memory, and ethernet ports in use are on the same NUMA socket.
* Modify testpmd to insert CPU delays in the forwarding loop, to target some average number of RX packets that we reap per rx_pkt_burst (e.g., 75% of burst).
* We configured the RSS redirection table such that all packets go to one RX queue. In this case, there was NO packet loss (with any number of RX cores), as the ethernet and core activity is very similar to using only one RX core.
* When rx_pkt_burst returns a full burst, look at the subsequent RX descriptors, using a binary search of calls to rx_descriptor_done, to see whether the RX desc array is close to running out of new buffers. The answer was: No, none of the RX queues has more than 100 additional packets "done" (when testing with 5+ cores).
* Increase testpmd config params, e.g., --rxd, --rxfreet, --burst, --mbcache, etc. These result in very small improvements, i.e., slight reduction of packet loss.
Other Observations
==================
* Some IXGBE RX/TX code paths do not follow (my interpretation of) the documented semantics of the rx/tx packet burst APIs. For example, invoke rx_pkt_burst with nb_pkts=64, and it returns 32, even when more RX packets are available, because the code path is optimized to handle a burst of 32. The same thing may be true in the tx_pkt_burst code path. To allow us to run testpmd with --burst greater than 32, we worked around these limitations by wrapping the calls to rx_pkt_burst and tx_pkt_burst with do-whiles that continue while rx/tx burst returns 32 and we have not yet satisfied the desired burst count. The point here is that IXGBE's rx/tx packet burst API behavior is misleading! The application developer should not need to know that certain drivers or driver paths do not always complete an entire burst, even though they could have.
* We naively believed that if a run-to-completion model uses too many cycles per packet, we could just spread it over more cores. If there is some inherent lim