[dpdk-dev] [PATCH 06/11] kvargs: simpler parsing and allow duplicated keys
Hi Bruce, >> Also replace rte_strsplit() by more a standard function strtok_r() that is >> easier to understand for people already knowing the libc. It also avoids >> useless calls to strnlen(). The delimiters macros become strings instead of >> chars due to the strtok_r() API. >> > > As a general rule, we try to use only string functions which > track the buffer length they are working with, which is why the > function rte_strsplit() is used. While strtok_r() is indeed a > standard C function, why not use the code as originally written? Thank you for your comment. Another reason for not using rte_strsplit() is that the buffer length argument that was given was not relevant. Indeed, the string we want to parse initially comes from the command line arguments, which is duplicated with strdup(). It is not stored in a fixed size buffer. In my opinion, giving strnlen(pairs[i], MAX_ARG_STRLEN) as the size to rte_strsplit() would not prevent the program to a segfault or to read a corrupted string if it is not properly null-terminated. Does it makes sense ? Regards Olivier
[dpdk-dev] multiple VLAN IDs for SR-IOV ports
Any one know what commands to use on the KVM host to add multiple VLAN IDs to a SR-IOV port ? I could only add one VLAN ID using ip link set ethN vf NUM vlan VLANID Also do I have to call any routine on the DPDK side to add those VIDs ? James
[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
> > First and easy answer: it is open source, so anyone can recompile. So, > > what's the issue? > > I'm talking from a pure distribution perspective here: Requiring to > recompile all DPDK based applications to distribute a bugfix or to add > support for a new PMD is not ideal. > > So ideally OVS would have the possibility to link against the shared > library long term. I agree that distribution of DPDK apps is not covered properly at present. Identifying the proper scheme requires a specific analysis based on the constraints of the Telecom/Cloud/Networking markets. In the telecom world, if you fix the underlying framework of an app, you will still have to validate the solution, ie app/framework. In addition, the idea of shared libraries introduces the implied requirement to validate apps against diverse versions of DPDK shared libraries. This translates into development and support costs. I also expect many DPDK applications to tackle core networking features, with sub micro second packet handling delays and even lower than 200ns (NAT64...). The lazy binding based on ELF PLT represent quite a cost, not mentioning that optimization stops are shared libraries boundaries (gcc whole program optimization can be very effective...). Microsoft DLL linkage are an order of magnitude faster. If Linux was to provide that, I would probably revise my judgment. (I haven't checked Linux dynamic linking implementation for some time so my understanding of Linux dynamic linking may be outdated). > > > I get lost: do you mean ABI + API toward the PMDs or towards the > > applications using the librte ? > > Towards the PMDs is more straight forward at first so it seems logical to > focus on that first. I don't think it is so straight forward. Many recent cards such as Chelsio and Myricom have a very different "packet memory layout" that does not fit so easily into actual DPDK architecture. 1) "traditional" architecture: the driver reserves X buffers and provide the card with descriptors of those buffers. Each packet is DMA'ed into exactly one buffer. Typically you have 2K buffers, a 64 byte packet consumes exactly one buffer 2) "alternative" new architecture: the driver reserves a memory zone, say 4MB, without any structure, and provide a a single zone description and a ring buffer to the card. (there no individual buffer descriptors any more). The card fills the memory zone with packets, one next to the other and specifies where the packets are by updating the supplied ring. Out of the many issues fitting this scheme into DPDK, you cannot free a single mbuf: you have to maintain a ref count to the memory zone so that, when all mbufs have been "released", the memory zone can be freed. That's quite a stretch from actual paradigm. Apart from this aspect, managing RSS is two tied to Intel's flow director concepts and cannot accommodate directly smarter or dumber RSS mechanisms. That said, I fully agree PMD API should be revisited. Cordially, Fran?ois-Fr?d?ric
[dpdk-dev] How to debug packet sends to virtual functions
I'm trying to get a modified version of the l2fwd example running, and have problems with packets being silently thrown away. I can receive packets, and my printf's indicates that the packets are being sent to the correct port, using correct MAC address as source address. And still, the packets are lost. Since the port is a virtual function, it seems like I cannot use tcpdump on it to see the network traffic. There is nothing coming out of the cable (activity light not flashing, the receiving end running tcpdump does not see any traffic). I'm using two X540 with two virtual functions each. The application runs in a KVM/Qemu environmen. Any suggestions how to debug this? Regards, Mats
[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
On 01/29/2014 05:34 PM, Vincent JARDIN wrote: > Thomas, > > First and easy answer: it is open source, so anyone can recompile. So, > what's the issue? I'm talking from a pure distribution perspective here: Requiring to recompile all DPDK based applications to distribute a bugfix or to add support for a new PMD is not ideal. So ideally OVS would have the possibility to link against the shared library long term. > I get lost: do you mean ABI + API toward the PMDs or towards the > applications using the librte ? Towards the PMDs is more straight forward at first so it seems logical to focus on that first. A stable API and ABI for librte seems required as well long term as DPDK does offer shared libraries but I realize that this is a stretch goal in the initial phase.
[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
Thomas, First and easy answer: it is open source, so anyone can recompile. So, what's the issue? > Without a concept of stable interfaces, it will be difficult to > package and distribute RTE libraries, PMD, and DPDK applications. Right > now, the obvious path would include packaging the PMD bits together > with each DPDK application depending on the version of DPDK the binary > was compiled against. This is clearly not ideal. > >> I agree that some areas could be improved since they are not into the >> critical datapath of packets, but still other areas remain very CPU >> constraints. For instance: >> http://dpdk.org/browse/dpdk/commit/lib/librte_ether/rte_ethdev.h?id=c3d0564cf0f00c3c9a61cf72bd4bd1c441740637 >> >> is bad: >> struct eth_dev_ops >> is churned, no comment, and a #ifdef that changes the structure >> according to compilation! > > This is a very good example as it outlines the difference between > control structures and the fast path. We have this same exact trade off > in the kernel a lot where we have highly optimized internal APIs > towards modules and drivers but want to provide binary compatibility to > a certain extend. As long as we agree on this limited scope, we'll think about it and provide a proposal on dev at dpdk.org mailing list. > As for the specific example you mention, it is relatively trivial to > make eth_dev_ops backwards compatible by appending appropriate padding > to the struct before a new major release and ensure that new members > are added by replacing the padding accordingly. Obviously no ifdefs > would be allowed anymore. Of course, it is basic C! >> Should an application use the librte libraries of the DPDK: >>- you can use RTE_VERSION and RTE_VERSION_NUM : >> http://dpdk.org/doc/api/rte__version_8h.html#a8775053b0f721b9fa0457494cfbb7ed9 > > Right. This would be more or less identical to requiring a specific > DPDK version in OVS_CHEC_DPDK. It's not ideal to require application to > clutter their code with #ifdefs all over for every new minor release > though. > >>- you can write your own wrapper (with CPU overhead) in order to have >> a stable ABI, that wrapper should be tight to the versions of the librte >> => the overhead is part of your application instead of the DPDK, >>- *otherwise recompile your software, it is opensource, what's the >> issue?* >> >> We are opened to any suggestion to have stable ABI, but it should never >> remove the options to have fast/efficient/compilation/CPU execution >> processing. > > Absolutely agreed. We also don't want to add tons of abstraction and > overcomplicate everything. Still, I strongly believe that the definition > of stable interfaces towards applications and especially PMD is > essential. > > I'm not proposing to standardize all the APIs towards applications on > the level of POSIX. DPDK is in early stages and disruptive changes will > come along. What I would propose on an abstract level is: > > 1. Extend but not break API between minor releases. Postpone API > breakages to the next major release. High cadence of major > releases initially, lower cadence as DPDK matures. > > 2. Define ABI stability towards PMD for minor releases to allow > isolated packaging of PMD by padding control structures and keeping > functions ABI stable. I get lost: do you mean ABI + API toward the PMDs or towards the applications using the librte ? Best regards, Vincent
[dpdk-dev] [PATCH 06/11] kvargs: simpler parsing and allow duplicated keys
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz > Sent: Tuesday, January 28, 2014 4:07 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 06/11] kvargs: simpler parsing and allow > duplicated keys > > Remove the rte_kvargs_add_pair() function whose only role was to check if > a key is duplicated. Having duplicated keys is now allowed by kvargs API. > > Also replace rte_strsplit() by more a standard function strtok_r() that is > easier to understand for people already knowing the libc. It also avoids > useless calls to strnlen(). The delimiters macros become strings instead of > chars due to the strtok_r() API. > As a general rule, we try to use only string functions which track the buffer length they are working with, which is why the function rte_strsplit() is used. While strtok_r() is indeed a standard C function, why not use the code as originally written?
[dpdk-dev] [PATCH 04/11] kvargs: remove useless size field
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz > Sent: Tuesday, January 28, 2014 4:07 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 04/11] kvargs: remove useless size field > > This value was not very useful as the size of the table is fixed (equals > RTE_KVARGS_MAX). > > By the way, the memset in the initialization function was wrong (size too > short). Even if it was not really an issue since we rely on the "count" > field, it > is now fixed by this patch. > > Signed-off-by: Olivier Matz > --- Acked-by: Bruce Richardson
[dpdk-dev] [PATCH 03/11] kvargs: remove driver name in arguments
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz > Sent: Tuesday, January 28, 2014 4:07 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 03/11] kvargs: remove driver name in > arguments > > Now that rte_kvargs is a generic library, there is no need to have an > argument for the driver name in rte_kvargs_tokenize() and > rte_kvargs_parse() prototypes. This argument was only used to log the > driver name in case of error. Instead, we can add a log in init function of > pmd_pcap and pmd_ring. > > Signed-off-by: Olivier Matz > --- > lib/librte_kvargs/rte_kvargs.c | 13 ++--- > lib/librte_kvargs/rte_kvargs.h | 4 +--- > lib/librte_pmd_pcap/rte_eth_pcap.c | 4 +++- > lib/librte_pmd_ring/rte_eth_ring.c | 2 ++ > 4 files changed, 12 insertions(+), 11 deletions(-) > > diff --git a/lib/librte_kvargs/rte_kvargs.c b/lib/librte_kvargs/rte_kvargs.c > index 7a950d6..935698c 100644 > --- a/lib/librte_kvargs/rte_kvargs.c > +++ b/lib/librte_kvargs/rte_kvargs.c > @@ -91,8 +91,7 @@ rte_kvargs_add_pair(struct rte_kvargs *kvlist, char > *key, char *val) > * strtok() is used so the params string will be copied to be modified. > */ > static int > -rte_kvargs_tokenize(struct rte_kvargs *kvlist, const char *name, > - const char *params) > +rte_kvargs_tokenize(struct rte_kvargs *kvlist, const char *params) > { > unsigned i, count; > char *args; > @@ -101,7 +100,7 @@ rte_kvargs_tokenize(struct rte_kvargs *kvlist, > const char *name, > > /* If params are empty, nothing to do */ > if (params == NULL || params[0] == 0) { > - RTE_LOG(ERR, PMD, "Couldn't parse %s device, empty > arguments\n", name); > + RTE_LOG(ERR, PMD, "Cannot parse empty arguments\n"); > return -1; > } > > @@ -110,7 +109,7 @@ rte_kvargs_tokenize(struct rte_kvargs *kvlist, > const char *name, >*/ > args = strdup(params); > if(args == NULL){ > - RTE_LOG(ERR, PMD, "Couldn't parse %s device \n", name); > + RTE_LOG(ERR, PMD, "Cannot parse arguments: not enough > memory\n"); > return -1; > } > > @@ -127,7 +126,8 @@ rte_kvargs_tokenize(struct rte_kvargs *kvlist, > const char *name, > if (pair[0] == NULL || pair[1] == NULL || pair[0][0] == 0 > || pair[1][0] == 0) { > RTE_LOG(ERR, PMD, > - "Couldn't parse %s device, wrong key > or value \n", name); > + "Cannot parse arguments: wrong key or > value\n" > + "params=<%s>\n", params); > goto error; > } > > @@ -230,14 +230,13 @@ rte_kvargs_process(const struct rte_kvargs > *kvlist, > */ > int > rte_kvargs_parse(struct rte_kvargs *kvlist, > - const char *name, > const char *args, > const char *valid_keys[]) > { > > int ret; > > - ret = rte_kvargs_tokenize(kvlist, name, args); > + ret = rte_kvargs_tokenize(kvlist, args); > if (ret < 0) > return ret; > > diff --git a/lib/librte_kvargs/rte_kvargs.h b/lib/librte_kvargs/rte_kvargs.h > index 19485b1..804ea1d 100644 > --- a/lib/librte_kvargs/rte_kvargs.h > +++ b/lib/librte_kvargs/rte_kvargs.h > @@ -100,8 +100,6 @@ int rte_kvargs_init(struct rte_kvargs *kvlist); > * > * @param kvlist > * The rte_kvargs structure > - * @param name > - * The name of the driver > * @param args > * The input string containing the key/value associations > * @param valid_keys > @@ -112,7 +110,7 @@ int rte_kvargs_init(struct rte_kvargs *kvlist); > * - 0 on success > * - Negative on error > */ > -int rte_kvargs_parse(struct rte_kvargs *kvlist, const char *name, > +int rte_kvargs_parse(struct rte_kvargs *kvlist, > const char *args, const char *valid_keys[]); > > /** > diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c > b/lib/librte_pmd_pcap/rte_eth_pcap.c > index e47afcb..2006b35 100644 > --- a/lib/librte_pmd_pcap/rte_eth_pcap.c > +++ b/lib/librte_pmd_pcap/rte_eth_pcap.c > @@ -706,6 +706,8 @@ rte_pmd_pcap_init(const char *name, const char > *params) > struct rx_pcaps pcaps; > struct tx_pcaps dumpers; > > + RTE_LOG(INFO, PMD, "Initializing pmd_pcap for %s\n", name); > + > rte_kvargs_init(); > > numa_node = rte_socket_id(); > @@ -714,7 +716,7 @@ rte_pmd_pcap_init(const char *name, const char > *params) > start_cycles = rte_get_timer_cycles(); > hz = rte_get_timer_hz(); > > - if (rte_kvargs_parse(, name, params, valid_arguments) < 0) > + if (rte_kvargs_parse(, params, valid_arguments) < 0) > return -1; > > /* > diff --git a/lib/librte_pmd_ring/rte_eth_ring.c > b/lib/librte_pmd_ring/rte_eth_ring.c > index fa3ff72..abef2e8 100644 > --- a/lib/librte_pmd_ring/rte_eth_ring.c > +++ b/lib/librte_pmd_ring/rte_eth_ring.c > @@ -384,6
[dpdk-dev] [PATCH 02/11] kvargs: use the new library in pmd_pcap
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz > Sent: Tuesday, January 28, 2014 4:07 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 02/11] kvargs: use the new library in pmd_pcap > > The rte_kvargs library is a reworked copy of rte_eth_pcap_arg_parser, so it > provides the same service. Therefore we can use it and remove the code of > rte_eth_pcap_arg_parser. > > Signed-off-by: Olivier Matz > --- > lib/librte_pmd_pcap/Makefile | 8 +- > lib/librte_pmd_pcap/rte_eth_pcap.c| 29 +-- > lib/librte_pmd_pcap/rte_eth_pcap_arg_parser.c | 255 > -- > lib/librte_pmd_pcap/rte_eth_pcap_arg_parser.h | 71 --- > 4 files changed, 19 insertions(+), 344 deletions(-) delete mode 100644 > lib/librte_pmd_pcap/rte_eth_pcap_arg_parser.c > delete mode 100644 lib/librte_pmd_pcap/rte_eth_pcap_arg_parser.h > > diff --git a/lib/librte_pmd_pcap/Makefile b/lib/librte_pmd_pcap/Makefile > index 33174c0..741dafc 100644 > --- a/lib/librte_pmd_pcap/Makefile > +++ b/lib/librte_pmd_pcap/Makefile > @@ -1,6 +1,7 @@ > # BSD LICENSE > # > # Copyright(c) 2010-2013 Intel Corporation. All rights reserved. > +# Copyright(c) 2014 6WIND S.A. > # All rights reserved. > # > # Redistribution and use in source and binary forms, with or without > @@ -43,16 +44,15 @@ CFLAGS += $(WERROR_FLAGS) # all source are > stored in SRCS-y # > SRCS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += rte_eth_pcap.c > -SRCS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += rte_eth_pcap_arg_parser.c > - > > # > # Export include files > # > SYMLINK-y-include += rte_eth_pcap.h > -SYMLINK-y-include += rte_eth_pcap_arg_parser.h > > # this lib depends upon: > -DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += lib/librte_mbuf > lib/librte_ether > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += lib/librte_mbuf > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += lib/librte_ether > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += lib/librte_kvargs > > include $(RTE_SDK)/mk/rte.lib.mk > diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c > b/lib/librte_pmd_pcap/rte_eth_pcap.c > index 208e316..e47afcb 100644 > --- a/lib/librte_pmd_pcap/rte_eth_pcap.c > +++ b/lib/librte_pmd_pcap/rte_eth_pcap.c > @@ -2,6 +2,7 @@ > * BSD LICENSE > * > * Copyright(c) 2010-2013 Intel Corporation. All rights reserved. > + * Copyright(c) 2014 6WIND S.A. > * All rights reserved. > * > * Redistribution and use in source and binary forms, with or without > @@ -39,9 +40,9 @@ > #include > #include > #include > +#include > > #include "rte_eth_pcap.h" > -#include "rte_eth_pcap_arg_parser.h" > > #define RTE_ETH_PCAP_SNAPSHOT_LEN 65535 #define > RTE_ETH_PCAP_SNAPLEN 4096 @@ -701,11 +702,11 @@ > rte_pmd_pcap_init(const char *name, const char *params) { > unsigned numa_node, using_dumpers = 0; > int ret; > - struct args_dict dict; > + struct rte_kvargs kvlist; > struct rx_pcaps pcaps; > struct tx_pcaps dumpers; > > - rte_eth_pcap_init_args_dict(); > + rte_kvargs_init(); > > numa_node = rte_socket_id(); > > @@ -713,16 +714,16 @@ rte_pmd_pcap_init(const char *name, const > char *params) > start_cycles = rte_get_timer_cycles(); > hz = rte_get_timer_hz(); > > - if (rte_eth_pcap_parse_args(, name, params, > valid_arguments) < 0) > + if (rte_kvargs_parse(, name, params, valid_arguments) < 0) > return -1; > > /* >* If iface argument is passed we open the NICs and use them for >* reading / writing >*/ > - if (rte_eth_pcap_num_of_args(, ETH_PCAP_IFACE_ARG) == 1) > { > + if (rte_kvargs_count(, ETH_PCAP_IFACE_ARG) == 1) { > > - ret = rte_eth_pcap_post_process_arguments(, > ETH_PCAP_IFACE_ARG, > + ret = rte_kvargs_process(, ETH_PCAP_IFACE_ARG, > _rx_tx_iface, [0]); > if (ret < 0) > return -1; > @@ -734,13 +735,13 @@ rte_pmd_pcap_init(const char *name, const > char *params) >* We check whether we want to open a RX stream from a real NIC > or a >* pcap file >*/ > - if ((pcaps.num_of_rx = rte_eth_pcap_num_of_args(, > ETH_PCAP_RX_PCAP_ARG))) { > - ret = rte_eth_pcap_post_process_arguments(, > ETH_PCAP_RX_PCAP_ARG, > + if ((pcaps.num_of_rx = rte_kvargs_count(, > ETH_PCAP_RX_PCAP_ARG))) { > + ret = rte_kvargs_process(, > ETH_PCAP_RX_PCAP_ARG, > _rx_pcap, ); > } else { > - pcaps.num_of_rx = rte_eth_pcap_num_of_args(, > + pcaps.num_of_rx = rte_kvargs_count(, > ETH_PCAP_RX_IFACE_ARG); > - ret = rte_eth_pcap_post_process_arguments(, > ETH_PCAP_RX_IFACE_ARG, > + ret = rte_kvargs_process(, > ETH_PCAP_RX_IFACE_ARG, > _rx_iface, ); > } > > @@ -751,15 +752,15 @@ rte_pmd_pcap_init(const char *name, const > char
[dpdk-dev] [PATCH 01/11] kvargs: add a new library to parse key/value arguments
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz > Sent: Tuesday, January 28, 2014 4:07 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 01/11] kvargs: add a new library to parse > key/value arguments > > Copy the code from rte_eth_pcap_arg_parser.[ch], without any functional > modifications, only: > - rename functions and structure > - restyle (indentation) > - add comments (doxygen style) > - add "const" or "static" attributes, remove unneeded "inline" > > Signed-off-by: Olivier Matz > --- > config/defconfig_i686-default-linuxapp-gcc | 5 + > config/defconfig_i686-default-linuxapp-icc | 5 + > config/defconfig_x86_64-default-linuxapp-gcc | 5 + > config/defconfig_x86_64-default-linuxapp-icc | 5 + > lib/Makefile | 1 + > lib/librte_kvargs/Makefile | 50 ++ > lib/librte_kvargs/rte_kvargs.c | 248 > +++ > lib/librte_kvargs/rte_kvargs.h | 159 + > mk/rte.app.mk| 4 + > 9 files changed, 482 insertions(+) > create mode 100644 lib/librte_kvargs/Makefile create mode 100644 > lib/librte_kvargs/rte_kvargs.c create mode 100644 > lib/librte_kvargs/rte_kvargs.h > > diff --git a/config/defconfig_i686-default-linuxapp-gcc > b/config/defconfig_i686-default-linuxapp-gcc > index 4f57242..349ca24 100644 > --- a/config/defconfig_i686-default-linuxapp-gcc > +++ b/config/defconfig_i686-default-linuxapp-gcc > @@ -141,6 +141,11 @@ CONFIG_RTE_LIBRTE_EAL_BAREMETAL=n > CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y > > # > +# Compile the argument parser library > +# > +CONFIG_RTE_LIBRTE_KVARGS=y > + > +# > # Compile generic ethernet library > # > CONFIG_RTE_LIBRTE_ETHER=y > diff --git a/config/defconfig_i686-default-linuxapp-icc > b/config/defconfig_i686-default-linuxapp-icc > index 90521e1..4e7338f 100644 > --- a/config/defconfig_i686-default-linuxapp-icc > +++ b/config/defconfig_i686-default-linuxapp-icc > @@ -141,6 +141,11 @@ CONFIG_RTE_LIBRTE_EAL_BAREMETAL=n > CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y > > # > +# Compile the argument parser library > +# > +CONFIG_RTE_LIBRTE_KVARGS=y > + > +# > # Compile generic ethernet library > # > CONFIG_RTE_LIBRTE_ETHER=y > diff --git a/config/defconfig_x86_64-default-linuxapp-gcc > b/config/defconfig_x86_64-default-linuxapp-gcc > index 4c05ffc..ca85b1a 100644 > --- a/config/defconfig_x86_64-default-linuxapp-gcc > +++ b/config/defconfig_x86_64-default-linuxapp-gcc > @@ -141,6 +141,11 @@ CONFIG_RTE_LIBRTE_EAL_BAREMETAL=n > CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y > > # > +# Compile the argument parser library > +# > +CONFIG_RTE_LIBRTE_KVARGS=y > + > +# > # Compile generic ethernet library > # > CONFIG_RTE_LIBRTE_ETHER=y > diff --git a/config/defconfig_x86_64-default-linuxapp-icc > b/config/defconfig_x86_64-default-linuxapp-icc > index c4d5c73..37dbdd0 100644 > --- a/config/defconfig_x86_64-default-linuxapp-icc > +++ b/config/defconfig_x86_64-default-linuxapp-icc > @@ -141,6 +141,11 @@ CONFIG_RTE_LIBRTE_EAL_BAREMETAL=n > CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y > > # > +# Compile the argument parser library > +# > +CONFIG_RTE_LIBRTE_KVARGS=y > + > +# > # Compile generic ethernet library > # > CONFIG_RTE_LIBRTE_ETHER=y > diff --git a/lib/Makefile b/lib/Makefile index b655d4f..31644f2 100644 > --- a/lib/Makefile > +++ b/lib/Makefile > @@ -52,6 +52,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power > DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter > DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched > DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl > +DIRS-$(CONFIG_RTE_LIBRTE_KVARGS) += librte_kvargs > > ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y) > DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni diff --git > a/lib/librte_kvargs/Makefile b/lib/librte_kvargs/Makefile new file mode > 100644 index 000..c4c2855 > --- /dev/null > +++ b/lib/librte_kvargs/Makefile > @@ -0,0 +1,50 @@ > +# BSD LICENSE > +# > +# Copyright(c) 2014 6WIND S.A. > +# All rights reserved. > +# > +# Redistribution and use in source and binary forms, with or without > +# modification, are permitted provided that the following conditions > +# are met: > +# > +# * Redistributions of source code must retain the above copyright > +# notice, this list of conditions and the following disclaimer. > +# * Redistributions in binary form must reproduce the above copyright > +# notice, this list of conditions and the following disclaimer in > +# the documentation and/or other materials provided with the > +# distribution. > +# * Neither the name of Intel Corporation nor the names of its > +# contributors may be used to endorse or promote products derived > +# from this software without specific prior written permission. > +# > +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND > CONTRIBUTORS > +# "AS
[dpdk-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
Hi Pravin, I think your stuff is on the brink of a creating a mini revolution :) Some questions inline below -- +ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk What do you mean by portid here, do you mean the physical interface id like eth0 which I have bound to igb_uio now ? If I have multiple interfaces I have assigned igb_uio to, eg. eth0, eth1, eth2 etc., what is the id mapping for those ? If I have VM's running, then typically how to interface those VM's to this OVS in user space now, do I use the same classical 'tap' interface and add it to the OVS above. What is the actual path the data takes from the VM now all the way to the switch, wouldn't it be hypervisor to kernel to OVS switch in user space to other VM/Network ? I think if we can solve the VM to OVS port connectivity remaining in userspace only, then we have a great thing at our hand. Kindly comment on this. Regards -Prashant -Original Message- From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of pshe...@nicira.com Sent: Tuesday, January 28, 2014 7:19 AM To: dev at openvswitch.org; dev at dpdk.org; dpdk-ovs at lists.01.org Cc: Gerald Rogers Subject: [dpdk-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports. From: Pravin B ShelarFollowing patch adds DPDK netdev-class to userspace datapath. Approach taken in this patch differs from Intel? DPDK vSwitch where DPDK datapath switching is done in saparate process. This patch adds support for DPDK type port and uses OVS userspace datapath for switching. Therefore all DPDK processing and flow miss handling is done in single process. This also avoids code duplication by reusing OVS userspace datapath switching and therefore it supports all flow matching and actions that user-space datapath supports. Refer to INSTALL.DPDK doc for further info. With this patch I got similar performance for netperf TCP_STREAM tests compared to kernel datapath. This is based a patch from Gerald Rogers. Signed-off-by: Pravin B Shelar CC: "Gerald Rogers" --- This patch is tested on latest OVS master (commit 9d0581fdf22bec79). --- INSTALL |1 + INSTALL.DPDK| 85 Makefile.am |1 + acinclude.m4| 40 ++ configure.ac|1 + lib/automake.mk |6 + lib/dpif-netdev.c | 393 +++- lib/netdev-dpdk.c | 1152 +++ lib/netdev-dpdk.h |7 + lib/netdev-dummy.c | 38 +- lib/netdev-linux.c | 33 +- lib/netdev-provider.h | 13 +- lib/netdev-vport.c |1 + lib/netdev.c| 52 ++- lib/netdev.h| 15 +- lib/ofpbuf.c|7 +- lib/ofpbuf.h| 13 +- lib/packets.c |9 + lib/packets.h |1 + vswitchd/ovs-vswitchd.c | 14 +- 20 files changed, 1702 insertions(+), 180 deletions(-) create mode 100644 INSTALL.DPDK create mode 100644 lib/netdev-dpdk.c create mode 100644 lib/netdev-dpdk.h diff --git a/INSTALL b/INSTALL index 001d3cb..74cd278 100644 --- a/INSTALL +++ b/INSTALL @@ -10,6 +10,7 @@ on a specific platform, please see one of these files: - INSTALL.RHEL - INSTALL.XenServer - INSTALL.NetBSD +- INSTALL.DPDK Build Requirements -- diff --git a/INSTALL.DPDK b/INSTALL.DPDK new file mode 100644 index 000..1c95104 --- /dev/null +++ b/INSTALL.DPDK @@ -0,0 +1,85 @@ + Using Open vSwitch with DPDK + + +Open vSwitch can use Intel(R) DPDK lib to operate entirely in +userspace. This file explains how to install and use Open vSwitch in +such a mode. + +The DPDK support of Open vSwitch is considered experimental. +It has not been thoroughly tested. + +This version of Open vSwitch should be built manually with "configure" +and "make". + +Building and Installing: + + +DPDK: +cd DPDK +make install T=x86_64-default-linuxapp-gcc +Refer to http://dpdk.org/ requirements of details. + +Linux kernel: +Refer to intel-dpdk-getting-started-guide.pdf for understanding +DPDK kernel requirement. + +OVS: +cd $(OVS_DIR)/openvswitch +./boot.sh +./configure --with-dpdk=$(DPDK_BUILD) +make + +Refer to INSTALL.userspace for general requirements of building +userspace OVS. + +Using the DPDK with ovs-vswitchd: +- + +Fist setup DPDK devices: + - insert igb_uio.ko +e.g. insmod DPDK/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko + - mount hugefs +e.g. mount -t hugetlbfs -o pagesize=1G none /mnt/huge/ + - Bind network device to ibg_uio. +e.g. DPDK/tools/pci_unbind.py --bind=igb_uio eth1 + +Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup. + +Start vswitchd: +DPDK configuration arguments can be passed to vswitchd via `--dpdk` +argument. + e.g. + ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach + +To
[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
On Wed, Jan 29, 2014 at 2:01 AM, Thomas Graf wrote: > On 01/28/2014 02:48 AM, pshelar at nicira.com wrote: >> >> From: Pravin B Shelar >> >> Following patch adds DPDK netdev-class to userspace datapath. >> Approach taken in this patch differs from Intel? DPDK vSwitch >> where DPDK datapath switching is done in saparate process. This >> patch adds support for DPDK type port and uses OVS userspace >> datapath for switching. Therefore all DPDK processing and flow >> miss handling is done in single process. This also avoids code >> duplication by reusing OVS userspace datapath switching and >> therefore it supports all flow matching and actions that >> user-space datapath supports. Refer to INSTALL.DPDK doc for >> further info. >> >> With this patch I got similar performance for netperf TCP_STREAM >> tests compared to kernel datapath. >> >> This is based a patch from Gerald Rogers. >> >> Signed-off-by: Pravin B Shelar >> CC: "Gerald Rogers" > > > Pravin, > > Some initial comments below. I will provide more after deeper > digging. > > Do you have any ideas on how to implement the TX batching yet? > We can batch packets for some interval, then to do tx-burst. But I did not see any performance improvements as we have other bottleneck in userspace datapath. Ben's RCU patch should help there. > >> + >> +static int >> +netdev_dpdk_rx_drain(struct netdev_rx *rx_) >> +{ >> +struct netdev_rx_dpdk *rx = netdev_rx_dpdk_cast(rx_); >> +int pending; >> +int i; >> + >> +pending = rx->ofpbuf_cnt; >> +if (pending) { > > > This conditional seems unneeded. > Right. > >> +for (i = 0; i < pending; i++) { >> + build_ofpbuf(rx, >ofpbuf[i], NULL); >> +} >> +rx->ofpbuf_cnt = 0; >> +return 0; >> +} >> + >> +return 0; >> +} >> + >> +/* Tx function. Transmit packets indefinitely */ >> +static int >> +dpdk_do_tx_copy(struct netdev *netdev, char *buf, int size) >> +{ >> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> +struct rte_mbuf *pkt; >> +uint32_t nb_tx = 0; >> + >> +pkt = rte_pktmbuf_alloc(dev->dpdk_mp->mp); >> +if (!pkt) { >> +return 0; > > > Silent drop? ;-) Shouldn't these drops be accounted for somehow? > ahh, I will keep it in netdev-dpdk. > >> +} >> + >> +/* We have to do a copy for now */ >> +memcpy(pkt->pkt.data, buf, size); >> + >> +rte_pktmbuf_data_len(pkt) = size; >> +rte_pktmbuf_pkt_len(pkt) = size; >> + >> +rte_spinlock_lock(>tx_lock); > > > What is the purpose of tx_lock here? Multiple threads writing to > the same Q? The lock is not acquired for the zerocopy path below. > There are PMD threads which have their own queue. So tx in these threads is lockless. But vswitchd can send packet from other thread all other thread send packets from single queue which is locked. > >> +nb_tx = rte_eth_tx_burst(dev->port_id, NR_QUEUE, , 1); >> +rte_spinlock_unlock(>tx_lock); >> + >> +if (nb_tx != 1) { >> +/* free buffers if we couldn't transmit packets */ >> +rte_mempool_put_bulk(dev->dpdk_mp->mp, (void **), 1); >> +} >> +return nb_tx; >> +} >> + >> +static int >> +netdev_dpdk_send(struct netdev *netdev, >> + struct ofpbuf *ofpbuf, bool may_steal) >> +{ >> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> + >> +if (ofpbuf->size > dev->max_packet_len) { >> +VLOG_ERR("2big size %d max_packet_len %d", >> + (int)ofpbuf->size , dev->max_packet_len); > > > Should probably use VLOG_RATE_LIMIT_INIT > ok. > >> +return E2BIG; >> +} >> + >> +rte_prefetch0(>private_p); >> +if (!may_steal || >> +!ofpbuf->private_p || ofpbuf->source != OFPBUF_DPDK) { >> +dpdk_do_tx_copy(netdev, (char *) ofpbuf->data, ofpbuf->size); >> +} else { >> +struct rte_mbuf *pkt; >> +uint32_t nb_tx; >> +int qid; >> + >> +pkt = ofpbuf->private_p; >> +ofpbuf->private_p = NULL; >> +rte_pktmbuf_data_len(pkt) = ofpbuf->size; >> +rte_pktmbuf_pkt_len(pkt) = ofpbuf->size; >> + >> +/* TODO: TX batching. */ >> +qid = rte_lcore_id() % NR_QUEUE; >> +nb_tx = rte_eth_tx_burst(dev->port_id, qid, , 1); >> +if (nb_tx != 1) { >> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> + >> +rte_mempool_put_bulk(dev->dpdk_mp->mp, (void **), 1); >> +VLOG_ERR("TX error, zero packets sent"); > > > Same here > ok > >> + } >> +} >> +return 0; >> +} > > >> +static int >> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) >> +{ >> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> +int old_mtu, err; >> +struct dpdk_mp *old_mp; >> +struct dpdk_mp *mp; >> + >> +ovs_mutex_lock(_mutex); >> +ovs_mutex_lock(>mutex); >> +if (dev->mtu == mtu) { >> +err = 0; >> +goto out; >> +} >> + >> +mp = dpdk_mp_get(dev->socket_id, dev->mtu); >> +
[dpdk-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
On Wed, Jan 29, 2014 at 12:56 AM, Prashant Upadhyaya wrote: > Hi Pravin, > > I think your stuff is on the brink of a creating a mini revolution :) > > Some questions inline below -- > +ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk > What do you mean by portid here, do you mean the physical interface id like > eth0 which I have bound to igb_uio now ? > If I have multiple interfaces I have assigned igb_uio to, eg. eth0, eth1, > eth2 etc., what is the id mapping for those ? > Port id is id assigned by DPDK. DPDK interface takes this port id as argument. Currently you need to look at pci id to figure out the device mapping to port id. I know it is clean and I am exploring better interface so that we can specify device names to ovs-vsctl. > If I have VM's running, then typically how to interface those VM's to this > OVS in user space now, do I use the same classical 'tap' interface and add it > to the OVS above. tap device will work, but you would not get performance primarily due to scheduling delay and memcopy. DPDK has multiple drivers to create interface with KVM guests OS. those should perform better. I have no tried it yet. > What is the actual path the data takes from the VM now all the way to the > switch, wouldn't it be hypervisor to kernel to OVS switch in user space to > other VM/Network ? Depends on method you use. e.g. Memnic bypass hypervisor and host kernel entirely. > I think if we can solve the VM to OVS port connectivity remaining in > userspace only, then we have a great thing at our hand. Kindly comment on > this. > right, performance looks pretty good. Still DPDK needs constant polling which consumes more power. RFC ovs-dkdp patch has simple polling which need tweaking for better power usage. Thanks, Pravin. > Regards > -Prashant > >
[dpdk-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
On Tue, Jan 28, 2014 at 4:15 PM, Vincent JARDIN wrote: > Hi Pravin, > > >>> Few feature questions: >>> >>>- what's about the vNIC supports (toward the guests)? >>>- what's about IPsec support (VxLAN over IPsec for instance)? >>> I do not understand how your patch will solve those 2 cases. >>> >> At this point I wanted to get basic DPDK support in OVS, once that is >> done we can add support for vNIC. > > > For vNIC, did you notice: > http://dpdk.org/browse/memnic/ > > ? AFAIU it is introducing backend driver for vNIC which shld work with this patch. > >> IPsec and vxlan or any L3 tunneling requires IP stack in userspace and >> needs more design work. > > > OK, understood. > > >> This is based a patch from Gerald Rogers. >>> >>> >>> Please which patch? I cannot find it into the archives. >>> >> It was directly sent to Jesse at Nicira. If you want I can send it out for >> ref. > > > Yes, please. Attached is the patch based upon HEAD-309d9da. Thanks, Pravin. > > Thank you, > Vincent >
[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
Hi Thomas, On 29/01/2014 09:15, Thomas Graf wrote: > The obvious and usual best practise would be for DPDK to guarantee > ABI stability between minor releases. > > Since dpdk-dev is copied as well, any comments? DPDK's ABIs are not Kernel's ABIs, they are not POSIX, there is no standard. Currently, there is no such plan to have a stable ABI since we need to keep freedom to chase CPU cycles over having a stable ABI. For instance, some applications on top of the DPDK process the packets in less than 150 CPU cycles (have a look at testpmd: http://dpdk.org/browse/dpdk/tree/app/test-pmd ) I agree that some areas could be improved since they are not into the critical datapath of packets, but still other areas remain very CPU constraints. For instance: http://dpdk.org/browse/dpdk/commit/lib/librte_ether/rte_ethdev.h?id=c3d0564cf0f00c3c9a61cf72bd4bd1c441740637 is bad: struct eth_dev_ops is churned, no comment, and a #ifdef that changes the structure according to compilation! Should an application use the librte libraries of the DPDK: - you can use RTE_VERSION and RTE_VERSION_NUM : http://dpdk.org/doc/api/rte__version_8h.html#a8775053b0f721b9fa0457494cfbb7ed9 - you can write your own wrapper (with CPU overhead) in order to have a stable ABI, that wrapper should be tight to the versions of the librte => the overhead is part of your application instead of the DPDK, - *otherwise recompile your software, it is opensource, what's the issue?* We are opened to any suggestion to have stable ABI, but it should never remove the options to have fast/efficient/compilation/CPU execution processing. Best regards, Vincent
[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
On 01/28/2014 02:48 AM, pshelar at nicira.com wrote: > From: Pravin B Shelar > > Following patch adds DPDK netdev-class to userspace datapath. > Approach taken in this patch differs from Intel? DPDK vSwitch > where DPDK datapath switching is done in saparate process. This > patch adds support for DPDK type port and uses OVS userspace > datapath for switching. Therefore all DPDK processing and flow > miss handling is done in single process. This also avoids code > duplication by reusing OVS userspace datapath switching and > therefore it supports all flow matching and actions that > user-space datapath supports. Refer to INSTALL.DPDK doc for > further info. > > With this patch I got similar performance for netperf TCP_STREAM > tests compared to kernel datapath. > > This is based a patch from Gerald Rogers. > > Signed-off-by: Pravin B Shelar > CC: "Gerald Rogers" Pravin, Some initial comments below. I will provide more after deeper digging. Do you have any ideas on how to implement the TX batching yet? > + > +static int > +netdev_dpdk_rx_drain(struct netdev_rx *rx_) > +{ > +struct netdev_rx_dpdk *rx = netdev_rx_dpdk_cast(rx_); > +int pending; > +int i; > + > +pending = rx->ofpbuf_cnt; > +if (pending) { This conditional seems unneeded. > +for (i = 0; i < pending; i++) { > + build_ofpbuf(rx, >ofpbuf[i], NULL); > +} > +rx->ofpbuf_cnt = 0; > +return 0; > +} > + > +return 0; > +} > + > +/* Tx function. Transmit packets indefinitely */ > +static int > +dpdk_do_tx_copy(struct netdev *netdev, char *buf, int size) > +{ > +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > +struct rte_mbuf *pkt; > +uint32_t nb_tx = 0; > + > +pkt = rte_pktmbuf_alloc(dev->dpdk_mp->mp); > +if (!pkt) { > +return 0; Silent drop? ;-) Shouldn't these drops be accounted for somehow? > +} > + > +/* We have to do a copy for now */ > +memcpy(pkt->pkt.data, buf, size); > + > +rte_pktmbuf_data_len(pkt) = size; > +rte_pktmbuf_pkt_len(pkt) = size; > + > +rte_spinlock_lock(>tx_lock); What is the purpose of tx_lock here? Multiple threads writing to the same Q? The lock is not acquired for the zerocopy path below. > +nb_tx = rte_eth_tx_burst(dev->port_id, NR_QUEUE, , 1); > +rte_spinlock_unlock(>tx_lock); > + > +if (nb_tx != 1) { > +/* free buffers if we couldn't transmit packets */ > +rte_mempool_put_bulk(dev->dpdk_mp->mp, (void **), 1); > +} > +return nb_tx; > +} > + > +static int > +netdev_dpdk_send(struct netdev *netdev, > + struct ofpbuf *ofpbuf, bool may_steal) > +{ > +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > + > +if (ofpbuf->size > dev->max_packet_len) { > +VLOG_ERR("2big size %d max_packet_len %d", > + (int)ofpbuf->size , dev->max_packet_len); Should probably use VLOG_RATE_LIMIT_INIT > +return E2BIG; > +} > + > +rte_prefetch0(>private_p); > +if (!may_steal || > +!ofpbuf->private_p || ofpbuf->source != OFPBUF_DPDK) { > +dpdk_do_tx_copy(netdev, (char *) ofpbuf->data, ofpbuf->size); > +} else { > +struct rte_mbuf *pkt; > +uint32_t nb_tx; > +int qid; > + > +pkt = ofpbuf->private_p; > +ofpbuf->private_p = NULL; > +rte_pktmbuf_data_len(pkt) = ofpbuf->size; > +rte_pktmbuf_pkt_len(pkt) = ofpbuf->size; > + > +/* TODO: TX batching. */ > +qid = rte_lcore_id() % NR_QUEUE; > +nb_tx = rte_eth_tx_burst(dev->port_id, qid, , 1); > +if (nb_tx != 1) { > +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > + > +rte_mempool_put_bulk(dev->dpdk_mp->mp, (void **), 1); > +VLOG_ERR("TX error, zero packets sent"); Same here > + } > +} > +return 0; > +} > +static int > +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) > +{ > +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > +int old_mtu, err; > +struct dpdk_mp *old_mp; > +struct dpdk_mp *mp; > + > +ovs_mutex_lock(_mutex); > +ovs_mutex_lock(>mutex); > +if (dev->mtu == mtu) { > +err = 0; > +goto out; > +} > + > +mp = dpdk_mp_get(dev->socket_id, dev->mtu); > +if (!mp) { > +err = ENOMEM; > +goto out; > +} > + > +rte_eth_dev_stop(dev->port_id); > + > +old_mtu = dev->mtu; > +old_mp = dev->dpdk_mp; > +dev->dpdk_mp = mp; > +dev->mtu = mtu; > +dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu); > + > +err = dpdk_eth_dev_init(dev); > +if (err) { > + > +dpdk_mp_put(mp); > +dev->mtu = old_mtu; > +dev->dpdk_mp = old_mp; > +dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu); > +dpdk_eth_dev_init(dev); Would be nice if we don't need these constructs and DPDK would provide an all or nothing init method. > +goto out; > +} > + > +dpdk_mp_put(old_mp); >
[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
On Wed, 29 Jan 2014 18:14:01 +0100 Thomas Graf wrote: > On 01/29/2014 05:34 PM, Vincent JARDIN wrote: > > Thomas, > > > > First and easy answer: it is open source, so anyone can recompile. So, > > what's the issue? > > I'm talking from a pure distribution perspective here: Requiring to > recompile all DPDK based applications to distribute a bugfix or to > add support for a new PMD is not ideal. > > So ideally OVS would have the possibility to link against the shared > library long term. > > > I get lost: do you mean ABI + API toward the PMDs or towards the > > applications using the librte ? > > Towards the PMDs is more straight forward at first so it seems logical > to focus on that first. > > A stable API and ABI for librte seems required as well long term as > DPDK does offer shared libraries but I realize that this is a stretch > goal in the initial phase. I would hate to see the API/ABI nailed down. We have lots of bug fixes and new drivers that are ready to contribute, but most of them have some change to existing ABI.
[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
On 01/28/2014 07:17 PM, Pravin Shelar wrote: > Right, version mismatch will not work. API provided by DPDK are not > stable, So OVS has to be built for different releases for now. > > I do not see how we can fix it from OVS side. DPDK needs to > standardize API, Actually OVS also needs more API, like DPDK > initialization, mempool destroy, etc. Agreed. It's not fixable from the OVS side. I also don't want to object to including this. I'm just raising awareness of the issue as this will become essential for dstribution. The obvious and usual best practise would be for DPDK to guarantee ABI stability between minor releases. Since dpdk-dev is copied as well, any comments?
[dpdk-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
Hi Pravin, >> Few feature questions: >>- what's about the vNIC supports (toward the guests)? >>- what's about IPsec support (VxLAN over IPsec for instance)? >> I do not understand how your patch will solve those 2 cases. >> > At this point I wanted to get basic DPDK support in OVS, once that is > done we can add support for vNIC. For vNIC, did you notice: http://dpdk.org/browse/memnic/ ? > IPsec and vxlan or any L3 tunneling requires IP stack in userspace and > needs more design work. OK, understood. > This is based a patch from Gerald Rogers. >> >> Please which patch? I cannot find it into the archives. >> > It was directly sent to Jesse at Nicira. If you want I can send it out for > ref. Yes, please. Thank you, Vincent