[dpdk-dev] how to change binding of NIC ports to NUMA nodes
Hi Pablo,

Thank you for the reply. I think I did not convey my query properly in my
question.

I agree that the physical placement of a NIC in a PCIe slot decides the NUMA
node with which it is associated. But in the server I am experimenting with
(IBM System x3850 X5 with 4 Xeon 7560 processors) there are two IO hubs
through which the PCIe slots are connected to the CPU sockets.

4 of the PCIe slots are connected to one IOH and 3 slots are connected to the
second IOH. Each IOH is connected to 2 CPU sockets: IOH1 is connected to
sockets 0 and 1, and IOH2 is connected to sockets 2 and 3.

When I put 2 NICs in slots connected to IOH1, both get bound to socket 0.
Similarly, when I put 2 NICs in slots connected to IOH2, both get bound to
socket 2.

My question is: why do none of the cards get bound to NUMA nodes (sockets) 1
or 3? Is there something that I am missing in the physical architecture of
the server? Is it that each IOH is directly connected to only 1 socket?

Regards
Rajesh

On Fri, Sep 4, 2015 at 12:50 PM, De Lara Guarch, Pablo <
pablo.de.lara.guarch at intel.com> wrote:

> Hi Rajesh,
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rajesh R
> > Sent: Friday, September 04, 2015 5:29 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] how to change binding of NIC ports to NUMA nodes
> >
> > Hi,
> >
> > I am trying an application based on dpdk on a 4-processor server, i.e. 4
> > numa nodes.
> > The server has 4 NIC cards, out of which 2 cards get bound to
> > numa node 0 and the other 2 cards get bound to numa node 2 (as per the
> > /sys/pci/.../numa_node for each card).
> >
> > How to evenly distribute the cards to all the numa nodes so that one card
> > each gets bound to one numa node?
> >
> > Can we control the binding from dpdk, either pmd_ixgbe or igb_uio?
>
> The drivers cannot change the numa node where your NICs are,
> as those nodes are associated to the different physical sockets (CPU and
> memory)
> that you have on your platform, and your NICs are connected physically
> to these sockets via the PCI slots.
>
> So, if you want to change the numa node, you will have to move the NIC(s)
> to another PCI slot that is connected to a different socket.
> Look at the user guide of your platform to find out which PCI slots are
> connected to which socket.
>
> Regards,
> Pablo
>
> > --
> > Regards
> > Rajesh R

--
Regards
Rajesh R
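For readers who want to check the placement discussed in this thread, here is a minimal sketch of reading the NUMA node that the Linux kernel reports for one PCI device. It assumes the usual sysfs layout, /sys/bus/pci/devices/<PCI address>/numa_node; the function name read_numa_node and the error value -2 are made up for this illustration and are not part of DPDK.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read the NUMA node the kernel reports for one PCI device.
 * `path` is expected to look like
 * /sys/bus/pci/devices/0000:0b:00.0/numa_node (the address is only an example).
 * Returns the node number, -1 if the kernel reports no affinity,
 * or -2 (hypothetical error value) if the file cannot be read or parsed.
 */
static int read_numa_node(const char *path)
{
	FILE *f;
	int node;

	assert(path != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -2;
	if (fscanf(f, "%d", &node) != 1)
		node = -2;
	fclose(f);
	return node;
}
```

Inside a DPDK application, rte_eth_dev_socket_id() should give the same information per port. Either way the value is only reported, not set: as Pablo says, it follows from the slot wiring.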
[dpdk-dev] [RFC PATCH 00/18] refactor eal driver registration code
On Fri, Sep 04, 2015 at 12:18:50PM +0100, Bruce Richardson wrote:
> On Fri, Sep 04, 2015 at 12:01:36PM +0100, Bernard Iremonger wrote:
> > At present the eal driver registration code is more complicated than it
> > needs to be.
> >
> > This RFC proposes to simplify the eal driver registration code.
> >
> > Remove the type field from the eal driver structure.
> > Refactor the eal driver registration code to use the name
> > field in the eal driver structure instead of the type field.
> >
> > Modify all PMDs to use the modified eal driver structure.
> > Initialise the name field in the eal driver structure
> > in some PMDs where it is not initialised at present.
> >
> Hi,
>
> I don't think I like this approach very much. It seems very brittle to remove
> the explicit type field and start relying on the drivers putting a prefix
> in the name instead, i.e. implicit typing.
>
> What is the major concern with marking drivers as virtual or physical? My
> thinking
> is that we should keep the type field, just perhaps change PDEV to be more
> descriptive in identifying the type of physical device, e.g. DEV_PCI.
>
The issue is largely philosophical. We shouldn't need to define the type of
bus a driver is on in the init structure of a pmd. Instead we should register
it dynamically during pmd initialization.

As you note, enumerating the bus type (i.e. PCI/USB/etc.) is a step in the
right direction, but it would be better to register that dynamically than to
encode it in the data structure.

Neil

> Regards,
> /Bruce
>
[dpdk-dev] ixgbe: account more Rx errors Issue
Hi Maryam,

Please see below.

> XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors

Please note that the UDP checksum is optional for IPv4, but UDP packets with
a zero checksum hit XEC.

> And general crc errors counts Counts the number of receive packets with CRC
> errors.

Let me explain with an example.

DPDK 2.0 behavior:
host A sends 10M IPv4 UDP packets (no checksum) to host B
host B stats: 9M ipackets + 1M ierrors (missed) = 10M

DPDK 2.1 behavior:
host A sends 10M IPv4 UDP packets (no checksum) to host B
host B stats: 9M ipackets + 11M in ierrors (1M missed + 10M XEC) = 20M?

> So our options are we can:
> 1. Add only one of these into the error stats.
> 2. We can introduce some cooking of stats in this scenario, so only add
> either or if they are equal or one is higher than the other.
> 3. Add them all which means you can have more errors than the number of
> received packets, but TBH this is going to be the case if your packets have
> multiple errors anyway.

4. ierrors should reflect NIC drops only. XEC does not count drops, so IMO it
should be removed from ierrors. Please note that we can still access XEC
using rte_eth_xstats_get().

Regards,
Andriy
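The arithmetic in the example above can be written out as a small sketch. The struct and function names are hypothetical and only mirror the counters named in the mail (ipackets, missed, XEC); this is not the ixgbe driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-port counters, in packets. */
struct rx_counters {
	uint64_t ipackets; /* packets delivered to the application */
	uint64_t missed;   /* packets dropped by the NIC */
	uint64_t xec;      /* packets flagged with a checksum error */
};

/* DPDK 2.1 behaviour as described in the mail: XEC is added to ierrors. */
static uint64_t ierrors_with_xec(const struct rx_counters *c)
{
	return c->missed + c->xec;
}

/* Option 4 from the mail: ierrors counts NIC drops only. */
static uint64_t ierrors_drops_only(const struct rx_counters *c)
{
	return c->missed;
}
```

With 9M delivered, 1M missed and 10M XEC hits, the first function makes ipackets + ierrors come to 20M for 10M packets sent, while the second keeps the sum at 10M.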
[dpdk-dev] virtio optimization idea
There is a formatting issue with the ASCII chart of the tx ring, so that
chart is updated below. Sorry for the trouble.

On 9/4/2015 4:25 PM, Xie, Huawei wrote:
> Hi:
>
> Recently I have done one virtio optimization proof of concept. The
> optimization includes two parts:
> 1) avail ring set with fixed descriptors
> 2) RX vectorization
> With the optimizations, we could have a severalfold performance boost
> for pure vhost-virtio throughput.
>
> Here I will only cover the first part, which is the prerequisite for the
> second part.
> Let us first take RX as an example. Currently when we fill the avail ring
> with guest mbufs, we need to
> a) allocate one descriptor (for a non-sg mbuf) from the free descriptors
> b) set the idx of the desc into the entry of the avail ring
> c) set the addr/len fields of the descriptor to point to the guest blank
> mbuf data area
>
> Those operations take time, and step b in particular leaves the cache line
> for the avail ring in the modified (M) state in the virtio processing
> core. When vhost processes the avail ring, the cache line transfer from
> the virtio processing core to the vhost processing core takes many CPU
> cycles.
> To solve this problem, this is the arrangement of the RX ring for the DPDK
> pmd (for the non-mergeable case).
>
>                       avail
>                        idx
>                       +
>                       |
> +-----+-----+-----+-------+-----+-----+
> |  0  |  1  |  2  |  ...  | 254 | 255 |  avail ring
> +--+--+--+--+--+--+---+---+--+--+--+--+
>    |     |     |      |      |     |
>    |     |     |      |      |     |
>    v     v     v      |      v     v
> +--+--+--+--+--+--+---+---+--+--+--+--+
> |  0  |  1  |  2  |  ...  | 254 | 255 |  desc ring
> +-----+-----+-----+-------+-----+-----+
>                       |
>                       |
> +-----+-----+-----+-------+-----+-----+
> |  0  |  1  |  2  |       | 254 | 255 |  used ring
> +-----+-----+-----+-------+-----+-----+
>                       |
>                       +
> Avail ring is initialized with fixed descriptors and is never changed,
> i.e., the index value of the nth avail ring entry is always n, which
> means the virtio PMD is actually refilling the desc ring only, without
> having to change the avail ring.
> When vhost fetches the avail ring, if not evicted, it is always in its
> first level cache.
>
> When RX receives packets from the used ring, we use the used->idx as the
> desc idx. This requires that vhost processes and returns descs from the
> avail ring to the used ring in order, which is true for both the current
> dpdk vhost and kernel vhost implementations. In my understanding, there is
> no necessity for vhost net to process descriptors OOO. One case could be
> zero copy: for example, if one descriptor doesn't meet the zero copy
> requirement, we could directly return it to the used ring, earlier than the
> descriptors in front of it.
> To enforce this, I want to use a reserved bit to indicate in-order
> processing of descriptors.
>
> For the tx ring, the arrangement is like below. Each transmitted mbuf needs
> a desc for virtio_net_hdr, so actually we have only 128 free slots.
> >
> >                         ++
> >                         ||
> >                         ||
> > +-----+-----+-----+-----++-----+-----+-----+-----+
> > |  0  |  1  | ... | 127 || 128 | 129 | ... | 255 |  avail ring
> > +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
> >    |     |           |  ||  |     |           |
> >    v     v           v  ||  v     v           v
> > +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
> > | 127 | 128 | ... | 255 || 127 | 128 | ... | 255 |  desc ring for virtio_net_hdr
> > +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
> >    |     |           |  ||  |     |           |
> >    v     v           v  ||  v     v           v
> > +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
> > |  0  |  1  | ... | 127 ||  0  |  1  | ... | 127 |  desc ring for tx data
> >
> /huawei
>
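The RX arrangement described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the structures are cut-down stand-ins rather than the real vring layout, and every name in the block (sketch_vring, fixed_avail_init, fixed_avail_refill) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256 /* matches the 256-entry rings in the chart */

/* Simplified stand-ins for the vring structures; not the real virtio layout. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
};

struct sketch_vring {
	uint16_t avail_idx;
	uint16_t avail_ring[RING_SIZE];
	struct sketch_desc desc[RING_SIZE];
};

/* One-time setup: entry n of the avail ring always names descriptor n. */
static void fixed_avail_init(struct sketch_vring *vr)
{
	uint16_t i;

	for (i = 0; i < RING_SIZE; i++)
		vr->avail_ring[i] = i;
	vr->avail_idx = 0;
}

/* Refill writes the descriptor and avail_idx only; avail_ring[] is never touched. */
static void fixed_avail_refill(struct sketch_vring *vr, uint64_t buf_addr,
			       uint32_t buf_len)
{
	uint16_t slot = vr->avail_idx % RING_SIZE;

	vr->desc[slot].addr = buf_addr;
	vr->desc[slot].len = buf_len;
	vr->avail_idx++;
}
```

Because the refill path never stores to avail_ring[], the cache lines holding it are not dirtied by the virtio core after setup, which is the effect the mail is after.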
[dpdk-dev] [PATCH v2 3/3] version: 2.2.0-rc0
Signed-off-by: Thomas Monjalon
---
 lib/librte_eal/common/include/rte_version.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_version.h b/lib/librte_eal/common/include/rte_version.h
index 29baa06..08cc87a 100644
--- a/lib/librte_eal/common/include/rte_version.h
+++ b/lib/librte_eal/common/include/rte_version.h
@@ -60,7 +60,7 @@ extern "C" {
 /**
  * Minor version number i.e. the y in x.y.z
  */
-#define RTE_VER_MINOR 1
+#define RTE_VER_MINOR 2
 
 /**
  * Patch level number i.e. the z in x.y.z
@@ -70,14 +70,14 @@ extern "C" {
 /**
  * Extra string to be appended to version number
  */
-#define RTE_VER_SUFFIX ""
+#define RTE_VER_SUFFIX "-rc"
 
 /**
  * Patch release number
  *   0-15 = release candidates
  *   16   = release
  */
-#define RTE_VER_PATCH_RELEASE 16
+#define RTE_VER_PATCH_RELEASE 0
 
 /**
  * Macro to compute a version number usable for comparisons
-- 
2.5.1
[dpdk-dev] [PATCH v2 2/3] hash: remove deprecated function and macros
From: Pablo de LaraThe function rte_jhash2() was renamed rte_jhash_32b and macros RTE_HASH_KEY_LENGTH_MAX and RTE_HASH_BUCKET_ENTRIES_MAX were tagged as deprecated, so they can be removed in 2.2. RTE_HASH_KEY_LENGTH is replaced in unit tests by an internal macro for the memory allocation of all keys used. The library version number is incremented. Signed-off-by: Pablo de Lara Signed-off-by: Thomas Monjalon --- app/test/test_hash.c | 7 --- app/test/test_hash_functions.c | 4 ++-- app/test/test_hash_perf.c| 2 +- doc/guides/rel_notes/deprecation.rst | 5 - doc/guides/rel_notes/release_2_2.rst | 5 - lib/librte_hash/Makefile | 2 +- lib/librte_hash/rte_hash.h | 6 -- lib/librte_hash/rte_jhash.h | 15 ++- 8 files changed, 14 insertions(+), 32 deletions(-) diff --git a/app/test/test_hash.c b/app/test/test_hash.c index 7f8c0d3..4f2509d 100644 --- a/app/test/test_hash.c +++ b/app/test/test_hash.c @@ -66,6 +66,7 @@ static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc}; static uint32_t hashtest_initvals[] = {0}; static uint32_t hashtest_key_lens[] = {0, 2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64}; +#define MAX_KEYSIZE 64 /**/ #define LOCAL_FBK_HASH_ENTRIES_MAX (1 << 15) @@ -238,7 +239,7 @@ test_crc32_hash_alg_equiv(void) static void run_hash_func_test(rte_hash_function f, uint32_t init_val, uint32_t key_len) { - static uint8_t key[RTE_HASH_KEY_LENGTH_MAX]; + static uint8_t key[MAX_KEYSIZE]; unsigned i; @@ -1100,7 +1101,7 @@ test_hash_creation_with_good_parameters(void) static int test_average_table_utilization(void) { struct rte_hash *handle; - uint8_t simple_key[RTE_HASH_KEY_LENGTH_MAX]; + uint8_t simple_key[MAX_KEYSIZE]; unsigned i, j; unsigned added_keys, average_keys_added = 0; int ret; @@ -1154,7 +1155,7 @@ static int test_hash_iteration(void) { struct rte_hash *handle; unsigned i; - uint8_t keys[NUM_ENTRIES][RTE_HASH_KEY_LENGTH_MAX]; + uint8_t keys[NUM_ENTRIES][MAX_KEYSIZE]; const void *next_key; void *next_data; void *data[NUM_ENTRIES]; diff 
--git a/app/test/test_hash_functions.c b/app/test/test_hash_functions.c index 8c7cf63..3ad6d80 100644 --- a/app/test/test_hash_functions.c +++ b/app/test/test_hash_functions.c @@ -85,7 +85,7 @@ static uint32_t hash_values_crc[2][10] = {{ * from the array entries is tested. */ #define HASHTEST_ITERATIONS 100 - +#define MAX_KEYSIZE 64 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc}; static uint32_t hashtest_initvals[] = {0, 0xdeadbeef}; static uint32_t hashtest_key_lens[] = { @@ -119,7 +119,7 @@ static void run_hash_func_perf_test(uint32_t key_len, uint32_t init_val, rte_hash_function f) { - static uint8_t key[HASHTEST_ITERATIONS][RTE_HASH_KEY_LENGTH_MAX]; + static uint8_t key[HASHTEST_ITERATIONS][MAX_KEYSIZE]; uint64_t ticks, start, end; unsigned i, j; diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c index a87fc80..9d53c14 100644 --- a/app/test/test_hash_perf.c +++ b/app/test/test_hash_perf.c @@ -140,7 +140,7 @@ shuffle_input_keys(unsigned table_index) { unsigned i; uint32_t swap_idx; - uint8_t temp_key[RTE_HASH_KEY_LENGTH_MAX]; + uint8_t temp_key[MAX_KEYSIZE]; hash_sig_t temp_signature; int32_t temp_position; diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 5f6079b..fffad80 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -13,11 +13,6 @@ Deprecation Notices There is no backward compatibility planned from release 2.2. All binaries will need to be rebuilt from release 2.2. -* The Macros RTE_HASH_BUCKET_ENTRIES_MAX and RTE_HASH_KEY_LENGTH_MAX are - deprecated and will be removed with version 2.2. - -* The function rte_jhash2 is deprecated and should be removed. 
- * The following fields have been deprecated in rte_eth_stats: imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss, tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst index abe57b4..682f468 100644 --- a/doc/guides/rel_notes/release_2_2.rst +++ b/doc/guides/rel_notes/release_2_2.rst @@ -21,6 +21,9 @@ API Changes * The deprecated ACL API ipv4vlan is removed. +* The deprecated hash function rte_jhash2() is removed. + It was replaced by rte_jhash_32b(). + * The deprecated KNI functions are removed: rte_kni_create(), rte_kni_get_port_id() and rte_kni_info_get(). @@ -58,7 +61,7 @@ The libraries prepended with a plus sign were incremented in this version.
[dpdk-dev] [PATCH v2 1/3] enic: use appropriate key length in hash table
From: Pablo de Lara

RTE_HASH_KEY_LENGTH_MAX was deprecated, and the hash table is actually
hosting keys bigger than that size, so the key length has been increased
to properly allocate all keys.

Signed-off-by: Pablo de Lara
Acked-by: Sujith Sankar
---
 drivers/net/enic/enic_clsf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/enic/enic_clsf.c b/drivers/net/enic/enic_clsf.c
index 9c2abfb..656b25b 100644
--- a/drivers/net/enic/enic_clsf.c
+++ b/drivers/net/enic/enic_clsf.c
@@ -214,7 +214,7 @@ int enic_fdir_add_fltr(struct enic *enic, struct rte_eth_fdir_filter *params)
 		enic->fdir.stats.add++;
 	}
 
-	pos = rte_hash_add_key(enic->fdir.hash, (void *)key);
+	pos = rte_hash_add_key(enic->fdir.hash, params);
 	enic->fdir.nodes[pos] = key;
 	return 0;
 }
@@ -244,7 +244,7 @@ int enic_clsf_init(struct enic *enic)
 	struct rte_hash_parameters hash_params = {
 		.name = "enicpmd_clsf_hash",
 		.entries = ENICPMD_CLSF_HASH_ENTRIES,
-		.key_len = RTE_HASH_KEY_LENGTH_MAX,
+		.key_len = sizeof(struct rte_eth_fdir_filter),
 		.hash_func = DEFAULT_HASH_FUNC,
 		.hash_func_init_val = 0,
 		.socket_id = SOCKET_0,
-- 
2.5.1
[dpdk-dev] [PATCH v2 0/3] clean deprecated code in hash library
This patchset removes all deprecated macros and functions
from the hash library.
Then the DPDK version can be changed to 2.2.0-rc0.

Changes in v2:
- increment hash library version
- merge hash patches
- increment DPDK version

Pablo de Lara (2):
  enic: use appropriate key length in hash table
  hash: remove deprecated function and macros

Thomas Monjalon (1):
  version: 2.2.0-rc0

 app/test/test_hash.c                        |  7 ---
 app/test/test_hash_functions.c              |  4 ++--
 app/test/test_hash_perf.c                   |  2 +-
 doc/guides/rel_notes/deprecation.rst        |  5 -
 doc/guides/rel_notes/release_2_2.rst        |  5 -
 drivers/net/enic/enic_clsf.c                |  4 ++--
 lib/librte_eal/common/include/rte_version.h |  6 +++---
 lib/librte_hash/Makefile                    |  2 +-
 lib/librte_hash/rte_hash.h                  |  6 --
 lib/librte_hash/rte_jhash.h                 | 15 ++-
 10 files changed, 19 insertions(+), 37 deletions(-)

-- 
2.5.1
[dpdk-dev] [PATCH 1/1] ip_frag: fix creating ipv6 fragment extension header
Hi Piotr,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Piotr
> Sent: Wednesday, September 02, 2015 3:13 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 1/1] ip_frag: fix creating ipv6 fragment extension
> header
>
> From: Piotr Azarewicz
>
> Previous implementation won't work on every environment. The order of
> allocation of bit-fields within a unit (high-order to low-order or
> low-order to high-order) is implementation-defined.
> Solution: used bytes instead of bit fields.

Seems like the right thing to do to me.
Though I think we should also replace:

union {
	struct {
		uint16_t frag_offset:13; /**< Offset from the start of the packet */
		uint16_t reserved2:2; /**< Reserved */
		uint16_t more_frags:1;
		/**< 1 if more fragments left, 0 if last fragment */
	};
	uint16_t frag_data;
	/**< union of all fragmentation data */
};

with just:

uint16_t frag_data;

and probably provide macros to read/set the frag_offset and more_frags values.
Otherwise people might keep using the wrong layout.
Konstantin

>
> Signed-off-by: Piotr Azarewicz
> ---
>  lib/librte_ip_frag/rte_ipv6_fragmentation.c |    6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> index 0e32aa8..7342421 100644
> --- a/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> +++ b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> @@ -65,10 +65,8 @@ __fill_ipv6hdr_frag(struct ipv6_hdr *dst,
>
>  	fh = (struct ipv6_extension_fragment *) ++dst;
>  	fh->next_header = src->proto;
> -	fh->reserved1 = 0;
> -	fh->frag_offset = rte_cpu_to_be_16(fofs);
> -	fh->reserved2 = 0;
> -	fh->more_frags = rte_cpu_to_be_16(mf);
> +	fh->reserved1 = 0;
> +	fh->frag_data = rte_cpu_to_be_16((fofs & ~IPV6_HDR_FO_MASK) | mf);
>  	fh->id = 0;
>  }
>
> --
> 1.7.9.5
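As an illustration of the accessors Konstantin suggests, here is a rough sketch that packs and unpacks a plain 16-bit frag_data field with shifts and masks, so the result does not depend on how a compiler lays out bit-fields. The names below are hypothetical (they are not DPDK macros), static functions are used in place of macros for type checking, and htons()/ntohs() stand in for rte_cpu_to_be_16()/rte_be_to_cpu_16(). The bit positions follow the IPv6 fragment header: a 13-bit offset in 8-byte units in the upper bits, two reserved bits, and the M flag in the lowest bit.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h> /* htons(), ntohs() */

/* Hypothetical names; bit positions follow the IPv6 fragment header. */
#define FRAG_OFFSET_SHIFT 3       /* the offset occupies the upper 13 bits */
#define FRAG_MORE_MASK    0x0001u /* the M flag is the lowest bit */

/* Build the on-wire field from an offset in 8-byte units and the M flag. */
static uint16_t frag_data_pack(uint16_t offset_units, int more_frags)
{
	uint16_t host = (uint16_t)((offset_units << FRAG_OFFSET_SHIFT) |
				   (more_frags ? FRAG_MORE_MASK : 0));

	return htons(host);
}

/* Extract the fragment offset, in 8-byte units, from the on-wire field. */
static uint16_t frag_data_offset(uint16_t frag_data)
{
	return (uint16_t)(ntohs(frag_data) >> FRAG_OFFSET_SHIFT);
}

/* Return 1 if more fragments follow, 0 for the last fragment. */
static int frag_data_more(uint16_t frag_data)
{
	return (ntohs(frag_data) & FRAG_MORE_MASK) != 0;
}
```

Note that the patch passes a byte offset masked with ~IPV6_HDR_FO_MASK, whereas this sketch takes the offset already divided by 8; for offsets that are multiples of 8 both should produce the same bits.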
[dpdk-dev] [RFC PATCH 01/18] librte_eal: remove type field from rte_driver structure.
2015-09-04 12:01, Bernard Iremonger:
> Signed-off-by: Bernard Iremonger

There is no explanation in this patch.

> -		if (driver->type != PMD_PDEV)
> -			continue;
> -		/* PDEV drivers don't get passed any parameters */
> -		driver->init(NULL, NULL);
> +
> +		/* PCI drivers don't get passed any parameters */
> +		/*
> +		 * Search a virtual driver prefix in device name.
> +		 * It should not be found for PCI devices.
> +		 * Use strncmp to compare.
> +		 */
> +
> +		if ((driver->name) &&
> +			(strncmp(driver->name, "eth_", strlen("eth_")) != 0)) {
> +			driver->init(NULL, NULL);
> +		}

You don't need to submit a full patchset with changes in every driver
for an RFC. Having just this patch is enough to form an opinion.

Here it is a nack.
We need to have a common init path instead of the current VDEV/PDEV branches.
And instead of "pmd_type", bus information would be more meaningful.
So just replacing a type with a magic string is worse.

Please don't try to fix the wrong problems, and focus on your goal.
We had some discussions about a possible PCI EAL refactoring, but it probably
needs to be done step by step, with a clear cleanup motivation at each step.
I think other people involved in EAL will have other ideas.
[dpdk-dev] [PATCH] app/testpmd: add engine for UDP echo server support
Adapt the ICMP echo code to reply to UDP echo requests on port 7. The testpmd forward engine udpecho is used for that. Signed-off-by: Thadeu Lima de Souza Cascardo --- app/test-pmd/config.c | 7 ++- app/test-pmd/icmpecho.c | 90 ++--- app/test-pmd/testpmd.c | 1 + app/test-pmd/testpmd.h | 1 + doc/guides/testpmd_app_ug/run_app.rst | 2 +- doc/guides/testpmd_app_ug/testpmd_funcs.rst | 6 +- 6 files changed, 79 insertions(+), 28 deletions(-) diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index cf2aa6e..0b5c4e6 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -1239,7 +1239,7 @@ dcb_fwd_config_setup(void) } static void -icmp_echo_config_setup(void) +echo_config_setup(void) { portid_t rxp; queueid_t rxq; @@ -1297,8 +1297,9 @@ void fwd_config_setup(void) { cur_fwd_config.fwd_eng = cur_fwd_eng; - if (strcmp(cur_fwd_eng->fwd_mode_name, "icmpecho") == 0) { - icmp_echo_config_setup(); + if (strcmp(cur_fwd_eng->fwd_mode_name, "icmpecho") == 0 || + strcmp(cur_fwd_eng->fwd_mode_name, "udpecho") == 0) { + echo_config_setup(); return; } if ((nb_rxq > 1) && (nb_txq > 1)){ diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c index e510f9b..a7f882a 100644 --- a/app/test-pmd/icmpecho.c +++ b/app/test-pmd/icmpecho.c @@ -61,6 +61,7 @@ #include #include #include +#include #include #include "testpmd.h" @@ -301,7 +302,7 @@ ipv4_hdr_cksum(struct ipv4_hdr *ip_h) * send back ICMP echo replies. 
*/ static void -reply_to_icmp_echo_rqsts(struct fwd_stream *fs) +reply_to_echo_rqsts(struct fwd_stream *fs, int proto) { struct rte_mbuf *pkts_burst[MAX_PKT_BURST]; struct rte_mbuf *pkt; @@ -310,6 +311,7 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs) struct arp_hdr *arp_h; struct ipv4_hdr *ip_h; struct icmp_hdr *icmp_h; + struct udp_hdr *udp_h; struct ether_addr eth_addr; uint32_t ip_addr; uint16_t nb_rx; @@ -319,6 +321,7 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs) uint16_t vlan_id; uint16_t arp_op; uint16_t arp_pro; + uint16_t udp_port; uint32_t cksum; uint8_t i; int l2_len; @@ -448,24 +451,40 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs) ip_proto_name(ip_h->next_proto_id)); } - /* -* Check if packet is a ICMP echo request. -*/ - icmp_h = (struct icmp_hdr *) ((char *)ip_h + - sizeof(struct ipv4_hdr)); - if (! ((ip_h->next_proto_id == IPPROTO_ICMP) && - (icmp_h->icmp_type == IP_ICMP_ECHO_REQUEST) && - (icmp_h->icmp_code == 0))) { - rte_pktmbuf_free(pkt); - continue; + if (proto == IPPROTO_ICMP) { + /* +* Check if packet is a ICMP echo request. +*/ + icmp_h = (struct icmp_hdr *) ((char *)ip_h + + sizeof(struct ipv4_hdr)); + if (! 
((ip_h->next_proto_id == IPPROTO_ICMP) && + (icmp_h->icmp_type == IP_ICMP_ECHO_REQUEST) && + (icmp_h->icmp_code == 0))) { + rte_pktmbuf_free(pkt); + continue; + } + } else if (proto == IPPROTO_UDP) { + udp_h = (struct udp_hdr *) ((char *)ip_h + + sizeof(struct ipv4_hdr)); + if ((ip_h->next_proto_id != IPPROTO_UDP) && + (rte_be_to_cpu_16(udp_h->dst_port) != 7)) { + rte_pktmbuf_free(pkt); + continue; + } } - if (verbose_level > 0) - printf(" ICMP: echo request seq id=%d\n", - rte_be_to_cpu_16(icmp_h->icmp_seq_nb)); + if (proto == IPPROTO_ICMP) { + if (verbose_level > 0) + printf(" ICMP: echo request seq id=%d\n", + rte_be_to_cpu_16(icmp_h->icmp_seq_nb)); + } else if (proto == IPPROTO_UDP) { + if (verbose_level > 0) + printf(" UDP: echo request from port=%d\n", + rte_be_to_cpu_16(udp_h->src_port)); + } /* -* Prepare ICMP echo reply to be sent back. +* Prepare ICMP or UDP
[dpdk-dev] [PATCH 4/4] virtio: use any layout on transmit
Virtio supports a feature that allows sender to put transmit header prepended to data. It requires that the mbuf be writeable, correct alignment, and the feature has been negotiatied. If all this works out, then it will be the optimum way to transmit a single segment packet. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_ethdev.h | 3 +- drivers/net/virtio/virtio_rxtx.c | 67 ++ 2 files changed, 49 insertions(+), 21 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 07a9265..f260fbb 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -65,7 +65,8 @@ 1u << VIRTIO_NET_F_CTRL_RX | \ 1u << VIRTIO_NET_F_CTRL_VLAN | \ 1u << VIRTIO_NET_F_MRG_RXBUF | \ -1u << VIRTIO_RING_F_INDIRECT_DESC) +1u << VIRTIO_RING_F_INDIRECT_DESC| \ +1u << VIRTIO_F_ANY_LAYOUT) /* * CQ function prototype diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 8979695..5ec9b29 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -200,13 +200,14 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie) static int virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, - int use_indirect) + int use_indirect, int can_push) { struct vq_desc_extra *dxp; struct vring_desc *start_dp; uint16_t seg_num = cookie->nb_segs; - uint16_t needed = use_indirect ? 1 : 1 + seg_num; + uint16_t needed = use_indirect ? 
1 : !can_push + seg_num; uint16_t head_idx, idx; + uint16_t head_size = txvq->hw->vtnet_hdr_size; unsigned long offs; if (unlikely(txvq->vq_free_cnt == 0)) @@ -236,27 +237,31 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, idx = 0; } - offs = offsetof(struct virtio_tx_region, tx_hdr) - + idx * sizeof(struct virtio_tx_region); + if (can_push) { + /* put on zero'd transmit header (no offloads) */ + void *hdr = rte_pktmbuf_prepend(cookie, head_size); - start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs; - start_dp[idx].len = txvq->hw->vtnet_hdr_size; - start_dp[idx].flags = VRING_DESC_F_NEXT; + memset(hdr, 0, head_size); + } else { + offs = offsetof(struct virtio_tx_region, tx_hdr) + + idx * sizeof(struct virtio_tx_region); - for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) { + start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs; + start_dp[idx].len = head_size; + start_dp[idx].flags = VRING_DESC_F_NEXT; idx = start_dp[idx].next; + } + + for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) { start_dp[idx].addr = RTE_MBUF_DATA_DMA_ADDR(cookie); start_dp[idx].len = cookie->data_len; - start_dp[idx].flags = VRING_DESC_F_NEXT; cookie = cookie->next; + start_dp[idx].flags = cookie ? 
VRING_DESC_F_NEXT : 0; + idx = start_dp[idx].next; } - start_dp[idx].flags &= ~VRING_DESC_F_NEXT; - if (use_indirect) idx = txvq->vq_ring.desc[head_idx].next; - else - idx = start_dp[idx].next; txvq->vq_desc_head_idx = idx; if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END) @@ -762,6 +767,26 @@ virtio_recv_mergeable_pkts(void *rx_queue, return nb_rx; } +/* Evaluate whether the virtio header can just be put in place in the mbuf */ +static int virtio_xmit_push_ok(const struct virtqueue *txvq, + const struct rte_mbuf *m) +{ + if (rte_mbuf_refcnt_read(m) != 1) + return 0; /* no mbuf is shared */ + + if (rte_pktmbuf_headroom(m) < txvq->hw->vtnet_hdr_size) + return 0; /* no space in headroom */ + + if (!rte_is_aligned(rte_pktmbuf_mtod(m, char *), + sizeof(struct virtio_net_hdr_mrg_rxbuf))) + return 0; /* not alligned */ + + if (m->nb_segs > 1) + return 0; /* better off using indirect */ + + return vtpci_with_feature(txvq->hw, VIRTIO_F_ANY_LAYOUT); +} + uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { @@ -781,14 +806,16 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) { struct rte_mbuf *txm = tx_pkts[nb_tx]; - int use_indirect, slots, need; - - use_indirect = vtpci_with_feature(txvq->hw, -
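The slot accounting in this patch can be summarised in a small sketch. It restates the `needed` expression from virtqueue_enqueue_xmit() as a standalone function; the function name tx_slots_needed is hypothetical and the block leaves out everything else the driver does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ring slots consumed by one packet, following the `needed` expression
 * in the patch:
 *  - indirect: one slot that points at a separate descriptor table
 *  - header pushed into the mbuf headroom: one slot per segment
 *  - otherwise: one extra slot for the virtio net header
 */
static uint16_t tx_slots_needed(uint16_t nb_segs, int use_indirect, int can_push)
{
	if (use_indirect)
		return 1;
	return (uint16_t)(!can_push + nb_segs);
}
```

For a single-segment packet the pushed header brings the cost down from two slots to one, which is the case the commit message calls the optimum.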
[dpdk-dev] [PATCH 3/4] virtio: use indirect ring elements
The virtio ring in QEMU/KVM is usually limited to 256 entries and the normal way that virtio driver was queuing mbufs required nsegs + 1 ring elements. By using the indirect ring element feature if available, each packet will take only one ring slot even for multi-segment packets. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_ethdev.c | 11 +--- drivers/net/virtio/virtio_ethdev.h | 3 ++- drivers/net/virtio/virtio_rxtx.c | 51 ++ drivers/net/virtio/virtqueue.h | 8 ++ 4 files changed, 57 insertions(+), 16 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 465d3cd..bcfb87b 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -359,12 +359,15 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, if (queue_type == VTNET_TQ) { /* * For each xmit packet, allocate a virtio_net_hdr +* and indirect ring elements */ snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone", - dev->data->port_id, queue_idx); - vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name, - vq_size * hw->vtnet_hdr_size, - socket_id, 0, RTE_CACHE_LINE_SIZE); +dev->data->port_id, queue_idx); + + vq->virtio_net_hdr_mz = + rte_memzone_reserve_aligned(vq_name, + vq_size * sizeof(struct virtio_tx_region), + socket_id, 0, RTE_CACHE_LINE_SIZE); if (vq->virtio_net_hdr_mz == NULL) { if (rte_errno == EEXIST) vq->virtio_net_hdr_mz = diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 9026d42..07a9265 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -64,7 +64,8 @@ 1u << VIRTIO_NET_F_CTRL_VQ | \ 1u << VIRTIO_NET_F_CTRL_RX | \ 1u << VIRTIO_NET_F_CTRL_VLAN | \ -1u << VIRTIO_NET_F_MRG_RXBUF) +1u << VIRTIO_NET_F_MRG_RXBUF | \ +1u << VIRTIO_RING_F_INDIRECT_DESC) /* * CQ function prototype diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index dbe6665..8979695 100644 --- 
a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -199,14 +199,15 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie) } static int -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie) +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie, + int use_indirect) { struct vq_desc_extra *dxp; struct vring_desc *start_dp; uint16_t seg_num = cookie->nb_segs; - uint16_t needed = 1 + seg_num; + uint16_t needed = use_indirect ? 1 : 1 + seg_num; uint16_t head_idx, idx; - uint16_t head_size = txvq->hw->vtnet_hdr_size; + unsigned long offs; if (unlikely(txvq->vq_free_cnt == 0)) return -ENOSPC; @@ -220,11 +221,26 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie) dxp = >vq_descx[idx]; dxp->cookie = (void *)cookie; dxp->ndescs = needed; - start_dp = txvq->vq_ring.desc; - start_dp[idx].addr = - txvq->virtio_net_hdr_mem + idx * head_size; - start_dp[idx].len = (uint32_t)head_size; + + if (use_indirect) { + offs = offsetof(struct virtio_tx_region, tx_indir) + + idx * sizeof(struct virtio_tx_region); + + start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs; + start_dp[idx].len = sizeof(struct vring_desc); + start_dp[idx].flags = VRING_DESC_F_INDIRECT; + + start_dp = (struct vring_desc *) + ((char *)txvq->virtio_net_hdr_mz->addr + offs); + idx = 0; + } + + offs = offsetof(struct virtio_tx_region, tx_hdr) + + idx * sizeof(struct virtio_tx_region); + + start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs; + start_dp[idx].len = txvq->hw->vtnet_hdr_size; start_dp[idx].flags = VRING_DESC_F_NEXT; for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) { @@ -236,7 +252,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie) } start_dp[idx].flags &= ~VRING_DESC_F_NEXT; - idx = start_dp[idx].next; + + if (use_indirect) + idx = txvq->vq_ring.desc[head_idx].next; + else + idx = start_dp[idx].next; + txvq->vq_desc_head_idx = idx; if (txvq->vq_desc_head_idx == 
VQ_RING_DESC_CHAIN_END)
[dpdk-dev] [PATCH 2/4] virtio: don't use unlikely for normal tx stuff
Don't use unlikely() for VLAN or ring getting full. GCC will not optimize code in unlikely paths and since these can happen with normal code that can hurt performance. Signed-off-by: Stephen Hemminger --- drivers/net/virtio/virtio_rxtx.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 5b50ed0..dbe6665 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -763,7 +763,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) int need = txm->nb_segs - txvq->vq_free_cnt + 1; /* Positive value indicates it need free vring descriptors */ - if (unlikely(need > 0)) { + if (need > 0) { nb_used = VIRTQUEUE_NUSED(txvq); virtio_rmb(); need = RTE_MIN(need, (int)nb_used); @@ -778,7 +778,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) } /* Do VLAN tag insertion */ - if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) { + if (txm->ol_flags & PKT_TX_VLAN_PKT) { error = rte_vlan_insert(); if (unlikely(error)) { rte_pktmbuf_free(txm); @@ -798,10 +798,9 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) break; } txvq->bytes += txm->pkt_len; + ++txvq->packets; } - txvq->packets += nb_tx; - if (likely(nb_tx)) { vq_update_avail_idx(txvq); -- 2.1.4
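For context, here is a minimal sketch of how such branch hints are commonly built and where they still seem reasonable. It assumes a GCC or Clang compiler providing __builtin_expect; the macro names, the flag value and both functions are hypothetical and are not taken from DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for likely()/unlikely(), built on __builtin_expect. */
#define sketch_likely(x)   __builtin_expect(!!(x), 1)
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

#define SKETCH_TX_VLAN 0x1u /* hypothetical flag bit, not the real PKT_TX_VLAN_PKT */

/* Count packets needing VLAN insertion; no hint, since VLAN traffic is normal. */
static unsigned count_vlan(const uint32_t *ol_flags, unsigned n)
{
	unsigned i, vlan = 0;

	for (i = 0; i < n; i++) {
		if (ol_flags[i] & SKETCH_TX_VLAN)
			vlan++;
	}
	return vlan;
}

/* A genuine error path is still a reasonable place for the hint. */
static int check_len(uint32_t len)
{
	if (sketch_unlikely(len == 0))
		return -1;
	return 0;
}
```

The hint mainly affects code layout and static branch prediction, so it tends to pay off only when the condition really is rare on the workload in question.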
[dpdk-dev] [PATCH 1/4] virtio: clean up space checks on xmit
The space check for the transmit ring only needs a single conditional,
i.e. we only need to recheck for space if there was no space in the
first check. This can help performance and simplifies the loop.

Signed-off-by: Stephen Hemminger
---
 drivers/net/virtio/virtio_rxtx.c | 66
 1 file changed, 27 insertions(+), 39 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index c5b53bb..5b50ed0 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -745,7 +745,6 @@ uint16_t
 virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct virtqueue *txvq = tx_queue;
-	struct rte_mbuf *txm;
 	uint16_t nb_used, nb_tx;
 	int error;

@@ -759,57 +758,46 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	if (likely(nb_used > txvq->vq_nentries - txvq->vq_free_thresh))
 		virtio_xmit_cleanup(txvq, nb_used);

-	nb_tx = 0;
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		struct rte_mbuf *txm = tx_pkts[nb_tx];
+		int need = txm->nb_segs - txvq->vq_free_cnt + 1;

-	while (nb_tx < nb_pkts) {
-		/* Need one more descriptor for virtio header. */
-		int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
-
-		/*Positive value indicates it need free vring descriptors */
+		/* Positive value indicates it need free vring descriptors */
 		if (unlikely(need > 0)) {
 			nb_used = VIRTQUEUE_NUSED(txvq);
 			virtio_rmb();
 			need = RTE_MIN(need, (int)nb_used);

 			virtio_xmit_cleanup(txvq, need);
-			need = (int)tx_pkts[nb_tx]->nb_segs -
-				txvq->vq_free_cnt + 1;
-		}
-
-		/*
-		 * Zero or negative value indicates it has enough free
-		 * descriptors to use for transmitting.
-		 */
-		if (likely(need <= 0)) {
-			txm = tx_pkts[nb_tx];
-
-			/* Do VLAN tag insertion */
-			if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
-				error = rte_vlan_insert(&txm);
-				if (unlikely(error)) {
-					rte_pktmbuf_free(txm);
-					++nb_tx;
-					continue;
-				}
+			need = txm->nb_segs - txvq->vq_free_cnt + 1;
+			if (unlikely(need > 0)) {
+				PMD_TX_LOG(ERR,
+					   "No free tx descriptors to transmit");
+				break;
 			}
+		}

-			/* Enqueue Packet buffers */
-			error = virtqueue_enqueue_xmit(txvq, txm);
+		/* Do VLAN tag insertion */
+		if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
+			error = rte_vlan_insert(&txm);
 			if (unlikely(error)) {
-				if (error == ENOSPC)
-					PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
-				else if (error == EMSGSIZE)
-					PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
-				else
-					PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
-				break;
+				rte_pktmbuf_free(txm);
+				continue;
 			}
-			nb_tx++;
-			txvq->bytes += txm->pkt_len;
-		} else {
-			PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
+		}
+
+		/* Enqueue Packet buffers */
+		error = virtqueue_enqueue_xmit(txvq, txm);
+		if (unlikely(error)) {
+			if (error == ENOSPC)
+				PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
+			else if (error == EMSGSIZE)
+				PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
+			else
+				PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
 			break;
 		}
+		txvq->bytes += txm->pkt_len;
 	}

 	txvq->packets += nb_tx;
-- 
2.1.4
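The control flow the patch aims for can be modelled without any DPDK types. The sketch below is an assumption-laden toy: `struct demo_ring`, `demo_cleanup` and `demo_xmit` are hypothetical names, and "reclaimable" stands in for descriptors the device has already finished with.

```c
#include <stdint.h>

/* Toy ring state: free slots now, and slots that a cleanup could reclaim. */
struct demo_ring {
	int free_cnt;
	int reclaimable;
};

/* Reclaim n used descriptors (caller guarantees n <= reclaimable). */
static void
demo_cleanup(struct demo_ring *r, int n)
{
	r->free_cnt += n;
	r->reclaimable -= n;
}

/* Returns how many packets were queued; stops at the first that cannot fit. */
static uint16_t
demo_xmit(struct demo_ring *r, const uint16_t *nb_segs, uint16_t nb_pkts)
{
	uint16_t nb_tx;

	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
		/* One extra descriptor is needed for the header. */
		int need = nb_segs[nb_tx] - r->free_cnt + 1;

		if (need > 0) {
			int n = need < r->reclaimable ? need : r->reclaimable;

			demo_cleanup(r, n);
			/* Recheck only on this path, after reclaiming. */
			need = nb_segs[nb_tx] - r->free_cnt + 1;
			if (need > 0)
				break;
		}
		r->free_cnt -= nb_segs[nb_tx] + 1;
	}
	return nb_tx;
}
```

The second computation of `need` runs only when the first check failed, which is the "single conditional" the commit message refers to.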
[dpdk-dev] [PATCH 0/4] RFC virtio performance enhancement and cleanups
These patches are compile-tested only; I haven't debugged them or checked
the corner cases. Submitted for discussion and future planning.

Stephen Hemminger (4):
  virtio: clean up space checks on xmit
  virtio: don't use unlikely for normal tx stuff
  virtio: use indirect ring elements
  virtio: use any layout on transmit

 drivers/net/virtio/virtio_ethdev.c |  11 ++-
 drivers/net/virtio/virtio_ethdev.h |   4 +-
 drivers/net/virtio/virtio_rxtx.c   | 151 -
 drivers/net/virtio/virtqueue.h     |   8 ++
 4 files changed, 115 insertions(+), 59 deletions(-)

-- 
2.1.4
[dpdk-dev] [RFC PATCH 00/18] refactor eal driver registration code
On Fri, Sep 04, 2015 at 01:46:11PM +0100, Iremonger, Bernard wrote:
> Hi Bruce,
>
> > Subject: Re: [dpdk-dev] [RFC PATCH 00/18] refactor eal driver registration
> > code
> >
> > On Fri, Sep 04, 2015 at 12:01:36PM +0100, Bernard Iremonger wrote:
> > > At present the eal driver registration code is more complicated than
> > > it needs to be.
> > >
> > > This RFC proposes to simplify the eal driver registration code.
> > >
> > > Remove the type field from the eal driver structure.
> > > Refactor the eal driver registration code to use the name field in the
> > > eal driver structure instead of the type field.
> > >
> > > Modify all PMD's to use the modified eal driver structure.
> > > Initialise the name field in the eal driver structure in some PMD's
> > > where it is not initialised at present.
> > >
> >
> > Hi,
> >
> > I don't think I like this approach very much. It seems very brittle to
> > remove the explicit type field and start relying on the drivers putting
> > a prefix in the name instead, i.e. implicit typing.
> >
> > What is the major concern with marking drivers as virtual or physical? My
> > thinking is that we should keep the type field, just perhaps change PDEV to
> > be more descriptive in identifying the type of physical device, e.g.
> > DEV_PCI.
> >
> > Regards,
> > /Bruce
>
> The eth_ prefix is already required for vdev's, for example:
> testpmd -c f -n 4 --vdev='eth_pcap0,iface=eth0'
> testpmd -c f -n 4 --vdev=eth_ring0
>
> The eth_ prefix should not be used for pdev's.
>
> Keeping the type field and name field is duplicating information
>
> Regards,
>
> Bernard.

Hi Bernard,

It's duplicating information until such a time as we decide to relax the
restriction on having vdev's starting with "eth" or we want to have a driver
for a physical nic starting with "eth". :-)

Overall, I'm not seeing the need for this particular patchset right now. I
see your previous patchset - removing the need for a pci_dev structure on
vdevs - as being the more important change for cleaning up our code.

Regards,
/Bruce
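To make the "brittle" concern concrete, here is a small sketch comparing the two ways of classifying a driver. Everything in it is hypothetical: `struct demo_driver`, the `DEMO_DEV_*` values (loosely following the DEV_PCI naming floated above) and the driver name "eth_fastnic" are invented for illustration only.

```c
#include <string.h>

/* Explicit typing, with a more descriptive name for the physical case. */
enum demo_dev_type {
	DEMO_DEV_VIRTUAL,
	DEMO_DEV_PCI
};

struct demo_driver {
	const char *name;
	enum demo_dev_type type;
};

/* Implicit typing: anything whose name starts with "eth_" is a vdev. */
static int
demo_is_vdev_by_prefix(const struct demo_driver *d)
{
	return d->name != NULL && strncmp(d->name, "eth_", strlen("eth_")) == 0;
}

/* Explicit typing: the driver states what it is. */
static int
demo_is_vdev_by_type(const struct demo_driver *d)
{
	return d->type == DEMO_DEV_VIRTUAL;
}
```

A PCI driver registered as `{ "eth_fastnic", DEMO_DEV_PCI }` is treated as virtual by the prefix rule and as physical by the type field, and a driver with a NULL name is classified only by the latter.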
[dpdk-dev] PMD/l3fwd issue
> -----Original Message-----
> From: Harish Patil [mailto:harish.patil at qlogic.com]
> Sent: Friday, September 04, 2015 2:08 PM
> To: Ananyev, Konstantin; dev at dpdk.org
> Cc: Ameen Rahman
> Subject: Re: PMD/l3fwd issue
>
> Hi Konstantin,
>
> >Hi Patil,
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Harish Patil
> >> Sent: Thursday, September 03, 2015 4:53 PM
> >> To: dev at dpdk.org
> >> Subject: [dpdk-dev] PMD/l3fwd issue
> >>
> >> Hello,
> >> Have a question regarding l3fwd application. The l3fwd application
> >> expects the poll mode driver to return packets whose L2 header is
> >> 16-byte aligned.
> >
> >Yep, and as I remember, by default PMD returns to the upper layer mbufs
> >with data offsets aligned to cache line size (64B).
> >Unless you'll change RTE_PKTMBUF_HEADROOM config parameter.
> >
> >> Otherwise, it results in a crash. This is due to use of _mm_load_si128()
> >> and _mm_store_si128() intrinsics which expect the address to be 16-byte
> >> aligned. However, most real protocol stacks expect packets such that
> >> the IP header is aligned on a 16-byte boundary (not L2). It's not just
> >> for IP but any L3 for that matter. That's why we usually see
> >> skb_reserve(skb, NET_IP_ALIGN) calls in linux drivers.
> >
> >Well, l3fwd is just an example application to demonstrate usage of DPDK
> >API and the max performance it could get for that type of workload.
> >No-one forces you to use aligned load/store in your own application.
>
> Yes, I agree if it's our private application. But l3fwd is widely used
> as a benchmarking/testing tool, and users may run into this issue.
>

If someone would try to run it with RTE_PKTMBUF_HEADROOM non-aligned on 16B,
then probably yes.

> >
> >>
> >> So I'm looking for suggestions here, whether l3fwd application or poll
> >> mode driver should be changed to fix that? What is the right thing to do?
> >> Can a check be added in l3fwd to use _mm_loadu_si128/_mm_storeu_si128
> >> instructions instead of _mm_load_si128/_mm_store_si128 if address is
> >> found not to be 16B aligned?
> >
> >I'd personally just change l3fwd to use
> >_mm_loadu_si128/_mm_storeu_si128 unconditionally.
> >As by default address is 16B aligned anyway, I think that using MOVDQU
> >instead of MOVDQA here shouldn't make that big difference.
> >But of course testing needs to be done to make sure there is no
> >performance drop with that change.
>
> I too would just change l3fwd application so that all poll mode drivers
> would just work. Are you proposing that we upstream l3fwd change if we
> don't see performance drop?

Yep, I'd suggest to verify there is no performance difference and submit a
patch.
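For reference, a minimal sketch of the unconditional unaligned variant discussed in this thread. It is not the l3fwd code: `demo_copy16` and `demo_is_aligned16` are hypothetical helpers, and the `memcpy` fallback is an added assumption so the snippet also builds on targets without SSE2.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy 16 bytes without assuming any alignment of src or dst. */
static inline void
demo_copy16(void *dst, const void *src)
{
#if defined(__SSE2__)
	/* Unaligned load/store; the aligned forms fault on a bad address. */
	__m128i v = _mm_loadu_si128((const __m128i *)src);

	_mm_storeu_si128((__m128i *)dst, v);
#else
	memcpy(dst, src, 16); /* portable fallback, an assumption of this sketch */
#endif
}

/* Nonzero if p sits on a 16-byte boundary. */
static inline int
demo_is_aligned16(const void *p)
{
	return ((uintptr_t)p & 15) == 0;
}
```

Using the unaligned form everywhere removes the need for a per-packet check such as `demo_is_aligned16()`; whether it costs anything on already-aligned addresses is exactly what the thread asks to measure before submitting a patch.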
[dpdk-dev] PMD/l3fwd issue
Hi Konstantin,

>Hi Patil,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Harish Patil
>> Sent: Thursday, September 03, 2015 4:53 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] PMD/l3fwd issue
>>
>> Hello,
>> Have a question regarding l3fwd application. The l3fwd application
>> expects the poll mode driver to return packets whose L2 header is
>> 16-byte aligned.
>
>Yep, and as I remember, by default PMD returns to the upper layer mbufs
>with data offsets aligned to cache line size (64B).
>Unless you'll change RTE_PKTMBUF_HEADROOM config parameter.
>
>> Otherwise, it results in a crash. This is due to use of _mm_load_si128()
>> and _mm_store_si128() intrinsics which expect the address to be 16-byte
>> aligned. However, most real protocol stacks expect packets such that
>> the IP header is aligned on a 16-byte boundary (not L2). It's not just
>> for IP but any L3 for that matter. That's why we usually see
>> skb_reserve(skb, NET_IP_ALIGN) calls in linux drivers.
>
>Well, l3fwd is just an example application to demonstrate usage of DPDK
>API and the max performance it could get for that type of workload.
>No-one forces you to use aligned load/store in your own application.

Yes, I agree if it's our private application. But l3fwd is widely used
as a benchmarking/testing tool, and users may run into this issue.

>
>>
>> So I'm looking for suggestions here, whether l3fwd application or poll
>> mode driver should be changed to fix that? What is the right thing to do?
>> Can a check be added in l3fwd to use _mm_loadu_si128/_mm_storeu_si128
>> instructions instead of _mm_load_si128/_mm_store_si128 if address is
>> found not to be 16B aligned?
>
>I'd personally just change l3fwd to use
>_mm_loadu_si128/_mm_storeu_si128 unconditionally.
>As by default address is 16B aligned anyway, I think that using MOVDQU
>instead of MOVDQA here shouldn't make that big difference.
>But of course testing needs to be done to make sure there is no
>performance drop with that change.

I too would just change l3fwd application so that all poll mode drivers
would just work. Are you proposing that we upstream l3fwd change if we
don't see performance drop?

>Konstantin
>
>>
>> Thanks,
>> Harish

This message and any attached documents contain information from the
sending company or its parent company(s), subsidiaries, divisions or branch
offices that may be confidential. If you are not the intended recipient,
you may not read, copy, distribute, or use this information. If you have
received this transmission in error, please notify the sender immediately
by reply e-mail and then delete this message.
[dpdk-dev] [RFC PATCH 00/18] refactor eal driver registration code
Hi Bruce,

> Subject: Re: [dpdk-dev] [RFC PATCH 00/18] refactor eal driver registration
> code
>
> On Fri, Sep 04, 2015 at 12:01:36PM +0100, Bernard Iremonger wrote:
> > At present the eal driver registration code is more complicated than
> > it needs to be.
> >
> > This RFC proposes to simplify the eal driver registration code.
> >
> > Remove the type field from the eal driver structure.
> > Refactor the eal driver registration code to use the name field in the
> > eal driver structure instead of the type field.
> >
> > Modify all PMD's to use the modified eal driver structure.
> > Initialise the name field in the eal driver structure in some PMD's
> > where it is not initialised at present.
> >
>
> Hi,
>
> I don't think I like this approach very much. It seems very brittle to
> remove the explicit type field and start relying on the drivers putting
> a prefix in the name instead, i.e. implicit typing.
>
> What is the major concern with marking drivers as virtual or physical? My
> thinking is that we should keep the type field, just perhaps change PDEV to
> be more descriptive in identifying the type of physical device, e.g. DEV_PCI.
>
> Regards,
> /Bruce

The eth_ prefix is already required for vdev's, for example:
testpmd -c f -n 4 --vdev='eth_pcap0,iface=eth0'
testpmd -c f -n 4 --vdev=eth_ring0

The eth_ prefix should not be used for pdev's.

Keeping the type field and name field is duplicating information

Regards,

Bernard.
[dpdk-dev] PMD/l3fwd issue
Hi Patil,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Harish Patil
> Sent: Thursday, September 03, 2015 4:53 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] PMD/l3fwd issue
>
> Hello,
> Have a question regarding l3fwd application. The l3fwd application expects
> the poll mode driver to return packets whose L2 header is 16-byte aligned.

Yep, and as I remember, by default PMD returns to the upper layer mbufs
with data offsets aligned to cache line size (64B).
Unless you'll change RTE_PKTMBUF_HEADROOM config parameter.

> Otherwise, it results in a crash. This is due to use of _mm_load_si128()
> and _mm_store_si128() intrinsics which expect the address to be 16-byte
> aligned. However, most real protocol stacks expect packets such that
> the IP header is aligned on a 16-byte boundary (not L2). It's not just
> for IP but any L3 for that matter. That's why we usually see
> skb_reserve(skb, NET_IP_ALIGN) calls in linux drivers.

Well, l3fwd is just an example application to demonstrate usage of DPDK API
and the max performance it could get for that type of workload.
No-one forces you to use aligned load/store in your own application.

>
> So I'm looking for suggestions here, whether l3fwd application or poll mode
> driver should be changed to fix that? What is the right thing to do?
> Can a check be added in l3fwd to use _mm_loadu_si128/_mm_storeu_si128
> instructions instead of _mm_load_si128/_mm_store_si128 if address is found
> not to be 16B aligned?

I'd personally just change l3fwd to use
_mm_loadu_si128/_mm_storeu_si128 unconditionally.
As by default address is 16B aligned anyway, I think that using MOVDQU
instead of MOVDQA here shouldn't make that big difference.
But of course testing needs to be done to make sure there is no
performance drop with that change.

Konstantin

>
> Thanks,
> Harish
[dpdk-dev] testpmd - configuration of the fdir filter
Hi,

I want to use the fdir filtering on a NIC based on the Intel 82599. I have
tested the testpmd application. I configured masks and added a filter, but
the fdir filter never matched any packet. I even tried different masks and
filters (with/without ports, TCP/UDP flow, IP prefixes, ...), but it never
worked.

Here is an example of commands I used during testing:

./testpmd -c 0xff -n 2 -- -i --rxq=2 --txq=2 --pkt-filter-mode=perfect --portmask=0x3 --nb-ports=2 --disable-rss

testpmd> port stop 0
testpmd> flow_director_mask 0 vlan 0x src_mask 255.255.255.255 ::::::: 0x dst_mask 255.255.255.255 ::::::: 0x
testpmd> flow_director_flex_mask 0 flow all (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
testpmd> port start 0
testpmd> flow_director_filter 0 add flow ipv4-tcp src 1.0.0.1 1 dst 2.0.0.1 1 vlan 0x0 flexbytes (0x00,0x00) fwd queue 1 fd_id 1
testpmd> start

Then I sent generated traffic (simple packets with Ethernet/IP/TCP headers)
with the parameters I specified in the flow_director_filter to NIC port 0,
and all packets arrived at queue 0.

Could you please advise me what I am doing wrong? Maybe there is some other
configuration I didn't notice?

Regards,
Jan
[dpdk-dev] ixgbe: account more Rx errors Issue
> From: Andriy Berestovskyy [mailto:aber at semihalf.com]
> Sent: Friday, September 4, 2015 10:38 AM
> To: Tahhan, Maryam; dev at dpdk.org
> Subject: ixgbe: account more Rx errors Issue
>
> Hi,
> Updating to DPDK 2.1 I noticed an issue with the ixgbe stats.
>
> In commit f6bf669b9900 "ixgbe: account more Rx errors" we add XEC
> hardware counter (l3_l4_xsum_error) to the ierrors now. The issue is the
> UDP packets with zero check sum are counted in XEC and now in ierrors too.
>
> I've tried to disable hw_ip_checksum in rxmode, but it didn't help.
>
> I'm not sure we should add XEC to ierrors, because packets counted in XEC
> are not dropped by the NIC actually. So in my case ierrors counter is now
> greater than actual number of packets received by the NIC, which makes no
> sense.
>
> What's your opinion?

Hi Andriy,

Thanks for flagging this. I'm aware of this phenomenon; unfortunately it
means we are hitting 2 hw registers on the NIC.

XEC counts the number of receive IPv4, TCP, UDP or SCTP XSUM errors.

The general CRC error register counts the number of receive packets with
CRC errors. In order for a packet to be counted in this register, it must
be 64 bytes or greater (from through , inclusively) in length. This
register counts all packets received, regardless of L2 filtering and
receive enablement.

So our options are:
1. Add only one of these into the error stats.
2. Introduce some cooking of stats in this scenario, so only add either/or
   if they are equal or one is higher than the other.
3. Add them all, which means you can have more errors than the number of
   received packets, but TBH this is going to be the case if your packets
   have multiple errors anyway.

I'm happy to go with either 1, 2 or 3 but would like some more feedback
from the community on this front.

Regards
Maryam

> Regards,
> Andriy
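The three options can be put side by side in a few lines. This is only a sketch under assumptions: the names are hypothetical, the CRC counter is arbitrarily picked as the "one" for option 1, and "report the larger counter" is just one possible reading of option 2, which the mail does not spell out.

```c
#include <stdint.h>

enum demo_err_policy {
	DEMO_ERR_CRC_ONLY, /* option 1: report a single counter */
	DEMO_ERR_LARGER,   /* one reading of option 2: report the larger one */
	DEMO_ERR_SUM       /* option 3: add them all */
};

/* Combine the CRC error count and the checksum (XEC) error count. */
static uint64_t
demo_ierrors(uint64_t crc_errs, uint64_t xsum_errs, enum demo_err_policy p)
{
	switch (p) {
	case DEMO_ERR_CRC_ONLY:
		return crc_errs;
	case DEMO_ERR_LARGER:
		return crc_errs > xsum_errs ? crc_errs : xsum_errs;
	case DEMO_ERR_SUM:
	default:
		return crc_errs + xsum_errs;
	}
}
```

With 3 CRC errors and 10 checksum errors the three policies report 3, 10 and 13 respectively. Only the last can exceed the number of packets actually received, since a packet may be counted in both registers.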
[dpdk-dev] [PATCH v1] change hugepage sorting to avoid overlapping memcpy
With only one hugepage or already sorted hugepage addresses, the sort
function called memcpy with the same src and dst pointer. Debugging with
valgrind will issue a warning about the overlapping area. This patch changes
the bubble sort to avoid this behavior. Also, the function cannot fail
any longer.

Signed-off-by: Ralf Hoffmann
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ac2745e..6d01f61 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -699,25 +699,25 @@ error:
  * higher address first on powerpc). We use a slow algorithm, but we won't
  * have millions of pages, and this is only done at init time.
  */
-static int
+static void
 sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
 {
 	unsigned i, j;
-	int compare_idx;
+	unsigned compare_idx;
 	uint64_t compare_addr;
 	struct hugepage_file tmp;

 	for (i = 0; i < hpi->num_pages[0]; i++) {
-		compare_addr = 0;
-		compare_idx = -1;
+		compare_addr = hugepg_tbl[i].physaddr;
+		compare_idx = i;

 		/*
-		 * browse all entries starting at 'i', and find the
+		 * browse all entries starting at 'i+1', and find the
 		 * entry with the smallest addr
 		 */
-		for (j=i; j< hpi->num_pages[0]; j++) {
+		for (j=i + 1; j < hpi->num_pages[0]; j++) {

-			if (compare_addr == 0 ||
+			if (
 #ifdef RTE_ARCH_PPC_64
 				hugepg_tbl[j].physaddr > compare_addr) {
 #else
@@ -728,10 +728,9 @@ sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
 			}
 		}

-		/* should not happen */
-		if (compare_idx == -1) {
-			RTE_LOG(ERR, EAL, "%s(): error in physaddr sorting\n", __func__);
-			return -1;
+		if (compare_idx == i) {
+			/* no smaller page found */
+			continue;
 		}

 		/* swap the 2 entries in the table */
@@ -741,7 +740,8 @@ sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
 			sizeof(struct hugepage_file));
 		memcpy(&hugepg_tbl[i], &tmp, sizeof(struct hugepage_file));
 	}
-	return 0;
+
+	return;
 }

 /*
@@ -1164,8 +1164,7 @@ rte_eal_hugepage_init(void)
 			goto fail;
 		}

-		if (sort_by_physaddr(&tmp_hp[hp_offset], hpi) < 0)
-			goto fail;
+		sort_by_physaddr(&tmp_hp[hp_offset], hpi);

 #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
 		/* remap all hugepages into single file segments */
-- 
2.1.4
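The idea of the patch can be reproduced on a stripped-down table. The sketch below is not the EAL code: `struct demo_page` and `demo_sort_by_physaddr` are hypothetical, only the ascending (non-PowerPC) order is shown, and the loop is written as the selection sort it effectively is.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a hugepage table entry. */
struct demo_page {
	uint64_t physaddr;
	int id;
};

/* Sort ascending by physaddr; never copies an entry onto itself. */
static void
demo_sort_by_physaddr(struct demo_page *tbl, unsigned n)
{
	unsigned i, j, min_idx;
	struct demo_page tmp;

	for (i = 0; i < n; i++) {
		min_idx = i;

		/* Look only at entries after 'i' for a smaller address. */
		for (j = i + 1; j < n; j++) {
			if (tbl[j].physaddr < tbl[min_idx].physaddr)
				min_idx = j;
		}

		/* Entry already in place: skip the swap, so no self-memcpy. */
		if (min_idx == i)
			continue;

		memcpy(&tmp, &tbl[min_idx], sizeof(tmp));
		memcpy(&tbl[min_idx], &tbl[i], sizeof(tmp));
		memcpy(&tbl[i], &tmp, sizeof(tmp));
	}
}
```

Because `memcpy` has undefined behaviour when source and destination overlap, skipping the swap for `min_idx == i` is what silences the valgrind warning, and it also removes the only path on which the original function could report an error.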
[dpdk-dev] [RFC PATCH 18/18] xenvirt: remove type field from rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/xenvirt/rte_eth_xenvirt.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xenvirt/rte_eth_xenvirt.c b/drivers/net/xenvirt/rte_eth_xenvirt.c
index 73e8bce..4ce1730 100644
--- a/drivers/net/xenvirt/rte_eth_xenvirt.c
+++ b/drivers/net/xenvirt/rte_eth_xenvirt.c
@@ -1,7 +1,7 @@
 /*-
  * BSD LICENSE
  *
- * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -706,8 +706,7 @@ rte_pmd_xenvirt_devinit(const char *name, const char *params)
 }

 static struct rte_driver pmd_xenvirt_drv = {
-	.name = "eth_xenvirt",
-	.type = PMD_VDEV,
+	.name = "eth_xenvirt", /* Virtual device */
 	.init = rte_pmd_xenvirt_devinit,
 };

-- 
1.9.1
[dpdk-dev] [RFC PATCH 17/18] vmxnet3: remove type field and initialise name field in rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index a70be5c..04fff43 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -884,7 +884,7 @@ vmxnet3_process_events(struct vmxnet3_hw *hw)
 #endif

 static struct rte_driver rte_vmxnet3_driver = {
-	.type = PMD_PDEV,
+	.name = "rte_vmxnet3_driver", /* PCI device */
 	.init = rte_vmxnet3_pmd_init,
 };

-- 
1.9.1
[dpdk-dev] [RFC PATCH 15/18] ring: remove type field from rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/ring/rte_eth_ring.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index 6fd3d0a..cbb3dc7 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -624,8 +624,7 @@ rte_pmd_ring_devuninit(const char *name)
 }

 static struct rte_driver pmd_ring_drv = {
-	.name = "eth_ring",
-	.type = PMD_VDEV,
+	.name = "eth_ring", /* Virtual device */
 	.init = rte_pmd_ring_devinit,
 	.uninit = rte_pmd_ring_devuninit,
 };
-- 
1.9.1
[dpdk-dev] [RFC PATCH 14/18] pcap: remove type field from rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/pcap/rte_eth_pcap.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
index f2e4634..fd38894 100644
--- a/drivers/net/pcap/rte_eth_pcap.c
+++ b/drivers/net/pcap/rte_eth_pcap.c
@@ -1104,8 +1104,7 @@ rte_pmd_pcap_devuninit(const char *name)
 }

 static struct rte_driver pmd_pcap_drv = {
-	.name = "eth_pcap",
-	.type = PMD_VDEV,
+	.name = "eth_pcap", /* Virtual device */
 	.init = rte_pmd_pcap_devinit,
 	.uninit = rte_pmd_pcap_devuninit,
 };
-- 
1.9.1
[dpdk-dev] [RFC PATCH 13/18] null: remove type field from rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/null/rte_eth_null.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index e244595..5f9871c 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -577,8 +577,7 @@ rte_pmd_null_devuninit(const char *name)
 }

 static struct rte_driver pmd_null_drv = {
-	.name = "eth_null",
-	.type = PMD_VDEV,
+	.name = "eth_null", /* Virtual device */
 	.init = rte_pmd_null_devinit,
 	.uninit = rte_pmd_null_devuninit,
 };
-- 
1.9.1
[dpdk-dev] [RFC PATCH 12/18] mpipe: remove type field and update name in rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/mpipe/mpipe_tilegx.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mpipe/mpipe_tilegx.c b/drivers/net/mpipe/mpipe_tilegx.c
index 743feef..9454d4e 100644
--- a/drivers/net/mpipe/mpipe_tilegx.c
+++ b/drivers/net/mpipe/mpipe_tilegx.c
@@ -2,6 +2,7 @@
  * BSD LICENSE
  *
  * Copyright(c) 2015 EZchip Semiconductor Ltd. All rights reserved.
+ * Copyright(c) 2015 Intel Corporation. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -1602,14 +1603,12 @@ rte_pmd_mpipe_devinit(const char *ifname,
 }

 static struct rte_driver pmd_mpipe_xgbe_drv = {
-	.name = "xgbe",
-	.type = PMD_VDEV,
+	.name = "eth_xgbe", /* Virtual device */
 	.init = rte_pmd_mpipe_devinit,
 };

 static struct rte_driver pmd_mpipe_gbe_drv = {
-	.name = "gbe",
-	.type = PMD_VDEV,
+	.name = "eth_gbe", /* Virtual device */
 	.init = rte_pmd_mpipe_devinit,
 };

-- 
1.9.1
[dpdk-dev] [RFC PATCH 11/18] mlx4: remove type field from rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/mlx4/mlx4.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index fa3cb7e..532307d 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -3,6 +3,7 @@
  *
  * Copyright 2012-2015 6WIND S.A.
  * Copyright 2012 Mellanox.
+ * Copyright 2015 Intel.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -5107,8 +5108,7 @@ rte_mlx4_pmd_init(const char *name, const char *args)
 }

 static struct rte_driver rte_mlx4_driver = {
-	.type = PMD_PDEV,
-	.name = MLX4_DRIVER_NAME,
+	.name = MLX4_DRIVER_NAME, /* PCI device */
 	.init = rte_mlx4_pmd_init,
 };

-- 
1.9.1
[dpdk-dev] [RFC PATCH 10/18] ixgbe: remove type field and initialise name field in rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index b8ee1e9..d59d4b5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -5514,12 +5514,12 @@ ixgbe_set_eeprom(struct rte_eth_dev *dev,
 }

 static struct rte_driver rte_ixgbe_driver = {
-	.type = PMD_PDEV,
+	.name = "rte_ixgbe_driver", /* PCI device */
 	.init = rte_ixgbe_pmd_init,
 };

 static struct rte_driver rte_ixgbevf_driver = {
-	.type = PMD_PDEV,
+	.name = "rte_ixgbevf_driver", /* PCI device */
 	.init = rte_ixgbevf_pmd_init,
 };

-- 
1.9.1
[dpdk-dev] [RFC PATCH 09/18] i40e: remove type field and initialise name field in rte_driver structures
Signed-off-by: Bernard Iremonger
---
 drivers/net/i40e/i40e_ethdev.c    | 2 +-
 drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 40b0526..2d0551c 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -347,7 +347,7 @@ rte_i40e_pmd_init(const char *name __rte_unused,
 }

 static struct rte_driver rte_i40e_driver = {
-	.type = PMD_PDEV,
+	.name = "rte_i40e_driver", /* PCI device */
 	.init = rte_i40e_pmd_init,
 };

diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index b694400..fe44966 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1268,7 +1268,7 @@ rte_i40evf_pmd_init(const char *name __rte_unused,
 }

 static struct rte_driver rte_i40evf_driver = {
-	.type = PMD_PDEV,
+	.name = "rte_i40evf_driver",/* PCI device */
 	.init = rte_i40evf_pmd_init,
 };

-- 
1.9.1
[dpdk-dev] [RFC PATCH 08/18] fm10k: remove type field and initialise name field in rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/fm10k/fm10k_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a69c990..bda5a81 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2312,7 +2312,7 @@ rte_pmd_fm10k_init(__rte_unused const char *name,
 }

 static struct rte_driver rte_fm10k_driver = {
-	.type = PMD_PDEV,
+	.name = "rte_fm10k_driver", /* PCI device */
 	.init = rte_pmd_fm10k_init,
 };

-- 
1.9.1
[dpdk-dev] [RFC PATCH 07/18] enic: remove type field and initialise name field in rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/enic/enic_ethdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 8280cea..af2c57e 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -3,6 +3,7 @@
  * Copyright 2007 Nuova Systems, Inc. All rights reserved.
  *
  * Copyright (c) 2014, Cisco Systems, Inc.
+ * Copyright(c) 2015 Intel Corporation.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -633,7 +634,7 @@ rte_enic_pmd_init(const char *name __rte_unused,
 }

 static struct rte_driver rte_enic_driver = {
-	.type = PMD_PDEV,
+	.name = "rte_enic_driver", /* PCI device */
 	.init = rte_enic_pmd_init,
 };

-- 
1.9.1
[dpdk-dev] [RFC PATCH 04/18] bonding: remove type field from rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index 5cc6372..0e222b2 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -2302,8 +2302,7 @@ bond_ethdev_configure(struct rte_eth_dev *dev)
 }

 static struct rte_driver bond_drv = {
-	.name = "eth_bond",
-	.type = PMD_VDEV,
+	.name = "eth_bond", /* Virtual device */
 	.init = bond_init,
 	.uninit = bond_uninit,
 };
-- 
1.9.1
[dpdk-dev] [RFC PATCH 03/18] bnx2x: remove type field and initialise name field in rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/bnx2x/bnx2x_ethdev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_ethdev.c b/drivers/net/bnx2x/bnx2x_ethdev.c
index 09b5920..b25ca21 100644
--- a/drivers/net/bnx2x/bnx2x_ethdev.c
+++ b/drivers/net/bnx2x/bnx2x_ethdev.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
+ * Copyright(c) 2015 Intel Corporation.
  *
  * All rights reserved.
  */
@@ -529,12 +530,12 @@ static int rte_bnx2xvf_pmd_init(const char *name __rte_unused, const char *param
 }

 static struct rte_driver rte_bnx2x_driver = {
-	.type = PMD_PDEV,
+	.name = "rte_bnx2x_driver", /* PCI device */
 	.init = rte_bnx2x_pmd_init,
 };

 static struct rte_driver rte_bnx2xvf_driver = {
-	.type = PMD_PDEV,
+	.name = "rte_bnx2xvf_driver", /* PCI device */
 	.init = rte_bnx2xvf_pmd_init,
 };

-- 
1.9.1
[dpdk-dev] [RFC PATCH 02/18] af_packet: remove type field from rte_driver structure
Signed-off-by: Bernard Iremonger
---
 drivers/net/af_packet/rte_eth_af_packet.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index bdd9628..0ce6540 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -5,7 +5,7 @@
  *
  * Originally based upon librte_pmd_pcap code:
  *
- * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  * Copyright(c) 2014 6WIND S.A.
  * All rights reserved.
  *
@@ -839,8 +839,7 @@ exit:
 }

 static struct rte_driver pmd_af_packet_drv = {
-	.name = "eth_af_packet",
-	.type = PMD_VDEV,
+	.name = "eth_af_packet", /* Virtual device */
 	.init = rte_pmd_af_packet_devinit,
 };

-- 
1.9.1
[dpdk-dev] [RFC PATCH 01/18] librte_eal: remove type field from rte_driver structure.
Signed-off-by: Bernard Iremonger
---
 lib/librte_eal/common/eal_common_dev.c  | 22 +-
 lib/librte_eal/common/include/rte_dev.h | 11 +--
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 4089d66..ccfbb8c 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -1,7 +1,7 @@
 /*-
  * BSD LICENSE
  *
- * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  * Copyright(c) 2014 6WIND S.A.
  * All rights reserved.
  *
@@ -72,8 +72,6 @@ rte_eal_vdev_init(const char *name, const char *args)
 		return -EINVAL;
 
 	TAILQ_FOREACH(driver, &dev_driver_list, next) {
-		if (driver->type != PMD_VDEV)
-			continue;
 
 		/*
 		 * search a driver prefix in virtual device name.
@@ -117,10 +115,18 @@ rte_eal_dev_init(void)
 
 	/* Once the vdevs are initalized, start calling all the pdev drivers */
 	TAILQ_FOREACH(driver, &dev_driver_list, next) {
-		if (driver->type != PMD_PDEV)
-			continue;
-		/* PDEV drivers don't get passed any parameters */
-		driver->init(NULL, NULL);
+
+		/* PCI drivers don't get passed any parameters */
+		/*
+		 * Search a virtual driver prefix in device name.
+		 * It should not be found for PCI devices.
+		 * Use strncmp to compare.
+		 */
+
+		if ((driver->name) &&
+			(strncmp(driver->name, "eth_", strlen("eth_")) != 0)) {
+			driver->init(NULL, NULL);
+		}
 	}
 	return 0;
 }
@@ -134,8 +140,6 @@ rte_eal_vdev_uninit(const char *name)
 		return -EINVAL;
 
 	TAILQ_FOREACH(driver, &dev_driver_list, next) {
-		if (driver->type != PMD_VDEV)
-			continue;
 
 		/*
 		 * search a driver prefix in virtual device name.
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index f601d21..6253185 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -62,20 +62,11 @@ typedef int (rte_dev_init_t)(const char *name, const char *args);
 typedef int (rte_dev_uninit_t)(const char *name);
 
 /**
- * Driver type enumeration
- */
-enum pmd_type {
-	PMD_VDEV = 0,
-	PMD_PDEV = 1,
-};
-
-/**
  * A structure describing a device driver.
  */
 struct rte_driver {
 	TAILQ_ENTRY(rte_driver) next; /**< Next in list. */
-	enum pmd_type type;           /**< PMD Driver type */
-	const char *name;             /**< Driver name. */
+	const char *name;             /**< Driver name. */
 	rte_dev_init_t *init;         /**< Device init. function. */
 	rte_dev_uninit_t *uninit;     /**< Device uninit. function. */
 };
-- 
1.9.1
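To make the effect of the name-based check easier to see, here is a minimal standalone sketch of the same test outside the EAL. The names sketch_driver, sketch_is_vdev, sketch_init_pdevs and sketch_count_init are made up for illustration and are not DPDK symbols; only the "eth_" prefix and the strncmp comparison are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_driver after the patch (no type field). */
struct sketch_driver {
	const char *name;
	int (*init)(const char *name, const char *args);
};

/* Prefix that marks a virtual device driver, as in the patch. */
#define SKETCH_VDEV_PREFIX "eth_"

/* Example init callback that only counts how often it was called. */
static unsigned sketch_init_calls;

static int
sketch_count_init(const char *name, const char *args)
{
	(void)name;
	(void)args;
	sketch_init_calls++;
	return 0;
}

/* Return 1 if the driver name starts with the virtual-device prefix. */
static int
sketch_is_vdev(const struct sketch_driver *drv)
{
	assert(drv != NULL);
	if (drv->name == NULL)
		return 0;
	return strncmp(drv->name, SKETCH_VDEV_PREFIX,
			strlen(SKETCH_VDEV_PREFIX)) == 0;
}

/*
 * Mirror of the patched rte_eal_dev_init() loop: call init() only for
 * drivers that have a name and whose name lacks the "eth_" prefix.
 * Returns the number of drivers initialised.
 */
static unsigned
sketch_init_pdevs(const struct sketch_driver *drivers, size_t count)
{
	unsigned called = 0;

	for (size_t i = 0; i < count; i++) {
		const struct sketch_driver *drv = &drivers[i];

		if (drv->name != NULL && !sketch_is_vdev(drv) &&
				drv->init != NULL) {
			drv->init(NULL, NULL);
			called++;
		}
	}
	return called;
}
```

Two consequences follow directly from the condition in the patch: a PCI driver that leaves .name unset is never initialised, and a PCI driver whose name happens to start with "eth_" is treated as virtual. That would explain why the other patches in the series initialise the name field in the PCI PMDs.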
[dpdk-dev] [RFC PATCH 00/18] refactor eal driver registration code
At present the eal driver registration code is more complicated than it needs to be.
This RFC proposes to simplify the eal driver registration code.

Remove the type field from the eal driver structure.
Refactor the eal driver registration code to use the name field in the eal driver structure instead of the type field.
Modify all PMDs to use the modified eal driver structure.
Initialise the name field in the eal driver structure in some PMDs where it is not initialised at present.

Bernard Iremonger (18):
  librte_eal: remove type field from rte_driver structure.
  af_packet: remove type field from rte_driver structure
  bnx2x: remove type field and initialise name field in rte_driver structure
  bonding: remove type field from rte_driver structure
  cxgbe: remove type field from rte_driver structure
  e1000: remove type field and initialise name field in rte_driver structures
  enic: remove type field and initialise name field in rte_driver structure
  fm10k: remove type field and initialise name field in rte_driver structure
  i40e: remove type field and initialise name field in rte_driver structures
  ixgbe: remove type field and initialise name field in rte_driver structure
  mlx4: remove type field from rte_driver structure
  mpipe: remove type field and update name in rte_driver structure
  null: remove type field from rte_driver structure
  pcap: remove type field from rte_driver structure
  ring: remove type field from rte_driver structure
  virtio_ethdev: remove type field and initialise name field in rte_driver structure
  vmxnet3: remove type field and initialise name field in rte_driver structure
  xenvirt: remove type field from rte_driver structure

 drivers/net/af_packet/rte_eth_af_packet.c | 5 ++---
 drivers/net/bnx2x/bnx2x_ethdev.c          | 5 +++--
 drivers/net/bonding/rte_eth_bond_pmd.c    | 3 +--
 drivers/net/cxgbe/cxgbe_ethdev.c          | 4 ++--
 drivers/net/e1000/em_ethdev.c             | 2 +-
 drivers/net/e1000/igb_ethdev.c            | 4 ++--
 drivers/net/enic/enic_ethdev.c            | 3 ++-
 drivers/net/fm10k/fm10k_ethdev.c          | 2 +-
 drivers/net/i40e/i40e_ethdev.c            | 2 +-
 drivers/net/i40e/i40e_ethdev_vf.c         | 2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c          | 4 ++--
 drivers/net/mlx4/mlx4.c                   | 4 ++--
 drivers/net/mpipe/mpipe_tilegx.c          | 7 +++
 drivers/net/null/rte_eth_null.c           | 3 +--
 drivers/net/pcap/rte_eth_pcap.c           | 3 +--
 drivers/net/ring/rte_eth_ring.c           | 3 +--
 drivers/net/virtio/virtio_ethdev.c        | 2 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c      | 2 +-
 drivers/net/xenvirt/rte_eth_xenvirt.c     | 5 ++---
 lib/librte_eal/common/eal_common_dev.c    | 22 +-
 lib/librte_eal/common/include/rte_dev.h   | 11 +--
 21 files changed, 44 insertions(+), 54 deletions(-)

-- 
1.9.1
[dpdk-dev] [PATCH v3] librte_cfgfile(rte_cfgfile.h): modify the macros values
This patch implements the ABI change proposed for librte_cfgfile (rte_cfgfile.h). In order to allow for longer names and values, the macros CFG_NAME_LEN and CFG_VALUE_LEN are increased to 64 and 256, respectively.

Signed-off-by: Jasvinder Singh
---
 doc/guides/rel_notes/deprecation.rst | 4 
 doc/guides/rel_notes/release_2_2.rst | 7 ++-
 lib/librte_cfgfile/Makefile          | 2 +-
 lib/librte_cfgfile/rte_cfgfile.h     | 9 +++--
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 5f6079b..2fbdee2 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -53,10 +53,6 @@ Deprecation Notices
 * The scheduler statistics structure will change to allow keeping track of RED
   actions.
 
-* librte_cfgfile: In order to allow for longer names and values,
-  the value of macros CFG_NAME_LEN and CFG_NAME_VAL will be increased.
-  Most likely, the new values will be 64 and 256, respectively.
-
 * librte_port: Macros to access the packet meta-data stored within the packet
   buffer will be adjusted to cover the packet mbuf structure as well, as
   currently they are able to access any packet buffer location except the
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index abe57b4..ff64da8 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -44,6 +44,11 @@ ABI Changes
 
 * The LPM structure is changed. The deprecated field mem_location is removed.
 
+* librte_cfgfile: In order to allow for longer names and values,
+  the value of macros CFG_NAME_LEN and CFG_NAME_VAL is increased,
+  the new values are 64 and 256, respectively
+
+
 Shared Library Versions
 ---
 
@@ -54,7 +59,7 @@ The libraries prepended with a plus sign were incremented in this version.
 
    + libethdev.so.2
    + librte_acl.so.2
-     librte_cfgfile.so.1
+   + librte_cfgfile.so.2
      librte_cmdline.so.1
      librte_distributor.so.1
    + librte_eal.so.2
diff --git a/lib/librte_cfgfile/Makefile b/lib/librte_cfgfile/Makefile
index 032c240..616aef0 100644
--- a/lib/librte_cfgfile/Makefile
+++ b/lib/librte_cfgfile/Makefile
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)
 
 EXPORT_MAP := rte_cfgfile_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 #
 # all source are stored in SRCS-y
diff --git a/lib/librte_cfgfile/rte_cfgfile.h b/lib/librte_cfgfile/rte_cfgfile.h
index 7c9fc91..d443782 100644
--- a/lib/librte_cfgfile/rte_cfgfile.h
+++ b/lib/librte_cfgfile/rte_cfgfile.h
@@ -47,8 +47,13 @@ extern "C" {
  *
  ***/
 
-#define CFG_NAME_LEN 32
-#define CFG_VALUE_LEN 64
+#ifndef CFG_NAME_LEN
+#define CFG_NAME_LEN 64
+#endif
+
+#ifndef CFG_VALUE_LEN
+#define CFG_VALUE_LEN 256
+#endif
 
 /** Configuration file */
 struct rte_cfgfile;
-- 
2.1.0
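As an illustration of what the #ifndef guards allow, the sketch below repeats the guard pattern from the patch and sizes two buffers from the macros. The struct sketch_cfg_entry and the function sketch_cfg_set are hypothetical and are not part of librte_cfgfile; they only show how the limits would apply to a name/value pair.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Same guard pattern as the patch: a build may pre-define either macro
 * (for example with -DCFG_NAME_LEN=128) and the defaults below are then
 * skipped.
 */
#ifndef CFG_NAME_LEN
#define CFG_NAME_LEN 64
#endif

#ifndef CFG_VALUE_LEN
#define CFG_VALUE_LEN 256
#endif

/* Hypothetical entry layout, for illustration only. */
struct sketch_cfg_entry {
	char name[CFG_NAME_LEN];
	char value[CFG_VALUE_LEN];
};

/*
 * Copy a name/value pair into an entry.
 * Returns 0 on success, -1 if either string would not fit with its
 * terminating NUL.
 */
static int
sketch_cfg_set(struct sketch_cfg_entry *e, const char *name, const char *value)
{
	assert(e != NULL && name != NULL && value != NULL);
	if (strlen(name) >= sizeof(e->name) ||
			strlen(value) >= sizeof(e->value))
		return -1;
	snprintf(e->name, sizeof(e->name), "%s", name);
	snprintf(e->value, sizeof(e->value), "%s", value);
	return 0;
}
```

Any structure sized by these macros changes layout when the values change, so an application and the library it links against would have to be built with the same values. That is presumably why the patch also bumps LIBABIVER.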
[dpdk-dev] ixgbe: account more Rx errors Issue
Hi,

After updating to DPDK 2.1 I noticed an issue with the ixgbe stats. Commit f6bf669b9900 "ixgbe: account more Rx errors" now adds the XEC hardware counter (l3_l4_xsum_error) to ierrors. The issue is that UDP packets with a zero checksum are counted in XEC, and therefore now in ierrors too. I've tried to disable hw_ip_checksum in rxmode, but it didn't help.

I'm not sure we should add XEC to ierrors, because packets counted in XEC are not actually dropped by the NIC. So in my case the ierrors counter is now greater than the actual number of packets received by the NIC, which makes no sense.

What's your opinion?

Regards,
Andriy
[dpdk-dev] [RFC PATCH 03/18] bnx2x: remove type field and initialise name field in rte_driver structure
> >Signed-off-by: Bernard Iremonger >--- > drivers/net/bnx2x/bnx2x_ethdev.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > >diff --git a/drivers/net/bnx2x/bnx2x_ethdev.c >b/drivers/net/bnx2x/bnx2x_ethdev.c >index 09b5920..b25ca21 100644 >--- a/drivers/net/bnx2x/bnx2x_ethdev.c >+++ b/drivers/net/bnx2x/bnx2x_ethdev.c >@@ -1,5 +1,6 @@ > /* > * Copyright (c) 2013-2015 Brocade Communications Systems, Inc. >+ * Copyright(c) 2015 Intel Corporation. > * > * All rights reserved. > */ >@@ -529,12 +530,12 @@ static int rte_bnx2xvf_pmd_init(const char *name >__rte_unused, const char *param > } > > static struct rte_driver rte_bnx2x_driver = { >- .type = PMD_PDEV, >+ .name = "rte_bnx2x_driver", /* PCI device */ > .init = rte_bnx2x_pmd_init, > }; > > static struct rte_driver rte_bnx2xvf_driver = { >- .type = PMD_PDEV, >+ .name = "rte_bnx2xvf_driver", /* PCI device */ > .init = rte_bnx2xvf_pmd_init, > }; > >-- >1.9.1 > > Acked-by: Harish Patil Thanks, Harish
[dpdk-dev] i40e PMD VSI/QUEUE setting can't be satisfied
Hello All,

Hopefully this is the correct place to post questions. I am a brand new user of DPDK and I am having issues starting the i40e PMD. I get the following error message when I start any of the test applications:

PMD: eth_i40e_dev_init(): FW 0.0 API 0.0 NVM 04.02.04 eetrack 800013fc
PMD: i40e_pf_parameter_init(): Max supported VSIs:0
PMD: i40e_pf_parameter_init(): PF queue pairs:1
PMD: i40e_pf_parameter_init(): VSI/QUEUE setting can't be satisfied
PMD: i40e_pf_parameter_init(): Max VSIs: 0, asked:0
PMD: i40e_pf_parameter_init(): Total queue pairs:0, asked:1
PMD: eth_i40e_dev_init(): Failed to do parameter init: -22
EAL: Error - exiting with code: 1
  Cause: Requested device :04:00.0 cannot be used

I get the same error when binding the device to either the uio_pci_generic or the igb_uio driver.

My setup:
CentOS 6.6 running kernel 3.18.12-11.el6.x86_64
DPDK version 2.1.0
Intel XL710 dual port NIC

Any help would be appreciated.

Nick Buchanan
[dpdk-dev] [PATCH 2/3] enic: use appropriate key length in hash table
On 04/09/15 2:35 pm, "Pablo de Lara" wrote: >RTE_HASH_KEY_LENGTH_MAX was deprecated, and the hash table >actually is hosting bigger keys than that size, so key length >has been increased to properly allocate all keys. > >Signed-off-by: Pablo de Lara >--- > drivers/net/enic/enic_clsf.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >diff --git a/drivers/net/enic/enic_clsf.c b/drivers/net/enic/enic_clsf.c >index 9c2abfb..656b25b 100644 >--- a/drivers/net/enic/enic_clsf.c >+++ b/drivers/net/enic/enic_clsf.c >@@ -214,7 +214,7 @@ int enic_fdir_add_fltr(struct enic *enic, struct >rte_eth_fdir_filter *params) > enic->fdir.stats.add++; > } > >- pos = rte_hash_add_key(enic->fdir.hash, (void *)key); >+ pos = rte_hash_add_key(enic->fdir.hash, params); > enic->fdir.nodes[pos] = key; > return 0; > } >@@ -244,7 +244,7 @@ int enic_clsf_init(struct enic *enic) > struct rte_hash_parameters hash_params = { > .name = "enicpmd_clsf_hash", > .entries = ENICPMD_CLSF_HASH_ENTRIES, >- .key_len = RTE_HASH_KEY_LENGTH_MAX, >+ .key_len = sizeof(struct rte_eth_fdir_filter), > .hash_func = DEFAULT_HASH_FUNC, > .hash_func_init_val = 0, > .socket_id = SOCKET_0, >-- Looks good. Thanks, -Sujith > >2.4.2 >
[dpdk-dev] pcap->eth low TX performance
Hello,

Did anyone try to work with the pcap PMD recently? We're testing our app with this setup:

PCAP --- rte_eth_rx_burst --> APP --> rte_eth_tx_burst --> ethdev

I'm experiencing very low TX performance, leading to massive mbuf drops while trying to send those packets over the Ethernet device. I tried running ordinary l2fwd and got the same issue, with over 80-90% of packets dropped. When I substitute PCAP with another ordinary Ethernet device, everything works fine. Can anyone share an idea?

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ
[dpdk-dev] [PATCH 3/3] hash: remove deprecated functions and macros
The function rte_jhash2() was renamed rte_jhash_32b and macros RTE_HASH_KEY_LENGTH_MAX and RTE_HASH_BUCKET_ENTRIES_MAX were tagged as deprecated, so they can be removed in 2.2.

Signed-off-by: Pablo de Lara
---
 doc/guides/rel_notes/deprecation.rst | 5 -
 doc/guides/rel_notes/release_2_2.rst | 3 +++
 lib/librte_hash/rte_hash.h           | 6 --
 lib/librte_hash/rte_jhash.h          | 15 ++-
 4 files changed, 5 insertions(+), 24 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 5f6079b..fffad80 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -13,11 +13,6 @@ Deprecation Notices
   There is no backward compatibility planned from release 2.2.
   All binaries will need to be rebuilt from release 2.2.
 
-* The Macros RTE_HASH_BUCKET_ENTRIES_MAX and RTE_HASH_KEY_LENGTH_MAX are
-  deprecated and will be removed with version 2.2.
-
-* The function rte_jhash2 is deprecated and should be removed.
-
 * The following fields have been deprecated in rte_eth_stats:
   imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
   tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index abe57b4..aa44862 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -27,6 +27,9 @@ API Changes
 * The deprecated ring PMD functions are removed:
   rte_eth_ring_pair_create() and rte_eth_ring_pair_attach().
 
+* The function rte_jhash2() is removed.
+  It was replaced by rte_jhash_32b().
+
 
 ABI Changes
 ---
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 1cddc07..175c0bb 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -49,12 +49,6 @@ extern "C" {
 /** Maximum size of hash table that can be created. */
 #define RTE_HASH_ENTRIES_MAX (1 << 30)
 
-/** @deprecated Maximum bucket size that can be created. */
-#define RTE_HASH_BUCKET_ENTRIES_MAX 4
-
-/** @deprecated Maximum length of key that can be used. */
-#define RTE_HASH_KEY_LENGTH_MAX 64
-
 /** Maximum number of characters in hash name.*/
 #define RTE_HASH_NAMESIZE 32
 
diff --git a/lib/librte_hash/rte_jhash.h b/lib/librte_hash/rte_jhash.h
index f9a8266..457f225 100644
--- a/lib/librte_hash/rte_jhash.h
+++ b/lib/librte_hash/rte_jhash.h
@@ -267,10 +267,10 @@ rte_jhash_2hashes(const void *key, uint32_t length, uint32_t *pc, uint32_t *pb)
 }
 
 /**
- * Same as rte_jhash2, but takes two seeds and return two uint32_ts.
+ * Same as rte_jhash_32b, but takes two seeds and return two uint32_ts.
  * pc and pb must be non-null, and *pc and *pb must both be initialized
  * with seeds. If you pass in (*pb)=0, the output (*pc) will be
- * the same as the return value from rte_jhash2.
+ * the same as the return value from rte_jhash_32b.
  *
  * @param k
  *   Key to calculate hash of.
@@ -335,17 +335,6 @@ rte_jhash_32b(const uint32_t *k, uint32_t length, uint32_t initval)
 }
 
 static inline uint32_t
-__attribute__ ((deprecated))
-rte_jhash2(const uint32_t *k, uint32_t length, uint32_t initval)
-{
-	uint32_t initval2 = 0;
-
-	rte_jhash_32b_2hashes(k, length, &initval, &initval2);
-
-	return initval;
-}
-
-static inline uint32_t
 __rte_jhash_3words(uint32_t a, uint32_t b, uint32_t c, uint32_t initval)
 {
 	a += RTE_JHASH_GOLDEN_RATIO + initval;
-- 
2.4.2
[dpdk-dev] [PATCH 2/3] enic: use appropriate key length in hash table
RTE_HASH_KEY_LENGTH_MAX was deprecated, and the hash table actually is hosting bigger keys than that size, so key length has been increased to properly allocate all keys.

Signed-off-by: Pablo de Lara
---
 drivers/net/enic/enic_clsf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/enic/enic_clsf.c b/drivers/net/enic/enic_clsf.c
index 9c2abfb..656b25b 100644
--- a/drivers/net/enic/enic_clsf.c
+++ b/drivers/net/enic/enic_clsf.c
@@ -214,7 +214,7 @@ int enic_fdir_add_fltr(struct enic *enic, struct rte_eth_fdir_filter *params)
 		enic->fdir.stats.add++;
 	}
 
-	pos = rte_hash_add_key(enic->fdir.hash, (void *)key);
+	pos = rte_hash_add_key(enic->fdir.hash, params);
 	enic->fdir.nodes[pos] = key;
 	return 0;
 }
@@ -244,7 +244,7 @@ int enic_clsf_init(struct enic *enic)
 	struct rte_hash_parameters hash_params = {
 		.name = "enicpmd_clsf_hash",
 		.entries = ENICPMD_CLSF_HASH_ENTRIES,
-		.key_len = RTE_HASH_KEY_LENGTH_MAX,
+		.key_len = sizeof(struct rte_eth_fdir_filter),
 		.hash_func = DEFAULT_HASH_FUNC,
 		.hash_func_init_val = 0,
 		.socket_id = SOCKET_0,
-- 
2.4.2
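The reason the key length has to cover the whole key structure can be shown with a small standalone sketch. Everything in it is an assumption made for illustration: struct sketch_filter is a made-up stand-in for struct rte_eth_fdir_filter, and sketch_hash is plain FNV-1a rather than the hash function the enic driver uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical filter key; stands in for struct rte_eth_fdir_filter. */
struct sketch_filter {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t proto;
	/* the compiler may add padding bytes after proto */
};

/* FNV-1a over key_len bytes: a stand-in for the table's hash function. */
static uint32_t
sketch_hash(const void *key, size_t key_len)
{
	const uint8_t *p = key;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < key_len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Fill a key. The memset comes first so that padding bytes start out as
 * zero and do not make otherwise equal keys differ byte-wise.
 */
static void
sketch_filter_init(struct sketch_filter *f, uint32_t src_ip, uint32_t dst_ip,
		uint16_t src_port, uint16_t dst_port, uint8_t proto)
{
	assert(f != NULL);
	memset(f, 0, sizeof(*f));
	f->src_ip = src_ip;
	f->dst_ip = dst_ip;
	f->src_port = src_port;
	f->dst_port = dst_port;
	f->proto = proto;
}
```

If a table hashes and compares fewer bytes than sizeof(struct sketch_filter), two filters that differ only in a trailing field (here proto) are indistinguishable to it. Passing sizeof() of the key type as key_len, as the patch does, avoids that.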
[dpdk-dev] [PATCH 1/3] hash: use max key length as internal macro instead of deprecated one
RTE_HASH_KEY_LENGTH_MAX has been deprecated in DPDK 2.1 and it is going to be removed in 2.2, so the macro is defined internally for the memory allocation of all keys used.

Signed-off-by: Pablo de Lara
---
 app/test/test_hash.c           | 7 ---
 app/test/test_hash_functions.c | 4 ++--
 app/test/test_hash_perf.c      | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 7f8c0d3..4f2509d 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -66,6 +66,7 @@ static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {0, 2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
+#define MAX_KEYSIZE 64
 /**/
 #define LOCAL_FBK_HASH_ENTRIES_MAX (1 << 15)
@@ -238,7 +239,7 @@ test_crc32_hash_alg_equiv(void)
 static void run_hash_func_test(rte_hash_function f, uint32_t init_val,
 		uint32_t key_len)
 {
-	static uint8_t key[RTE_HASH_KEY_LENGTH_MAX];
+	static uint8_t key[MAX_KEYSIZE];
 	unsigned i;
 
 
@@ -1100,7 +1101,7 @@ test_hash_creation_with_good_parameters(void)
 static int test_average_table_utilization(void)
 {
 	struct rte_hash *handle;
-	uint8_t simple_key[RTE_HASH_KEY_LENGTH_MAX];
+	uint8_t simple_key[MAX_KEYSIZE];
 	unsigned i, j;
 	unsigned added_keys, average_keys_added = 0;
 	int ret;
@@ -1154,7 +1155,7 @@ static int test_hash_iteration(void)
 {
 	struct rte_hash *handle;
 	unsigned i;
-	uint8_t keys[NUM_ENTRIES][RTE_HASH_KEY_LENGTH_MAX];
+	uint8_t keys[NUM_ENTRIES][MAX_KEYSIZE];
 	const void *next_key;
 	void *next_data;
 	void *data[NUM_ENTRIES];
diff --git a/app/test/test_hash_functions.c b/app/test/test_hash_functions.c
index 8c7cf63..3ad6d80 100644
--- a/app/test/test_hash_functions.c
+++ b/app/test/test_hash_functions.c
@@ -85,7 +85,7 @@ static uint32_t hash_values_crc[2][10] = {{
  * from the array entries is tested.
  */
 #define HASHTEST_ITERATIONS 100
-
+#define MAX_KEYSIZE 64
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
 static uint32_t hashtest_initvals[] = {0, 0xdeadbeef};
 static uint32_t hashtest_key_lens[] = {
@@ -119,7 +119,7 @@ static void
 run_hash_func_perf_test(uint32_t key_len, uint32_t init_val,
 		rte_hash_function f)
 {
-	static uint8_t key[HASHTEST_ITERATIONS][RTE_HASH_KEY_LENGTH_MAX];
+	static uint8_t key[HASHTEST_ITERATIONS][MAX_KEYSIZE];
 	uint64_t ticks, start, end;
 	unsigned i, j;
 
diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c
index a87fc80..9d53c14 100644
--- a/app/test/test_hash_perf.c
+++ b/app/test/test_hash_perf.c
@@ -140,7 +140,7 @@ shuffle_input_keys(unsigned table_index)
 {
 	unsigned i;
 	uint32_t swap_idx;
-	uint8_t temp_key[RTE_HASH_KEY_LENGTH_MAX];
+	uint8_t temp_key[MAX_KEYSIZE];
 	hash_sig_t temp_signature;
 	int32_t temp_position;
 
-- 
2.4.2
[dpdk-dev] [PATCH 0/3] clean deprecated code in hash library
This patchset is to remove all deprecated macros and functions from the hash library, as well as to modify the unit tests and ENIC driver that were using them.

Pablo de Lara (3):
  hash: use max key length as internal macro instead of deprecated one
  enic: use appropriate key length in hash table
  hash: remove deprecated functions and macros

 app/test/test_hash.c                 | 7 ---
 app/test/test_hash_functions.c       | 4 ++--
 app/test/test_hash_perf.c            | 2 +-
 doc/guides/rel_notes/deprecation.rst | 5 -
 doc/guides/rel_notes/release_2_2.rst | 3 +++
 drivers/net/enic/enic_clsf.c         | 4 ++--
 lib/librte_hash/rte_hash.h           | 6 --
 lib/librte_hash/rte_jhash.h          | 15 ++-
 8 files changed, 14 insertions(+), 32 deletions(-)

-- 
2.4.2
[dpdk-dev] pcap->eth low TX performance
Are you reading from the pcap faster than the device can transmit? Does the app hold off reading from the pcap when the ethdev is pushing back, or does it just tail drop?

On Fri, Sep 4, 2015 at 12:14 AM, Yerden Zhumabekov wrote:
> Hello,
>
> Did anyone try to work with pcap PMD recently? We're testing our app
> with this setup:
>
> PCAP --- rte_eth_rx_burst--> APP-> rte_eth_tx_burst -> ethdev
>
> I'm experiencing very low TX performance leading to massive mbuf drop
> while trying to send those packets over the Ethernet device. I tried
> running ordinary l2fwd and got the same issue with over 80-90% of
> packets drop. When I substitute PCAP with another ordinary Ethernet
> device, everything works fine. Can anyone share an idea?
>
> --
> Sincerely,
>
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
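The difference between holding off and tail dropping can be sketched as follows. This is not DPDK code: sketch_tx_fn is a hypothetical callback that only mimics the shape of a burst-transmit call (it returns how many packets were accepted), and sketch_mock_dev is a made-up device used to exercise the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical TX callback: tries to queue up to n packets and returns
 * how many were accepted. It mimics a burst-transmit call in shape only.
 */
typedef uint16_t (*sketch_tx_fn)(void *ctx, void **pkts, uint16_t n);

/*
 * Send a burst and retry the unsent tail instead of dropping it at once.
 * Gives up after max_retries calls that made no progress.
 * Returns the number of packets sent; the caller owns the remainder and
 * decides whether to free it (tail drop) or keep it for later.
 */
static uint16_t
sketch_tx_with_retry(sketch_tx_fn tx, void *ctx, void **pkts, uint16_t n,
		unsigned max_retries)
{
	uint16_t sent = 0;
	unsigned retries = 0;

	assert(tx != NULL);
	while (sent < n) {
		uint16_t done = tx(ctx, pkts + sent, (uint16_t)(n - sent));

		sent = (uint16_t)(sent + done);
		if (done == 0 && ++retries > max_retries)
			break;
	}
	return sent;
}

/* Mock device: accepts at most per_call packets per call, capacity in total. */
struct sketch_mock_dev {
	uint16_t per_call;
	uint16_t capacity;
};

static uint16_t
sketch_mock_tx(void *ctx, void **pkts, uint16_t n)
{
	struct sketch_mock_dev *dev = ctx;
	uint16_t take = n < dev->per_call ? n : dev->per_call;

	(void)pkts;
	if (take > dev->capacity)
		take = dev->capacity;
	dev->capacity = (uint16_t)(dev->capacity - take);
	return take;
}
```

A source that is not rate limited, such as a capture file read as fast as the CPU allows, can easily outrun the TX queue. An application that frees whatever the transmit call did not accept will then show a high drop rate even though the device itself is working normally.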
[dpdk-dev] [PATCH v2 00/10] clean deprecated code
2015-09-02 15:16, Thomas Monjalon:
> Before starting a new integration cycle (2.2.0-rc0),
> the deprecated code is removed.
>
> The hash library is not cleaned in this patchset and would be
> better done by its maintainers. Bruce, Pablo, please check the
> file doc/guides/rel_notes/deprecation.rst.
>
> Changes in v2:
> - increment KNI and ring PMD versions
> - list library versions in release notes
> - list API/ABI changes in release notes
>
> Stephen Hemminger (2):
>   kni: remove deprecated functions
>   ring: remove deprecated functions
>
> Thomas Monjalon (8):
>   doc: init next release notes
>   ethdev: remove Rx interrupt switch
>   mbuf: remove packet type from offload flags
>   ethdev: remove SCTP flow entries switch
>   eal: remove deprecated function
>   mem: remove dummy malloc library
>   lpm: remove deprecated field
>   acl: remove old API

Applied
[dpdk-dev] [PATCH v2 01/10] doc: init next release notes
2015-09-03 15:44, Mcnamara, John:
> P.S. Perhaps we should announce, or maybe this will do as an announcement,
> that from this release forward the Release Notes should be updated as part of
> a patchset that contains one of the following:
>
> * New Features
> * Resolved Issues (in relation to features existing in the previous
>   releases)
> * Known Issues
> * API Changes
> * ABI Changes
> * Shared Library Versions

Maybe we should update doc/guides/contributing/documentation.rst to clearly state it.
[dpdk-dev] "cannot use T= with gcov target" when doing "makefile clean" with DPDK-2.1.0
Hi John,

> -----Original Message-----
> From: Mcnamara, John [mailto:john.mcnamara at intel.com]
> Sent: Wednesday, 2 September 2015 16:32
> To: Montorsi, Francesco ; dev at dpdk.org
> Subject: RE: "cannot use T= with gcov target" when doing "makefile clean"
> with DPDK-2.1.0
>...
> That fix seems reasonable and you should submit it as a patch.
>
> There may be other ways to fix this (there are several ways to fix things
> within the build system) but if you submit a patch we can get some
> comments.

I will submit the patch ASAP (together with a few others). I guess I need to follow closely what's written here:
http://dpdk.org/dev

Thanks,
Francesco
[dpdk-dev] virtio optimization idea
Hi:

Recently I have done a proof of concept for a virtio optimization. The optimization includes two parts:
1) avail ring set with fixed descriptors
2) RX vectorization

With these optimizations, we could get a several-fold performance boost for pure vhost-virtio throughput.

Here I will only cover the first part, which is the prerequisite for the second part.

Let us first take RX as an example. Currently, when we fill the avail ring with guest mbufs, we need to
a) allocate one descriptor (for a non-sg mbuf) from the free descriptors
b) set the idx of the desc into the entry of the avail ring
c) set the addr/len fields of the descriptor to point to the blank guest mbuf data area

Those operations take time, and step b in particular leaves the cache line for the avail ring in modified (M) state in the virtio processing core. When vhost processes the avail ring, the cache line transfer from the virtio processing core to the vhost processing core costs quite a few CPU cycles.

To solve this problem, this is the arrangement of the RX ring for the DPDK PMD (for the non-mergeable case).

                    avail
                     idx
                     +
                     |
+----+----+----+-----+---------+------+------+
| 0  | 1  | 2  |      ...      | 254  | 255  |   avail ring
+-+--+-+--+-+--+---------------+--+---+--+---+
  |    |    |                     |      |
  |    |    |                     |      |
  v    v    v                     v      v
+-+--+-+--+-+--+---------------+--+---+--+---+
| 0  | 1  | 2  |      ...      | 254  | 255  |   desc ring
+----+----+----+-----+---------+------+------+
                     |
                     |
+----+----+----+-----+---------+------+------+
| 0  | 1  | 2  |               | 254  | 255  |   used ring
+----+----+----+-----+---------+------+------+
                     |
                     +

The avail ring is initialized with fixed descriptors and is never changed, i.e., the index value of the nth avail ring entry is always n. This means the virtio PMD is actually refilling the desc ring only, without having to change the avail ring. When vhost fetches the avail ring, it is always in its first level cache, unless it has been evicted.

When RX receives packets from the used ring, we use used->idx as the desc idx. This requires that vhost processes and returns descs from the avail ring to the used ring in order, which is true for both the current DPDK vhost and the kernel vhost implementation. In my understanding, there is no necessity for vhost net to process descriptors out of order (OOO). One case could be zero copy: for example, if one descriptor doesn't meet the zero copy requirement, we could directly return it to the used ring, earlier than the descriptors in front of it. To enforce this, I want to use a reserved bit to indicate in-order processing of descriptors.

For the TX ring, the arrangement is as below. Each transmitted mbuf needs a desc for virtio_net_hdr, so actually we have only 128 free slots.

                        ++
                        ||
                        ||
+-----+-----+-----+-----++-----+-----+-----+-----+
|  0  |  1  | ... | 127 || 128 | 129 | ... | 255 |   avail ring with fixed descriptor
+--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
   |     |           |  ||  |     |           |
   v     v           v  ||  v     v           v
+--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
| 127 | 128 | ... | 255 || 127 | 128 | ... | 255 |   desc ring for virtio_net_hdr
+--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
   |     |           |  ||  |     |           |
   v     v           v  ||  v     v           v
+--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
|  0  |  1  | ... | 127 ||  0  |  1  | ... | 127 |   desc ring for tx data
+-----+-----+-----+-----++-----+-----+-----+-----+

/huawei
[dpdk-dev] how to change binding of NIC ports to NUMA nodes
Hi Rajesh,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rajesh R
> Sent: Friday, September 04, 2015 5:29 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] how to change binding of NIC ports to NUMA nodes
>
> Hi,
>
> I am trying an application based on dpdk on a 4- processor server i.e. 4
> numa nodes.
> The server is having with 4 NIC cards out of which 2 cards get binded to
> numa node 0 and other 2 cards get binded to numa node 2 (as per the
> /sys/pci/.../numa_node for each card)
>
>
> How to evenly distribute the cards to all the numa nodes so that one card
> each gets binded to one numa node?
>
> Can we control the binding from dpdk, either pmd_ixgbe or igb_uio?

The drivers cannot change the numa node where your NICs are, as those nodes are associated to the different physical sockets (CPU and memory) that you have on your platform, and your NICs are connected physically to these sockets via the PCI slots.

So, if you want to change the numa node, you will have to move the NIC(s) to another PCI slot that is connected to a different socket. Look at the user guide of your platform to find out which PCI slots are connected to which socket.

Regards,
Pablo

>
>
> --
> Regards
>
> Rajesh R
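For checking which node the kernel reports for a card after moving it, a small sketch that reads the sysfs attribute may help. It assumes a Linux host and the usual /sys/bus/pci/devices/<address>/numa_node path; the function names are made up for this example, and the PCI address has to be supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the contents of a numa_node file: a decimal integer, where -1
 * means the platform reported no node for the device.
 * Returns 0 on success, -1 if no number could be parsed.
 */
static int
sketch_parse_numa_node(const char *text, int *node)
{
	char *end = NULL;
	long v;

	assert(text != NULL && node != NULL);
	v = strtol(text, &end, 10);
	if (end == text)
		return -1;
	*node = (int)v;
	return 0;
}

/*
 * Read the NUMA node of a PCI device given its address, for example
 * "0000:04:00.0". Returns 0 on success, -1 on any error.
 */
static int
sketch_read_numa_node(const char *bdf, int *node)
{
	char path[128];
	char buf[32];
	FILE *f;
	int ret;

	assert(bdf != NULL && node != NULL);
	if (snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/numa_node",
			bdf) >= (int)sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ret = fgets(buf, sizeof(buf), f) != NULL ?
			sketch_parse_numa_node(buf, node) : -1;
	fclose(f);
	return ret;
}
```

The value read this way only reflects what the platform firmware tells the kernel. As explained above, it cannot be changed from the driver side.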
[dpdk-dev] vmxnet3-usermap kmod compile errors with ubuntu 15.04
Downloaded the latest vmxnet3-usermap package (ver 1.2) from dpdk.org, tried compiling it under an Ubuntu VM but it fails to compile, is there a newer version of this driver available from somewhere that will compile correctly under Ubuntu 15.04 ?

The kernel (Ubuntu 15.04) "uname -a" ===>
Linux ubuntu-vm-mansoor 3.19.0-15-generic #15-Ubuntu SMP Thu Apr 16 23:32:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

First I got an error about undefined VM_RESERVED, which I fixed by setting to (VM_DONTEXPAND | VM_DONTDUMP) to get past the error, now I get following compile errors, have followed the instructions inside the "vmxnet3-usermap-1.2/kmod/README" file. Also noticed the message "Using 2.6.x kernel build system", have setup the RTE environment variables as below:

# env | grep RTE
RTE_INCLUDE=/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/build/include
RTE_SDK=/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0
RTE_TARGET=x86_64-native-linuxapp-gcc

Thanks in advance for your help.

# make
Using 2.6.x kernel build system.
make -C /lib/modules/3.19.0-15-generic/build/include/.. SUBDIRS=$PWD SRCROOT=$PWD/. \
  MODULEBUILDDIR= modules
make[1]: Entering directory '/usr/src/linux-headers-3.19.0-15-generic'
  CC [M]  /home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/vmxnet3-usermap-1.2/kmod/vmxnet3_ethtool.o
/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/vmxnet3-usermap-1.2/kmod/vmxnet3_ethtool.c: In function 'vmxnet3_set_features':
/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/vmxnet3-usermap-1.2/kmod/vmxnet3_ethtool.c:361:48: error: 'NETIF_F_HW_VLAN_RX' undeclared (first use in this function)
  if (changed & (NETIF_F_RXCSUM | NETIF_F_LRO | NETIF_F_HW_VLAN_RX)) {
                                                ^
/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/vmxnet3-usermap-1.2/kmod/vmxnet3_ethtool.c:361:48: note: each undeclared identifier is reported only once for each function it appears in
/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/vmxnet3-usermap-1.2/kmod/vmxnet3_ethtool.c: In function 'vmxnet3_set_ethtool_ops':
/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/vmxnet3-usermap-1.2/kmod/vmxnet3_ethtool.c:677:2: error: implicit declaration of function 'SET_ETHTOOL_OPS' [-Werror=implicit-function-declaration]
  SET_ETHTOOL_OPS(netdev, &vmxnet3_ethtool_ops);
  ^
cc1: some warnings being treated as errors
scripts/Makefile.build:257: recipe for target '/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/vmxnet3-usermap-1.2/kmod/vmxnet3_ethtool.o' failed
make[2]: *** [/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/vmxnet3-usermap-1.2/kmod/vmxnet3_ethtool.o] Error 1
Makefile:1394: recipe for target '_module_/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/vmxnet3-usermap-1.2/kmod' failed
make[1]: *** [_module_/home/mansoor/dpdk_download/dpdk_2.1/dpdk-2.1.0/vmxnet3-usermap-1.2/kmod] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-3.19.0-15-generic'
Makefile:123: recipe for target 'vmxnet3-usermap.ko' failed
make: *** [vmxnet3-usermap.ko] Error 2
[dpdk-dev] how to change binding of NIC ports to NUMA nodes
Hi,

I am trying an application based on dpdk on a 4-processor server, i.e. 4 numa nodes. The server has 4 NIC cards, of which 2 cards get bound to numa node 0 and the other 2 cards get bound to numa node 2 (as per the /sys/pci/.../numa_node for each card).

How can I evenly distribute the cards across all the numa nodes, so that one card gets bound to each numa node?

Can we control the binding from dpdk, either pmd_ixgbe or igb_uio?

-- 
Regards

Rajesh R