[dpdk-dev] Is vhost vring_avail size tunable?
Thank you Changchun! Tim

On 2015-06-08 09:56:43, "Ouyang, Changchun" wrote:
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tim Deng
> >> Sent: Sunday, June 7, 2015 4:02 PM
> >> To: dev at dpdk.org
> >> Subject: [dpdk-dev] Is vhost vring_avail size tunable?
> >>
> >> Hi,
> >>
> >> Under heavy workload, I found some packets were lost because of
> >> "Failed to get enough desc from vring...". Is there any way to make the vring
> >> size larger?
> >>
> > Qemu hard-codes the vring size to 256; you need to change that value to
> > a bigger one if you want to enlarge it.
> >
> > Thanks
> > Changchun
[dpdk-dev] [PATCH v4 1/1] pipeline: add statistics for librte_pipeline ports and tables
> This patch adds statistics collection for librte_pipeline.
> These statistics are disabled by default at build time.

Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH v4 00/10] table: added table statistics
> Added statistics for every type of table. By default all table statistics
> are disabled; the user must enable them in the config file.
>
> Changes in v4:
> - created a single config option for all table statistics

Acked-by: Cristian Dumitrescu
[dpdk-dev] [PATCH v4 00/13] port: added port statistics
> Added statistics for every type of port. By default all port statistics
> are disabled; the user must enable them in the config file.
>
> Changes in v4:
> - created a single config option for all port statistics

Acked-by: Cristian Dumitrescu
[dpdk-dev] KNI to kernel IP interface mapping
Hi,

Is there any many-to-one mapping scheme available for the KNI interface? Say I have 5 kernel IP interfaces and I want to map all five of them to a single KNI device, so that frames from all five kernel IP interfaces can be transmitted through the one KNI, and when a packet arrives from the forwarding plane we can put it into the KNI and deliver it to the corresponding kernel IP interface.

The documentation describes a one-to-one mapping, but with many IP interfaces I don't want to create an equally large number of KNI devices just to keep the mapping one-to-one.

Thanks in advance! Please let me know if you have any suggestions/ideas to solve this problem.

Thanks
Jaffar
[dpdk-dev] Shared library build broken
Sorry, I apologize on behalf of my fingers: I meant the combined library build is broken when PMD_BOND is selected.

On 6/8/15 4:14 PM, Thomas F Herbert wrote:
> I just noticed that the shared library build is broken. I am building
> current master. I had to make this change to get it to build:
>
> -CONFIG_RTE_LIBRTE_PMD_BOND=y
> +CONFIG_RTE_LIBRTE_PMD_BOND=n
>
> I think one of the recent bonding commits broke some dependencies, but I
> didn't investigate further.
>
> test_link_bonding.o: In function `test_add_slave_to_bonded_device':
> test_link_bonding.c:(.text+0x44a): undefined reference to
> `rte_eth_bond_slave_add'
> test_link_bonding.c:(.text+0x462): undefined reference to
> `rte_eth_bond_slaves_get'
> test_link_bonding.c:(.text+0x487): undefined reference to
> `rte_eth_bond_active_slaves_get'
>
> --TFH
[dpdk-dev] [PATCH v4 1/1] pipeline: add statistics for librte_pipeline ports and tables
From: Maciej Gajdzica This patch adds statistics collection for librte_pipeline. Those statistics ale disabled by default during build time. Signed-off-by: Pawel Wodkowski --- config/common_bsdapp |1 + config/common_linuxapp |1 + lib/librte_pipeline/rte_pipeline.c | 185 +--- lib/librte_pipeline/rte_pipeline.h | 98 +++ 4 files changed, 274 insertions(+), 11 deletions(-) diff --git a/config/common_bsdapp b/config/common_bsdapp index 68d5110..4c20fe0 100644 --- a/config/common_bsdapp +++ b/config/common_bsdapp @@ -395,6 +395,7 @@ RTE_TABLE_STATS_COLLECT=n # Compile librte_pipeline # CONFIG_RTE_LIBRTE_PIPELINE=y +RTE_PIPELINE_STATS_COLLECT=n # # Compile librte_kni diff --git a/config/common_linuxapp b/config/common_linuxapp index 7e9b7fa..ca93abc 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -402,6 +402,7 @@ RTE_TABLE_STATS_COLLECT=n # Compile librte_pipeline # CONFIG_RTE_LIBRTE_PIPELINE=y +RTE_PIPELINE_STATS_COLLECT=y # # Compile librte_kni diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c index 36d92c9..69bf003 100644 --- a/lib/librte_pipeline/rte_pipeline.c +++ b/lib/librte_pipeline/rte_pipeline.c @@ -48,6 +48,17 @@ #define RTE_TABLE_INVALID UINT32_MAX +#ifdef RTE_PIPELINE_STATS_COLLECT +#define RTE_PIPELINE_STATS_ADD(counter, val) \ + ({ (counter) += (val); }) + +#define RTE_PIPELINE_STATS_ADD_M(counter, mask) \ + ({ (counter) += __builtin_popcountll(mask); }) +#else +#define RTE_PIPELINE_STATS_ADD(counter, val) +#define RTE_PIPELINE_STATS_ADD_M(counter, mask) +#endif + struct rte_port_in { /* Input parameters */ struct rte_port_in_ops ops; @@ -63,6 +74,8 @@ struct rte_port_in { /* List of enabled ports */ struct rte_port_in *next; + + uint64_t n_pkts_dropped_by_ah; }; struct rte_port_out { @@ -74,6 +87,8 @@ struct rte_port_out { /* Handle to low-level port */ void *h_port; + + uint64_t n_pkts_dropped_by_ah; }; struct rte_table { @@ -90,6 +105,12 @@ struct rte_table { /* Handle to the low-level table object 
*/ void *h_table; + + /* Stats for this table. */ + uint64_t n_pkts_dropped_by_lkp_hit_ah; + uint64_t n_pkts_dropped_by_lkp_miss_ah; + uint64_t n_pkts_dropped_lkp_hit; + uint64_t n_pkts_dropped_lkp_miss; }; #define RTE_PIPELINE_MAX_NAME_SZ 124 @@ -1040,6 +1061,8 @@ rte_pipeline_action_handler_port_bulk(struct rte_pipeline *p, port_out->f_action_bulk(p->pkts, &pkts_mask, port_out->arg_ah); p->action_mask0[RTE_PIPELINE_ACTION_DROP] |= pkts_mask ^ mask; + RTE_PIPELINE_STATS_ADD_M(port_out->n_pkts_dropped_by_ah, + pkts_mask ^ mask); } /* Output port TX */ @@ -1071,6 +1094,9 @@ rte_pipeline_action_handler_port(struct rte_pipeline *p, uint64_t pkts_mask) p->action_mask0[RTE_PIPELINE_ACTION_DROP] |= (pkt_mask ^ 1LLU) << i; + RTE_PIPELINE_STATS_ADD(port_out->n_pkts_dropped_by_ah, + pkt_mask ^ 1LLU); + /* Output port TX */ if (pkt_mask != 0) port_out->ops.f_tx(port_out->h_port, @@ -1104,6 +1130,9 @@ rte_pipeline_action_handler_port(struct rte_pipeline *p, uint64_t pkts_mask) p->action_mask0[RTE_PIPELINE_ACTION_DROP] |= (pkt_mask ^ 1LLU) << i; + RTE_PIPELINE_STATS_ADD(port_out->n_pkts_dropped_by_ah, + pkt_mask ^ 1LLU); + /* Output port TX */ if (pkt_mask != 0) port_out->ops.f_tx(port_out->h_port, @@ -1140,6 +1169,9 @@ rte_pipeline_action_handler_port_meta(struct rte_pipeline *p, p->action_mask0[RTE_PIPELINE_ACTION_DROP] |= (pkt_mask ^ 1LLU) << i; + RTE_PIPELINE_STATS_ADD(port_out->n_pkts_dropped_by_ah, + pkt_mask ^ 1ULL); + /* Output port TX */ if (pkt_mask != 0) port_out->ops.f_tx(port_out->h_port, @@ -1174,6 +1206,9 @@ rte_pipeline_action_handler_port_meta(struct rte_pipeline *p, p->action_mask0[RTE_PIPELINE_ACTION_DROP] |= (pkt_mask ^ 1LLU) << i; + RTE_PIPELINE_STATS_ADD(port_out->n_pkts_dropped_by_ah, +
[dpdk-dev] [PATCH v4 10/10] table: added lpm table stats
From: Maciej Gajdzica Added lpm table statistics. Signed-off-by: Maciej Gajdzica --- lib/librte_table/rte_table_lpm.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/lib/librte_table/rte_table_lpm.c b/lib/librte_table/rte_table_lpm.c index 64c684d..300e680 100644 --- a/lib/librte_table/rte_table_lpm.c +++ b/lib/librte_table/rte_table_lpm.c @@ -46,7 +46,23 @@ #define RTE_TABLE_LPM_MAX_NEXT_HOPS256 +#ifdef RTE_TABLE_STATS_COLLECT + +#define RTE_TABLE_LPM_STATS_PKTS_IN_ADD(table, val) \ + table->stats.n_pkts_in += val +#define RTE_TABLE_LPM_STATS_PKTS_LOOKUP_MISS(table, val) \ + table->stats.n_pkts_lookup_miss += val + +#else + +#define RTE_TABLE_LPM_STATS_PKTS_IN_ADD(table, val) +#define RTE_TABLE_LPM_STATS_PKTS_LOOKUP_MISS(table, val) + +#endif + struct rte_table_lpm { + struct rte_table_stats stats; + /* Input parameters */ uint32_t entry_size; uint32_t entry_unique_size; @@ -313,6 +329,9 @@ rte_table_lpm_lookup( uint64_t pkts_out_mask = 0; uint32_t i; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_LPM_STATS_PKTS_IN_ADD(lpm, n_pkts_in); + pkts_out_mask = 0; for (i = 0; i < (uint32_t)(RTE_PORT_IN_BURST_SIZE_MAX - __builtin_clzll(pkts_mask)); i++) { @@ -335,6 +354,20 @@ rte_table_lpm_lookup( } *lookup_hit_mask = pkts_out_mask; + RTE_TABLE_LPM_STATS_PKTS_LOOKUP_MISS(lpm, n_pkts_in - __builtin_popcountll(pkts_out_mask)); + return 0; +} + +static int +rte_table_lpm_stats_read(void *table, struct rte_table_stats *stats, int clear) +{ + struct rte_table_lpm *t = (struct rte_table_lpm *) table; + + if (stats != NULL) + memcpy(stats, &t->stats, sizeof(t->stats)); + + if (clear) + memset(&t->stats, 0, sizeof(t->stats)); return 0; } @@ -345,4 +378,5 @@ struct rte_table_ops rte_table_lpm_ops = { .f_add = rte_table_lpm_entry_add, .f_delete = rte_table_lpm_entry_delete, .f_lookup = rte_table_lpm_lookup, + .f_stats = rte_table_lpm_stats_read, }; -- 1.7.9.5
[dpdk-dev] [PATCH v4 09/10] table: added lpm_ipv6 table stats
From: Maciej Gajdzica Added lpm ipv6 table statistics. Signed-off-by: Maciej Gajdzica --- lib/librte_table/rte_table_lpm_ipv6.c | 34 + 1 file changed, 34 insertions(+) diff --git a/lib/librte_table/rte_table_lpm_ipv6.c b/lib/librte_table/rte_table_lpm_ipv6.c index ce4ddc0..ce7fa02 100644 --- a/lib/librte_table/rte_table_lpm_ipv6.c +++ b/lib/librte_table/rte_table_lpm_ipv6.c @@ -46,7 +46,23 @@ #define RTE_TABLE_LPM_MAX_NEXT_HOPS256 +#ifdef RTE_TABLE_STATS_COLLECT + +#define RTE_TABLE_LPM_IPV6_STATS_PKTS_IN_ADD(table, val) \ + table->stats.n_pkts_in += val +#define RTE_TABLE_LPM_IPV6_STATS_PKTS_LOOKUP_MISS(table, val) \ + table->stats.n_pkts_lookup_miss += val + +#else + +#define RTE_TABLE_LPM_IPV6_STATS_PKTS_IN_ADD(table, val) +#define RTE_TABLE_LPM_IPV6_STATS_PKTS_LOOKUP_MISS(table, val) + +#endif + struct rte_table_lpm_ipv6 { + struct rte_table_stats stats; + /* Input parameters */ uint32_t entry_size; uint32_t entry_unique_size; @@ -327,6 +343,9 @@ rte_table_lpm_ipv6_lookup( uint64_t pkts_out_mask = 0; uint32_t i; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_LPM_IPV6_STATS_PKTS_IN_ADD(lpm, n_pkts_in); + pkts_out_mask = 0; for (i = 0; i < (uint32_t)(RTE_PORT_IN_BURST_SIZE_MAX - __builtin_clzll(pkts_mask)); i++) { @@ -349,6 +368,20 @@ rte_table_lpm_ipv6_lookup( } *lookup_hit_mask = pkts_out_mask; + RTE_TABLE_LPM_IPV6_STATS_PKTS_LOOKUP_MISS(lpm, n_pkts_in - __builtin_popcountll(pkts_out_mask)); + return 0; +} + +static int +rte_table_lpm_ipv6_stats_read(void *table, struct rte_table_stats *stats, int clear) +{ + struct rte_table_lpm_ipv6 *t = (struct rte_table_lpm_ipv6 *) table; + + if (stats != NULL) + memcpy(stats, &t->stats, sizeof(t->stats)); + + if (clear) + memset(&t->stats, 0, sizeof(t->stats)); return 0; } @@ -359,4 +392,5 @@ struct rte_table_ops rte_table_lpm_ipv6_ops = { .f_add = rte_table_lpm_ipv6_entry_add, .f_delete = rte_table_lpm_ipv6_entry_delete, .f_lookup = rte_table_lpm_ipv6_lookup, + .f_stats = 
rte_table_lpm_ipv6_stats_read, }; -- 1.7.9.5
[dpdk-dev] [PATCH v4 08/10] table: added hash_lru table stats
From: Maciej Gajdzica Added statistics for hash_lru table. Signed-off-by: Maciej Gajdzica --- lib/librte_table/rte_table_hash_lru.c | 44 + 1 file changed, 44 insertions(+) diff --git a/lib/librte_table/rte_table_hash_lru.c b/lib/librte_table/rte_table_hash_lru.c index c9a8afd..c4b6079 100644 --- a/lib/librte_table/rte_table_hash_lru.c +++ b/lib/librte_table/rte_table_hash_lru.c @@ -45,6 +45,20 @@ #define KEYS_PER_BUCKET4 +#ifdef RTE_TABLE_STATS_COLLECT + +#define RTE_TABLE_HASH_LRU_STATS_PKTS_IN_ADD(table, val) \ + table->stats.n_pkts_in += val +#define RTE_TABLE_HASH_LRU_STATS_PKTS_LOOKUP_MISS(table, val) \ + table->stats.n_pkts_lookup_miss += val + +#else + +#define RTE_TABLE_HASH_LRU_STATS_PKTS_IN_ADD(table, val) +#define RTE_TABLE_HASH_LRU_STATS_PKTS_LOOKUP_MISS(table, val) + +#endif + struct bucket { union { struct bucket *next; @@ -63,6 +77,8 @@ struct grinder { }; struct rte_table_hash { + struct rte_table_stats stats; + /* Input parameters */ uint32_t key_size; uint32_t entry_size; @@ -368,6 +384,9 @@ static int rte_table_hash_lru_lookup_unoptimized( struct rte_table_hash *t = (struct rte_table_hash *) table; uint64_t pkts_mask_out = 0; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_LRU_STATS_PKTS_IN_ADD(t, n_pkts_in); + for ( ; pkts_mask; ) { struct bucket *bkt; struct rte_mbuf *pkt; @@ -412,6 +431,7 @@ static int rte_table_hash_lru_lookup_unoptimized( } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_LRU_STATS_PKTS_LOOKUP_MISS(t, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } @@ -804,6 +824,9 @@ static int rte_table_hash_lru_lookup( uint64_t pkts_mask_out = 0, pkts_mask_match_many = 0; int status = 0; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_LRU_STATS_PKTS_IN_ADD(t, n_pkts_in); + /* Cannot run the pipeline with less than 7 packets */ if (__builtin_popcountll(pkts_mask) < 7) return rte_table_hash_lru_lookup_unoptimized(table, pkts, @@ -916,6 +939,7 @@ 
static int rte_table_hash_lru_lookup( } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_LRU_STATS_PKTS_LOOKUP_MISS(t, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return status; } @@ -933,6 +957,9 @@ static int rte_table_hash_lru_lookup_dosig( uint64_t pkts_mask_out = 0, pkts_mask_match_many = 0; int status = 0; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_LRU_STATS_PKTS_IN_ADD(t, n_pkts_in); + /* Cannot run the pipeline with less than 7 packets */ if (__builtin_popcountll(pkts_mask) < 7) return rte_table_hash_lru_lookup_unoptimized(table, pkts, @@ -1045,15 +1072,31 @@ static int rte_table_hash_lru_lookup_dosig( } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_LRU_STATS_PKTS_LOOKUP_MISS(t, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return status; } +static int +rte_table_hash_lru_stats_read(void *table, struct rte_table_stats *stats, int clear) +{ + struct rte_table_hash *t = (struct rte_table_hash *) table; + + if (stats != NULL) + memcpy(stats, &t->stats, sizeof(t->stats)); + + if (clear) + memset(&t->stats, 0, sizeof(t->stats)); + + return 0; +} + struct rte_table_ops rte_table_hash_lru_ops = { .f_create = rte_table_hash_lru_create, .f_free = rte_table_hash_lru_free, .f_add = rte_table_hash_lru_entry_add, .f_delete = rte_table_hash_lru_entry_delete, .f_lookup = rte_table_hash_lru_lookup, + .f_stats = rte_table_hash_lru_stats_read, }; struct rte_table_ops rte_table_hash_lru_dosig_ops = { @@ -1062,4 +1105,5 @@ struct rte_table_ops rte_table_hash_lru_dosig_ops = { .f_add = rte_table_hash_lru_entry_add, .f_delete = rte_table_hash_lru_entry_delete, .f_lookup = rte_table_hash_lru_lookup_dosig, + .f_stats = rte_table_hash_lru_stats_read, }; -- 1.7.9.5
[dpdk-dev] [PATCH v4 07/10] table: added hash_key8 table stats
From: Maciej Gajdzica Added statistics for hash key8 table. Signed-off-by: Maciej Gajdzica --- lib/librte_table/rte_table_hash_key8.c | 52 1 file changed, 52 insertions(+) diff --git a/lib/librte_table/rte_table_hash_key8.c b/lib/librte_table/rte_table_hash_key8.c index 6803eb2..374e3e3 100644 --- a/lib/librte_table/rte_table_hash_key8.c +++ b/lib/librte_table/rte_table_hash_key8.c @@ -44,6 +44,20 @@ #define RTE_TABLE_HASH_KEY_SIZE 8 +#ifdef RTE_TABLE_STATS_COLLECT + +#define RTE_TABLE_HASH_KEY8_STATS_PKTS_IN_ADD(table, val) \ + table->stats.n_pkts_in += val +#define RTE_TABLE_HASH_KEY8_STATS_PKTS_LOOKUP_MISS(table, val) \ + table->stats.n_pkts_lookup_miss += val + +#else + +#define RTE_TABLE_HASH_KEY8_STATS_PKTS_IN_ADD(table, val) +#define RTE_TABLE_HASH_KEY8_STATS_PKTS_LOOKUP_MISS(table, val) + +#endif + struct rte_bucket_4_8 { /* Cache line 0 */ uint64_t signature; @@ -58,6 +72,8 @@ struct rte_bucket_4_8 { }; struct rte_table_hash { + struct rte_table_stats stats; + /* Input parameters */ uint32_t n_buckets; uint32_t n_entries_per_bucket; @@ -846,6 +862,9 @@ rte_table_hash_lookup_key8_lru( pkt11_index, pkt20_index, pkt21_index; uint64_t pkts_mask_out = 0; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_KEY8_STATS_PKTS_IN_ADD(f, n_pkts_in); + /* Cannot run the pipeline with less than 5 packets */ if (__builtin_popcountll(pkts_mask) < 5) { for ( ; pkts_mask; ) { @@ -860,6 +879,7 @@ rte_table_hash_lookup_key8_lru( } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY8_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } @@ -949,6 +969,7 @@ rte_table_hash_lookup_key8_lru( bucket20, bucket21, pkts_mask_out, entries, f); *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY8_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } /* rte_table_hash_lookup_key8_lru() */ @@ -967,6 +988,9 @@ rte_table_hash_lookup_key8_lru_dosig( uint32_t pkt11_index, pkt20_index, 
pkt21_index; uint64_t pkts_mask_out = 0; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_KEY8_STATS_PKTS_IN_ADD(f, n_pkts_in); + /* Cannot run the pipeline with less than 5 packets */ if (__builtin_popcountll(pkts_mask) < 5) { for ( ; pkts_mask; ) { @@ -981,6 +1005,7 @@ rte_table_hash_lookup_key8_lru_dosig( } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY8_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } @@ -1070,6 +1095,7 @@ rte_table_hash_lookup_key8_lru_dosig( bucket20, bucket21, pkts_mask_out, entries, f); *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY8_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } /* rte_table_hash_lookup_key8_lru_dosig() */ @@ -1090,6 +1116,9 @@ rte_table_hash_lookup_key8_ext( struct rte_bucket_4_8 *buckets[RTE_PORT_IN_BURST_SIZE_MAX]; uint64_t *keys[RTE_PORT_IN_BURST_SIZE_MAX]; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_KEY8_STATS_PKTS_IN_ADD(f, n_pkts_in); + /* Cannot run the pipeline with less than 5 packets */ if (__builtin_popcountll(pkts_mask) < 5) { for ( ; pkts_mask; ) { @@ -1216,6 +1245,7 @@ grind_next_buckets: } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY8_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } /* rte_table_hash_lookup_key8_ext() */ @@ -1236,6 +1266,9 @@ rte_table_hash_lookup_key8_ext_dosig( struct rte_bucket_4_8 *buckets[RTE_PORT_IN_BURST_SIZE_MAX]; uint64_t *keys[RTE_PORT_IN_BURST_SIZE_MAX]; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_KEY8_STATS_PKTS_IN_ADD(f, n_pkts_in); + /* Cannot run the pipeline with less than 5 packets */ if (__builtin_popcountll(pkts_mask) < 5) { for ( ; pkts_mask; ) { @@ -1362,15 +1395,31 @@ grind_next_buckets: } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY8_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - 
__builtin_popcountll(pkts_mask_out)); return 0; } /* rte_table_hash_lookup_key8_dosig_ext() */ +static int +rte_table_hash_key8_stats_read(void *table, struct rte_table_stats *stats, int clear) +{ + struct rte_table_hash *t = (struct rte_table_hash *) table; + + if (stats != NULL) + mem
[dpdk-dev] [PATCH v4 06/10] table: added hash_key32 table stats
From: Maciej Gajdzica Added statistics for hash key32 table. Signed-off-by: Maciej Gajdzica --- lib/librte_table/rte_table_hash_key32.c | 41 +++ 1 file changed, 41 insertions(+) diff --git a/lib/librte_table/rte_table_hash_key32.c b/lib/librte_table/rte_table_hash_key32.c index 6790594..c230629 100644 --- a/lib/librte_table/rte_table_hash_key32.c +++ b/lib/librte_table/rte_table_hash_key32.c @@ -46,6 +46,20 @@ #define RTE_BUCKET_ENTRY_VALID 0x1LLU +#ifdef RTE_TABLE_STATS_COLLECT + +#define RTE_TABLE_HASH_KEY32_STATS_PKTS_IN_ADD(table, val) \ + table->stats.n_pkts_in += val +#define RTE_TABLE_HASH_KEY32_STATS_PKTS_LOOKUP_MISS(table, val) \ + table->stats.n_pkts_lookup_miss += val + +#else + +#define RTE_TABLE_HASH_KEY32_STATS_PKTS_IN_ADD(table, val) +#define RTE_TABLE_HASH_KEY32_STATS_PKTS_LOOKUP_MISS(table, val) + +#endif + struct rte_bucket_4_32 { /* Cache line 0 */ uint64_t signature[4 + 1]; @@ -61,6 +75,8 @@ struct rte_bucket_4_32 { }; struct rte_table_hash { + struct rte_table_stats stats; + /* Input parameters */ uint32_t n_buckets; uint32_t n_entries_per_bucket; @@ -850,6 +866,9 @@ rte_table_hash_lookup_key32_lru( uint32_t pkt11_index, pkt20_index, pkt21_index; uint64_t pkts_mask_out = 0; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_KEY32_STATS_PKTS_IN_ADD(f, n_pkts_in); + /* Cannot run the pipeline with less than 5 packets */ if (__builtin_popcountll(pkts_mask) < 5) { for ( ; pkts_mask; ) { @@ -864,6 +883,7 @@ rte_table_hash_lookup_key32_lru( } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY32_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } @@ -954,6 +974,7 @@ rte_table_hash_lookup_key32_lru( mbuf20, mbuf21, bucket20, bucket21, pkts_mask_out, entries, f); *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY32_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } /* rte_table_hash_lookup_key32_lru() */ @@ -974,6 +995,9 @@ 
rte_table_hash_lookup_key32_ext( struct rte_bucket_4_32 *buckets[RTE_PORT_IN_BURST_SIZE_MAX]; uint64_t *keys[RTE_PORT_IN_BURST_SIZE_MAX]; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_KEY32_STATS_PKTS_IN_ADD(f, n_pkts_in); + /* Cannot run the pipeline with less than 5 packets */ if (__builtin_popcountll(pkts_mask) < 5) { for ( ; pkts_mask; ) { @@ -1100,15 +1124,31 @@ grind_next_buckets: } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY32_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } /* rte_table_hash_lookup_key32_ext() */ +static int +rte_table_hash_key32_stats_read(void *table, struct rte_table_stats *stats, int clear) +{ + struct rte_table_hash *t = (struct rte_table_hash *) table; + + if (stats != NULL) + memcpy(stats, &t->stats, sizeof(t->stats)); + + if (clear) + memset(&t->stats, 0, sizeof(t->stats)); + + return 0; +} + struct rte_table_ops rte_table_hash_key32_lru_ops = { .f_create = rte_table_hash_create_key32_lru, .f_free = rte_table_hash_free_key32_lru, .f_add = rte_table_hash_entry_add_key32_lru, .f_delete = rte_table_hash_entry_delete_key32_lru, .f_lookup = rte_table_hash_lookup_key32_lru, + .f_stats = rte_table_hash_key32_stats_read, }; struct rte_table_ops rte_table_hash_key32_ext_ops = { @@ -1117,4 +1157,5 @@ struct rte_table_ops rte_table_hash_key32_ext_ops = { .f_add = rte_table_hash_entry_add_key32_ext, .f_delete = rte_table_hash_entry_delete_key32_ext, .f_lookup = rte_table_hash_lookup_key32_ext, + .f_stats =rte_table_hash_key32_stats_read, }; -- 1.7.9.5
[dpdk-dev] [PATCH v4 05/10] table: added hash_key16 table stats
From: Maciej Gajdzica Added statistics for hash key16 table. Signed-off-by: Maciej Gajdzica --- lib/librte_table/rte_table_hash_key16.c | 41 +++ 1 file changed, 41 insertions(+) diff --git a/lib/librte_table/rte_table_hash_key16.c b/lib/librte_table/rte_table_hash_key16.c index f87ea0e..b936323 100644 --- a/lib/librte_table/rte_table_hash_key16.c +++ b/lib/librte_table/rte_table_hash_key16.c @@ -46,6 +46,20 @@ #define RTE_BUCKET_ENTRY_VALID 0x1LLU +#ifdef RTE_TABLE_STATS_COLLECT + +#define RTE_TABLE_HASH_KEY16_STATS_PKTS_IN_ADD(table, val) \ + table->stats.n_pkts_in += val +#define RTE_TABLE_HASH_KEY16_STATS_PKTS_LOOKUP_MISS(table, val) \ + table->stats.n_pkts_lookup_miss += val + +#else + +#define RTE_TABLE_HASH_KEY16_STATS_PKTS_IN_ADD(table, val) +#define RTE_TABLE_HASH_KEY16_STATS_PKTS_LOOKUP_MISS(table, val) + +#endif + struct rte_bucket_4_16 { /* Cache line 0 */ uint64_t signature[4 + 1]; @@ -61,6 +75,8 @@ struct rte_bucket_4_16 { }; struct rte_table_hash { + struct rte_table_stats stats; + /* Input parameters */ uint32_t n_buckets; uint32_t n_entries_per_bucket; @@ -831,6 +847,9 @@ rte_table_hash_lookup_key16_lru( uint32_t pkt11_index, pkt20_index, pkt21_index; uint64_t pkts_mask_out = 0; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_KEY16_STATS_PKTS_IN_ADD(f, n_pkts_in); + /* Cannot run the pipeline with less than 5 packets */ if (__builtin_popcountll(pkts_mask) < 5) { for ( ; pkts_mask; ) { @@ -845,6 +864,7 @@ rte_table_hash_lookup_key16_lru( } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY16_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } @@ -934,6 +954,7 @@ rte_table_hash_lookup_key16_lru( bucket20, bucket21, pkts_mask_out, entries, f); *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY16_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } /* rte_table_hash_lookup_key16_lru() */ @@ -954,6 +975,9 @@ 
rte_table_hash_lookup_key16_ext( struct rte_bucket_4_16 *buckets[RTE_PORT_IN_BURST_SIZE_MAX]; uint64_t *keys[RTE_PORT_IN_BURST_SIZE_MAX]; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_KEY16_STATS_PKTS_IN_ADD(f, n_pkts_in); + /* Cannot run the pipeline with less than 5 packets */ if (__builtin_popcountll(pkts_mask) < 5) { for ( ; pkts_mask; ) { @@ -1080,15 +1104,31 @@ grind_next_buckets: } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_KEY16_STATS_PKTS_LOOKUP_MISS(f, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } /* rte_table_hash_lookup_key16_ext() */ +static int +rte_table_hash_key16_stats_read(void *table, struct rte_table_stats *stats, int clear) +{ + struct rte_table_hash *t = (struct rte_table_hash *) table; + + if (stats != NULL) + memcpy(stats, &t->stats, sizeof(t->stats)); + + if (clear) + memset(&t->stats, 0, sizeof(t->stats)); + + return 0; +} + struct rte_table_ops rte_table_hash_key16_lru_ops = { .f_create = rte_table_hash_create_key16_lru, .f_free = rte_table_hash_free_key16_lru, .f_add = rte_table_hash_entry_add_key16_lru, .f_delete = rte_table_hash_entry_delete_key16_lru, .f_lookup = rte_table_hash_lookup_key16_lru, + .f_stats = rte_table_hash_key16_stats_read, }; struct rte_table_ops rte_table_hash_key16_ext_ops = { @@ -1097,4 +1137,5 @@ struct rte_table_ops rte_table_hash_key16_ext_ops = { .f_add = rte_table_hash_entry_add_key16_ext, .f_delete = rte_table_hash_entry_delete_key16_ext, .f_lookup = rte_table_hash_lookup_key16_ext, + .f_stats = rte_table_hash_key16_stats_read, }; -- 1.7.9.5
[dpdk-dev] [PATCH v4 04/10] table: added hash_ext table stats
From: Maciej Gajdzica Added statistics for hash ext table. Signed-off-by: Maciej Gajdzica --- lib/librte_table/rte_table_hash_ext.c | 44 + 1 file changed, 44 insertions(+) diff --git a/lib/librte_table/rte_table_hash_ext.c b/lib/librte_table/rte_table_hash_ext.c index 66e416b..3c12273 100644 --- a/lib/librte_table/rte_table_hash_ext.c +++ b/lib/librte_table/rte_table_hash_ext.c @@ -74,6 +74,20 @@ do \ (bucket)->next = (bucket2)->next; \ while (0) +#ifdef RTE_TABLE_STATS_COLLECT + +#define RTE_TABLE_HASH_EXT_STATS_PKTS_IN_ADD(table, val) \ + table->stats.n_pkts_in += val +#define RTE_TABLE_HASH_EXT_STATS_PKTS_LOOKUP_MISS(table, val) \ + table->stats.n_pkts_lookup_miss += val + +#else + +#define RTE_TABLE_HASH_EXT_STATS_PKTS_IN_ADD(table, val) +#define RTE_TABLE_HASH_EXT_STATS_PKTS_LOOKUP_MISS(table, val) + +#endif + struct grinder { struct bucket *bkt; uint64_t sig; @@ -82,6 +96,8 @@ struct grinder { }; struct rte_table_hash { + struct rte_table_stats stats; + /* Input parameters */ uint32_t key_size; uint32_t entry_size; @@ -440,6 +456,9 @@ static int rte_table_hash_ext_lookup_unoptimized( struct rte_table_hash *t = (struct rte_table_hash *) table; uint64_t pkts_mask_out = 0; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_EXT_STATS_PKTS_IN_ADD(t, n_pkts_in); + for ( ; pkts_mask; ) { struct bucket *bkt0, *bkt; struct rte_mbuf *pkt; @@ -484,6 +503,7 @@ static int rte_table_hash_ext_lookup_unoptimized( } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_EXT_STATS_PKTS_LOOKUP_MISS(t, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } @@ -861,6 +881,9 @@ static int rte_table_hash_ext_lookup( uint64_t pkts_mask_out = 0, pkts_mask_match_many = 0; int status = 0; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_EXT_STATS_PKTS_IN_ADD(t, n_pkts_in); + /* Cannot run the pipeline with less than 7 packets */ if (__builtin_popcountll(pkts_mask) < 7) return 
rte_table_hash_ext_lookup_unoptimized(table, pkts, @@ -973,6 +996,7 @@ static int rte_table_hash_ext_lookup( } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_EXT_STATS_PKTS_LOOKUP_MISS(t, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return status; } @@ -990,6 +1014,9 @@ static int rte_table_hash_ext_lookup_dosig( uint64_t pkts_mask_out = 0, pkts_mask_match_many = 0; int status = 0; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_HASH_EXT_STATS_PKTS_IN_ADD(t, n_pkts_in); + /* Cannot run the pipeline with less than 7 packets */ if (__builtin_popcountll(pkts_mask) < 7) return rte_table_hash_ext_lookup_unoptimized(table, pkts, @@ -1102,15 +1129,31 @@ static int rte_table_hash_ext_lookup_dosig( } *lookup_hit_mask = pkts_mask_out; + RTE_TABLE_HASH_EXT_STATS_PKTS_LOOKUP_MISS(t, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return status; } +static int +rte_table_hash_ext_stats_read(void *table, struct rte_table_stats *stats, int clear) +{ + struct rte_table_hash *t = (struct rte_table_hash *) table; + + if (stats != NULL) + memcpy(stats, &t->stats, sizeof(t->stats)); + + if (clear) + memset(&t->stats, 0, sizeof(t->stats)); + + return 0; +} + struct rte_table_ops rte_table_hash_ext_ops = { .f_create = rte_table_hash_ext_create, .f_free = rte_table_hash_ext_free, .f_add = rte_table_hash_ext_entry_add, .f_delete = rte_table_hash_ext_entry_delete, .f_lookup = rte_table_hash_ext_lookup, + .f_stats = rte_table_hash_ext_stats_read, }; struct rte_table_ops rte_table_hash_ext_dosig_ops = { @@ -1119,4 +1162,5 @@ struct rte_table_ops rte_table_hash_ext_dosig_ops = { .f_add = rte_table_hash_ext_entry_add, .f_delete = rte_table_hash_ext_entry_delete, .f_lookup = rte_table_hash_ext_lookup_dosig, + .f_stats = rte_table_hash_ext_stats_read, }; -- 1.7.9.5
[dpdk-dev] [PATCH v4 03/10] table: added array table stats
From: Maciej Gajdzica

Added statistics for array table.

Signed-off-by: Maciej Gajdzica
---
 lib/librte_table/rte_table_array.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/lib/librte_table/rte_table_array.c b/lib/librte_table/rte_table_array.c
index c031070..d75cc25 100644
--- a/lib/librte_table/rte_table_array.c
+++ b/lib/librte_table/rte_table_array.c
@@ -42,7 +42,23 @@
 #include "rte_table_array.h"
 
+#ifdef RTE_TABLE_STATS_COLLECT
+
+#define RTE_TABLE_ARRAY_STATS_PKTS_IN_ADD(table, val) \
+	table->stats.n_pkts_in += val
+#define RTE_TABLE_ARRAY_STATS_PKTS_LOOKUP_MISS(table, val) \
+	table->stats.n_pkts_lookup_miss += val
+
+#else
+
+#define RTE_TABLE_ARRAY_STATS_PKTS_IN_ADD(table, val)
+#define RTE_TABLE_ARRAY_STATS_PKTS_LOOKUP_MISS(table, val)
+
+#endif
+
 struct rte_table_array {
+	struct rte_table_stats stats;
+
 	/* Input parameters */
 	uint32_t entry_size;
 	uint32_t n_entries;
@@ -164,7 +180,8 @@ rte_table_array_lookup(
 	void **entries)
 {
 	struct rte_table_array *t = (struct rte_table_array *) table;
-
+	__rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask);
+	RTE_TABLE_ARRAY_STATS_PKTS_IN_ADD(t, n_pkts_in);
 	*lookup_hit_mask = pkts_mask;
 
 	if ((pkts_mask & (pkts_mask + 1)) == 0) {
@@ -196,10 +213,25 @@ rte_table_array_lookup(
 	return 0;
 }
 
+static int
+rte_table_array_stats_read(void *table, struct rte_table_stats *stats, int clear)
+{
+	struct rte_table_array *array = (struct rte_table_array *) table;
+
+	if (stats != NULL)
+		memcpy(stats, &array->stats, sizeof(array->stats));
+
+	if (clear)
+		memset(&array->stats, 0, sizeof(array->stats));
+
+	return 0;
+}
+
 struct rte_table_ops rte_table_array_ops = {
 	.f_create = rte_table_array_create,
 	.f_free = rte_table_array_free,
 	.f_add = rte_table_array_entry_add,
 	.f_delete = NULL,
 	.f_lookup = rte_table_array_lookup,
+	.f_stats = rte_table_array_stats_read,
 };
-- 
1.7.9.5
[dpdk-dev] [PATCH v4 02/10] table: added acl table stats
From: Maciej Gajdzica Added statistics for ACL table. Signed-off-by: Maciej Gajdzica --- lib/librte_table/rte_table_acl.c | 35 +++ 1 file changed, 35 insertions(+) diff --git a/lib/librte_table/rte_table_acl.c b/lib/librte_table/rte_table_acl.c index 4416311..f02de3e 100644 --- a/lib/librte_table/rte_table_acl.c +++ b/lib/librte_table/rte_table_acl.c @@ -43,7 +43,23 @@ #include "rte_table_acl.h" #include +#ifdef RTE_TABLE_STATS_COLLECT + +#define RTE_TABLE_ACL_STATS_PKTS_IN_ADD(table, val) \ + table->stats.n_pkts_in += val +#define RTE_TABLE_ACL_STATS_PKTS_LOOKUP_MISS(table, val) \ + table->stats.n_pkts_lookup_miss += val + +#else + +#define RTE_TABLE_ACL_STATS_PKTS_IN_ADD(table, val) +#define RTE_TABLE_ACL_STATS_PKTS_LOOKUP_MISS(table, val) + +#endif + struct rte_table_acl { + struct rte_table_stats stats; + /* Low-level ACL table */ char name[2][RTE_ACL_NAMESIZE]; struct rte_acl_param acl_params; /* for creating low level acl table */ @@ -441,6 +457,9 @@ rte_table_acl_lookup( uint64_t pkts_out_mask; uint32_t n_pkts, i, j; + __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); + RTE_TABLE_ACL_STATS_PKTS_IN_ADD(acl, n_pkts_in); + /* Input conversion */ for (i = 0, j = 0; i < (uint32_t)(RTE_PORT_IN_BURST_SIZE_MAX - __builtin_clzll(pkts_mask)); i++) { @@ -478,6 +497,21 @@ rte_table_acl_lookup( } *lookup_hit_mask = pkts_out_mask; + RTE_TABLE_ACL_STATS_PKTS_LOOKUP_MISS(acl, n_pkts_in - __builtin_popcountll(pkts_out_mask)); + + return 0; +} + +static int +rte_table_acl_stats_read(void *table, struct rte_table_stats *stats, int clear) +{ + struct rte_table_acl *acl = (struct rte_table_acl *) table; + + if (stats != NULL) + memcpy(stats, &acl->stats, sizeof(acl->stats)); + + if (clear) + memset(&acl->stats, 0, sizeof(acl->stats)); return 0; } @@ -488,4 +522,5 @@ struct rte_table_ops rte_table_acl_ops = { .f_add = rte_table_acl_entry_add, .f_delete = rte_table_acl_entry_delete, .f_lookup = rte_table_acl_lookup, + .f_stats = rte_table_acl_stats_read, }; -- 
1.7.9.5
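The accounting pattern in `rte_table_acl_lookup()` (and repeated in the other lookup patches) is: take the popcount of the input mask on entry as `n_pkts_in`, then on exit charge every packet absent from the hit mask as a lookup miss. A compact sketch of just that arithmetic:

```c
#include <stdint.h>

struct lookup_stats { uint64_t n_pkts_in; uint64_t n_pkts_lookup_miss; };

/* Stats update bracketing a lookup, as in rte_table_acl_lookup():
 * n_pkts_in is the popcount of the input mask taken on entry, and
 * misses are whatever did not make it into the hit mask. */
void account_lookup(struct lookup_stats *s, uint64_t pkts_mask,
		uint64_t lookup_hit_mask)
{
	uint32_t n_pkts_in = (uint32_t)__builtin_popcountll(pkts_mask);
	uint32_t n_hits = (uint32_t)__builtin_popcountll(lookup_hit_mask);

	s->n_pkts_in += n_pkts_in;
	s->n_pkts_lookup_miss += n_pkts_in - n_hits;
}
```

This works because the hit mask is always a subset of the input mask, so the subtraction can never underflow.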
[dpdk-dev] [PATCH v4 01/10] table: added structure for storing table stats and config option
From: Maciej Gajdzica

Added a common structure for table statistics and a config option to
enable table stats collection.

Signed-off-by: Maciej Gajdzica
---
 config/common_bsdapp         |  1 +
 config/common_linuxapp       |  1 +
 lib/librte_table/rte_table.h | 25 +
 3 files changed, 27 insertions(+)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 1d26956..68d5110 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -389,6 +389,7 @@ RTE_PORT_STATS_COLLECT=n
 # Compile librte_table
 #
 CONFIG_RTE_LIBRTE_TABLE=y
+RTE_TABLE_STATS_COLLECT=n
 
 #
 # Compile librte_pipeline
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5105b25..7e9b7fa 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -396,6 +396,7 @@ RTE_PORT_STATS_COLLECT=n
 # Compile librte_table
 #
 CONFIG_RTE_LIBRTE_TABLE=y
+RTE_TABLE_STATS_COLLECT=n
 
 #
 # Compile librte_pipeline
diff --git a/lib/librte_table/rte_table.h b/lib/librte_table/rte_table.h
index 6e51fe6..1732fbf 100644
--- a/lib/librte_table/rte_table.h
+++ b/lib/librte_table/rte_table.h
@@ -59,6 +59,12 @@ extern "C" {
 
 struct rte_mbuf;
 
+/** Lookup table statistics */
+struct rte_table_stats {
+	uint64_t n_pkts_in;
+	uint64_t n_pkts_lookup_miss;
+};
+
 /**
  * Lookup table create
  *
@@ -187,6 +193,24 @@ typedef int (*rte_table_op_lookup)(
 	uint64_t *lookup_hit_mask,
 	void **entries);
 
+/**
+ * Lookup table stats read
+ *
+ * @param table
+ *   Handle to lookup table instance
+ * @param stats
+ *   Handle to table stats struct to copy data
+ * @param clear
+ *   Flag indicating that stats should be cleared after read
+ *
+ * @return
+ *   Error code or 0 on success.
+ */
+typedef int (*rte_table_op_stats_read)(
+	void *table,
+	struct rte_table_stats *stats,
+	int clear);
+
 /** Lookup table interface defining the lookup table operation */
 struct rte_table_ops {
 	rte_table_op_create f_create;       /**< Create */
@@ -194,6 +218,7 @@ struct rte_table_ops {
 	rte_table_op_entry_add f_add;       /**< Entry add */
 	rte_table_op_entry_delete f_delete; /**< Entry delete */
 	rte_table_op_lookup f_lookup;       /**< Lookup */
+	rte_table_op_stats_read f_stats;    /**< Stats */
 };
 
 #ifdef __cplusplus
-- 
1.7.9.5
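The per-table counter macros added in the follow-up patches are gated on the config option introduced here, so a build with stats disabled pays no cost: the macros expand to nothing and the increments disappear entirely. A self-contained toy reproduction of the pattern (toy names; flip the `#define` to see the disabled build):

```c
#include <stdint.h>

/* Same gating pattern as the RTE_TABLE_*_STATS_* macros: when the
 * collect flag is undefined, the macros expand to nothing and the
 * counter updates vanish from the compiled code. */
#define TOY_STATS_COLLECT 1

struct toy_table { uint64_t n_pkts_in; uint64_t n_pkts_lookup_miss; };

#ifdef TOY_STATS_COLLECT
#define TOY_STATS_PKTS_IN_ADD(t, val) ((t)->n_pkts_in += (val))
#define TOY_STATS_PKTS_LOOKUP_MISS(t, val) ((t)->n_pkts_lookup_miss += (val))
#else
#define TOY_STATS_PKTS_IN_ADD(t, val)
#define TOY_STATS_PKTS_LOOKUP_MISS(t, val)
#endif

/* Hypothetical lookup: pretend only the low four burst slots can hit,
 * just to exercise the counters. */
uint64_t toy_lookup(struct toy_table *t, uint64_t pkts_mask)
{
	TOY_STATS_PKTS_IN_ADD(t, __builtin_popcountll(pkts_mask));

	uint64_t hit_mask = pkts_mask & 0x0F;

	TOY_STATS_PKTS_LOOKUP_MISS(t, __builtin_popcountll(pkts_mask) -
			__builtin_popcountll(hit_mask));
	return hit_mask;
}
```

This is why the series needs only a single build-time option rather than per-table runtime switches: the choice is made once by the preprocessor.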
[dpdk-dev] [PATCH v4 00/10] table: added table statistics
From: Maciej Gajdzica

Added statistics for every type of table. By default all table statistics
are disabled; the user must activate them in the config file.

Changes in v2:
	- added missing signoffs

Changes in v3:
	- removed new config options to enable/disable stats
	- using RTE_LOG_LEVEL instead

Changes in v4:
	- created single config option for all table statistics

Maciej Gajdzica (10):
  table: added structure for storing table stats and config option
  table: added acl table stats
  table: added array table stats
  table: added hash_ext table stats
  table: added hash_key16 table stats
  table: added hash_key32 table stats
  table: added hash_key8 table stats
  table: added hash_lru table stats
  table: added lpm_ipv6 table stats
  table: added lpm table stats

 config/common_bsdapp                    |  1 +
 config/common_linuxapp                  |  1 +
 lib/librte_table/rte_table.h            | 25 +++
 lib/librte_table/rte_table_acl.c        | 35 +
 lib/librte_table/rte_table_array.c      | 34 +++-
 lib/librte_table/rte_table_hash_ext.c   | 44 ++
 lib/librte_table/rte_table_hash_key16.c | 41
 lib/librte_table/rte_table_hash_key32.c | 41
 lib/librte_table/rte_table_hash_key8.c  | 52 +++
 lib/librte_table/rte_table_hash_lru.c   | 44 ++
 lib/librte_table/rte_table_lpm.c        | 34
 lib/librte_table/rte_table_lpm_ipv6.c   | 34
 12 files changed, 385 insertions(+), 1 deletion(-)

-- 
1.7.9.5
[dpdk-dev] [PATCH v4 13/13] port: added port_sink stats
From: Maciej Gajdzica Added statistics for sink port. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_source_sink.c | 63 ++-- 1 file changed, 59 insertions(+), 4 deletions(-) diff --git a/lib/librte_port/rte_port_source_sink.c b/lib/librte_port/rte_port_source_sink.c index 234ab18..0a70228 100644 --- a/lib/librte_port/rte_port_source_sink.c +++ b/lib/librte_port/rte_port_source_sink.c @@ -133,28 +133,64 @@ rte_port_source_stats_read(void *port, /* * Port SINK */ +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_SINK_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_SINK_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_SINK_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_SINK_STATS_PKTS_DROP_ADD(port, val) + +#endif + +struct rte_port_sink { + struct rte_port_out_stats stats; +}; + static void * -rte_port_sink_create(__rte_unused void *params, __rte_unused int socket_id) +rte_port_sink_create(__rte_unused void *params, int socket_id) { - return (void *) 1; + struct rte_port_sink *port; + + /* Memory allocation */ + port = rte_zmalloc_socket("PORT", sizeof(*port), + RTE_CACHE_LINE_SIZE, socket_id); + if (port == NULL) { + RTE_LOG(ERR, PORT, "%s: Failed to allocate port\n", __func__); + return NULL; + } + + return port; } static int -rte_port_sink_tx(__rte_unused void *port, struct rte_mbuf *pkt) +rte_port_sink_tx(void *port, struct rte_mbuf *pkt) { + __rte_unused struct rte_port_sink *p = (struct rte_port_sink *) port; + + RTE_PORT_SINK_STATS_PKTS_IN_ADD(p, 1); rte_pktmbuf_free(pkt); + RTE_PORT_SINK_STATS_PKTS_DROP_ADD(p, 1); return 0; } static int -rte_port_sink_tx_bulk(__rte_unused void *port, struct rte_mbuf **pkts, +rte_port_sink_tx_bulk(void *port, struct rte_mbuf **pkts, uint64_t pkts_mask) { + __rte_unused struct rte_port_sink *p = (struct rte_port_sink *) port; + if ((pkts_mask & (pkts_mask + 1)) == 0) { uint64_t n_pkts = __builtin_popcountll(pkts_mask); uint32_t i; + 
RTE_PORT_SINK_STATS_PKTS_IN_ADD(p, n_pkts); + RTE_PORT_SINK_STATS_PKTS_DROP_ADD(p, n_pkts); for (i = 0; i < n_pkts; i++) { struct rte_mbuf *pkt = pkts[i]; @@ -166,6 +202,8 @@ rte_port_sink_tx_bulk(__rte_unused void *port, struct rte_mbuf **pkts, uint64_t pkt_mask = 1LLU << pkt_index; struct rte_mbuf *pkt = pkts[pkt_index]; + RTE_PORT_SINK_STATS_PKTS_IN_ADD(p, 1); + RTE_PORT_SINK_STATS_PKTS_DROP_ADD(p, 1); rte_pktmbuf_free(pkt); pkts_mask &= ~pkt_mask; } @@ -174,6 +212,22 @@ rte_port_sink_tx_bulk(__rte_unused void *port, struct rte_mbuf **pkts, return 0; } +static int +rte_port_sink_stats_read(void *port, struct rte_port_out_stats *stats, + int clear) +{ + struct rte_port_sink *p = + (struct rte_port_sink *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Summary of port operations */ @@ -190,4 +244,5 @@ struct rte_port_out_ops rte_port_sink_ops = { .f_tx = rte_port_sink_tx, .f_tx_bulk = rte_port_sink_tx_bulk, .f_flush = NULL, + .f_stats = rte_port_sink_stats_read, }; -- 1.7.9.5
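When the burst mask is sparse, `rte_port_sink_tx_bulk()` (like the other bulk paths in this series) walks the set bits one at a time: take the lowest set bit's index with `__builtin_ctzll`, handle that slot, then clear the bit. A sketch of the walk that records the visited indices instead of freeing mbufs (hypothetical helper, not a DPDK API):

```c
#include <stdint.h>

/* Visit each set bit of pkts_mask from lowest to highest, mirroring the
 * sparse-mask loop in rte_port_sink_tx_bulk(). Writes the visited slot
 * indices to indices_out (caller provides room for up to 64) and returns
 * how many were visited. */
uint32_t visit_set_bits(uint64_t pkts_mask, uint32_t indices_out[64])
{
	uint32_t n = 0;

	for ( ; pkts_mask; ) {
		uint32_t pkt_index = (uint32_t)__builtin_ctzll(pkts_mask);
		uint64_t pkt_mask = 1LLU << pkt_index;

		indices_out[n++] = pkt_index; /* stand-in for handling pkts[pkt_index] */
		pkts_mask &= ~pkt_mask;
	}
	return n;
}
```

The loop runs once per packet actually present, so a nearly empty mask costs almost nothing; the dense-mask case is handled separately by the contiguous-burst fast path.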
[dpdk-dev] [PATCH v4 12/13] port: added port_source stats
From: Maciej Gajdzica Added statistics for source port. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_source_sink.c | 35 1 file changed, 35 insertions(+) diff --git a/lib/librte_port/rte_port_source_sink.c b/lib/librte_port/rte_port_source_sink.c index b9a25bb..234ab18 100644 --- a/lib/librte_port/rte_port_source_sink.c +++ b/lib/librte_port/rte_port_source_sink.c @@ -42,7 +42,23 @@ /* * Port SOURCE */ +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_SOURCE_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_SOURCE_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_SOURCE_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_SOURCE_STATS_PKTS_DROP_ADD(port, val) + +#endif + struct rte_port_source { + struct rte_port_in_stats stats; + struct rte_mempool *mempool; }; @@ -93,9 +109,27 @@ rte_port_source_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts) if (rte_mempool_get_bulk(p->mempool, (void **) pkts, n_pkts) != 0) return 0; + RTE_PORT_SOURCE_STATS_PKTS_IN_ADD(p, n_pkts); + return n_pkts; } +static int +rte_port_source_stats_read(void *port, + struct rte_port_in_stats *stats, int clear) +{ + struct rte_port_source *p = + (struct rte_port_source *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Port SINK */ @@ -147,6 +181,7 @@ struct rte_port_in_ops rte_port_source_ops = { .f_create = rte_port_source_create, .f_free = rte_port_source_free, .f_rx = rte_port_source_rx, + .f_stats = rte_port_source_stats_read, }; struct rte_port_out_ops rte_port_sink_ops = { -- 1.7.9.5
[dpdk-dev] [PATCH v4 11/13] port: added port_sched_writer stats
From: Maciej Gajdzica Added statistics for sched writer port. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_sched.c | 57 ++ 1 file changed, 52 insertions(+), 5 deletions(-) diff --git a/lib/librte_port/rte_port_sched.c b/lib/librte_port/rte_port_sched.c index a82e4fa..c5ff8ab 100644 --- a/lib/librte_port/rte_port_sched.c +++ b/lib/librte_port/rte_port_sched.c @@ -132,7 +132,23 @@ rte_port_sched_reader_stats_read(void *port, /* * Writer */ +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_SCHED_WRITER_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_SCHED_WRITER_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_SCHED_WRITER_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_SCHED_WRITER_STATS_PKTS_DROP_ADD(port, val) + +#endif + struct rte_port_sched_writer { + struct rte_port_out_stats stats; + struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX]; struct rte_sched_port *sched; uint32_t tx_burst_sz; @@ -180,8 +196,12 @@ rte_port_sched_writer_tx(void *port, struct rte_mbuf *pkt) struct rte_port_sched_writer *p = (struct rte_port_sched_writer *) port; p->tx_buf[p->tx_buf_count++] = pkt; + RTE_PORT_SCHED_WRITER_STATS_PKTS_IN_ADD(p, 1); if (p->tx_buf_count >= p->tx_burst_sz) { - rte_sched_port_enqueue(p->sched, p->tx_buf, p->tx_buf_count); + __rte_unused uint32_t nb_tx; + + nb_tx = rte_sched_port_enqueue(p->sched, p->tx_buf, p->tx_buf_count); + RTE_PORT_SCHED_WRITER_STATS_PKTS_DROP_ADD(p, p->tx_buf_count - nb_tx); p->tx_buf_count = 0; } @@ -200,15 +220,18 @@ rte_port_sched_writer_tx_bulk(void *port, ((pkts_mask & bsz_mask) ^ bsz_mask); if (expr == 0) { + __rte_unused uint32_t nb_tx; uint64_t n_pkts = __builtin_popcountll(pkts_mask); if (tx_buf_count) { - rte_sched_port_enqueue(p->sched, p->tx_buf, + nb_tx = rte_sched_port_enqueue(p->sched, p->tx_buf, tx_buf_count); + RTE_PORT_SCHED_WRITER_STATS_PKTS_DROP_ADD(p, tx_buf_count - nb_tx); p->tx_buf_count = 0; } - 
rte_sched_port_enqueue(p->sched, pkts, n_pkts); + nb_tx = rte_sched_port_enqueue(p->sched, pkts, n_pkts); + RTE_PORT_SCHED_WRITER_STATS_PKTS_DROP_ADD(p, n_pkts - nb_tx); } else { for ( ; pkts_mask; ) { uint32_t pkt_index = __builtin_ctzll(pkts_mask); @@ -216,13 +239,17 @@ rte_port_sched_writer_tx_bulk(void *port, struct rte_mbuf *pkt = pkts[pkt_index]; p->tx_buf[tx_buf_count++] = pkt; + RTE_PORT_SCHED_WRITER_STATS_PKTS_IN_ADD(p, 1); pkts_mask &= ~pkt_mask; } p->tx_buf_count = tx_buf_count; if (tx_buf_count >= p->tx_burst_sz) { - rte_sched_port_enqueue(p->sched, p->tx_buf, + __rte_unused uint32_t nb_tx; + + nb_tx = rte_sched_port_enqueue(p->sched, p->tx_buf, tx_buf_count); + RTE_PORT_SCHED_WRITER_STATS_PKTS_DROP_ADD(p, tx_buf_count - nb_tx); p->tx_buf_count = 0; } } @@ -236,7 +263,10 @@ rte_port_sched_writer_flush(void *port) struct rte_port_sched_writer *p = (struct rte_port_sched_writer *) port; if (p->tx_buf_count) { - rte_sched_port_enqueue(p->sched, p->tx_buf, p->tx_buf_count); + __rte_unused uint32_t nb_tx; + + nb_tx = rte_sched_port_enqueue(p->sched, p->tx_buf, p->tx_buf_count); + RTE_PORT_SCHED_WRITER_STATS_PKTS_DROP_ADD(p, p->tx_buf_count - nb_tx); p->tx_buf_count = 0; } @@ -257,6 +287,22 @@ rte_port_sched_writer_free(void *port) return 0; } +static int +rte_port_sched_writer_stats_read(void *port, + struct rte_port_out_stats *stats, int clear) +{ + struct rte_port_sched_writer *p = + (struct rte_port_sched_writer *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Summary of port operations */ @@ -273,4 +319,5 @@ struct rte_port_out_ops rte_port_sched_writer_ops = { .f_tx = rte_port_sched_writer_tx, .f_tx_bulk = rte_port_sched_writer_tx_bulk, .f_flush = rte_port_sched_writer_flush, + .f_stats = rte_port_sched_writer_stats_read, }; -- 1.7.9.5
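The key change in the sched-writer patch is that every `rte_sched_port_enqueue()` call site now keeps the return value and charges the shortfall (`submitted - accepted`) as dropped packets. A sketch of that accounting with a toy bounded queue standing in for the scheduler (assumed names, not DPDK APIs):

```c
#include <stdint.h>

/* Toy bounded enqueue standing in for rte_sched_port_enqueue(): takes as
 * many items as there is room for and returns how many it accepted. */
struct toy_queue { uint32_t count; uint32_t capacity; };

uint32_t toy_enqueue(struct toy_queue *q, uint32_t n_pkts)
{
	uint32_t room = q->capacity - q->count;
	uint32_t nb_tx = n_pkts < room ? n_pkts : room;

	q->count += nb_tx;
	return nb_tx;
}

/* The accounting the patch wraps around each enqueue:
 * drops += submitted - accepted. */
void flush_with_stats(struct toy_queue *q, uint32_t n_pkts,
		uint64_t *n_pkts_drop)
{
	uint32_t nb_tx = toy_enqueue(q, n_pkts);

	*n_pkts_drop += n_pkts - nb_tx;
}
```

Before this patch the enqueue return value was discarded, so packets the scheduler refused were silently lost with no counter to show for it.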
[dpdk-dev] [PATCH v4 10/13] port: added port_sched_reader stats
From: Maciej Gajdzica Added statistics for sched reader port. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_sched.c | 39 +- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/lib/librte_port/rte_port_sched.c b/lib/librte_port/rte_port_sched.c index 2107f4c..a82e4fa 100644 --- a/lib/librte_port/rte_port_sched.c +++ b/lib/librte_port/rte_port_sched.c @@ -40,7 +40,23 @@ /* * Reader */ +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_SCHED_READER_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_SCHED_READER_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_SCHED_READER_PKTS_IN_ADD(port, val) +#define RTE_PORT_SCHED_READER_PKTS_DROP_ADD(port, val) + +#endif + struct rte_port_sched_reader { + struct rte_port_in_stats stats; + struct rte_sched_port *sched; }; @@ -76,8 +92,12 @@ static int rte_port_sched_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts) { struct rte_port_sched_reader *p = (struct rte_port_sched_reader *) port; + uint32_t nb_rx; - return rte_sched_port_dequeue(p->sched, pkts, n_pkts); + nb_rx = rte_sched_port_dequeue(p->sched, pkts, n_pkts); + RTE_PORT_SCHED_READER_PKTS_IN_ADD(p, nb_rx); + + return nb_rx; } static int @@ -93,6 +113,22 @@ rte_port_sched_reader_free(void *port) return 0; } +static int +rte_port_sched_reader_stats_read(void *port, + struct rte_port_in_stats *stats, int clear) +{ + struct rte_port_sched_reader *p = + (struct rte_port_sched_reader *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Writer */ @@ -228,6 +264,7 @@ struct rte_port_in_ops rte_port_sched_reader_ops = { .f_create = rte_port_sched_reader_create, .f_free = rte_port_sched_reader_free, .f_rx = rte_port_sched_reader_rx, + .f_stats = rte_port_sched_reader_stats_read, }; struct rte_port_out_ops rte_port_sched_writer_ops = { -- 1.7.9.5
[dpdk-dev] [PATCH v4 09/13] port: added port_ring_writer_nodrop stats
From: Maciej Gajdzica Added statistics for ring writer nodrop port. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_ring.c | 37 + 1 file changed, 37 insertions(+) diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c index ff58009..9461c05 100644 --- a/lib/librte_port/rte_port_ring.c +++ b/lib/librte_port/rte_port_ring.c @@ -309,7 +309,23 @@ rte_port_ring_writer_stats_read(void *port, /* * Port RING Writer Nodrop */ +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_RING_WRITER_NODROP_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_RING_WRITER_NODROP_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_RING_WRITER_NODROP_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_RING_WRITER_NODROP_STATS_PKTS_DROP_ADD(port, val) + +#endif + struct rte_port_ring_writer_nodrop { + struct rte_port_out_stats stats; + struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX]; struct rte_ring *ring; uint32_t tx_burst_sz; @@ -379,6 +395,7 @@ send_burst_nodrop(struct rte_port_ring_writer_nodrop *p) } /* We didn't send the packets in maximum allowed attempts */ + RTE_PORT_RING_WRITER_NODROP_STATS_PKTS_DROP_ADD(p, p->tx_buf_count - nb_tx); for ( ; nb_tx < p->tx_buf_count; nb_tx++) rte_pktmbuf_free(p->tx_buf[nb_tx]); @@ -392,6 +409,7 @@ rte_port_ring_writer_nodrop_tx(void *port, struct rte_mbuf *pkt) (struct rte_port_ring_writer_nodrop *) port; p->tx_buf[p->tx_buf_count++] = pkt; + RTE_PORT_RING_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1); if (p->tx_buf_count >= p->tx_burst_sz) send_burst_nodrop(p); @@ -418,6 +436,7 @@ rte_port_ring_writer_nodrop_tx_bulk(void *port, if (tx_buf_count) send_burst_nodrop(p); + RTE_PORT_RING_WRITER_NODROP_STATS_PKTS_IN_ADD(p, n_pkts); n_pkts_ok = rte_ring_sp_enqueue_burst(p->ring, (void **)pkts, n_pkts); if (n_pkts_ok >= n_pkts) @@ -439,6 +458,7 @@ rte_port_ring_writer_nodrop_tx_bulk(void *port, struct rte_mbuf *pkt = pkts[pkt_index]; p->tx_buf[tx_buf_count++] = 
pkt; + RTE_PORT_RING_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1); pkts_mask &= ~pkt_mask; } @@ -476,6 +496,22 @@ rte_port_ring_writer_nodrop_free(void *port) return 0; } +static int +rte_port_ring_writer_nodrop_stats_read(void *port, + struct rte_port_out_stats *stats, int clear) +{ + struct rte_port_ring_writer_nodrop *p = + (struct rte_port_ring_writer_nodrop *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Summary of port operations */ @@ -501,4 +537,5 @@ struct rte_port_out_ops rte_port_ring_writer_nodrop_ops = { .f_tx = rte_port_ring_writer_nodrop_tx, .f_tx_bulk = rte_port_ring_writer_nodrop_tx_bulk, .f_flush = rte_port_ring_writer_nodrop_flush, + .f_stats = rte_port_ring_writer_nodrop_stats_read, }; -- 1.7.9.5
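The "nodrop" writer only gives up after its retry budget is exhausted, which is the point `send_burst_nodrop()` instruments with the new drop counter. A simplified sketch of that shape, where the enqueue's per-attempt capacity is a demo assumption (the real port retries `rte_ring_sp_enqueue_burst` on the remainder):

```c
#include <stdint.h>

/* Shape of send_burst_nodrop(): keep retrying the enqueue of whatever is
 * left; only when the attempt budget runs out is the remainder freed and
 * counted as dropped. 'accept_per_try' models how much each enqueue
 * attempt accepts (assumption for the demo). Returns the drop count,
 * i.e. the value fed to ..._STATS_PKTS_DROP_ADD. */
uint64_t send_burst_nodrop_sketch(uint32_t n_pkts, uint32_t accept_per_try,
		uint32_t n_retries)
{
	uint32_t nb_tx = 0, i;

	for (i = 0; i <= n_retries && nb_tx < n_pkts; i++) {
		uint32_t left = n_pkts - nb_tx;
		uint32_t took = left < accept_per_try ? left : accept_per_try;

		nb_tx += took;
	}

	/* We didn't send the packets in the maximum allowed attempts. */
	return n_pkts - nb_tx;
}
```

So even a "nodrop" port can drop; the difference from the plain writer is only how hard it tries first, which is why it needs its own drop statistic.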
[dpdk-dev] [PATCH v4 08/13] port: added port_ring_writer stats
From: Maciej Gajdzica Added statistics for port writer port. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_ring.c | 38 ++ 1 file changed, 38 insertions(+) diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c index 091b052..ff58009 100644 --- a/lib/librte_port/rte_port_ring.c +++ b/lib/librte_port/rte_port_ring.c @@ -133,7 +133,23 @@ rte_port_ring_reader_stats_read(void *port, /* * Port RING Writer */ +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_RING_WRITER_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_RING_WRITER_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_RING_WRITER_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_RING_WRITER_STATS_PKTS_DROP_ADD(port, val) + +#endif + struct rte_port_ring_writer { + struct rte_port_out_stats stats; + struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX]; struct rte_ring *ring; uint32_t tx_burst_sz; @@ -181,6 +197,7 @@ send_burst(struct rte_port_ring_writer *p) nb_tx = rte_ring_sp_enqueue_burst(p->ring, (void **)p->tx_buf, p->tx_buf_count); + RTE_PORT_RING_WRITER_STATS_PKTS_DROP_ADD(p, p->tx_buf_count - nb_tx); for ( ; nb_tx < p->tx_buf_count; nb_tx++) rte_pktmbuf_free(p->tx_buf[nb_tx]); @@ -193,6 +210,7 @@ rte_port_ring_writer_tx(void *port, struct rte_mbuf *pkt) struct rte_port_ring_writer *p = (struct rte_port_ring_writer *) port; p->tx_buf[p->tx_buf_count++] = pkt; + RTE_PORT_RING_WRITER_STATS_PKTS_IN_ADD(p, 1); if (p->tx_buf_count >= p->tx_burst_sz) send_burst(p); @@ -219,8 +237,10 @@ rte_port_ring_writer_tx_bulk(void *port, if (tx_buf_count) send_burst(p); + RTE_PORT_RING_WRITER_STATS_PKTS_IN_ADD(p, n_pkts); n_pkts_ok = rte_ring_sp_enqueue_burst(p->ring, (void **)pkts, n_pkts); + RTE_PORT_RING_WRITER_STATS_PKTS_DROP_ADD(p, n_pkts - n_pkts_ok); for ( ; n_pkts_ok < n_pkts; n_pkts_ok++) { struct rte_mbuf *pkt = pkts[n_pkts_ok]; @@ -233,6 +253,7 @@ rte_port_ring_writer_tx_bulk(void *port, struct rte_mbuf 
*pkt = pkts[pkt_index]; p->tx_buf[tx_buf_count++] = pkt; + RTE_PORT_RING_WRITER_STATS_PKTS_IN_ADD(p, 1); pkts_mask &= ~pkt_mask; } @@ -269,6 +290,22 @@ rte_port_ring_writer_free(void *port) return 0; } +static int +rte_port_ring_writer_stats_read(void *port, + struct rte_port_out_stats *stats, int clear) +{ + struct rte_port_ring_writer *p = + (struct rte_port_ring_writer *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Port RING Writer Nodrop */ @@ -455,6 +492,7 @@ struct rte_port_out_ops rte_port_ring_writer_ops = { .f_tx = rte_port_ring_writer_tx, .f_tx_bulk = rte_port_ring_writer_tx_bulk, .f_flush = rte_port_ring_writer_flush, + .f_stats = rte_port_ring_writer_stats_read, }; struct rte_port_out_ops rte_port_ring_writer_nodrop_ops = { -- 1.7.9.5
[dpdk-dev] [PATCH v4 07/13] port: added port_ring_reader stats
From: Maciej Gajdzica Added statistics for ring reader port. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_ring.c | 39 ++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c index 89b9641..091b052 100644 --- a/lib/librte_port/rte_port_ring.c +++ b/lib/librte_port/rte_port_ring.c @@ -42,7 +42,23 @@ /* * Port RING Reader */ +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_RING_READER_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_RING_READER_STATS_PKTS_DROP_ADD(port, val) + +#endif + struct rte_port_ring_reader { + struct rte_port_in_stats stats; + struct rte_ring *ring; }; @@ -77,8 +93,12 @@ static int rte_port_ring_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts) { struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port; + uint32_t nb_rx; - return rte_ring_sc_dequeue_burst(p->ring, (void **) pkts, n_pkts); + nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts, n_pkts); + RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx); + + return nb_rx; } static int @@ -94,6 +114,22 @@ rte_port_ring_reader_free(void *port) return 0; } +static int +rte_port_ring_reader_stats_read(void *port, + struct rte_port_in_stats *stats, int clear) +{ + struct rte_port_ring_reader *p = + (struct rte_port_ring_reader *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Port RING Writer */ @@ -410,6 +446,7 @@ struct rte_port_in_ops rte_port_ring_reader_ops = { .f_create = rte_port_ring_reader_create, .f_free = rte_port_ring_reader_free, .f_rx = rte_port_ring_reader_rx, + .f_stats = rte_port_ring_reader_stats_read, }; struct rte_port_out_ops rte_port_ring_writer_ops = { -- 
1.7.9.5
[dpdk-dev] [PATCH v4 06/13] port: added port_ras stats
From: Maciej Gajdzica Added statistics for IPv4 and IPv6 reassembly ports. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_ras.c | 38 ++ 1 file changed, 38 insertions(+) diff --git a/lib/librte_port/rte_port_ras.c b/lib/librte_port/rte_port_ras.c index 5eb627a..2c1822a 100644 --- a/lib/librte_port/rte_port_ras.c +++ b/lib/librte_port/rte_port_ras.c @@ -51,6 +51,20 @@ #define RTE_PORT_RAS_N_ENTRIES (RTE_PORT_RAS_N_BUCKETS * RTE_PORT_RAS_N_ENTRIES_PER_BUCKET) #endif +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_RING_WRITER_RAS_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_RING_WRITER_RAS_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_RING_WRITER_RAS_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_RING_WRITER_RAS_STATS_PKTS_DROP_ADD(port, val) + +#endif + struct rte_port_ring_writer_ras; typedef void (*ras_op)( @@ -63,6 +77,8 @@ static void process_ipv6(struct rte_port_ring_writer_ras *p, struct rte_mbuf *pkt); struct rte_port_ring_writer_ras { + struct rte_port_out_stats stats; + struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX]; struct rte_ring *ring; uint32_t tx_burst_sz; @@ -153,6 +169,7 @@ send_burst(struct rte_port_ring_writer_ras *p) nb_tx = rte_ring_sp_enqueue_burst(p->ring, (void **)p->tx_buf, p->tx_buf_count); + RTE_PORT_RING_WRITER_RAS_STATS_PKTS_DROP_ADD(p, p->tx_buf_count - nb_tx); for ( ; nb_tx < p->tx_buf_count; nb_tx++) rte_pktmbuf_free(p->tx_buf[nb_tx]); @@ -225,6 +242,7 @@ rte_port_ring_writer_ras_tx(void *port, struct rte_mbuf *pkt) struct rte_port_ring_writer_ras *p = (struct rte_port_ring_writer_ras *) port; + RTE_PORT_RING_WRITER_RAS_STATS_PKTS_IN_ADD(p, 1); p->f_ras(p, pkt); if (p->tx_buf_count >= p->tx_burst_sz) send_burst(p); @@ -247,6 +265,7 @@ rte_port_ring_writer_ras_tx_bulk(void *port, for (i = 0; i < n_pkts; i++) { struct rte_mbuf *pkt = pkts[i]; + RTE_PORT_RING_WRITER_RAS_STATS_PKTS_IN_ADD(p, 1); p->f_ras(p, pkt); if (p->tx_buf_count >= 
p->tx_burst_sz) send_burst(p); @@ -257,6 +276,7 @@ rte_port_ring_writer_ras_tx_bulk(void *port, uint64_t pkt_mask = 1LLU << pkt_index; struct rte_mbuf *pkt = pkts[pkt_index]; + RTE_PORT_RING_WRITER_RAS_STATS_PKTS_IN_ADD(p, 1); p->f_ras(p, pkt); if (p->tx_buf_count >= p->tx_burst_sz) send_burst(p); @@ -298,6 +318,22 @@ rte_port_ring_writer_ras_free(void *port) return 0; } +static int +rte_port_ras_writer_stats_read(void *port, + struct rte_port_out_stats *stats, int clear) +{ + struct rte_port_ring_writer_ras *p = + (struct rte_port_ring_writer_ras *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Summary of port operations */ @@ -307,6 +343,7 @@ struct rte_port_out_ops rte_port_ring_writer_ipv4_ras_ops = { .f_tx = rte_port_ring_writer_ras_tx, .f_tx_bulk = rte_port_ring_writer_ras_tx_bulk, .f_flush = rte_port_ring_writer_ras_flush, + .f_stats = rte_port_ras_writer_stats_read, }; struct rte_port_out_ops rte_port_ring_writer_ipv6_ras_ops = { @@ -315,4 +352,5 @@ struct rte_port_out_ops rte_port_ring_writer_ipv6_ras_ops = { .f_tx = rte_port_ring_writer_ras_tx, .f_tx_bulk = rte_port_ring_writer_ras_tx_bulk, .f_flush = rte_port_ring_writer_ras_flush, + .f_stats = rte_port_ras_writer_stats_read, }; -- 1.7.9.5
[dpdk-dev] [PATCH v4 05/13] port: added port_frag stats
From: Maciej Gajdzica Added statistics for IPv4 and IPv6 fragmentation ports. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_frag.c | 36 1 file changed, 36 insertions(+) diff --git a/lib/librte_port/rte_port_frag.c b/lib/librte_port/rte_port_frag.c index c4c05dc..3720d5d 100644 --- a/lib/librte_port/rte_port_frag.c +++ b/lib/librte_port/rte_port_frag.c @@ -41,6 +41,20 @@ /* Max number of fragments per packet allowed */ #defineRTE_PORT_FRAG_MAX_FRAGS_PER_PACKET 0x80 +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_RING_READER_FRAG_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_RING_READER_FRAG_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_RING_READER_FRAG_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_RING_READER_FRAG_STATS_PKTS_DROP_ADD(port, val) + +#endif + typedef int32_t (*frag_op)(struct rte_mbuf *pkt_in, struct rte_mbuf **pkts_out, @@ -50,6 +64,8 @@ typedef int32_t struct rte_mempool *pool_indirect); struct rte_port_ring_reader_frag { + struct rte_port_in_stats stats; + /* Input parameters */ struct rte_ring *ring; uint32_t mtu; @@ -171,6 +187,7 @@ rte_port_ring_reader_frag_rx(void *port, if (p->n_pkts == 0) { p->n_pkts = rte_ring_sc_dequeue_burst(p->ring, (void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX); + RTE_PORT_RING_READER_FRAG_STATS_PKTS_IN_ADD(p, p->n_pkts); if (p->n_pkts == 0) return n_pkts_out; p->pos_pkts = 0; @@ -203,6 +220,7 @@ rte_port_ring_reader_frag_rx(void *port, if (status < 0) { rte_pktmbuf_free(pkt); + RTE_PORT_RING_READER_FRAG_STATS_PKTS_DROP_ADD(p, 1); continue; } @@ -252,6 +270,22 @@ rte_port_ring_reader_frag_free(void *port) return 0; } +static int +rte_port_frag_reader_stats_read(void *port, + struct rte_port_in_stats *stats, int clear) +{ + struct rte_port_ring_reader_frag *p = + (struct rte_port_ring_reader_frag *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, 
sizeof(p->stats)); + + return 0; +} + /* * Summary of port operations */ @@ -259,10 +293,12 @@ struct rte_port_in_ops rte_port_ring_reader_ipv4_frag_ops = { .f_create = rte_port_ring_reader_ipv4_frag_create, .f_free = rte_port_ring_reader_frag_free, .f_rx = rte_port_ring_reader_frag_rx, + .f_stats = rte_port_frag_reader_stats_read, }; struct rte_port_in_ops rte_port_ring_reader_ipv6_frag_ops = { .f_create = rte_port_ring_reader_ipv6_frag_create, .f_free = rte_port_ring_reader_frag_free, .f_rx = rte_port_ring_reader_frag_rx, + .f_stats = rte_port_frag_reader_stats_read, }; -- 1.7.9.5
[dpdk-dev] [PATCH v4 04/13] port: added port_ethdev_writer_nodrop stats
From: Maciej Gajdzica Added statistics for ethdev writer nodrop port. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_ethdev.c | 36 1 file changed, 36 insertions(+) diff --git a/lib/librte_port/rte_port_ethdev.c b/lib/librte_port/rte_port_ethdev.c index b5b39f8..cee1b33 100644 --- a/lib/librte_port/rte_port_ethdev.c +++ b/lib/librte_port/rte_port_ethdev.c @@ -314,7 +314,23 @@ static int rte_port_ethdev_writer_stats_read(void *port, /* * Port ETHDEV Writer Nodrop */ +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_DROP_ADD(port, val) + +#endif + struct rte_port_ethdev_writer_nodrop { + struct rte_port_out_stats stats; + struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX]; uint32_t tx_burst_sz; uint16_t tx_buf_count; @@ -387,6 +403,7 @@ send_burst_nodrop(struct rte_port_ethdev_writer_nodrop *p) } /* We didn't send the packets in maximum allowed attempts */ + RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_DROP_ADD(p, p->tx_buf_count - nb_tx); for ( ; nb_tx < p->tx_buf_count; nb_tx++) rte_pktmbuf_free(p->tx_buf[nb_tx]); @@ -400,6 +417,7 @@ rte_port_ethdev_writer_nodrop_tx(void *port, struct rte_mbuf *pkt) (struct rte_port_ethdev_writer_nodrop *) port; p->tx_buf[p->tx_buf_count++] = pkt; + RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1); if (p->tx_buf_count >= p->tx_burst_sz) send_burst_nodrop(p); @@ -426,6 +444,7 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void *port, if (tx_buf_count) send_burst_nodrop(p); + RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, n_pkts); n_pkts_ok = rte_eth_tx_burst(p->port_id, p->queue_id, pkts, n_pkts); @@ -448,6 +467,7 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void *port, struct rte_mbuf *pkt = pkts[pkt_index]; 
p->tx_buf[tx_buf_count++] = pkt; + RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1); pkts_mask &= ~pkt_mask; } @@ -485,6 +505,21 @@ rte_port_ethdev_writer_nodrop_free(void *port) return 0; } +static int rte_port_ethdev_writer_nodrop_stats_read(void *port, + struct rte_port_out_stats *stats, int clear) +{ + struct rte_port_ethdev_writer_nodrop *p = + (struct rte_port_ethdev_writer_nodrop *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Summary of port operations */ @@ -510,4 +545,5 @@ struct rte_port_out_ops rte_port_ethdev_writer_nodrop_ops = { .f_tx = rte_port_ethdev_writer_nodrop_tx, .f_tx_bulk = rte_port_ethdev_writer_nodrop_tx_bulk, .f_flush = rte_port_ethdev_writer_nodrop_flush, + .f_stats = rte_port_ethdev_writer_nodrop_stats_read, }; -- 1.7.9.5
[dpdk-dev] [PATCH v4 03/13] port: added port_ethdev_writer stats
From: Maciej Gajdzica Added statistics for ethdev writer port. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_ethdev.c | 37 + 1 file changed, 37 insertions(+) diff --git a/lib/librte_port/rte_port_ethdev.c b/lib/librte_port/rte_port_ethdev.c index da1af08..b5b39f8 100644 --- a/lib/librte_port/rte_port_ethdev.c +++ b/lib/librte_port/rte_port_ethdev.c @@ -134,7 +134,23 @@ static int rte_port_ethdev_reader_stats_read(void *port, /* * Port ETHDEV Writer */ +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_ETHDEV_WRITER_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_ETHDEV_WRITER_STATS_PKTS_DROP_ADD(port, val) + +#endif + struct rte_port_ethdev_writer { + struct rte_port_out_stats stats; + struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX]; uint32_t tx_burst_sz; uint16_t tx_buf_count; @@ -185,6 +201,7 @@ send_burst(struct rte_port_ethdev_writer *p) nb_tx = rte_eth_tx_burst(p->port_id, p->queue_id, p->tx_buf, p->tx_buf_count); + RTE_PORT_ETHDEV_WRITER_STATS_PKTS_DROP_ADD(p, p->tx_buf_count - nb_tx); for ( ; nb_tx < p->tx_buf_count; nb_tx++) rte_pktmbuf_free(p->tx_buf[nb_tx]); @@ -198,6 +215,7 @@ rte_port_ethdev_writer_tx(void *port, struct rte_mbuf *pkt) (struct rte_port_ethdev_writer *) port; p->tx_buf[p->tx_buf_count++] = pkt; + RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1); if (p->tx_buf_count >= p->tx_burst_sz) send_burst(p); @@ -223,9 +241,11 @@ rte_port_ethdev_writer_tx_bulk(void *port, if (tx_buf_count) send_burst(p); + RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, n_pkts); n_pkts_ok = rte_eth_tx_burst(p->port_id, p->queue_id, pkts, n_pkts); + RTE_PORT_ETHDEV_WRITER_STATS_PKTS_DROP_ADD(p, n_pkts - n_pkts_ok); for ( ; n_pkts_ok < n_pkts; n_pkts_ok++) { struct rte_mbuf *pkt = pkts[n_pkts_ok]; @@ -238,6 +258,7 @@ rte_port_ethdev_writer_tx_bulk(void 
*port, struct rte_mbuf *pkt = pkts[pkt_index]; p->tx_buf[tx_buf_count++] = pkt; + RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1); pkts_mask &= ~pkt_mask; } @@ -275,6 +296,21 @@ rte_port_ethdev_writer_free(void *port) return 0; } +static int rte_port_ethdev_writer_stats_read(void *port, + struct rte_port_out_stats *stats, int clear) +{ + struct rte_port_ethdev_writer *p = + (struct rte_port_ethdev_writer *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Port ETHDEV Writer Nodrop */ @@ -465,6 +501,7 @@ struct rte_port_out_ops rte_port_ethdev_writer_ops = { .f_tx = rte_port_ethdev_writer_tx, .f_tx_bulk = rte_port_ethdev_writer_tx_bulk, .f_flush = rte_port_ethdev_writer_flush, + .f_stats = rte_port_ethdev_writer_stats_read, }; struct rte_port_out_ops rte_port_ethdev_writer_nodrop_ops = { -- 1.7.9.5
[dpdk-dev] [PATCH v4 02/13] port: added port_ethdev_reader stats
From: Maciej Gajdzica Added statistics for ethdev reader port. Signed-off-by: Maciej Gajdzica --- lib/librte_port/rte_port_ethdev.c | 37 - 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/lib/librte_port/rte_port_ethdev.c b/lib/librte_port/rte_port_ethdev.c index 39ed72d..da1af08 100644 --- a/lib/librte_port/rte_port_ethdev.c +++ b/lib/librte_port/rte_port_ethdev.c @@ -42,7 +42,23 @@ /* * Port ETHDEV Reader */ +#ifdef RTE_PORT_STATS_COLLECT + +#define RTE_PORT_ETHDEV_READER_STATS_PKTS_IN_ADD(port, val) \ + port->stats.n_pkts_in += val +#define RTE_PORT_ETHDEV_READER_STATS_PKTS_DROP_ADD(port, val) \ + port->stats.n_pkts_drop += val + +#else + +#define RTE_PORT_ETHDEV_READER_STATS_PKTS_IN_ADD(port, val) +#define RTE_PORT_ETHDEV_READER_STATS_PKTS_DROP_ADD(port, val) + +#endif + struct rte_port_ethdev_reader { + struct rte_port_in_stats stats; + uint16_t queue_id; uint8_t port_id; }; @@ -80,8 +96,11 @@ rte_port_ethdev_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts) { struct rte_port_ethdev_reader *p = (struct rte_port_ethdev_reader *) port; + uint16_t rx_pkt_cnt; - return rte_eth_rx_burst(p->port_id, p->queue_id, pkts, n_pkts); + rx_pkt_cnt = rte_eth_rx_burst(p->port_id, p->queue_id, pkts, n_pkts); + RTE_PORT_ETHDEV_READER_STATS_PKTS_IN_ADD(p, rx_pkt_cnt); + return rx_pkt_cnt; } static int @@ -97,6 +116,21 @@ rte_port_ethdev_reader_free(void *port) return 0; } +static int rte_port_ethdev_reader_stats_read(void *port, + struct rte_port_in_stats * stats, int clear) +{ + struct rte_port_ethdev_reader *p = + (struct rte_port_ethdev_reader *) port; + + if (stats != NULL) + memcpy(stats, &p->stats, sizeof(p->stats)); + + if (clear) + memset(&p->stats, 0, sizeof(p->stats)); + + return 0; +} + /* * Port ETHDEV Writer */ @@ -422,6 +456,7 @@ struct rte_port_in_ops rte_port_ethdev_reader_ops = { .f_create = rte_port_ethdev_reader_create, .f_free = rte_port_ethdev_reader_free, .f_rx = rte_port_ethdev_reader_rx, + .f_stats = 
rte_port_ethdev_reader_stats_read, }; struct rte_port_out_ops rte_port_ethdev_writer_ops = { -- 1.7.9.5
[dpdk-dev] [PATCH v4 01/13] port: added structures for port stats and config option
From: Maciej Gajdzica Added common data structures for port statistics. Added config option to enable stats collecting. Signed-off-by: Maciej Gajdzica --- config/common_bsdapp |1 + config/common_linuxapp |1 + lib/librte_port/rte_port.h | 60 3 files changed, 57 insertions(+), 5 deletions(-) diff --git a/config/common_bsdapp b/config/common_bsdapp index c2374c0..1d26956 100644 --- a/config/common_bsdapp +++ b/config/common_bsdapp @@ -383,6 +383,7 @@ CONFIG_RTE_LIBRTE_REORDER=y # Compile librte_port # CONFIG_RTE_LIBRTE_PORT=y +CONFIG_RTE_PORT_STATS_COLLECT=n # # Compile librte_table diff --git a/config/common_linuxapp b/config/common_linuxapp index 0078dc9..5105b25 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -390,6 +390,7 @@ CONFIG_RTE_LIBRTE_REORDER=y # Compile librte_port # CONFIG_RTE_LIBRTE_PORT=y +CONFIG_RTE_PORT_STATS_COLLECT=n # # Compile librte_table diff --git a/lib/librte_port/rte_port.h b/lib/librte_port/rte_port.h index d84e5a1..ab433e5 100644 --- a/lib/librte_port/rte_port.h +++ b/lib/librte_port/rte_port.h @@ -81,6 +81,12 @@ extern "C" { Cannot be changed. */ #define RTE_PORT_IN_BURST_SIZE_MAX 64 +/** Input port statistics */ +struct rte_port_in_stats { + uint64_t n_pkts_in; + uint64_t n_pkts_drop; +}; + /** * Input port create * @@ -120,17 +126,42 @@ typedef int (*rte_port_in_op_rx)( struct rte_mbuf **pkts, uint32_t n_pkts); +/** + * Input port stats get + * + * @param port + * Handle to input port instance + * @param stats + * Handle to port_in stats struct to copy data + * @param clear + * Flag indicating that stats should be cleared after read + * + * @return + * Error code or 0 on success.
+ */ +typedef int (*rte_port_in_op_stats_read)( + void *port, + struct rte_port_in_stats *stats, + int clear); + /** Input port interface defining the input port operation */ struct rte_port_in_ops { rte_port_in_op_create f_create; /**< Create */ rte_port_in_op_free f_free; /**< Free */ rte_port_in_op_rx f_rx; /**< Packet RX (packet burst) */ + rte_port_in_op_stats_read f_stats; /**< Stats */ }; /* * Port OUT * */ +/** Output port statistics */ +struct rte_port_out_stats { + uint64_t n_pkts_in; + uint64_t n_pkts_drop; +}; + /** * Output port create * @@ -197,13 +228,32 @@ typedef int (*rte_port_out_op_tx_bulk)( */ typedef int (*rte_port_out_op_flush)(void *port); +/** + * Output port stats read + * + * @param port + * Handle to output port instance + * @param stats + * Handle to port_out stats struct to copy data + * @param clear + * Flag indicating that stats should be cleared after read + * + * @return + * Error code or 0 on success. + */ +typedef int (*rte_port_out_op_stats_read)( + void *port, + struct rte_port_out_stats *stats, + int clear); + /** Output port interface defining the output port operation */ struct rte_port_out_ops { - rte_port_out_op_create f_create; /**< Create */ - rte_port_out_op_free f_free; /**< Free */ - rte_port_out_op_tx f_tx; /**< Packet TX (single packet) */ - rte_port_out_op_tx_bulk f_tx_bulk; /**< Packet TX (packet burst) */ - rte_port_out_op_flush f_flush; /**< Flush */ + rte_port_out_op_create f_create;/**< Create */ + rte_port_out_op_free f_free;/**< Free */ + rte_port_out_op_tx f_tx;/**< Packet TX (single packet) */ + rte_port_out_op_tx_bulk f_tx_bulk; /**< Packet TX (packet burst) */ + rte_port_out_op_flush f_flush; /**< Flush */ + rte_port_out_op_stats_read f_stats; /**< Stats */ }; #ifdef __cplusplus -- 1.7.9.5
[dpdk-dev] [PATCH v4 00/13] port: added port statistics
From: Maciej Gajdzica Added statistics for every type of port. By default all port statistics are disabled, user must activate them in config file. Changes in v2: - added missing signoffs Changes in v3: - removed new config options to enable/disable stats - using RTE_LOG_LEVEL instead Changes in v4: - created single config option for all port statistics Maciej Gajdzica (13): port: added structures for port stats and config option port: added port_ethdev_reader stats port: added port_ethdev_writer stats port: added port_ethdev_writer_nodrop stats port: added port_frag stats port: added port_ras stats port: added port_ring_reader stats port: added port_ring_writer stats port: added port_ring_writer_nodrop stats port: added port_sched_reader stats port: added port_sched_writer stats port: added port_source stats port: added port_sink stats config/common_bsdapp |1 + config/common_linuxapp |1 + lib/librte_port/rte_port.h | 60 +++-- lib/librte_port/rte_port_ethdev.c | 110 +- lib/librte_port/rte_port_frag.c| 36 ++ lib/librte_port/rte_port_ras.c | 38 +++ lib/librte_port/rte_port_ring.c| 114 +++- lib/librte_port/rte_port_sched.c | 96 +-- lib/librte_port/rte_port_source_sink.c | 98 +-- 9 files changed, 537 insertions(+), 17 deletions(-) -- 1.7.9.5
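Every patch in this series repeats one idiom: the port struct embeds a stats struct, every hot-path increment goes through a macro, and the macro expands to nothing when the single config flag is off, so a disabled build pays no counting cost while the struct layout (hence the ABI) stays identical in both builds. A self-contained sketch of that idiom, using mock names rather than the real librte_port types:

```c
#include <stdint.h>
#include <string.h>

/* Mock of the single build-time flag (CONFIG_RTE_PORT_STATS_COLLECT). */
#define PORT_STATS_COLLECT

struct port_out_stats {
	uint64_t n_pkts_in;
	uint64_t n_pkts_drop;
};

#ifdef PORT_STATS_COLLECT
#define PORT_STATS_PKTS_IN_ADD(p, val)   ((p)->stats.n_pkts_in += (val))
#define PORT_STATS_PKTS_DROP_ADD(p, val) ((p)->stats.n_pkts_drop += (val))
#else
/* Expand to nothing: a disabled build does no counting work at all. */
#define PORT_STATS_PKTS_IN_ADD(p, val)
#define PORT_STATS_PKTS_DROP_ADD(p, val)
#endif

struct writer_port {
	struct port_out_stats stats; /* present in both builds: ABI is stable */
	/* ... tx buffer, burst size, port/queue ids ... */
};

/* Mock tx path: pretend `sent` of `n` packets left the wire, rest dropped. */
static void port_tx_burst(struct writer_port *p, uint32_t n, uint32_t sent)
{
	PORT_STATS_PKTS_IN_ADD(p, n);
	PORT_STATS_PKTS_DROP_ADD(p, n - sent);
}

/* Same copy-then-optionally-clear contract as the series' f_stats ops. */
static int port_stats_read(struct writer_port *p,
		struct port_out_stats *stats, int clear)
{
	if (stats != NULL)
		memcpy(stats, &p->stats, sizeof(p->stats));
	if (clear)
		memset(&p->stats, 0, sizeof(p->stats));
	return 0;
}
```

In DPDK itself the read happens through the new `f_stats` member of `rte_port_in_ops`/`rte_port_out_ops` rather than a direct call; the names above are placeholders for illustration only.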
[dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update
In __rte_pktmbuf_prefree_seg(), there was an optimization to avoid using a costly atomic operation when updating the mbuf reference counter if its value is 1. Indeed, it means that we are the only owner of the mbuf, and therefore nobody can change it at the same time. We can generalize this optimization directly in rte_mbuf_refcnt_update() so the other callers of this function, like rte_pktmbuf_attach(), can also take advantage of this optimization. Signed-off-by: Olivier Matz --- lib/librte_mbuf/rte_mbuf.h | 57 +++--- 1 file changed, 28 insertions(+), 29 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index ab6de67..6c9cfd6 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -426,21 +426,6 @@ if (!(exp)) { \ #ifdef RTE_MBUF_REFCNT_ATOMIC /** - * Adds given value to an mbuf's refcnt and returns its new value. - * @param m - * Mbuf to update - * @param value - * Value to add/subtract - * @return - * Updated value - */ -static inline uint16_t -rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value) -{ - return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value)); -} - -/** * Reads the value of an mbuf's refcnt. * @param m * Mbuf to read @@ -466,6 +451,33 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value) rte_atomic16_set(&m->refcnt_atomic, new_value); } +/** + * Adds given value to an mbuf's refcnt and returns its new value. + * @param m + * Mbuf to update + * @param value + * Value to add/subtract + * @return + * Updated value + */ +static inline uint16_t +rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value) +{ + /* +* The atomic_add is an expensive operation, so we don't want to +* call it in the case where we know we are the uniq holder of +* this mbuf (i.e. ref_cnt == 1). Otherwise, an atomic +* operation has to be used because concurrent accesses on the +* reference counter can occur. 
+*/ + if (likely(rte_mbuf_refcnt_read(m) == 1)) { + rte_mbuf_refcnt_set(m, 1 + value); + return 1 + value; + } + + return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value)); +} + #else /* ! RTE_MBUF_REFCNT_ATOMIC */ /** @@ -895,20 +907,7 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m) { __rte_mbuf_sanity_check(m, 0); - /* -* Check to see if this is the last reference to the mbuf. -* Note: the double check here is deliberate. If the ref_cnt is "atomic" -* the call to "refcnt_update" is a very expensive operation, so we -* don't want to call it in the case where we know we are the holder -* of the last reference to this mbuf i.e. ref_cnt == 1. -* If however, ref_cnt != 1, it's still possible that we may still be -* the final decrementer of the count, so we need to check that -* result also, to make sure the mbuf is freed properly. -*/ - if (likely (rte_mbuf_refcnt_read(m) == 1) || - likely (rte_mbuf_refcnt_update(m, -1) == 0)) { - - rte_mbuf_refcnt_set(m, 0); + if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) { /* if this is an indirect mbuf, then * - detach mbuf -- 2.1.4
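The idea being generalized here — when the reference counter reads 1, the current thread is provably the sole owner, so no other thread can race on the counter and a plain store can replace the expensive atomic read-modify-write — can be reproduced outside DPDK with C11 atomics. This is a hedged sketch with a mock mbuf, not the real `rte_mbuf` layout or memory-ordering choices:

```c
#include <stdatomic.h>
#include <stdint.h>

struct mock_mbuf {
	atomic_uint_fast16_t refcnt;
};

static inline uint16_t
mbuf_refcnt_update(struct mock_mbuf *m, int16_t value)
{
	/*
	 * Fast path: refcnt == 1 means we are the unique holder, so a
	 * plain store is safe and the atomic RMW is skipped.
	 */
	if (atomic_load_explicit(&m->refcnt, memory_order_relaxed) == 1) {
		atomic_store_explicit(&m->refcnt, (uint16_t)(1 + value),
				memory_order_relaxed);
		return (uint16_t)(1 + value);
	}
	/* Slow path: concurrent owners are possible, use the atomic add. */
	return (uint16_t)(atomic_fetch_add(&m->refcnt, value) + value);
}

/* With the optimization centralized, prefree collapses to a single call. */
static inline int mbuf_prefree(struct mock_mbuf *m)
{
	return mbuf_refcnt_update(m, -1) == 0; /* nonzero => really free it */
}
```

This mirrors why the double check in the old `__rte_pktmbuf_prefree_seg()` becomes redundant: the fast-path test now lives inside the update function itself.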
[dpdk-dev] [PATCH v2] eal:Fix log messages always being printed from rte_eal_cpu_init
The RTE_LOG(DEBUG, ...) messages in rte_eal_cpu_init() are printed even when the log level on the command line is set to INFO or lower. The problem is that rte_eal_cpu_init() is called before the command-line args are scanned. Setting --log-level=7 now correctly does not print the messages from the rte_eal_cpu_init() routine. Signed-off-by: Keith Wiles --- lib/librte_eal/bsdapp/eal/eal.c | 42 ++- lib/librte_eal/linuxapp/eal/eal.c | 42 ++- 2 files changed, 74 insertions(+), 10 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index 43e8a47..0112617 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -306,6 +306,38 @@ eal_get_hugepage_mem_size(void) return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX; } +/* Parse the arguments for --log-level only */ +static void +eal_log_level_parse(int argc, char **argv) +{ + int opt; + char **argvopt; + int option_index; + + argvopt = argv; + + eal_reset_internal_config(&internal_config); + + while ((opt = getopt_long(argc, argvopt, eal_short_options, + eal_long_options, &option_index)) != EOF) { + + int ret; + + /* getopt is not happy, stop right now */ + if (opt == '?') + break; + + ret = (opt == OPT_LOG_LEVEL_NUM)?
+ eal_parse_common_option(opt, optarg, &internal_config) : 0; + + /* common parser is not happy */ + if (ret < 0) + break; + } + + optind = 0; /* reset getopt lib */ +} + /* Parse the argument given in the command line of the application */ static int eal_parse_args(int argc, char **argv) @@ -317,8 +349,6 @@ eal_parse_args(int argc, char **argv) argvopt = argv; - eal_reset_internal_config(&internal_config); - while ((opt = getopt_long(argc, argvopt, eal_short_options, eal_long_options, &option_index)) != EOF) { @@ -447,6 +477,11 @@ rte_eal_init(int argc, char **argv) if (rte_eal_log_early_init() < 0) rte_panic("Cannot init early logs\n"); + eal_log_level_parse(argc, argv); + + /* set log level as early as possible */ + rte_set_log_level(internal_config.log_level); + if (rte_eal_cpu_init() < 0) rte_panic("Cannot detect lcores\n"); @@ -454,9 +489,6 @@ rte_eal_init(int argc, char **argv) if (fctret < 0) exit(1); - /* set log level as early as possible */ - rte_set_log_level(internal_config.log_level); - if (internal_config.no_hugetlbfs == 0 && internal_config.process_type != RTE_PROC_SECONDARY && eal_hugepage_info_init() < 0) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index bd770cf..4f8d0d9 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -499,6 +499,38 @@ eal_get_hugepage_mem_size(void) return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX; } +/* Parse the arguments for --log-level only */ +static void +eal_log_level_parse(int argc, char **argv) +{ + int opt; + char **argvopt; + int option_index; + + argvopt = argv; + + eal_reset_internal_config(&internal_config); + + while ((opt = getopt_long(argc, argvopt, eal_short_options, + eal_long_options, &option_index)) != EOF) { + + int ret; + + /* getopt is not happy, stop right now */ + if (opt == '?') + break; + + ret = (opt == OPT_LOG_LEVEL_NUM)? 
+ eal_parse_common_option(opt, optarg, &internal_config) : 0; + + /* common parser is not happy */ + if (ret < 0) + break; + } + + optind = 0; /* reset getopt lib */ +} + /* Parse the argument given in the command line of the application */ static int eal_parse_args(int argc, char **argv) @@ -511,8 +543,6 @@ eal_parse_args(int argc, char **argv) argvopt = argv; - eal_reset_internal_config(&internal_config); - while ((opt = getopt_long(argc, argvopt, eal_short_options, eal_long_options, &option_index)) != EOF) { @@ -717,6 +747,11 @@ rte_eal_init(int argc, char **argv) if (rte_eal_log_early_init() < 0) rte_panic("Cannot init early logs\n"); + eal_log_level_parse(argc, argv); + + /* set log level as early as possible */ + rte_set_log_level(internal_config.log_level); + if (rte_eal_cpu_init() < 0) rte_panic("Cannot detect lcores\n"); @@ -724,9 +759,6 @@ rte_eal_init(int argc, char **argv) if (fctret < 0)
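The mechanics of the two-pass parse in this patch — scan argv once looking only for --log-level, then reset getopt state so the full argument parser can run over the same argv — are easy to get wrong around `optind`. Below is a standalone sketch; the option table is a stand-in (not the real `eal_long_options`), it skips unknown options instead of bailing out on `'?'` as the EAL code does, and it relies on the glibc `optind = 0` re-initialization convention that the patch also uses:

```c
#include <getopt.h>
#include <stdlib.h>

/* First pass: extract --log-level only, ignore everything else. */
static int parse_log_level_first(int argc, char **argv, int def)
{
	static const struct option longopts[] = {
		{"log-level", required_argument, NULL, 'l'},
		{0, 0, 0, 0},
	};
	int opt, level = def;

	opterr = 0;  /* silence getopt about options we skip in this pass */
	optind = 0;  /* glibc convention: force full re-initialization */
	while ((opt = getopt_long(argc, argv, "l:", longopts, NULL)) != -1) {
		if (opt == 'l')
			level = atoi(optarg);
		/* '?' (option unknown to this pass) is simply skipped */
	}
	optind = 0;  /* reset again so the real parser starts clean */
	return level;
}
```

The second `optind = 0` is the crucial step: without it, the later full `getopt_long()` pass would resume from wherever the first pass stopped.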
[dpdk-dev] [RFC PATCH V4] ixgbe: changes to support PCI Port Hotplug
This patch depends on the Port Hotplug Framework. It implements the eth_dev_uninit functions for rte_ixgbe_pmd and rte_ixgbevf_pmd. Changes in V4: Release rx and tx queues in dev_uninit() functions. Replace TRUE and FALSE with 1 and 0. Changes in V3: Rebased to use drivers/net/ixgbe directory. Changes in V2: Added call to dev_close() in dev_uninit() functions. Removed input parameter checks from dev_uninit() functions. Signed-off-by: Bernard Iremonger --- drivers/net/ixgbe/ixgbe_ethdev.c | 107 -- drivers/net/ixgbe/ixgbe_ethdev.h |2 + drivers/net/ixgbe/ixgbe_pf.c | 22 3 files changed, 126 insertions(+), 5 deletions(-) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 0d9f9b2..7f9d286 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -117,6 +117,7 @@ #define IXGBE_QUEUE_STAT_COUNTERS (sizeof(hw_stats->qprc) / sizeof(hw_stats->qprc[0])) static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev); +static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev); static int ixgbe_dev_configure(struct rte_eth_dev *dev); static int ixgbe_dev_start(struct rte_eth_dev *dev); static void ixgbe_dev_stop(struct rte_eth_dev *dev); @@ -183,6 +184,7 @@ static void ixgbe_dcb_init(struct ixgbe_hw *hw,struct ixgbe_dcb_config *dcb_conf /* For Virtual Function support */ static int eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev); +static int eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev); static int ixgbevf_dev_configure(struct rte_eth_dev *dev); static int ixgbevf_dev_start(struct rte_eth_dev *dev); static void ixgbevf_dev_stop(struct rte_eth_dev *dev); @@ -916,6 +918,57 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev) return 0; } +static int +eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev) +{ + struct rte_pci_device *pci_dev; + struct ixgbe_hw *hw; + unsigned i; + + PMD_INIT_FUNC_TRACE(); + + if (rte_eal_process_type() != RTE_PROC_PRIMARY) + return -EPERM; + + hw = 
IXGBE_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private); + pci_dev = eth_dev->pci_dev; + + if (hw->adapter_stopped == 0) + ixgbe_dev_close(eth_dev); + + eth_dev->dev_ops = NULL; + eth_dev->rx_pkt_burst = NULL; + eth_dev->tx_pkt_burst = NULL; + + /* Unlock any pending hardware semaphore */ + ixgbe_swfw_lock_reset(hw); + + /* disable uio intr before callback unregister */ + rte_intr_disable(&(pci_dev->intr_handle)); + rte_intr_callback_unregister(&(pci_dev->intr_handle), + ixgbe_dev_interrupt_handler, (void *)eth_dev); + + /* uninitialize PF if max_vfs not zero */ + ixgbe_pf_host_uninit(eth_dev); + + for (i = 0; i < eth_dev->data->nb_rx_queues; i++) { + ixgbe_dev_rx_queue_release(eth_dev->data->rx_queues[i]); + eth_dev->data->rx_queues[i] = NULL; + } + + for (i = 0; i < eth_dev->data->nb_tx_queues; i++) { + ixgbe_dev_tx_queue_release(eth_dev->data->tx_queues[i]); + eth_dev->data->tx_queues[i] = NULL; + } + + rte_free(eth_dev->data->mac_addrs); + eth_dev->data->mac_addrs = NULL; + + rte_free(eth_dev->data->hash_mac_addrs); + eth_dev->data->hash_mac_addrs = NULL; + + return 0; +} /* * Negotiate mailbox API version with the PF. 
@@ -1086,13 +1139,56 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev) return 0; } +/* Virtual Function device uninit */ + +static int +eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev) +{ + struct ixgbe_hw *hw; + unsigned i; + + PMD_INIT_FUNC_TRACE(); + + if (rte_eal_process_type() != RTE_PROC_PRIMARY) + return -EPERM; + + hw = IXGBE_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private); + + if (hw->adapter_stopped == 0) + ixgbevf_dev_close(eth_dev); + + eth_dev->dev_ops = NULL; + eth_dev->rx_pkt_burst = NULL; + eth_dev->tx_pkt_burst = NULL; + + /* Disable the interrupts for VF */ + ixgbevf_intr_disable(hw); + + for (i = 0; i < eth_dev->data->nb_rx_queues; i++) { + ixgbe_dev_rx_queue_release(eth_dev->data->rx_queues[i]); + eth_dev->data->rx_queues[i] = NULL; + } + + for (i = 0; i < eth_dev->data->nb_tx_queues; i++) { + ixgbe_dev_tx_queue_release(eth_dev->data->tx_queues[i]); + eth_dev->data->tx_queues[i] = NULL; + } + + rte_free(eth_dev->data->mac_addrs); + eth_dev->data->mac_addrs = NULL; + + return 0; +} + static struct eth_driver rte_ixgbe_pmd = { { .name = "rte_ixgbe_pmd", .id_table = pci_id_ixgbe_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | +
[dpdk-dev] [PATCH v2] vhost: provide vhost API to unregister vhost unix domain socket
On 6/8/15 11:38 AM, Xie, Huawei wrote: > On 6/5/2015 5:04 PM, Loftus, Ciara wrote: >> >>> -Original Message- >>> From: Xie, Huawei >>> Sent: Friday, June 05, 2015 4:26 AM >>> To: dev at dpdk.org >>> Cc: Loftus, Ciara; Xie, Huawei; Sun, Peng A >>> Subject: [PATCH v2] vhost: provide vhost API to unregister vhost unix domain >>> socket >>> >>> rte_vhost_driver_unregister will remove the listenfd from event list, and >>> then close it. >>> >>> Signed-off-by: Huawei Xie >>> Signed-off-by: Peng Sun >>> --- >>> lib/librte_vhost/rte_virtio_net.h| 3 ++ >>> lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 9 >>> lib/librte_vhost/vhost_user/vhost-net-user.c | 68 >>> +++- >>> lib/librte_vhost/vhost_user/vhost-net-user.h | 2 +- >>> 4 files changed, 69 insertions(+), 13 deletions(-) >>> >>> >> Acked-by: Ciara Loftus >> >> >> > Thomas: > Comments to this patch? After reading the patch, it looks straightforward. I first want to compile and run OVS with the vhost-user patch, linked against a DPDK build that has this patch applied. I will respond when that is complete. > This patch will remove the socket file and associated listen fd. > In future, I would also look at whether there is opportunity to attach an > id to each vhost user net interface from QEMU. > Currently DPDK OVS creates a socket file for each virtio device and uses > the file path as the id for the port. > > /huawei > >
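What unregister has to undo can be shown with a plain unix-socket mock (hypothetical names, not the librte_vhost internals): register creates and listens on the socket path, and unregister must both close the listen fd (in librte_vhost, this is also where the fd leaves the event list) and unlink the path, so that a later register of the same path can bind again:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Mock of the driver's per-socket state. */
struct mock_vhost_sock {
	int listenfd;
	char path[108]; /* sized like sockaddr_un.sun_path */
};

static int mock_register(struct mock_vhost_sock *vs, const char *path)
{
	struct sockaddr_un un;

	memset(&un, 0, sizeof(un));
	un.sun_family = AF_UNIX;
	snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);
	snprintf(vs->path, sizeof(vs->path), "%s", path);

	vs->listenfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (vs->listenfd < 0)
		return -1;
	unlink(path); /* clear a stale socket from a previous run */
	if (bind(vs->listenfd, (struct sockaddr *)&un, sizeof(un)) < 0 ||
	    listen(vs->listenfd, 1) < 0) {
		close(vs->listenfd);
		return -1;
	}
	return 0;
}

static int mock_unregister(struct mock_vhost_sock *vs)
{
	/* the real API also removes the fd from the fdset/event list here */
	close(vs->listenfd);
	vs->listenfd = -1;
	return unlink(vs->path);
}
```

Skipping either half of unregister leaves a visible leak: an orphaned fd in the event loop, or a stale socket file that makes the next bind() fail.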
[dpdk-dev] [PATCH 4/4] app: replace dump_cfg with proc_info
2015-06-08 13:45, Tahhan, Maryam: > > > Extend dump_cfg to also display statistics information for given DPDK > > > ports and rename the application to proc_info as it's now a utility > > > doing a little more than just dumping the memory information for DPDK. > > > > > > Signed-off-by: Maryam Tahhan > > > --- > > > app/Makefile | 2 +- > > > app/dump_cfg/Makefile | 45 - > > > app/dump_cfg/main.c| 92 - > > > app/proc_info/Makefile | 45 + > > > app/proc_info/main.c | 525 > > + > > > mk/rte.sdktest.mk | 4 +- > > > > It looks promising, thanks. > > Would you consider adding yourself as a maintainer of this app? > > Yep, I can do that. Should I add myself to the maintainers file in a separate > patch, or as a reworked version of this patch? In case there is no other comment, a separate patch is OK. Thanks
[dpdk-dev] Shared library build broken
I just noticed that the shared library build is broken. I am building current master. I had to make this change to get it to build: -CONFIG_RTE_LIBRTE_PMD_BOND=y +CONFIG_RTE_LIBRTE_PMD_BOND=n One of the recent bonding commits broke some dependencies, I think, but I didn't investigate further. test_link_bonding.o: In function `test_add_slave_to_bonded_device': test_link_bonding.c:(.text+0x44a): undefined reference to `rte_eth_bond_slave_add' test_link_bonding.c:(.text+0x462): undefined reference to `rte_eth_bond_slaves_get' test_link_bonding.c:(.text+0x487): undefined reference to `rte_eth_bond_active_slaves_get --TFH
[dpdk-dev] [PATCH] doc: guidelines for library statistics
Signed-off-by: Cristian Dumitrescu --- doc/guides/guidelines/index.rst | 1 + doc/guides/guidelines/statistics.rst | 42 2 files changed, 43 insertions(+) create mode 100644 doc/guides/guidelines/statistics.rst diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst index b2b0a92..c01f958 --- a/doc/guides/guidelines/index.rst +++ b/doc/guides/guidelines/index.rst @@ -6,3 +6,4 @@ Guidelines :numbered: coding_style +statistics diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst new file mode 100644 index 000..32c6020 --- /dev/null +++ b/doc/guides/guidelines/statistics.rst @@ -0,0 +1,42 @@ +Library Statistics +== + +Description +--- + +This document describes the guidelines for DPDK library-level statistics counter support. This includes guidelines for turning library statistics on and off, requirements for preventing ABI changes when library statistics are turned on and off, etc. + +Motivation to allow the application to turn library statistics on and off +- + +It is highly recommended that each library provides statistics counters to allow the application to monitor the library-level run-time events. Typical counters are: number of packets received/dropped/transmitted, number of buffers allocated/freed, number of occurrences for specific events, etc. + +Since the resources consumed for library-level statistics counter collection have to be spent out of the application budget and the counters collected by some libraries might not be relevant for the current application, in order to avoid any unwanted waste of resources and/or performance for the application, the application is to decide at build time whether the collection of library-level statistics counters should be turned on or off for each library individually. 
+ +Library-level statistics counters can be relevant or not for specific applications: +* For application A, counters maintained by library X are always relevant and the application needs to use them to implement certain features, such as traffic accounting, logging, application-level statistics, etc. In this case, the application requires that collection of statistics counters for library X is always turned on; +* For application B, counters maintained by library X are only useful during the application debug stage and not relevant once the debug phase is over. In this case, the application may decide to turn on the collection of library X statistics counters during the debug phase and later on turn them off; +* For application C, counters maintained by library X are not relevant at all. It might be that the application maintains its own set of statistics counters that monitor a different set of run-time events than library X (e.g. number of connection requests, number of active users, etc). It might also be that the application uses multiple libraries (library X, library Y, etc) and it is interested in the statistics counters of library Y, but not in those of library X. In this case, the application may decide to turn the collection of statistics counters off for library X and on for library Y. + +The statistics collection consumes a certain amount of CPU resources (cycles, cache bandwidth, memory bandwidth, etc) that depends on: +* Number of libraries used by the current application that have statistics counters collection turned on; +* Number of statistics counters maintained by each library per object type instance (e.g.
per port, table, pipeline, thread, etc); +* Number of instances created for each object type supported by each library; +* Complexity of the statistics collection logic for each counter: when only some occurrences of a specific event are valid, several conditional branches might be involved in the decision of whether the current occurrence of the event should be counted or not (e.g. on the event of packet reception, only TCP packets with destination port within a certain range should be recorded), etc. + +Mechanism to allow the application to turn library statistics on and off + + +Each library that provides statistics counters should provide a single build time flag that decides whether the statistics counter collection is enabled or not for this library. This flag should be exposed as a variable within the DPDK configuration file. When this flag is set, all the counters supported by the current library are collected; when this flag is cleared, none of the counters supported by the current library are collected: + + #DPDK file "./config/common_linuxapp", "./config/common_bsdapp", etc + CONFIG_RTE_LIBRTE__COLLECT_STATS=y/n + +The default value for this DPDK configuration file variable (either "yes" or "no") is left at the decision of each library. + +Pr
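The last cost factor listed in the guideline is the subtle one: even though only qualifying events are counted, the qualification test itself runs on every event. The guideline's own example (count only TCP packets to a destination-port range, behind a single per-library build flag) can be mocked up as follows; the flag and macro names are illustrative, not a real DPDK config option:

```c
#include <stdint.h>

/* Mock of the single per-library build-time flag. */
#define LIB_X_COLLECT_STATS

struct lib_x_stats {
	uint64_t n_tcp_in_range;
};

struct pkt {
	uint8_t proto;     /* 6 == TCP */
	uint16_t dst_port;
};

#ifdef LIB_X_COLLECT_STATS
/* The increment is cheap; the qualifying branches are the real cost. */
#define LIB_X_COUNT_TCP_IN_RANGE(st, p, lo, hi)                         \
	do {                                                            \
		if ((p)->proto == 6 &&                                  \
		    (p)->dst_port >= (lo) && (p)->dst_port <= (hi))     \
			(st)->n_tcp_in_range++;                         \
	} while (0)
#else
#define LIB_X_COUNT_TCP_IN_RANGE(st, p, lo, hi)
#endif

/* Event: packet reception; count only TCP to ports [1000, 2000]. */
static void on_rx(struct lib_x_stats *st, const struct pkt *p)
{
	LIB_X_COUNT_TCP_IN_RANGE(st, p, 1000, 2000);
}
```

When the flag is off, the whole conditional — branches included — compiles away, which is exactly the zero-cost property the guideline asks each library to guarantee.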
[dpdk-dev] Intel X552/557 is not working.
Great! I will try it. Regards, -- Masafumi OE, NAOJ -Original Message- From: Masaru Oki [mailto:m-...@stratosphere.co.jp] Sent: Monday, June 8, 2015 11:32 AM To: Masafumi OE Cc: Subject: Re: [dpdk-dev] Intel X552/557 is not working. Hi, I made an (unofficial, quick) patch. The code is mostly pulled from FreeBSD. My Ubuntu 14.04 on X10SDV-TLN4F works fine. http://www.e-neta.jp/~oki/dpdk-ixgbe.diff 2015-06-08 11:19 GMT+09:00 Masafumi OE : > Hi, > > I'm trying to use X552/X557-AT 10GBASE-T NIC on Xeon-D 1540. However > it did not work properly. > Binding X552/557 to the PMD for ixgbe is fine, but testpmd is not working on > X552/557 because eth_ixgbe_dev_init() returns Hardware Initialization > Failure: -3. > > Do you have any idea? > > -- > Supermicro X10SDV-TLN4F > Running on CentOS 7.0: > DPDK is from git. > -- > $ lspci -nn | grep X55 > 03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet > Connection X552/X557-AT 10GBASE-T [8086:15ad] > 03:00.1 Ethernet controller [0200]: Intel Corporation Ethernet > Connection X552/X557-AT 10GBASE-T [8086:15ad] > > $ ./dpdk_nic_bind.py --status > > Network devices using DPDK-compatible driver > > :03:00.0 'Ethernet Connection X552/X557-AT 10GBASE-T' > drv=uio_pci_generic unused=vfio-pci > :03:00.1 'Ethernet Connection X552/X557-AT 10GBASE-T' > drv=uio_pci_generic unused=vfio-pci > > Network devices using kernel driver > === > :05:00.0 'I350 Gigabit Network Connection' if=eno1 drv=igb > unused=vfio-pci,uio_pci_generic *Active* > :05:00.1 'I350 Gigabit Network Connection' if=eno2 drv=igb > unused=vfio-pci,uio_pci_generic > > Other network devices > = > > > -- > $ sudo -s app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 > --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 > --txqflags=0xf01 > EAL: Detected lcore 0 as core 0 on socket 0 > EAL: Detected lcore 1 as core 1 on socket 0 > EAL: Detected lcore 2 as core 2 on socket 0 > EAL: Detected lcore 3 as core 3 on socket 0 > EAL: Detected lcore 4 as core 4
on socket 0 > EAL: Detected lcore 5 as core 5 on socket 0 > EAL: Detected lcore 6 as core 6 on socket 0 > EAL: Detected lcore 7 as core 7 on socket 0 > EAL: Detected lcore 8 as core 0 on socket 0 > EAL: Detected lcore 9 as core 1 on socket 0 > EAL: Detected lcore 10 as core 2 on socket 0 > EAL: Detected lcore 11 as core 3 on socket 0 > EAL: Detected lcore 12 as core 4 on socket 0 > EAL: Detected lcore 13 as core 5 on socket 0 > EAL: Detected lcore 14 as core 6 on socket 0 > EAL: Detected lcore 15 as core 7 on socket 0 > EAL: Support maximum 128 logical core(s) by configuration. > EAL: Detected 16 lcore(s) > EAL: VFIO modules not all loaded, skip VFIO support... > EAL: Setting up memory... > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7ff89320 (size = 0x20) > EAL: Ask a virtual area of 0xc0 bytes > EAL: Virtual area found at 0x7ff89240 (size = 0xc0) > EAL: Ask a virtual area of 0x700 bytes > EAL: Virtual area found at 0x7ff88b20 (size = 0x700) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7ff88ae0 (size = 0x20) > EAL: Requesting 64 pages of size 2MB from socket 0 > ^NEAL: TSC frequency is ~200 KHz > EAL: Master lcore 8 is ready (tid=9472d900;cpuset=[8]) > EAL: lcore 9 is ready (tid=8a5fe700;cpuset=[9]) > EAL: PCI device :03:00.0 on NUMA socket 0 > EAL: probe driver: 8086:15ad rte_ixgbe_pmd > EAL: PCI memory mapped at 0x7ff89300 > EAL: PCI memory mapped at 0x7ff8946f3000 > PMD: eth_ixgbe_dev_init(): Hardware Initialization Failure: -3 > EAL: Error - exiting with code: 1 > Cause: Requested device :03:00.0 cannot be used > > -- > Masafumi OE, NAOJ > >
[dpdk-dev] [PATCH v2] vhost: provide vhost API to unregister vhost unix domain socket
On 6/5/2015 5:04 PM, Loftus, Ciara wrote: > >> -Original Message- >> From: Xie, Huawei >> Sent: Friday, June 05, 2015 4:26 AM >> To: dev at dpdk.org >> Cc: Loftus, Ciara; Xie, Huawei; Sun, Peng A >> Subject: [PATCH v2] vhost: provide vhost API to unregister vhost unix domain >> socket >> >> rte_vhost_driver_unregister will remove the listenfd from the event list, and >> then close it. >> >> Signed-off-by: Huawei Xie >> Signed-off-by: Peng Sun >> --- >> lib/librte_vhost/rte_virtio_net.h| 3 ++ >> lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 9 >> lib/librte_vhost/vhost_user/vhost-net-user.c | 68 >> +++- >> lib/librte_vhost/vhost_user/vhost-net-user.h | 2 +- >> 4 files changed, 69 insertions(+), 13 deletions(-) >> >> > Acked-by: Ciara Loftus > > > Thomas: Any comments on this patch? This patch will remove the socket file and the associated listen fd. In the future, I would also look at whether there is an opportunity to attach an id to each vhost user net interface from QEMU. Currently DPDK OVS creates a socket file for each virtio device and uses the file path as the id for the port. /huawei
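What the unregister call does at the socket level can be shown with plain POSIX primitives. This is a hypothetical stand-in, not the librte_vhost implementation: a unix domain listen socket is created at a path (as register does), and unregister closes the fd and removes the socket file so the path can be reused.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical stand-in for the vhost-user server socket: bind a unix
 * domain listen socket at `path` and return its fd, or -1 on error. */
static int server_register(const char *path)
{
	struct sockaddr_un un;
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(&un, 0, sizeof(un));
	un.sun_family = AF_UNIX;
	snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);
	unlink(path); /* remove a stale socket file, if any */
	if (bind(fd, (struct sockaddr *)&un, sizeof(un)) < 0 ||
	    listen(fd, 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* What the unregister API conceptually does: stop listening and remove
 * the socket file so the same path can be registered again later. */
static void server_unregister(int fd, const char *path)
{
	close(fd);
	unlink(path);
}

static int path_exists(const char *path)
{
	struct stat st;
	return stat(path, &st) == 0;
}
```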
[dpdk-dev] [PATCH v1] app/test: fix pmd_perf issue in no NUMA case
Reported-by: Jayakumar, Muthurajan Signed-off-by: Cunming Liang --- app/test/test_pmd_perf.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c index 1fd6843..6f218f7 100644 --- a/app/test/test_pmd_perf.c +++ b/app/test/test_pmd_perf.c @@ -321,6 +321,19 @@ alloc_lcore(uint16_t socketid) return (uint16_t)-1; } +static int +get_socket_id(uint8_t port_id) +{ + int socket_id; + + socket_id = rte_eth_dev_socket_id(port_id); + if (socket_id < 0) + /* enforce using socket 0 when no NUMA support */ + socket_id = 0; + + return socket_id; +} + volatile uint64_t stop; uint64_t count; uint64_t drop; @@ -727,7 +740,7 @@ test_pmd_perf(void) num = 0; for (portid = 0; portid < nb_ports; portid++) { if (socketid == -1) { - socketid = rte_eth_dev_socket_id(portid); + socketid = get_socket_id(portid); slave_id = alloc_lcore(socketid); if (slave_id == (uint16_t)-1) { printf("No avail lcore to run test\n"); @@ -737,7 +750,7 @@ test_pmd_perf(void) slave_id, socketid); } - if (socketid != rte_eth_dev_socket_id(portid)) { + if (socketid != get_socket_id(portid)) { printf("Skip port %d\n", portid); continue; } @@ -818,7 +831,7 @@ test_pmd_perf(void) /* port tear down */ for (portid = 0; portid < nb_ports; portid++) { - if (socketid != rte_eth_dev_socket_id(portid)) + if (socketid != get_socket_id(portid)) continue; rte_eth_dev_stop(portid); -- 1.8.1.4
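The fix above wraps rte_eth_dev_socket_id(), which returns a negative value when NUMA is not supported, so that the test falls back to socket 0 instead of comparing against -1. The fallback can be exercised standalone; the name below is illustrative, not the DPDK API:

```c
#include <assert.h>

/* Mirror of the patch's get_socket_id() helper: a negative raw socket
 * id means "no NUMA support", in which case the port is treated as
 * living on socket 0 so lcore and mempool allocation still line up. */
static int socket_id_or_zero(int raw_socket_id)
{
	if (raw_socket_id < 0)
		return 0; /* enforce using socket 0 when no NUMA support */
	return raw_socket_id;
}
```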
[dpdk-dev] [PATCH] eal:Fix log messages always being printed from rte_eal_cpu_init
On 6/8/15, 8:33 AM, "Wiles, Keith" wrote: > > >On 6/8/15, 6:09 AM, "Richardson, Bruce" >wrote: > >>On Sat, Jun 06, 2015 at 07:04:05PM -0500, Keith Wiles wrote: >>> The RTE_LOG(DEBUG, ...) messages in rte_eal_cpu_init() are printed >>> even when the log level on the command line was set to INFO or lower. >>> >>> The problem is the rte_eal_cpu_init() routine was called before >>> the command line args are scanned. Setting --log-level=7 now >>> correctly does not print the messages from the rte_eal_cpu_init() >>>routine. >>> >>> Signed-off-by: Keith Wiles >> >>This seems a good idea - make it easy to reduce the verbosity on startup >>if >>so desired. Some comments below. >> >>> --- >>> lib/librte_eal/bsdapp/eal/eal.c | 43 >>>++- >>> lib/librte_eal/linuxapp/eal/eal.c | 43 >>>++- >>> 2 files changed, 76 insertions(+), 10 deletions(-) >>> >>> diff --git a/lib/librte_eal/bsdapp/eal/eal.c >>>b/lib/librte_eal/bsdapp/eal/eal.c >>> index 43e8a47..ca10f2c 100644 >>> --- a/lib/librte_eal/bsdapp/eal/eal.c >>> +++ b/lib/librte_eal/bsdapp/eal/eal.c >>> @@ -306,6 +306,38 @@ eal_get_hugepage_mem_size(void) >>> return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX; >>> } >>> >>> +/* Parse the arguments for --log-level only */ >>> +static void >>> +eal_log_level_parse(int argc, char **argv) >>> +{ >>> + int opt; >>> + char **argvopt; >>> + int option_index; >>> + >>> + argvopt = argv; >>> + >>> + eal_reset_internal_config(&internal_config); >>> + >>> + while ((opt = getopt_long(argc, argvopt, eal_short_options, >>> + eal_long_options, &option_index)) != EOF) { >>> + >>> + int ret; >>> + >>> + /* getopt is not happy, stop right now */ >>> + if (opt == '?') >>> + break; >>> + >>> + ret = (opt == OPT_LOG_LEVEL_NUM)? 
>>> + eal_parse_common_option(opt, optarg, &internal_config) >>> : 0; >>> + >>> + /* common parser is not happy */ >>> + if (ret < 0) >>> + break; >>> + } >>> + >>> + optind = 0; /* reset getopt lib */ >>> +} >>> + >>> /* Parse the argument given in the command line of the application */ >>> static int >>> eal_parse_args(int argc, char **argv) >>> @@ -317,8 +349,6 @@ eal_parse_args(int argc, char **argv) >>> >>> argvopt = argv; >>> >>> - eal_reset_internal_config(&internal_config); >>> - >>> while ((opt = getopt_long(argc, argvopt, eal_short_options, >>> eal_long_options, &option_index)) != EOF) { >>> >>> @@ -447,6 +477,12 @@ rte_eal_init(int argc, char **argv) >>> if (rte_eal_log_early_init() < 0) >>> rte_panic("Cannot init early logs\n"); >>> >>> + eal_log_level_parse(argc, argv); >>> + >>> + /* set log level as early as possible */ >>> + rte_set_log_level(internal_config.log_level); >>> + >>> + RTE_LOG(INFO, EAL, "DPDK Version %s\n", rte_version()); >> >>There is already the -v option to the EAL to print the DPDK version. Just >>add >>that flag to any command, as it has no other effects. I don't think we >>need to >>increase the verbosity of startup by always printing it. > >OK will remove, but it is one of the things you always need to know when >someone submits the startup messages. This way you do not have to put it >in the email or ask them to tell you. 
>> >>> if (rte_eal_cpu_init() < 0) >>> rte_panic("Cannot detect lcores\n"); >>> >>> @@ -454,9 +490,6 @@ rte_eal_init(int argc, char **argv) >>> if (fctret < 0) >>> exit(1); >>> >>> - /* set log level as early as possible */ >>> - rte_set_log_level(internal_config.log_level); >>> - >>> if (internal_config.no_hugetlbfs == 0 && >>> internal_config.process_type != RTE_PROC_SECONDARY && >>> eal_hugepage_info_init() < 0) >>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c >>>b/lib/librte_eal/linuxapp/eal/eal.c >>> index bd770cf..090ec99 100644 >>> --- a/lib/librte_eal/linuxapp/eal/eal.c >>> +++ b/lib/librte_eal/linuxapp/eal/eal.c >>> @@ -499,6 +499,38 @@ eal_get_hugepage_mem_size(void) >>> return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX; >>> } >>> >>> +/* Parse the arguments for --log-level only */ >>> +static void >>> +eal_log_level_parse(int argc, char **argv) >>> +{ >>> + int opt; >>> + char **argvopt; >>> + int option_index; >>> + >>> + argvopt = argv; >>> + >>> + eal_reset_internal_config(&internal_config); >>> + >>> + while ((opt = getopt_long(argc, argvopt, eal_short_options, >>> + eal_long_options, &option_index)) != EOF) { >>> + >>> + int ret; >>> + >>> + /* getopt is not happy, stop right now */ >>> + if (opt == '?') >>> + break; >>> + >>> + ret = (opt == OPT_LOG_LEVEL_NUM)? >>> + eal_parse_common_option(opt, optarg, &internal_co
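The patch's core idea — pre-scan argv for --log-level only, then reset getopt state so the full argument parse can run afterwards — can be sketched standalone. Names here are illustrative, not DPDK's:

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Standalone sketch of a two-pass option scan: look only for
 * --log-level before the full parse, ignore everything else, and
 * reset getopt's state afterwards so a later full getopt_long()
 * pass starts from scratch. Returns the parsed level, or `fallback`
 * when the option is absent. */
static int prescan_log_level(int argc, char **argv, int fallback)
{
	static const struct option longopts[] = {
		{ "log-level", required_argument, NULL, 1 },
		{ NULL, 0, NULL, 0 },
	};
	int level = fallback;
	int opt;

	optind = 1;
	opterr = 0; /* stay quiet about options we don't handle here */
	while ((opt = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
		if (opt == 1)
			level = atoi(optarg);
		/* '?' (unknown option) is simply skipped in this pass */
	}
	optind = 1; /* reset getopt so the full parse can run later */
	return level;
}
```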
[dpdk-dev] [PATCH 4/4] app: replace dump_cfg with proc_info
> > Extend dump_cfg to also display statistics information for given DPDK > > ports and rename the application to proc_info as it's now a utility > > doing a little more than just dumping the memory information for DPDK. > > > > Signed-off-by: Maryam Tahhan > > --- > > app/Makefile | 2 +- > > app/dump_cfg/Makefile | 45 - > > app/dump_cfg/main.c| 92 - > > app/proc_info/Makefile | 45 + > > app/proc_info/main.c | 525 > + > > mk/rte.sdktest.mk | 4 +- > > It looks promising, thanks. > Would you consider adding yourself as a maintainer of this app? Yep, I can do that. Should I add myself to the maintainers file in a separate patch, or as a reworked version of this patch? BR Maryam
[dpdk-dev] [PATCH] eal:Fix log messages always being printed from rte_eal_cpu_init
On 6/8/15, 6:09 AM, "Richardson, Bruce" wrote: >On Sat, Jun 06, 2015 at 07:04:05PM -0500, Keith Wiles wrote: >> The RTE_LOG(DEBUG, ...) messages in rte_eal_cpu_init() are printed >> even when the log level on the command line was set to INFO or lower. >> >> The problem is the rte_eal_cpu_init() routine was called before >> the command line args are scanned. Setting --log-level=7 now >> correctly does not print the messages from the rte_eal_cpu_init() >>routine. >> >> Signed-off-by: Keith Wiles > >This seems a good idea - make it easy to reduce the verbosity on startup >if >so desired. Some comments below. > >> --- >> lib/librte_eal/bsdapp/eal/eal.c | 43 >>++- >> lib/librte_eal/linuxapp/eal/eal.c | 43 >>++- >> 2 files changed, 76 insertions(+), 10 deletions(-) >> >> diff --git a/lib/librte_eal/bsdapp/eal/eal.c >>b/lib/librte_eal/bsdapp/eal/eal.c >> index 43e8a47..ca10f2c 100644 >> --- a/lib/librte_eal/bsdapp/eal/eal.c >> +++ b/lib/librte_eal/bsdapp/eal/eal.c >> @@ -306,6 +306,38 @@ eal_get_hugepage_mem_size(void) >> return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX; >> } >> >> +/* Parse the arguments for --log-level only */ >> +static void >> +eal_log_level_parse(int argc, char **argv) >> +{ >> +int opt; >> +char **argvopt; >> +int option_index; >> + >> +argvopt = argv; >> + >> +eal_reset_internal_config(&internal_config); >> + >> +while ((opt = getopt_long(argc, argvopt, eal_short_options, >> + eal_long_options, &option_index)) != EOF) { >> + >> +int ret; >> + >> +/* getopt is not happy, stop right now */ >> +if (opt == '?') >> +break; >> + >> +ret = (opt == OPT_LOG_LEVEL_NUM)? 
>> +eal_parse_common_option(opt, optarg, &internal_config) >> : 0; >> + >> +/* common parser is not happy */ >> +if (ret < 0) >> +break; >> +} >> + >> +optind = 0; /* reset getopt lib */ >> +} >> + >> /* Parse the argument given in the command line of the application */ >> static int >> eal_parse_args(int argc, char **argv) >> @@ -317,8 +349,6 @@ eal_parse_args(int argc, char **argv) >> >> argvopt = argv; >> >> -eal_reset_internal_config(&internal_config); >> - >> while ((opt = getopt_long(argc, argvopt, eal_short_options, >>eal_long_options, &option_index)) != EOF) { >> >> @@ -447,6 +477,12 @@ rte_eal_init(int argc, char **argv) >> if (rte_eal_log_early_init() < 0) >> rte_panic("Cannot init early logs\n"); >> >> +eal_log_level_parse(argc, argv); >> + >> +/* set log level as early as possible */ >> +rte_set_log_level(internal_config.log_level); >> + >> +RTE_LOG(INFO, EAL, "DPDK Version %s\n", rte_version()); > >There is already the -v option to the EAL to print the DPDK version. Just >add >that flag to any command, as it has no other effects. I don't think we >need to >increase the verbosity of startup by always printing it. OK will remove, but it is one of the things you always need to know when someone submits the startup messages. This way you do not have to put it in the email or ask them to tell you. 
> >> if (rte_eal_cpu_init() < 0) >> rte_panic("Cannot detect lcores\n"); >> >> @@ -454,9 +490,6 @@ rte_eal_init(int argc, char **argv) >> if (fctret < 0) >> exit(1); >> >> -/* set log level as early as possible */ >> -rte_set_log_level(internal_config.log_level); >> - >> if (internal_config.no_hugetlbfs == 0 && >> internal_config.process_type != RTE_PROC_SECONDARY && >> eal_hugepage_info_init() < 0) >> diff --git a/lib/librte_eal/linuxapp/eal/eal.c >>b/lib/librte_eal/linuxapp/eal/eal.c >> index bd770cf..090ec99 100644 >> --- a/lib/librte_eal/linuxapp/eal/eal.c >> +++ b/lib/librte_eal/linuxapp/eal/eal.c >> @@ -499,6 +499,38 @@ eal_get_hugepage_mem_size(void) >> return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX; >> } >> >> +/* Parse the arguments for --log-level only */ >> +static void >> +eal_log_level_parse(int argc, char **argv) >> +{ >> +int opt; >> +char **argvopt; >> +int option_index; >> + >> +argvopt = argv; >> + >> +eal_reset_internal_config(&internal_config); >> + >> +while ((opt = getopt_long(argc, argvopt, eal_short_options, >> + eal_long_options, &option_index)) != EOF) { >> + >> +int ret; >> + >> +/* getopt is not happy, stop right now */ >> +if (opt == '?') >> +break; >> + >> +ret = (opt == OPT_LOG_LEVEL_NUM)? >> +eal_parse_common_option(opt, optarg, &internal_config) >> : 0; >> + >> +/* common parser is not happy */ >> +if (ret < 0) >> +break; >> +} >>
[dpdk-dev] [PATCH v12 14/14] abi: fix v2.1 abi broken issue
RTE_EAL_RX_INTR will be removed in v2.2. It is only used to avoid an (unannounced) ABI break in v2.1. Users should make sure they understand the impact before turning on the feature. There are two ABI changes required in this interrupt patch set: 1) struct rte_intr_handle; 2) struct rte_intr_conf. Signed-off-by: Cunming Liang --- v9 Acked-by: vincent jardin drivers/net/e1000/igb_ethdev.c | 28 - drivers/net/ixgbe/ixgbe_ethdev.c | 41 - examples/l3fwd-power/main.c| 3 +- .../bsdapp/eal/include/exec-env/rte_interrupts.h | 7 +++ lib/librte_eal/linuxapp/eal/eal_interrupts.c | 12 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 68 +- lib/librte_ether/rte_ethdev.c | 2 + lib/librte_ether/rte_ethdev.h | 32 +- 8 files changed, 182 insertions(+), 11 deletions(-) diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c index bbd7b74..6f29222 100644 --- a/drivers/net/e1000/igb_ethdev.c +++ b/drivers/net/e1000/igb_ethdev.c @@ -96,7 +96,9 @@ static int eth_igb_flow_ctrl_get(struct rte_eth_dev *dev, static int eth_igb_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf); static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev); +#ifdef RTE_EAL_RX_INTR static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev); +#endif static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev); static int eth_igb_interrupt_action(struct rte_eth_dev *dev); static void eth_igb_interrupt_handler(struct rte_intr_handle *handle, @@ -199,11 +201,15 @@ static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id); static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id); +#ifdef RTE_EAL_RX_INTR static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction, uint8_t queue, uint8_t msix_vector); +#endif static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev); +#ifdef RTE_EAL_RX_INTR static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector, uint8_t index, 
uint8_t offset); +#endif /* * Define VF Stats MACRO for Non "cleared on read" register @@ -760,7 +766,9 @@ eth_igb_start(struct rte_eth_dev *dev) struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private); struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle; +#ifdef RTE_EAL_RX_INTR uint32_t intr_vector = 0; +#endif int ret, mask; uint32_t ctrl_ext; @@ -801,6 +809,7 @@ eth_igb_start(struct rte_eth_dev *dev) /* configure PF module if SRIOV enabled */ igb_pf_host_configure(dev); +#ifdef RTE_EAL_RX_INTR /* check and configure queue intr-vector mapping */ if (dev->data->dev_conf.intr_conf.rxq != 0) intr_vector = dev->data->nb_rx_queues; @@ -818,6 +827,7 @@ eth_igb_start(struct rte_eth_dev *dev) return -ENOMEM; } } +#endif /* confiugre msix for rx interrupt */ eth_igb_configure_msix_intr(dev); @@ -913,9 +923,11 @@ eth_igb_start(struct rte_eth_dev *dev) " no intr multiplex\n"); } +#ifdef RTE_EAL_RX_INTR /* check if rxq interrupt is enabled */ if (dev->data->dev_conf.intr_conf.rxq != 0) eth_igb_rxq_interrupt_setup(dev); +#endif /* enable uio/vfio intr/eventfd mapping */ rte_intr_enable(intr_handle); @@ -1007,12 +1019,14 @@ eth_igb_stop(struct rte_eth_dev *dev) } filter_info->twotuple_mask = 0; +#ifdef RTE_EAL_RX_INTR /* Clean datapath event and queue/vec mapping */ rte_intr_efd_disable(intr_handle); if (intr_handle->intr_vec != NULL) { rte_free(intr_handle->intr_vec); intr_handle->intr_vec = NULL; } +#endif } static void @@ -1020,7 +1034,9 @@ eth_igb_close(struct rte_eth_dev *dev) { struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private); struct rte_eth_link link; +#ifdef RTE_EAL_RX_INTR struct rte_pci_device *pci_dev; +#endif eth_igb_stop(dev); e1000_phy_hw_reset(hw); @@ -1038,11 +1054,13 @@ eth_igb_close(struct rte_eth_dev *dev) igb_dev_clear_queues(dev); +#ifdef RTE_EAL_RX_INTR pci_dev = dev->pci_dev; if (pci_dev->intr_handle.intr_vec) { rte_free(pci_dev->intr_handle.intr_vec); pci_dev->intr_handle.intr_vec = NULL; } +#endif 
memset(&link, 0, sizeof(link)); rte_igb_dev_atomic_write_link_status(dev, &link); @@ -1867,6 +1885,7 @@ eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev)
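The #ifdef gating in the hunks above exists because adding members to a struct that applications embed changes its size and layout — which is precisely an ABI break. A toy illustration of the mechanism (these are not the real rte_intr_handle definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Released v2.1 layout: applications may embed this struct, so its
 * size is part of the ABI. */
struct intr_handle_v21 {
	int fd;
	int type;
};

/* Candidate layout with the rx-interrupt members hidden behind the
 * opt-in macro: with RTE_EAL_RX_INTR undefined (the default), the
 * struct is byte-for-byte identical to the released one. */
struct intr_handle_next {
	int fd;
	int type;
#ifdef RTE_EAL_RX_INTR
	int max_intr;   /* new member, only when the user opts in */
	int *intr_vec;  /* new member, only when the user opts in */
#endif
};
```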
[dpdk-dev] [PATCH v12 13/14] l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch
Demonstrate how to handle per-rx-queue interrupts in a NAPI-like implementation in userspace. The DPDK polling thread mainly works in polling mode and switches to interrupt mode only if no packets have been received in recent polls. Userspace interrupt notification generally takes many more cycles than the kernel's, so a one-shot interrupt is used here to guarantee minimum overhead, and the DPDK polling thread returns to polling mode immediately once it receives an interrupt notification for an incoming packet. Signed-off-by: Danny Zhou Signed-off-by: Cunming Liang --- v7 changes - using new APIs - demo multiple port/queue pair wait on the same epoll instance v6 changes - Split event fd add and wait v5 changes - Change invoked function name and parameter to accommodate EAL change v3 changes - Add spinlock to ensure thread safety when accessing interrupt mask register v2 changes - Remove unused function which is for debug purpose examples/l3fwd-power/main.c | 207 +++- 1 file changed, 165 insertions(+), 42 deletions(-) diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index 6ac342b..538bb93 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -74,12 +74,14 @@ #include #include #include +#include +#include #define RTE_LOGTYPE_L3FWD_POWER RTE_LOGTYPE_USER1 #define MAX_PKT_BURST 32 -#define MIN_ZERO_POLL_COUNT 5 +#define MIN_ZERO_POLL_COUNT 10 /* around 100ms at 2 Ghz */ #define TIMER_RESOLUTION_CYCLES 2ULL @@ -153,6 +155,9 @@ static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT; /* ethernet addresses of ports */ static struct ether_addr ports_eth_addr[RTE_MAX_ETHPORTS]; +/* ethernet addresses of ports */ +static rte_spinlock_t locks[RTE_MAX_ETHPORTS]; + /* mask of enabled ports */ static uint32_t enabled_port_mask = 0; /* Ports set in promiscuous mode off by default. 
*/ @@ -185,6 +190,9 @@ struct lcore_rx_queue { #define MAX_TX_QUEUE_PER_PORT RTE_MAX_ETHPORTS #define MAX_RX_QUEUE_PER_PORT 128 +#define MAX_RX_QUEUE_INTERRUPT_PER_PORT 16 + + #define MAX_LCORE_PARAMS 1024 struct lcore_params { uint8_t port_id; @@ -211,7 +219,7 @@ static uint16_t nb_lcore_params = sizeof(lcore_params_array_default) / static struct rte_eth_conf port_conf = { .rxmode = { - .mq_mode= ETH_MQ_RX_RSS, + .mq_mode = ETH_MQ_RX_RSS, .max_rx_pkt_len = ETHER_MAX_LEN, .split_hdr_size = 0, .header_split = 0, /**< Header Split disabled */ @@ -223,11 +231,15 @@ static struct rte_eth_conf port_conf = { .rx_adv_conf = { .rss_conf = { .rss_key = NULL, - .rss_hf = ETH_RSS_IP, + .rss_hf = ETH_RSS_UDP, }, }, .txmode = { - .mq_mode = ETH_DCB_NONE, + .mq_mode = ETH_MQ_TX_NONE, + }, + .intr_conf = { + .lsc = 1, + .rxq = 1, /**< rxq interrupt feature enabled */ }, }; @@ -399,19 +411,22 @@ power_timer_cb(__attribute__((unused)) struct rte_timer *tim, /* accumulate total execution time in us when callback is invoked */ sleep_time_ratio = (float)(stats[lcore_id].sleep_time) / (float)SCALING_PERIOD; - /** * check whether need to scale down frequency a step if it sleep a lot. */ - if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) - rte_power_freq_down(lcore_id); + if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) { + if (rte_power_freq_down) + rte_power_freq_down(lcore_id); + } else if ( (unsigned)(stats[lcore_id].nb_rx_processed / - stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) + stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) { /** * scale down a step if average packet per iteration less * than expectation. 
*/ - rte_power_freq_down(lcore_id); + if (rte_power_freq_down) + rte_power_freq_down(lcore_id); + } /** * initialize another timer according to current frequency to ensure @@ -704,22 +719,20 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, } -#define SLEEP_GEAR1_THRESHOLD100 -#define SLEEP_GEAR2_THRESHOLD1000 +#define MINIMUM_SLEEP_TIME 1 +#define SUSPEND_THRESHOLD 300 static inline uint32_t power_idle_heuristic(uint32_t zero_rx_packet_count) { - /* If zero count is less than 100, use it as the sleep time in us */ - if (zero_rx_packet_count < SLEEP_GEAR1_THRESHOLD) - return zero_rx_packet_count; - /* If zero count is less than 1000, sleep time should be 100 us */ - else if ((zero_rx_pa
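The power_idle_heuristic() hunk above is truncated, so the exact cut-over logic is an assumption, but the shape of the new scheme — take minimal sleeps while the empty-poll count is below SUSPEND_THRESHOLD, then suspend on the one-shot rx interrupt — can be sketched as:

```c
#include <assert.h>

#define MINIMUM_SLEEP_TIME 1   /* us */
#define SUSPEND_THRESHOLD  300 /* empty polls before suspending */

enum idle_action {
	IDLE_SHORT_SLEEP,        /* usleep(MINIMUM_SLEEP_TIME) and re-poll */
	IDLE_WAIT_FOR_INTERRUPT  /* arm one-shot irq and block in epoll */
};

/* Hypothetical condensation of the patch's heuristic: a short run of
 * empty polls earns only a minimal sleep; once SUSPEND_THRESHOLD empty
 * polls accumulate, the thread switches to interrupt mode and blocks
 * until the NIC signals an incoming packet, at which point it returns
 * to polling mode. */
static enum idle_action idle_heuristic(unsigned int zero_rx_packet_count)
{
	if (zero_rx_packet_count < SUSPEND_THRESHOLD)
		return IDLE_SHORT_SLEEP;
	return IDLE_WAIT_FOR_INTERRUPT;
}
```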
[dpdk-dev] [PATCH v12 12/14] igb: enable rx queue interrupts for PF
The patch does below for igb PF: - Setup NIC to generate MSI-X interrupts - Set the IVAR register to map interrupt causes to vectors - Implement interrupt enable/disable functions Signed-off-by: Danny Zhou Signed-off-by: Cunming Liang --- v9 changes - move queue-vec mapping init from dev_configure to dev_start - fix link interrupt not working issue in vfio-msix v8 changes - add vfio-msi/vfio-legacy and uio-legacy support v7 changes - add condition check when intr vector is not enabled v6 changes - fill queue-vector mapping table v5 changes - Rebase the patchset onto the HEAD v3 changes - Remove unnecessary variables in e1000_mac_info - Remove spinlok from PMD v2 changes - Consolidate review comments related to coding style drivers/net/e1000/igb_ethdev.c | 285 - 1 file changed, 252 insertions(+), 33 deletions(-) diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c index e4b370d..bbd7b74 100644 --- a/drivers/net/e1000/igb_ethdev.c +++ b/drivers/net/e1000/igb_ethdev.c @@ -96,6 +96,7 @@ static int eth_igb_flow_ctrl_get(struct rte_eth_dev *dev, static int eth_igb_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf); static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev); +static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev); static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev); static int eth_igb_interrupt_action(struct rte_eth_dev *dev); static void eth_igb_interrupt_handler(struct rte_intr_handle *handle, @@ -194,6 +195,16 @@ static int eth_igb_filter_ctrl(struct rte_eth_dev *dev, enum rte_filter_op filter_op, void *arg); +static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, + uint16_t queue_id); +static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev, + uint16_t queue_id); +static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction, + uint8_t queue, uint8_t msix_vector); +static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev); +static void 
eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector, + uint8_t index, uint8_t offset); + /* * Define VF Stats MACRO for Non "cleared on read" register */ @@ -253,6 +264,8 @@ static const struct eth_dev_ops eth_igb_ops = { .vlan_tpid_set= eth_igb_vlan_tpid_set, .vlan_offload_set = eth_igb_vlan_offload_set, .rx_queue_setup = eth_igb_rx_queue_setup, + .rx_queue_intr_enable = eth_igb_rx_queue_intr_enable, + .rx_queue_intr_disable = eth_igb_rx_queue_intr_disable, .rx_queue_release = eth_igb_rx_queue_release, .rx_queue_count = eth_igb_rx_queue_count, .rx_descriptor_done = eth_igb_rx_descriptor_done, @@ -584,12 +597,6 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev) eth_dev->data->port_id, pci_dev->id.vendor_id, pci_dev->id.device_id); - rte_intr_callback_register(&(pci_dev->intr_handle), - eth_igb_interrupt_handler, (void *)eth_dev); - - /* enable uio intr after callback register */ - rte_intr_enable(&(pci_dev->intr_handle)); - /* enable support intr */ igb_intr_enable(eth_dev); @@ -752,7 +759,9 @@ eth_igb_start(struct rte_eth_dev *dev) { struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private); - int ret, i, mask; + struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle; + uint32_t intr_vector = 0; + int ret, mask; uint32_t ctrl_ext; PMD_INIT_FUNC_TRACE(); @@ -792,6 +801,27 @@ eth_igb_start(struct rte_eth_dev *dev) /* configure PF module if SRIOV enabled */ igb_pf_host_configure(dev); + /* check and configure queue intr-vector mapping */ + if (dev->data->dev_conf.intr_conf.rxq != 0) + intr_vector = dev->data->nb_rx_queues; + + if (rte_intr_efd_enable(intr_handle, intr_vector)) + return -1; + + if (rte_intr_dp_is_en(intr_handle)) { + intr_handle->intr_vec = + rte_zmalloc("intr_vec", + dev->data->nb_rx_queues * sizeof(int), 0); + if (intr_handle->intr_vec == NULL) { + PMD_INIT_LOG(ERR, "Failed to allocate %d rx_queues" +" intr_vec\n", dev->data->nb_rx_queues); + return -ENOMEM; + } + } + + /* confiugre msix for rx interrupt */ + 
eth_igb_configure_msix_intr(dev); + /* Configure for OS presence */ igb_init_manageability(hw); @@ -819,33 +849,9 @@ eth_igb_start(struct rte_eth_dev *dev) igb_vmdq_vlan_hw_filter_enable
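The eth_igb_write_ivar() helper declared in the hunks above places an MSI-X vector id into a byte lane of a 32-bit IVAR register, which is how interrupt causes are mapped to vectors. A register-image sketch of that packing — the exact lane layout and any valid bit are hardware-specific, so treat this as an illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of an IVAR-style update: clear the 8-bit lane starting at bit
 * `offset` in the 32-bit register image, then write the vector id into
 * it, leaving the other lanes untouched. Real drivers read-modify-write
 * the hardware register the same way. */
static uint32_t write_ivar(uint32_t reg, uint8_t msix_vector, uint8_t offset)
{
	reg &= ~((uint32_t)0xff << offset);     /* clear the target lane */
	reg |= (uint32_t)msix_vector << offset; /* insert the vector id  */
	return reg;
}
```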
[dpdk-dev] [PATCH v12 11/14] ixgbe: enable rx queue interrupts for both PF and VF
The patch does below things for ixgbe PF and VF: - Setup NIC to generate MSI-X interrupts - Set the IVAR register to map interrupt causes to vectors - Implement interrupt enable/disable functions Signed-off-by: Danny Zhou Signed-off-by: Yong Liu Signed-off-by: Cunming Liang --- v10 changes - return an actual error code rather than -1 v9 changes - move queue-vec mapping init from dev_configure to dev_start v8 changes - add vfio-msi/vfio-legacy and uio-legacy support v7 changes - add condition check when intr vector is not enabled v6 changes - fill queue-vector mapping table v5 changes - Rebase the patchset onto the HEAD v3 changes - Remove spinlok from PMD v2 changes - Consolidate review comments related to coding style drivers/net/ixgbe/ixgbe_ethdev.c | 484 ++- drivers/net/ixgbe/ixgbe_ethdev.h | 4 + 2 files changed, 476 insertions(+), 12 deletions(-) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 0d9f9b2..bcec971 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -82,6 +82,9 @@ */ #define IXGBE_FC_LO0x40 +/* Default minimum inter-interrupt interval for EITR configuration */ +#define IXGBE_MIN_INTER_INTERRUPT_INTERVAL_DEFAULT0x79E + /* Timer value included in XOFF frames. 
*/ #define IXGBE_FC_PAUSE 0x680 @@ -171,6 +174,7 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev, uint16_t reta_size); static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev); static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev); +static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev); static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev); static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev); static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle, @@ -183,11 +187,14 @@ static void ixgbe_dcb_init(struct ixgbe_hw *hw,struct ixgbe_dcb_config *dcb_conf /* For Virtual Function support */ static int eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev); +static int ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev); +static int ixgbevf_dev_interrupt_action(struct rte_eth_dev *dev); static int ixgbevf_dev_configure(struct rte_eth_dev *dev); static int ixgbevf_dev_start(struct rte_eth_dev *dev); static void ixgbevf_dev_stop(struct rte_eth_dev *dev); static void ixgbevf_dev_close(struct rte_eth_dev *dev); static void ixgbevf_intr_disable(struct ixgbe_hw *hw); +static void ixgbevf_intr_enable(struct ixgbe_hw *hw); static void ixgbevf_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats); static void ixgbevf_dev_stats_reset(struct rte_eth_dev *dev); @@ -197,6 +204,15 @@ static void ixgbevf_vlan_strip_queue_set(struct rte_eth_dev *dev, uint16_t queue, int on); static void ixgbevf_vlan_offload_set(struct rte_eth_dev *dev, int mask); static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on); +static void ixgbevf_dev_interrupt_handler(struct rte_intr_handle *handle, + void *param); +static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, + uint16_t queue_id); +static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, +uint16_t queue_id); +static void ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction, +uint8_t queue, uint8_t 
msix_vector); +static void ixgbevf_configure_msix(struct rte_eth_dev *dev); /* For Eth VMDQ APIs support */ static int ixgbe_uc_hash_table_set(struct rte_eth_dev *dev, struct @@ -214,6 +230,14 @@ static int ixgbe_mirror_rule_set(struct rte_eth_dev *dev, static int ixgbe_mirror_rule_reset(struct rte_eth_dev *dev, uint8_t rule_id); +static int ixgbe_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, + uint16_t queue_id); +static int ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, + uint16_t queue_id); +static void ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction, + uint8_t queue, uint8_t msix_vector); +static void ixgbe_configure_msix(struct rte_eth_dev *dev); + static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx, uint16_t tx_rate); static int ixgbe_set_vf_rate_limit(struct rte_eth_dev *dev, uint16_t vf, @@ -262,7 +286,7 @@ static int ixgbevf_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu); */ #define UPDATE_VF_STAT(reg, last, cur) \ { \ - u32 latest = IXGBE_READ_REG(hw, reg); \ + uint32_t latest = IXGBE_READ_REG(hw, reg); \ cur += latest - last; \ last = latest; \ } @@ -343,6 +367,8 @@ sta
[dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions
The patch adds two dev_ops functions to enable and disable rx queue interrupts. In addition, it adds rte_eth_dev_rx_intr_ctl/rte_eth_dev_rx_intr_ctl_q to support per-port or per-queue rx interrupt event setup. Signed-off-by: Danny Zhou Signed-off-by: Cunming Liang --- v9 changes - remove unnecessary check after rte_eth_dev_is_valid_port. the same as http://www.dpdk.org/dev/patchwork/patch/4784 v8 changes - add additional check for EEXIST v7 changes - remove rx_intr_vec_get - add rx_intr_ctl and rx_intr_ctl_q v6 changes - add rx_intr_vec_get to retrieve the vector num of the queue. v5 changes - Rebase the patchset onto the HEAD v4 changes - Export interrupt enable/disable functions for shared libraries - Put new functions at the end of eth_dev_ops to avoid breaking ABI v3 changes - Add return value for interrupt enable/disable functions lib/librte_ether/rte_ethdev.c | 107 + lib/librte_ether/rte_ethdev.h | 104 lib/librte_ether/rte_ether_version.map | 4 ++ 3 files changed, 215 insertions(+) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index 5a94654..27a87f5 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -3280,6 +3280,113 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev, } rte_spinlock_unlock(&rte_eth_dev_cb_lock); } + +int +rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data) +{ + uint32_t vec; + struct rte_eth_dev *dev; + struct rte_intr_handle *intr_handle; + uint16_t qid; + int rc; + + if (!rte_eth_dev_is_valid_port(port_id)) { + PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id); + return -ENODEV; + } + + dev = &rte_eth_devices[port_id]; + intr_handle = &dev->pci_dev->intr_handle; + if (!intr_handle->intr_vec) { + PMD_DEBUG_TRACE("RX Intr vector unset\n"); + return -EPERM; + } + + for (qid = 0; qid < dev->data->nb_rx_queues; qid++) { + vec = intr_handle->intr_vec[qid]; + rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data); + if (rc && rc != -EEXIST) { + PMD_DEBUG_TRACE("p %u q %u rx ctl error" + " op 
%d epfd %d vec %u\n", + port_id, qid, op, epfd, vec); + } + } + + return 0; +} + +int +rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id, + int epfd, int op, void *data) +{ + uint32_t vec; + struct rte_eth_dev *dev; + struct rte_intr_handle *intr_handle; + int rc; + + if (!rte_eth_dev_is_valid_port(port_id)) { + PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id); + return -ENODEV; + } + + dev = &rte_eth_devices[port_id]; + if (queue_id >= dev->data->nb_rx_queues) { + PMD_DEBUG_TRACE("Invalid RX queue_id=%u\n", queue_id); + return -EINVAL; + } + + intr_handle = &dev->pci_dev->intr_handle; + if (!intr_handle->intr_vec) { + PMD_DEBUG_TRACE("RX Intr vector unset\n"); + return -EPERM; + } + + vec = intr_handle->intr_vec[queue_id]; + rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data); + if (rc && rc != -EEXIST) { + PMD_DEBUG_TRACE("p %u q %u rx ctl error" + " op %d epfd %d vec %u\n", + port_id, queue_id, op, epfd, vec); + return rc; + } + + return 0; +} + +int +rte_eth_dev_rx_intr_enable(uint8_t port_id, + uint16_t queue_id) +{ + struct rte_eth_dev *dev; + + if (!rte_eth_dev_is_valid_port(port_id)) { + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); + return -ENODEV; + } + + dev = &rte_eth_devices[port_id]; + + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP); + return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id); +} + +int +rte_eth_dev_rx_intr_disable(uint8_t port_id, + uint16_t queue_id) +{ + struct rte_eth_dev *dev; + + if (!rte_eth_dev_is_valid_port(port_id)) { + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); + return -ENODEV; + } + + dev = &rte_eth_devices[port_id]; + + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP); + return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id); +} + #ifdef RTE_NIC_BYPASS int rte_eth_dev_bypass_init(uint8_t port_id) { diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index 16dbe00..c199d32 100644 --- a/lib/librte_ether/rte_ethdev.h +++ 
b/lib/librte_ether/rte_ethdev.h @@ -830,6 +830,8 @@ struct rte_eth_fdir { struct rte_intr_conf {
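The two control functions above walk the rx queues and register each queue's interrupt vector eventfd on the caller's epoll instance, treating -EEXIST as benign rather than a hard failure. A minimal sketch of that pattern using plain Linux epoll/eventfd — not the DPDK API itself; the helper name is made up for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Register one per-queue event fd on an epoll instance.  Mirrors the
 * tolerate-EEXIST loop in rte_eth_dev_rx_intr_ctl(): a vector that is
 * already registered is reported but not treated as fatal. */
static int
queue_intr_add(int epfd, int efd)
{
	struct epoll_event ev;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = efd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0)
		return -errno;	/* -EEXIST when the vector was added before */
	return 0;
}
```

The caller can then loop over all queue fds, skipping -EEXIST exactly as the patch does.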
[dpdk-dev] [PATCH v12 09/14] eal/bsd: fix inappropriate linuxapp references in bsd
Signed-off-by: Cunming Liang --- lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h index 5ae64af..ba4640a 100644 --- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h @@ -35,8 +35,8 @@ #error "don't include this file directly, please include generic " #endif -#ifndef _RTE_LINUXAPP_INTERRUPTS_H_ -#define _RTE_LINUXAPP_INTERRUPTS_H_ +#ifndef _RTE_BSDAPP_INTERRUPTS_H_ +#define _RTE_BSDAPP_INTERRUPTS_H_ #include @@ -129,4 +129,4 @@ rte_intr_allow_others(struct rte_intr_handle *intr_handle) return 1; } -#endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */ +#endif /* _RTE_BSDAPP_INTERRUPTS_H_ */ -- 1.8.1.4
[dpdk-dev] [PATCH v12 08/14] eal/bsd: dummy for new intr definition
To make bsd compiling happy with new intr changes. Signed-off-by: Cunming Liang --- v12 changes - fix unused variables compiling warning v8 changes - add stub for new function v7 changes - remove stub 'linux only' function from source file lib/librte_eal/bsdapp/eal/eal_interrupts.c | 30 + .../bsdapp/eal/include/exec-env/rte_interrupts.h | 78 ++ lib/librte_eal/bsdapp/eal/rte_eal_version.map | 5 ++ 3 files changed, 113 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal_interrupts.c b/lib/librte_eal/bsdapp/eal/eal_interrupts.c index cb7d4f1..ee3d428 100644 --- a/lib/librte_eal/bsdapp/eal/eal_interrupts.c +++ b/lib/librte_eal/bsdapp/eal/eal_interrupts.c @@ -69,3 +69,33 @@ rte_eal_intr_init(void) return 0; } +int +rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, + int epfd, int op, unsigned int vec, void *data) +{ + RTE_SET_USED(intr_handle); + RTE_SET_USED(epfd); + RTE_SET_USED(op); + RTE_SET_USED(vec); + RTE_SET_USED(data); + + return -ENOTSUP; +} + +int +rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd) +{ + RTE_SET_USED(intr_handle); + RTE_SET_USED(nb_efd); + + return 0; +} + +void +rte_intr_efd_disable(struct rte_intr_handle *intr_handle) +{ + RTE_SET_USED(intr_handle); + + return; +} + diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h index 87a9cf6..5ae64af 100644 --- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h @@ -38,6 +38,8 @@ #ifndef _RTE_LINUXAPP_INTERRUPTS_H_ #define _RTE_LINUXAPP_INTERRUPTS_H_ +#include + enum rte_intr_handle_type { RTE_INTR_HANDLE_UNKNOWN = 0, RTE_INTR_HANDLE_UIO, /**< uio device handle */ @@ -49,6 +51,82 @@ enum rte_intr_handle_type { struct rte_intr_handle { int fd; /**< file descriptor */ enum rte_intr_handle_type type; /**< handle type */ + int max_intr;/**< max interrupt requested */ + uint32_t nb_efd; /**< number of available efds */ + 
int *intr_vec; /**< intr vector number array */ }; +/** + * @param intr_handle + * Pointer to the interrupt handle. + * @param epfd + * Epoll instance fd which the intr vector associated to. + * @param op + * The operation be performed for the vector. + * Operation type of {ADD, DEL}. + * @param vec + * RX intr vector number added to the epoll instance wait list. + * @param data + * User raw data. + * @return + * - On success, zero. + * - On failure, a negative value. + */ +int +rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, + int epfd, int op, unsigned int vec, void *data); + +/** + * It enables the fastpath event fds if it's necessary. + * It creates event fds when multi-vectors allowed, + * otherwise it multiplexes the single event fds. + * + * @param intr_handle + * Pointer to the interrupt handle. + * @param nb_vec + * Number of intrrupt vector trying to enable. + * @return + * - On success, zero. + * - On failure, a negative value. + */ +int +rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd); + +/** + * It disable the fastpath event fds. + * It deletes registered eventfds and closes the open fds. + * + * @param intr_handle + * Pointer to the interrupt handle. + */ +void +rte_intr_efd_disable(struct rte_intr_handle *intr_handle); + +/** + * The fastpath interrupt is enabled or not. + * + * @param intr_handle + * Pointer to the interrupt handle. + */ +static inline int +rte_intr_dp_is_en(struct rte_intr_handle *intr_handle) +{ + RTE_SET_USED(intr_handle); + return 0; +} + +/** + * The interrupt handle instance allows other cause or not. + * Other cause stands for none fastpath interrupt. + * + * @param intr_handle + * Pointer to the interrupt handle. 
+ */ +static inline int +rte_intr_allow_others(struct rte_intr_handle *intr_handle) +{ + RTE_SET_USED(intr_handle); + return 1; +} + #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */ diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map index 67b6a6c..a74671b 100644 --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map @@ -53,8 +53,13 @@ DPDK_2.0 { rte_hexdump; rte_intr_callback_register; rte_intr_callback_unregister; + rte_intr_allow_others; rte_intr_disable; + rte_intr_dp_is_en; + rte_intr_efd_enable; + rte_intr_efd_disable; rte_intr_enable; + rte_intr_rx_ctl; rte_log; rte_log_add_in_history; rte_log_cur_msg_loglevel; -- 1.8.1.4
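The BSD dummies above follow a common platform-stub pattern: consume every argument so unused-parameter warnings stay quiet, then report the feature as unsupported. A self-contained sketch of that pattern (the macro and function names here are illustrative stand-ins for RTE_SET_USED and the EAL stub):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for RTE_SET_USED: silence -Wunused-parameter. */
#define SET_USED(x) (void)(x)

/* Platform stub: every argument consumed, -ENOTSUP returned,
 * exactly the shape of the BSD rte_intr_rx_ctl() dummy. */
static int
stub_intr_rx_ctl(void *intr_handle, int epfd, int op,
		 unsigned int vec, void *data)
{
	SET_USED(intr_handle);
	SET_USED(epfd);
	SET_USED(op);
	SET_USED(vec);
	SET_USED(data);
	return -ENOTSUP;
}
```

Callers that share code across Linux and BSD can then treat -ENOTSUP as "feature absent on this platform" without any #ifdefs at the call site.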
[dpdk-dev] [PATCH v12 07/14] eal/linux: fix lsc read error in uio_pci_generic
The new UIO generic handle type was introduced by this patch: http://dpdk.org/ml/archives/dev/2015-April/017008.html When using uio_pci_generic with the lsc interrupt turned on, it complains of an fd read error. The root cause is that the 'count' size passed to read is not correct. Reported-by: Yong Liu Signed-off-by: Cunming Liang --- lib/librte_eal/linuxapp/eal/eal_interrupts.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index 5519e7c..d7a5403 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -678,6 +678,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds) /* set the length to be read dor different handle type */ switch (src->intr_handle.type) { case RTE_INTR_HANDLE_UIO: + case RTE_INTR_HANDLE_UIO_INTX: bytes_read = sizeof(buf.uio_intr_count); break; case RTE_INTR_HANDLE_ALARM: -- 1.8.1.4
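The fix matters because each handle type's fd delivers a fixed-width interrupt counter and read() must request exactly that width: uio exposes a 4-byte count, an eventfd an 8-byte one. A self-contained demonstration of the kernel's eventfd contract — this is not DPDK code, just the read-size rule the patch is about:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Drain an eventfd: read() must be issued with exactly
 * sizeof(uint64_t), or the kernel rejects it with EINVAL. */
static int
drain_eventfd(int efd, uint64_t *count)
{
	if (read(efd, count, sizeof(*count)) != (ssize_t)sizeof(*count))
		return -1;
	return 0;
}
```

Reading the same fd with a 4-byte buffer (the uio size) fails outright, which is the mirror image of the bug fixed here.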
[dpdk-dev] [PATCH v12 06/14] eal/linux: standalone intr event fd create support
The patch exposes intr event fd create and release for PMD. The device driver can assign the number of event associated with interrupt vector. It also provides misc functions to check 1) allows other slowpath intr(e.g. lsc); 2) intr event on fastpath is enabled or not. Signed-off-by: Cunming Liang --- v11 changes - typo cleanup lib/librte_eal/linuxapp/eal/eal_interrupts.c | 57 ++ .../linuxapp/eal/include/exec-env/rte_interrupts.h | 51 +++ lib/librte_eal/linuxapp/eal/rte_eal_version.map| 4 ++ 3 files changed, 112 insertions(+) diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index d35c874..5519e7c 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -44,6 +44,7 @@ #include #include #include +#include #include #include @@ -68,6 +69,7 @@ #include "eal_vfio.h" #define EAL_INTR_EPOLL_WAIT_FOREVER (-1) +#define NB_OTHER_INTR 1 static RTE_DEFINE_PER_LCORE(int, _epfd) = -1; /**< epoll fd per thread */ @@ -1110,3 +1112,58 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd, return rc; } + +int +rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd) +{ + uint32_t i; + int fd; + uint32_t n = RTE_MIN(nb_efd, (uint32_t)RTE_MAX_RXTX_INTR_VEC_ID); + + if (intr_handle->type == RTE_INTR_HANDLE_VFIO_MSIX) { + for (i = 0; i < n; i++) { + fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); + if (fd < 0) { + RTE_LOG(ERR, EAL, + "cannot setup eventfd," + "error %i (%s)\n", + errno, strerror(errno)); + return -1; + } + intr_handle->efds[i] = fd; + } + intr_handle->nb_efd = n; + intr_handle->max_intr = NB_OTHER_INTR + n; + } else { + intr_handle->efds[0] = intr_handle->fd; + intr_handle->nb_efd = RTE_MIN(nb_efd, 1U); + intr_handle->max_intr = NB_OTHER_INTR; + } + + return 0; +} + +void +rte_intr_efd_disable(struct rte_intr_handle *intr_handle) +{ + uint32_t i; + struct rte_epoll_event *rev; + + for (i = 0; i < intr_handle->nb_efd; i++) { + rev = 
&intr_handle->elist[i]; + if (rev->status == RTE_EPOLL_INVALID) + continue; + if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) { + /* force free if the entry valid */ + eal_epoll_data_safe_free(rev); + rev->status = RTE_EPOLL_INVALID; + } + } + + if (intr_handle->max_intr > intr_handle->nb_efd) { + for (i = 0; i < intr_handle->nb_efd; i++) + close(intr_handle->efds[i]); + } + intr_handle->nb_efd = 0; + intr_handle->max_intr = 0; +} diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h index 3e93a27..912cc50 100644 --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h @@ -166,4 +166,55 @@ int rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd, int op, unsigned int vec, void *data); +/** + * It enables the fastpath event fds if it's necessary. + * It creates event fds when multi-vectors allowed, + * otherwise it multiplexes the single event fds. + * + * @param intr_handle + * Pointer to the interrupt handle. + * @param nb_vec + * Number of interrupt vector trying to enable. + * @return + * - On success, zero. + * - On failure, a negative value. + */ +int +rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd); + +/** + * It disable the fastpath event fds. + * It deletes registered eventfds and closes the open fds. + * + * @param intr_handle + * Pointer to the interrupt handle. + */ +void +rte_intr_efd_disable(struct rte_intr_handle *intr_handle); + +/** + * The fastpath interrupt is enabled or not. + * + * @param intr_handle + * Pointer to the interrupt handle. + */ +static inline int +rte_intr_dp_is_en(struct rte_intr_handle *intr_handle) +{ + return !(!intr_handle->nb_efd); +} + +/** + * The interrupt handle instance allows other cause or not. + * Other cause stands for none fastpath interrupt. + * + * @param intr_handle + * Pointer to the interrupt handle. 
+ */ +static inline int +rte_intr_allow_others(struct rte_intr_handle *intr_handle) +{ + return !!(intr_handle->max_intr - intr_handle->nb_efd); +} + #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */ diff --git a/lib/librte_eal/linuxapp
[dpdk-dev] [PATCH v12 05/14] eal/linux: add interrupt vectors handling on VFIO
This patch does below: - Create VFIO eventfds for each interrupt vector (move to next) - Assign per interrupt vector's eventfd to VFIO by ioctl Signed-off-by: Danny Zhou Signed-off-by: Cunming Liang --- v8 changes - move eventfd creation out of the setup_interrupts to a standalone function v7 changes - cleanup unnecessary code change - split event and intr operation to other patches lib/librte_eal/linuxapp/eal/eal_interrupts.c | 50 1 file changed, 13 insertions(+), 37 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index fe1210b..d35c874 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -128,6 +128,9 @@ static pthread_t intr_thread; #ifdef VFIO_PRESENT #define IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int)) +/* irq set buffer length for queue interrupts and LSC interrupt */ +#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + \ + sizeof(int) * (RTE_MAX_RXTX_INTR_VEC_ID + 1)) /* enable legacy (INTx) interrupts */ static int @@ -245,23 +248,6 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) { intr_handle->fd); return -1; } - - /* manually trigger interrupt to enable it */ - memset(irq_set, 0, len); - len = sizeof(struct vfio_irq_set); - irq_set->argsz = len; - irq_set->count = 1; - irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER; - irq_set->index = VFIO_PCI_MSI_IRQ_INDEX; - irq_set->start = 0; - - ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); - - if (ret) { - RTE_LOG(ERR, EAL, "Error triggering MSI interrupts for fd %d\n", - intr_handle->fd); - return -1; - } return 0; } @@ -294,7 +280,7 @@ vfio_disable_msi(struct rte_intr_handle *intr_handle) { static int vfio_enable_msix(struct rte_intr_handle *intr_handle) { int len, ret; - char irq_set_buf[IRQ_SET_BUF_LEN]; + char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; struct vfio_irq_set *irq_set; int *fd_ptr; @@ -302,12 +288,18 @@ 
vfio_enable_msix(struct rte_intr_handle *intr_handle) { irq_set = (struct vfio_irq_set *) irq_set_buf; irq_set->argsz = len; - irq_set->count = 1; + if (!intr_handle->max_intr) + intr_handle->max_intr = 1; + else if (intr_handle->max_intr > RTE_MAX_RXTX_INTR_VEC_ID) + intr_handle->max_intr = RTE_MAX_RXTX_INTR_VEC_ID + 1; + + irq_set->count = intr_handle->max_intr; irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER; irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; irq_set->start = 0; fd_ptr = (int *) &irq_set->data; - *fd_ptr = intr_handle->fd; + memcpy(fd_ptr, intr_handle->efds, sizeof(intr_handle->efds)); + fd_ptr[intr_handle->max_intr - 1] = intr_handle->fd; ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); @@ -317,22 +309,6 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) { return -1; } - /* manually trigger interrupt to enable it */ - memset(irq_set, 0, len); - len = sizeof(struct vfio_irq_set); - irq_set->argsz = len; - irq_set->count = 1; - irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER; - irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; - irq_set->start = 0; - - ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); - - if (ret) { - RTE_LOG(ERR, EAL, "Error triggering MSI-X interrupts for fd %d\n", - intr_handle->fd); - return -1; - } return 0; } @@ -340,7 +316,7 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) { static int vfio_disable_msix(struct rte_intr_handle *intr_handle) { struct vfio_irq_set *irq_set; - char irq_set_buf[IRQ_SET_BUF_LEN]; + char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; int len, ret; len = sizeof(struct vfio_irq_set); -- 1.8.1.4
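The MSIX_IRQ_SET_BUF_LEN change exists because VFIO_DEVICE_SET_IRQS takes a variable-length request: a fixed header followed by one int eventfd per vector, with the legacy device fd parked in the last slot (`fd_ptr[max_intr - 1] = intr_handle->fd` above). The struct below is a local mirror of `struct vfio_irq_set` from `<linux/vfio.h>`, included only so the buffer arithmetic is concrete and testable; it is not a replacement for the real header, and no ioctl is issued:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct vfio_irq_set (<linux/vfio.h>). */
struct irq_set_hdr {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t start;
	uint32_t count;
	uint8_t data[];
};

/* Lay out a SET_IRQS request the way vfio_enable_msix() does:
 * queue eventfds first, the existing device fd (lsc/misc) last.
 * Returns argsz, the total length the kernel expects. */
static uint32_t
fill_irq_set(void *buf, const int *efds, uint32_t nb_efd, int misc_fd)
{
	struct irq_set_hdr *s = buf;
	int *fd_ptr = (int *)s->data;
	uint32_t count = nb_efd + 1;

	memset(s, 0, sizeof(*s));
	s->count = count;
	s->argsz = (uint32_t)(sizeof(*s) + count * sizeof(int));
	memcpy(fd_ptr, efds, nb_efd * sizeof(int));
	fd_ptr[count - 1] = misc_fd;
	return s->argsz;
}
```

This is why the stack buffer had to grow from IRQ_SET_BUF_LEN (one fd) to MSIX_IRQ_SET_BUF_LEN (RTE_MAX_RXTX_INTR_VEC_ID queue fds plus the misc fd).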
[dpdk-dev] [PATCH v12 04/14] eal/linux: fix comments typo on vfio msi
Signed-off-by: Cunming Liang --- lib/librte_eal/linuxapp/eal/eal_interrupts.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index cfe389c..fe1210b 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -219,7 +219,7 @@ vfio_disable_intx(struct rte_intr_handle *intr_handle) { return 0; } -/* enable MSI-X interrupts */ +/* enable MSI interrupts */ static int vfio_enable_msi(struct rte_intr_handle *intr_handle) { int len, ret; @@ -265,7 +265,7 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) { return 0; } -/* disable MSI-X interrupts */ +/* disable MSI interrupts */ static int vfio_disable_msi(struct rte_intr_handle *intr_handle) { struct vfio_irq_set *irq_set; -- 1.8.1.4
[dpdk-dev] [PATCH v12 03/14] eal/linux: add API to set rx interrupt event monitor
The patch adds 'rte_intr_rx_ctl' to add or delete interrupt vector events monitor on specified epoll instance. Signed-off-by: Cunming Liang --- v12 changes: - fix awkward line split in using RTE_LOG v10 changes: - add RTE_INTR_HANDLE_UIO_INTX for uio_pci_generic v8 changes - fix EWOULDBLOCK and EINTR processing - add event status check v7 changes - rename rte_intr_rx_set to rte_intr_rx_ctl. - rte_intr_rx_ctl uses rte_epoll_ctl to register epoll event instance. - the intr rx event instance includes a intr process callback. v6 changes - split rte_intr_wait_rx_pkt into two function, wait and set. - rewrite rte_intr_rx_wait/rte_intr_rx_set to remove queue visibility on eal. - rte_intr_rx_wait to support multiplexing. - allow epfd as input to support flexible event fd combination. lib/librte_eal/linuxapp/eal/eal_interrupts.c | 101 + .../linuxapp/eal/include/exec-env/rte_interrupts.h | 20 lib/librte_eal/linuxapp/eal/rte_eal_version.map| 1 + 3 files changed, 122 insertions(+) diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index dc327a4..cfe389c 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -897,6 +897,49 @@ rte_eal_intr_init(void) return -ret; } +static void +eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle) +{ + union rte_intr_read_buffer buf; + int bytes_read = 1; + + switch (intr_handle->type) { + case RTE_INTR_HANDLE_UIO: + case RTE_INTR_HANDLE_UIO_INTX: + bytes_read = sizeof(buf.uio_intr_count); + break; +#ifdef VFIO_PRESENT + case RTE_INTR_HANDLE_VFIO_MSIX: + case RTE_INTR_HANDLE_VFIO_MSI: + case RTE_INTR_HANDLE_VFIO_LEGACY: + bytes_read = sizeof(buf.vfio_intr_count); + break; +#endif + default: + bytes_read = 1; + RTE_LOG(INFO, EAL, "unexpected intr type\n"); + break; + } + + /** +* read out to clear the ready-to-be-read flag +* for epoll_wait. 
+*/ + do { + bytes_read = read(fd, &buf, bytes_read); + if (bytes_read < 0) { + if (errno == EINTR || errno == EWOULDBLOCK || + errno == EAGAIN) + continue; + RTE_LOG(ERR, EAL, + "Error reading from fd %d: %s\n", + fd, strerror(errno)); + } else if (bytes_read == 0) + RTE_LOG(ERR, EAL, "Read nothing from fd %d\n", fd); + return; + } while (1); +} + static int eal_epoll_process_event(struct epoll_event *evs, unsigned int n, struct rte_epoll_event *events) @@ -1033,3 +1076,61 @@ rte_epoll_ctl(int epfd, int op, int fd, return 0; } + +int +rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd, + int op, unsigned int vec, void *data) +{ + struct rte_epoll_event *rev; + struct rte_epoll_data *epdata; + int epfd_op; + int rc = 0; + + if (!intr_handle || intr_handle->nb_efd == 0 || + vec >= intr_handle->nb_efd) { + RTE_LOG(ERR, EAL, "Wrong intr vector number.\n"); + return -EPERM; + } + + switch (op) { + case RTE_INTR_EVENT_ADD: + epfd_op = EPOLL_CTL_ADD; + rev = &intr_handle->elist[vec]; + if (rev->status != RTE_EPOLL_INVALID) { + RTE_LOG(INFO, EAL, "Event already been added.\n"); + return -EEXIST; + } + + /* attach to intr vector fd */ + epdata = &rev->epdata; + epdata->event = EPOLLIN | EPOLLPRI | EPOLLET; + epdata->data = data; + epdata->cb_fun = (rte_intr_event_cb_t)eal_intr_proc_rxtx_intr; + epdata->cb_arg = (void *)intr_handle; + rc = rte_epoll_ctl(epfd, epfd_op, intr_handle->efds[vec], rev); + if (!rc) + RTE_LOG(DEBUG, EAL, + "efd %d associated with vec %d added on epfd %d" + "\n", rev->fd, vec, epfd); + else + rc = -EPERM; + break; + case RTE_INTR_EVENT_DEL: + epfd_op = EPOLL_CTL_DEL; + rev = &intr_handle->elist[vec]; + if (rev->status == RTE_EPOLL_INVALID) { + RTE_LOG(INFO, EAL, "Event does not exist.\n"); + return -EPERM; + } + + rc = rte_epoll_ctl(rev->epfd, epfd_op, rev->fd, rev); + if (rc) + rc = -EPERM; + break; + default: + RTE_LOG(ERR, EAL,
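Because the vector is registered edge-triggered (EPOLLIN | EPOLLET), the wakeup path must read the fd to clear its ready state before epoll can report it again — that is what eal_intr_proc_rxtx_intr() does above. A simplified, self-contained version of that drain, which retries EINTR and treats EAGAIN on a non-blocking fd as "already drained" (a slight simplification of the EAL loop):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Read out the pending interrupt count so an edge-triggered epoll
 * registration will fire again on the next event. */
static int
clear_ready_flag(int fd)
{
	uint64_t cnt;
	ssize_t n;

	do {
		n = read(fd, &cnt, sizeof(cnt));
		if (n < 0 && errno == EINTR)
			continue;	/* interrupted: retry the read */
		break;
	} while (1);

	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;	/* nothing pending: already drained */
	return n == (ssize_t)sizeof(cnt) ? 0 : -1;
}
```

Skipping this read is the classic edge-triggered bug: the counter stays non-zero, no new edge occurs, and the queue never wakes the polling thread again.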
[dpdk-dev] [PATCH v12 02/14] eal/linux: add rte_epoll_wait/ctl support
The patch adds 'rte_epoll_wait' and 'rte_epoll_ctl' for async event wakeup. It defines 'struct rte_epoll_event' as the event param. The 'op' uses the same enum as epoll_wait/ctl does. The epoll event support to carry a raw user data and to register a callback which is executed during wakeup. Signed-off-by: Cunming Liang --- v11 changes - cleanup spelling error v9 changes - rework on coding style v8 changes - support delete event in safety during the wakeup execution - add EINTR process during epoll_wait v7 changes - split v6[4/8] into two patches, one for epoll event(this one) another for rx intr(next patch) - introduce rte_epoll_event definition - rte_epoll_wait/ctl for more generic RTE epoll API v6 changes - split rte_intr_wait_rx_pkt into two function, wait and set. - rewrite rte_intr_rx_wait/rte_intr_rx_set to remove queue visibility on eal. - rte_intr_rx_wait to support multiplexing. - allow epfd as input to support flexible event fd combination. lib/librte_eal/linuxapp/eal/eal_interrupts.c | 138 + .../linuxapp/eal/include/exec-env/rte_interrupts.h | 82 +++- lib/librte_eal/linuxapp/eal/rte_eal_version.map| 3 + 3 files changed, 220 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index 3a84b3c..dc327a4 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -69,6 +69,8 @@ #define EAL_INTR_EPOLL_WAIT_FOREVER (-1) +static RTE_DEFINE_PER_LCORE(int, _epfd) = -1; /**< epoll fd per thread */ + /** * union for pipe fds. 
*/ @@ -895,3 +897,139 @@ rte_eal_intr_init(void) return -ret; } +static int +eal_epoll_process_event(struct epoll_event *evs, unsigned int n, + struct rte_epoll_event *events) +{ + unsigned int i, count = 0; + struct rte_epoll_event *rev; + + for (i = 0; i < n; i++) { + rev = evs[i].data.ptr; + if (!rev || !rte_atomic32_cmpset(&rev->status, RTE_EPOLL_VALID, +RTE_EPOLL_EXEC)) + continue; + + events[count].status= RTE_EPOLL_VALID; + events[count].fd= rev->fd; + events[count].epfd = rev->epfd; + events[count].epdata.event = rev->epdata.event; + events[count].epdata.data = rev->epdata.data; + if (rev->epdata.cb_fun) + rev->epdata.cb_fun(rev->fd, + rev->epdata.cb_arg); + + rte_compiler_barrier(); + rev->status = RTE_EPOLL_VALID; + count++; + } + return count; +} + +static inline int +eal_init_tls_epfd(void) +{ + int pfd = epoll_create(255); + + if (pfd < 0) { + RTE_LOG(ERR, EAL, + "Cannot create epoll instance\n"); + return -1; + } + return pfd; +} + +int +rte_intr_tls_epfd(void) +{ + if (RTE_PER_LCORE(_epfd) == -1) + RTE_PER_LCORE(_epfd) = eal_init_tls_epfd(); + + return RTE_PER_LCORE(_epfd); +} + +int +rte_epoll_wait(int epfd, struct rte_epoll_event *events, + int maxevents, int timeout) +{ + struct epoll_event evs[maxevents]; + int rc; + + if (!events) { + RTE_LOG(ERR, EAL, "rte_epoll_event can't be NULL\n"); + return -1; + } + + /* using per thread epoll fd */ + if (epfd == RTE_EPOLL_PER_THREAD) + epfd = rte_intr_tls_epfd(); + + while (1) { + rc = epoll_wait(epfd, evs, maxevents, timeout); + if (likely(rc > 0)) { + /* epoll_wait has at least one fd ready to read */ + rc = eal_epoll_process_event(evs, rc, events); + break; + } else if (rc < 0) { + if (errno == EINTR) + continue; + /* epoll_wait fail */ + RTE_LOG(ERR, EAL, "epoll_wait returns with fail %s\n", + strerror(errno)); + rc = -1; + break; + } + } + + return rc; +} + +static inline void +eal_epoll_data_safe_free(struct rte_epoll_event *ev) +{ + while (!rte_atomic32_cmpset(&ev->status, RTE_EPOLL_VALID, + 
RTE_EPOLL_INVALID)) + while (ev->status != RTE_EPOLL_VALID) + rte_pause(); + memset(&ev->epdata, 0, sizeof(ev->epdata)); + ev->fd = -1; + ev->epfd = -1; +} + +int +rte_epoll_ctl(int epfd, int op, int fd, + struct rte_epoll_event *event) +{ + struct epoll_event ev; + + if (!event) { + RTE_LOG(ERR, EAL, "rte_epoll_event can't be NU
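At its core, rte_epoll_wait() is epoll_wait() wrapped in an EINTR retry loop, plus the compare-and-set status bookkeeping shown above. The retry skeleton on its own, using plain epoll — illustrative, not the EAL implementation:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* epoll_wait with the same EINTR handling as rte_epoll_wait():
 * a signal-interrupted wait is silently retried, every other
 * failure is surfaced.  Returns the number of ready events. */
static int
wait_events(int epfd, struct epoll_event *evs, int maxevents, int timeout)
{
	int rc;

	for (;;) {
		rc = epoll_wait(epfd, evs, maxevents, timeout);
		if (rc >= 0)
			return rc;	/* 0 on timeout, >0 events ready */
		if (errno == EINTR)
			continue;
		return -1;
	}
}
```

The EAL layers its per-event callback dispatch (eal_epoll_process_event) on top of exactly this loop.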
[dpdk-dev] [PATCH v12 01/14] eal/linux: add interrupt vectors support in intr_handle
The patch adds interrupt vectors support in rte_intr_handle. 'vec_en' is set when interrupt vectors are detected and the associated event fds are set. Those event fds are stored in efds[]. 'intr_vec' is reserved for the device driver to initialize the vector mapping table. When the event fds are added to a specified epoll instance, 'eptrs' will hold the rte_epoll_event object pointers. Signed-off-by: Danny Zhou Signed-off-by: Cunming Liang --- v7 changes: - add eptrs[], it's used to store the registered rte_epoll_event instances. - add vec_en, to log the vector capability status. v6 changes: - add mapping table between irq vector number and queue id. v5 changes: - Create this new patch file for the changed struct rte_intr_handle that other patches depend on, to avoid breaking git bisect lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h index bdeb3fc..9c86a15 100644 --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h @@ -38,6 +38,8 @@ #ifndef _RTE_LINUXAPP_INTERRUPTS_H_ #define _RTE_LINUXAPP_INTERRUPTS_H_ +#define RTE_MAX_RXTX_INTR_VEC_ID 32 + enum rte_intr_handle_type { RTE_INTR_HANDLE_UNKNOWN = 0, RTE_INTR_HANDLE_UIO, /**< uio device handle */ @@ -49,6 +51,8 @@ enum rte_intr_handle_type { RTE_INTR_HANDLE_MAX }; +struct rte_epoll_event; + /** Handle for interrupts. 
*/ struct rte_intr_handle { union { @@ -58,6 +62,12 @@ struct rte_intr_handle { }; int fd; /**< interrupt event file descriptor */ enum rte_intr_handle_type type; /**< handle type */ + uint32_t max_intr; /**< max interrupt requested */ + uint32_t nb_efd; /**< number of available efds */ + int efds[RTE_MAX_RXTX_INTR_VEC_ID]; /**< intr vectors/efds mapping */ + struct rte_epoll_event *elist[RTE_MAX_RXTX_INTR_VEC_ID]; +/**< intr vector epoll event ptr */ + int *intr_vec; /**< intr vector number array */ }; #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */ -- 1.8.1.4
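The commit message says intr_vec is reserved for the device driver to build the queue-to-vector mapping table. An illustrative build of that table — the modulo sharing policy and offset here are assumptions for the sketch, not lifted from a specific PMD:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate intr_vec[] with one entry per rx queue; when fewer event
 * fds exist than queues, vectors are shared round-robin (hypothetical
 * policy, for illustration only).  Caller frees the result. */
static int *
make_intr_vec(uint16_t nb_rx_queues, uint32_t nb_efd)
{
	int *intr_vec;
	uint16_t q;

	if (nb_efd == 0)
		return NULL;
	intr_vec = malloc(nb_rx_queues * sizeof(*intr_vec));
	if (intr_vec == NULL)
		return NULL;
	for (q = 0; q < nb_rx_queues; q++)
		intr_vec[q] = (int)(q % nb_efd);
	return intr_vec;
}
```

rte_eth_dev_rx_intr_ctl() later indexes exactly this array (`vec = intr_handle->intr_vec[qid]`) to find the vector to register on epoll.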
[dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD
v12 changes - bsd cleanup for unused variable warning - fix awkward line split in debug message v11 changes - typo cleanup and check kernel style v10 changes - code rework to return actual error code - bug fix for lsc when using uio_pci_generic v9 changes - code rework to fix open comment - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix - new patch to turn off the feature by default so as to avoid breaking the v2.1 abi v8 changes - remove condition check for only vfio-msix - add multiplex intr support when only one intr vector allowed - lsc and rxq interrupt runtime enable decision - add safe event delete while the event wakeup execution happens v7 changes - decouple epoll event and intr operation - add condition check in the case intr vector is disabled - renaming some APIs v6 changes - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'. - rewrite rte_intr_rx_wait/rte_intr_rx_set. - using vector number instead of queue_id as interrupt API params. - patch reorder and split. v5 changes - Rebase the patchset onto the HEAD - Isolate ethdev from EAL for new-added wait-for-rx interrupt function - Export wait-for-rx interrupt function for shared libraries - Split-off a new patch file for changed struct rte_intr_handle that other patches depend on, to avoid breaking git bisect - Change sample application to accommodate EAL function spec change accordingly v4 changes - Export interrupt enable/disable functions for shared libraries - Adjust position of new-added structure fields and functions to avoid breaking ABI v3 changes - Add return value for interrupt enable/disable functions - Move spinlock from PMD to L3fwd-power - Remove unnecessary variables in e1000_mac_info - Fix miscellaneous review comments v2 changes - Fix compilation issue in Makefile for a missing header file. - Consolidate internal and community review comments of v1 patch set. 
The patch series introduces low-latency one-shot rx interrupts into DPDK, with a polling/interrupt mode switch control example. DPDK's userspace interrupt notification and handling mechanism is based on UIO, with the following limitations: 1) It is designed to handle the LSC interrupt only, with an inefficient suspended-pthread wakeup procedure (e.g. UIO wakes up the LSC interrupt handling thread, which then wakes up the DPDK polling thread). In this way, it introduces non-deterministic wakeup latency for the DPDK polling thread, as well as packet latency if it is used to handle Rx interrupts. 2) UIO only supports a single interrupt vector, which has to be shared by the LSC interrupt and the interrupts assigned to dedicated rx queues. This patchset includes the following features: 1) Enable one-shot rx queue interrupts in the ixgbe PMD (PF & VF) and igb PMD (PF only). 2) Build on top of the VFIO mechanism instead of UIO, so it can support up to 64 interrupt vectors for rx queue interrupts. 3) Have one DPDK polling thread handle each Rx queue interrupt with a dedicated VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in user space. 4) Demonstrate the interrupt control APIs and userspace NAPI-like polling/interrupt switch algorithms in the L3fwd-power example. Known limitations: 1) It does not work for UIO, because a single interrupt eventfd shared by the LSC and rx queue interrupt handlers causes a mess. [FIXED] 2) The LSC interrupt is not supported by the VF driver, so it is disabled by default in L3fwd-power now. Feel free to turn it on if you want to support both LSC and rx queue interrupts on a PF. 
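The NAPI-like polling/interrupt switch demonstrated in L3fwd-power boils down to a small per-queue state machine: keep busy-polling while packets arrive, arm the rx interrupt and sleep only after a run of empty polls, and drop straight back to polling on the first wakeup. A sketch of that decision logic — the threshold and state names are illustrative, not the example application's exact tuning:

```c
#include <assert.h>

/* Per-rx-queue mode for the polling/interrupt switch. */
enum rxq_mode { MODE_POLL, MODE_INTR };

/* Consecutive empty polls tolerated before sleeping (illustrative). */
#define IDLE_LIMIT 10

/* Decide the next mode from the current one and the last burst size.
 * Any traffic resets the idle counter and forces polling; a long
 * enough quiet spell switches to interrupt mode (arm irq + sleep). */
static enum rxq_mode
next_mode(enum rxq_mode cur, unsigned int nb_rx, unsigned int *idle)
{
	if (nb_rx > 0) {
		*idle = 0;
		return MODE_POLL;	/* packets seen: (stay in) polling */
	}
	if (cur == MODE_POLL && ++(*idle) >= IDLE_LIMIT)
		return MODE_INTR;	/* quiet long enough: arm irq, sleep */
	return cur;
}
```

In the real example, entering MODE_INTR is where rte_eth_dev_rx_intr_enable() and the epoll wait happen, and the wakeup path disables the interrupt again before resuming the poll loop.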
Cunming Liang (14): eal/linux: add interrupt vectors support in intr_handle eal/linux: add rte_epoll_wait/ctl support eal/linux: add API to set rx interrupt event monitor eal/linux: fix comments typo on vfio msi eal/linux: add interrupt vectors handling on VFIO eal/linux: standalone intr event fd create support eal/linux: fix lsc read error in uio_pci_generic eal/bsd: dummy for new intr definition eal/bsd: fix inappropriate linuxapp referred in bsd ethdev: add rx intr enable, disable and ctl functions ixgbe: enable rx queue interrupts for both PF and VF igb: enable rx queue interrupts for PF l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch abi: fix v2.1 abi broken issue drivers/net/e1000/igb_ethdev.c | 311 ++-- drivers/net/ixgbe/ixgbe_ethdev.c | 519 - drivers/net/ixgbe/ixgbe_ethdev.h | 4 + examples/l3fwd-power/main.c| 206 ++-- lib/librte_eal/bsdapp/eal/eal_interrupts.c | 30 ++ .../bsdapp/eal/include/exec-env/rte_interrupts.h | 91 +++- lib/librte_eal/bsdapp/eal/rte_eal_version.map | 5 + lib/librte_eal/linuxapp/eal/eal_interrupts.c | 361 -- .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 + lib/librte_eal/linuxapp/eal/rte_eal_version.map| 8 + lib/librte_ether/rte_ethdev.c
[dpdk-dev] [PATCH 1/4] ixgbe: expose extended error statistics
> + stats->idrop = hw_stats->mngpdc +
> + hw_stats->fcoerpdc +
> + total_qbrc;

Should use qprdc instead of total_qbrc
[dpdk-dev] [PATCH] eal:Fix log messages always being printed from rte_eal_cpu_init
On Sat, Jun 06, 2015 at 07:04:05PM -0500, Keith Wiles wrote: > The RTE_LOG(DEBUG, ...) messages in rte_eal_cpu_init() are printed > even when the log level on the command line was set to INFO or lower. > > The problem is the rte_eal_cpu_init() routine was called before > the command line args are scanned. Setting --log-level=7 now > correctly does not print the messages from the rte_eal_cpu_init() routine. > > Signed-off-by: Keith Wiles This seems a good idea - make it easy to reduce the verbosity on startup if so desired. Some comments below. > --- > lib/librte_eal/bsdapp/eal/eal.c | 43 > ++- > lib/librte_eal/linuxapp/eal/eal.c | 43 > ++- > 2 files changed, 76 insertions(+), 10 deletions(-) > > diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c > index 43e8a47..ca10f2c 100644 > --- a/lib/librte_eal/bsdapp/eal/eal.c > +++ b/lib/librte_eal/bsdapp/eal/eal.c > @@ -306,6 +306,38 @@ eal_get_hugepage_mem_size(void) > return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX; > } > > +/* Parse the arguments for --log-level only */ > +static void > +eal_log_level_parse(int argc, char **argv) > +{ > + int opt; > + char **argvopt; > + int option_index; > + > + argvopt = argv; > + > + eal_reset_internal_config(&internal_config); > + > + while ((opt = getopt_long(argc, argvopt, eal_short_options, > + eal_long_options, &option_index)) != EOF) { > + > + int ret; > + > + /* getopt is not happy, stop right now */ > + if (opt == '?') > + break; > + > + ret = (opt == OPT_LOG_LEVEL_NUM)? 
> + eal_parse_common_option(opt, optarg, &internal_config) > : 0; > + > + /* common parser is not happy */ > + if (ret < 0) > + break; > + } > + > + optind = 0; /* reset getopt lib */ > +} > + > /* Parse the argument given in the command line of the application */ > static int > eal_parse_args(int argc, char **argv) > @@ -317,8 +349,6 @@ eal_parse_args(int argc, char **argv) > > argvopt = argv; > > - eal_reset_internal_config(&internal_config); > - > while ((opt = getopt_long(argc, argvopt, eal_short_options, > eal_long_options, &option_index)) != EOF) { > > @@ -447,6 +477,12 @@ rte_eal_init(int argc, char **argv) > if (rte_eal_log_early_init() < 0) > rte_panic("Cannot init early logs\n"); > > + eal_log_level_parse(argc, argv); > + > + /* set log level as early as possible */ > + rte_set_log_level(internal_config.log_level); > + > + RTE_LOG(INFO, EAL, "DPDK Version %s\n", rte_version()); There is already the -v option to the EAL to print the DPDK version. Just add that flag to any command, as it has no other effects. I don't think we need to increase the verbosity of startup by always printing it. > if (rte_eal_cpu_init() < 0) > rte_panic("Cannot detect lcores\n"); > > @@ -454,9 +490,6 @@ rte_eal_init(int argc, char **argv) > if (fctret < 0) > exit(1); > > - /* set log level as early as possible */ > - rte_set_log_level(internal_config.log_level); > - > if (internal_config.no_hugetlbfs == 0 && > internal_config.process_type != RTE_PROC_SECONDARY && > eal_hugepage_info_init() < 0) > diff --git a/lib/librte_eal/linuxapp/eal/eal.c > b/lib/librte_eal/linuxapp/eal/eal.c > index bd770cf..090ec99 100644 > --- a/lib/librte_eal/linuxapp/eal/eal.c > +++ b/lib/librte_eal/linuxapp/eal/eal.c > @@ -499,6 +499,38 @@ eal_get_hugepage_mem_size(void) > return (size < SIZE_MAX) ? 
(size_t)(size) : SIZE_MAX; > } > > +/* Parse the arguments for --log-level only */ > +static void > +eal_log_level_parse(int argc, char **argv) > +{ > + int opt; > + char **argvopt; > + int option_index; > + > + argvopt = argv; > + > + eal_reset_internal_config(&internal_config); > + > + while ((opt = getopt_long(argc, argvopt, eal_short_options, > + eal_long_options, &option_index)) != EOF) { > + > + int ret; > + > + /* getopt is not happy, stop right now */ > + if (opt == '?') > + break; > + > + ret = (opt == OPT_LOG_LEVEL_NUM)? > + eal_parse_common_option(opt, optarg, &internal_config) > : 0; > + > + /* common parser is not happy */ > + if (ret < 0) > + break; > + } > + > + optind = 0; /* reset getopt lib */ > +} > + This function looks duplicated for linux and bsd. Can we move it to one of the common files instead? Regards, /Bruce
[dpdk-dev] [PATCH] examples/distributor: fix missing "; " in debug macro
On Fri, Jun 05, 2015 at 10:45:04PM +0200, Thomas Monjalon wrote: > 2015-06-05 17:01, Bruce Richardson: > > The macro to turn on additional debug output when the app was compiled > > with "-DDEBUG" was missing a ";". > > It shows that such dead code is almost never tested. > It would be saner if this command would return no result: > git grep 'ifdef.*DEBUG' examples > examples/distributor/main.c:#ifdef DEBUG > examples/l3fwd-acl/main.c:#ifdef L3FWDACL_DEBUG > examples/l3fwd-acl/main.c:#ifdef L3FWDACL_DEBUG > examples/l3fwd-acl/main.c:#ifdef L3FWDACL_DEBUG > examples/l3fwd-acl/main.c:#ifdef L3FWDACL_DEBUG > examples/packet_ordering/main.c:#ifdef DEBUG > examples/vhost/main.c:#ifdef DEBUG > examples/vhost/main.h:#ifdef DEBUG > examples/vhost_xen/main.c:#ifdef DEBUG > examples/vhost_xen/main.h:#ifdef DEBUG > > There is no good reason to not use CONFIG_RTE_LOG_LEVEL to trigger debug > build. > I agree and disagree. I agree it would be good if we had a standard way of setting up a DEBUG build that would make it easier to test and pick up on this sort of thing. I disagree that the compile time log level is the way to do this. The log level at compile time specifies the default log level only, the actual log level is controllable at runtime. Having the default log level also affect what kind of build is done, e.g. with -O0 rather than -O3, introduces an unnecessary dependency. Setting the default log level to 5 and changing it to 9 at runtime should be the same as setting the default to 9. /Bruce
[dpdk-dev] [PATCH] mk: remove "u" modifier from "ar" command
On Mon, Jun 08, 2015 at 10:13:30AM +0200, Olivier MATZ wrote: > Hi Bruce, > > On 06/05/2015 01:05 PM, Bruce Richardson wrote: > > On Fedora 22, the "ar" binary operates by default in deterministic mode, > > making the "u" parameter irrelevant, and leading to warning messages > > getting printed in the build output like below. > > > > INSTALL-LIB librte_kvargs.a > > ar: `u' modifier ignored since `D' is the default (see `U') > > > > There are two options to remove these warnings: > > * add in the "U" flag to make "ar" non-deterministic again > > * remove the "u" flag to have all objects always updated > > Indeed, I think that removing 'u' won't have any impact in this case, > as we always regenerate the full archive without updating it. > However, why not explicitly use 'D' to have the same behavior across > distributions? > > Regards, > Olivier > Good question. I didn't bother adding in the "D" flag as I didn't see the need. [Basically, I asked "why" instead of "why not" :-)] However, if folks think it's worthwhile doing, I don't think doing a V2 of this patch would tax me unduly :-) /Bruce > > > > > This patch takes the second approach. > > > > Signed-off-by: Bruce Richardson > > --- > > mk/rte.lib.mk | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk > > index 0d7482d..6bd67aa 100644 > > --- a/mk/rte.lib.mk > > +++ b/mk/rte.lib.mk > > @@ -70,7 +70,7 @@ else > > _CPU_LDFLAGS := $(CPU_LDFLAGS) > > endif > > > > -O_TO_A = $(AR) crus $(LIB) $(OBJS-y) > > +O_TO_A = $(AR) crs $(LIB) $(OBJS-y) > > O_TO_A_STR = $(subst ','\'',$(O_TO_A)) #'# fix syntax highlight > > O_TO_A_DISP = $(if $(V),"$(O_TO_A_STR)"," AR $(@)") > > O_TO_A_CMD = "cmd_$@ = $(O_TO_A_STR)" > >
[dpdk-dev] Running testpmd over KNI
On Fri, Jun 05, 2015 at 10:20:09AM -0700, Navneet Rao wrote: > Hi Bruce: > > Actually I want to use the TESTPMD app as a packet-generator/checker driving > the KNI-enabled NICs. > Is there an easy way to packet generate/check than testpmd? > > Please see attached. > > Thanks > -Navneet > What part of KNI are you looking to test, under what conditions. Do you just want to check the throughput of traffic going from userspace to kernel and back out again? /Bruce
[dpdk-dev] [PATCHv2 8/8] acl: add new test-cases into UT
Add several new test cases for ACL to cover different build configurations. Signed-off-by: Konstantin Ananyev --- app/test/test_acl.c | 431 +++- 1 file changed, 423 insertions(+), 8 deletions(-) diff --git a/app/test/test_acl.c b/app/test/test_acl.c index 6a032f9..b4a107d 100644 --- a/app/test/test_acl.c +++ b/app/test/test_acl.c @@ -47,6 +47,8 @@ #define LEN RTE_ACL_MAX_CATEGORIES +RTE_ACL_RULE_DEF(acl_ipv4vlan_rule, RTE_ACL_IPV4VLAN_NUM_FIELDS); + struct rte_acl_param acl_param = { .name = "acl_ctx", .socket_id = SOCKET_ID_ANY, @@ -62,6 +64,15 @@ struct rte_acl_ipv4vlan_rule acl_rule = { .dst_port_high = UINT16_MAX, }; +const uint32_t ipv4_7tuple_layout[RTE_ACL_IPV4VLAN_NUM] = { + offsetof(struct ipv4_7tuple, proto), + offsetof(struct ipv4_7tuple, vlan), + offsetof(struct ipv4_7tuple, ip_src), + offsetof(struct ipv4_7tuple, ip_dst), + offsetof(struct ipv4_7tuple, port_src), +}; + + /* byteswap to cpu or network order */ static void bswap_test_data(struct ipv4_7tuple *data, int len, int to_be) @@ -195,13 +206,6 @@ test_classify_buid(struct rte_acl_ctx *acx, const struct rte_acl_ipv4vlan_rule *rules, uint32_t num) { int ret; - const uint32_t layout[RTE_ACL_IPV4VLAN_NUM] = { - offsetof(struct ipv4_7tuple, proto), - offsetof(struct ipv4_7tuple, vlan), - offsetof(struct ipv4_7tuple, ip_src), - offsetof(struct ipv4_7tuple, ip_dst), - offsetof(struct ipv4_7tuple, port_src), - }; /* add rules to the context */ ret = rte_acl_ipv4vlan_add_rules(acx, rules, num); @@ -212,7 +216,8 @@ test_classify_buid(struct rte_acl_ctx *acx, } /* try building the context */ - ret = rte_acl_ipv4vlan_build(acx, layout, RTE_ACL_MAX_CATEGORIES); + ret = rte_acl_ipv4vlan_build(acx, ipv4_7tuple_layout, + RTE_ACL_MAX_CATEGORIES); if (ret != 0) { printf("Line %i: Building ACL context failed!\n", __LINE__); return ret; @@ -412,6 +417,414 @@ test_build_ports_range(void) return ret; } +static void +convert_rule(const struct rte_acl_ipv4vlan_rule *ri, + struct acl_ipv4vlan_rule *ro) +{ + ro->data = 
ri->data; + + ro->field[RTE_ACL_IPV4VLAN_PROTO_FIELD].value.u8 = ri->proto; + ro->field[RTE_ACL_IPV4VLAN_VLAN1_FIELD].value.u16 = ri->vlan; + ro->field[RTE_ACL_IPV4VLAN_VLAN2_FIELD].value.u16 = ri->domain; + ro->field[RTE_ACL_IPV4VLAN_SRC_FIELD].value.u32 = ri->src_addr; + ro->field[RTE_ACL_IPV4VLAN_DST_FIELD].value.u32 = ri->dst_addr; + ro->field[RTE_ACL_IPV4VLAN_SRCP_FIELD].value.u16 = ri->src_port_low; + ro->field[RTE_ACL_IPV4VLAN_DSTP_FIELD].value.u16 = ri->dst_port_low; + + ro->field[RTE_ACL_IPV4VLAN_PROTO_FIELD].mask_range.u8 = ri->proto_mask; + ro->field[RTE_ACL_IPV4VLAN_VLAN1_FIELD].mask_range.u16 = ri->vlan_mask; + ro->field[RTE_ACL_IPV4VLAN_VLAN2_FIELD].mask_range.u16 = + ri->domain_mask; + ro->field[RTE_ACL_IPV4VLAN_SRC_FIELD].mask_range.u32 = + ri->src_mask_len; + ro->field[RTE_ACL_IPV4VLAN_DST_FIELD].mask_range.u32 = ri->dst_mask_len; + ro->field[RTE_ACL_IPV4VLAN_SRCP_FIELD].mask_range.u16 = + ri->src_port_high; + ro->field[RTE_ACL_IPV4VLAN_DSTP_FIELD].mask_range.u16 = + ri->dst_port_high; +} + +/* + * Convert IPV4 source and destination from RTE_ACL_FIELD_TYPE_MASK to + * RTE_ACL_FIELD_TYPE_BITMASK. + */ +static void +convert_rule_1(const struct rte_acl_ipv4vlan_rule *ri, + struct acl_ipv4vlan_rule *ro) +{ + uint32_t v; + + convert_rule(ri, ro); + v = ro->field[RTE_ACL_IPV4VLAN_SRC_FIELD].mask_range.u32; + ro->field[RTE_ACL_IPV4VLAN_SRC_FIELD].mask_range.u32 = + RTE_ACL_MASKLEN_TO_BITMASK(v, sizeof(v)); + v = ro->field[RTE_ACL_IPV4VLAN_DST_FIELD].mask_range.u32; + ro->field[RTE_ACL_IPV4VLAN_DST_FIELD].mask_range.u32 = + RTE_ACL_MASKLEN_TO_BITMASK(v, sizeof(v)); +} + +/* + * Convert IPV4 source and destination from RTE_ACL_FIELD_TYPE_MASK to + * RTE_ACL_FIELD_TYPE_RANGE. 
+ */ +static void +convert_rule_2(const struct rte_acl_ipv4vlan_rule *ri, + struct acl_ipv4vlan_rule *ro) +{ + uint32_t hi, lo, mask; + + convert_rule(ri, ro); + + mask = ro->field[RTE_ACL_IPV4VLAN_SRC_FIELD].mask_range.u32; + mask = RTE_ACL_MASKLEN_TO_BITMASK(mask, sizeof(mask)); + lo = ro->field[RTE_ACL_IPV4VLAN_SRC_FIELD].value.u32 & mask; + hi = lo + ~mask; + ro->field[RTE_ACL_IPV4VLAN_SRC_FIELD].value.u32 = lo; + ro->field[RTE_ACL_IPV4VLAN_SRC_FIELD].mask_range.u32 = hi; + + mask = ro->field[RTE_ACL_IPV4VLAN_DST_FIELD].mask_range.u32; + mask = RTE_ACL_MASKLEN_TO_BITMASK(mask, sizeof(mask)); + lo = ro->field[RT
[dpdk-dev] [PATCHv2 7/8] acl: fix ambiguity between ACL rules in UT.
Some test rules had equal priority for the same category. That can cause an ambiguity in build trie and test results. Specify different priority value for each rule from the same category. Signed-off-by: Konstantin Ananyev --- app/test/test_acl.h | 52 ++-- 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/app/test/test_acl.h b/app/test/test_acl.h index 4af457d..4e8ff34 100644 --- a/app/test/test_acl.h +++ b/app/test/test_acl.h @@ -105,7 +105,7 @@ struct rte_acl_ipv4vlan_rule acl_test_rules[] = { /* matches all packets traveling to 192.168.0.0/16 */ { .data = {.userdata = 1, .category_mask = ACL_ALLOW_MASK, - .priority = 2}, + .priority = 230}, .dst_addr = IPv4(192,168,0,0), .dst_mask_len = 16, .src_port_low = 0, @@ -116,7 +116,7 @@ struct rte_acl_ipv4vlan_rule acl_test_rules[] = { /* matches all packets traveling to 192.168.1.0/24 */ { .data = {.userdata = 2, .category_mask = ACL_ALLOW_MASK, - .priority = 3}, + .priority = 330}, .dst_addr = IPv4(192,168,1,0), .dst_mask_len = 24, .src_port_low = 0, @@ -127,7 +127,7 @@ struct rte_acl_ipv4vlan_rule acl_test_rules[] = { /* matches all packets traveling to 192.168.1.50 */ { .data = {.userdata = 3, .category_mask = ACL_DENY_MASK, - .priority = 2}, + .priority = 230}, .dst_addr = IPv4(192,168,1,50), .dst_mask_len = 32, .src_port_low = 0, @@ -140,7 +140,7 @@ struct rte_acl_ipv4vlan_rule acl_test_rules[] = { /* matches all packets traveling from 10.0.0.0/8 */ { .data = {.userdata = 4, .category_mask = ACL_ALLOW_MASK, - .priority = 2}, + .priority = 240}, .src_addr = IPv4(10,0,0,0), .src_mask_len = 8, .src_port_low = 0, @@ -151,7 +151,7 @@ struct rte_acl_ipv4vlan_rule acl_test_rules[] = { /* matches all packets traveling from 10.1.1.0/24 */ { .data = {.userdata = 5, .category_mask = ACL_ALLOW_MASK, - .priority = 3}, + .priority = 340}, .src_addr = IPv4(10,1,1,0), .src_mask_len = 24, .src_port_low = 0, @@ -162,7 +162,7 @@ struct rte_acl_ipv4vlan_rule acl_test_rules[] = { /* matches all packets traveling from 
10.1.1.1 */ { .data = {.userdata = 6, .category_mask = ACL_DENY_MASK, - .priority = 2}, + .priority = 240}, .src_addr = IPv4(10,1,1,1), .src_mask_len = 32, .src_port_low = 0, @@ -175,7 +175,7 @@ struct rte_acl_ipv4vlan_rule acl_test_rules[] = { /* matches all packets with lower 7 bytes of VLAN tag equal to 0x64 */ { .data = {.userdata = 7, .category_mask = ACL_ALLOW_MASK, - .priority = 2}, + .priority = 260}, .vlan = 0x64, .vlan_mask = 0x7f, .src_port_low = 0, @@ -186,7 +186,7 @@ struct rte_acl_ipv4vlan_rule acl_test_rules[] = { /* matches all packets with VLAN tags that have 0x5 in them */ { .data = {.userdata = 8, .category_mask = ACL_ALLOW_MASK, - .priority = 2}, + .priority = 260}, .vlan = 0x5, .vlan_mask = 0x5, .src_port_low = 0, @@ -197,7 +197,7 @@ struct rte_acl_ipv4vlan_rule acl_test_rules[] = { /* matches all packets with VLAN tag 5 */ { .d
[dpdk-dev] [PATCHv2 6/8] acl: cleanup remove unused code from acl_bld.c
Signed-off-by: Konstantin Ananyev --- lib/librte_acl/acl_bld.c | 310 --- 1 file changed, 310 deletions(-) diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c index 4d8a62f..e6f4530 100644 --- a/lib/librte_acl/acl_bld.c +++ b/lib/librte_acl/acl_bld.c @@ -120,10 +120,6 @@ static int acl_merge_trie(struct acl_build_context *context, struct rte_acl_node *node_a, struct rte_acl_node *node_b, uint32_t level, struct rte_acl_node **node_c); -static int acl_merge(struct acl_build_context *context, - struct rte_acl_node *node_a, struct rte_acl_node *node_b, - int move, int a_subset, int level); - static void acl_deref_ptr(struct acl_build_context *context, struct rte_acl_node *node, int index); @@ -415,58 +411,6 @@ acl_intersect_type(const struct rte_acl_bitset *a_bits, } /* - * Check if all bits in the bitset are on - */ -static int -acl_full(struct rte_acl_node *node) -{ - uint32_t n; - bits_t all_bits = -1; - - for (n = 0; n < RTE_ACL_BIT_SET_SIZE; n++) - all_bits &= node->values.bits[n]; - return all_bits == -1; -} - -/* - * Check if all bits in the bitset are off - */ -static int -acl_empty(struct rte_acl_node *node) -{ - uint32_t n; - - if (node->ref_count == 0) { - for (n = 0; n < RTE_ACL_BIT_SET_SIZE; n++) { - if (0 != node->values.bits[n]) - return 0; - } - return 1; - } else { - return 0; - } -} - -/* - * Compute intersection of A and B - * return 1 if there is an intersection else 0. - */ -static int -acl_intersect(struct rte_acl_bitset *a_bits, - struct rte_acl_bitset *b_bits, - struct rte_acl_bitset *intersect) -{ - uint32_t n; - bits_t all_bits = 0; - - for (n = 0; n < RTE_ACL_BIT_SET_SIZE; n++) { - intersect->bits[n] = a_bits->bits[n] & b_bits->bits[n]; - all_bits |= intersect->bits[n]; - } - return all_bits != 0; -} - -/* * Duplicate a node */ static struct rte_acl_node * @@ -534,63 +478,6 @@ acl_deref_ptr(struct acl_build_context *context, } /* - * Exclude bitset from a node pointer - * returns 0 if poiter was deref'd - * 1 otherwise. 
- */ -static int -acl_exclude_ptr(struct acl_build_context *context, - struct rte_acl_node *node, - int index, - struct rte_acl_bitset *b_bits) -{ - int retval = 1; - - /* -* remove bitset from node pointer and deref -* if the bitset becomes empty. -*/ - if (!acl_exclude(&node->ptrs[index].values, - &node->ptrs[index].values, - b_bits)) { - acl_deref_ptr(context, node, index); - node->ptrs[index].ptr = NULL; - retval = 0; - } - - /* exclude bits from the composite bits for the node */ - acl_exclude(&node->values, &node->values, b_bits); - return retval; -} - -/* - * Remove a bitset from src ptr and move remaining ptr to dst - */ -static int -acl_move_ptr(struct acl_build_context *context, - struct rte_acl_node *dst, - struct rte_acl_node *src, - int index, - struct rte_acl_bitset *b_bits) -{ - int rc; - - if (b_bits != NULL) - if (!acl_exclude_ptr(context, src, index, b_bits)) - return 0; - - /* add src pointer to dst node */ - rc = acl_add_ptr(context, dst, src->ptrs[index].ptr, - &src->ptrs[index].values); - if (rc < 0) - return rc; - - /* remove ptr from src */ - acl_exclude_ptr(context, src, index, &src->ptrs[index].values); - return 1; -} - -/* * acl_exclude rte_acl_bitset from src and copy remaining pointer to dst */ static int @@ -650,203 +537,6 @@ acl_compact_node_ptrs(struct rte_acl_node *node_a) } } -/* - * acl_merge helper routine. 
- */ -static int -acl_merge_intersect(struct acl_build_context *context, - struct rte_acl_node *node_a, uint32_t idx_a, - struct rte_acl_node *node_b, uint32_t idx_b, - int next_move, int level, - struct rte_acl_bitset *intersect_ptr) -{ - struct rte_acl_node *node_c; - - /* Duplicate A for intersection */ - node_c = acl_dup_node(context, node_a->ptrs[idx_a].ptr); - - /* Remove intersection from A */ - acl_exclude_ptr(context, node_a, idx_a, intersect_ptr); - - /* -* Added link from A to C for all transitions -* in the intersection -*/ - if (acl_add_ptr(context, node_a, node_c, intersect_ptr) < 0) - return -1; - - /* merge B->node into C */ - return acl_merge(context, node_c, node_b->ptrs[idx_b].ptr, next_move, - 0, level + 1); -} - - -/* - * Merge the children of nodes A and B together. - * - * if match node - * For each
[dpdk-dev] [PATCHv2 5/8] acl: code dedup - introduce a new macro
Introduce new RTE_ACL_MASKLEN_TO_BITMASK macro, that will be used in several places inside librte_acl and its UT. Simplify and clean up build_trie() code a bit. Signed-off-by: Konstantin Ananyev --- lib/librte_acl/acl_bld.c | 16 +++- lib/librte_acl/rte_acl.h | 3 +++ 2 files changed, 6 insertions(+), 13 deletions(-) diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c index d89c66a..4d8a62f 100644 --- a/lib/librte_acl/acl_bld.c +++ b/lib/librte_acl/acl_bld.c @@ -1262,19 +1262,9 @@ build_trie(struct acl_build_context *context, struct rte_acl_build_rule *head, * all higher bits. */ uint64_t mask; - - if (fld->mask_range.u32 == 0) { - mask = 0; - - /* -* arithmetic right shift for the length of -* the mask less the msb. -*/ - } else { - mask = -1 << - (rule->config->defs[n].size * - CHAR_BIT - fld->mask_range.u32); - } + mask = RTE_ACL_MASKLEN_TO_BITMASK( + fld->mask_range.u32, + rule->config->defs[n].size); /* gen a mini-trie for this field */ merge = acl_gen_mask_trie(context, diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h index 8d9bbe5..bd8f892 100644 --- a/lib/librte_acl/rte_acl.h +++ b/lib/librte_acl/rte_acl.h @@ -122,6 +122,9 @@ enum { #define RTE_ACL_INVALID_USERDATA 0 +#define RTE_ACL_MASKLEN_TO_BITMASK(v, s) \ +((v) == 0 ? (v) : (typeof(v))((uint64_t)-1 << ((s) * CHAR_BIT - (v)))) + /** * Miscellaneous data for ACL rule. */ -- 2.4.2
[dpdk-dev] [PATCHv2 4/8] acl: fix avoid unneeded trie splitting for subset of rules.
When rebuilding a trie for limited rule-set, don't try to split the rule-set even further. Signed-off-by: Konstantin Ananyev --- lib/librte_acl/acl_bld.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c index 45ee065..d89c66a 100644 --- a/lib/librte_acl/acl_bld.c +++ b/lib/librte_acl/acl_bld.c @@ -97,6 +97,7 @@ struct acl_build_context { struct rte_acl_build_rule *build_rules; struct rte_acl_config cfg; int32_t node_max; + int32_t cur_node_max; uint32_t node; uint32_t num_nodes; uint32_t category_mask; @@ -1337,7 +1338,7 @@ build_trie(struct acl_build_context *context, struct rte_acl_build_rule *head, return NULL; node_count = context->num_nodes - node_count; - if (node_count > context->node_max) { + if (node_count > context->cur_node_max) { *last = prev; return trie; } @@ -1536,7 +1537,7 @@ acl_build_index(const struct rte_acl_config *config, uint32_t *data_index) static struct rte_acl_build_rule * build_one_trie(struct acl_build_context *context, struct rte_acl_build_rule *rule_sets[RTE_ACL_MAX_TRIES], - uint32_t n) + uint32_t n, int32_t node_max) { struct rte_acl_build_rule *last; struct rte_acl_config *config; @@ -1553,6 +1554,8 @@ build_one_trie(struct acl_build_context *context, context->data_indexes[n]); context->tries[n].data_index = context->data_indexes[n]; + context->cur_node_max = node_max; + context->bld_tries[n].trie = build_trie(context, rule_sets[n], &last, &context->tries[n].count); @@ -1587,7 +1590,7 @@ acl_build_tries(struct acl_build_context *context, num_tries = n + 1; - last = build_one_trie(context, rule_sets, n); + last = build_one_trie(context, rule_sets, n, context->node_max); if (context->bld_tries[n].trie == NULL) { RTE_LOG(ERR, ACL, "Build of %u-th trie failed\n", n); return -ENOMEM; @@ -1618,8 +1621,11 @@ acl_build_tries(struct acl_build_context *context, head = head->next) head->config = config; - /* Rebuild the trie for the reduced rule-set. 
*/ - last = build_one_trie(context, rule_sets, n); + /* +* Rebuild the trie for the reduced rule-set. +* Don't try to split it any further. +*/ + last = build_one_trie(context, rule_sets, n, INT32_MAX); if (context->bld_tries[n].trie == NULL || last != NULL) { RTE_LOG(ERR, ACL, "Build of %u-th trie failed\n", n); return -ENOMEM; -- 2.4.2
[dpdk-dev] [PATCHv2 3/8] acl: add function to check build input parameters
Move check for build config parameter into a separate function. Simplify acl_calc_wildness() function. Signed-off-by: Konstantin Ananyev --- lib/librte_acl/acl_bld.c | 107 --- 1 file changed, 54 insertions(+), 53 deletions(-) diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c index ff3ba8b..45ee065 100644 --- a/lib/librte_acl/acl_bld.c +++ b/lib/librte_acl/acl_bld.c @@ -1350,7 +1350,7 @@ build_trie(struct acl_build_context *context, struct rte_acl_build_rule *head, return trie; } -static int +static void acl_calc_wildness(struct rte_acl_build_rule *head, const struct rte_acl_config *config) { @@ -1362,10 +1362,10 @@ acl_calc_wildness(struct rte_acl_build_rule *head, for (n = 0; n < config->num_fields; n++) { double wild = 0; - uint64_t msk_val = - RTE_LEN2MASK(CHAR_BIT * config->defs[n].size, + uint32_t bit_len = CHAR_BIT * config->defs[n].size; + uint64_t msk_val = RTE_LEN2MASK(bit_len, typeof(msk_val)); - double size = CHAR_BIT * config->defs[n].size; + double size = bit_len; int field_index = config->defs[n].field_index; const struct rte_acl_field *fld = rule->f->field + field_index; @@ -1382,54 +1382,15 @@ acl_calc_wildness(struct rte_acl_build_rule *head, break; case RTE_ACL_FIELD_TYPE_RANGE: - switch (rule->config->defs[n].size) { - case sizeof(uint8_t): - wild = ((double)fld->mask_range.u8 - - fld->value.u8) / UINT8_MAX; - break; - case sizeof(uint16_t): - wild = ((double)fld->mask_range.u16 - - fld->value.u16) / UINT16_MAX; - break; - case sizeof(uint32_t): - wild = ((double)fld->mask_range.u32 - - fld->value.u32) / UINT32_MAX; - break; - case sizeof(uint64_t): - wild = ((double)fld->mask_range.u64 - - fld->value.u64) / UINT64_MAX; - break; - default: - RTE_LOG(ERR, ACL, - "%s(rule: %u) invalid %u-th " - "field, type: %hhu, " - "unknown size: %hhu\n", - __func__, - rule->f->data.userdata, - n, - rule->config->defs[n].type, - rule->config->defs[n].size); - return -EINVAL; - } + wild = (fld->mask_range.u64 & msk_val) - + (fld->value.u64 & 
msk_val); + wild = wild / msk_val; break; - - default: - RTE_LOG(ERR, ACL, - "%s(rule: %u) invalid %u-th " - "field, unknown type: %hhu\n", - __func__, - rule->f->data.userdata, - n, - rule->config->defs[n].type); - return -EINVAL; - } rule->wildness[field_index] = (uint32_t)(wild * 100); } } - - return 0; } static void @@ -1602,7 +1563,6 @@ static int acl_build_tries(struct acl_build_context *context, struct rte_acl_build_rule *head) { - int32_t rc; uint32_t n, num_tries; struct rte_acl_config *config; struct rte_acl_build_rule *last; @@ -1621,9 +1581,7 @@ acl_build_tries(struct acl_build_context *context, context->tries[0].type = RTE_ACL_FULL_TRIE; /* calc wildness of each field of each rule */ - rc = acl_calc_wildness(head, config); - if (rc != 0) - return rc; + acl_calc_wildness(head, config); for (n =
[dpdk-dev] [PATCHv2 2/8] acl: code cleanup - use global EAL macro, instead of creating a local copy
use global RTE_LEN2MASK macro, instead of LEN2MASK. Signed-off-by: Konstantin Ananyev --- app/test-acl/main.c| 3 ++- lib/librte_acl/acl_bld.c | 3 ++- lib/librte_acl/rte_acl.c | 3 ++- lib/librte_acl/rte_acl.h | 2 +- lib/librte_acl/rte_acl_osdep.h | 2 -- 5 files changed, 7 insertions(+), 6 deletions(-) diff --git a/app/test-acl/main.c b/app/test-acl/main.c index 524c43a..be3d773 100644 --- a/app/test-acl/main.c +++ b/app/test-acl/main.c @@ -739,7 +739,8 @@ add_cb_rules(FILE *f, struct rte_acl_ctx *ctx) return rc; } - v.data.category_mask = LEN2MASK(RTE_ACL_MAX_CATEGORIES); + v.data.category_mask = RTE_LEN2MASK(RTE_ACL_MAX_CATEGORIES, + typeof(v.data.category_mask)); v.data.priority = RTE_ACL_MAX_PRIORITY - n; v.data.userdata = n; diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c index aee6ed5..ff3ba8b 100644 --- a/lib/librte_acl/acl_bld.c +++ b/lib/librte_acl/acl_bld.c @@ -1772,7 +1772,8 @@ acl_bld(struct acl_build_context *bcx, struct rte_acl_ctx *ctx, bcx->pool.alignment = ACL_POOL_ALIGN; bcx->pool.min_alloc = ACL_POOL_ALLOC_MIN; bcx->cfg = *cfg; - bcx->category_mask = LEN2MASK(bcx->cfg.num_categories); + bcx->category_mask = RTE_LEN2MASK(bcx->cfg.num_categories, + typeof(bcx->category_mask)); bcx->node_max = node_max; rc = sigsetjmp(bcx->pool.fail, 0); diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c index b6ddeeb..a54d531 100644 --- a/lib/librte_acl/rte_acl.c +++ b/lib/librte_acl/rte_acl.c @@ -271,7 +271,8 @@ acl_add_rules(struct rte_acl_ctx *ctx, const void *rules, uint32_t num) static int acl_check_rule(const struct rte_acl_rule_data *rd) { - if ((rd->category_mask & LEN2MASK(RTE_ACL_MAX_CATEGORIES)) == 0 || + if ((RTE_LEN2MASK(RTE_ACL_MAX_CATEGORIES, typeof(rd->category_mask)) & + rd->category_mask) == 0 || rd->priority > RTE_ACL_MAX_PRIORITY || rd->priority < RTE_ACL_MIN_PRIORITY || rd->userdata == RTE_ACL_INVALID_USERDATA) diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h index 3a93730..8d9bbe5 100644 --- 
a/lib/librte_acl/rte_acl.h +++ b/lib/librte_acl/rte_acl.h @@ -115,7 +115,7 @@ struct rte_acl_field { enum { RTE_ACL_TYPE_SHIFT = 29, - RTE_ACL_MAX_INDEX = LEN2MASK(RTE_ACL_TYPE_SHIFT), + RTE_ACL_MAX_INDEX = RTE_LEN2MASK(RTE_ACL_TYPE_SHIFT, uint32_t), RTE_ACL_MAX_PRIORITY = RTE_ACL_MAX_INDEX, RTE_ACL_MIN_PRIORITY = 0, }; diff --git a/lib/librte_acl/rte_acl_osdep.h b/lib/librte_acl/rte_acl_osdep.h index 81fdefb..41f7e3d 100644 --- a/lib/librte_acl/rte_acl_osdep.h +++ b/lib/librte_acl/rte_acl_osdep.h @@ -56,8 +56,6 @@ * Common defines. */ -#defineLEN2MASK(ln)((uint32_t)(((uint64_t)1 << (ln)) - 1)) - #define DIM(x) RTE_DIM(x) #include -- 2.4.2
[dpdk-dev] [PATCHv2 1/8] acl: fix invalid rule wildness calculation for bitmask field type
Signed-off-by: Konstantin Ananyev --- lib/librte_acl/acl_bld.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c index 3801843..aee6ed5 100644 --- a/lib/librte_acl/acl_bld.c +++ b/lib/librte_acl/acl_bld.c @@ -1362,6 +1362,9 @@ acl_calc_wildness(struct rte_acl_build_rule *head, for (n = 0; n < config->num_fields; n++) { double wild = 0; + uint64_t msk_val = + RTE_LEN2MASK(CHAR_BIT * config->defs[n].size, + typeof(msk_val)); double size = CHAR_BIT * config->defs[n].size; int field_index = config->defs[n].field_index; const struct rte_acl_field *fld = rule->f->field + @@ -1369,8 +1372,8 @@ acl_calc_wildness(struct rte_acl_build_rule *head, switch (rule->config->defs[n].type) { case RTE_ACL_FIELD_TYPE_BITMASK: - wild = (size - __builtin_popcount( - fld->mask_range.u8)) / + wild = (size - __builtin_popcountll( + fld->mask_range.u64 & msk_val)) / size; break; -- 2.4.2
[dpdk-dev] [PATCHv2 0/8] acl: various fixes and cleanups
Several fixes and code cleanups for the librte_acl. New test-cases for acl UT. Konstantin Ananyev (8): acl: fix invalid rule wildness calculation for bitmask field type acl: code cleanup - use global EAL macro, instead of creating a local copy acl: add function to check build input parameters acl: fix avoid unneeded trie splitting for subset of rules. acl: code dedup - introduce a new macro acl: cleanup remove unused code from acl_bld.c acl: fix ambiguity between ACL rules in UT. acl: add new test-cases into UT app/test-acl/main.c| 3 +- app/test/test_acl.c| 431 +- app/test/test_acl.h| 52 ++--- lib/librte_acl/acl_bld.c | 455 +++-- lib/librte_acl/rte_acl.c | 3 +- lib/librte_acl/rte_acl.h | 5 +- lib/librte_acl/rte_acl_osdep.h | 2 - 7 files changed, 530 insertions(+), 421 deletions(-) -- 2.4.2
[dpdk-dev] Intel X552/557 is not working.
Hi, I made an (unofficial, quick) patch. The code is mostly pulled from FreeBSD. My Ubuntu 14.04 on X10SDV-TLN4F works fine. http://www.e-neta.jp/~oki/dpdk-ixgbe.diff 2015-06-08 11:19 GMT+09:00 Masafumi OE : > Hi, > > I'm trying to use X552/X557-AT 10GBASE-T NIC on Xeon-D 1540. However it did > not work properly. > Binding X552/557to PMD for ixgbe is fine but testpmd is not working on > X552/557 because th_ixgbe_dev_init() return Hardware Initialization > Failure:-3. > > Do you have any idea? > > -- > Supermicro X10SDV-TLN4F > Running on CentOS 7.0: > DPDK is getting via git. > -- > $ lspci -nn | grep X55 > 03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection > X552/X557-AT 10GBASE-T [8086:15ad] > 03:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Connection > X552/X557-AT 10GBASE-T [8086:15ad] > > $ ./dpdk_nic_bind.py --status > > Network devices using DPDK-compatible driver > > :03:00.0 'Ethernet Connection X552/X557-AT 10GBASE-T' > drv=uio_pci_generic unused=vfio-pci > :03:00.1 'Ethernet Connection X552/X557-AT 10GBASE-T' > drv=uio_pci_generic unused=vfio-pci > > Network devices using kernel driver > === > :05:00.0 'I350 Gigabit Network Connection' if=eno1 drv=igb > unused=vfio-pci,uio_pci_generic *Active* > :05:00.1 'I350 Gigabit Network Connection' if=eno2 drv=igb > unused=vfio-pci,uio_pci_generic > > Other network devices > = > > > -- > $ sudo -s app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 > --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 > --txqflags=0xf01 > EAL: Detected lcore 0 as core 0 on socket 0 > EAL: Detected lcore 1 as core 1 on socket 0 > EAL: Detected lcore 2 as core 2 on socket 0 > EAL: Detected lcore 3 as core 3 on socket 0 > EAL: Detected lcore 4 as core 4 on socket 0 > EAL: Detected lcore 5 as core 5 on socket 0 > EAL: Detected lcore 6 as core 6 on socket 0 > EAL: Detected lcore 7 as core 7 on socket 0 > EAL: Detected lcore 8 as core 0 on socket 0 > EAL: Detected lcore 9 as core 1 on 
socket 0 > EAL: Detected lcore 10 as core 2 on socket 0 > EAL: Detected lcore 11 as core 3 on socket 0 > EAL: Detected lcore 12 as core 4 on socket 0 > EAL: Detected lcore 13 as core 5 on socket 0 > EAL: Detected lcore 14 as core 6 on socket 0 > EAL: Detected lcore 15 as core 7 on socket 0 > EAL: Support maximum 128 logical core(s) by configuration. > EAL: Detected 16 lcore(s) > EAL: VFIO modules not all loaded, skip VFIO support... > EAL: Setting up memory... > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7ff89320 (size = 0x20) > EAL: Ask a virtual area of 0xc0 bytes > EAL: Virtual area found at 0x7ff89240 (size = 0xc0) > EAL: Ask a virtual area of 0x700 bytes > EAL: Virtual area found at 0x7ff88b20 (size = 0x700) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7ff88ae0 (size = 0x20) > EAL: Requesting 64 pages of size 2MB from socket 0 > ^NEAL: TSC frequency is ~200 KHz > EAL: Master lcore 8 is ready (tid=9472d900;cpuset=[8]) > EAL: lcore 9 is ready (tid=8a5fe700;cpuset=[9]) > EAL: PCI device :03:00.0 on NUMA socket 0 > EAL: probe driver: 8086:15ad rte_ixgbe_pmd > EAL: PCI memory mapped at 0x7ff89300 > EAL: PCI memory mapped at 0x7ff8946f3000 > PMD: eth_ixgbe_dev_init(): Hardware Initialization Failure: -3 > EAL: Error - exiting with code: 1 > Cause: Requested device :03:00.0 cannot be used > > -- > Masafumi OE, NAOJ > >
[dpdk-dev] Intel X552/557 is not working.
Hi, I'm trying to use an X552/X557-AT 10GBASE-T NIC on a Xeon D-1540. However, it did not work properly. Binding the X552/557 to the ixgbe PMD is fine, but testpmd is not working on the X552/557 because eth_ixgbe_dev_init() returns Hardware Initialization Failure: -3. Do you have any idea? -- Supermicro X10SDV-TLN4F Running on CentOS 7.0; DPDK was obtained via git. -- $ lspci -nn | grep X55 03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T [8086:15ad] 03:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T [8086:15ad] $ ./dpdk_nic_bind.py --status Network devices using DPDK-compatible driver :03:00.0 'Ethernet Connection X552/X557-AT 10GBASE-T' drv=uio_pci_generic unused=vfio-pci :03:00.1 'Ethernet Connection X552/X557-AT 10GBASE-T' drv=uio_pci_generic unused=vfio-pci Network devices using kernel driver === :05:00.0 'I350 Gigabit Network Connection' if=eno1 drv=igb unused=vfio-pci,uio_pci_generic *Active* :05:00.1 'I350 Gigabit Network Connection' if=eno2 drv=igb unused=vfio-pci,uio_pci_generic Other network devices = -- $ sudo -s app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 EAL: Detected lcore 0 as core 0 on socket 0 EAL: Detected lcore 1 as core 1 on socket 0 EAL: Detected lcore 2 as core 2 on socket 0 EAL: Detected lcore 3 as core 3 on socket 0 EAL: Detected lcore 4 as core 4 on socket 0 EAL: Detected lcore 5 as core 5 on socket 0 EAL: Detected lcore 6 as core 6 on socket 0 EAL: Detected lcore 7 as core 7 on socket 0 EAL: Detected lcore 8 as core 0 on socket 0 EAL: Detected lcore 9 as core 1 on socket 0 EAL: Detected lcore 10 as core 2 on socket 0 EAL: Detected lcore 11 as core 3 on socket 0 EAL: Detected lcore 12 as core 4 on socket 0 EAL: Detected lcore 13 as core 5 on socket 0 EAL: Detected lcore 14 as core 6 on socket 0 EAL: Detected lcore 15 as core 7 on socket 0 EAL: Support maximum 128 logical
core(s) by configuration. EAL: Detected 16 lcore(s) EAL: VFIO modules not all loaded, skip VFIO support... EAL: Setting up memory... EAL: Ask a virtual area of 0x20 bytes EAL: Virtual area found at 0x7ff89320 (size = 0x20) EAL: Ask a virtual area of 0xc0 bytes EAL: Virtual area found at 0x7ff89240 (size = 0xc0) EAL: Ask a virtual area of 0x700 bytes EAL: Virtual area found at 0x7ff88b20 (size = 0x700) EAL: Ask a virtual area of 0x20 bytes EAL: Virtual area found at 0x7ff88ae0 (size = 0x20) EAL: Requesting 64 pages of size 2MB from socket 0 ^NEAL: TSC frequency is ~200 KHz EAL: Master lcore 8 is ready (tid=9472d900;cpuset=[8]) EAL: lcore 9 is ready (tid=8a5fe700;cpuset=[9]) EAL: PCI device :03:00.0 on NUMA socket 0 EAL: probe driver: 8086:15ad rte_ixgbe_pmd EAL: PCI memory mapped at 0x7ff89300 EAL: PCI memory mapped at 0x7ff8946f3000 PMD: eth_ixgbe_dev_init(): Hardware Initialization Failure: -3 EAL: Error - exiting with code: 1 Cause: Requested device :03:00.0 cannot be used -- Masafumi OE, NAOJ
[dpdk-dev] [PATCH v6 4/4] lib_vhost: Remove unnecessary vring descriptor length updating
Remove the unnecessary vring descriptor length updates; vhost should not change them. The virtio front end should assign desc.len for both rx and tx. Signed-off-by: root --- lib/librte_vhost/vhost_rxtx.c | 17 + 1 file changed, 1 insertion(+), 16 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index aaf77ed..07bc16c 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -290,7 +290,6 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx, if (vb_avail == 0) { uint32_t desc_idx = vq->buf_vec[vec_idx].desc_idx; - vq->desc[desc_idx].len = vq->vhost_hlen; if ((vq->desc[desc_idx].flags & VRING_DESC_F_NEXT) == 0) { @@ -374,7 +373,6 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx, */ uint32_t desc_idx = vq->buf_vec[vec_idx].desc_idx; - vq->desc[desc_idx].len = vb_offset; if ((vq->desc[desc_idx].flags & VRING_DESC_F_NEXT) == 0) { @@ -409,26 +407,13 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx, /* * This whole packet completes. */ - uint32_t desc_idx = - vq->buf_vec[vec_idx].desc_idx; - vq->desc[desc_idx].len = vb_offset; - - while (vq->desc[desc_idx].flags & - VRING_DESC_F_NEXT) { - desc_idx = vq->desc[desc_idx].next; -vq->desc[desc_idx].len = 0; - } - /* Update used ring with desc information */ vq->used->ring[cur_idx & (vq->size - 1)].id = vq->buf_vec[vec_idx].desc_idx; vq->used->ring[cur_idx & (vq->size - 1)].len = entry_len; - entry_len = 0; - cur_idx++; entry_success++; - seg_avail = 0; - cpy_len = RTE_MIN(vb_avail, seg_avail); + break; } } } -- 1.8.4.2
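[Editorial note] The contract the patch relies on (vhost reports consumed buffers only through the used ring, and never rewrites the guest-owned descriptor table) can be sketched with hypothetical stand-in types; these are not the real DPDK or virtio structs:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-ins for the split-ring structures. In virtio, the
 * descriptor table is written by the guest (front end); the host (vhost)
 * reports completions only through the used ring. */
struct vring_desc_s { uint64_t addr; uint32_t len; uint16_t flags, next; };
struct vring_used_e { uint32_t id; uint32_t len; };

/* Host-side completion: record which descriptor chain was consumed and how
 * many bytes were written, touching only the used ring. ring_size must be a
 * power of two, as with real vrings, so the mask wraps the index. */
static void mark_used(struct vring_used_e *used_ring, uint16_t ring_size,
                      uint16_t used_idx, uint32_t desc_id, uint32_t written)
{
    struct vring_used_e *e = &used_ring[used_idx & (ring_size - 1)];
    e->id  = desc_id;   /* head index of the consumed chain */
    e->len = written;   /* total bytes written by the host   */
}
```

The desc[].len writes removed by the patch would have violated this ownership split.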
[dpdk-dev] [PATCH v6 3/4] lib_vhost: Extract function
Extract the code into a common function, update_secure_len(), which accumulates the buffer length from the vring descriptors and fills struct buf_vec. Changes in v5 - merge fill_buf_vec into update_secure_len - do both tasks in a single loop Signed-off-by: root --- lib/librte_vhost/vhost_rxtx.c | 85 ++- 1 file changed, 36 insertions(+), 49 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 1f145bf..aaf77ed 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -436,6 +436,34 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx, return entry_success; } +static inline void __attribute__((always_inline)) +update_secure_len(struct vhost_virtqueue *vq, uint32_t id, + uint32_t *secure_len, uint32_t *vec_idx) +{ + uint16_t wrapped_idx = id & (vq->size - 1); + uint32_t idx = vq->avail->ring[wrapped_idx]; + uint8_t next_desc; + uint32_t len = *secure_len; + uint32_t vec_id = *vec_idx; + + do { + next_desc = 0; + len += vq->desc[idx].len; + vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr; + vq->buf_vec[vec_id].buf_len = vq->desc[idx].len; + vq->buf_vec[vec_id].desc_idx = idx; + vec_id++; + + if (vq->desc[idx].flags & VRING_DESC_F_NEXT) { + idx = vq->desc[idx].next; + next_desc = 1; + } + } while (next_desc); + + *secure_len = len; + *vec_idx = vec_id; +}
*/ @@ -445,8 +473,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, { struct vhost_virtqueue *vq; uint32_t pkt_idx = 0, entry_success = 0; - uint16_t avail_idx, res_cur_idx; - uint16_t res_base_idx, res_end_idx; + uint16_t avail_idx; + uint16_t res_base_idx, res_cur_idx; uint8_t success = 0; LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n", @@ -462,17 +490,16 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, return 0; for (pkt_idx = 0; pkt_idx < count; pkt_idx++) { - uint32_t secure_len = 0; - uint16_t need_cnt; - uint32_t vec_idx = 0; uint32_t pkt_len = pkts[pkt_idx]->pkt_len + vq->vhost_hlen; - uint16_t i, id; do { /* * As many data cores may want access to available * buffers, they need to be reserved. */ + uint32_t secure_len = 0; + uint32_t vec_idx = 0; + res_base_idx = vq->last_used_idx_res; res_cur_idx = res_base_idx; @@ -486,22 +513,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, dev->device_fh); return pkt_idx; } else { - uint16_t wrapped_idx = - (res_cur_idx) & (vq->size - 1); - uint32_t idx = - vq->avail->ring[wrapped_idx]; - uint8_t next_desc; - - do { - next_desc = 0; - secure_len += vq->desc[idx].len; - if (vq->desc[idx].flags & - VRING_DESC_F_NEXT) { - idx = vq->desc[idx].next; - next_desc = 1; - } - } while (next_desc); - + update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx); res_cur_idx++; } } while (pkt_len > secure_len); @@ -512,33 +524,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, res_cur_idx); } while (success == 0); - id = res_base_idx; - need_cnt = res_cur_idx - res_base_idx; - - for (i = 0; i < need_cnt; i++, id++) { - uint16_t wrapped_idx = id & (vq->size - 1); - uint32_t idx = vq->avail->ring[wrapped_idx]; - uint8_t next_desc; - do { - next_desc = 0; -
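[Editorial note] The descriptor walk that update_secure_len() performs can be illustrated in isolation. The sketch below uses hypothetical minimal types but keeps the same loop shape as the patch: follow the NEXT chain and accumulate descriptor lengths.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_F_NEXT 1   /* stand-in for VRING_DESC_F_NEXT */

struct desc_s { uint32_t len; uint16_t flags, next; };

/* Follow the descriptor chain starting at 'idx', summing the lengths and
 * counting how many descriptors were visited (reported via *n_desc). This
 * mirrors what update_secure_len() accumulates into secure_len/vec_idx. */
static uint32_t chain_len(const struct desc_s *desc, uint32_t idx,
                          uint32_t *n_desc)
{
    uint32_t len = 0, n = 0;
    for (;;) {
        len += desc[idx].len;
        n++;
        if (!(desc[idx].flags & DESC_F_NEXT))
            break;                 /* end of the chain */
        idx = desc[idx].next;      /* hop to the next descriptor */
    }
    *n_desc = n;
    return len;
}
```

The caller in virtio_dev_merge_rx() keeps reserving more chains until the accumulated length covers the packet, which is why the helper returns a running total rather than a per-chain value.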
[dpdk-dev] [PATCH v6 2/4] lib_vhost: Refine code style
Remove unnecessary new line. Signed-off-by: root --- lib/librte_vhost/vhost_rxtx.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index b887e0b..1f145bf 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -265,8 +265,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx, * (guest physical addr -> vhost virtual addr) */ vq = dev->virtqueue[VIRTIO_RXQ]; - vb_addr = - gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); vb_hdr_addr = vb_addr; /* Prefetch buffer address. */ @@ -284,8 +283,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx, seg_avail = rte_pktmbuf_data_len(pkt); vb_offset = vq->vhost_hlen; - vb_avail = - vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; + vb_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; entry_len = vq->vhost_hlen; @@ -308,8 +306,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx, } vec_idx++; - vb_addr = - gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); /* Prefetch buffer address. */ rte_prefetch0((void *)(uintptr_t)vb_addr); -- 1.8.4.2
[dpdk-dev] [PATCH v6 1/4] lib_vhost: Fix enqueue/dequeue can't handle chained vring descriptors
Vring enqueue needs to consider two cases: 1. Separate descriptors contain the virtio header and the actual data, i.e. the first descriptor holds the virtio header and is followed by descriptors for the actual data. 2. The virtio header and some data are put together in one descriptor, i.e. the first descriptor contains both the virtio header and part of the actual data, followed by more descriptors for the rest of the packet data; the current DPDK-based virtio-net PMD implementation is this case. The same applies to vring dequeue: it should not assume the vring descriptor is chained or not, but use desc->flags to check whether it is chained. This patch also fixes a TX corruption issue when vhost works with a virtio-net driver that by default uses a single vring descriptor (header and data in one descriptor) for the virtio TX process. Changes in v6 - move desc->len change to here to increase code readability Changes in v5 - support virtio header with partial data in first descriptor and then followed by descriptor for rest data Changes in v4 - remove unnecessary check for mbuf 'next' pointer - refine packet copying completeness check Changes in v3 - support scattered mbuf, check the mbuf has 'next' pointer or not and copy all segments to vring buffer. Changes in v2 - drop the uncompleted packet - refine code logic Signed-off-by: root --- lib/librte_vhost/vhost_rxtx.c | 90 ++- 1 file changed, 71 insertions(+), 19 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 4809d32..b887e0b 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -46,7 +46,8 @@ * This function adds buffers to the virtio devices RX virtqueue. Buffers can * be received from the physical port or from another virtio device. A packet * count is returned to indicate the number of packets that are succesfully - * added to the RX queue. This function works when mergeable is disabled. + * added to the RX queue.
This function works when the mbuf is scattered, but + * it doesn't support the mergeable feature. */ static inline uint32_t __attribute__((always_inline)) virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, @@ -59,7 +60,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; uint64_t buff_addr = 0; uint64_t buff_hdr_addr = 0; - uint32_t head[MAX_PKT_BURST], packet_len = 0; + uint32_t head[MAX_PKT_BURST]; uint32_t head_idx, packet_success = 0; uint16_t avail_idx, res_cur_idx; uint16_t res_base_idx, res_end_idx; @@ -113,6 +114,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, rte_prefetch0(&vq->desc[head[packet_success]]); while (res_cur_idx != res_end_idx) { + uint32_t offset = 0, vb_offset = 0; + uint32_t pkt_len, len_to_cpy, data_len, total_copied = 0; + uint8_t hdr = 0, uncompleted_pkt = 0; + /* Get descriptor from available ring */ desc = &vq->desc[head[packet_success]]; @@ -125,39 +130,81 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, /* Copy virtio_hdr to packet and increment buffer address */ buff_hdr_addr = buff_addr; - packet_len = rte_pktmbuf_data_len(buff) + vq->vhost_hlen; /* * If the descriptors are chained the header and data are * placed in separate buffers. */ - if (desc->flags & VRING_DESC_F_NEXT) { - desc->len = vq->vhost_hlen; + if ((desc->flags & VRING_DESC_F_NEXT) && + (desc->len == vq->vhost_hlen)) { desc = &vq->desc[desc->next]; /* Buffer address translation. */ buff_addr = gpa_to_vva(dev, desc->addr); - desc->len = rte_pktmbuf_data_len(buff); } else { - buff_addr += vq->vhost_hlen; - desc->len = packet_len; + vb_offset += vq->vhost_hlen; + hdr = 1; } + pkt_len = rte_pktmbuf_pkt_len(buff); + data_len = rte_pktmbuf_data_len(buff); + len_to_cpy = RTE_MIN(data_len, + hdr ? 
desc->len - vq->vhost_hlen : desc->len); + while (total_copied < pkt_len) { + /* Copy mbuf data to buffer */ + rte_memcpy((void *)(uintptr_t)(buff_addr + vb_offset), + (const void *)(rte_pktmbuf_mtod(buff, const char *) + offset), + len_to_cpy); + PRINT_PACKET(dev, (uintptr_t)(buff_addr + vb_offset), + len_to_cpy, 0); + +
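[Editorial note] The copy loop the patch introduces follows a common pattern: copy min(remaining packet bytes, remaining descriptor room) and advance whichever side is exhausted. A minimal stand-alone sketch, using plain buffers instead of mbuf segments and vring descriptors:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy 'pkt_len' bytes from 'src' into a sequence of fixed-capacity
 * destination buffers, moving to the next buffer when the current one fills
 * up (the RTE_MIN() pattern in the patch). Returns the number of bytes
 * actually copied, so the caller can detect an uncompleted packet when the
 * buffers run out, as the patch does before dropping the packet. */
static size_t copy_chained(uint8_t *dst[], const size_t cap[], size_t n_bufs,
                           const uint8_t *src, size_t pkt_len)
{
    size_t copied = 0, buf = 0, off = 0;

    while (copied < pkt_len && buf < n_bufs) {
        size_t room  = cap[buf] - off;
        size_t chunk = pkt_len - copied < room ? pkt_len - copied : room;

        memcpy(dst[buf] + off, src + copied, chunk);
        copied += chunk;
        off    += chunk;
        if (off == cap[buf]) {   /* this descriptor is full */
            buf++;
            off = 0;
        }
    }
    return copied;
}
```

The real code additionally tracks an offset inside the current mbuf segment and follows the mbuf 'next' pointer on the source side, but the min/advance structure is the same.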
[dpdk-dev] [PATCH v6 0/4] Fix vhost enqueue/dequeue issue
Fix enqueue/dequeue can't handle chained vring descriptors; Remove unnecessary vring descriptor length updating; Add support copying scattered mbuf to vring; Changchun Ouyang (4): lib_vhost: Fix enqueue/dequeue can't handle chained vring descriptors lib_vhost: Refine code style lib_vhost: Extract function lib_vhost: Remove unnecessary vring descriptor length updating lib/librte_vhost/vhost_rxtx.c | 201 +++--- 1 file changed, 111 insertions(+), 90 deletions(-) -- 1.8.4.2
[dpdk-dev] [PATCH v3 10/10] examples/tep_termination:add the configuration for encapsulation and the decapsulation
The two flags are enabled by default. Sometimes we want to know the performance influence of the encapsulation and decapsulation operations, and I think we should add the two configuration options. Signed-off-by: Jijiang Liu --- examples/tep_termination/main.c| 36 examples/tep_termination/vxlan_setup.c | 13 +- 2 files changed, 47 insertions(+), 2 deletions(-) diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index b827174..70058e5 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -113,6 +113,8 @@ #define CMD_LINE_OPT_TX_CHECKSUM "tx-checksum" #define CMD_LINE_OPT_TSO_SEGSZ "tso-segsz" #define CMD_LINE_OPT_FILTER_TYPE "filter-type" +#define CMD_LINE_OPT_ENCAP "encap" +#define CMD_LINE_OPT_DECAP "decap" #define CMD_LINE_OPT_RX_RETRY "rx-retry" #define CMD_LINE_OPT_RX_RETRY_DELAY "rx-retry-delay" #define CMD_LINE_OPT_RX_RETRY_NUM "rx-retry-num" @@ -146,6 +148,12 @@ uint8_t tx_checksum; /* TCP segment size */ uint16_t tso_segsz = 0; +/* enable/disable decapsulation */ +uint8_t rx_decap = 1; + +/* enable/disable encapsulation */ +uint8_t tx_encap = 1; + /* RX filter type for tunneling packet */ uint8_t filter_idx = 1; @@ -274,6 +282,8 @@ tep_termination_usage(const char *prgname) " --nb-devices[1-64]: The number of virtIO device\n" " --tx-checksum [0|1]: inner Tx checksum offload\n" " --tso-segsz [0-N]: TCP segment size\n" + " --decap [0|1]: tunneling packet decapsulation\n" + " --encap [0|1]: tunneling packet encapsulation\n" " --filter-type[1-3]: filter type for tunneling packet\n" " 1: Inner MAC and tenent ID\n" " 2: Inner MAC and VLAN, and tenent ID\n" @@ -305,6 +315,8 @@ tep_termination_parse_args(int argc, char **argv) {CMD_LINE_OPT_UDP_PORT, required_argument, NULL, 0}, {CMD_LINE_OPT_TX_CHECKSUM, required_argument, NULL, 0}, {CMD_LINE_OPT_TSO_SEGSZ, required_argument, NULL, 0}, + {CMD_LINE_OPT_DECAP, required_argument, NULL, 0}, + {CMD_LINE_OPT_ENCAP, required_argument, NULL, 0}, 
{CMD_LINE_OPT_FILTER_TYPE, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY_DELAY, required_argument, NULL, 0}, @@ -400,6 +412,30 @@ tep_termination_parse_args(int argc, char **argv) burst_rx_retry_num = ret; } + /* Enable/disable encapsulation on RX. */ + if (!strncmp(long_option[option_index].name, CMD_LINE_OPT_DECAP, + sizeof(CMD_LINE_OPT_DECAP))) { + ret = parse_num_opt(optarg, 1); + if (ret == -1) { + RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for decap [0|1]\n"); + tep_termination_usage(prgname); + return -1; + } else + rx_decap = ret; + } + + /* Enable/disable encapsulation on TX. */ + if (!strncmp(long_option[option_index].name, CMD_LINE_OPT_ENCAP, + sizeof(CMD_LINE_OPT_ENCAP))) { + ret = parse_num_opt(optarg, 1); + if (ret == -1) { + RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for encap [0|1]\n"); + tep_termination_usage(prgname); + return -1; + } else + tx_encap = ret; + } + if (!strncmp(long_option[option_index].name, CMD_LINE_OPT_TX_CHECKSUM, sizeof(CMD_LINE_OPT_TX_CHECKSUM))) { ret = parse_num_opt(optarg, 1); diff --git a/examples/tep_termination/vxlan_setup.c b/examples/tep_termination/vxlan_setup.c index 7b6182b..5ce5e79 100644 --- a/examples/tep_termination/vxlan_setup.c +++ b/examples/tep_termination/vxlan_setup.c @@ -78,6 +78,8 @@ extern uint16_t nb_devices; extern uint16_t udp_port; extern uint8_t ports[RTE_MAX_ETHPORTS]; extern uint8_t filter_idx; +extern uint8_t rx_decap; +extern uint8_t tx_encap; extern uint16_t tso_segsz; extern uint32_t enable_stats; extern struct device_statistics dev_statistics[MAX_DEVICES]; @@ -226,13 +228,20 @@ vxlan_port_init(uint8_t port, struct rte_mempool *mbuf_pool) static int vxlan_rx_process(struct rte_mbuf *pkt) { - return dec
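[Editorial note] The --encap/--decap handling above calls a parse_num_opt() helper defined elsewhere in the example app. A minimal sketch of such a bounded decimal parser (an assumption about its shape, not the actual helper) could look like:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal option argument and reject anything that is not a clean
 * number in [0, max_valid]; return -1 on error, as the call sites in
 * tep_termination_parse_args() expect. */
static int parse_num_opt(const char *arg, unsigned long max_valid)
{
    char *end = NULL;
    unsigned long v;

    errno = 0;
    v = strtoul(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0' || v > max_valid)
        return -1;   /* empty, trailing junk, overflow, or out of range */
    return (int)v;
}
```

With max_valid = 1 this enforces exactly the [0|1] contract the usage text advertises for --encap and --decap.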
[dpdk-dev] [PATCH v3 09/10] examples/tep_termination:add bad Rx checksum statistics of inner IP and L4
The number of packets with bad RX IP and L4 checksum in inner header is recorded. Signed-off-by: Jijiang Liu --- examples/tep_termination/main.c| 10 +- examples/tep_termination/main.h|4 examples/tep_termination/vxlan_setup.c |8 3 files changed, 21 insertions(+), 1 deletions(-) diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index e48f966..b827174 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -1002,7 +1002,7 @@ print_stats(void) { struct virtio_net_data_ll *dev_ll; uint64_t tx_dropped, rx_dropped; - uint64_t tx, tx_total, rx, rx_total; + uint64_t tx, tx_total, rx, rx_total, rx_ip_csum, rx_l4_csum; uint32_t device_fh; const char clr[] = { 27, '[', '2', 'J', '\0' }; const char top_left[] = { 27, '[', '1', ';', '1', 'H', '\0' }; @@ -1027,12 +1027,18 @@ print_stats(void) rx = rte_atomic64_read( &dev_statistics[device_fh].rx_atomic); rx_dropped = rx_total - rx; + rx_ip_csum = rte_atomic64_read( + &dev_statistics[device_fh].rx_bad_ip_csum); + rx_l4_csum = rte_atomic64_read( + &dev_statistics[device_fh].rx_bad_l4_csum); printf("\nStatistics for device %"PRIu32" --" "\nTX total:%"PRIu64"" "\nTX dropped: %"PRIu64"" "\nTX successful: %"PRIu64"" "\nRX total:%"PRIu64"" + "\nRX bad IP csum: %"PRIu64"" + "\nRX bad L4 csum: %"PRIu64"" "\nRX dropped: %"PRIu64"" "\nRX successful: %"PRIu64"", device_fh, @@ -1040,6 +1046,8 @@ print_stats(void) tx_dropped, tx, rx_total, + rx_ip_csum, + rx_l4_csum, rx_dropped, rx); diff --git a/examples/tep_termination/main.h b/examples/tep_termination/main.h index 74c3d98..5cf1157 100644 --- a/examples/tep_termination/main.h +++ b/examples/tep_termination/main.h @@ -69,6 +69,10 @@ struct device_statistics { uint64_t rx_total; uint64_t tx; rte_atomic64_t rx_atomic; + /**< Bad inner IP csum for tunneling pkt */ + rte_atomic64_t rx_bad_ip_csum; + /**< Bad inner L4 csum for tunneling pkt */ + rte_atomic64_t rx_bad_l4_csum; } __rte_cache_aligned; /** diff --git 
a/examples/tep_termination/vxlan_setup.c b/examples/tep_termination/vxlan_setup.c index d987e2e..7b6182b 100644 --- a/examples/tep_termination/vxlan_setup.c +++ b/examples/tep_termination/vxlan_setup.c @@ -79,6 +79,8 @@ extern uint16_t udp_port; extern uint8_t ports[RTE_MAX_ETHPORTS]; extern uint8_t filter_idx; extern uint16_t tso_segsz; +extern uint32_t enable_stats; +extern struct device_statistics dev_statistics[MAX_DEVICES]; /* ethernet addresses of ports */ extern struct ether_addr ports_eth_addr[RTE_MAX_ETHPORTS]; @@ -414,6 +416,12 @@ vxlan_rx_pkts (struct virtio_net *dev, struct rte_mbuf **pkts_burst, uint32_t rx struct rte_mbuf *pkts_valid[rx_count]; for (i = 0; i < rx_count; i++) { + if (enable_stats) { + rte_atomic64_add(&dev_statistics[dev->device_fh].rx_bad_ip_csum, + (pkts_burst[i]->ol_flags & PKT_RX_IP_CKSUM_BAD) != 0); + rte_atomic64_add(&dev_statistics[dev->device_fh].rx_bad_l4_csum, + (pkts_burst[i]->ol_flags & PKT_RX_L4_CKSUM_BAD) != 0); + } ret = vxlan_rx_process(pkts_burst[i]); if (unlikely(ret < 0)) continue; -- 1.7.7.6
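[Editorial note] The stats update uses the result of a flag test as a 0/1 increment, so each packet bumps each bad-checksum counter by at most one without branching per counter. A stand-alone sketch with hypothetical flag values in place of the PKT_RX_* constants:

```c
#include <assert.h>
#include <stdint.h>

#define F_IP_CKSUM_BAD (1u << 0)   /* stand-in for PKT_RX_IP_CKSUM_BAD */
#define F_L4_CKSUM_BAD (1u << 1)   /* stand-in for PKT_RX_L4_CKSUM_BAD */

/* Each packet contributes (flags & BAD) != 0, i.e. 0 or 1, to the bad-IP
 * and bad-L4 counters; the real code does the same additions atomically
 * via rte_atomic64_add(). */
static void count_bad_csum(uint64_t flags, uint64_t *bad_ip, uint64_t *bad_l4)
{
    *bad_ip += (flags & F_IP_CKSUM_BAD) != 0;
    *bad_l4 += (flags & F_L4_CKSUM_BAD) != 0;
}
```

Note that the two counters must target distinct fields (rx_bad_ip_csum and rx_bad_l4_csum); adding both results to the same field would conflate the two statistics.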
[dpdk-dev] [PATCH v3 08/10] examples/tep_termination:add TSO offload configuration
if the 'tso-segsz' is not 0, it means TSO offload is enabled. Signed-off-by: Jijiang Liu --- examples/tep_termination/main.c| 17 + examples/tep_termination/vxlan.c |8 examples/tep_termination/vxlan.h |1 + examples/tep_termination/vxlan_setup.c |8 4 files changed, 34 insertions(+), 0 deletions(-) diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index 80f20b0..e48f966 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -111,6 +111,7 @@ #define CMD_LINE_OPT_NB_DEVICES "nb-devices" #define CMD_LINE_OPT_UDP_PORT "udp-port" #define CMD_LINE_OPT_TX_CHECKSUM "tx-checksum" +#define CMD_LINE_OPT_TSO_SEGSZ "tso-segsz" #define CMD_LINE_OPT_FILTER_TYPE "filter-type" #define CMD_LINE_OPT_RX_RETRY "rx-retry" #define CMD_LINE_OPT_RX_RETRY_DELAY "rx-retry-delay" @@ -142,6 +143,9 @@ uint16_t udp_port; /* enable/disable inner TX checksum */ uint8_t tx_checksum; +/* TCP segment size */ +uint16_t tso_segsz = 0; + /* RX filter type for tunneling packet */ uint8_t filter_idx = 1; @@ -269,6 +273,7 @@ tep_termination_usage(const char *prgname) " --udp-port: UDP destination port for VXLAN packet\n" " --nb-devices[1-64]: The number of virtIO device\n" " --tx-checksum [0|1]: inner Tx checksum offload\n" + " --tso-segsz [0-N]: TCP segment size\n" " --filter-type[1-3]: filter type for tunneling packet\n" " 1: Inner MAC and tenent ID\n" " 2: Inner MAC and VLAN, and tenent ID\n" @@ -299,6 +304,7 @@ tep_termination_parse_args(int argc, char **argv) {CMD_LINE_OPT_NB_DEVICES, required_argument, NULL, 0}, {CMD_LINE_OPT_UDP_PORT, required_argument, NULL, 0}, {CMD_LINE_OPT_TX_CHECKSUM, required_argument, NULL, 0}, + {CMD_LINE_OPT_TSO_SEGSZ, required_argument, NULL, 0}, {CMD_LINE_OPT_FILTER_TYPE, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY_DELAY, required_argument, NULL, 0}, @@ -346,6 +352,17 @@ tep_termination_parse_args(int argc, char **argv) enable_retry = ret; } + if 
(!strncmp(long_option[option_index].name, CMD_LINE_OPT_TSO_SEGSZ, + sizeof(CMD_LINE_OPT_TSO_SEGSZ))) { + ret = parse_num_opt(optarg, INT16_MAX); + if (ret == -1) { + RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for TCP segment size [0-N]\n"); + tep_termination_usage(prgname); + return -1; + } else + tso_segsz = ret; + } + if (!strncmp(long_option[option_index].name, CMD_LINE_OPT_UDP_PORT, sizeof(CMD_LINE_OPT_UDP_PORT))) { ret = parse_num_opt(optarg, INT16_MAX); diff --git a/examples/tep_termination/vxlan.c b/examples/tep_termination/vxlan.c index c263999..bc000b8 100644 --- a/examples/tep_termination/vxlan.c +++ b/examples/tep_termination/vxlan.c @@ -47,6 +47,7 @@ extern uint8_t tx_checksum; extern struct vxlan_conf vxdev; extern struct ipv4_hdr app_ip_hdr[VXLAN_N_PORTS]; extern struct ether_hdr app_l2_hdr[VXLAN_N_PORTS]; +extern uint16_t tso_segsz; static uint16_t get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags) @@ -149,6 +150,11 @@ process_inner_cksums(struct ether_hdr *eth_hdr, union tunnel_offload_info *info) ol_flags |= PKT_TX_TCP_CKSUM; tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype, ol_flags); + if (tso_segsz != 0) { + ol_flags |= PKT_TX_TCP_SEG; + info->tso_segsz = tso_segsz; + info->l4_len = sizeof(struct tcp_hdr); + } } else if (l4_proto == IPPROTO_SCTP) { sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len); @@ -228,6 +234,7 @@ encapsulation(struct rte_mbuf *m, uint8_t queue_id) ol_flags |= process_inner_cksums(phdr, &tx_offload); m->l2_len = tx_offload.l2_len; m->l3_len = tx_offload.l3_len; + m->l4_len = tx_offload.l4_len; m->l2_len += ETHER_VXLAN_HLEN; } @@ -235,6 +242,7 @@ encapsulation(struct rte_mbuf *m, uint8_t queue_id) m->outer_l3_len = sizeof(struct ipv4_hdr); m->ol_flags |= ol_flags; + m->tso_segsz = tx_offload.tso_segsz; /*VXLAN HEADER*/ vxlan->vx_flags = rte_
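[Editorial note] The TSO decision in process_inner_cksums() reduces to: always request TCP checksum offload, and additionally request segmentation when a segment size is configured, recording l4_len so the NIC can replicate headers per segment. A sketch with stand-in flag values, not the real PKT_TX_* constants:

```c
#include <assert.h>
#include <stdint.h>

#define F_TCP_CKSUM (1u << 0)   /* stand-in for PKT_TX_TCP_CKSUM */
#define F_TCP_SEG   (1u << 1)   /* stand-in for PKT_TX_TCP_SEG   */

struct tso_info { uint16_t tso_segsz; uint16_t l4_len; };

/* Mirror of the patch's logic for a TCP inner packet: checksum offload is
 * always requested; a non-zero segment size also enables TSO and fills in
 * the fields the hardware needs. 20 stands in for sizeof(struct tcp_hdr)
 * without options. */
static uint32_t tcp_tx_flags(uint16_t tso_segsz, struct tso_info *info)
{
    uint32_t flags = F_TCP_CKSUM;

    if (tso_segsz != 0) {
        flags |= F_TCP_SEG;
        info->tso_segsz = tso_segsz;
        info->l4_len = 20;
    }
    return flags;
}
```

This matches the command-line semantics above: --tso-segsz 0 (the default) leaves TSO disabled, any positive value enables it.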
[dpdk-dev] [PATCH v3 07/10] examples/tep_termination:add Tx checksum offload configuration for inner header
For UDP tunneling packet, the inner Tx checksum offload means inner IPv4 and inner L4(TCP/UDP/SCTP). Signed-off-by: Jijiang Liu --- examples/tep_termination/main.c | 17 examples/tep_termination/vxlan.c | 82 ++ 2 files changed, 99 insertions(+), 0 deletions(-) diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index f92a677..80f20b0 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -110,6 +110,7 @@ #define CMD_LINE_OPT_NB_DEVICES "nb-devices" #define CMD_LINE_OPT_UDP_PORT "udp-port" +#define CMD_LINE_OPT_TX_CHECKSUM "tx-checksum" #define CMD_LINE_OPT_FILTER_TYPE "filter-type" #define CMD_LINE_OPT_RX_RETRY "rx-retry" #define CMD_LINE_OPT_RX_RETRY_DELAY "rx-retry-delay" @@ -138,6 +139,9 @@ struct vpool { /* UDP tunneling port */ uint16_t udp_port; +/* enable/disable inner TX checksum */ +uint8_t tx_checksum; + /* RX filter type for tunneling packet */ uint8_t filter_idx = 1; @@ -264,6 +268,7 @@ tep_termination_usage(const char *prgname) RTE_LOG(INFO, VHOST_CONFIG, "%s [EAL options] -- -p PORTMASK\n" " --udp-port: UDP destination port for VXLAN packet\n" " --nb-devices[1-64]: The number of virtIO device\n" + " --tx-checksum [0|1]: inner Tx checksum offload\n" " --filter-type[1-3]: filter type for tunneling packet\n" " 1: Inner MAC and tenent ID\n" " 2: Inner MAC and VLAN, and tenent ID\n" @@ -293,6 +298,7 @@ tep_termination_parse_args(int argc, char **argv) static struct option long_option[] = { {CMD_LINE_OPT_NB_DEVICES, required_argument, NULL, 0}, {CMD_LINE_OPT_UDP_PORT, required_argument, NULL, 0}, + {CMD_LINE_OPT_TX_CHECKSUM, required_argument, NULL, 0}, {CMD_LINE_OPT_FILTER_TYPE, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY_DELAY, required_argument, NULL, 0}, @@ -377,6 +383,17 @@ tep_termination_parse_args(int argc, char **argv) burst_rx_retry_num = ret; } + if (!strncmp(long_option[option_index].name, CMD_LINE_OPT_TX_CHECKSUM, + 
sizeof(CMD_LINE_OPT_TX_CHECKSUM))) { + ret = parse_num_opt(optarg, 1); + if (ret == -1) { + RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for tx-checksum [0|1]\n"); + tep_termination_usage(prgname); + return -1; + } else + tx_checksum = ret; + } + if (!strncmp(long_option[option_index].name, CMD_LINE_OPT_FILTER_TYPE, sizeof(CMD_LINE_OPT_FILTER_TYPE))) { ret = parse_num_opt(optarg, 3); diff --git a/examples/tep_termination/vxlan.c b/examples/tep_termination/vxlan.c index cc4d508..c263999 100644 --- a/examples/tep_termination/vxlan.c +++ b/examples/tep_termination/vxlan.c @@ -43,10 +43,20 @@ #include "main.h" #include "vxlan.h" +extern uint8_t tx_checksum; extern struct vxlan_conf vxdev; extern struct ipv4_hdr app_ip_hdr[VXLAN_N_PORTS]; extern struct ether_hdr app_l2_hdr[VXLAN_N_PORTS]; +static uint16_t +get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags) +{ + if (ethertype == ETHER_TYPE_IPv4) + return rte_ipv4_phdr_cksum(l3_hdr, ol_flags); + else /* assume ethertype == ETHER_TYPE_IPv6 */ + return rte_ipv6_phdr_cksum(l3_hdr, ol_flags); +} + /** * Parse an ethernet header to fill the ethertype, outer_l2_len, outer_l3_len and * ipproto. 
This function is able to recognize IPv4/IPv6 with one optional vlan @@ -87,6 +97,68 @@ parse_ethernet(struct ether_hdr *eth_hdr, union tunnel_offload_info *info, } } +/** + * Calculate the checksum of a packet in hardware + */ +static uint64_t +process_inner_cksums(struct ether_hdr *eth_hdr, union tunnel_offload_info *info) +{ + void *l3_hdr = NULL; + uint8_t l4_proto; + uint16_t ethertype; + struct ipv4_hdr *ipv4_hdr; + struct ipv6_hdr *ipv6_hdr; + struct udp_hdr *udp_hdr; + struct tcp_hdr *tcp_hdr; + struct sctp_hdr *sctp_hdr; + uint64_t ol_flags = 0; + + info->l2_len = sizeof(struct ether_hdr); + ethertype = rte_be_to_cpu_16(eth_hdr->ether_type); + + if (ethertype == ETHER_TYPE_VLAN) { + struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1); + info->l2_len += sizeof(struct vlan_hdr); + ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto); +
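[Editorial note] The l2_len accounting in the parsing code above (grow by one VLAN header when the outer ethertype is a VLAN tag, then read the inner ethertype) can be sketched at the byte level; this is a simplified stand-in for the struct-based walk in the patch:

```c
#include <assert.h>
#include <stdint.h>

#define ETHERTYPE_VLAN 0x8100
#define ETH_HLEN  14   /* dst MAC + src MAC + ethertype */
#define VLAN_HLEN 4    /* TPID + TCI */

/* Read the ethertype of an Ethernet frame; if it is a single VLAN tag,
 * skip it and read the inner ethertype, growing *l2_len accordingly.
 * Ethertype bytes are big-endian on the wire. */
static uint16_t parse_l2(const uint8_t *frame, uint16_t *l2_len)
{
    uint16_t ethertype = (uint16_t)(frame[12] << 8 | frame[13]);

    *l2_len = ETH_HLEN;
    if (ethertype == ETHERTYPE_VLAN) {
        ethertype = (uint16_t)(frame[16] << 8 | frame[17]);
        *l2_len += VLAN_HLEN;
    }
    return ethertype;
}
```

The resulting l2_len is what the checksum code later uses to locate the L3 header for the pseudo-header checksum.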
[dpdk-dev] [PATCH v3 06/10] examples/tep_termination:add tunnel filter type configuration
The following filter types are supported for VXLAN: 1> Inner MAC&VLAN and tenent ID 2> Inner MAC and tenent ID, and Outer MAC 3> Inner MAC and tenent ID Signed-off-by: Jijiang Liu --- examples/tep_termination/main.c| 20 ++ examples/tep_termination/vxlan_setup.c | 63 +++- 2 files changed, 82 insertions(+), 1 deletions(-) diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index ecea9e8..f92a677 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -110,6 +110,7 @@ #define CMD_LINE_OPT_NB_DEVICES "nb-devices" #define CMD_LINE_OPT_UDP_PORT "udp-port" +#define CMD_LINE_OPT_FILTER_TYPE "filter-type" #define CMD_LINE_OPT_RX_RETRY "rx-retry" #define CMD_LINE_OPT_RX_RETRY_DELAY "rx-retry-delay" #define CMD_LINE_OPT_RX_RETRY_NUM "rx-retry-num" @@ -137,6 +138,9 @@ struct vpool { /* UDP tunneling port */ uint16_t udp_port; +/* RX filter type for tunneling packet */ +uint8_t filter_idx = 1; + /* overlay packet operation */ struct ol_switch_ops overlay_options = { .port_configure = vxlan_port_init, @@ -260,6 +264,10 @@ tep_termination_usage(const char *prgname) RTE_LOG(INFO, VHOST_CONFIG, "%s [EAL options] -- -p PORTMASK\n" " --udp-port: UDP destination port for VXLAN packet\n" " --nb-devices[1-64]: The number of virtIO device\n" + " --filter-type[1-3]: filter type for tunneling packet\n" + " 1: Inner MAC and tenent ID\n" + " 2: Inner MAC and VLAN, and tenent ID\n" + " 3: Outer MAC, Inner MAC and tenent ID\n" " -p PORTMASK: Set mask for ports to be used by application\n" " --rx-retry [0|1]: disable/enable(default) retries on rx." 
"Enable retry if destintation queue is full\n" @@ -285,6 +293,7 @@ tep_termination_parse_args(int argc, char **argv) static struct option long_option[] = { {CMD_LINE_OPT_NB_DEVICES, required_argument, NULL, 0}, {CMD_LINE_OPT_UDP_PORT, required_argument, NULL, 0}, + {CMD_LINE_OPT_FILTER_TYPE, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY_DELAY, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY_NUM, required_argument, NULL, 0}, @@ -368,6 +377,17 @@ tep_termination_parse_args(int argc, char **argv) burst_rx_retry_num = ret; } + if (!strncmp(long_option[option_index].name, CMD_LINE_OPT_FILTER_TYPE, + sizeof(CMD_LINE_OPT_FILTER_TYPE))) { + ret = parse_num_opt(optarg, 3); + if ((ret == -1) || (ret == 0)) { + RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for filter type [1-3]\n"); + tep_termination_usage(prgname); + return -1; + } else + filter_idx = ret - 1; + } + /* Enable/disable stats. */ if (!strncmp(long_option[option_index].name, CMD_LINE_OPT_STATS, sizeof(CMD_LINE_OPT_STATS))) { diff --git a/examples/tep_termination/vxlan_setup.c b/examples/tep_termination/vxlan_setup.c index fc4f9de..f24effc 100644 --- a/examples/tep_termination/vxlan_setup.c +++ b/examples/tep_termination/vxlan_setup.c @@ -71,9 +71,13 @@ #define RTE_TEST_RX_DESC_DEFAULT 1024 #define RTE_TEST_TX_DESC_DEFAULT 512 +/* Default inner VLAN ID */ +#define INNER_VLAN_ID 100 + extern uint16_t nb_devices; extern uint16_t udp_port; extern uint8_t ports[RTE_MAX_ETHPORTS]; +extern uint8_t filter_idx; /* ethernet addresses of ports */ extern struct ether_addr ports_eth_addr[RTE_MAX_ETHPORTS]; @@ -93,6 +97,11 @@ uint8_t vxlan_overlay_ips[2][4] = { {192, 168, 10, 1}, {192, 168, 30, 1} }; /* Remote VTEP MAC address */ uint8_t peer_mac[6] = {0x00, 0x11, 0x01, 0x00, 0x00, 0x01}; +/* VXLAN RX filter type */ +uint8_t tep_filter_type[] = {RTE_TUNNEL_FILTER_IMAC_TENID, + RTE_TUNNEL_FILTER_IMAC_IVLAN_TENID, + RTE_TUNNEL_FILTER_OMAC_TENID_IMAC,}; + /* Options 
for configuring ethernet port */ static const struct rte_eth_conf port_conf = { .rxmode = { @@ -224,12 +233,14 @@ vxlan_tx_process(uint8_t queue_id, struct rte_mbuf *pkt) int vxlan_link(struct vhost_dev *vdev, struct rte_mbuf *m) { - int i; + int i, ret; struct ether_hdr *pkt_hdr; struct virtio_net *dev = vdev->dev; uint64_t portid = dev->device_fh; struct ipv4_hdr *ip; + struct rte_eth_tunnel_filter_
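The --filter-type option is 1-based on the command line but indexes the 0-based tep_filter_type[] array (filter_idx = ret - 1 in the patch). A self-contained sketch of that validation and mapping — the enum and helper names here are illustrative, not the sample's actual symbols:

```c
#include <assert.h>
#include <stdlib.h>

/* Mirrors the order of the sample's tep_filter_type[] array. */
enum tunnel_filter {
    FILTER_IMAC_TENID,       /* 1: inner MAC + tenant ID */
    FILTER_IMAC_IVLAN_TENID, /* 2: inner MAC + inner VLAN + tenant ID */
    FILTER_OMAC_TENID_IMAC,  /* 3: outer MAC + inner MAC + tenant ID */
};

/*
 * Parse a "--filter-type" argument: the valid range is 1..3, stored
 * 0-based.  Returns -1 on invalid input, matching the sample's error path.
 */
static int parse_filter_type(const char *arg)
{
    char *end;
    long v = strtol(arg, &end, 10);

    if (*end != '\0' || v < 1 || v > 3)
        return -1;
    return (int)(v - 1);    /* 1-based option -> 0-based array index */
}
```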
[dpdk-dev] [PATCH v3 05/10] examples/tep_termination:add UDP port configuration for UDP tunneling packet
The port number of UDP tunneling packet is configurable, which has 16 entries in total for i40e. Signed-off-by: Jijiang Liu --- examples/tep_termination/main.c| 18 +- examples/tep_termination/vxlan_setup.c | 13 - 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index bb0f345..ecea9e8 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -109,6 +109,7 @@ #define MAC_ADDR_CMP 0xULL #define CMD_LINE_OPT_NB_DEVICES "nb-devices" +#define CMD_LINE_OPT_UDP_PORT "udp-port" #define CMD_LINE_OPT_RX_RETRY "rx-retry" #define CMD_LINE_OPT_RX_RETRY_DELAY "rx-retry-delay" #define CMD_LINE_OPT_RX_RETRY_NUM "rx-retry-num" @@ -133,6 +134,9 @@ struct vpool { uint32_t buf_size; } vpool_array[MAX_QUEUES+MAX_QUEUES]; +/* UDP tunneling port */ +uint16_t udp_port; + /* overlay packet operation */ struct ol_switch_ops overlay_options = { .port_configure = vxlan_port_init, @@ -254,6 +258,7 @@ static void tep_termination_usage(const char *prgname) { RTE_LOG(INFO, VHOST_CONFIG, "%s [EAL options] -- -p PORTMASK\n" + " --udp-port: UDP destination port for VXLAN packet\n" " --nb-devices[1-64]: The number of virtIO device\n" " -p PORTMASK: Set mask for ports to be used by application\n" " --rx-retry [0|1]: disable/enable(default) retries on rx." 
@@ -279,6 +284,7 @@ tep_termination_parse_args(int argc, char **argv) const char *prgname = argv[0]; static struct option long_option[] = { {CMD_LINE_OPT_NB_DEVICES, required_argument, NULL, 0}, + {CMD_LINE_OPT_UDP_PORT, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY_DELAY, required_argument, NULL, 0}, {CMD_LINE_OPT_RX_RETRY_NUM, required_argument, NULL, 0}, @@ -325,6 +331,17 @@ tep_termination_parse_args(int argc, char **argv) enable_retry = ret; } + if (!strncmp(long_option[option_index].name, CMD_LINE_OPT_UDP_PORT, + sizeof(CMD_LINE_OPT_UDP_PORT))) { + ret = parse_num_opt(optarg, INT16_MAX); + if (ret == -1) { + RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for UDP port [0-N]\n"); + tep_termination_usage(prgname); + return -1; + } else + udp_port = ret; + } + /* Specify the retries delay time (in useconds) on RX. */ if (!strncmp(long_option[option_index].name, CMD_LINE_OPT_RX_RETRY_DELAY, sizeof(CMD_LINE_OPT_RX_RETRY_DELAY))) { @@ -1074,7 +1091,6 @@ main(int argc, char *argv[]) rte_eal_remote_launch(switch_worker, mbuf_pool, lcore_id); } - rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_MRG_RXBUF); /* Register CUSE device to handle IOCTLs. 
*/ diff --git a/examples/tep_termination/vxlan_setup.c b/examples/tep_termination/vxlan_setup.c index e604b14..fc4f9de 100644 --- a/examples/tep_termination/vxlan_setup.c +++ b/examples/tep_termination/vxlan_setup.c @@ -72,6 +72,7 @@ #define RTE_TEST_TX_DESC_DEFAULT 512 extern uint16_t nb_devices; +extern uint16_t udp_port; extern uint8_t ports[RTE_MAX_ETHPORTS]; /* ethernet addresses of ports */ @@ -135,9 +136,12 @@ vxlan_port_init(uint8_t port, struct rte_mempool *mbuf_pool) uint16_t rx_rings, tx_rings = (uint16_t)rte_lcore_count(); const uint16_t rx_ring_size = RTE_TEST_RX_DESC_DEFAULT; const uint16_t tx_ring_size = RTE_TEST_TX_DESC_DEFAULT; + struct rte_eth_udp_tunnel tunnel_udp; struct rte_eth_rxconf *rxconf; struct rte_eth_txconf *txconf; + struct vxlan_conf *pconf = &vxdev; + pconf->dst_port = udp_port; rte_eth_dev_info_get (port, &dev_info); if (dev_info.max_rx_queues > MAX_QUEUES) { @@ -180,6 +184,12 @@ vxlan_port_init(uint8_t port, struct rte_mempool *mbuf_pool) if (retval < 0) return retval; + /* Configure UDP port for UDP tunneling */ + tunnel_udp.udp_port = udp_port; + tunnel_udp.prot_type = RTE_TUNNEL_TYPE_VXLAN; + retval = rte_eth_dev_udp_tunnel_add(port, &tunnel_udp); + if (retval < 0) + return retval; rte_eth_macaddr_get(port, &ports_eth_addr[port]); RTE_LOG(INFO, PORT, "Port %u MAC: %02"PRIx8" %02"PRIx8" %02"PRIx8 " %02"PRIx8" %02"PRIx8" %02"PRIx8"\n", @@ -190,13 +200,1
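vxlan_port_init() above registers the configured port via rte_eth_dev_udp_tunnel_add() so the NIC classifies matching UDP frames as VXLAN. A hedged, self-contained sketch of the option validation such a parser needs (the helper is illustrative; note the patch itself bounds the value with INT16_MAX, while a full UDP port range check would allow up to 65535):

```c
#include <assert.h>
#include <stdlib.h>

#define DEFAULT_VXLAN_PORT 4789   /* IANA-assigned VXLAN UDP port */

/*
 * Validate a "--udp-port" argument; 0 falls back to the default port.
 * Illustrative helper, not part of the sample itself.
 */
static int parse_udp_port(const char *arg, unsigned *port)
{
    char *end;
    long v = strtol(arg, &end, 10);

    if (*end != '\0' || v < 0 || v > 65535)
        return -1;
    *port = (v == 0) ? DEFAULT_VXLAN_PORT : (unsigned)v;
    return 0;
}
```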
[dpdk-dev] [PATCH v3 04/10] examples/tep_termination:implement VXLAN packet processing
To implement the following functions: 1> VXLAN port configuration 2> VXLAN tunnel setup 3> VXLAN tunnel destroying 4> VXLAN packet processing for Rx side 5> VXLAN packet processing for Tx side Signed-off-by: Jijiang Liu Signed-off-by: Thomas Long --- examples/tep_termination/Makefile |2 +- examples/tep_termination/main.c| 35 +++- examples/tep_termination/vxlan.c | 172 examples/tep_termination/vxlan.h | 15 ++ examples/tep_termination/vxlan_setup.c | 347 5 files changed, 560 insertions(+), 11 deletions(-) create mode 100644 examples/tep_termination/vxlan.c create mode 100644 examples/tep_termination/vxlan_setup.c diff --git a/examples/tep_termination/Makefile b/examples/tep_termination/Makefile index 75be3ac..2c79bd2 100644 --- a/examples/tep_termination/Makefile +++ b/examples/tep_termination/Makefile @@ -47,7 +47,7 @@ endif APP = tep_termination # all source are stored in SRCS-y -SRCS-y := main.c +SRCS-y := main.c vxlan_setup.c vxlan.c CFLAGS += -O3 CFLAGS += $(WERROR_FLAGS) diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index e90396d..bb0f345 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -133,6 +133,16 @@ struct vpool { uint32_t buf_size; } vpool_array[MAX_QUEUES+MAX_QUEUES]; +/* overlay packet operation */ +struct ol_switch_ops overlay_options = { + .port_configure = vxlan_port_init, + .tunnel_setup = vxlan_link, + .tunnel_destroy = vxlan_unlink, + .tx_handle = vxlan_tx_pkts, + .rx_handle = vxlan_rx_pkts, + .param_handle = NULL, +}; + /* Enable stats. */ uint32_t enable_stats = 0; /* Enable retries on RX. */ @@ -311,9 +321,8 @@ tep_termination_parse_args(int argc, char **argv) "Invalid argument for rx-retry [0|1]\n"); tep_termination_usage(prgname); return -1; - } else { + } else enable_retry = ret; - } } /* Specify the retries delay time (in useconds) on RX. 
*/ @@ -325,9 +334,8 @@ tep_termination_parse_args(int argc, char **argv) "Invalid argument for rx-retry-delay [0-N]\n"); tep_termination_usage(prgname); return -1; - } else { + } else burst_rx_delay_time = ret; - } } /* Specify the retries number on RX. */ @@ -339,9 +347,8 @@ tep_termination_parse_args(int argc, char **argv) "Invalid argument for rx-retry-num [0-N]\n"); tep_termination_usage(prgname); return -1; - } else { + } else burst_rx_retry_num = ret; - } } /* Enable/disable stats. */ @@ -353,9 +360,8 @@ tep_termination_parse_args(int argc, char **argv) "Invalid argument for stats [0..N]\n"); tep_termination_usage(prgname); return -1; - } else { + } else enable_stats = ret; - } } /* Set character device basename. */ @@ -447,6 +453,8 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m) if (unlikely(len == MAX_PKT_BURST)) { m_table = (struct rte_mbuf **)tx_q->m_table; + ret = overlay_options.tx_handle(ports[0], (uint16_t)tx_q->txq_id, + m_table, (uint16_t)tx_q->len); /* Free any buffers not handled by TX and update the port stats. */ if (unlikely(ret < len)) { do { @@ -507,6 +515,10 @@ switch_worker(__rte_unused void *arg) if (tx_q->len) { LOG_DEBUG(VHOST_DATA, "TX queue drained after timeout with burst size %u \n", tx_q->len); + ret = overlay_options.tx_handle(ports[0], + (uint16_t)tx_q->txq_id, + (struct rte_mbuf **)tx_q->m_table, + (uint16_t)tx_q->len);
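A recurring pattern in switch_worker() is draining a partially filled TX queue on timeout and freeing whatever the tx_handle could not send, so mbufs are not leaked. A stub-typed sketch of that drain logic (no DPDK headers; fake_tx stands in for overlay_options.tx_handle and the freed flag stands in for rte_pktmbuf_free):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PKT_BURST 32

struct pkt { int freed; };            /* stand-in for rte_mbuf */

struct tx_queue {
    struct pkt *table[MAX_PKT_BURST];
    uint16_t len;                     /* packets currently buffered */
};

/* Pretend the hardware accepted only 'sent' of the n packets. */
static uint16_t fake_tx(struct pkt **pkts, uint16_t n, uint16_t sent)
{
    (void)pkts;
    return sent < n ? sent : n;
}

/* Flush the queue; any packet TX did not take must be freed. */
static void drain(struct tx_queue *q, uint16_t sent_by_hw)
{
    uint16_t ret = fake_tx(q->table, q->len, sent_by_hw);

    while (ret < q->len)              /* free the unsent tail */
        q->table[ret++]->freed = 1;
    q->len = 0;
}
```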
[dpdk-dev] [PATCH v3 03/10] examples/tep_termination:add the pluggable structures for VXLAN packet processing
We are trying to create a framework for tunneling packet processing, so some common APIs are added here, which includes 1> tunnel port configuration 2> tunnel setup 3> tunnel destroying 4> tunneling packet processing for Rx side 5> tunneling packet processing for Tx side 6> tunnel parameter processing Signed-off-by: Jijiang Liu Signed-off-by: Thomas Long --- examples/tep_termination/main.c|1 + examples/tep_termination/vxlan_setup.h | 77 2 files changed, 78 insertions(+), 0 deletions(-) create mode 100644 examples/tep_termination/vxlan_setup.h diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index 30b3f14..e90396d 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -53,6 +53,7 @@ #include "main.h" #include "vxlan.h" +#include "vxlan_setup.h" /* the maximum number of external ports supported */ #define MAX_SUP_PORTS 1 diff --git a/examples/tep_termination/vxlan_setup.h b/examples/tep_termination/vxlan_setup.h new file mode 100644 index 000..0679876 --- /dev/null +++ b/examples/tep_termination/vxlan_setup.h @@ -0,0 +1,77 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef VXLAN_SETUP_H_ +#define VXLAN_SETUP_H_ + +typedef int (*ol_port_configure_t)(uint8_t port, + struct rte_mempool *mbuf_pool); + +typedef int (*ol_tunnel_setup_t)(struct vhost_dev *vdev, +struct rte_mbuf *m); + +typedef void (*ol_tunnel_destroy_t)(struct vhost_dev *vdev); + +typedef int (*ol_tx_handle_t)(uint8_t port_id, uint16_t queue_id, + struct rte_mbuf **tx_pkts, uint16_t nb_pkts); + +typedef int (*ol_rx_handle_t)(struct virtio_net *dev, struct rte_mbuf **pkts, + uint32_t count); + +typedef int (*ol_param_handle)(struct virtio_net *dev); + +struct ol_switch_ops { + ol_port_configure_tport_configure; + ol_tunnel_setup_t tunnel_setup; + ol_tunnel_destroy_ttunnel_destroy; + ol_tx_handle_t tx_handle; + ol_rx_handle_t rx_handle; + ol_param_handleparam_handle; +}; + +int +vxlan_port_init(uint8_t port, struct rte_mempool *mbuf_pool); + +int +vxlan_link(struct vhost_dev *vdev, struct rte_mbuf *m); + +void +vxlan_unlink(struct vhost_dev *vdev); + +int +vxlan_tx_pkts(uint8_t port_id, uint16_t queue_id, + struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +int +vxlan_rx_pkts(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t count); + +#endif /* VXLAN_SETUP_H_ */ -- 
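The value of the ol_switch_ops table above is that main.c only ever calls through the function pointers, so supporting another tunneling protocol means supplying a second ops table rather than touching the datapath. A stripped-down, compilable sketch of the dispatch pattern with stand-in types (struct mbuf here is not rte_mbuf, and the handler body is a stub):

```c
#include <assert.h>
#include <stdint.h>

struct mbuf { int len; };                 /* stand-in for rte_mbuf */

typedef int (*ol_tx_handle_t)(uint8_t port, uint16_t queue,
                              struct mbuf **pkts, uint16_t n);

struct ol_switch_ops {
    ol_tx_handle_t tx_handle;             /* encap + burst-TX entry point */
};

/* A protocol backend provides its own handlers... */
static int vxlan_tx_pkts(uint8_t port, uint16_t queue,
                         struct mbuf **pkts, uint16_t n)
{
    (void)port; (void)queue; (void)pkts;
    return n;                             /* stub: report all as sent */
}

/* ...and the application binds exactly one table at startup. */
static struct ol_switch_ops overlay_options = {
    .tx_handle = vxlan_tx_pkts,
};

/* The datapath never names the protocol, only the ops table. */
static int send_burst(struct mbuf **pkts, uint16_t n)
{
    return overlay_options.tx_handle(0, 0, pkts, n);
}
```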
1.7.7.6
[dpdk-dev] [PATCH v3 02/10] examples/tep_termination:define the basic VXLAN port information
Some basic VXLAN definations are added in this file, which includes VXLAN port information and VXLAN device structures. Signed-off-by: Jijiang Liu Signed-off-by: Thomas Long --- examples/tep_termination/main.c |1 + examples/tep_termination/vxlan.h | 60 ++ 2 files changed, 61 insertions(+), 0 deletions(-) create mode 100644 examples/tep_termination/vxlan.h diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index 2a376a6..30b3f14 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -52,6 +52,7 @@ #include #include "main.h" +#include "vxlan.h" /* the maximum number of external ports supported */ #define MAX_SUP_PORTS 1 diff --git a/examples/tep_termination/vxlan.h b/examples/tep_termination/vxlan.h new file mode 100644 index 000..8595eed --- /dev/null +++ b/examples/tep_termination/vxlan.h @@ -0,0 +1,60 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _VXLAN_H_ +#define _VXLAN_H_ + +#define PORT_MIN 49152 +#define PORT_MAX 65535 +#define PORT_RANGE ((PORT_MAX - PORT_MIN) + 1) + +#define VXLAN_N_PORTS 2 +#define VXLAN_HF_VNI 0x0800 +#define DEFAULT_VXLAN_PORT 4789 + +struct vxlan_port { + uint32_t vport_id; /**< VirtIO port id */ + uint32_t peer_ip;/**< remote VTEP IP address */ + struct ether_addr peer_mac; /**< remote VTEP MAC address */ + struct ether_addr vport_mac; /**< VirtIO port MAC address */ +} __rte_cache_aligned; + +struct vxlan_conf { + uint16_t dst_port; /**< VXLAN UDP destination port */ + uint32_t port_ip; /**< DPDK port IP address*/ + uint32_t in_key;/**< VLAN ID */ + uint32_t out_key; /**< VXLAN VNI */ + struct vxlan_port port[VXLAN_N_PORTS]; /**< VXLAN configuration */ +} __rte_cache_aligned; + +#endif /* _MAIN_H_ */ -- 1.7.7.6
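The header fixes DEFAULT_VXLAN_PORT at 4789 and VXLAN_HF_VNI at 0x0800 — the I flag in the first 16 bits of the VXLAN header, marking a valid VNI. A self-contained sketch of packing and unpacking the 24-bit VNI as the encap/decap paths must (fields are shown in host order for simplicity; on the wire both words are big-endian):

```c
#include <assert.h>
#include <stdint.h>

#define VXLAN_HF_VNI 0x0800   /* "VNI present" flag word */

/* Illustrative 8-byte VXLAN header layout, host order. */
struct vxlan_hdr {
    uint32_t flags;           /* upper 16 bits carry the flag word */
    uint32_t vni_rsvd;        /* bits 31..8: VNI, low 8 bits reserved */
};

static void vxlan_set_vni(struct vxlan_hdr *h, uint32_t vni)
{
    h->flags = (uint32_t)VXLAN_HF_VNI << 16;    /* mark VNI as valid */
    h->vni_rsvd = (vni & 0xFFFFFFu) << 8;       /* 24-bit VNI, shifted */
}

static uint32_t vxlan_get_vni(const struct vxlan_hdr *h)
{
    return (h->vni_rsvd >> 8) & 0xFFFFFFu;
}
```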
[dpdk-dev] [PATCH v3 01/10] examples/tep_termination:initialize the VXLAN sample
This sample uses the basic virtio devices management function from the vHost example, which includes virtio device creation, destroying and maintenance. Signed-off-by: Jijiang Liu --- examples/Makefile |1 + examples/tep_termination/Makefile | 55 ++ examples/tep_termination/main.c | 1074 + examples/tep_termination/main.h | 125 + 4 files changed, 1255 insertions(+), 0 deletions(-) create mode 100644 examples/tep_termination/Makefile create mode 100644 examples/tep_termination/main.c create mode 100644 examples/tep_termination/main.h diff --git a/examples/Makefile b/examples/Makefile index e659f6f..d157e15 100644 --- a/examples/Makefile +++ b/examples/Makefile @@ -73,5 +73,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen DIRS-y += vmdq DIRS-y += vmdq_dcb DIRS-$(CONFIG_RTE_LIBRTE_POWER) += vm_power_manager +DIRS-y += tep_termination include $(RTE_SDK)/mk/rte.extsubdir.mk diff --git a/examples/tep_termination/Makefile b/examples/tep_termination/Makefile new file mode 100644 index 000..75be3ac --- /dev/null +++ b/examples/tep_termination/Makefile @@ -0,0 +1,55 @@ +# BSD LICENSE +# +# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. 
+# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overriden by command line or environment +RTE_TARGET ?= x86_64-native-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +ifneq ($(CONFIG_RTE_EXEC_ENV),"linuxapp") +$(error This application can only operate in a linuxapp environment, \ +please change the definition of the RTE_TARGET environment variable) +endif + +# binary name +APP = tep_termination + +# all source are stored in SRCS-y +SRCS-y := main.c + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c new file mode 100644 index 000..2a376a6 --- /dev/null +++ b/examples/tep_termination/main.c @@ -0,0 +1,1074 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + *
[dpdk-dev] [PATCH v3 00/10] Add a VXLAN sample
This VXLAN sample simulates a VXLAN Tunnel Endpoint (VTEP) termination in DPDK, which is used to demonstrate the offload and filtering capabilities of the i40e NIC for VXLAN packets. This sample uses the basic virtio device management function from the vHost example, and the US-vHost interface and tunnel filtering mechanism to direct traffic to/from a specific VM. In addition, this sample is also designed to show how tunneling protocols can be handled. For the vHost interface, we do not need to support zero copy/inter-VM packet transfer, etc. The approaches we took would be of benefit to you because we put a pluggable structure in place so that the application can easily be extended to support a new tunneling protocol. The software framework is as follows: [ASCII architecture diagram: VM-1 (VNI 100) and VM-2 (VNI 200), each exposing vport0/vport1, attach through us-vHost interfaces to the DPDK app, which performs decap/encap at the TEP; the NIC underneath supplies tunnel filtering, IP/L4 Rx checksum, packet-type recognition, Tx IP/L4 checksum and TSO, and carries the VXLAN tunnel.] The sample will support the following: 1> Tunneling packet recognition. 2> The port of UDP tunneling is configurable 3> Directing incoming traffic to the correct queue based on the tunnel filter type, such as inner MAC address and VNI. The VNI will be assigned from a static internal table based on the us-vHost device ID. Each device will receive a unique device ID. The inner MAC will be learned from the first packet transmitted from a device. 4> Decapsulation of Rx VXLAN traffic. This is a software-only operation. 5> Encapsulation of Tx VXLAN traffic. This is a software-only operation. 6> Tx outer IP, inner IP and L4 checksum offload 7> TSO support for tunneling packets The limitations: 1. No ARP support. 2. 
There is some duplicated source code because I used the basic virtio device management function from the VHOST sample. Considering that the current VHOST sample is quite complicated and large enough already, I think we shall have a separate sample for tunneling packet processing. 3. Currently, only the i40e NIC is tested in the sample, but other types of NICs will also be supported if they are able to support tunneling packet filters. v2 changes: Fixed an issue about the 'nb_ports' duplication in check_ports_num(). Removed the inaccurate comment in main.c. Fixed an issue about TSO offload. v3 changes: Changed some variable names that didn't follow coding rules. Removed the limitation of VXLAN packet size due to TSO support. Removed useless 'll_root_used' variable in vxlan_setup.c file. Removed definition and use of '_htons'. Jijiang Liu (10): create VXLAN sample framework using virtio device management function add basic VXLAN structures add the pluggable structures implement VXLAN packet processing add udp port configuration add filter type configuration add tx checksum offload configuration add TSO offload configuration add Rx checksum statistics add encapsulation and decapsulation flags examples/Makefile |1 + examples/tep_termination/Makefile | 55 ++ examples/tep_termination/main.c| 1205 examples/tep_termination/main.h| 129 examples/tep_termination/vxlan.c | 262 +++ examples/tep_termination/vxlan.h | 76 ++ examples/tep_termination/vxlan_setup.c | 444 examples/tep_termination/vxlan_setup.h | 77 ++ 8 files changed, 2249 insertions(+), 0 deletions(-) create mode 100644 examples/tep_termination/Makefile create mode 100644 examples/tep_termination/main.c create mode 100644 examples/tep_termination/main.h create mode 100644 examples/tep_termination/vxlan.c create mode 100644 examples/tep_termination/vxlan.h create mode 100644 examples/tep_termination/vxlan_setup.c create mode 100644 examples/tep_termination/vxlan_setup.h -- 1.7.7.6
[dpdk-dev] [PATCH v2 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info
On 29/05/15 20:23, Thomas Monjalon wrote: > 2015-05-27 11:15, Marc Sune: >> On 27/05/15 06:02, Thomas Monjalon wrote: >>> Why not starting with lower values? Some new drivers may be interested >>> by lower speed. >> Ok, but which values? 1Mbps FD/HD? Even lower than that? >> >> If you have some NIC(s) in mind with lower values, please point me to >> that and I will collect & add the missing speeds. > No sorry, I missed how low your first values were. > +#define ETH_SPEED_CAP_10M_HD (1 << 0) /*< 10 Mbps half-duplex> */ +#define ETH_SPEED_CAP_10M_FD (1 << 1) /*< 10 Mbps full-duplex> */ +#define ETH_SPEED_CAP_100M_HD (1 << 2) /*< 100 Mbps half-duplex> */ +#define ETH_SPEED_CAP_100M_FD (1 << 3) /*< 100 Mbps full-duplex> */ +#define ETH_SPEED_CAP_1G (1 << 4) /*< 1 Gbps > */ +#define ETH_SPEED_CAP_2_5G(1 << 5) /*< 2.5 Gbps > */ +#define ETH_SPEED_CAP_5G (1 << 6) /*< 5 Gbps > */ +#define ETH_SPEED_CAP_10G (1 << 7) /*< 10 Gbps > */ +#define ETH_SPEED_CAP_20G (1 << 8) /*< 20 Gbps > */ +#define ETH_SPEED_CAP_25G (1 << 9) /*< 25 Gbps > */ +#define ETH_SPEED_CAP_40G (1 << 10) /*< 40 Gbps > */ +#define ETH_SPEED_CAP_50G (1 << 11) /*< 50 Gbps > */ +#define ETH_SPEED_CAP_56G (1 << 12) /*< 56 Gbps > */ +#define ETH_SPEED_CAP_100G(1 << 13) /*< 100 Gbps > */ >>> We should note that rte_eth_link is using ETH_LINK_SPEED_* constants >>> which are not some bitmaps so we have to create these new constants. >> Yes, I can add that to the patch description (1/2). >> >>> Furthermore, rte_eth_link.link_speed is an uint16_t so it is limited >>> to 40G. Should we use some constant bitmaps here also? >> I also thought about converting link_speed into a bitmap to unify the >> constants before starting the patch (there is redundancy), but I wanted >> to be minimally invasive; changing link to a bitmap can break existing apps. >> >> I can also merge them if we think it is a better idea. > Maybe. Someone against this idea? Me. 
I tried implementing these unified speed constants, but the problem is that for the capabilities, full-duplex/half-duplex speeds are unrolled (e.g. 100M_HD/100M_FD). There is no generic 100M to set a specific speed, so if you want a fixed speed and duplex auto-negotiation with the current set of constants, it would look weird; e.g. link_speed=ETH_SPEED_100M_HD and then set link_duplex=ETH_LINK_AUTONEG_DUPLEX): 232 struct rte_eth_link { 233 uint16_t link_speed; /**< ETH_LINK_SPEED_[10, 100, 1000, 1] */ 234 uint16_t link_duplex; /**< ETH_LINK_[HALF_DUPLEX, FULL_DUPLEX] */ 235 uint8_t link_status : 1; /**< 1 -> link up, 0 -> link down */ 236 }__attribute__((aligned(8))); /**< aligned for atomic64 read/write */ There is another minor point, which is when setting the speed in rte_eth_conf: 840 struct rte_eth_conf { 841 uint16_t link_speed; 842 /**< ETH_LINK_SPEED_10[0|00|000], or 0 for autonegotiation */ 0 is used for speed auto-negotiation, but 0 is also used in the capabilities bitmap to indicate no PHY media (virtual interface). I would have to define something like: 906 #define ETH_SPEED_NOT_PHY (0) /*< No phy media > */ 907 #define ETH_SPEED_AUTONEG (0) /*< Autonegotiate speed > */ And use (only) NOT_PHY for the capabilities and _AUTONEG for rte_eth_conf. The options I see: a) add to the list of the current speeds generic 10M/100M/1G speeds without HD/FD, and just use these speeds in rte_eth_conf. b) leave them separated. I would vote for b), since a) is not completely clean. Opinions & other alternatives welcome. Marc > >>> What about removing _CAP suffix from your constants? >> I added the suffix to make clearer the distinction with link speeds. I >> can remove it if we merge both or if we consider it is not necessary. >> >>> [...] + uint32_t speed_capa; /**< Supported speeds bitmap (ETH_SPEED_CAP_). */ >>> If the constants are ETH_SPEED_CAP, why not wording this variable speed_cap? 
>> I followed the convention of the existing rx/tx offload capability bitmaps: >> >> marc at dev:~/git/bisdn/msune/xdpd/libs/dpdk/lib$ grep _capa\; * -R >> librte_ether/rte_ethdev.h:uint32_t rx_offload_capa; /**< Device RX >> offload capabilities. */ >> librte_ether/rte_ethdev.h:uint32_t tx_offload_capa; /**< Device TX >> offload capabilities. */ >> >> I am fine with speed_cap or speed_caps, but I think we should have some >> consistency on how we name bitmaps. > You're right. > >> If we would want to make the bitmaps more explicit, we could define some >> helper typedefs in EAL: >> >> typedef uint16_t bitmap16_t; >> typedef uint32_t bitmap32_t; >> typedef uint64_t bitmap64_t; >> >> and replace the bitmaps of the structs, again specially the ones used by >> the users. > No, if we want to show this variable is a bitmap, the variable name > may be changed, not the type. It would bring clarity when readin
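The proposal under discussion encodes each speed/duplex combination as one bit, so a driver advertises its whole supported set in a single bitmap field and callers test membership with a mask. A minimal sketch of that pattern (a subset of the constants from the patch; the helper name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the proposed capability bits (values as in the patch). */
#define ETH_SPEED_CAP_100M_FD (1u << 3)  /* 100 Mbps full-duplex */
#define ETH_SPEED_CAP_1G      (1u << 4)  /* 1 Gbps */
#define ETH_SPEED_CAP_10G     (1u << 7)  /* 10 Gbps */
#define ETH_SPEED_CAP_40G     (1u << 10) /* 40 Gbps */

/* 0 doubles as "no PHY media" for virtual devices -- one of the
 * ambiguities raised in the thread. */
#define ETH_SPEED_NOT_PHY     (0)

/* An application checks a capability by masking the driver's bitmap. */
static int speed_supported(uint32_t speed_capa, uint32_t speed_bit)
{
    return (speed_capa & speed_bit) != 0;
}
```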
[dpdk-dev] [PATCH] mk: remove "u" modifier from "ar" command
Hi Bruce, On 06/05/2015 01:05 PM, Bruce Richardson wrote: > On Fedora 22, the "ar" binary operates by default in deterministic mode, > making the "u" parameter irrelevant, and leading to warning messages > getting printed in the build output like below. > > INSTALL-LIB librte_kvargs.a > ar: `u' modifier ignored since `D' is the default (see `U') > > There are two options to remove these warnings: > * add in the "U" flag to make "ar" non-deterministic again > * remove the "u" flag to have all objects always updated Indeed, I think that removing 'u' won't have any impact in this case, as we always regenerate the full archive without updating it. However, why not explicitly use 'D' to have the same behavior across distributions? Regards, Olivier > > This patch takes the second approach. > > Signed-off-by: Bruce Richardson > --- > mk/rte.lib.mk | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk > index 0d7482d..6bd67aa 100644 > --- a/mk/rte.lib.mk > +++ b/mk/rte.lib.mk > @@ -70,7 +70,7 @@ else > _CPU_LDFLAGS := $(CPU_LDFLAGS) > endif > > -O_TO_A = $(AR) crus $(LIB) $(OBJS-y) > +O_TO_A = $(AR) crs $(LIB) $(OBJS-y) > O_TO_A_STR = $(subst ','\'',$(O_TO_A)) #'# fix syntax highlight > O_TO_A_DISP = $(if $(V),"$(O_TO_A_STR)"," AR $(@)") > O_TO_A_CMD = "cmd_$@ = $(O_TO_A_STR)" >
[dpdk-dev] [PATCH v2 0/6] support i40e QinQ stripping and insertion
Hi Helin, On 06/02/2015 05:16 AM, Helin Zhang wrote: > As i40e hardware can be reconfigured to support QinQ stripping and > insertion, this patch set is to enable that together with using the > reserved 16 bits in 'struct rte_mbuf' for the second vlan tag. > Corresponding command is added in testpmd for testing. > Note that there is no need to rework vPMD, as nothing used in it changed. > > v2 changes: > * Added more commit logs of which commit each one fixes. > * Fixed a typo. > * Kept the original RX/TX offload flags as they were, added new > flags after with new bit masks, for ABI compatibility. > * Supported double vlan stripping/insertion in examples/ipv4_multicast. Acked-by: Olivier Matz
[dpdk-dev] [RFC] af_packet: support port hotplug
-Original Message- From: Tetsuya Mukawa [mailto:muk...@igel.co.jp] Sent: Tuesday, March 17, 2015 3:43 AM To: Iremonger, Bernard Cc: John W. Linville; dev at dpdk.org Subject: Re: [dpdk-dev] [RFC] af_packet: support port hotplug On 2015/03/16 23:47, Iremonger, Bernard wrote: > >> -Original Message- >> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp] >> Sent: Monday, March 16, 2015 8:57 AM >> To: Iremonger, Bernard >> Cc: John W. Linville; dev at dpdk.org >> Subject: Re: [dpdk-dev] [RFC] af_packet: support port hotplug >> >>> @@ -835,10 +848,53 @@ rte_pmd_af_packet_devinit(const char *name, const >>> char *params) >>> return 0; >>> } >>> >>> +static int >>> +rte_pmd_af_packet_devuninit(const char *name) { >>> + struct rte_eth_dev *eth_dev = NULL; >>> + struct pmd_internals *internals; >>> + struct tpacket_req req; >>> + >>> + unsigned q; >>> + >>> + RTE_LOG(INFO, PMD, "Closing AF_PACKET ethdev on numa socket %u\n", >>> + rte_socket_id()); >>> + >>> + if (name == NULL) >>> + return -1; >> Hi Tetsuya, John, >> >> Before detaching a port, the port must be stopped and closed. >> The stop and close are only allowed for RTE_PROC_PRIMARY. >> Should there be a check for process_type here? >> >> if (rte_eal_process_type() != RTE_PROC_PRIMARY) >> return -EPERM; >> >> Regards, >> >> Bernard >> > Hi Bernard, > > I agree that stop() and close() are only called by the primary > process, but we may not need to add a check like the above. > Could you please check rte_ethdev.c? > > - struct rte_eth_dev_data *rte_eth_dev_data; This array is shared between > processes. > So we need to initialize and finalize carefully like you said. > > - struct rte_eth_dev rte_eth_devices[] This array is per process. > And the 'data' variable of this structure holds a pointer to > rte_eth_dev_data. > > All PMDs for physical NICs allocate like this when the PMDs are initialized. 
> (Even when a process is secondary, the initialization function of PMDs > will be called) But virtual device PMDs allocate rte_eth_dev_data and > overwrite the 'data' > variable of rte_eth_devices during initialization. > > As a result, primary and secondary processes have their own > 'rte_eth_dev_data' for a virtual device. > So I guess all processes need to free it not to leak memory. > > Thanks, > Tetsuya > Hi Tetsuya, In rte_ethdev.c both rte_eth_dev_stop() and rte_eth_dev_close() use the macro >> PROC_PRIMARY_OR_RET(). So for secondary processes both functions return without doing anything. Maybe this check should be added to rte_eth_dev_attach() and rte_eth_dev_detach() ? For the Physical/Virtual Functions of the NIC a lot of the finalization is done in the dev->dev_ops->dev_stop() and dev->dev_ops->dev_close() functions. To complete the finalization the dev->dev_uninit() function is >> called, this should probably do nothing for secondary processes as >> the dev_stop() and dev_close() functions will not have been executed. >>> Hi Bernard, >>> >>> Sorry for my English. >>> I meant 'virtual device PMD' as PMDs like the pcap or af_packet PMDs. >>> Not PMDs for virtual functions on a NIC. >>> >>> For PMDs like the pcap and af_packet PMDs, all data structures are >>> allocated per process. >>> (Actually I guess nothing is shared between primary and secondary >>> processes, because rte_eth_dev_data is overwritten by each >>> process.) So we need to free per-process data when detach() is called. >>> For the Physical/Virtual Functions of the NIC the dev_init() is called for both primary and >> secondary processes, however only a subset of the function is executed for >> secondary processes. >>> Because of the above, we probably will not be able to add >>> PROC_PRIMARY_OR_RET() to rte_eth_dev_detach(). >>> But I agree we should not call rte_eth_dev_detach() for a secondary >>> process, if PMDs are like the e1000 or ixgbe PMD. 
>> Correction: >> We should not process rte_eth_dev_detach() for a secondary process, if >> PMDs are like the e1000 or ixgbe PMD and if the primary process hasn't called >> stop() and close() yet. >> >> Tetsuya >> >>> To work like the above, how about changing drv_flags dynamically in the >>> close() callback? >>> For example, when the primary process calls rte_eth_dev_close(), a >>> callback of the PMD will be called. >>> (In the case of the e1000 PMD, eth_em_close() is the callback.) >>> >>> At that time, specify the RTE_PCI_DRV_DETACHABLE flag in drv_flags in the >>> callback. >>> It means if the primary process hasn't called close() yet, >>> rte_eth_dev_detach() will do nothing and return an error. >>> How about doing it like that? >>> >>> Regards, >>> Tetsuya > Hi Tetsuya, > For the e1000, igb and ixgbe PMDs it is p
[dpdk-dev] [PATCH v3 00/10] Add a VXLAN sample
Acked-by: Helin Zhang Thanks for the good example! > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jijiang Liu > Sent: Monday, June 8, 2015 11:02 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v3 00/10] Add a VXLAN sample > > This VXLAN sample simulates a VXLAN Tunnel Endpoint (VTEP) termination in > DPDK, which is used to demonstrate the offload and filtering capabilities of the > i40e > NIC for VXLAN packets. > > And this sample uses the basic virtio device management function from the vHost > example, and the US-vHost interface and tunnel filtering mechanism to direct > the > traffic to/from a specific VM. > > In addition, this sample is also designed to show how tunneling protocols can > be > handled. For the vHost interface, we do not need to support zero copy/inter VM > packet transfer, etc. The approaches we took would be of benefit to you > because > we put a pluggable structure in place so that the application could be easily > extended to support a new tunneling protocol. > > The software framework is as follows: > > >|---| |---| >| VM-1(VNI:100) | | VM-2(VNI:200)| >| |--| |--| | | |--| |--| | >| |vport0| |vport1| | | |vport0| |vport1| | >|-|--|-|--|-| |-|--|-|--|-| Guests > \ / > |-\---/| > | us-vHost interface | > | |-||--| | > | decap| | TEP| | encap | DPDK App > | |-||--| | > |||| > |||| > || > |-||---| > |tunnel filter|| IP/L4 Tx csum | > |IP/L4 csum || TSO | > |packet type || | NIC > |packet recogn|| | > |-||---| > || > || > || > /---\ > VXLAN Tunnel > > The sample will support the following: > 1> Tunneling packet recognition. > > 2> The port of UDP tunneling is configurable > > 3> Directing incoming traffic to the correct queue based on the tunnel filter > type > such as inner MAC address and VNI. > > The VNI will be assigned from a static internal table based on the us-vHost > device ID. Each device will receive a unique device ID. The inner MAC will be > learned by the first packet transmitted from a device. 
> > 4> Decapsulation of Rx VXLAN traffic. This is a software only operation. > > 5> Encapsulation of Tx VXLAN traffic. This is a software only operation. > > 6> Tx outer IP, inner IP and L4 checksum offload > > 7> TSO support for tunneling packet > > The limitations: > 1. No ARP support. > 2. There is some duplicated source code because I used the basic virtio > device management function from the VHOST sample. Considering that the current > VHOST sample is quite complicated and huge enough, I think we shall have a > separate sample for tunneling packet processing. > 3. Currently, only the i40e NIC is tested in the sample, but other types of > NICs will > also be supported if they are able to support tunneling packet filters. > > v2 changes: > Fixed an issue about the 'nb_ports' duplication in check_ports_num(). > Removed the inaccurate comment in main.c > Fixed an issue about TSO offload. > > v3 changes: > Changed some variable names that didn't follow coding rules. > Removed the limitation of VXLAN packet size due to TSO support. > Removed useless 'll_root_used' variable in vxlan_setup.c file. > Removed definition and use of '_htons'. 
> > Jijiang Liu (10): > create VXLAN sample framework using virtio device management function > add basic VXLAN structures > add the pluggable structures > implement VXLAN packet processing > add udp port configuration > add filter type configuration > add tx checksum offload configuration > add TSO offload configuration > add Rx checksum statistics > add encapsulation and decapsulation flags > > > examples/Makefile |1 + > examples/tep_termination/Makefile | 55 ++ > examples/tep_termination/main.c| 1205 > > examples/tep_termination/main.h| 129 > examples/tep_termination/vxlan.c | 262 +++ > examples/tep_termination/vxlan.h | 76 ++ > examples/tep_termination/vxlan_setup.c | 444 > examples/tep_termination/vxlan_setup.h | 77 ++ > 8 files changed, 2249 insertions(+), 0 deletions(-) create mode 100644 > examples/tep_termination/Makefile create mode 100644 > examples/tep_termination/main.c create mode 100644 > examples/tep_termination/main
[dpdk-dev] [PATCH v2 0/6] support i40e QinQ stripping and insertion
Tested-by: Min Cao - OS: Fedora20 3.11.10-301 - GCC: gcc version 4.8.2 20131212 - CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz - NIC: Ethernet controller: Intel Corporation Device 1572 (rev 01) - Default x86_64-native-linuxapp-gcc configuration - Total 2 cases, 2 passed, 0 failed - Case: double vlan filter - Case: double vlan insertion > -Original Message- > From: Zhang, Helin > Sent: Tuesday, June 2, 2015 11:16 AM > To: dev at dpdk.org > Cc: Cao, Min; Liu, Jijiang; Wu, Jingjing; Ananyev, Konstantin; > Richardson, Bruce; olivier.matz at 6wind.com; Zhang, Helin > Subject: [PATCH v2 0/6] support i40e QinQ stripping and insertion > > As i40e hardware can be reconfigured to support QinQ stripping and > insertion, this patch set is to enable that together with using the > reserved 16 bits in 'struct rte_mbuf' for the second vlan tag. > Corresponding command is added in testpmd for testing. > Note that there is no need to rework vPMD, as nothing used in it changed. > > v2 changes: > * Added more commit logs of which commit each one fixes. > * Fixed a typo. > * Kept the original RX/TX offload flags as they were, added new > flags after with new bit masks, for ABI compatibility. > * Supported double vlan stripping/insertion in examples/ipv4_multicast. 
> > Helin Zhang (6): > ixgbe: remove a discarded source line > mbuf: use the reserved 16 bits for double vlan > i40e: support double vlan stripping and insertion > i40evf: add supported offload capability flags > app/testpmd: add test cases for qinq stripping and insertion > examples/ipv4_multicast: support double vlan stripping and insertion > > app/test-pmd/cmdline.c| 78 + > app/test-pmd/config.c | 21 +- > app/test-pmd/flowgen.c| 4 +- > app/test-pmd/macfwd.c | 3 ++ > app/test-pmd/macswap.c| 3 ++ > app/test-pmd/rxonly.c | 3 ++ > app/test-pmd/testpmd.h| 6 ++- > app/test-pmd/txonly.c | 8 +++- > drivers/net/i40e/i40e_ethdev.c| 52 + > drivers/net/i40e/i40e_ethdev_vf.c | 13 +++ > drivers/net/i40e/i40e_rxtx.c | 81 > +-- > drivers/net/ixgbe/ixgbe_rxtx.c| 1 - > examples/ipv4_multicast/main.c| 1 + > lib/librte_ether/rte_ethdev.h | 2 + > lib/librte_mbuf/rte_mbuf.h| 10 - > 15 files changed, 243 insertions(+), 43 deletions(-) > > -- > 1.9.3
[dpdk-dev] [PATCH v1] app/test: fix pmd_perf issue in no NUMA case
Thank you Steve. Acked. Thanks M Jay http://www.dpdk.org -Original Message- From: Liang, Cunming Sent: Sunday, June 07, 2015 11:33 PM To: dev at dpdk.org Cc: Jayakumar, Muthurajan; Liang, Cunming Subject: [PATCH v1] app/test: fix pmd_perf issue in no NUMA case Reported-by: Jayakumar, Muthurajan Signed-off-by: Cunming Liang --- app/test/test_pmd_perf.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c index 1fd6843..6f218f7 100644 --- a/app/test/test_pmd_perf.c +++ b/app/test/test_pmd_perf.c @@ -321,6 +321,19 @@ alloc_lcore(uint16_t socketid) return (uint16_t)-1; } +static int +get_socket_id(uint8_t port_id) +{ + int socket_id; + + socket_id = rte_eth_dev_socket_id(port_id); + if (socket_id < 0) + /* enforce using socket 0 when no NUMA support */ + socket_id = 0; + + return socket_id; +} + volatile uint64_t stop; uint64_t count; uint64_t drop; @@ -727,7 +740,7 @@ test_pmd_perf(void) num = 0; for (portid = 0; portid < nb_ports; portid++) { if (socketid == -1) { - socketid = rte_eth_dev_socket_id(portid); + socketid = get_socket_id(portid); slave_id = alloc_lcore(socketid); if (slave_id == (uint16_t)-1) { printf("No avail lcore to run test\n"); @@ -737,7 +750,7 @@ test_pmd_perf(void) slave_id, socketid); } - if (socketid != rte_eth_dev_socket_id(portid)) { + if (socketid != get_socket_id(portid)) { printf("Skip port %d\n", portid); continue; } @@ -818,7 +831,7 @@ test_pmd_perf(void) /* port tear down */ for (portid = 0; portid < nb_ports; portid++) { - if (socketid != rte_eth_dev_socket_id(portid)) + if (socketid != get_socket_id(portid)) continue; rte_eth_dev_stop(portid); -- 1.8.1.4
[dpdk-dev] l2fwd consumes 100% cpu
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Victor Detoni > Sent: Monday, June 08, 2015 9:06 AM > To: dev at dpdk.org > Subject: [dpdk-dev] l2fwd consumes 100% cpu > > hello, > > I'm looking for some documentation about this issue, but I couldn't find any yet. > So, I would like to know if there is some mechanism like poll or select to > put in l2fwd_main_loop? The way to limit the CPU utilization is to set a specific quota for your thread via cgroups. If you would like to sleep your thread when there's no traffic, there's a patch set that enables thread wake-up from epoll_wait via a one-shot Rx interrupt. http://article.gmane.org/gmane.comp.networking.dpdk.devel/18942 > > It would be great to analyse cpu consumption, because I can see 100% of cpu > with no traffic passing through it. > > thanks > Victor