[dpdk-dev] [PATCH] net/mlx5: document workaround for ConnectX-4 with L2 encap

2021-06-30 Thread Dmitry Kozlyuk
ConnectX-4 and ConnectX-4 Lx NICs require all L2 headers of transmitted
packets to be inlined. By default, only the first 18 bytes are inlined,
which is insufficient if additional encapsulation, such as Q-in-Q, is used.
Thus, the default settings caused such traffic to be dropped on Tx.
Document a workaround to increase the inlined data size in such cases.

Fixes: 505f1fe426d3 ("net/mlx5: add Tx devargs")
Cc: sta...@dpdk.org

Signed-off-by: Dmitry Kozlyuk 
Reviewed-by: Viacheslav Ovsiienko 
---
 doc/guides/nics/mlx5.rst | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index ebefbe607e..05a89d08f2 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -705,6 +705,13 @@ Driver options
   it is not recommended and may prevent NIC from sending packets over
   some configurations.
 
+  For ConnectX-4 and ConnectX-4 Lx NICs, the automatically configured value
+  is insufficient for some traffic, because these NICs require at least all
+  L2 headers to be inlined. For example, Q-in-Q adds 4 bytes to the default
+  18 bytes of Ethernet and VLAN, so ``txq_inline_min`` must be set to 22.
+  MPLS would add 4 bytes per label. The final value must account for all
+  possible L2 encapsulation headers used in the particular environment.
+
  Please, note, this minimal data inlining disengages eMPW feature (Enhanced
  Multi-Packet Write), because last one does not support partial packet inlining.
  This is not very critical due to minimal data inlining is mostly required
-- 
2.25.1
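
To make the arithmetic above concrete, here is a minimal C sketch; the
helper and its name are illustrative and not part of the PMD:

	/* Default inline size (18) covers Ethernet (14) plus one VLAN tag (4);
	 * each extra 802.1Q tag or MPLS label adds 4 bytes. */
	#include <stdint.h>

	static inline uint16_t
	min_inline_for_l2(uint16_t extra_vlan_tags, uint16_t mpls_labels)
	{
		return 14                           /* Ethernet header */
			+ 4 * (1 + extra_vlan_tags) /* VLAN tags */
			+ 4 * mpls_labels;          /* MPLS label stack entries */
	}

	/* Q-in-Q: min_inline_for_l2(1, 0) == 22, hence txq_inline_min=22. */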



[dpdk-dev] [PATCH 0/3] net/mlx5: add flow rule match for IPv4 IHL field

2021-06-30 Thread Gregory Etelson
Expand MLX5 PMD flow functionality with a match on the IPv4 IHL field.
Allow testpmd to configure IPv4 IHL values in flow rules.

Gregory Etelson (3):
  common/mlx5: query for hardware capability to offload IPv4 IHL field
  net/mlx5: add flow rule match for IPv4 IHL field
  app/testpmd: add flow item to match on IPv4 version_ihl field

 app/test-pmd/cmdline_flow.c  | 12 ++-
 drivers/common/mlx5/mlx5_devx_cmds.c |  6 ++
 drivers/common/mlx5/mlx5_devx_cmds.h |  2 ++
 drivers/net/mlx5/mlx5_flow_dv.c  | 31 +---
 4 files changed, 42 insertions(+), 9 deletions(-)

-- 
2.31.1



[dpdk-dev] [PATCH 1/3] common/mlx5: query for hardware capability to offload IPv4 IHL field

2021-06-30 Thread Gregory Etelson
The patch queries the MLX5 port hardware to check whether it is capable
of offloading the IPv4 IHL field.

Signed-off-by: Gregory Etelson 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 6 ++
 drivers/common/mlx5/mlx5_devx_cmds.h | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index f5914bce32..9070691332 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -948,6 +948,12 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
(flow_table_nic_cap, hcattr,
 flow_table_properties_nic_receive.log_max_ft_sampler_num);
attr->pkt_integrity_match = mlx5_devx_query_pkt_integrity_match(hcattr);
+   attr->inner_ipv4_ihl = MLX5_GET
+   (flow_table_nic_cap, hcattr,
+ft_field_support_2_nic_receive.inner_ipv4_ihl);
+   attr->outer_ipv4_ihl = MLX5_GET
+   (flow_table_nic_cap, hcattr,
+ft_field_support_2_nic_receive.outer_ipv4_ihl);
/* Query HCA offloads for Ethernet protocol. */
memset(in, 0, sizeof(in));
memset(out, 0, sizeof(out));
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index f8a17b886b..034c40b49c 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -148,6 +148,8 @@ struct mlx5_hca_attr {
uint32_t crypto_login:1; /* General obj type CRYPTO_LOGIN supported. */
uint32_t regexp_num_of_engines;
uint32_t log_max_ft_sampler_num:8;
+   uint32_t inner_ipv4_ihl:1;
+   uint32_t outer_ipv4_ihl:1;
uint32_t geneve_tlv_opt;
uint32_t cqe_compression:1;
uint32_t mini_cqe_resp_flow_tag:1;
-- 
2.31.1



[dpdk-dev] [PATCH 2/3] net/mlx5: add flow rule match for IPv4 IHL field

2021-06-30 Thread Gregory Etelson
Provide flow rules the capability to match on the IPv4 IHL field.
The minimal HCA firmware version required to offload IPv4 IHL is
xx_30_2000.

Signed-off-by: Gregory Etelson 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index c5d4b01e57..155f686ad1 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2451,19 +2451,19 @@ flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_dv_validate_item_ipv4(const struct rte_flow_item *item,
-  uint64_t item_flags,
-  uint64_t last_item,
-  uint16_t ether_type,
-  struct rte_flow_error *error)
+flow_dv_validate_item_ipv4(struct rte_eth_dev *dev,
+  const struct rte_flow_item *item,
+  uint64_t item_flags, uint64_t last_item,
+  uint16_t ether_type, struct rte_flow_error *error)
 {
int ret;
+   struct mlx5_priv *priv = dev->data->dev_private;
const struct rte_flow_item_ipv4 *spec = item->spec;
const struct rte_flow_item_ipv4 *last = item->last;
const struct rte_flow_item_ipv4 *mask = item->mask;
rte_be16_t fragment_offset_spec = 0;
rte_be16_t fragment_offset_last = 0;
-   const struct rte_flow_item_ipv4 nic_ipv4_mask = {
+   struct rte_flow_item_ipv4 nic_ipv4_mask = {
.hdr = {
.src_addr = RTE_BE32(0x),
.dst_addr = RTE_BE32(0x),
@@ -2474,6 +2474,17 @@ flow_dv_validate_item_ipv4(const struct rte_flow_item *item,
},
};
 
+   if (mask && (mask->hdr.version_ihl & RTE_IPV4_HDR_IHL_MASK)) {
+   int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
+   bool ihl_cap = !tunnel ? priv->config.hca_attr.outer_ipv4_ihl :
+  priv->config.hca_attr.inner_ipv4_ihl;
+   if (!ihl_cap)
+   return rte_flow_error_set(error, ENOTSUP,
+ RTE_FLOW_ERROR_TYPE_ITEM,
+ item,
+ "IPV4 ihl offload not supported");
+   nic_ipv4_mask.hdr.version_ihl = mask->hdr.version_ihl;
+   }
ret = mlx5_flow_validate_item_ipv4(item, item_flags, last_item,
   ether_type, &nic_ipv4_mask,
   MLX5_ITEM_RANGE_ACCEPTED, error);
@@ -6771,7 +6782,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
case RTE_FLOW_ITEM_TYPE_IPV4:
mlx5_flow_tunnel_ip_check(items, next_protocol,
  &item_flags, &tunnel);
-   ret = flow_dv_validate_item_ipv4(items, item_flags,
+   ret = flow_dv_validate_item_ipv4(dev, items, item_flags,
 last_item, ether_type,
 error);
if (ret < 0)
@@ -8154,7 +8165,7 @@ flow_dv_translate_item_ipv4(void *matcher, void *key,
void *headers_v;
char *l24_m;
char *l24_v;
-   uint8_t tos;
+   uint8_t tos, ihl_m, ihl_v;
 
if (inner) {
headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
@@ -8183,6 +8194,10 @@ flow_dv_translate_item_ipv4(void *matcher, void *key,
*(uint32_t *)l24_m = ipv4_m->hdr.src_addr;
*(uint32_t *)l24_v = ipv4_m->hdr.src_addr & ipv4_v->hdr.src_addr;
tos = ipv4_m->hdr.type_of_service & ipv4_v->hdr.type_of_service;
+   ihl_m = ipv4_m->hdr.version_ihl & RTE_IPV4_HDR_IHL_MASK;
+   ihl_v = ipv4_v->hdr.version_ihl & RTE_IPV4_HDR_IHL_MASK;
+   MLX5_SET(fte_match_set_lyr_2_4, headers_m, ipv4_ihl, ihl_m);
+   MLX5_SET(fte_match_set_lyr_2_4, headers_v, ipv4_ihl, ihl_m & ihl_v);
MLX5_SET(fte_match_set_lyr_2_4, headers_m, ip_ecn,
 ipv4_m->hdr.type_of_service);
MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_ecn, tos);
-- 
2.31.1
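
For illustration, a pattern exercising the new match could look like the
following hedged sketch (the values are assumptions: 0x46 encodes IPv4
version 4 with IHL 6, i.e. a header carrying options):

	/* Match IPv4 packets whose header is longer than 20 bytes (IHL 6).
	 * RTE_IPV4_HDR_IHL_MASK is 0x0f, the low nibble of version_ihl. */
	struct rte_flow_item_ipv4 ipv4_spec = {
		.hdr = { .version_ihl = 0x46 },
	};
	struct rte_flow_item_ipv4 ipv4_mask = {
		.hdr = { .version_ihl = RTE_IPV4_HDR_IHL_MASK },
	};
	struct rte_flow_item item = {
		.type = RTE_FLOW_ITEM_TYPE_IPV4,
		.spec = &ipv4_spec,
		.mask = &ipv4_mask,
	};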



[dpdk-dev] [PATCH 3/3] app/testpmd: add flow item to match on IPv4 version_ihl field

2021-06-30 Thread Gregory Etelson
The new flow item allows the PMD to offload the IPv4 IHL field for
matching, if the hardware supports that operation.

Signed-off-by: Gregory Etelson 
---
 app/test-pmd/cmdline_flow.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 1c587bb7b8..c1c7b9a9f9 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -173,6 +173,7 @@ enum index {
ITEM_VLAN_INNER_TYPE,
ITEM_VLAN_HAS_MORE_VLAN,
ITEM_IPV4,
+   ITEM_IPV4_VER_IHL,
ITEM_IPV4_TOS,
ITEM_IPV4_ID,
ITEM_IPV4_FRAGMENT_OFFSET,
@@ -1071,6 +1072,7 @@ static const enum index item_vlan[] = {
 };
 
 static const enum index item_ipv4[] = {
+   ITEM_IPV4_VER_IHL,
ITEM_IPV4_TOS,
ITEM_IPV4_ID,
ITEM_IPV4_FRAGMENT_OFFSET,
@@ -2567,6 +2569,13 @@ static const struct token token_list[] = {
.next = NEXT(item_ipv4),
.call = parse_vc,
},
+   [ITEM_IPV4_VER_IHL] = {
+   .name = "version_ihl",
+   .help = "match header length",
+   .next = NEXT(item_ipv4, NEXT_ENTRY(UNSIGNED), item_param),
+   .args = ARGS(ARGS_ENTRY(struct rte_flow_item_ipv4,
+hdr.version_ihl)),
+   },
[ITEM_IPV4_TOS] = {
.name = "tos",
.help = "type of service",
@@ -8123,7 +8132,8 @@ update_fields(uint8_t *buf, struct rte_flow_item *item, uint16_t next_proto)
break;
case RTE_FLOW_ITEM_TYPE_IPV4:
ipv4 = (struct rte_ipv4_hdr *)buf;
-   ipv4->version_ihl = 0x45;
+   if (!ipv4->version_ihl)
+   ipv4->version_ihl = RTE_IPV4_VHL_DEF;
if (next_proto && ipv4->next_proto_id == 0)
ipv4->next_proto_id = (uint8_t)next_proto;
break;
-- 
2.31.1
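
Based on the "version_ihl" token added above, a testpmd rule could look
like this (the exact command shape is inferred from the parser entry,
not taken from the patch):

	testpmd> flow create 0 ingress pattern eth / ipv4 version_ihl spec 0x46 version_ihl mask 0x0f / end actions queue index 1 / end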



[dpdk-dev] [PATCH] net/mlx5: fix pattern expansion in RSS flow rules

2021-06-30 Thread Gregory Etelson
A flow rule pattern may be implicitly expanded by the PMD if the rule
has an RSS flow action. The expansion adds network headers to the
original pattern. The new pattern lists all network levels that
participate in the rule's RSS action.

The patch validates that the buffer for the expanded pattern has enough
bytes for the new flow items.

Fixes: c7870bfe09dc ("ethdev: move RSS expansion code to mlx5 driver")

Cc: sta...@dpdk.org
Signed-off-by: Gregory Etelson 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow.c | 63 +++-
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c5d4a95a8f..159e84bfab 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -264,6 +264,7 @@ mlx5_flow_expand_rss_item_complete(const struct rte_flow_item *item)
  *   set, the following errors are defined:
  *
  *   -E2BIG: graph-depth @p graph is too deep.
+ *   -EINVAL: @p size does not provide enough space for the expanded pattern.
  */
 static int
 mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
@@ -290,12 +291,12 @@ mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
memset(&missed_item, 0, sizeof(missed_item));
lsize = offsetof(struct mlx5_flow_expand_rss, entry) +
MLX5_RSS_EXP_ELT_N * sizeof(buf->entry[0]);
-   if (lsize <= size) {
-   buf->entry[0].priority = 0;
-   buf->entry[0].pattern = (void *)&buf->entry[MLX5_RSS_EXP_ELT_N];
-   buf->entries = 0;
-   addr = buf->entry[0].pattern;
-   }
+   if (lsize > size)
+   return -EINVAL;
+   buf->entry[0].priority = 0;
+   buf->entry[0].pattern = (void *)&buf->entry[MLX5_RSS_EXP_ELT_N];
+   buf->entries = 0;
+   addr = buf->entry[0].pattern;
for (item = pattern; item->type != RTE_FLOW_ITEM_TYPE_END; item++) {
if (!mlx5_flow_is_rss_expandable_item(item)) {
user_pattern_size += sizeof(*item);
@@ -313,12 +314,12 @@ mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
}
user_pattern_size += sizeof(*item); /* Handle END item. */
lsize += user_pattern_size;
+   if (lsize > size)
+   return -EINVAL;
/* Copy the user pattern in the first entry of the buffer. */
-   if (lsize <= size) {
-   rte_memcpy(addr, pattern, user_pattern_size);
-   addr = (void *)(((uintptr_t)addr) + user_pattern_size);
-   buf->entries = 1;
-   }
+   rte_memcpy(addr, pattern, user_pattern_size);
+   addr = (void *)(((uintptr_t)addr) + user_pattern_size);
+   buf->entries = 1;
/* Start expanding. */
memset(flow_items, 0, sizeof(flow_items));
user_pattern_size -= sizeof(*item);
@@ -348,7 +349,9 @@ mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
elt = 2; /* missed item + item end. */
node = next;
lsize += elt * sizeof(*item) + user_pattern_size;
-   if ((node->rss_types & types) && lsize <= size) {
+   if (lsize > size)
+   return -EINVAL;
+   if (node->rss_types & types) {
buf->entry[buf->entries].priority = 1;
buf->entry[buf->entries].pattern = addr;
buf->entries++;
@@ -367,6 +370,7 @@ mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
while (node) {
flow_items[stack_pos].type = node->type;
if (node->rss_types & types) {
+   size_t n;
/*
 * compute the number of items to copy from the
 * expansion and copy it.
@@ -376,24 +380,23 @@ mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
elt = stack_pos + 2;
flow_items[stack_pos + 1].type = RTE_FLOW_ITEM_TYPE_END;
lsize += elt * sizeof(*item) + user_pattern_size;
-   if (lsize <= size) {
-   size_t n = elt * sizeof(*item);
-
-   buf->entry[buf->entries].priority =
-   stack_pos + 1 + missed;
-   buf->entry[buf->entries].pattern = addr;
-   buf->entries++;
-   rte_memcpy(addr, buf->entry[0].pattern,
-  user_pattern_size);
-   addr = (void *)(((uintptr_t)addr) +
-   user_pattern_size);
-   rte_memcpy(addr, &missed_item,
-  missed * sizeof(*item));
-   addr = (void *)(((uintptr_t)addr) +
-   
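
A hedged caller-side sketch of the behavior change (identifiers and the
buffer size are illustrative):

	/* With this fix, an undersized expansion buffer makes
	 * mlx5_flow_expand_rss() fail fast with -EINVAL instead of
	 * silently producing a partial expansion. */
	struct {
		struct mlx5_flow_expand_rss buf;
		uint8_t buffer[2048]; /* size is illustrative */
	} expand_buffer;
	int ret;

	ret = mlx5_flow_expand_rss(&expand_buffer.buf, sizeof(expand_buffer),
				   pattern, types, graph, graph_root_index);
	if (ret < 0) /* -E2BIG: graph too deep; -EINVAL: buffer too small */
		return ret;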

Re: [dpdk-dev] [PATCH 0/2] MLX5 PMD tuning

2021-06-30 Thread Ruifeng Wang
> -Original Message-
> From: Ruifeng Wang 
> Sent: Tuesday, June 1, 2021 4:31 PM
> To: rasl...@nvidia.com; ma...@nvidia.com; shah...@nvidia.com;
> viachesl...@nvidia.com
> Cc: dev@dpdk.org; jer...@marvell.com; nd ; Honnappa
> Nagarahalli ; Ruifeng Wang
> 
> Subject: [PATCH 0/2] MLX5 PMD tuning
> 
> This series include optimizations for MLX5 PMD.
> In tests on Arm N1SDP with MLX5 40G NIC, changes showed performance
> gain.
> 
> Ruifeng Wang (2):
>   net/mlx5: remove redundant operations
>   net/mlx5: reduce unnecessary memory access
> 
>  drivers/net/mlx5/mlx5_rxtx_vec.c  | 6 --
>  drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 9 +
>  2 files changed, 5 insertions(+), 10 deletions(-)
> 
> --
> 2.25.1

Ping.
Appreciate your review of these patches.

Thanks.


Re: [dpdk-dev] [PATCH v7 4/7] vhost: fix NUMA reallocation with multiqueue

2021-06-30 Thread David Marchand
On Tue, Jun 29, 2021 at 6:11 PM Maxime Coquelin
 wrote:
>
> Since the Vhost-user device initialization has been reworked,
> enabling the application to start using the device as soon as
> the first queue pair is ready, NUMA reallocation no more
> happened on queue pairs other than the first one since
> numa_realloc() was returning early if the device was running.
>
> This patch fixes this issue by only preventing the device
> metadata to be allocated if the device is running. For the

Hum, I understand the meaning, but I think we could make it easier to read:

This patch fixes this issue by reallocating the device metadata only
if the device is not running.

WDYT?

> virtqueues, a vring state change notification is sent to
> notify the application of its disablement. Since the callback
> is supposed to be blocking, it is safe to reallocate it
> afterwards.


-- 
David Marchand



[dpdk-dev] [PATCH v4 0/4] support async dequeue for split ring

2021-06-30 Thread Wenwu Ma
This patch implements asynchronous dequeue data path for split ring.
A new asynchronous dequeue function is introduced. With this function,
the application can try to receive packets from the guest with
offloading large copies to the DMA engine, thus saving precious CPU
cycles.
note: PATCH v4 3/4 depends on IOMMU patch from Ding,Xuan
(http://patches.dpdk.org/project/dpdk/patch/20210603173023.10487-1-xuan.d...@intel.com/)

v4:
- Fix a wrong packet index issue in async dequeue to improve
  the performance of small packet copies.

v3:
- Fix compilation warnings and errors on the Arm platform.
- Restore the removed function virtio_dev_pktmbuf_alloc;
  async dequeue allocates packets separately.

v2:
- Refactor vhost datapath as preliminary patch for this series.
- The change of using new API in examples/vhost is put into a
  dedicated patch.
- Check queue_id value before using it.
- Async dequeue performance enhancement. 160% performance improvement
  for v2 vs. v1.
- Async dequeue API name change from rte_vhost_try_dequeue_burst to
  rte_vhost_async_try_dequeue_burst.
- Completed packets update the used ring directly.

Wenwu Ma (3):
  examples/vhost: refactor vhost enqueue and dequeue datapaths.
  examples/vhost: use a new API to query remaining ring space
  examples/vhost: support vhost async dequeue data path

Yuan Wang (1):
  vhost: support async dequeue for split ring

 doc/guides/prog_guide/vhost_lib.rst |  10 +
 doc/guides/sample_app_ug/vhost.rst  |   9 +-
 examples/vhost/ioat.c   |  67 +++-
 examples/vhost/ioat.h   |  25 ++
 examples/vhost/main.c   | 224 +++
 examples/vhost/main.h   |  33 +-
 examples/vhost/virtio_net.c |  16 +-
 lib/vhost/rte_vhost_async.h |  44 ++-
 lib/vhost/version.map   |   3 +
 lib/vhost/virtio_net.c  | 579 
 10 files changed, 902 insertions(+), 108 deletions(-)

-- 
2.25.1
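
A minimal usage sketch for the new API (the surrounding loop and the
consumer function are illustrative, not part of the series):

	int nr_inflight;
	struct rte_mbuf *pkts[MAX_PKT_BURST];
	uint16_t n;

	/* Packets still being copied by the DMA engine are counted in
	 * nr_inflight and are returned by later calls once complete. */
	n = rte_vhost_async_try_dequeue_burst(vid, VIRTIO_TXQ, mbuf_pool,
			pkts, MAX_PKT_BURST, &nr_inflight);
	if (n > 0)
		forward_pkts(pkts, n); /* illustrative consumer */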



[dpdk-dev] [PATCH v4 1/4] examples/vhost: refactor vhost enqueue and dequeue datapaths.

2021-06-30 Thread Wenwu Ma
Previously, we checked a flag to decide which enqueue/dequeue
functions to call in the data path.

Now, we use an ops table that is initialized when the Vhost device is
created, so that we can call the ops directly in the Vhost data path
without any further flag checks.

Signed-off-by: Wenwu Ma 
---
 examples/vhost/main.c   | 112 
 examples/vhost/main.h   |  33 +--
 examples/vhost/virtio_net.c |  16 +-
 3 files changed, 105 insertions(+), 56 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d2179eadb9..aebdc3a566 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -106,6 +106,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[MAX_VHOST_DEVICE];
+
 /* empty vmdq configuration structure. Filled in programatically */
 static struct rte_eth_conf vmdq_conf_default = {
.rxmode = {
@@ -885,27 +887,8 @@ drain_vhost(struct vhost_dev *vdev)
uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-   if (builtin_net_driver) {
-   ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-   } else if (async_vhost_driver) {
-   uint32_t cpu_cpl_nr = 0;
-   uint16_t enqueue_fail = 0;
-   struct rte_mbuf *m_cpu_cpl[nr_xmit];
-
-   complete_async_pkts(vdev);
-   ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-   m, nr_xmit, m_cpu_cpl, &cpu_cpl_nr);
-
-   if (cpu_cpl_nr)
-   free_pkts(m_cpu_cpl, cpu_cpl_nr);
-
-   enqueue_fail = nr_xmit - ret;
-   if (enqueue_fail)
-   free_pkts(&m[ret], nr_xmit - ret);
-   } else {
-   ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-   m, nr_xmit);
-   }
+   ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+   VIRTIO_RXQ, m, nr_xmit);
 
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1184,6 +1167,36 @@ drain_mbuf_table(struct mbuf_table *tx_q)
}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+   struct rte_mbuf **pkts, uint32_t rx_count)
+{
+   uint16_t enqueue_count;
+   uint32_t cpu_cpl_nr = 0;
+   uint16_t enqueue_fail = 0;
+   struct rte_mbuf *m_cpu_cpl[MAX_PKT_BURST];
+
+   complete_async_pkts(vdev);
+   enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
+   queue_id, pkts, rx_count,
+   m_cpu_cpl, &cpu_cpl_nr);
+   if (cpu_cpl_nr)
+   free_pkts(m_cpu_cpl, cpu_cpl_nr);
+
+   enqueue_fail = rx_count - enqueue_count;
+   if (enqueue_fail)
+   free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+   return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+   struct rte_mbuf **pkts, uint32_t rx_count)
+{
+   return rte_vhost_enqueue_burst(vdev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1214,29 +1227,8 @@ drain_eth_rx(struct vhost_dev *vdev)
}
}
 
-   if (builtin_net_driver) {
-   enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-   pkts, rx_count);
-   } else if (async_vhost_driver) {
-   uint32_t cpu_cpl_nr = 0;
-   uint16_t enqueue_fail = 0;
-   struct rte_mbuf *m_cpu_cpl[MAX_PKT_BURST];
-
-   complete_async_pkts(vdev);
-   enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-   VIRTIO_RXQ, pkts, rx_count,
-   m_cpu_cpl, &cpu_cpl_nr);
-   if (cpu_cpl_nr)
-   free_pkts(m_cpu_cpl, cpu_cpl_nr);
-
-   enqueue_fail = rx_count - enqueue_count;
-   if (enqueue_fail)
-   free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-   } else {
-   enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-   pkts, rx_count);
-   }
+   enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+   VIRTIO_RXQ, pkts, rx_count);
 
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1249,6 +1241,14 @@ drain_eth_rx(struct vhost_dev *vdev)
free_pkts(pkts, rx_count);
 }
 
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+   struct rte_mempool *mbuf_pool,
+   
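
For context, the ops table referenced above (declared in main.h, which
is not shown in full here) would look roughly like this abridged sketch:

	/* One set of handlers per vhost device, chosen once at creation. */
	struct vhost_queue_ops {
		uint16_t (*enqueue_pkt_burst)(struct vhost_dev *vdev,
				uint16_t queue_id,
				struct rte_mbuf **pkts, uint32_t count);
		uint16_t (*dequeue_pkt_burst)(struct vhost_dev *vdev,
				uint16_t queue_id,
				struct rte_mempool *mbuf_pool,
				struct rte_mbuf **pkts, uint16_t count);
	};

	/* Illustrative selection at new_device() time:
	 *   vdev_queue_ops[vid].enqueue_pkt_burst =
	 *       async_vhost_driver ? async_enqueue_pkts : sync_enqueue_pkts;
	 */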

[dpdk-dev] [PATCH v4 2/4] examples/vhost: use a new API to query remaining ring space

2021-06-30 Thread Wenwu Ma
A new API for querying the remaining descriptor ring capacity
is available, so use it instead of tracking the free space manually.

Signed-off-by: Wenwu Ma 
---
 examples/vhost/ioat.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index 2a2c2d7202..bf4e033bdb 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -17,7 +17,6 @@ struct packet_tracker {
unsigned short next_read;
unsigned short next_write;
unsigned short last_remain;
-   unsigned short ioat_space;
 };
 
 struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
@@ -113,7 +112,6 @@ open_ioat(const char *value)
goto out;
}
rte_rawdev_start(dev_id);
-   cb_tracker[dev_id].ioat_space = IOAT_RING_SIZE - 1;
dma_info->nr++;
i++;
}
@@ -140,7 +138,7 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
src = descs[i_desc].src;
dst = descs[i_desc].dst;
i_seg = 0;
-   if (cb_tracker[dev_id].ioat_space < src->nr_segs)
+   if (rte_ioat_burst_capacity(dev_id) < src->nr_segs)
break;
while (i_seg < src->nr_segs) {
rte_ioat_enqueue_copy(dev_id,
@@ -155,7 +153,6 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
}
write &= mask;
cb_tracker[dev_id].size_track[write] = src->nr_segs;
-   cb_tracker[dev_id].ioat_space -= src->nr_segs;
write++;
}
} else {
@@ -194,7 +191,6 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
if (n_seg == 0)
return 0;
 
-   cb_tracker[dev_id].ioat_space += n_seg;
n_seg += cb_tracker[dev_id].last_remain;
 
read = cb_tracker[dev_id].next_read;
-- 
2.25.1



[dpdk-dev] [PATCH v4 3/4] vhost: support async dequeue for split ring

2021-06-30 Thread Wenwu Ma
From: Yuan Wang 

This patch implements asynchronous dequeue data path for split ring.
A new asynchronous dequeue function is introduced. With this function,
the application can try to receive packets from the guest with
offloading large copies to the DMA engine, thus saving precious CPU
cycles.

Signed-off-by: Yuan Wang 
Signed-off-by: Jiayu Hu 
Signed-off-by: Wenwu Ma 
---
 doc/guides/prog_guide/vhost_lib.rst |  10 +
 lib/vhost/rte_vhost_async.h |  44 ++-
 lib/vhost/version.map   |   3 +
 lib/vhost/virtio_net.c  | 579 
 4 files changed, 633 insertions(+), 3 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index d18fb98910..05c42c9b11 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -281,6 +281,16 @@ The following is an overview of some key Vhost API 
functions:
   Poll enqueue completion status from async data path. Completed packets
   are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count, nr_inflight)``
+
+  Try to receive packets from the guest with offloading large packets
+  to the DMA engine. Successfully dequeued packets are transfer-completed
+  and returned in ``pkts``. There may be other packets sent from the guest
+  that are still being transferred by the DMA engine, called in-flight
+  packets. This function will return in-flight packets only after the DMA
+  engine finishes transferring. The number of in-flight packets at the time
+  of the call is returned in ``nr_inflight``.
+
 Vhost-user Implementations
 --
 
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 6faa31f5ad..58019408f1 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -84,13 +84,21 @@ struct rte_vhost_async_channel_ops {
 };
 
 /**
- * inflight async packet information
+ * in-flight async packet information
  */
+struct async_nethdr {
+   struct virtio_net_hdr hdr;
+   bool valid;
+};
+
 struct async_inflight_info {
struct rte_mbuf *mbuf;
-   uint16_t descs; /* num of descs inflight */
+   union {
+   uint16_t descs; /* num of descs in-flight */
+   struct async_nethdr nethdr;
+   };
uint16_t nr_buffers; /* num of buffers inflight for packed ring */
-};
+} __rte_cache_aligned;
 
 /**
  *  dma channel feature bit definition
@@ -193,4 +201,34 @@ __rte_experimental
 uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count);
 
+/**
+ * This function tries to receive packets from the guest with offloading
+ * large copies to the DMA engine. Successfully dequeued packets are
+ * transfer-completed, either by the CPU or the DMA engine, and they are
+ * returned in "pkts". There may be other packets sent from the guest
+ * that are still being transferred by the DMA engine, called in-flight
+ * packets. The number of in-flight packets at the time of the call is
+ * returned in "nr_inflight". This function will return in-flight packets
+ * only after the DMA engine finishes transferring them.
+ *
+ * @param vid
+ *  id of vhost device to dequeue data
+ * @param queue_id
+ *  queue id to dequeue data
+ * @param pkts
+ *  blank array to keep successfully dequeued packets
+ * @param count
+ *  size of the packet array
+ * @param nr_inflight
+ *  the amount of in-flight packets by now. If error occurred, its
+ *  value is set to -1.
+ * @return
+ *  num of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+   struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+   int *nr_inflight);
+
 #endif /* _RTE_VHOST_ASYNC_H_ */
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 9103a23cd4..a320f889cd 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -79,4 +79,7 @@ EXPERIMENTAL {
 
# added in 21.05
rte_vhost_get_negotiated_protocol_features;
+
+   # added in 21.08
+   rte_vhost_async_try_dequeue_burst;
 };
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index b93482587c..71ab1cef69 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2673,6 +2673,32 @@ virtio_dev_pktmbuf_prep(struct virtio_net *dev, struct rte_mbuf *pkt,
return -1;
 }
 
+/*
+ * Allocate a host supported pktmbuf.
+ */
+static __rte_always_inline struct rte_mbuf *
+virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
+uint32_t data_len)
+{
+   struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
+
+   if (unlikely(pkt == NULL)) {
+   VHOST_LOG_DATA(ERR,
+   "Failed to allocate memory for mbuf.\n");
+   return NULL;
+   }
+
+   if (virtio_dev_pktmbuf_prep(dev, pkt, data_len)) {
+   /* Data does

[dpdk-dev] [PATCH v4 4/4] examples/vhost: support vhost async dequeue data path

2021-06-30 Thread Wenwu Ma
This patch adds the vhost async dequeue data path to the vhost sample.
The vswitch can leverage IOAT to accelerate the vhost async dequeue data path.

Signed-off-by: Wenwu Ma 
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/ioat.c  |  61 ++---
 examples/vhost/ioat.h  |  25 ++
 examples/vhost/main.c  | 140 -
 4 files changed, 177 insertions(+), 58 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index 9afde9c7f5..63dcf181e1 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
 
 Common Issues
 -
diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index bf4e033bdb..a305100b47 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -21,6 +21,8 @@ struct packet_tracker {
 
 struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
 
+int vid2socketid[MAX_VHOST_DEVICE];
+
 int
 open_ioat(const char *value)
 {
@@ -29,7 +31,7 @@ open_ioat(const char *value)
char *addrs = input;
char *ptrs[2];
char *start, *end, *substr;
-   int64_t vid, vring_id;
+   int64_t socketid, vring_id;
struct rte_ioat_rawdev_config config;
struct rte_rawdev_info info = { .dev_private = &config };
char name[32];
@@ -60,6 +62,8 @@ open_ioat(const char *value)
goto out;
}
while (i < args_nr) {
+   char *txd, *rxd;
+   bool is_txd;
char *arg_temp = dma_arg[i];
uint8_t sub_nr;
sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
@@ -68,27 +72,38 @@ open_ioat(const char *value)
goto out;
}
 
-   start = strstr(ptrs[0], "txd");
-   if (start == NULL) {
+   int async_flag;
+   txd = strstr(ptrs[0], "txd");
+   rxd = strstr(ptrs[0], "rxd");
+   if (txd == NULL && rxd == NULL) {
ret = -1;
goto out;
+   } else if (txd) {
+   is_txd = true;
+   start = txd;
+   async_flag = ASYNC_RX_VHOST;
+   } else {
+   is_txd = false;
+   start = rxd;
+   async_flag = ASYNC_TX_VHOST;
}
 
start += 3;
-   vid = strtol(start, &end, 0);
+   socketid = strtol(start, &end, 0);
if (end == start) {
ret = -1;
goto out;
}
 
-   vring_id = 0 + VIRTIO_RXQ;
+   vring_id = is_txd ? VIRTIO_RXQ : VIRTIO_TXQ;
+
if (rte_pci_addr_parse(ptrs[1],
-   &(dma_info + vid)->dmas[vring_id].addr) < 0) {
+   &(dma_info + socketid)->dmas[vring_id].addr) < 0) {
ret = -1;
goto out;
}
 
-   rte_pci_device_name(&(dma_info + vid)->dmas[vring_id].addr,
+   rte_pci_device_name(&(dma_info + socketid)->dmas[vring_id].addr,
name, sizeof(name));
dev_id = rte_rawdev_get_dev_id(name);
if (dev_id == (uint16_t)(-ENODEV) ||
@@ -103,8 +118,9 @@ open_ioat(const char *value)
goto out;
}
 
-   (dma_info + vid)->dmas[vring_id].dev_id = dev_id;
-   (dma_info + vid)->dmas[vring_id].is_valid = true;
+   (dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+   (dma_info + socketid)->dmas[vring_id].is_valid = true;
+   (dma_info + socketid)->async_flag |= async_flag;
config.ring_size = IOAT_RING_SIZE;
config.hdls_disable = true;
if (rte_rawdev_configure(dev_id, &info, sizeof(config)) < 0) {
@@ -126,13 +142,16 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
struct rte_vhost
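
Putting the documentation change together, an illustrative invocation of
the sample application (binary path, core list and port mask are
assumptions) could be:

	./dpdk-vhost -l 1-3 -n 4 -- -p 0x1 --mergeable 1 --vm2vm 1 \
		--socket-file /tmp/sock0 --socket-file /tmp/sock1 \
		--dmas [txd0@00:04.0,rxd0@00:04.2,txd1@00:04.1,rxd1@00:04.3]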

Re: [dpdk-dev] [PATCH] net/mlx5: fix the modify field action flag checking

2021-06-30 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Jiawei(Jonny) Wang 
> Sent: Monday, June 28, 2021 1:58 PM
> To: Slava Ovsiienko ; Matan Azrad
> ; Ori Kam ; NBU-Contact-Thomas
> Monjalon ; Shahaf Shuler ;
> Alexander Kozyrev 
> Cc: dev@dpdk.org; Raslan Darawsheh ;
> sta...@dpdk.org
> Subject: [PATCH] net/mlx5: fix the modify field action flag checking
> 
> The introduced MODIFY_FIELD action was used to manipulate
> the packet header field through copy or set operations.
> 
> These modify header actions should be counted as one action
> in low level, the current code used wrong actions flags
> checking for modify field action.
> 
> This patch update the action flags checking into the correct
> MODIFY_HDR_ACTIONS set.
> 
> Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Jiawei Wang 
> Acked-by: Viacheslav Ovsiienko 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


Re: [dpdk-dev] [PATCH] common/mlx5: share memory free callback

2021-06-30 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Michael Baum 
> Sent: Monday, June 28, 2021 6:06 PM
> To: dev@dpdk.org
> Cc: Matan Azrad ; Raslan Darawsheh
> ; Slava Ovsiienko ;
> sta...@dpdk.org
> Subject: [PATCH] common/mlx5: share memory free callback
> 
> All the mlx5 drivers using MRs for data-path must unregister the mapped
> memory when it is freed by the dpdk process.
> 
> Currently, only the net/eth driver unregisters MRs in free event.
> 
> Move the net callback handler from net driver to common.
> 
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Michael Baum 
> Acked-by: Matan Azrad 
> ---

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


Re: [dpdk-dev] [PATCH] net/mlx5: fix the wrong representor ID checking for sample

2021-06-30 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Jiawei(Jonny) Wang 
> Sent: Tuesday, June 29, 2021 10:00 AM
> To: Slava Ovsiienko ; Matan Azrad
> ; Ori Kam ; NBU-Contact-Thomas
> Monjalon ; Shahaf Shuler ;
> Xueming(Steven) Li 
> Cc: dev@dpdk.org; Raslan Darawsheh ;
> sta...@dpdk.org
> Subject: [PATCH] net/mlx5: fix the wrong representor ID checking for sample
> 
> The representor definition was introduced in the latest code.
> For non-representor port, like PF port, use the 0x instead of -1.
> 
> This patch updates the representor id checking during splitting sample flow.
> 
> Fixes: cb95feefdd03 ("net/mlx5: support sub-function representor")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Jiawei Wang 
> Acked-by: Xueming Li 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


Re: [dpdk-dev] [PATCH] net/mlx5: fix multi-segment inline for the first segment

2021-06-30 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Slava Ovsiienko 
> Sent: Tuesday, June 22, 2021 7:41 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh ; Matan Azrad
> ; Ali Alnubani ; sta...@dpdk.org
> Subject: [PATCH] net/mlx5: fix multi-segment inline for the first segment
> 
> If the first segment in the multi-segment packet is short
> and below the inline threshold it should be inline into
> the WQE to improve the performance. For example, the T-Rex
> traffic generator might use small leading segments to
> handle packet headers and performance was affected.
> 
> Fixes: cacb44a09962 ("net/mlx5: add no-inline Tx flag")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Viacheslav Ovsiienko 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


Re: [dpdk-dev] [PATCH v7 4/7] vhost: fix NUMA reallocation with multiqueue

2021-06-30 Thread Maxime Coquelin



On 6/30/21 9:24 AM, David Marchand wrote:
> On Tue, Jun 29, 2021 at 6:11 PM Maxime Coquelin
>  wrote:
>>
>> Since the Vhost-user device initialization has been reworked,
>> enabling the application to start using the device as soon as
>> the first queue pair is ready, NUMA reallocation no more
>> happened on queue pairs other than the first one since
>> numa_realloc() was returning early if the device was running.
>>
>> This patch fixes this issue by only preventing the device
>> metadata to be allocated if the device is running. For the
> 
> Hum, I understand the meaning, but I think we could make it easier to read:
> 
> This patch fixes this issue by reallocating the device metadata only
> if the device is not running.
> 
> WDYT?

It sounds better; it can be changed while applying, if that's the only issue.

>> virtqueues, a vring state change notification is sent to
>> notify the application of its disablement. Since the callback
>> is supposed to be blocking, it is safe to reallocate it
>> afterwards.
> 
> 



Re: [dpdk-dev] [PATCH v7 7/7] vhost: convert inflight data to DPDK allocation API

2021-06-30 Thread David Marchand
On Tue, Jun 29, 2021 at 6:11 PM Maxime Coquelin
 wrote:
>
> Inflight metadata are allocated using glibc's calloc.
> This patch converts them to rte_zmalloc_socket to take
> care of the NUMA affinity.

About the title, maybe:
vhost: use DPDK allocations for inflight data

>
> Signed-off-by: Maxime Coquelin 

[snip]

> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index d8ec087dfc..67935c4ccc 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c

[snip]

> @@ -1779,19 +1820,21 @@ vhost_check_queue_inflights_split(struct virtio_net 
> *dev,
> vq->last_avail_idx += resubmit_num;
>
> if (resubmit_num) {
> -   resubmit  = calloc(1, sizeof(struct rte_vhost_resubmit_info));
> +   resubmit  = rte_zmalloc_socket("resubmit", sizeof(struct 
> rte_vhost_resubmit_info),

Nit: double space.


> +   0, vq->numa_node);
> if (!resubmit) {
> VHOST_LOG_CONFIG(ERR,
> "failed to allocate memory for resubmit 
> info.\n");
> return RTE_VHOST_MSG_RESULT_ERR;
> }
>
> -   resubmit->resubmit_list = calloc(resubmit_num,
> -   sizeof(struct rte_vhost_resubmit_desc));
> +   resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
> +   resubmit_num * sizeof(struct 
> rte_vhost_resubmit_desc),
> +   0, vq->numa_node);
> if (!resubmit->resubmit_list) {
> VHOST_LOG_CONFIG(ERR,
> "failed to allocate memory for inflight 
> desc.\n");
> -   free(resubmit);
> +   rte_free(resubmit);
> return RTE_VHOST_MSG_RESULT_ERR;
> }
>
> @@ -1873,19 +1916,21 @@ vhost_check_queue_inflights_packed(struct virtio_net 
> *dev,
> }
>
> if (resubmit_num) {
> -   resubmit = calloc(1, sizeof(struct rte_vhost_resubmit_info));
> +   resubmit  = rte_zmalloc_socket("resubmit", sizeof(struct 
> rte_vhost_resubmit_info),

Copy/paste detected :-)
Double space.

Having a single allocator between split and packed implems would avoid
this, but it might not be that easy and this is out of the scope for
this patch.



> +   0, vq->numa_node);
> if (resubmit == NULL) {
> VHOST_LOG_CONFIG(ERR,
> "failed to allocate memory for resubmit 
> info.\n");
> return RTE_VHOST_MSG_RESULT_ERR;
> }
>
> -   resubmit->resubmit_list = calloc(resubmit_num,
> -   sizeof(struct rte_vhost_resubmit_desc));
> +   resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
> +   resubmit_num * sizeof(struct 
> rte_vhost_resubmit_desc),
> +   0, vq->numa_node);
> if (resubmit->resubmit_list == NULL) {
> VHOST_LOG_CONFIG(ERR,
> "failed to allocate memory for resubmit 
> desc.\n");
> -   free(resubmit);
> +   rte_free(resubmit);
> return RTE_VHOST_MSG_RESULT_ERR;
> }
>
> --
> 2.31.1
>


-- 
David Marchand



Re: [dpdk-dev] [PATCH v7 0/7] vhost: Fix and improve NUMA reallocation

2021-06-30 Thread David Marchand
On Tue, Jun 29, 2021 at 6:11 PM Maxime Coquelin
 wrote:
>
> This patch series first fixes missing reallocations of some
> Virtqueue and device metadata.
>
> Then, it improves the numa_realloc function by using
> rte_realloc_socket API that takes care of the memcpy &
> freeing. The VQs NUMA IDs are also saved in the VQ metadata
> and used for every allocations so that all allocations
> before NUMA realloc are on the same VQ, later ones are
> allocated on the proper one.
>
> Finally inflight feature metada are converted from calloc()
> to rte_zmalloc_socket() and their reallocation is handled
> in numa_realloc().
>

Series lgtm with some little nits.
Thanks Maxime!


-- 
David Marchand



Re: [dpdk-dev] [RFC v2] porting AddressSanitizer feature to DPDK

2021-06-30 Thread Lin, Xueqin
> -Original Message-
> From: Burakov, Anatoly 
> Sent: Monday, June 28, 2021 10:22 PM
> To: David Marchand ; Lin, Xueqin
> 
> Cc: Jerin Jacob ; Peng, ZhihongX
> ; Ananyev, Konstantin
> ; Stephen Hemminger
> ; dpdk-dev 
> Subject: Re: [dpdk-dev] [RFC v2] porting AddressSanitizer feature to DPDK
> 
> On 18-Jun-21 10:04 AM, David Marchand wrote:
> > On Fri, Jun 18, 2021 at 9:49 AM Lin, Xueqin  wrote:
>  Suggest listing demo code and tool capture information for user to
>  try if
> >>> tool works, also add this part into doc.
> 
> >
> > # Also, Please update the release note for this feature.
>  Sure, we can update the release note if code merge.
> >>>
> >>> Probably you can send v1 version next i.e change the RFC status to
> >>> get merged.
> >>
> >> Sure, we will send v1 patch if no obvious objection for that, hope patch
> could receive some ACKs and could success to merge, thanks.
> >
> > How did you test this work?
> >
> > UNH recently started testing with ASAN and it reveals leaks just in
> > the unit test.
> >
> > Merging these annotations will help catch more issues.
> > But users will hit the current issues that we must fix first.
> >
> 
> As far as i can tell, the regular build is not affected by this patch, so no 
> issues
> will be hit until someone actually runs the test. IMO it's better to merge it
> early to catch more issues than to gate the feature on the condition that we
> fix all bugs unrelated to this feature first.

Thanks for review and good feedback. 
> 
> --
> Thanks,
> Anatoly


Re: [dpdk-dev] [RFC v2] porting AddressSanitizer feature to DPDK

2021-06-30 Thread David Marchand
On Mon, Jun 28, 2021 at 4:22 PM Burakov, Anatoly
 wrote:
>
> On 18-Jun-21 10:04 AM, David Marchand wrote:
> > On Fri, Jun 18, 2021 at 9:49 AM Lin, Xueqin  wrote:
>  Suggest listing demo code and tool capture information for user to try if
> >>> tool works, also add this part into doc.
> 
> >
> > # Also, Please update the release note for this feature.
>  Sure, we can update the release note if code merge.
> >>>
> >>> Probably you can send v1 version next i.e change the RFC status to get
> >>> merged.
> >>
> >> Sure, we will send v1 patch if no obvious objection for that, hope patch 
> >> could receive some ACKs and could success to merge, thanks.
> >
> > How did you test this work?
> >
> > UNH recently started testing with ASAN and it reveals leaks just in
> > the unit test.
> >
> > Merging these annotations will help catch more issues.
> > But users will hit the current issues that we must fix first.
> >
>
> As far as i can tell, the regular build is not affected by this patch,
> so no issues will be hit until someone actually runs the test. IMO it's
> better to merge it early to catch more issues than to gate the feature
> on the condition that we fix all bugs unrelated to this feature first.

- This is affecting more than unit tests.

$ meson setup build-asan -Db_lundef=false -Db_sanitize=address
...

$ ninja-build -C build-asan
ninja: Entering directory `build-asan'
[2801/2801] Linking target app/test/dpdk-test

$ ./devtools/test-null.sh build-asan
EAL: Detected 28 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: WARNING! Base virtual address hint (0x15000 !=
0x7fb31c632000) not respected!
EAL:This may cause issues with mapping memory into secondary processes
EAL: Multi-process socket /run/user/1001/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: WARNING! Base virtual address hint (0x1b000 !=
0x7fb31c3b2000) not respected!
EAL:This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x100011000 !=
0x7fb31c375000) not respected!
EAL:This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x100017000 !=
0x7fb319bfe000) not respected!
EAL:This may cause issues with mapping memory into secondary processes
Interactive-mode selected
Auto-start selected
[...]
Bye...
EAL: recvmsg failed, Bad file descriptor
EAL: recvmsg failed, Bad file descriptor
EAL: recvmsg failed, Bad file descriptor
EAL: recvmsg failed, Bad file descriptor
EAL: recvmsg failed, Bad file descriptor
EAL: recvmsg failed, Bad file descriptor

Infinite loop of those messages.
In the thread with Owen, we also noticed what looks like a deadlock
with multiprocess when ASAN is enabled.


- Adding a new feature on top of something that does not work yet
seems at best premature to me.
This patch does not seem that much tested, since those issues above
are fairly easy to catch.

Anyway, the memory allocator is your stuff, so your call.

Prefix for the title of such a patch should be mem:.


-- 
David Marchand



Re: [dpdk-dev] [PATCH] net/mlx5: fix TSO multi-segment inline length

2021-06-30 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Slava Ovsiienko 
> Sent: Sunday, June 20, 2021 9:30 AM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh ; Matan Azrad
> ; sta...@dpdk.org
> Subject: [PATCH] net/mlx5: fix TSO multi-segment inline length
> 
> The inline data length for TSO ethernet segment should be calculated from
> the TSO header instead of the inline size configured by txq_inline_min
> devarg or reported by the NIC.
> It is imposed by the nature of TSO offload - inline header is being duplicated
> to every output TCP packet.
> 
> Fixes: cacb44a09962 ("net/mlx5: add no-inline Tx flag")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Viacheslav Ovsiienko 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


Re: [dpdk-dev] [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc

2021-06-30 Thread Xia, Chenbo
Hi Maxime,

> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, June 29, 2021 10:39 PM
> To: Xia, Chenbo ; dev@dpdk.org;
> david.march...@redhat.com
> Subject: Re: [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc
> 
> 
> 
> On 6/25/21 4:50 AM, Xia, Chenbo wrote:
> > Hi Maxime,
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Friday, June 18, 2021 10:04 PM
> >> To: dev@dpdk.org; david.march...@redhat.com; Xia, Chenbo
> 
> >> Cc: Maxime Coquelin 
> >> Subject: [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc
> >>
> >> When the guest allocates virtqueues on a different NUMA node
> >> than the one the Vhost metadata are allocated, both the Vhost
> >> device struct and the virtqueues struct are reallocated.
> >>
> >> However, reallocating the log cache on the new NUMA node was
> >> not done. This patch fixes this by reallocating it if it has
> >> been allocated already, which means a live-migration is
> >> on-going.
> >>
> >> Fixes: 1818a63147fb ("vhost: move dirty logging cache out of virtqueue")
> >
> > This commit is of 21.05, although LTS maintainers don't maintain non-LTS
> stable
> > releases now, I guess it's still better to add 'cc stable tag' in case
> anyone
> > volunteers to do that?
> 
> 
> I don't think that's what we do usually.
> If someone wants to maintain v21.05 in the future, he can just look for
> the Fixes tag in the git history.
> 
> Thanks,
> Maxime

I asked Thomas and Ferruh this question to make sure we are all aligned. Seems
they think we'd better add it in this case. Thomas's two reasons:

- we don't know in advance whether a branch will be maintained
- it helps those maintaining a private stable branch

And my understanding is adding both fix tag and stable tag makes it clearer for
stable release maintainers (They can just ignore 'only fix tag' case). And 
anyway
they need to check the fix commit ID.

Anyway, I could add it with some small changes David asked for if you don’t plan
a new version. Do you?

Thanks,
Chenbo

> 
> > Thanks,
> > Chenbo
> >
> >>
> >> Signed-off-by: Maxime Coquelin 
> >> ---
> >>  lib/vhost/vhost_user.c | 10 ++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> >> index 5fb055ea2e..82adf80fe5 100644
> >> --- a/lib/vhost/vhost_user.c
> >> +++ b/lib/vhost/vhost_user.c
> >> @@ -545,6 +545,16 @@ numa_realloc(struct virtio_net *dev, int index)
> >>vq->batch_copy_elems = new_batch_copy_elems;
> >>}
> >>
> >> +  if (vq->log_cache) {
> >> +  struct log_cache_entry *log_cache;
> >> +
> >> +  log_cache = rte_realloc_socket(vq->log_cache,
> >> +  sizeof(struct log_cache_entry) *
> >> VHOST_LOG_CACHE_NR,
> >> +  0, newnode);
> >> +  if (log_cache)
> >> +  vq->log_cache = log_cache;
> >> +  }
> >> +
> >>rte_free(old_vq);
> >>}
> >>
> >> --
> >> 2.31.1
> >



Re: [dpdk-dev] [PATCH v2 2/2] drivers: add octeontx crypto adapter data path

2021-06-30 Thread Akhil Goyal
> Added support for crypto adapter OP_FORWARD mode.
> 
> As OcteonTx CPT crypto completions could be out of order, each crypto op
> is enqueued to CPT, dequeued from CPT and enqueued to SSO one-by-one.
> 
> Signed-off-by: Shijith Thotton 
> ---
This patch shows a CI warning for FreeBSD, but I was not able to locate
the error/warning in the logs.
Can anybody confirm what the issue is?

http://mails.dpdk.org/archives/test-report/2021-June/200637.html

Regards,
Akhil


Re: [dpdk-dev] [PATCH] ethdev: add namespace

2021-06-30 Thread Ferruh Yigit
On 6/30/2021 7:29 AM, David Marchand wrote:
> Hello Ferruh,
> 
> On Tue, Jun 29, 2021 at 3:46 PM Ferruh Yigit  wrote:
>>
>> Add 'RTE_ETH' namespace to all enums & macros in a backward compatible
>> way. The macros for backward compatibility can be removed in next LTS.
>>
>> Signed-off-by: Ferruh Yigit 
> 
> - I did not do a full check but I noticed that ETH_RSS compat macro at
> least is removed.
> Is this intentional?
> 

Yes, two groups of macros were remaining from 2013 for backward compatibility;
with this patch there would be two layers of redirection, so I removed the old
ones. The only changes in the examples/apps are because of these removed
macros, since the rest are all backward compatible.

Removed ones:
/**
 * for rx mq mode backward compatible
 */
#define ETH_RSS   ETH_MQ_RX_RSS
#define VMDQ_DCB  ETH_MQ_RX_VMDQ_DCB
#define ETH_DCB_RXETH_MQ_RX_DCB

/**
 * for tx mq mode backward compatible
 */
#define ETH_DCB_NONEETH_MQ_TX_NONE
#define ETH_VMDQ_DCB_TX ETH_MQ_TX_VMDQ_DCB
#define ETH_DCB_TX  ETH_MQ_TX_DCB



> 
> - libabigail is not happy because of enum names changes.
> Example:
> 
>   [C] 'function int rte_eth_dev_configure(uint16_t, uint16_t,
> uint16_t, const rte_eth_conf*)' at rte_ethdev.c:1326:1 has some
> indirect sub-type changes:
> parameter 4 of type 'const rte_eth_conf*' has sub-type changes:
>   in pointed to type 'const rte_eth_conf':
> in unqualified underlying type 'struct rte_eth_conf' at
> rte_ethdev.h:1491:1:
>   type size hasn't changed
>   5 data member changes (1 filtered):
> type of 'rte_eth_rxmode rxmode' changed:
>   type size hasn't changed
>   1 data member change:
> type of 'rte_eth_rx_mq_mode mq_mode' changed:
>   type size hasn't changed
>   8 enumerator deletions:
> 'rte_eth_rx_mq_mode::ETH_MQ_RX_NONE' value '0'
> 'rte_eth_rx_mq_mode::ETH_MQ_RX_RSS' value '1'
> 'rte_eth_rx_mq_mode::ETH_MQ_RX_DCB' value '2'
> 'rte_eth_rx_mq_mode::ETH_MQ_RX_DCB_RSS' value '3'
> 'rte_eth_rx_mq_mode::ETH_MQ_RX_VMDQ_ONLY' value '4'
> 'rte_eth_rx_mq_mode::ETH_MQ_RX_VMDQ_RSS' value '5'
> 'rte_eth_rx_mq_mode::ETH_MQ_RX_VMDQ_DCB' value '6'
> 'rte_eth_rx_mq_mode::ETH_MQ_RX_VMDQ_DCB_RSS' value '7'
>   8 enumerator insertions:
> 'rte_eth_rx_mq_mode::RTE_ETH_MQ_RX_NONE' value '0'
> 'rte_eth_rx_mq_mode::RTE_ETH_MQ_RX_RSS' value '1'
> 'rte_eth_rx_mq_mode::RTE_ETH_MQ_RX_DCB' value '2'
> 'rte_eth_rx_mq_mode::RTE_ETH_MQ_RX_DCB_RSS' value '3'
> 'rte_eth_rx_mq_mode::RTE_ETH_MQ_RX_VMDQ_ONLY' value '4'
> 'rte_eth_rx_mq_mode::RTE_ETH_MQ_RX_VMDQ_RSS' value '5'
> 'rte_eth_rx_mq_mode::RTE_ETH_MQ_RX_VMDQ_DCB' value '6'
> 'rte_eth_rx_mq_mode::RTE_ETH_MQ_RX_VMDQ_DCB_RSS' value '7'
> [snip]
> 
> 
> I guess libabigail is lost because the symbol
> rte_eth_rx_mq_mode::ETH_MQ_RX_NONE simply disappeared (because we used
> a macro to wrap to the new name).

Yes.

> Maybe we could go the other way: leave the current enums defined as is
> and put in place wrappers for new names pointing as old names. 
> The rest of the code in DPDK would use the new names only.

It works to prevent libabigail warnings, but I think it can be confusing for
users when figuring out which ones are the correct ones to use.

> This comment applies if we want to merge this change in 21.08 and/or
> we want to backport this change.
> 
> This won't be a problem if we merge this patch in 21.11.
> 

OK to have it in v21.11. In that case I assume all internal components also
need to be updated to use the new macros/enums during the v21.11 release.

Let me send a deprecation notice for it.

Meanwhile, we can use this patch to discuss the prefix/namespace, i.e. whether
it should be 'RTE_ETH_' or 'RTE_ETH_DEV_', or a mix of both (as done now).

And if a mixed prefix is used, we can try to define when to have the DEV_ part
and when not to have it.

> 
>> ---
>> We can get the update on v21.11 and remove backward compatibility macros
>> on v22.11.
> 
> 
> 



[dpdk-dev] [PATCH] doc: announce common prefix for ethdev

2021-06-30 Thread Ferruh Yigit
Announce adding the 'RTE_ETH_' prefix to all public ethdev macros/enums in
v21.11.
Backward compatibility macros will be added in v21.11 and they will be
removed in v22.11.

Signed-off-by: Ferruh Yigit 
---
Cc: Andrew Rybchenko 
Cc: Thomas Monjalon 
Cc: David Marchand 
Cc: Qi Z Zhang 
Cc: Raslan Darawsheh 
Cc: Ajit Khaparde 
Cc: Jerin Jacob Kollanukkaran 
---
 doc/guides/rel_notes/deprecation.rst | 5 +
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd723..ae79673e37e3 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -118,6 +118,11 @@ Deprecation Notices
   consistent with existing outer header checksum status flag naming, which
   should help in reducing confusion about its usage.
 
+* ethdev: Will add 'RTE_ETH_' prefix to all ethdev macros/enums in v21.11.
+  Macros will be added for backward compatibility. Backward compatibility
+  macros will be removed in v22.11. A few old backward compatibility macros
+  from 2013 that do not have a proper prefix will be removed in v21.11.
+
 * i40e: As there are both i40evf and iavf pmd, the functions of them are
   duplicated. And now more and more advanced features are developed on iavf.
   To keep consistent with kernel driver's name
-- 
2.31.1
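
The planned aliasing scheme would look roughly like this sketch
(illustrative, not part of the patch; enumerator values taken from the
libabigail report in the previous thread):

	enum rte_eth_rx_mq_mode {
		RTE_ETH_MQ_RX_NONE = 0,	/* new RTE_ETH_-prefixed name */
		RTE_ETH_MQ_RX_RSS = 1,
		/* ... */
	};
	/* Backward compatibility aliases, to be removed in 22.11. */
	#define ETH_MQ_RX_NONE	RTE_ETH_MQ_RX_NONE
	#define ETH_MQ_RX_RSS	RTE_ETH_MQ_RX_RSS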



Re: [dpdk-dev] Question about 'rxm->hash.rss' and 'mb->hash.fdir'

2021-06-30 Thread Ferruh Yigit
On 6/30/2021 3:45 AM, Min Hu (Connor) wrote:
> Hi, all
> one question about 'rxm->hash.rss' and 'mb->hash.fdir'.
> 
> In Rx recv packets function,
> 'rxm->hash.rss' will report rss hash result from Rx desc.
> 'rxm->hash.fdir' will report filter identifier from Rx desc.
> 
> But function implementation differs from some PMDs. for example:
> i40e, MLX5 report the two at the same time if pkt_flags is set,like:
> **
>     if (pkt_flags & PKT_RX_RSS_HASH) {
>     rxm->hash.rss =
> rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
>     }
>     if (pkt_flags & PKT_RX_FDIR) {
>     mb->hash.fdir.hi =
>     rte_le_to_cpu_32(rxdp->wb.qword3.hi_dword.fd_id);
>     }
> 
> 
> While, ixgbe only report one of the two. like:
> **
>     if (likely(pkt_flags & PKT_RX_RSS_HASH))
>     mb->hash.rss = rte_le_to_cpu_32(
>     rxdp[j].wb.lower.hi_dword.rss);
>     else if (pkt_flags & PKT_RX_FDIR) {
>     mb->hash.fdir.hash = rte_le_to_cpu_16(
>     rxdp[j].wb.lower.hi_dword.csum_ip.csum) &
>     IXGBE_ATR_HASH_MASK;
>     mb->hash.fdir.id = rte_le_to_cpu_16(
>     rxdp[j].wb.lower.hi_dword.csum_ip.ip_id);
>     }
> 
> So, what is application scenario for 'rxm->hash.rss' and 'mb->hash.fdir',
> that is, why the two should be reported? How about reporting the two at the 
> same
> time?
> Thanks for  your reply.


Hi Connor,

mbuf->hash is a union, so it is not possible to set both 'hash.rss' & 'hash.fdir'.

I assume in the i40e & mlx5 case 'pkt_flags' indicates which one is valid, and
only one is set in practice. Cc'ed driver maintainers for more comments.
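
For reference, an abridged sketch of the union in question (trimmed from
rte_mbuf_core.h; unrelated members omitted):

	union {
		uint32_t rss;	/* RSS hash result if PKT_RX_RSS_HASH */
		struct {
			union {
				struct {
					uint16_t hash;
					uint16_t id;
				};
				uint32_t lo;
			};
			uint32_t hi;	/* FD ID if PKT_RX_FDIR_ID */
		} fdir;	/* filter identifier if PKT_RX_FDIR */
		/* ... sched, txadapter, usr ... */
	} hash;	/* hash information */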


Re: [dpdk-dev] [PATCH v5 5/7] power: support callbacks for multiple Rx queues

2021-06-30 Thread David Hunt

Hi Anatoly,

On 29/6/2021 4:48 PM, Anatoly Burakov wrote:

Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
   polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
   added to the list of queues to poll, so that the callback is aware of
   other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism is
   shared between all queues polled on a particular lcore, and is only
   activated when all queues in the list were polled and were determined
   to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
   is incapable of monitoring more than one address.

Also, while we're at it, update and improve the docs.

Signed-off-by: Anatoly Burakov 
---

Notes:
 v5:
 - Remove the "power save queue" API and replace it with mechanism 
suggested by
   Konstantin
 
 v3:

 - Move the list of supported NICs to NIC feature table
 
 v2:

 - Use a TAILQ for queues instead of a static array
 - Address feedback from Konstantin
 - Add additional checks for stopped queues

  doc/guides/nics/features.rst   |  10 +
  doc/guides/prog_guide/power_man.rst|  65 ++--
  doc/guides/rel_notes/release_21_08.rst |   3 +
  lib/power/rte_power_pmd_mgmt.c | 431 ++---
  4 files changed, 373 insertions(+), 136 deletions(-)



--snip--


  int
  rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
  {
-   struct pmd_queue_cfg *queue_cfg;
+   const union queue qdata = {.portid = port_id, .qid = queue_id};
+   struct pmd_core_cfg *lcore_cfg;
+   struct queue_list_entry *queue_cfg;
struct rte_eth_dev_info info;
rte_rx_callback_fn clb;
int ret;
@@ -202,9 +401,19 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, 
uint16_t port_id,
goto end;
}
  
-	queue_cfg = &port_cfg[port_id][queue_id];

+   lcore_cfg = &lcore_cfgs[lcore_id];
  
-	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {

+   /* check if other queues are stopped as well */
+   ret = cfg_queues_stopped(lcore_cfg);
+   if (ret != 1) {
+   /* error means invalid queue, 0 means queue wasn't stopped */
+   ret = ret < 0 ? -EINVAL : -EBUSY;
+   goto end;
+   }
+
+   /* if callback was already enabled, check current callback type */
+   if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED &&
+   lcore_cfg->cb_mode != mode) {
ret = -EINVAL;
goto end;
}
@@ -214,53 +423,20 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int 
lcore_id, uint16_t port_id,
  
  	switch (mode) {

case RTE_POWER_MGMT_TYPE_MONITOR:
-   {
-   struct rte_power_monitor_cond dummy;
-
-   /* check if rte_power_monitor is supported */
-   if (!global_data.intrinsics_support.power_monitor) {
-   RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not 
supported\n");
-   ret = -ENOTSUP;
+   /* check if we can add a new queue */
+   ret = check_monitor(lcore_cfg, &qdata);
+   if (ret < 0)
goto end;
-   }
  
-		/* check if the device supports the necessary PMD API */

-   if (rte_eth_get_monitor_addr(port_id, queue_id,
-   &dummy) == -ENOTSUP) {
-   RTE_LOG(DEBUG, POWER, "The device does not support 
rte_eth_get_monitor_addr\n");
-   ret = -ENOTSUP;
-   goto end;
-   }
clb = clb_umwait;
break;
-   }
case RTE_POWER_MGMT_TYPE_SCALE:
-   {
-   enum power_management_env env;
-   /* only PSTATE and ACPI modes are supported */
-   if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
-   !rte_power_check_env_supported(
-   PM_ENV_PSTATE_CPUFREQ)) {
-   RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are 
supported\n");
-   ret = -ENOTSUP;
+   /* check if we can add a new queue */

Re: [dpdk-dev] [PATCH v2] ifpga/base/meson: fix looking for librt

2021-06-30 Thread Xu, Rosen
CC Tianfei, who is maintainer.

> -Original Message-
> From: Hussin, Mohamad Noor Alim 
> Sent: Wednesday, June 30, 2021 17:26
> To: Xu, Rosen 
> Cc: dev@dpdk.org; Hussin, Mohamad Noor Alim
> ; Huang, Wei
> ; sta...@dpdk.org
> Subject: [PATCH v2] ifpga/base/meson: fix looking for librt
> 
> Finding with "librt" keyword would give the output with full path of librt 
> such
> as /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/librt.so
> instead of -lrt in libdpdk.pc pkg-config file.
> 
> Assume find_library() will prepend "lib", thus remove "lib" from "librt"
> keyword. The output will shows as -lrt.
> 
> This will cause an issue when compile DPDK app with static library as the
> path of librt has been hard-coded in the libdpdk.pc file.
> 
> Fixes: e41856b515ce ("raw/ifpga/base: enhance driver reliability in multi-
> process")
> Cc: wei.hu...@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Mohamad Noor Alim Hussin
> 
> ---
>  drivers/raw/ifpga/base/meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/raw/ifpga/base/meson.build
> b/drivers/raw/ifpga/base/meson.build
> index 8d27c6021..ce592a13a 100644
> --- a/drivers/raw/ifpga/base/meson.build
> +++ b/drivers/raw/ifpga/base/meson.build
> @@ -27,7 +27,7 @@ sources = [
> 
>  rtdep = dependency('librt', required: false)  if not rtdep.found()
> -rtdep = cc.find_library('librt', required: false)
> +rtdep = cc.find_library('rt', required: false)
>  endif
>  if not rtdep.found()
>  build = false
> --
> 2.17.1



Re: [dpdk-dev] [PATCH v5 6/7] power: support monitoring multiple Rx queues

2021-06-30 Thread Ananyev, Konstantin



> Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
> Rx queues while entering the energy efficient power state. The multi
> version will be used unconditionally if supported, and the UMWAIT one
> will only be used when multi-monitor is not supported by the hardware.
> 
> Signed-off-by: Anatoly Burakov 
> ---
> 
> Notes:
> v4:
> - Fix possible out of bounds access
> - Added missing index increment
> 
>  doc/guides/prog_guide/power_man.rst |  9 ++--
>  lib/power/rte_power_pmd_mgmt.c  | 81 -
>  2 files changed, 85 insertions(+), 5 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/power_man.rst 
> b/doc/guides/prog_guide/power_man.rst
> index ec04a72108..94353ca012 100644
> --- a/doc/guides/prog_guide/power_man.rst
> +++ b/doc/guides/prog_guide/power_man.rst
> @@ -221,13 +221,16 @@ power saving whenever empty poll count reaches a 
> certain number.
>  The "monitor" mode is only supported in the following configurations and 
> scenarios:
> 
>  * If ``rte_cpu_get_intrinsics_support()`` function indicates that
> +  ``rte_power_monitor_multi()`` function is supported by the platform, then
> +  monitoring multiple Ethernet Rx queues for traffic will be supported.
> +
> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that only
>``rte_power_monitor()`` is supported by the platform, then monitoring will 
> be
>limited to a mapping of 1 core 1 queue (thus, each Rx queue will have to be
>monitored from a different lcore).
> 
> -* If ``rte_cpu_get_intrinsics_support()`` function indicates that the
> -  ``rte_power_monitor()`` function is not supported, then monitor mode will 
> not
> -  be supported.
> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that neither of 
> the
> +  two monitoring functions are supported, then monitor mode will not be 
> supported.
> 
>  * Not all Ethernet drivers support monitoring, even if the underlying
>platform may support the necessary CPU instructions. Please refer to
> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> index fccfd236c2..2056996b9c 100644
> --- a/lib/power/rte_power_pmd_mgmt.c
> +++ b/lib/power/rte_power_pmd_mgmt.c
> @@ -124,6 +124,32 @@ queue_list_take(struct pmd_core_cfg *cfg, const union 
> queue *q)
>   return found;
>  }
> 
> +static inline int
> +get_monitor_addresses(struct pmd_core_cfg *cfg,
> + struct rte_power_monitor_cond *pmc, size_t len)
> +{
> + const struct queue_list_entry *qle;
> + size_t i = 0;
> + int ret;
> +
> + TAILQ_FOREACH(qle, &cfg->head, next) {
> + const union queue *q = &qle->queue;
> + struct rte_power_monitor_cond *cur;
> +
> + /* attempted out of bounds access */
> + if (i >= len) {
> + RTE_LOG(ERR, POWER, "Too many queues being 
> monitored\n");
> + return -1;
> + }
> +
> + cur = &pmc[i++];
> + ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
> + if (ret < 0)
> + return ret;
> + }
> + return 0;
> +}
> +
>  static void
>  calc_tsc(void)
>  {
> @@ -190,6 +216,45 @@ lcore_can_sleep(struct pmd_core_cfg *cfg)
>   return true;
>  }
> 
> +static uint16_t
> +clb_multiwait(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
> + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> + uint16_t max_pkts __rte_unused, void *arg)
> +{
> + const unsigned int lcore = rte_lcore_id();
> + struct queue_list_entry *queue_conf = arg;
> + struct pmd_core_cfg *lcore_conf;
> + const bool empty = nb_rx == 0;
> +
> + lcore_conf = &lcore_cfgs[lcore];
> +
> + /* early exit */
> + if (likely(!empty))
> + /* early exit */
> + queue_reset(lcore_conf, queue_conf);
> + else {
> + struct rte_power_monitor_cond pmc[RTE_MAX_ETHPORTS];

As discussed, I still think it needs to be pmc[lcore_conf->n_queues];
Or if VLA is not an option - alloca(), or dynamic lcore_conf->pmc[], or...
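
For reference, a minimal sketch of the alloca() variant mentioned above,
sizing the condition array to the actual number of monitored queues instead
of RTE_MAX_ETHPORTS (this is only an illustration of the suggestion, not the
posted patch; it additionally needs <alloca.h>):

	struct rte_power_monitor_cond *pmc;
	const size_t n = lcore_conf->n_queues;

	pmc = alloca(n * sizeof(*pmc)); /* released when the callback returns */
	if (get_monitor_addresses(lcore_conf, pmc, n) < 0)
		return nb_rx;
	rte_power_monitor_multi(pmc, n, UINT64_MAX);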

> + int ret;
> +
> + /* can this queue sleep? */
> + if (!queue_can_sleep(lcore_conf, queue_conf))
> + return nb_rx;
> +
> + /* can this lcore sleep? */
> + if (!lcore_can_sleep(lcore_conf))
> + return nb_rx;
> +
> + /* gather all monitoring conditions */
> + ret = get_monitor_addresses(lcore_conf, pmc, RTE_DIM(pmc));
> + if (ret < 0)
> + return nb_rx;
> +
> + rte_power_monitor_multi(pmc, lcore_conf->n_queues, UINT64_MAX);
> + }
> +
> + return nb_rx;
> +}
> +
>  static uint16_t
>  clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts 
> __rte_unused,
>   uint16_t nb_rx, uint16_t max_pkts __rte_unused, void *arg)
> @@ -341,14 +406,19 @@ static int
>  check_mon

Re: [dpdk-dev] 20.11.2 patches review and test

2021-06-30 Thread Jiang, YuX
All,
Testing with dpdk v20.11.2-rc2 from Intel looks good, no critical issue is 
found. All issues found are known issues.
The below two issues have been fixed in 20.11.2-rc2:
  1) Fedora34 GCC11 and Clang12 build failed.
  2) dcf_lifecycle/handle_acl_filter_05: after reset port the mac changed.

# Basic Intel(R) NIC testing
*PF(i40e, ixgbe): test scenarios including rte_flow/TSO/Jumboframe/checksum 
offload/Tunnel, etc. Listed but not all.
- Below two known issues are found.
  1)https://bugs.dpdk.org/show_bug.cgi?id=687 : unit_tests_power/power_cpufreq: 
unit test failed. This issue is found in 21.05 and not fixed yet.
  2)ddp_gtp_qregion/fd_gtpu_ipv4_dstip: flow director does not work. This issue 
is found in 21.05, fixed in 21.08.
Fixed patch link: 
http://patches.dpdk.org/project/dpdk/patch/20210519032745.707639-1-stevex.y...@intel.com/
 
*VF(i40e,ixgbe): test scenarios including vf-rte_flow/TSO/Jumboframe/checksum 
offload/Tunnel, Listed but not all.
- No new issues are found.  
*PF/VF(ice): test scenarios including switch features/Flow Director/Advanced 
RSS/ACL/DCF/Flexible Descriptor and so on, Listed but not all.
- Below 3 known DPDK issues are found. 
  1)rxtx_offload/rxoffload_port: Pkt1 can't be distributed to the same queue. 
This issue is found in 21.05, fixed in 21.08
Fixed patch link: 
http://patches.dpdk.org/project/dpdk/patch/20210527064251.242076-1-dapengx...@intel.com/
 
  2)cvl_advanced_iavf_rss: change the SCTP port value, the hash value remains 
unchanged. This issue is found in 20.11-rc3 and fixed in 21.02, but it belongs 
to a 21.02 new feature, so it won't be backported to LTS 20.11.
  3)Can't create 512 acl rules after creating a full mask switch rule. This 
issue also occurs in dpdk 20.11 and is not fixed yet. 
* Build: cover the build test combination with latest GCC/Clang/ICC version and 
the popular OS revision such as Ubuntu20.04, CentOS8.3 and so on. Listed but 
not all.
- All passed.  
* Intel NIC single core/NIC performance: test scenarios including PF/VF single 
core performance test(AVX2+AVX512) test and so on. Listed but not all.
- All passed. No big data drop. 

# Basic cryptodev and virtio testing
* Virtio: both function and performance test are covered. Such as 
PVP/Virtio_loopback/virtio-user loopback/virtio-net VM2VM perf testing, etc.. 
Listed but not all.
- One known issues as below:
> (1)The UDP fragmentation offload feature of the Virtio-net device can't be 
> turned on in the VM; this is a kernel issue, a bugzilla has been submitted: 
> https://bugzilla.kernel.org/show_bug.cgi?id=207075, not fixed yet.
>  
* Cryptodev: 
- Function test: test scenarios including Cryptodev API testing/CompressDev 
ISA-L/QAT/ZLIB PMD Testing/FIPS, etc. Listed but not all.
  - All passed.
- Performance test: test scenarios including Throughput Performance/Cryptodev 
Latency, etc. Listed but not all.
  - No big data drop.

Best regards,
Yu Jiang

> -Original Message-
> From: dev  On Behalf Of Xueming Li
> Sent: Sunday, June 27, 2021 7:28 AM
> To: sta...@dpdk.org
> Cc: dev@dpdk.org; Abhishek Marathe ;
> Akhil Goyal ; Ali Alnubani ;
> Walker, Benjamin ; David Christensen
> ; Govindharajan, Hariprasad
> ; Hemant Agrawal
> ; Stokes, Ian ; Jerin
> Jacob ; Mcnamara, John ;
> Ju-Hyoung Lee ; Kevin Traynor
> ; Luca Boccassi ; Pei Zhang
> ; Yu, PingX ; Xu, Qian Q
> ; Raslan Darawsheh ; Thomas
> Monjalon ; Peng, Yuan ;
> Chen, Zhaoyan ; xuemi...@nvidia.com
> Subject: [dpdk-dev] 20.11.2 patches review and test
> 
> Hi all,
> 
> Here is a list of patches targeted for stable release 20.11.2.
> 
> The planned date for the final release is 6th July.
> 
> Please help with testing and validation of your use cases and report any
> issues/results with reply-all to this mail. For the final release the fixes 
> and
> reported validations will be added to the release notes.
> 
> A release candidate tarball can be found at:
> 
> https://dpdk.org/browse/dpdk-stable/tag/?id=v20.11.2-rc2
> 
> These patches are located at branch 20.11 of dpdk-stable repo:
> https://dpdk.org/browse/dpdk-stable/
> 
> Thanks.
> 
> Xueming Li 
> 
> ---
> Adam Dybkowski (3):
>   common/qat: increase IM buffer size for GEN3
>   compress/qat: enable compression on GEN3
>   crypto/qat: fix null authentication request
> 
> Ajit Khaparde (7):
>   net/bnxt: fix RSS context cleanup
>   net/bnxt: check kvargs parsing
>   net/bnxt: fix resource cleanup
>   doc: fix formatting in testpmd guide
>   net/bnxt: fix mismatched type comparison in MAC restore
>   net/bnxt: check PCI config read
>   net/bnxt: fix mismatched type comparison in Rx
> 
> Alvin Zhang (11):
>   net/ice: fix VLAN filter with PF
>   net/i40e: fix input set field mask
>   net/igc: fix Rx RSS hash offload capability
>   net/igc: fix Rx error counter for bad length
>   net/e1000: fix Rx error counter for bad length
>   net/e1000: fix max 

Re: [dpdk-dev] [PATCH v5 5/7] power: support callbacks for multiple Rx queues

2021-06-30 Thread Ananyev, Konstantin



 
> Currently, there is a hard limitation on the PMD power management
> support that only allows it to support a single queue per lcore. This is
> not ideal as most DPDK use cases will poll multiple queues per core.
> 
> The PMD power management mechanism relies on ethdev Rx callbacks, so it
> is very difficult to implement such support because callbacks are
> effectively stateless and have no visibility into what the other ethdev
> devices are doing. This places limitations on what we can do within the
> framework of Rx callbacks, but the basics of this implementation are as
> follows:
> 
> - Replace per-queue structures with per-lcore ones, so that any device
>   polled from the same lcore can share data
> - Any queue that is going to be polled from a specific lcore has to be
>   added to the list of queues to poll, so that the callback is aware of
>   other queues being polled by the same lcore
> - Both the empty poll counter and the actual power saving mechanism is
>   shared between all queues polled on a particular lcore, and is only
>   activated when all queues in the list were polled and were determined
>   to have no traffic.
> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>   is incapable of monitoring more than one address.
> 
> Also, while we're at it, update and improve the docs.
> 
> Signed-off-by: Anatoly Burakov 
> ---
> 
> Notes:
> v5:
> - Remove the "power save queue" API and replace it with mechanism 
> suggested by
>   Konstantin
> 
> v3:
> - Move the list of supported NICs to NIC feature table
> 
> v2:
> - Use a TAILQ for queues instead of a static array
> - Address feedback from Konstantin
> - Add additional checks for stopped queues
> 
>  doc/guides/nics/features.rst   |  10 +
>  doc/guides/prog_guide/power_man.rst|  65 ++--
>  doc/guides/rel_notes/release_21_08.rst |   3 +
>  lib/power/rte_power_pmd_mgmt.c | 431 ++---
>  4 files changed, 373 insertions(+), 136 deletions(-)
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 403c2b03a3..a96e12d155 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
>  * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
>  * **[related] API**: ``rte_eth_rx_burst_mode_get()``, 
> ``rte_eth_tx_burst_mode_get()``.
> 
> +.. _nic_features_get_monitor_addr:
> +
> +PMD power management using monitor addresses
> +
> +
> +Supports getting a monitoring condition to use together with Ethernet PMD 
> power
> +management (see :doc:`../prog_guide/power_man` for more details).
> +
> +* **[implements] eth_dev_ops**: ``get_monitor_addr``
> +
>  .. _nic_features_other:
> 
>  Other dev ops not represented by a Feature
> diff --git a/doc/guides/prog_guide/power_man.rst 
> b/doc/guides/prog_guide/power_man.rst
> index c70ae128ac..ec04a72108 100644
> --- a/doc/guides/prog_guide/power_man.rst
> +++ b/doc/guides/prog_guide/power_man.rst
> @@ -198,34 +198,41 @@ Ethernet PMD Power Management API
>  Abstract
>  
> 
> -Existing power management mechanisms require developers
> -to change application design or change code to make use of it.
> -The PMD power management API provides a convenient alternative
> -by utilizing Ethernet PMD RX callbacks,
> -and triggering power saving whenever empty poll count reaches a certain 
> number.
> -
> -Monitor
> -   This power saving scheme will put the CPU into optimized power state
> -   and use the ``rte_power_monitor()`` function
> -   to monitor the Ethernet PMD RX descriptor address,
> -   and wake the CPU up whenever there's new traffic.
> -
> -Pause
> -   This power saving scheme will avoid busy polling
> -   by either entering power-optimized sleep state
> -   with ``rte_power_pause()`` function,
> -   or, if it's not available, use ``rte_pause()``.
> -
> -Frequency scaling
> -   This power saving scheme will use ``librte_power`` library
> -   functionality to scale the core frequency up/down
> -   depending on traffic volume.
> -
> -.. note::
> -
> -   Currently, this power management API is limited to mandatory mapping
> -   of 1 queue to 1 core (multiple queues are supported,
> -   but they must be polled from different cores).
> +Existing power management mechanisms require developers to change application
> +design or change code to make use of it. The PMD power management API 
> provides a
> +convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering
> +power saving whenever empty poll count reaches a certain number.
> +
> +* Monitor
> +   This power saving scheme will put the CPU into optimized power state and
> +   monitor the Ethernet PMD RX descriptor address, waking the CPU up whenever
> +   there's new traffic. Support for this scheme may not be available on all
> +   platforms, and furthe

[dpdk-dev] [PATCH v2] net: prepare the outer ipv4 hdr for checksum

2021-06-30 Thread Mohsin Kazmi
Preparing the headers for hardware offload misses
the outer IPv4 checksum offload.
This results in a bad checksum computed by the hardware NIC.

This patch fixes the issue by setting the outer IPv4
checksum field to 0.

Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")
Cc: sta...@dpdk.org

Signed-off-by: Mohsin Kazmi 
Acked-by: Qi Zhang 
---

v2:
* Update the commit message with Fixes.
---
 lib/net/rte_net.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/net/rte_net.h b/lib/net/rte_net.h
index 434435ffa2..e47365099e 100644
--- a/lib/net/rte_net.h
+++ b/lib/net/rte_net.h
@@ -128,8 +128,18 @@ rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, 
uint64_t ol_flags)
if (!(ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK | PKT_TX_TCP_SEG)))
return 0;
 
-   if (ol_flags & (PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IPV6))
+   if (ol_flags & (PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IPV6)) {
inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+   /*
+* Prepare the outer IPv4 header checksum by setting it to 0,
+* so that it can be computed by hardware NICs.
+*/
+   if (ol_flags & PKT_TX_OUTER_IP_CKSUM) {
+   ipv4_hdr = rte_pktmbuf_mtod_offset(m,
+   struct rte_ipv4_hdr *, m->outer_l2_len);
+   ipv4_hdr->hdr_checksum = 0;
+   }
+   }
 
/*
 * Check if headers are fragmented.
-- 
2.17.1
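
For context, a minimal sketch of the Tx-side usage this fix targets: the
application marks a tunnelled packet for outer IPv4 checksum offload and runs
the burst through rte_eth_tx_prepare(), which ends up in the function patched
above (port/queue ids and header lengths are illustrative):

	m->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM |
		       PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
	m->outer_l2_len = sizeof(struct rte_ether_hdr);
	m->outer_l3_len = sizeof(struct rte_ipv4_hdr);
	m->l2_len = sizeof(struct rte_ether_hdr); /* inner headers */
	m->l3_len = sizeof(struct rte_ipv4_hdr);
	m->l4_len = sizeof(struct rte_tcp_hdr);

	if (rte_eth_tx_prepare(port_id, queue_id, &m, 1) != 1)
		/* rte_errno holds the failure reason */
		rte_pktmbuf_free(m);

Without this fix, the outer hdr_checksum field kept its stale value and the
NIC computed a bad checksum over it.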



[dpdk-dev] [RFC] lib/security: add SA config option for inner pkt csum

2021-06-30 Thread Archana Muniganti
Add inner packet IPv4 header and L4 checksum enable options
in the conf. These will be used in case of protocol offload.
Per SA, the application can specify whether the
checksum (compute/verify) should be offloaded to the security device.

Signed-off-by: Archana Muniganti 
---
 lib/cryptodev/rte_crypto.h|  9 +
 lib/cryptodev/rte_cryptodev.h |  2 ++
 lib/security/rte_security.h   | 17 +
 3 files changed, 28 insertions(+)

diff --git a/lib/cryptodev/rte_crypto.h b/lib/cryptodev/rte_crypto.h
index fd5ef3a876..3510ed109f 100644
--- a/lib/cryptodev/rte_crypto.h
+++ b/lib/cryptodev/rte_crypto.h
@@ -52,6 +52,15 @@ enum rte_crypto_op_status {
/**< Operation failed due to invalid arguments in request */
RTE_CRYPTO_OP_STATUS_ERROR,
/**< Error handling operation */
+   RTE_CRYPTO_OP_STATUS_WAR = 128,
+   /**<
+* Operation completed successfully with warnings.
+* Note: All the warnings start from here.
+*/
+   RTE_CRYPTO_OP_STATUS_WAR_L3_CSUM_BAD,
+   /**< Operation completed successfully with invalid L3 checksum */
+   RTE_CRYPTO_OP_STATUS_WAR_L4_CSUM_BAD,
+   /**< Operation completed successfully with invalid L4 checksum */
 };
 
 /**
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index 11f4e6fdbf..6a6a2d0537 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -479,6 +479,8 @@ rte_cryptodev_asym_get_xform_enum(enum 
rte_crypto_asym_xform_type *xform_enum,
 /**< Support operations on multiple data-units message */
 #define RTE_CRYPTODEV_FF_CIPHER_WRAPPED_KEY(1ULL << 26)
 /**< Support wrapped key in cipher xform  */
+#define RTE_CRYPTODEV_FF_SECURITY_INNER_CSUM   (1ULL << 27)
+/**< Support inner checksum computation/verification */
 
 /**
  * Get the name of a crypto device feature flag
diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
index 88d31de0a6..2fdefab878 100644
--- a/lib/security/rte_security.h
+++ b/lib/security/rte_security.h
@@ -181,6 +181,23 @@ struct rte_security_ipsec_sa_options {
 * * 0: Disable per session security statistics collection for this SA.
 */
uint32_t stats : 1;
+
+   /** Compute/verify inner packet IPv4 header checksum
+*
+* * 1: In tunnel mode, compute inner packet IPv4 header checksum
+*  before tunnel encapsulation, or verify after tunnel
+*  decapsulation.
+* * 0: Inner packet IP header checksum is not computed/verified.
+*/
+   uint32_t ip_csum_enable : 1;
+
+   /** Compute/verify inner packet L4 checksum
+*
+* * 1: In tunnel mode, compute inner packet L4 checksum before
+*  tunnel encapsulation, or verify after tunnel decapsulation.
+* * 0: Inner packet L4 checksum is not computed/verified.
+*/
+   uint32_t l4_csum_enable : 1;
 };
 
 /** IPSec security association direction */
-- 
2.22.0
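
If the RFC is accepted, per-SA usage would look roughly like the sketch below
(an otherwise-initialized xform is assumed; only the new option fields follow
the names in this patch, the rest are existing rte_security definitions):

	struct rte_security_ipsec_xform ipsec_xform = {
		.direction = RTE_SECURITY_IPSEC_SA_DIR_EGRESS,
		.proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP,
		.mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL,
		.options = {
			.ip_csum_enable = 1, /* inner IPv4 csum by device */
			.l4_csum_enable = 1, /* inner L4 csum by device */
		},
	};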



Re: [dpdk-dev] [PATCH v7 7/7] vhost: convert inflight data to DPDK allocation API

2021-06-30 Thread Maxime Coquelin



On 6/30/21 9:55 AM, David Marchand wrote:
> On Tue, Jun 29, 2021 at 6:11 PM Maxime Coquelin
>  wrote:
>>
>> Inflight metadata are allocated using glibc's calloc.
>> This patch converts them to rte_zmalloc_socket to take
>> care of the NUMA affinity.
> 
> About the title, maybe:
> vhost: use DPDK allocations for inflight data

Agree with your proposal.

>>
>> Signed-off-by: Maxime Coquelin 
> 
> [snip]
> 
>> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
>> index d8ec087dfc..67935c4ccc 100644
>> --- a/lib/vhost/vhost_user.c
>> +++ b/lib/vhost/vhost_user.c
> 
> [snip]
> 
>> @@ -1779,19 +1820,21 @@ vhost_check_queue_inflights_split(struct virtio_net 
>> *dev,
>> vq->last_avail_idx += resubmit_num;
>>
>> if (resubmit_num) {
>> -   resubmit  = calloc(1, sizeof(struct 
>> rte_vhost_resubmit_info));
>> +   resubmit  = rte_zmalloc_socket("resubmit", sizeof(struct 
>> rte_vhost_resubmit_info),
> 
> Nit: double space.
> 

Can be fixed while applying.

>> +   0, vq->numa_node);
>> if (!resubmit) {
>> VHOST_LOG_CONFIG(ERR,
>> "failed to allocate memory for resubmit 
>> info.\n");
>> return RTE_VHOST_MSG_RESULT_ERR;
>> }
>>
>> -   resubmit->resubmit_list = calloc(resubmit_num,
>> -   sizeof(struct rte_vhost_resubmit_desc));
>> +   resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
>> +   resubmit_num * sizeof(struct 
>> rte_vhost_resubmit_desc),
>> +   0, vq->numa_node);
>> if (!resubmit->resubmit_list) {
>> VHOST_LOG_CONFIG(ERR,
>> "failed to allocate memory for inflight 
>> desc.\n");
>> -   free(resubmit);
>> +   rte_free(resubmit);
>> return RTE_VHOST_MSG_RESULT_ERR;
>> }
>>
>> @@ -1873,19 +1916,21 @@ vhost_check_queue_inflights_packed(struct virtio_net 
>> *dev,
>> }
>>
>> if (resubmit_num) {
>> -   resubmit = calloc(1, sizeof(struct rte_vhost_resubmit_info));
>> +   resubmit  = rte_zmalloc_socket("resubmit", sizeof(struct 
>> rte_vhost_resubmit_info),
> 
> Copy/paste detected :-)
> Double space.

Indeed!

> Having a single allocator between split and packed implems would avoid
> this, but it might not be that easy and this is out of the scope for
> this patch.
> 

Agree, I think the inflight code could be made simpler.
We can think about a refactoring for v21.11.

Thanks,
Maxime

> 
>> +   0, vq->numa_node);
>> if (resubmit == NULL) {
>> VHOST_LOG_CONFIG(ERR,
>> "failed to allocate memory for resubmit 
>> info.\n");
>> return RTE_VHOST_MSG_RESULT_ERR;
>> }
>>
>> -   resubmit->resubmit_list = calloc(resubmit_num,
>> -   sizeof(struct rte_vhost_resubmit_desc));
>> +   resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
>> +   resubmit_num * sizeof(struct 
>> rte_vhost_resubmit_desc),
>> +   0, vq->numa_node);
>> if (resubmit->resubmit_list == NULL) {
>> VHOST_LOG_CONFIG(ERR,
>> "failed to allocate memory for resubmit 
>> desc.\n");
>> -   free(resubmit);
>> +   rte_free(resubmit);
>> return RTE_VHOST_MSG_RESULT_ERR;
>> }
>>
>> --
>> 2.31.1
>>
> 
> 



[dpdk-dev] [PATCH] doc: announce SA config option struct changes

2021-06-30 Thread Archana Muniganti
Proposing the following two new fields for IPsec inner checksum
configuration in the structure ``rte_security_ipsec_sa_options``.
uint32_t ip_csum_enable : 1;
uint32_t l4_csum_enable : 1;

With these config options, per SA, the application can specify whether
the inner checksum (compute/verify) is to be offloaded to the
security device.

https://mails.dpdk.org/archives/dev/2021-June/212977.html

Signed-off-by: Archana Muniganti 
---
 doc/guides/rel_notes/deprecation.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd7..da65ae68be 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -141,6 +141,10 @@ Deprecation Notices
   in "rte_sched.h". These changes are aligned to improvements suggested in the
   RFC https://mails.dpdk.org/archives/dev/2018-November/120035.html.
 
+* security: The IPsec SA config options structure
+  ``struct rte_security_ipsec_sa_options`` will be updated with two new
+  fields to support IPsec inner checksum in case of protocol offload.
+
 * metrics: The function ``rte_metrics_init`` will have a non-void return
   in order to notify errors instead of calling ``rte_exit``.
 
-- 
2.22.0



Re: [dpdk-dev] Question about 'rxm->hash.rss' and 'mb->hash.fdir'

2021-06-30 Thread Min Hu (Connor)

Hi, Beilei, Matan, Shahaf, Viacheslav,

how about your opinion?

On 2021/6/30 17:34, Ferruh Yigit wrote:

On 6/30/2021 3:45 AM, Min Hu (Connor) wrote:

Hi, all
 one question about 'rxm->hash.rss' and 'mb->hash.fdir'.

 In Rx recv packets function,
 'rxm->hash.rss' will report rss hash result from Rx desc.
 'rxm->hash.fdir' will report filter identifier from Rx desc.

 But function implementation differs from some PMDs. for example:
 i40e, MLX5 report the two at the same time if pkt_flags is set,like:
**
     if (pkt_flags & PKT_RX_RSS_HASH) {
     rxm->hash.rss =
rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
     }
     if (pkt_flags & PKT_RX_FDIR) {
     mb->hash.fdir.hi =
     rte_le_to_cpu_32(rxdp->wb.qword3.hi_dword.fd_id);
     }


 While, ixgbe only report one of the two. like:
**
     if (likely(pkt_flags & PKT_RX_RSS_HASH))
     mb->hash.rss = rte_le_to_cpu_32(
     rxdp[j].wb.lower.hi_dword.rss);
     else if (pkt_flags & PKT_RX_FDIR) {
     mb->hash.fdir.hash = rte_le_to_cpu_16(
     rxdp[j].wb.lower.hi_dword.csum_ip.csum) &
     IXGBE_ATR_HASH_MASK;
     mb->hash.fdir.id = rte_le_to_cpu_16(
     rxdp[j].wb.lower.hi_dword.csum_ip.ip_id);
     }

 So, what is application scenario for 'rxm->hash.rss' and 'mb->hash.fdir',
that is, why the two should be reported? How about reporting the two at the same
time?
 Thanks for  your reply.



Hi Connor,

mbuf->hash is union, so it is not possible to set both 'hash.rss' & 'hash.fdir'.

I assume for the i40e & mlx5 case 'pkt_flags' indicates which one is valid and
only one is set in practice. Cc'ed driver maintainers for more comments.


Thanks Ferruh,
	another question: why does the user need this information, rxm->hash.rss 
or mb->hash.fdir.hi? What is it used for?



.



Re: [dpdk-dev] Question about 'rxm->hash.rss' and 'mb->hash.fdir'

2021-06-30 Thread Slava Ovsiienko
Hi,

> -Original Message-
> From: Min Hu (Connor) 
> Sent: Wednesday, June 30, 2021 14:22
> To: Ferruh Yigit ; dev@dpdk.org; NBU-Contact-Thomas
> Monjalon ; Andrew Rybchenko
> 
> Cc: Beilei Xing ; Matan Azrad ;
> Shahaf Shuler ; Slava Ovsiienko
> 
> Subject: Re: Question about 'rxm->hash.rss' and 'mb->hash.fdir'
> 
> Hi, Beilei, Matan, Shahaf, Viacheslav,
> 
>   how about your opinion?
> 
> On 2021/6/30 17:34, Ferruh Yigit wrote:
> > On 6/30/2021 3:45 AM, Min Hu (Connor) wrote:
> >> Hi, all
> >>  one question about 'rxm->hash.rss' and 'mb->hash.fdir'.
> >>
> >>  In Rx recv packets function,
> >>  'rxm->hash.rss' will report rss hash result from Rx desc.
> >>  'rxm->hash.fdir' will report filter identifier from Rx desc.
> >>
> >>  But function implementation differs from some PMDs. for example:
> >>  i40e, MLX5 report the two at the same time if pkt_flags is set,like:
> >> **
> >>      if (pkt_flags & PKT_RX_RSS_HASH) {
> >>      rxm->hash.rss =
> >> rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
> >>      }
> >>      if (pkt_flags & PKT_RX_FDIR) {
> >>      mb->hash.fdir.hi =
> >>      rte_le_to_cpu_32(rxdp->wb.qword3.hi_dword.fd_id);
> >>      }
> >> 
> >>
> >>  While, ixgbe only report one of the two. like:
> >> **
> >>      if (likely(pkt_flags & PKT_RX_RSS_HASH))
> >>      mb->hash.rss = rte_le_to_cpu_32(
> >>      rxdp[j].wb.lower.hi_dword.rss);
> >>      else if (pkt_flags & PKT_RX_FDIR) {
> >>      mb->hash.fdir.hash = rte_le_to_cpu_16(
> >>      rxdp[j].wb.lower.hi_dword.csum_ip.csum) &
> >>      IXGBE_ATR_HASH_MASK;
> >>      mb->hash.fdir.id = rte_le_to_cpu_16(
> >>      rxdp[j].wb.lower.hi_dword.csum_ip.ip_id);
> >>      }
> >> 
> >>  So, what is application scenario for 'rxm->hash.rss' and
> >> 'mb->hash.fdir', that is, why the two should be reported? How about
> >> reporting the two at the same time?
> >>  Thanks for  your reply.
> >
> >
> > Hi Connor,
> >
> > mbuf->hash is union, so it is not possible to set both 'hash.rss' & 
> > 'hash.fdir'.

hash.rss is a uint32_t and shares its memory with hash.fdir.lo.
hash.fdir.hi is untouched by access to hash.rss.
Hence, IIUC, we can provide both a valid hash.rss and hash.fdir.hi at the same
time.

At least mlx5 provides both (at least if the CQE compression option allows it).
The RSS hash is provided in hash.rss, and the MARK RTE Flow action result is
reported in hash.fdir.hi in an independent way.

> >
> > I assume for i40e & mlx5 case 'pkt_flags' indicate which one is valid
> > and only one is set in practice. Cc'ed driver mainteriners for more comment.
> 
> Thanks Ferruh,
>   another question, why does user need this information:  rxm->hash.rss
> or mb->hash.fdir.hi ? what is the function?

IIRC, hash.rss is the lower bits of the calculated hash function result over
the packet.
hash.fdir.hi is the result of the MARK RTE Flow action (at least for mlx5).
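
To make the layout point concrete, a minimal sketch reading both fields from
one mbuf (flag usage as on mlx5; the helper name is illustrative):

	#include <rte_mbuf.h>

	static void
	read_rss_and_mark(const struct rte_mbuf *m,
			  uint32_t *rss, uint32_t *mark)
	{
		if (m->ol_flags & PKT_RX_RSS_HASH)
			*rss = m->hash.rss;	/* overlaps hash.fdir.lo only */
		if (m->ol_flags & PKT_RX_FDIR_ID)
			*mark = m->hash.fdir.hi; /* separate 32-bit word */
	}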

With best regards,
Slava



Re: [dpdk-dev] [PATCH v7 0/7] vhost: Fix and improve NUMA reallocation

2021-06-30 Thread Xia, Chenbo
> -Original Message-
> From: Maxime Coquelin 
> Sent: Wednesday, June 30, 2021 12:11 AM
> To: dev@dpdk.org; Xia, Chenbo ;
> david.march...@redhat.com
> Cc: Maxime Coquelin 
> Subject: [PATCH v7 0/7] vhost: Fix and improve NUMA reallocation
> 
> This patch series first fixes missing reallocations of some
> Virtqueue and device metadata.
> 
> Then, it improves the numa_realloc function by using the
> rte_realloc_socket API, which takes care of the memcpy &
> freeing. The VQs' NUMA IDs are also saved in the VQ metadata
> and used for every allocation, so that allocations made
> before the NUMA reallocation stay on the same node as the
> VQ, and later ones are allocated on the proper node.
> 
> Finally, inflight feature metadata are converted from calloc()
> to rte_zmalloc_socket() and their reallocation is handled
> in numa_realloc().
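
For readers unfamiliar with the API: rte_realloc_socket() combines the
grow/copy/free steps that numa_realloc() previously open-coded, roughly as in
this sketch (variable names are illustrative):

	struct vhost_virtqueue *new_vq;

	new_vq = rte_realloc_socket(vq, sizeof(*vq), 0, numa_node);
	if (new_vq == NULL)
		return vq; /* old block is left untouched on failure */
	/* on success, contents are copied and the old block is freed */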
> --
> 2.31.1

Series applied to next-virtio/main with all things fixed.

Thanks!



Re: [dpdk-dev] [dpdk-stable] [PATCH] vdpa/mlx5: fix TSO offload without CSUM

2021-06-30 Thread Xia, Chenbo
> -Original Message-
> From: stable  On Behalf Of Xueming Li
> Sent: Sunday, June 13, 2021 8:52 PM
> Cc: dev@dpdk.org; xuemi...@nvidia.com; ma...@nvidia.com; sta...@dpdk.org;
> Viacheslav Ovsiienko ; Maxime Coquelin
> 
> Subject: [dpdk-stable] [PATCH] vdpa/mlx5: fix TSO offload without CSUM
> 
> Packets were corrupted when TSO was requested without CSUM update.
> 
> Enable CSUM automatically if only TSO is requested.
> 
> Fixes: 2aa8444b0084 ("vdpa/mlx5: support stateless offloads")
> Cc: ma...@nvidia.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Xueming Li 
> ---
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> index 024c5c4180..f530646058 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> @@ -442,6 +442,13 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
>   DRV_LOG(ERR, "Failed to configure negotiated features.");
>   return -1;
>   }
> + if ((priv->features & (1ULL << VIRTIO_NET_F_CSUM)) == 0 &&
> + ((priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4)) > 0 ||
> +  (priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6)) > 0)) {
> + /* Packet may be corrupted if TSO is enabled without CSUM. */
> + DRV_LOG(INFO, "TSO is enabled without CSUM, force CSUM.");
> + priv->features |= (1ULL << VIRTIO_NET_F_CSUM);
> + }
>   if (nr_vring > priv->caps.max_num_virtio_queues * 2) {
>   DRV_LOG(ERR, "Do not support more than %d virtqs(%d).",
>   (int)priv->caps.max_num_virtio_queues * 2,
> --
> 2.25.1

Applied to next-virtio/main, thanks!


Re: [dpdk-dev] [EXT] [dpdk-dev v2] crypto/snow3g: add support for digest appended ops

2021-06-30 Thread Zhang, Roy Fan
Hi Akhil,

This is a required feature from our customer. So could you help us merge it? 

And we do not plan to remove this PMD - just move it to a new folder with
common code shared by all SW intel-ipsec-mb based PMDs.
From the user point of view everything will be the same, including the EAL
commands and the ways of accessing the driver.
And sure, the changes in this patch will be moved to the new place then.

Regards,
Fan

> -Original Message-
> From: Akhil Goyal 
> Sent: Tuesday, June 29, 2021 9:15 PM
> To: Ji, Kai ; dev@dpdk.org
> Cc: Zhang, Roy Fan ; De Lara Guarch, Pablo
> ; Damian Nowak
> 
> Subject: RE: [EXT] [dpdk-dev] [dpdk-dev v2] crypto/snow3g: add support for
> digest appended ops
> 
> > This patch enable out-of-place auth-cipher operations where
> > digest should be encrypted among with the rest of raw data.
> > It also adds support for partially encrypted digest when using
> > auth-cipher operations.
> >
> > Fixes: 7c87e2d7b359 ("crypto/snow3g: use IPsec library")
> > Cc: pablo.de.lara.gua...@intel.com
> >
> This patch is a feature addition and not a fix. So no need for this fixes tag.
> 
> > Signed-off-by: Damian Nowak 
> > Signed-off-by: Kai Ji 
> >
> Is this patch really required now, as I see that you plan to remove this PMD?


Re: [dpdk-dev] [PATCH v1 2/2] linux/kni: Added support for KNI multiple fifos

2021-06-30 Thread Ferruh Yigit
On 12/10/2020 11:15 PM, dheemanth wrote:
> In order to improve performance, the KNI is made to
> support multiple fifos, So that multiple threads pinned
> to multiple cores can process packets in parallel.
> 

Hi Dheemanth,

As far as I know, in KNI the bottleneck is in the kernel thread. In this patch
the FIFO between userspace and kernelspace is converted into multiple FIFOs,
but on the kernel side the same thread still processes all FIFOs, so only
userspace can scale to more cores. I wonder how this improves the performance,
can you please share the use case and some numbers?

Also, the FIFOs seem converted from simple single-producer, single-consumer to
multi-producer and multi-consumer. What is the performance impact of this? And
why is this needed? In the DPDK application, is there an N-N relation between
cores and fifos? Again, can you please clarify your use case?

In the kernel-to-userspace transfer, packets are distributed to multiple FIFOs
based on the packet hash; this should be additional load on the kernel thread.

The sample application and unit test (also the KNI PMD) are not using this new
feature, they only use a single fifo. They should also be updated to use this
feature, which helps as a sample and helps to demonstrate the use case.

Btw, can you please clarify why 'queues_num' is used? Is it expected to be the
same as 'fifos_num'?

Also, documentation needs to be updated, but before more changes I think the
benefit of the work needs to be clarified, to decide whether or not to proceed
with the set.

> Signed-off-by: dheemanth 
> ---
>  app/test/test_kni.c |   4 +-
>  drivers/net/kni/rte_eth_kni.c   |   5 +-
>  examples/kni/main.c |   4 +-
>  kernel/linux/kni/kni_dev.h  |  11 +-
>  kernel/linux/kni/kni_fifo.h | 190 ++-
>  kernel/linux/kni/kni_misc.c | 189 +--
>  kernel/linux/kni/kni_net.c  |  88 ++--
>  lib/librte_kni/rte_kni.c| 216 
> ++--
>  lib/librte_kni/rte_kni.h|  11 +-
>  lib/librte_kni/rte_kni_common.h |  10 +-
>  lib/librte_port/rte_port_kni.c  |  12 +--
>  11 files changed, 514 insertions(+), 226 deletions(-)
> 

<...>

> @@ -292,51 +292,69 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
>  {
>   struct kni_net *knet = net_generic(net, kni_net_id);
>   int ret;
> - struct rte_kni_device_info dev_info;
> + unsigned int i, tx_queues_num;
> + struct rte_kni_device_info *dev_info;
>   struct net_device *net_dev = NULL;
>   struct kni_dev *kni, *dev, *n;
>  
>   pr_info("Creating kni...\n");
> +
> + /* allocate dev_info from stack to avoid Wframe-larger-than=1024
> +  * compile error.
> +  */

s/stack/heap

> + dev_info = kzalloc(sizeof(struct rte_kni_device_info), GFP_KERNEL);
> + if (!dev_info)
> + return -ENOMEM;
> +

<...>


[dpdk-dev] [PATCH v2 00/22] net/mlx5: insertion rate optimization

2021-06-30 Thread Suanming Mou
This patch series optimizes the flow insertion rate by adding a
local cache to the index pool and the list.

For objects that need efficient index allocation and free, a local
cache is very helpful.

For the index pool, a two-level cache is added: one local and the
other global. The global cache is able to save all the allocated
indexes, which means the allocated indexes will not be freed. Once
the local cache is full, the extra indexes are flushed to the
global cache. Once the local cache is empty, it first tries to fetch
more indexes from the global cache; if the global cache is also
empty, a new trunk with more indexes is allocated.

For the list, a per-core local sub-list is introduced. Allocated
objects are added and released only through the local list, without
any locks. Only the objects that need to be shared are synced with
the global list.
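
The allocation order described above can be summarized by the following
simplified, self-contained sketch (control flow only, not the driver code;
the real implementation takes a lock around the global cache and grows a new
trunk when both caches are empty):

	#include <string.h>
	#include <rte_common.h>

	struct idx_cache { uint32_t len; uint32_t idx[64]; };

	static uint32_t
	cache_alloc_idx(struct idx_cache *local, struct idx_cache *global)
	{
		if (local->len == 0 && global->len > 0) {
			/* local cache empty: refill a batch from global */
			uint32_t n = RTE_MIN(global->len, 16u);

			global->len -= n;
			memcpy(local->idx, &global->idx[global->len],
			       n * sizeof(uint32_t));
			local->len = n;
		}
		if (local->len > 0)
			return local->idx[--local->len];
		return 0; /* both caches empty: allocate a new trunk */
	}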

---

v2: add the list per-lcore cache optimization

---

Matan Azrad (9):
  net/mlx5: optimize modify header action memory
  net/mlx5: remove cache term from the list utility
  net/mlx5: add per lcore cache to the list utility
  net/mlx5: minimize list critical sections
  net/mlx5: manage list cache entries release
  net/mlx5: relax the list utility atomic operations
  net/mlx5: allocate list memory by the create API
  common/mlx5: add per-lcore cache to hash list utility
  net/mlx5: move modify header allocator to ipool

Suanming Mou (13):
  net/mlx5: allow limiting the index pool maximum index
  net/mlx5: add indexed pool local cache
  net/mlx5: add index pool foreach define
  net/mlx5: replace flow list with index pool
  net/mlx5: adjust the hash bucket size
  common/mlx5: allocate cache list memory individually
  net/mlx5: enable index pool per-core cache
  net/mlx5: optimize hash list table allocate on demand
  common/mlx5: optimize cache list object memory
  net/mlx5: change memory release configuration
  net/mlx5: support index pool none local core operations
  net/mlx5: support list none local core operations
  net/mlx5: optimize Rx queue match

 doc/guides/nics/mlx5.rst|5 +
 doc/guides/rel_notes/release_21_08.rst  |6 +
 drivers/common/mlx5/linux/mlx5_glue.h   |1 +
 drivers/common/mlx5/mlx5_common.h   |2 +
 drivers/common/mlx5/mlx5_common_utils.c |  569 ---
 drivers/common/mlx5/mlx5_common_utils.h |  283 --
 drivers/net/mlx5/linux/mlx5_flow_os.h   |3 +-
 drivers/net/mlx5/linux/mlx5_os.c|  209 ++--
 drivers/net/mlx5/mlx5.c |   34 +-
 drivers/net/mlx5/mlx5.h |   46 +-
 drivers/net/mlx5/mlx5_defs.h|   12 +-
 drivers/net/mlx5/mlx5_flow.c|  305 +++---
 drivers/net/mlx5/mlx5_flow.h|  210 ++--
 drivers/net/mlx5/mlx5_flow_dv.c | 1203 +++
 drivers/net/mlx5/mlx5_rx.h  |   14 +-
 drivers/net/mlx5/mlx5_rxq.c |  136 ++-
 drivers/net/mlx5/mlx5_trigger.c |8 +-
 drivers/net/mlx5/mlx5_utils.c   |  617 
 drivers/net/mlx5/mlx5_utils.h   |  255 ++---
 drivers/net/mlx5/windows/mlx5_os.c  |   11 +-
 20 files changed, 2518 insertions(+), 1411 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v2 01/22] net/mlx5: allow limiting the index pool maximum index

2021-06-30 Thread Suanming Mou
Some ipool instances in the driver are used as ID/index allocators and
add extra logic in order to work with limited index values.

Add a new configuration for the ipool to specify the maximum index value.
The ipool will ensure that no index bigger than the maximum value is
provided.

Use this configuration in the ID allocator cases instead of the current
logic. This patch makes the maximum ID configurable for the index pool.

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_utils.c | 14 --
 drivers/net/mlx5/mlx5_utils.h |  1 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_utils.c b/drivers/net/mlx5/mlx5_utils.c
index 18fe23e4fb..bf2b2ebc72 100644
--- a/drivers/net/mlx5/mlx5_utils.c
+++ b/drivers/net/mlx5/mlx5_utils.c
@@ -270,6 +270,9 @@ mlx5_ipool_create(struct mlx5_indexed_pool_config *cfg)
if (i > 0)
pool->grow_tbl[i] += pool->grow_tbl[i - 1];
}
+   if (!pool->cfg.max_idx)
+   pool->cfg.max_idx =
+   mlx5_trunk_idx_offset_get(pool, TRUNK_MAX_IDX + 1);
return pool;
 }
 
@@ -282,9 +285,11 @@ mlx5_ipool_grow(struct mlx5_indexed_pool *pool)
size_t trunk_size = 0;
size_t data_size;
size_t bmp_size;
-   uint32_t idx;
+   uint32_t idx, cur_max_idx, i;
 
-   if (pool->n_trunk_valid == TRUNK_MAX_IDX)
+   cur_max_idx = mlx5_trunk_idx_offset_get(pool, pool->n_trunk_valid);
+   if (pool->n_trunk_valid == TRUNK_MAX_IDX ||
+   cur_max_idx >= pool->cfg.max_idx)
return -ENOMEM;
if (pool->n_trunk_valid == pool->n_trunk) {
/* No free trunk flags, expand trunk list. */
@@ -336,6 +341,11 @@ mlx5_ipool_grow(struct mlx5_indexed_pool *pool)
trunk->bmp = rte_bitmap_init_with_all_set(data_size, &trunk->data
 [RTE_CACHE_LINE_ROUNDUP(data_size * pool->cfg.size)],
 bmp_size);
+   /* Clear the overhead bits in the trunk if it happens. */
+   if (cur_max_idx + data_size > pool->cfg.max_idx) {
+   for (i = pool->cfg.max_idx - cur_max_idx; i < data_size; i++)
+   rte_bitmap_clear(trunk->bmp, i);
+   }
MLX5_ASSERT(trunk->bmp);
pool->n_trunk_valid++;
 #ifdef POOL_DEBUG
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index b54517c6df..15870e14c2 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -208,6 +208,7 @@ struct mlx5_indexed_pool_config {
uint32_t need_lock:1;
/* Lock is needed for multiple thread usage. */
uint32_t release_mem_en:1; /* Rlease trunk when it is free. */
+   uint32_t max_idx; /* The maximum index can be allocated. */
const char *type; /* Memory allocate type name. */
void *(*malloc)(uint32_t flags, size_t size, unsigned int align,
int socket);
-- 
2.25.1
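
A hypothetical configuration using the new field might look like this (the
object size, limit and pool name are illustrative only):

	struct mlx5_indexed_pool_config cfg = {
		.size = sizeof(uint64_t),
		.trunk_size = 64,
		.need_lock = 1,
		.max_idx = 1 << 16, /* never hand out an index above 64K */
		.malloc = mlx5_malloc,
		.free = mlx5_free,
		.type = "example_id_ipool",
	};
	struct mlx5_indexed_pool *pool = mlx5_ipool_create(&cfg);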



[dpdk-dev] [PATCH v2 02/22] net/mlx5: add indexed pool local cache

2021-06-30 Thread Suanming Mou
For objects that need efficient index allocation and free, a local
cache is very helpful.

A two-level cache is introduced to allocate and free the index more
efficiently: one local and the other global. The global cache is able
to save all the allocated indexes, which means the allocated indexes
will not be freed. Once the local cache is full, the extra indexes are
flushed to the global cache. Once the local cache is empty, it first
tries to fetch more indexes from the global cache; if the global cache
is also empty, a new trunk with more indexes is allocated.

This commit adds the new local cache mechanism for the indexed pool.

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_utils.c | 323 --
 drivers/net/mlx5/mlx5_utils.h |  64 ++-
 2 files changed, 372 insertions(+), 15 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_utils.c b/drivers/net/mlx5/mlx5_utils.c
index bf2b2ebc72..215024632d 100644
--- a/drivers/net/mlx5/mlx5_utils.c
+++ b/drivers/net/mlx5/mlx5_utils.c
@@ -175,14 +175,14 @@ static inline void
 mlx5_ipool_lock(struct mlx5_indexed_pool *pool)
 {
if (pool->cfg.need_lock)
-   rte_spinlock_lock(&pool->lock);
+   rte_spinlock_lock(&pool->rsz_lock);
 }
 
 static inline void
 mlx5_ipool_unlock(struct mlx5_indexed_pool *pool)
 {
if (pool->cfg.need_lock)
-   rte_spinlock_unlock(&pool->lock);
+   rte_spinlock_unlock(&pool->rsz_lock);
 }
 
 static inline uint32_t
@@ -243,6 +243,7 @@ mlx5_ipool_create(struct mlx5_indexed_pool_config *cfg)
uint32_t i;
 
if (!cfg || (!cfg->malloc ^ !cfg->free) ||
+   (cfg->per_core_cache && cfg->release_mem_en) ||
(cfg->trunk_size && ((cfg->trunk_size & (cfg->trunk_size - 1)) ||
((__builtin_ffs(cfg->trunk_size) + TRUNK_IDX_BITS) > 32
return NULL;
@@ -258,9 +259,8 @@ mlx5_ipool_create(struct mlx5_indexed_pool_config *cfg)
pool->cfg.malloc = mlx5_malloc;
pool->cfg.free = mlx5_free;
}
-   pool->free_list = TRUNK_INVALID;
if (pool->cfg.need_lock)
-   rte_spinlock_init(&pool->lock);
+   rte_spinlock_init(&pool->rsz_lock);
/*
 * Initialize the dynamic grow trunk size lookup table to have a quick
 * lookup for the trunk entry index offset.
@@ -273,6 +273,8 @@ mlx5_ipool_create(struct mlx5_indexed_pool_config *cfg)
if (!pool->cfg.max_idx)
pool->cfg.max_idx =
mlx5_trunk_idx_offset_get(pool, TRUNK_MAX_IDX + 1);
+   if (!cfg->per_core_cache)
+   pool->free_list = TRUNK_INVALID;
return pool;
 }
 
@@ -355,6 +357,274 @@ mlx5_ipool_grow(struct mlx5_indexed_pool *pool)
return 0;
 }
 
+static inline struct mlx5_indexed_cache *
+mlx5_ipool_update_global_cache(struct mlx5_indexed_pool *pool, int cidx)
+{
+   struct mlx5_indexed_cache *gc, *lc, *olc = NULL;
+
+   lc = pool->cache[cidx]->lc;
+   gc = __atomic_load_n(&pool->gc, __ATOMIC_RELAXED);
+   if (gc && lc != gc) {
+   mlx5_ipool_lock(pool);
+   if (lc && !(--lc->ref_cnt))
+   olc = lc;
+   lc = pool->gc;
+   lc->ref_cnt++;
+   pool->cache[cidx]->lc = lc;
+   mlx5_ipool_unlock(pool);
+   if (olc)
+   pool->cfg.free(olc);
+   }
+   return lc;
+}
+
+static uint32_t
+mlx5_ipool_allocate_from_global(struct mlx5_indexed_pool *pool, int cidx)
+{
+   struct mlx5_indexed_trunk *trunk;
+   struct mlx5_indexed_cache *p, *lc, *olc = NULL;
+   size_t trunk_size = 0;
+   size_t data_size;
+   uint32_t cur_max_idx, trunk_idx, trunk_n;
+   uint32_t fetch_size, ts_idx, i;
+   int n_grow;
+
+check_again:
+   p = NULL;
+   fetch_size = 0;
+   /*
+* Fetch new index from global if possible. First round local
+* cache will be NULL.
+*/
+   lc = pool->cache[cidx]->lc;
+   mlx5_ipool_lock(pool);
+   /* Try to update local cache first. */
+   if (likely(pool->gc)) {
+   if (lc != pool->gc) {
+   if (lc && !(--lc->ref_cnt))
+   olc = lc;
+   lc = pool->gc;
+   lc->ref_cnt++;
+   pool->cache[cidx]->lc = lc;
+   }
+   if (lc->len) {
+   /* Use the updated local cache to fetch index. */
+   fetch_size = pool->cfg.per_core_cache >> 2;
+   if (lc->len < fetch_size)
+   fetch_size = lc->len;
+   lc->len -= fetch_size;
+   memcpy(pool->cache[cidx]->idx, &lc->idx[lc->len],
+  sizeof(uint32_t) * fetch_size);
+   }
+   }
+   mlx5_ipool_unlock(pool);
+   if (unlikely(olc)) {
+   pool->cfg

[dpdk-dev] [PATCH v2 03/22] net/mlx5: add index pool foreach define

2021-06-30 Thread Suanming Mou
In some cases, the application may want to know all the allocated
indexes in order to apply some operations to them.

This commit adds the indexed pool functions to support a foreach
operation.

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_utils.c | 86 +++
 drivers/net/mlx5/mlx5_utils.h |  8 
 2 files changed, 94 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_utils.c b/drivers/net/mlx5/mlx5_utils.c
index 215024632d..0ed279e162 100644
--- a/drivers/net/mlx5/mlx5_utils.c
+++ b/drivers/net/mlx5/mlx5_utils.c
@@ -839,6 +839,92 @@ mlx5_ipool_destroy(struct mlx5_indexed_pool *pool)
return 0;
 }
 
+void
+mlx5_ipool_flush_cache(struct mlx5_indexed_pool *pool)
+{
+   uint32_t i, j;
+   struct mlx5_indexed_cache *gc;
+   struct rte_bitmap *ibmp;
+   uint32_t bmp_num, mem_size;
+
+   if (!pool->cfg.per_core_cache)
+   return;
+   gc = pool->gc;
+   if (!gc)
+   return;
+   /* Reset bmp. */
+   bmp_num = mlx5_trunk_idx_offset_get(pool, gc->n_trunk_valid);
+   mem_size = rte_bitmap_get_memory_footprint(bmp_num);
+   pool->bmp_mem = pool->cfg.malloc(MLX5_MEM_ZERO, mem_size,
+RTE_CACHE_LINE_SIZE, rte_socket_id());
+   if (!pool->bmp_mem) {
+   DRV_LOG(ERR, "Ipool bitmap mem allocate failed.\n");
+   return;
+   }
+   ibmp = rte_bitmap_init_with_all_set(bmp_num, pool->bmp_mem, mem_size);
+   if (!ibmp) {
+   pool->cfg.free(pool->bmp_mem);
+   pool->bmp_mem = NULL;
+   DRV_LOG(ERR, "Ipool bitmap create failed.\n");
+   return;
+   }
+   pool->ibmp = ibmp;
+   /* Clear global cache. */
+   for (i = 0; i < gc->len; i++)
+   rte_bitmap_clear(ibmp, gc->idx[i] - 1);
+   /* Clear core cache. */
+   for (i = 0; i < RTE_MAX_LCORE; i++) {
+   struct mlx5_ipool_per_lcore *ilc = pool->cache[i];
+
+   if (!ilc)
+   continue;
+   for (j = 0; j < ilc->len; j++)
+   rte_bitmap_clear(ibmp, ilc->idx[j] - 1);
+   }
+}
+
+static void *
+mlx5_ipool_get_next_cache(struct mlx5_indexed_pool *pool, uint32_t *pos)
+{
+   struct rte_bitmap *ibmp;
+   uint64_t slab = 0;
+   uint32_t iidx = *pos;
+
+   ibmp = pool->ibmp;
+   if (!ibmp || !rte_bitmap_scan(ibmp, &iidx, &slab)) {
+   if (pool->bmp_mem) {
+   pool->cfg.free(pool->bmp_mem);
+   pool->bmp_mem = NULL;
+   pool->ibmp = NULL;
+   }
+   return NULL;
+   }
+   iidx += __builtin_ctzll(slab);
+   rte_bitmap_clear(ibmp, iidx);
+   iidx++;
+   *pos = iidx;
+   return mlx5_ipool_get_cache(pool, iidx);
+}
+
+void *
+mlx5_ipool_get_next(struct mlx5_indexed_pool *pool, uint32_t *pos)
+{
+   uint32_t idx = *pos;
+   void *entry;
+
+   if (pool->cfg.per_core_cache)
+   return mlx5_ipool_get_next_cache(pool, pos);
+   while (idx <= mlx5_trunk_idx_offset_get(pool, pool->n_trunk)) {
+   entry = mlx5_ipool_get(pool, idx);
+   if (entry) {
+   *pos = idx;
+   return entry;
+   }
+   idx++;
+   }
+   return NULL;
+}
+
 void
 mlx5_ipool_dump(struct mlx5_indexed_pool *pool)
 {
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index 0469062695..737dd7052d 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -261,6 +261,9 @@ struct mlx5_indexed_pool {
/* Global cache. */
struct mlx5_ipool_per_lcore *cache[RTE_MAX_LCORE];
/* Local cache. */
+   struct rte_bitmap *ibmp;
+   void *bmp_mem;
+   /* Allocate objects bitmap. Use during flush. */
};
};
 #ifdef POOL_DEBUG
@@ -862,4 +865,9 @@ struct {
\
 (entry);   \
 idx++, (entry) = mlx5_l3t_get_next((tbl), &idx))
 
+#define MLX5_IPOOL_FOREACH(ipool, idx, entry)  \
+   for ((idx) = 0, mlx5_ipool_flush_cache((ipool)),\
+   (entry) = mlx5_ipool_get_next((ipool), &idx);   \
+   (entry); idx++, (entry) = mlx5_ipool_get_next((ipool), &idx))
+
 #endif /* RTE_PMD_MLX5_UTILS_H_ */
-- 
2.25.1
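
Usage of the new iterator mirrors the existing MLX5_L3T_FOREACH pattern; a
rough sketch of a flush path (the pool handle and release helper are
illustrative; note that mlx5_ipool_flush_cache() runs first, so this is meant
for teardown paths, not the datapath):

	uint32_t idx;
	struct rte_flow *flow;

	MLX5_IPOOL_FOREACH(priv->flows[type], idx, flow)
		flow_release(dev, flow); /* hypothetical release helper */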



[dpdk-dev] [PATCH v2 05/22] net/mlx5: optimize modify header action memory

2021-06-30 Thread Suanming Mou
From: Matan Azrad 

Define the types of the modify header action fields with the
minimum size needed for the range of possible values.

Signed-off-by: Matan Azrad 
Acked-by: Suanming Mou 
---
 drivers/common/mlx5/linux/mlx5_glue.h |  1 +
 drivers/net/mlx5/linux/mlx5_flow_os.h |  3 ++-
 drivers/net/mlx5/mlx5_flow.h  |  6 +++---
 drivers/net/mlx5/mlx5_flow_dv.c   | 13 ++---
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/common/mlx5/linux/mlx5_glue.h 
b/drivers/common/mlx5/linux/mlx5_glue.h
index 840d8cf57f..a186ee577f 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.h
+++ b/drivers/common/mlx5/linux/mlx5_glue.h
@@ -78,6 +78,7 @@ struct mlx5dv_devx_async_cmd_hdr;
 enum  mlx5dv_dr_domain_type { unused, };
 struct mlx5dv_dr_domain;
 struct mlx5dv_dr_action;
+#define MLX5DV_DR_ACTION_FLAGS_ROOT_LEVEL 1
 #endif
 
 #ifndef HAVE_MLX5DV_DR_DEVX_PORT
diff --git a/drivers/net/mlx5/linux/mlx5_flow_os.h 
b/drivers/net/mlx5/linux/mlx5_flow_os.h
index cee685015b..1926d26410 100644
--- a/drivers/net/mlx5/linux/mlx5_flow_os.h
+++ b/drivers/net/mlx5/linux/mlx5_flow_os.h
@@ -225,7 +225,8 @@ mlx5_flow_os_create_flow_action_modify_header(void *ctx, 
void *domain,
(struct mlx5_flow_dv_modify_hdr_resource *)resource;
 
*action = mlx5_glue->dv_create_flow_action_modify_header
-   (ctx, res->ft_type, domain, res->flags,
+   (ctx, res->ft_type, domain, res->root ?
+MLX5DV_DR_ACTION_FLAGS_ROOT_LEVEL : 0,
 actions_len, (uint64_t *)res->actions);
return (*action) ? 0 : -1;
 }
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index d9b6acaafd..81c95e0beb 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -523,11 +523,11 @@ struct mlx5_flow_dv_modify_hdr_resource {
void *action; /**< Modify header action object. */
/* Key area for hash list matching: */
uint8_t ft_type; /**< Flow table type, Rx or Tx. */
-   uint32_t actions_num; /**< Number of modification actions. */
-   uint64_t flags; /**< Flags for RDMA API. */
+   uint8_t actions_num; /**< Number of modification actions. */
+   bool root; /**< Whether action is in root table. */
struct mlx5_modification_cmd actions[];
/**< Modification actions. */
-};
+} __rte_packed;
 
 /* Modify resource key of the hash organization. */
 union mlx5_flow_modify_hdr_key {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 67f7243503..784ec11dea 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5000,21 +5000,21 @@ flow_dv_validate_action_port_id(struct rte_eth_dev *dev,
  *
  * @param dev
  *   Pointer to rte_eth_dev structure.
- * @param flags
- *   Flags bits to check if root level.
+ * @param root
+ *   Whether action is on root table.
  *
  * @return
  *   Max number of modify header actions device can support.
  */
 static inline unsigned int
 flow_dv_modify_hdr_action_max(struct rte_eth_dev *dev __rte_unused,
- uint64_t flags)
+ bool root)
 {
/*
 * There's no way to directly query the max capacity from FW.
 * The maximal value on root table should be assumed to be supported.
 */
-   if (!(flags & MLX5DV_DR_ACTION_FLAGS_ROOT_LEVEL))
+   if (!root)
return MLX5_MAX_MODIFY_NUM;
else
return MLX5_ROOT_TBL_MODIFY_NUM;
@@ -5582,10 +5582,9 @@ flow_dv_modify_hdr_resource_register
};
uint64_t key64;
 
-   resource->flags = dev_flow->dv.group ? 0 :
- MLX5DV_DR_ACTION_FLAGS_ROOT_LEVEL;
+   resource->root = !dev_flow->dv.group;
if (resource->actions_num > flow_dv_modify_hdr_action_max(dev,
-   resource->flags))
+   resource->root))
return rte_flow_error_set(error, EOVERFLOW,
  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
  "too many modify header items");
-- 
2.25.1



[dpdk-dev] [PATCH v2 04/22] net/mlx5: replace flow list with index pool

2021-06-30 Thread Suanming Mou
The flow list is used to save the created flows; it is needed only
when the port closes and all the flows must be flushed.

This commit takes advantage of the index pool foreach operation to
flush all the allocated flows.
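
As a minimal sketch (flow_ipool_flush and the destroy callback are
illustrative names, not from this patch), the flush reduces to walking
the pool with mlx5_ipool_get_next(), whose loop-macro tail is visible
at the top of this excerpt:

    /* Visit every allocated flow; mlx5_ipool_get_next() returns the
     * entry at or after *idx, or NULL when the pool is exhausted. */
    static void
    flow_ipool_flush(struct mlx5_indexed_pool *ipool,
                     void (*flow_destroy)(void *flow, uint32_t idx))
    {
        uint32_t idx = 0;
        void *flow = mlx5_ipool_get_next(ipool, &idx);

        while (flow != NULL) {
            flow_destroy(flow, idx);
            idx++;
            flow = mlx5_ipool_get_next(ipool, &idx);
        }
    }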

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c   |  48 +-
 drivers/net/mlx5/mlx5.c|   9 +-
 drivers/net/mlx5/mlx5.h|  14 ++-
 drivers/net/mlx5/mlx5_flow.c   | 149 ++---
 drivers/net/mlx5/mlx5_flow.h   |   2 +-
 drivers/net/mlx5/mlx5_flow_dv.c|   5 +
 drivers/net/mlx5/mlx5_trigger.c|   8 +-
 drivers/net/mlx5/windows/mlx5_os.c |   1 -
 8 files changed, 126 insertions(+), 110 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 92b3009786..31cc8d9eb8 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -69,6 +69,44 @@ static rte_spinlock_t mlx5_shared_data_lock = 
RTE_SPINLOCK_INITIALIZER;
 /* Process local data for secondary processes. */
 static struct mlx5_local_data mlx5_local_data;
 
+/* rte flow indexed pool configuration. */
+static struct mlx5_indexed_pool_config icfg[] = {
+   {
+   .size = sizeof(struct rte_flow),
+   .trunk_size = 64,
+   .need_lock = 1,
+   .release_mem_en = 0,
+   .malloc = mlx5_malloc,
+   .free = mlx5_free,
+   .per_core_cache = 0,
+   .type = "ctl_flow_ipool",
+   },
+   {
+   .size = sizeof(struct rte_flow),
+   .trunk_size = 64,
+   .grow_trunk = 3,
+   .grow_shift = 2,
+   .need_lock = 1,
+   .release_mem_en = 0,
+   .malloc = mlx5_malloc,
+   .free = mlx5_free,
+   .per_core_cache = 1 << 14,
+   .type = "rte_flow_ipool",
+   },
+   {
+   .size = sizeof(struct rte_flow),
+   .trunk_size = 64,
+   .grow_trunk = 3,
+   .grow_shift = 2,
+   .need_lock = 1,
+   .release_mem_en = 0,
+   .malloc = mlx5_malloc,
+   .free = mlx5_free,
+   .per_core_cache = 0,
+   .type = "mcp_flow_ipool",
+   },
+};
+
 /**
  * Set the completion channel file descriptor interrupt as non-blocking.
  *
@@ -823,6 +861,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
int own_domain_id = 0;
uint16_t port_id;
struct mlx5_port_info vport_info = { .query_flags = 0 };
+   int i;
 
/* Determine if this port representor is supposed to be spawned. */
if (switch_info->representor && dpdk_dev->devargs &&
@@ -1566,7 +1605,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
  mlx5_ifindex(eth_dev),
  eth_dev->data->mac_addrs,
  MLX5_MAX_MAC_ADDRESSES);
-   priv->flows = 0;
priv->ctrl_flows = 0;
rte_spinlock_init(&priv->flow_list_lock);
TAILQ_INIT(&priv->flow_meters);
@@ -1600,6 +1638,14 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
mlx5_set_min_inline(spawn, config);
/* Store device configuration on private structure. */
priv->config = *config;
+   for (i = 0; i < MLX5_FLOW_TYPE_MAXI; i++) {
+   icfg[i].release_mem_en = !!config->reclaim_mode;
+   if (config->reclaim_mode)
+   icfg[i].per_core_cache = 0;
+   priv->flows[i] = mlx5_ipool_create(&icfg[i]);
+   if (!priv->flows[i])
+   goto error;
+   }
/* Create context for virtual machine VLAN workaround. */
priv->vmwa_context = mlx5_vlan_vmwa_init(eth_dev, spawn->ifindex);
if (config->dv_flow_en) {
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cf1815cb74..fcfc3dcdca 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -322,7 +322,8 @@ static const struct mlx5_indexed_pool_config 
mlx5_ipool_cfg[] = {
.grow_trunk = 3,
.grow_shift = 2,
.need_lock = 1,
-   .release_mem_en = 1,
+   .release_mem_en = 0,
+   .per_core_cache = 1 << 19,
.malloc = mlx5_malloc,
.free = mlx5_free,
.type = "mlx5_flow_handle_ipool",
@@ -792,8 +793,10 @@ mlx5_flow_ipool_create(struct mlx5_dev_ctx_shared *sh,
MLX5_FLOW_HANDLE_VERBS_SIZE;
break;
}
-   if (config->reclaim_mode)
+   if (config->reclaim_mode) {
cfg.release_mem_en = 1;
+   cfg.per_core_cache = 0;
+   }
sh->ipool[i] = mlx5_ipool_create(&cfg);
}
 }
@@ -1528,7 +1531,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 *

[dpdk-dev] [PATCH v2 06/22] net/mlx5: remove cache term from the list utility

2021-06-30 Thread Suanming Mou
From: Matan Azrad 

The internal mlx5 list tool is used mainly when the list objects need to
be synchronized between multiple threads.

The "cache" term is used in the internal mlx5 list API.

Upcoming enhancements to this tool will use the "cache" term for
per-thread cache management.

To prevent confusion, remove the current "cache" term from the API
names.

Signed-off-by: Matan Azrad 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/linux/mlx5_os.c   |  32 +-
 drivers/net/mlx5/mlx5.c|   2 +-
 drivers/net/mlx5/mlx5.h|  15 +-
 drivers/net/mlx5/mlx5_flow.h   |  88 ++---
 drivers/net/mlx5/mlx5_flow_dv.c| 558 ++---
 drivers/net/mlx5/mlx5_rx.h |  12 +-
 drivers/net/mlx5/mlx5_rxq.c|  28 +-
 drivers/net/mlx5/mlx5_utils.c  |  78 ++--
 drivers/net/mlx5/mlx5_utils.h  |  94 ++---
 drivers/net/mlx5/windows/mlx5_os.c |   7 +-
 10 files changed, 454 insertions(+), 460 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 31cc8d9eb8..9aa57e38b7 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -272,27 +272,27 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
goto error;
/* The resources below are only valid with DV support. */
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
-   /* Init port id action cache list. */
-   snprintf(s, sizeof(s), "%s_port_id_action_cache", sh->ibdev_name);
-   mlx5_cache_list_init(&sh->port_id_action_list, s, 0, sh,
+   /* Init port id action mlx5 list. */
+   snprintf(s, sizeof(s), "%s_port_id_action_list", sh->ibdev_name);
+   mlx5_list_create(&sh->port_id_action_list, s, 0, sh,
 flow_dv_port_id_create_cb,
 flow_dv_port_id_match_cb,
 flow_dv_port_id_remove_cb);
-   /* Init push vlan action cache list. */
-   snprintf(s, sizeof(s), "%s_push_vlan_action_cache", sh->ibdev_name);
-   mlx5_cache_list_init(&sh->push_vlan_action_list, s, 0, sh,
+   /* Init push vlan action mlx5 list. */
+   snprintf(s, sizeof(s), "%s_push_vlan_action_list", sh->ibdev_name);
+   mlx5_list_create(&sh->push_vlan_action_list, s, 0, sh,
 flow_dv_push_vlan_create_cb,
 flow_dv_push_vlan_match_cb,
 flow_dv_push_vlan_remove_cb);
-   /* Init sample action cache list. */
-   snprintf(s, sizeof(s), "%s_sample_action_cache", sh->ibdev_name);
-   mlx5_cache_list_init(&sh->sample_action_list, s, 0, sh,
+   /* Init sample action mlx5 list. */
+   snprintf(s, sizeof(s), "%s_sample_action_list", sh->ibdev_name);
+   mlx5_list_create(&sh->sample_action_list, s, 0, sh,
 flow_dv_sample_create_cb,
 flow_dv_sample_match_cb,
 flow_dv_sample_remove_cb);
-   /* Init dest array action cache list. */
-   snprintf(s, sizeof(s), "%s_dest_array_cache", sh->ibdev_name);
-   mlx5_cache_list_init(&sh->dest_array_list, s, 0, sh,
+   /* Init dest array action mlx5 list. */
+   snprintf(s, sizeof(s), "%s_dest_array_list", sh->ibdev_name);
+   mlx5_list_create(&sh->dest_array_list, s, 0, sh,
 flow_dv_dest_array_create_cb,
 flow_dv_dest_array_match_cb,
 flow_dv_dest_array_remove_cb);
@@ -500,8 +500,8 @@ mlx5_os_free_shared_dr(struct mlx5_priv *priv)
mlx5_release_tunnel_hub(sh, priv->dev_port);
sh->tunnel_hub = NULL;
}
-   mlx5_cache_list_destroy(&sh->port_id_action_list);
-   mlx5_cache_list_destroy(&sh->push_vlan_action_list);
+   mlx5_list_destroy(&sh->port_id_action_list);
+   mlx5_list_destroy(&sh->push_vlan_action_list);
mlx5_free_table_hash_list(priv);
 }
 
@@ -1702,7 +1702,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
err = ENOTSUP;
goto error;
}
-   mlx5_cache_list_init(&priv->hrxqs, "hrxq", 0, eth_dev,
+   mlx5_list_create(&priv->hrxqs, "hrxq", 0, eth_dev,
 mlx5_hrxq_create_cb,
 mlx5_hrxq_match_cb,
 mlx5_hrxq_remove_cb);
@@ -1761,7 +1761,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
mlx5_drop_action_destroy(eth_dev);
if (own_domain_id)
claim_zero(rte_eth_switch_domain_free(priv->domain_id));
-   mlx5_cache_list_destroy(&priv->hrxqs);
+   mlx5_list_destroy(&priv->hrxqs);
mlx5_free(priv);
if (eth_dev != NULL)
eth_dev->data->dev_private = NULL;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fcfc3dcdca..9aade013c5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@

[dpdk-dev] [PATCH v2 07/22] net/mlx5: add per lcore cache to the list utility

2021-06-30 Thread Suanming Mou
From: Matan Azrad 

When an mlx5 list object is accessed by multiple cores, the list lock
counter is written by all of them all the time, which increases cache
misses in the memory caches.

In addition, when one thread accesses the list for an add/remove/lookup
operation, all the other threads attempting an operation on the list
are stuck on the lock.

Add a per-lcore cache to allow lockless thread manipulations when the
list objects are mostly reused.

Synchronization with atomic operations is still needed in order to
allow threads to unregister an entry from another thread's cache.
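
Condensed from the register path of this series (a simplified sketch,
not the verbatim driver code; unregistered-thread handling is omitted),
the resulting lookup order is:

    struct mlx5_list_entry *
    list_lookup_cached(struct mlx5_list *list, void *ctx)
    {
        int lcore = rte_lcore_index(rte_lcore_id());
        struct mlx5_list_entry *entry;

        /* 1. Lockless lookup in this lcore's private cache. */
        entry = __list_lookup(list, lcore, ctx, true);
        if (entry != NULL)
            return entry;
        /* 2. Fall back to the shared list under the read lock; on a
         * hit the entry is cloned into the local cache for next time. */
        rte_rwlock_read_lock(&list->lock);
        entry = __list_lookup(list, RTE_MAX_LCORE, ctx, true);
        rte_rwlock_read_unlock(&list->lock);
        return entry;
    }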

Signed-off-by: Matan Azrad 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/linux/mlx5_os.c   |  58 
 drivers/net/mlx5/mlx5.h|   1 +
 drivers/net/mlx5/mlx5_flow.h   |  21 ++-
 drivers/net/mlx5/mlx5_flow_dv.c| 181 +++-
 drivers/net/mlx5/mlx5_rx.h |   5 +
 drivers/net/mlx5/mlx5_rxq.c|  71 +++---
 drivers/net/mlx5/mlx5_utils.c  | 214 ++---
 drivers/net/mlx5/mlx5_utils.h  |  30 ++--
 drivers/net/mlx5/windows/mlx5_os.c |   5 +-
 9 files changed, 451 insertions(+), 135 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 9aa57e38b7..8a043526da 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -272,30 +272,38 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
goto error;
/* The resources below are only valid with DV support. */
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
-   /* Init port id action mlx5 list. */
+   /* Init port id action list. */
snprintf(s, sizeof(s), "%s_port_id_action_list", sh->ibdev_name);
-   mlx5_list_create(&sh->port_id_action_list, s, 0, sh,
-flow_dv_port_id_create_cb,
-flow_dv_port_id_match_cb,
-flow_dv_port_id_remove_cb);
-   /* Init push vlan action mlx5 list. */
+   mlx5_list_create(&sh->port_id_action_list, s, sh,
+flow_dv_port_id_create_cb,
+flow_dv_port_id_match_cb,
+flow_dv_port_id_remove_cb,
+flow_dv_port_id_clone_cb,
+flow_dv_port_id_clone_free_cb);
+   /* Init push vlan action list. */
snprintf(s, sizeof(s), "%s_push_vlan_action_list", sh->ibdev_name);
-   mlx5_list_create(&sh->push_vlan_action_list, s, 0, sh,
-flow_dv_push_vlan_create_cb,
-flow_dv_push_vlan_match_cb,
-flow_dv_push_vlan_remove_cb);
-   /* Init sample action mlx5 list. */
+   mlx5_list_create(&sh->push_vlan_action_list, s, sh,
+flow_dv_push_vlan_create_cb,
+flow_dv_push_vlan_match_cb,
+flow_dv_push_vlan_remove_cb,
+flow_dv_push_vlan_clone_cb,
+flow_dv_push_vlan_clone_free_cb);
+   /* Init sample action list. */
snprintf(s, sizeof(s), "%s_sample_action_list", sh->ibdev_name);
-   mlx5_list_create(&sh->sample_action_list, s, 0, sh,
-flow_dv_sample_create_cb,
-flow_dv_sample_match_cb,
-flow_dv_sample_remove_cb);
-   /* Init dest array action mlx5 list. */
+   mlx5_list_create(&sh->sample_action_list, s, sh,
+flow_dv_sample_create_cb,
+flow_dv_sample_match_cb,
+flow_dv_sample_remove_cb,
+flow_dv_sample_clone_cb,
+flow_dv_sample_clone_free_cb);
+   /* Init dest array action list. */
snprintf(s, sizeof(s), "%s_dest_array_list", sh->ibdev_name);
-   mlx5_list_create(&sh->dest_array_list, s, 0, sh,
-flow_dv_dest_array_create_cb,
-flow_dv_dest_array_match_cb,
-flow_dv_dest_array_remove_cb);
+   mlx5_list_create(&sh->dest_array_list, s, sh,
+flow_dv_dest_array_create_cb,
+flow_dv_dest_array_match_cb,
+flow_dv_dest_array_remove_cb,
+flow_dv_dest_array_clone_cb,
+flow_dv_dest_array_clone_free_cb);
/* Create tags hash list table. */
snprintf(s, sizeof(s), "%s_tags", sh->ibdev_name);
sh->tag_table = mlx5_hlist_create(s, MLX5_TAGS_HLIST_ARRAY_SIZE, 0,
@@ -1702,10 +1710,12 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
err = ENOTSUP;
goto error;
}
-   mlx5_list_create(&priv->hrxqs, "hrxq", 0, eth_dev,
-mlx5_hrxq_create_cb,
-mlx5_hrxq_match_cb,
-mlx5_hrxq_remove_cb);
+   mlx5_list_create(&priv->hrxq

[dpdk-dev] [PATCH v2 08/22] net/mlx5: minimize list critical sections

2021-06-30 Thread Suanming Mou
From: Matan Azrad 

The mlx5 internal list utility is thread safe.

In order to synchronize list access between the threads, an RW lock is
taken for the critical sections.

The create/remove/clone/clone_free operations are in the critical
sections. These operations are heavy and make the critical sections
heavy, because they perform memory and other resource allocations and
deallocations.

Move the operations out of the critical sections and use a generation
counter in order to detect parallel allocations.
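
A simplified sketch of the generation-counter pattern (error paths and
the local-cache clone step of the real mlx5_list_register() below are
omitted):

    static struct mlx5_list_entry *
    list_register_optimistic(struct mlx5_list *list, void *ctx)
    {
        struct mlx5_list_entry *entry, *oentry;
        uint32_t prev_gen_cnt;

        rte_rwlock_read_lock(&list->lock);
        prev_gen_cnt = list->gen_cnt;   /* sample under the read lock */
        rte_rwlock_read_unlock(&list->lock);
        entry = list->cb_create(list, NULL, ctx); /* heavy, unlocked */
        if (entry == NULL)
            return NULL;
        entry->ref_cnt = 1u;
        rte_rwlock_write_lock(&list->lock);
        if (prev_gen_cnt != list->gen_cnt) {
            /* The list changed meanwhile: check for a real race. */
            oentry = __list_lookup(list, RTE_MAX_LCORE, ctx, true);
            if (oentry != NULL) {
                rte_rwlock_write_unlock(&list->lock);
                list->cb_remove(list, entry); /* drop our duplicate */
                return oentry;
            }
        }
        LIST_INSERT_HEAD(&list->cache[RTE_MAX_LCORE].h, entry, next);
        list->gen_cnt++;
        rte_rwlock_write_unlock(&list->lock);
        return entry;
    }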

Signed-off-by: Matan Azrad 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_utils.c | 86 ++-
 drivers/net/mlx5/mlx5_utils.h |  5 +-
 2 files changed, 48 insertions(+), 43 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_utils.c b/drivers/net/mlx5/mlx5_utils.c
index 51cca68ea9..772b352af5 100644
--- a/drivers/net/mlx5/mlx5_utils.c
+++ b/drivers/net/mlx5/mlx5_utils.c
@@ -101,7 +101,7 @@ mlx5_list_cache_insert(struct mlx5_list *list, int 
lcore_index,
 {
struct mlx5_list_entry *lentry = list->cb_clone(list, gentry, ctx);
 
-   if (!lentry)
+   if (unlikely(!lentry))
return NULL;
lentry->ref_cnt = 1u;
lentry->gentry = gentry;
@@ -112,8 +112,8 @@ mlx5_list_cache_insert(struct mlx5_list *list, int 
lcore_index,
 struct mlx5_list_entry *
 mlx5_list_register(struct mlx5_list *list, void *ctx)
 {
-   struct mlx5_list_entry *entry, *lentry;
-   uint32_t prev_gen_cnt = 0;
+   struct mlx5_list_entry *entry, *local_entry;
+   volatile uint32_t prev_gen_cnt = 0;
int lcore_index = rte_lcore_index(rte_lcore_id());
 
MLX5_ASSERT(list);
@@ -122,51 +122,56 @@ mlx5_list_register(struct mlx5_list *list, void *ctx)
rte_errno = ENOTSUP;
return NULL;
}
-   /* Lookup in local cache. */
-   lentry = __list_lookup(list, lcore_index, ctx, true);
-   if (lentry)
-   return lentry;
-   /* Lookup with read lock, reuse if found. */
+   /* 1. Lookup in local cache. */
+   local_entry = __list_lookup(list, lcore_index, ctx, true);
+   if (local_entry)
+   return local_entry;
+   /* 2. Lookup with read lock on global list, reuse if found. */
rte_rwlock_read_lock(&list->lock);
entry = __list_lookup(list, RTE_MAX_LCORE, ctx, true);
-   if (entry == NULL) {
-   prev_gen_cnt = __atomic_load_n(&list->gen_cnt,
-  __ATOMIC_ACQUIRE);
-   rte_rwlock_read_unlock(&list->lock);
-   } else {
+   if (likely(entry)) {
rte_rwlock_read_unlock(&list->lock);
return mlx5_list_cache_insert(list, lcore_index, entry, ctx);
}
-   /* Not found, append with write lock - block read from other threads. */
+   prev_gen_cnt = list->gen_cnt;
+   rte_rwlock_read_unlock(&list->lock);
+   /* 3. Prepare new entry for global list and for cache. */
+   entry = list->cb_create(list, entry, ctx);
+   if (unlikely(!entry))
+   return NULL;
+   local_entry = list->cb_clone(list, entry, ctx);
+   if (unlikely(!local_entry)) {
+   list->cb_remove(list, entry);
+   return NULL;
+   }
+   entry->ref_cnt = 1u;
+   local_entry->ref_cnt = 1u;
+   local_entry->gentry = entry;
rte_rwlock_write_lock(&list->lock);
-   /* If list changed by other threads before lock, search again. */
-   if (prev_gen_cnt != __atomic_load_n(&list->gen_cnt, __ATOMIC_ACQUIRE)) {
-   /* Lookup and reuse w/o read lock. */
-   entry = __list_lookup(list, RTE_MAX_LCORE, ctx, true);
-   if (entry) {
+   /* 4. Make sure the same entry was not created before the write lock. */
+   if (unlikely(prev_gen_cnt != list->gen_cnt)) {
+   struct mlx5_list_entry *oentry = __list_lookup(list,
+  RTE_MAX_LCORE,
+  ctx, true);
+
+   if (unlikely(oentry)) {
+   /* 4.5. Found real race!!, reuse the old entry. */
rte_rwlock_write_unlock(&list->lock);
-   return mlx5_list_cache_insert(list, lcore_index, entry,
- ctx);
-   }
-   }
-   entry = list->cb_create(list, entry, ctx);
-   if (entry) {
-   lentry = mlx5_list_cache_insert(list, lcore_index, entry, ctx);
-   if (!lentry) {
list->cb_remove(list, entry);
-   } else {
-   entry->ref_cnt = 1u;
-   LIST_INSERT_HEAD(&list->cache[RTE_MAX_LCORE].h, entry,
-next);
-   __atomic_add_fetch(&list->gen_cnt, 1, __ATOMIC_RELEASE);
-   __atomic_add_fetch(

[dpdk-dev] [PATCH v2 09/22] net/mlx5: manage list cache entries release

2021-06-30 Thread Suanming Mou
From: Matan Azrad 

When a cache entry is allocated by lcore A and is released by lcore B,
the driver should synchronize the cache list access of lcore A.

The design decision is to manage a counter per lcore cache that is
increased atomically when a non-original lcore decreases the reference
counter of a cache entry to 0.

In the list register operation, before the running lcore starts a
lookup in its cache, it checks this counter in order to free the
invalidated entries in its cache.
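
The release side then splits on entry ownership, roughly as follows (a
sketch condensed from mlx5_list_unregister() in this patch):

    static void
    list_entry_release(struct mlx5_list *list, struct mlx5_list_entry *entry)
    {
        int lcore_idx = rte_lcore_index(rte_lcore_id());

        if (entry->lcore_idx == (uint32_t)lcore_idx) {
            /* Same lcore that allocated it: free immediately. */
            LIST_REMOVE(entry, next);
            list->cb_clone_free(list, entry);
        } else {
            /* Foreign lcore: only mark it; the owner frees it in
             * __list_cache_clean() on its next register call. */
            __atomic_add_fetch(&list->cache[entry->lcore_idx].inv_cnt,
                               1, __ATOMIC_RELAXED);
        }
    }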

Signed-off-by: Matan Azrad 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_utils.c | 79 +++
 drivers/net/mlx5/mlx5_utils.h |  2 +
 2 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_utils.c b/drivers/net/mlx5/mlx5_utils.c
index 772b352af5..7cdf44dcf7 100644
--- a/drivers/net/mlx5/mlx5_utils.c
+++ b/drivers/net/mlx5/mlx5_utils.c
@@ -47,36 +47,25 @@ __list_lookup(struct mlx5_list *list, int lcore_index, void 
*ctx, bool reuse)
uint32_t ret;
 
while (entry != NULL) {
-   struct mlx5_list_entry *nentry = LIST_NEXT(entry, next);
-
-   if (list->cb_match(list, entry, ctx)) {
-   if (lcore_index < RTE_MAX_LCORE) {
+   if (list->cb_match(list, entry, ctx) == 0) {
+   if (reuse) {
+   ret = __atomic_add_fetch(&entry->ref_cnt, 1,
+__ATOMIC_ACQUIRE) - 1;
+   DRV_LOG(DEBUG, "mlx5 list %s entry %p ref: %u.",
+   list->name, (void *)entry,
+   entry->ref_cnt);
+   } else if (lcore_index < RTE_MAX_LCORE) {
ret = __atomic_load_n(&entry->ref_cnt,
  __ATOMIC_ACQUIRE);
-   if (ret == 0) {
-   LIST_REMOVE(entry, next);
-   list->cb_clone_free(list, entry);
-   }
-   }
-   entry = nentry;
-   continue;
-   }
-   if (reuse) {
-   ret = __atomic_add_fetch(&entry->ref_cnt, 1,
-__ATOMIC_ACQUIRE);
-   if (ret == 1u) {
-   /* Entry was invalid before, free it. */
-   LIST_REMOVE(entry, next);
-   list->cb_clone_free(list, entry);
-   entry = nentry;
-   continue;
}
-   DRV_LOG(DEBUG, "mlx5 list %s entry %p ref++: %u.",
-   list->name, (void *)entry, entry->ref_cnt);
+   if (likely(ret != 0 || lcore_index == RTE_MAX_LCORE))
+   return entry;
+   if (reuse && ret == 0)
+   entry->ref_cnt--; /* Invalid entry. */
}
-   break;
+   entry = LIST_NEXT(entry, next);
}
-   return entry;
+   return NULL;
 }
 
 struct mlx5_list_entry *
@@ -105,10 +94,31 @@ mlx5_list_cache_insert(struct mlx5_list *list, int 
lcore_index,
return NULL;
lentry->ref_cnt = 1u;
lentry->gentry = gentry;
+   lentry->lcore_idx = (uint32_t)lcore_index;
LIST_INSERT_HEAD(&list->cache[lcore_index].h, lentry, next);
return lentry;
 }
 
+static void
+__list_cache_clean(struct mlx5_list *list, int lcore_index)
+{
+   struct mlx5_list_cache *c = &list->cache[lcore_index];
+   struct mlx5_list_entry *entry = LIST_FIRST(&c->h);
+   uint32_t inv_cnt = __atomic_exchange_n(&c->inv_cnt, 0,
+  __ATOMIC_RELAXED);
+
+   while (inv_cnt != 0 && entry != NULL) {
+   struct mlx5_list_entry *nentry = LIST_NEXT(entry, next);
+
+   if (__atomic_load_n(&entry->ref_cnt, __ATOMIC_RELAXED) == 0) {
+   LIST_REMOVE(entry, next);
+   list->cb_clone_free(list, entry);
+   inv_cnt--;
+   }
+   entry = nentry;
+   }
+}
+
 struct mlx5_list_entry *
 mlx5_list_register(struct mlx5_list *list, void *ctx)
 {
@@ -122,6 +132,8 @@ mlx5_list_register(struct mlx5_list *list, void *ctx)
rte_errno = ENOTSUP;
return NULL;
}
+   /* 0. Free entries that were invalidated by other lcores. */
+   __list_cache_clean(list, lcore_index);
/* 1. Lookup in local cache. */
local_entry = __list_lookup(list, lcore_index, ctx, true);
if (local_entry)
@@ -147,6 +159,7 @@ mlx5_list_register(struct mlx5_list *list, void *ctx)
entry->ref_cnt = 1u;
local_entr

[dpdk-dev] [PATCH v2 10/22] net/mlx5: relax the list utility atomic operations

2021-06-30 Thread Suanming Mou
From: Matan Azrad 

The atomic operations in the list utility do not need barriers because
the critical parts are managed by the RW lock.

Relax them.
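
The rationale in miniature (a sketch following the unregister hunk
below): the counter only decides who reaches the lock, and the lock
itself publishes and orders the list state, so relaxed ordering is
sufficient:

    static int
    gentry_release(struct mlx5_list *list, struct mlx5_list_entry *gentry)
    {
        /* Relaxed is enough: no data is published by this decrement. */
        if (__atomic_sub_fetch(&gentry->ref_cnt, 1, __ATOMIC_RELAXED) != 0)
            return 1;               /* still referenced elsewhere */
        rte_rwlock_write_lock(&list->lock); /* the lock orders the check */
        if (likely(gentry->ref_cnt == 0)) {
            LIST_REMOVE(gentry, next);
            rte_rwlock_write_unlock(&list->lock);
            list->cb_remove(list, gentry);
            return 0;
        }
        rte_rwlock_write_unlock(&list->lock);
        return 1;
    }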

Signed-off-by: Matan Azrad 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5_utils.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_utils.c b/drivers/net/mlx5/mlx5_utils.c
index 7cdf44dcf7..29248c80ed 100644
--- a/drivers/net/mlx5/mlx5_utils.c
+++ b/drivers/net/mlx5/mlx5_utils.c
@@ -50,13 +50,13 @@ __list_lookup(struct mlx5_list *list, int lcore_index, void 
*ctx, bool reuse)
if (list->cb_match(list, entry, ctx) == 0) {
if (reuse) {
ret = __atomic_add_fetch(&entry->ref_cnt, 1,
-__ATOMIC_ACQUIRE) - 1;
+__ATOMIC_RELAXED) - 1;
DRV_LOG(DEBUG, "mlx5 list %s entry %p ref: %u.",
list->name, (void *)entry,
entry->ref_cnt);
} else if (lcore_index < RTE_MAX_LCORE) {
ret = __atomic_load_n(&entry->ref_cnt,
- __ATOMIC_ACQUIRE);
+ __ATOMIC_RELAXED);
}
if (likely(ret != 0 || lcore_index == RTE_MAX_LCORE))
return entry;
@@ -181,7 +181,7 @@ mlx5_list_register(struct mlx5_list *list, void *ctx)
list->gen_cnt++;
rte_rwlock_write_unlock(&list->lock);
LIST_INSERT_HEAD(&list->cache[lcore_index].h, local_entry, next);
-   __atomic_add_fetch(&list->count, 1, __ATOMIC_ACQUIRE);
+   __atomic_add_fetch(&list->count, 1, __ATOMIC_RELAXED);
DRV_LOG(DEBUG, "mlx5 list %s entry %p new: %u.", list->name,
(void *)entry, entry->ref_cnt);
return local_entry;
@@ -194,7 +194,7 @@ mlx5_list_unregister(struct mlx5_list *list,
struct mlx5_list_entry *gentry = entry->gentry;
int lcore_idx;
 
-   if (__atomic_sub_fetch(&entry->ref_cnt, 1, __ATOMIC_ACQUIRE) != 0)
+   if (__atomic_sub_fetch(&entry->ref_cnt, 1, __ATOMIC_RELAXED) != 0)
return 1;
lcore_idx = rte_lcore_index(rte_lcore_id());
MLX5_ASSERT(lcore_idx < RTE_MAX_LCORE);
@@ -207,14 +207,14 @@ mlx5_list_unregister(struct mlx5_list *list,
} else {
return 0;
}
-   if (__atomic_sub_fetch(&gentry->ref_cnt, 1, __ATOMIC_ACQUIRE) != 0)
+   if (__atomic_sub_fetch(&gentry->ref_cnt, 1, __ATOMIC_RELAXED) != 0)
return 1;
rte_rwlock_write_lock(&list->lock);
if (likely(gentry->ref_cnt == 0)) {
LIST_REMOVE(gentry, next);
rte_rwlock_write_unlock(&list->lock);
list->cb_remove(list, gentry);
-   __atomic_sub_fetch(&list->count, 1, __ATOMIC_ACQUIRE);
+   __atomic_sub_fetch(&list->count, 1, __ATOMIC_RELAXED);
DRV_LOG(DEBUG, "mlx5 list %s entry %p removed.",
list->name, (void *)gentry);
return 0;
-- 
2.25.1



[dpdk-dev] [PATCH v2 11/22] net/mlx5: allocate list memory by the create API

2021-06-30 Thread Suanming Mou
From: Matan Azrad 

Currently, the list memory is allocated by the list API caller.

Move the allocation into the create API in order to keep consistency
with the hlist utility.

Signed-off-by: Matan Azrad 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/linux/mlx5_os.c   | 105 -
 drivers/net/mlx5/mlx5.c|   3 +-
 drivers/net/mlx5/mlx5.h|  10 +--
 drivers/net/mlx5/mlx5_flow.h   |   2 +-
 drivers/net/mlx5/mlx5_flow_dv.c|  56 ---
 drivers/net/mlx5/mlx5_rxq.c|   6 +-
 drivers/net/mlx5/mlx5_utils.c  |  19 --
 drivers/net/mlx5/mlx5_utils.h  |  15 ++---
 drivers/net/mlx5/windows/mlx5_os.c |   2 +-
 9 files changed, 137 insertions(+), 81 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 8a043526da..87b63d852b 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -274,36 +274,44 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
/* Init port id action list. */
snprintf(s, sizeof(s), "%s_port_id_action_list", sh->ibdev_name);
-   mlx5_list_create(&sh->port_id_action_list, s, sh,
-flow_dv_port_id_create_cb,
-flow_dv_port_id_match_cb,
-flow_dv_port_id_remove_cb,
-flow_dv_port_id_clone_cb,
-flow_dv_port_id_clone_free_cb);
+   sh->port_id_action_list = mlx5_list_create(s, sh,
+  flow_dv_port_id_create_cb,
+  flow_dv_port_id_match_cb,
+  flow_dv_port_id_remove_cb,
+  flow_dv_port_id_clone_cb,
+flow_dv_port_id_clone_free_cb);
+   if (!sh->port_id_action_list)
+   goto error;
/* Init push vlan action list. */
snprintf(s, sizeof(s), "%s_push_vlan_action_list", sh->ibdev_name);
-   mlx5_list_create(&sh->push_vlan_action_list, s, sh,
-flow_dv_push_vlan_create_cb,
-flow_dv_push_vlan_match_cb,
-flow_dv_push_vlan_remove_cb,
-flow_dv_push_vlan_clone_cb,
-flow_dv_push_vlan_clone_free_cb);
+   sh->push_vlan_action_list = mlx5_list_create(s, sh,
+   flow_dv_push_vlan_create_cb,
+   flow_dv_push_vlan_match_cb,
+   flow_dv_push_vlan_remove_cb,
+   flow_dv_push_vlan_clone_cb,
+  flow_dv_push_vlan_clone_free_cb);
+   if (!sh->push_vlan_action_list)
+   goto error;
/* Init sample action list. */
snprintf(s, sizeof(s), "%s_sample_action_list", sh->ibdev_name);
-   mlx5_list_create(&sh->sample_action_list, s, sh,
-flow_dv_sample_create_cb,
-flow_dv_sample_match_cb,
-flow_dv_sample_remove_cb,
-flow_dv_sample_clone_cb,
-flow_dv_sample_clone_free_cb);
+   sh->sample_action_list = mlx5_list_create(s, sh,
+ flow_dv_sample_create_cb,
+ flow_dv_sample_match_cb,
+ flow_dv_sample_remove_cb,
+ flow_dv_sample_clone_cb,
+ flow_dv_sample_clone_free_cb);
+   if (!sh->sample_action_list)
+   goto error;
/* Init dest array action list. */
snprintf(s, sizeof(s), "%s_dest_array_list", sh->ibdev_name);
-   mlx5_list_create(&sh->dest_array_list, s, sh,
-flow_dv_dest_array_create_cb,
-flow_dv_dest_array_match_cb,
-flow_dv_dest_array_remove_cb,
-flow_dv_dest_array_clone_cb,
-flow_dv_dest_array_clone_free_cb);
+   sh->dest_array_list = mlx5_list_create(s, sh,
+  flow_dv_dest_array_create_cb,
+  flow_dv_dest_array_match_cb,
+  flow_dv_dest_array_remove_cb,
+  flow_dv_dest_array_clone_cb,
+ flow_dv_dest_array_clone_free_cb);
+   if (!sh->dest_array_list)
+   goto error;
/* Create tags hash list table. */
snprintf(s, sizeof(s), "%s_tags", sh->ibdev_name);
sh->tag_table = mlx5_hlist_create(s, 

[dpdk-dev] [PATCH v2 13/22] net/mlx5: move modify header allocator to ipool

2021-06-30 Thread Suanming Mou
From: Matan Azrad 

Modify header actions are allocated by mlx5_malloc, which has a large
memory and allocation-time overhead.

One of the action types under the modify header object is SET_TAG.

The SET_TAG action is commonly not reused across flows, and each flow
has its own value.

Hence, mlx5_malloc becomes a bottleneck for the flow insertion rate in
the common SET_TAG cases.

Use the ipool allocator for the SET_TAG action.

The ipool allocator has lower memory overhead, a higher insertion rate,
and a better synchronization mechanism in multithreaded cases.

A separate ipool is created for each possible size of modify header
handler.
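
The sizing rule behind the per-size pools follows from the resource's
flexible-array layout (a sketch; the helper name is illustrative, the
arithmetic matches flow_dv_modify_ipool_get() introduced below):

    /* Pool index i serves resources carrying i + 1 modification
     * commands, so the entry size is fixed per pool. */
    static size_t
    modify_hdr_entry_size(uint8_t pool_index)
    {
        return sizeof(struct mlx5_flow_dv_modify_hdr_resource) +
               (pool_index + 1) * sizeof(struct mlx5_modification_cmd);
    }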

Signed-off-by: Matan Azrad 
Acked-by: Suanming Mou 
---
 drivers/net/mlx5/mlx5.c |  4 ++
 drivers/net/mlx5/mlx5.h | 14 ++
 drivers/net/mlx5/mlx5_flow.h| 14 +-
 drivers/net/mlx5/mlx5_flow_dv.c | 79 -
 4 files changed, 86 insertions(+), 25 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 0e80408511..713accf675 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -801,6 +801,7 @@ mlx5_flow_ipool_create(struct mlx5_dev_ctx_shared *sh,
}
 }
 
+
 /**
  * Release the flow resources' indexed mempool.
  *
@@ -814,6 +815,9 @@ mlx5_flow_ipool_destroy(struct mlx5_dev_ctx_shared *sh)
 
for (i = 0; i < MLX5_IPOOL_MAX; ++i)
mlx5_ipool_destroy(sh->ipool[i]);
+   for (i = 0; i < MLX5_MAX_MODIFY_NUM; ++i)
+   if (sh->mdh_ipools[i])
+   mlx5_ipool_destroy(sh->mdh_ipools[i]);
 }
 
 /*
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 896c0c2da1..5774f63244 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -36,6 +36,19 @@
 
 #define MLX5_SH(dev) (((struct mlx5_priv *)(dev)->data->dev_private)->sh)
 
+/*
+ * Number of modification commands.
+ * The maximal actions amount in FW is some constant, and it is 16 in the
+ * latest releases. In some old releases, it will be limited to 8.
+ * Since there is no interface to query the capacity, the maximal value should
+ * be used to allow PMD to create the flow. The validation will be done in the
+ * lower driver layer or FW. A failure will be returned if exceeds the maximal
+ * supported actions number on the root table.
+ * On non-root tables, there is no limitation, but 32 is enough right now.
+ */
+#define MLX5_MAX_MODIFY_NUM32
+#define MLX5_ROOT_TBL_MODIFY_NUM   16
+
 enum mlx5_ipool_index {
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
MLX5_IPOOL_DECAP_ENCAP = 0, /* Pool for encap/decap resource. */
@@ -1123,6 +1136,7 @@ struct mlx5_dev_ctx_shared {
struct mlx5_flow_counter_mng cmng; /* Counters management structure. */
void *default_miss_action; /* Default miss action. */
struct mlx5_indexed_pool *ipool[MLX5_IPOOL_MAX];
+   struct mlx5_indexed_pool *mdh_ipools[MLX5_MAX_MODIFY_NUM];
/* Memory Pool for mlx5 flow resources. */
struct mlx5_l3t_tbl *cnt_id_tbl; /* Shared counter lookup table. */
/* Shared interrupt handler section. */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index ab4e8c5c4f..4552aaa803 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -504,23 +504,11 @@ struct mlx5_flow_dv_tag_resource {
uint32_t tag_id; /**< Tag ID. */
 };
 
-/*
- * Number of modification commands.
- * The maximal actions amount in FW is some constant, and it is 16 in the
- * latest releases. In some old releases, it will be limited to 8.
- * Since there is no interface to query the capacity, the maximal value should
- * be used to allow PMD to create the flow. The validation will be done in the
- * lower driver layer or FW. A failure will be returned if exceeds the maximal
- * supported actions number on the root table.
- * On non-root tables, there is no limitation, but 32 is enough right now.
- */
-#define MLX5_MAX_MODIFY_NUM32
-#define MLX5_ROOT_TBL_MODIFY_NUM   16
-
 /* Modify resource structure */
 struct mlx5_flow_dv_modify_hdr_resource {
struct mlx5_list_entry entry;
void *action; /**< Modify header action object. */
+   uint32_t idx;
/* Key area for hash list matching: */
uint8_t ft_type; /**< Flow table type, Rx or Tx. */
uint8_t actions_num; /**< Number of modification actions. */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1dfd5db361..08704d892a 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5304,6 +5304,45 @@ flow_dv_modify_match_cb(void *tool_ctx __rte_unused,
   memcmp(&ref->ft_type, &resource->ft_type, key_len);
 }
 
+static struct mlx5_indexed_pool *
+flow_dv_modify_ipool_get(struct mlx5_dev_ctx_shared *sh, uint8_t index)
+{
+   struct mlx5_indexed_pool *ipool = __atomic_load_n
+  

[dpdk-dev] [PATCH v2 12/22] common/mlx5: add per-lcore cache to hash list utility

2021-06-30 Thread Suanming Mou
From: Matan Azrad 

Use the mlx5 list utility object in the hlist buckets.

This patch moves the list utility object to the common utility and
creates the clone operations for all the hlist instances in the driver.

It also adjusts all the utility callbacks to be generic for both list
and hlist.
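
Schematically (a sketch with illustrative names; rte_hash_crc_8byte()
comes from rte_hash_crc.h), a hash list then becomes an array of mlx5
lists, and register degenerates to a bucket pick plus a list register:

    struct hlist_sketch {
        uint32_t mask;              /* bucket count - 1, power of two */
        struct mlx5_list buckets[]; /* one synchronized list each */
    };

    static struct mlx5_list_entry *
    hlist_register_sketch(struct hlist_sketch *h, uint64_t key, void *ctx)
    {
        uint32_t idx = rte_hash_crc_8byte(key, 0) & h->mask;

        return mlx5_list_register(&h->buckets[idx], ctx);
    }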

Signed-off-by: Matan Azrad 
Acked-by: Suanming Mou 
---
 doc/guides/nics/mlx5.rst|   5 +
 doc/guides/rel_notes/release_21_08.rst  |   6 +
 drivers/common/mlx5/mlx5_common.h   |   2 +
 drivers/common/mlx5/mlx5_common_utils.c | 466 +---
 drivers/common/mlx5/mlx5_common_utils.h | 261 +
 drivers/net/mlx5/linux/mlx5_os.c|  46 +--
 drivers/net/mlx5/mlx5.c |  10 +-
 drivers/net/mlx5/mlx5.h |   1 +
 drivers/net/mlx5/mlx5_flow.c| 155 +---
 drivers/net/mlx5/mlx5_flow.h| 185 +-
 drivers/net/mlx5/mlx5_flow_dv.c | 407 -
 drivers/net/mlx5/mlx5_rx.h  |  13 +-
 drivers/net/mlx5/mlx5_rxq.c |  53 +--
 drivers/net/mlx5/mlx5_utils.c   | 251 -
 drivers/net/mlx5/mlx5_utils.h   | 197 --
 drivers/net/mlx5/windows/mlx5_os.c  |   2 +-
 16 files changed, 1016 insertions(+), 1044 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index eb44a070b1..9bd3846e0d 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -445,6 +445,11 @@ Limitations
   - 256 ports maximum.
   - 4M connections maximum.
 
+- Multiple-thread flow insertion:
+
+  - In order to achieve the best insertion rate, the application should
+    manage the flows per lcore.
+  - It is better to configure ``reclaim_mem_mode`` as 0 to accelerate the
+    flow object allocation and release with the cache.
+
 Statistics
 --
 
diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..f6cd1d137d 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -55,6 +55,12 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Updated Mellanox mlx5 driver.**
+
+  Updated the Mellanox mlx5 driver with new features and improvements, 
including:
+
+  * Optimize multiple-thread flow insertion rate.
+
 
 Removed Items
 -
diff --git a/drivers/common/mlx5/mlx5_common.h 
b/drivers/common/mlx5/mlx5_common.h
index 1fbefe0fa6..1809ff1e95 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -14,6 +14,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include "mlx5_prm.h"
diff --git a/drivers/common/mlx5/mlx5_common_utils.c 
b/drivers/common/mlx5/mlx5_common_utils.c
index ad2011e858..4e385c616a 100644
--- a/drivers/common/mlx5/mlx5_common_utils.c
+++ b/drivers/common/mlx5/mlx5_common_utils.c
@@ -11,39 +11,324 @@
 #include "mlx5_common_utils.h"
 #include "mlx5_common_log.h"
 
-/* Hash List **/
+/* mlx5 list /
+
+static int
+mlx5_list_init(struct mlx5_list *list, const char *name, void *ctx,
+  bool lcores_share, mlx5_list_create_cb cb_create,
+  mlx5_list_match_cb cb_match,
+  mlx5_list_remove_cb cb_remove,
+  mlx5_list_clone_cb cb_clone,
+  mlx5_list_clone_free_cb cb_clone_free)
+{
+   int i;
+
+   if (!cb_match || !cb_create || !cb_remove || !cb_clone ||
+   !cb_clone_free) {
+   rte_errno = EINVAL;
+   return -EINVAL;
+   }
+   if (name)
+   snprintf(list->name, sizeof(list->name), "%s", name);
+   list->ctx = ctx;
+   list->lcores_share = lcores_share;
+   list->cb_create = cb_create;
+   list->cb_match = cb_match;
+   list->cb_remove = cb_remove;
+   list->cb_clone = cb_clone;
+   list->cb_clone_free = cb_clone_free;
+   rte_rwlock_init(&list->lock);
+   DRV_LOG(DEBUG, "mlx5 list %s initialized.", list->name);
+   for (i = 0; i <= RTE_MAX_LCORE; i++)
+   LIST_INIT(&list->cache[i].h);
+   return 0;
+}
+
+struct mlx5_list *
+mlx5_list_create(const char *name, void *ctx, bool lcores_share,
+mlx5_list_create_cb cb_create,
+mlx5_list_match_cb cb_match,
+mlx5_list_remove_cb cb_remove,
+mlx5_list_clone_cb cb_clone,
+mlx5_list_clone_free_cb cb_clone_free)
+{
+   struct mlx5_list *list;
+
+   list = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*list), 0, SOCKET_ID_ANY);
+   if (!list)
+   return NULL;
+   if (mlx5_list_init(list, name, ctx, lcores_share,
+  cb_create, cb_match, cb_remove, cb_clone,
+  cb_clone_free) != 0) {
+   mlx5_free(list);
+   return NULL;
+   }
+   return li

[dpdk-dev] [PATCH v2 14/22] net/mlx5: adjust the hash bucket size

2021-06-30 Thread Suanming Mou
With the new per-core optimization to the list, the hash bucket size
can be tuned to a more accurate number.

This commit adjusts the hash bucket size.

Signed-off-by: Suanming Mou 
---
 drivers/net/mlx5/linux/mlx5_os.c | 2 +-
 drivers/net/mlx5/mlx5.c  | 2 +-
 drivers/net/mlx5/mlx5_defs.h | 6 +++---
 drivers/net/mlx5/mlx5_flow.c | 5 ++---
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index cf573a9a4d..a82dc4db00 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -50,7 +50,7 @@
 #include "mlx5_nl.h"
 #include "mlx5_devx.h"
 
-#define MLX5_TAGS_HLIST_ARRAY_SIZE 8192
+#define MLX5_TAGS_HLIST_ARRAY_SIZE (1 << 15)
 
 #ifndef HAVE_IBV_MLX5_MOD_MPW
 #define MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED (1 << 2)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 713accf675..8fb7f4442d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -373,7 +373,7 @@ static const struct mlx5_indexed_pool_config 
mlx5_ipool_cfg[] = {
 #define MLX5_FLOW_MIN_ID_POOL_SIZE 512
 #define MLX5_ID_GENERATION_ARRAY_FACTOR 16
 
-#define MLX5_FLOW_TABLE_HLIST_ARRAY_SIZE 4096
+#define MLX5_FLOW_TABLE_HLIST_ARRAY_SIZE 1024
 
 /**
  * Decide whether representor ID is a HPF(host PF) port on BF2.
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 906aa43c5a..ca67ce8213 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -178,15 +178,15 @@
 sizeof(struct rte_ipv4_hdr))
 
 /* Size of the simple hash table for metadata register table. */
-#define MLX5_FLOW_MREG_HTABLE_SZ 4096
+#define MLX5_FLOW_MREG_HTABLE_SZ 64
 #define MLX5_FLOW_MREG_HNAME "MARK_COPY_TABLE"
 #define MLX5_DEFAULT_COPY_ID UINT32_MAX
 
 /* Size of the simple hash table for header modify table. */
-#define MLX5_FLOW_HDR_MODIFY_HTABLE_SZ (1 << 16)
+#define MLX5_FLOW_HDR_MODIFY_HTABLE_SZ (1 << 15)
 
 /* Size of the simple hash table for encap decap table. */
-#define MLX5_FLOW_ENCAP_DECAP_HTABLE_SZ (1 << 16)
+#define MLX5_FLOW_ENCAP_DECAP_HTABLE_SZ (1 << 12)
 
 /* Hairpin TX/RX queue configuration parameters. */
 #define MLX5_HAIRPIN_QUEUE_STRIDE 6
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index e6b71a87a0..cb8161f668 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -8625,7 +8625,7 @@ mlx5_flow_tunnel_allocate(struct rte_eth_dev *dev,
DRV_LOG(ERR, "Tunnel ID %d exceed max limit.", id);
return NULL;
}
-   tunnel->groups = mlx5_hlist_create("tunnel groups", 1024, false, true,
+   tunnel->groups = mlx5_hlist_create("tunnel groups", 64, false, true,
   priv->sh,
   mlx5_flow_tunnel_grp2tbl_create_cb,
   mlx5_flow_tunnel_grp2tbl_match_cb,
@@ -8734,8 +8734,7 @@ int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh)
return -ENOMEM;
LIST_INIT(&thub->tunnels);
rte_spinlock_init(&thub->sl);
-   thub->groups = mlx5_hlist_create("flow groups",
-rte_align32pow2(MLX5_MAX_TABLES),
+   thub->groups = mlx5_hlist_create("flow groups", 64,
 false, true, sh,
 mlx5_flow_tunnel_grp2tbl_create_cb,
 mlx5_flow_tunnel_grp2tbl_match_cb,
-- 
2.25.1



[dpdk-dev] [PATCH v2 15/22] common/mlx5: allocate cache list memory individually

2021-06-30 Thread Suanming Mou
Currently, the list's local cache instance memory is allocated together
with the list. The local cache instance array size is RTE_MAX_LCORE,
while in most cases the system has only a limited number of cores, so
allocating the instance memory individually per core is more
memory-efficient.

This commit changes the instance array to a pointer array and allocates
the local cache memory only when a core is actually used.
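
The saving is easy to quantify (an example with assumed sizes: 64-byte
cache instances, 8-byte pointers, RTE_MAX_LCORE = 128): the embedded
array always costs about 8 KiB per list, while the pointer array plus
on-demand instances cost about 1.3 KiB on a 4-core system:

    /* Footprint comparison under the assumptions above; the +1 slot
     * is the global cache shared by all lcores. */
    static size_t
    eager_footprint(void)
    {
        return (RTE_MAX_LCORE + 1) * 64;            /* 8256 bytes */
    }

    static size_t
    lazy_footprint(unsigned int cores_in_use)
    {
        return (RTE_MAX_LCORE + 1) * sizeof(void *) +
               (cores_in_use + 1) * 64;             /* 1352 for 4 cores */
    }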

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/common/mlx5/mlx5_common_utils.c | 62 ++---
 drivers/common/mlx5/mlx5_common_utils.h |  2 +-
 2 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common_utils.c 
b/drivers/common/mlx5/mlx5_common_utils.c
index 4e385c616a..f75b1cb0da 100644
--- a/drivers/common/mlx5/mlx5_common_utils.c
+++ b/drivers/common/mlx5/mlx5_common_utils.c
@@ -15,14 +15,13 @@
 
 static int
 mlx5_list_init(struct mlx5_list *list, const char *name, void *ctx,
-  bool lcores_share, mlx5_list_create_cb cb_create,
+  bool lcores_share, struct mlx5_list_cache *gc,
+  mlx5_list_create_cb cb_create,
   mlx5_list_match_cb cb_match,
   mlx5_list_remove_cb cb_remove,
   mlx5_list_clone_cb cb_clone,
   mlx5_list_clone_free_cb cb_clone_free)
 {
-   int i;
-
if (!cb_match || !cb_create || !cb_remove || !cb_clone ||
!cb_clone_free) {
rte_errno = EINVAL;
@@ -38,9 +37,11 @@ mlx5_list_init(struct mlx5_list *list, const char *name, 
void *ctx,
list->cb_clone = cb_clone;
list->cb_clone_free = cb_clone_free;
rte_rwlock_init(&list->lock);
+   if (lcores_share) {
+   list->cache[RTE_MAX_LCORE] = gc;
+   LIST_INIT(&list->cache[RTE_MAX_LCORE]->h);
+   }
DRV_LOG(DEBUG, "mlx5 list %s initialized.", list->name);
-   for (i = 0; i <= RTE_MAX_LCORE; i++)
-   LIST_INIT(&list->cache[i].h);
return 0;
 }
 
@@ -53,11 +54,16 @@ mlx5_list_create(const char *name, void *ctx, bool 
lcores_share,
 mlx5_list_clone_free_cb cb_clone_free)
 {
struct mlx5_list *list;
+   struct mlx5_list_cache *gc = NULL;
 
-   list = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*list), 0, SOCKET_ID_ANY);
+   list = mlx5_malloc(MLX5_MEM_ZERO,
+  sizeof(*list) + (lcores_share ? sizeof(*gc) : 0),
+  0, SOCKET_ID_ANY);
if (!list)
return NULL;
-   if (mlx5_list_init(list, name, ctx, lcores_share,
+   if (lcores_share)
+   gc = (struct mlx5_list_cache *)(list + 1);
+   if (mlx5_list_init(list, name, ctx, lcores_share, gc,
   cb_create, cb_match, cb_remove, cb_clone,
   cb_clone_free) != 0) {
mlx5_free(list);
@@ -69,7 +75,8 @@ mlx5_list_create(const char *name, void *ctx, bool 
lcores_share,
 static struct mlx5_list_entry *
 __list_lookup(struct mlx5_list *list, int lcore_index, void *ctx, bool reuse)
 {
-   struct mlx5_list_entry *entry = LIST_FIRST(&list->cache[lcore_index].h);
+   struct mlx5_list_entry *entry =
+   LIST_FIRST(&list->cache[lcore_index]->h);
uint32_t ret;
 
while (entry != NULL) {
@@ -121,14 +128,14 @@ mlx5_list_cache_insert(struct mlx5_list *list, int 
lcore_index,
lentry->ref_cnt = 1u;
lentry->gentry = gentry;
lentry->lcore_idx = (uint32_t)lcore_index;
-   LIST_INSERT_HEAD(&list->cache[lcore_index].h, lentry, next);
+   LIST_INSERT_HEAD(&list->cache[lcore_index]->h, lentry, next);
return lentry;
 }
 
 static void
 __list_cache_clean(struct mlx5_list *list, int lcore_index)
 {
-   struct mlx5_list_cache *c = &list->cache[lcore_index];
+   struct mlx5_list_cache *c = list->cache[lcore_index];
struct mlx5_list_entry *entry = LIST_FIRST(&c->h);
uint32_t inv_cnt = __atomic_exchange_n(&c->inv_cnt, 0,
   __ATOMIC_RELAXED);
@@ -161,6 +168,17 @@ mlx5_list_register(struct mlx5_list *list, void *ctx)
rte_errno = ENOTSUP;
return NULL;
}
+   if (unlikely(!list->cache[lcore_index])) {
+   list->cache[lcore_index] = mlx5_malloc(0,
+   sizeof(struct mlx5_list_cache),
+   RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+   if (!list->cache[lcore_index]) {
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+   list->cache[lcore_index]->inv_cnt = 0;
+   LIST_INIT(&list->cache[lcore_index]->h);
+   }
   /* 0. Free entries that were invalidated by other lcores. */
__list_cache_clean(list, lcore_index);
/* 1. Lookup in local cache. */
@@ -186,7 +204,7 @@ mlx5_list_register(struct mlx5_list *list, void *ctx)
  

[dpdk-dev] [PATCH v2 16/22] net/mlx5: enable index pool per-core cache

2021-06-30 Thread Suanming Mou
This commit enables the tag and header modify action index pool
per-core cache in non-reclaim memory mode.

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5.c | 4 +++-
 drivers/net/mlx5/mlx5.h | 1 +
 drivers/net/mlx5/mlx5_flow_dv.c | 3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8fb7f4442d..bf1463c289 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -214,7 +214,8 @@ static const struct mlx5_indexed_pool_config 
mlx5_ipool_cfg[] = {
.grow_trunk = 3,
.grow_shift = 2,
.need_lock = 1,
-   .release_mem_en = 1,
+   .release_mem_en = 0,
+   .per_core_cache = (1 << 16),
.malloc = mlx5_malloc,
.free = mlx5_free,
.type = "mlx5_tag_ipool",
@@ -1128,6 +1129,7 @@ mlx5_alloc_shared_dev_ctx(const struct 
mlx5_dev_spawn_data *spawn,
}
sh->refcnt = 1;
sh->max_port = spawn->max_port;
+   sh->reclaim_mode = config->reclaim_mode;
strncpy(sh->ibdev_name, mlx5_os_get_ctx_device_name(sh->ctx),
sizeof(sh->ibdev_name) - 1);
strncpy(sh->ibdev_path, mlx5_os_get_ctx_device_path(sh->ctx),
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5774f63244..516f3ffae5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1093,6 +1093,7 @@ struct mlx5_dev_ctx_shared {
uint32_t qp_ts_format:2; /* QP timestamp formats supported. */
uint32_t meter_aso_en:1; /* Flow Meter ASO is supported. */
uint32_t ct_aso_en:1; /* Connection Tracking ASO is supported. */
+   uint32_t reclaim_mode:1; /* Reclaim memory. */
uint32_t max_port; /* Maximal IB device port index. */
struct mlx5_bond_info bond; /* Bonding information. */
void *ctx; /* Verbs/DV/DevX context. */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 08704d892a..f79c60e489 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5321,7 +5321,8 @@ flow_dv_modify_ipool_get(struct mlx5_dev_ctx_shared *sh, 
uint8_t index)
   .grow_trunk = 3,
   .grow_shift = 2,
   .need_lock = 1,
-  .release_mem_en = 1,
+  .release_mem_en = !!sh->reclaim_mode,
+  .per_core_cache = sh->reclaim_mode ? 0 : (1 << 16),
   .malloc = mlx5_malloc,
   .free = mlx5_free,
   .type = "mlx5_modify_action_resource",
-- 
2.25.1



[dpdk-dev] [PATCH v2 17/22] net/mlx5: optimize hash list table allocate on demand

2021-06-30 Thread Suanming Mou
Currently, all the hash list tables are allocated during startup.
Since different applications may use only a limited set of actions,
allocating the hash list tables on demand saves initial memory.

This commit makes the hash list tables allocated on demand.
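
A simplified sketch of the lazy creation this enables (the real helper,
flow_dv_hlist_prepare(), appears truncated below; the callback parameter
stands in for mlx5_hlist_create() and its many arguments):

    static struct mlx5_hlist *
    hlist_prepare_sketch(struct mlx5_hlist **phl,
                         struct mlx5_hlist *(*create)(void))
    {
        struct mlx5_hlist *expected = NULL;
        struct mlx5_hlist *hl = __atomic_load_n(phl, __ATOMIC_ACQUIRE);

        if (likely(hl != NULL))
            return hl;              /* already created: fast path */
        hl = create();              /* heavy work, off the fast path */
        if (hl == NULL)
            return NULL;
        if (!__atomic_compare_exchange_n(phl, &expected, hl, false,
                                         __ATOMIC_ACQ_REL,
                                         __ATOMIC_ACQUIRE)) {
            mlx5_hlist_destroy(hl); /* lost the race: keep the winner's */
            hl = __atomic_load_n(phl, __ATOMIC_ACQUIRE);
        }
        return hl;
    }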

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/linux/mlx5_os.c   | 44 +
 drivers/net/mlx5/mlx5_defs.h   |  6 +++
 drivers/net/mlx5/mlx5_flow_dv.c| 79 --
 drivers/net/mlx5/windows/mlx5_os.c |  2 -
 4 files changed, 82 insertions(+), 49 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index a82dc4db00..75324e35d8 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -50,8 +50,6 @@
 #include "mlx5_nl.h"
 #include "mlx5_devx.h"
 
-#define MLX5_TAGS_HLIST_ARRAY_SIZE (1 << 15)
-
 #ifndef HAVE_IBV_MLX5_MOD_MPW
 #define MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED (1 << 2)
 #define MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW (1 << 3)
@@ -312,46 +310,6 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
  flow_dv_dest_array_clone_free_cb);
if (!sh->dest_array_list)
goto error;
-   /* Create tags hash list table. */
-   snprintf(s, sizeof(s), "%s_tags", sh->ibdev_name);
-   sh->tag_table = mlx5_hlist_create(s, MLX5_TAGS_HLIST_ARRAY_SIZE, false,
- false, sh, flow_dv_tag_create_cb,
- flow_dv_tag_match_cb,
- flow_dv_tag_remove_cb,
- flow_dv_tag_clone_cb,
- flow_dv_tag_clone_free_cb);
-   if (!sh->tag_table) {
-   DRV_LOG(ERR, "tags with hash creation failed.");
-   err = ENOMEM;
-   goto error;
-   }
-   snprintf(s, sizeof(s), "%s_hdr_modify", sh->ibdev_name);
-   sh->modify_cmds = mlx5_hlist_create(s, MLX5_FLOW_HDR_MODIFY_HTABLE_SZ,
-   true, false, sh,
-   flow_dv_modify_create_cb,
-   flow_dv_modify_match_cb,
-   flow_dv_modify_remove_cb,
-   flow_dv_modify_clone_cb,
-   flow_dv_modify_clone_free_cb);
-   if (!sh->modify_cmds) {
-   DRV_LOG(ERR, "hdr modify hash creation failed");
-   err = ENOMEM;
-   goto error;
-   }
-   snprintf(s, sizeof(s), "%s_encaps_decaps", sh->ibdev_name);
-   sh->encaps_decaps = mlx5_hlist_create(s,
- MLX5_FLOW_ENCAP_DECAP_HTABLE_SZ,
- true, true, sh,
- flow_dv_encap_decap_create_cb,
- flow_dv_encap_decap_match_cb,
- flow_dv_encap_decap_remove_cb,
- flow_dv_encap_decap_clone_cb,
-flow_dv_encap_decap_clone_free_cb);
-   if (!sh->encaps_decaps) {
-   DRV_LOG(ERR, "encap decap hash creation failed");
-   err = ENOMEM;
-   goto error;
-   }
 #endif
 #ifdef HAVE_MLX5DV_DR
void *domain;
@@ -396,7 +354,7 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
goto error;
}
 #endif
-   if (!sh->tunnel_hub)
+   if (!sh->tunnel_hub && priv->config.dv_miss_info)
err = mlx5_alloc_tunnel_hub(sh);
if (err) {
DRV_LOG(ERR, "mlx5_alloc_tunnel_hub failed err=%d", err);
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index ca67ce8213..fe86bb40d3 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -188,6 +188,12 @@
 /* Size of the simple hash table for encap decap table. */
 #define MLX5_FLOW_ENCAP_DECAP_HTABLE_SZ (1 << 12)
 
+/* Size of the hash table for tag table. */
+#define MLX5_TAGS_HLIST_ARRAY_SIZE (1 << 15)
+
+/* Size of the hash table for SFT table. */
+#define MLX5_FLOW_SFT_HLIST_ARRAY_SIZE 4096
+
 /* Hairpin TX/RX queue configuration parameters. */
 #define MLX5_HAIRPIN_QUEUE_STRIDE 6
 #define MLX5_HAIRPIN_JUMBO_LOG_SIZE (14 + 2)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index f79c60e489..fe610594c5 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -310,6 +310,41 @@ mlx5_flow_tunnel_ip_check(const struct rte_flow_item *item 
__rte_unused,
}
 }
 
+static inline struct mlx5_hlist *
+flow_dv_hlist_prepare(struct mlx5_dev_ctx_shared *sh, struct mlx5_hlist **phl,
+const char *name, uint32_t size, bool direct_key,
+ 

[dpdk-dev] [PATCH v2 18/22] common/mlx5: optimize cache list object memory

2021-06-30 Thread Suanming Mou
Currently, the hash list uses the cache list as its bucket list. The
lists in the buckets have the same name, ctx and callbacks, which
wastes memory.

This commit abstracts the name, ctx and callback members of the list
into a constant struct and the rest into an inconstant struct, and uses
wrapper functions so that both the hash list and the cache list can set
the constant and inconstant structs individually.
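
Abbreviated (member lists shortened; the struct names come from the
hunk below), the split looks like:

    /* One read-only descriptor shared by every bucket... */
    struct mlx5_list_const {
        char name[MLX5_NAME_SIZE];      /* identical across buckets */
        void *ctx;
        bool lcores_share;
        mlx5_list_create_cb cb_create;  /* ...plus the other callbacks */
    };

    /* ...and per-bucket mutable state kept apart. */
    struct mlx5_list_inconst {
        rte_rwlock_t lock;
        struct mlx5_list_cache *cache[RTE_MAX_LCORE + 1];
    };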

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/common/mlx5/mlx5_common_utils.c | 295 ++--
 drivers/common/mlx5/mlx5_common_utils.h |  45 ++--
 2 files changed, 201 insertions(+), 139 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common_utils.c 
b/drivers/common/mlx5/mlx5_common_utils.c
index f75b1cb0da..858c8d8164 100644
--- a/drivers/common/mlx5/mlx5_common_utils.c
+++ b/drivers/common/mlx5/mlx5_common_utils.c
@@ -14,34 +14,16 @@
 /* mlx5 list /
 
 static int
-mlx5_list_init(struct mlx5_list *list, const char *name, void *ctx,
-  bool lcores_share, struct mlx5_list_cache *gc,
-  mlx5_list_create_cb cb_create,
-  mlx5_list_match_cb cb_match,
-  mlx5_list_remove_cb cb_remove,
-  mlx5_list_clone_cb cb_clone,
-  mlx5_list_clone_free_cb cb_clone_free)
+mlx5_list_init(struct mlx5_list_inconst *l_inconst,
+  struct mlx5_list_const *l_const,
+  struct mlx5_list_cache *gc)
 {
-   if (!cb_match || !cb_create || !cb_remove || !cb_clone ||
-   !cb_clone_free) {
-   rte_errno = EINVAL;
-   return -EINVAL;
+   rte_rwlock_init(&l_inconst->lock);
+   if (l_const->lcores_share) {
+   l_inconst->cache[RTE_MAX_LCORE] = gc;
+   LIST_INIT(&l_inconst->cache[RTE_MAX_LCORE]->h);
}
-   if (name)
-   snprintf(list->name, sizeof(list->name), "%s", name);
-   list->ctx = ctx;
-   list->lcores_share = lcores_share;
-   list->cb_create = cb_create;
-   list->cb_match = cb_match;
-   list->cb_remove = cb_remove;
-   list->cb_clone = cb_clone;
-   list->cb_clone_free = cb_clone_free;
-   rte_rwlock_init(&list->lock);
-   if (lcores_share) {
-   list->cache[RTE_MAX_LCORE] = gc;
-   LIST_INIT(&list->cache[RTE_MAX_LCORE]->h);
-   }
-   DRV_LOG(DEBUG, "mlx5 list %s initialized.", list->name);
+   DRV_LOG(DEBUG, "mlx5 list %s initialized.", l_const->name);
return 0;
 }
 
@@ -56,16 +38,30 @@ mlx5_list_create(const char *name, void *ctx, bool 
lcores_share,
struct mlx5_list *list;
struct mlx5_list_cache *gc = NULL;
 
+   if (!cb_match || !cb_create || !cb_remove || !cb_clone ||
+   !cb_clone_free) {
+   rte_errno = EINVAL;
+   return NULL;
+   }
list = mlx5_malloc(MLX5_MEM_ZERO,
   sizeof(*list) + (lcores_share ? sizeof(*gc) : 0),
   0, SOCKET_ID_ANY);
+
if (!list)
return NULL;
+   if (name)
+   snprintf(list->l_const.name,
+sizeof(list->l_const.name), "%s", name);
+   list->l_const.ctx = ctx;
+   list->l_const.lcores_share = lcores_share;
+   list->l_const.cb_create = cb_create;
+   list->l_const.cb_match = cb_match;
+   list->l_const.cb_remove = cb_remove;
+   list->l_const.cb_clone = cb_clone;
+   list->l_const.cb_clone_free = cb_clone_free;
if (lcores_share)
gc = (struct mlx5_list_cache *)(list + 1);
-   if (mlx5_list_init(list, name, ctx, lcores_share, gc,
-  cb_create, cb_match, cb_remove, cb_clone,
-  cb_clone_free) != 0) {
+   if (mlx5_list_init(&list->l_inconst, &list->l_const, gc) != 0) {
mlx5_free(list);
return NULL;
}
@@ -73,19 +69,21 @@ mlx5_list_create(const char *name, void *ctx, bool 
lcores_share,
 }
 
 static struct mlx5_list_entry *
-__list_lookup(struct mlx5_list *list, int lcore_index, void *ctx, bool reuse)
+__list_lookup(struct mlx5_list_inconst *l_inconst,
+ struct mlx5_list_const *l_const,
+ int lcore_index, void *ctx, bool reuse)
 {
struct mlx5_list_entry *entry =
-   LIST_FIRST(&list->cache[lcore_index]->h);
+   LIST_FIRST(&l_inconst->cache[lcore_index]->h);
uint32_t ret;
 
while (entry != NULL) {
-   if (list->cb_match(list->ctx, entry, ctx) == 0) {
+   if (l_const->cb_match(l_const->ctx, entry, ctx) == 0) {
if (reuse) {
ret = __atomic_add_fetch(&entry->ref_cnt, 1,
 __ATOMIC_RELAXED) - 1;
DRV_LOG(DEBUG, "mlx5 list %s entry %p ref: %u.",
-   list->n

[dpdk-dev] [PATCH v2 19/22] net/mlx5: change memory release configuration

2021-06-30 Thread Suanming Mou
This commit changes the index pool memory release configuration
to 0 when memory reclaim mode is not required.

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index bf1463c289..6b7225e55d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -797,6 +797,8 @@ mlx5_flow_ipool_create(struct mlx5_dev_ctx_shared *sh,
if (config->reclaim_mode) {
cfg.release_mem_en = 1;
cfg.per_core_cache = 0;
+   } else {
+   cfg.release_mem_en = 0;
}
sh->ipool[i] = mlx5_ipool_create(&cfg);
}
-- 
2.25.1



[dpdk-dev] [PATCH v2 20/22] net/mlx5: support index pool none local core operations

2021-06-30 Thread Suanming Mou
This commit supports the index pool operations from non-local cores
(threads without a valid lcore index) with an extra cache.
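
Each operation is wrapped by the same pattern (condensed from the
get/malloc/free hunks below): threads without an lcore index share the
extra slot at RTE_MAX_LCORE, serialized by the new spinlock, while
registered lcores keep their lock-free private slot:

    static void *
    ipool_get_any_thread(struct mlx5_indexed_pool *pool, uint32_t idx)
    {
        void *entry;
        int cidx = rte_lcore_index(rte_lcore_id());

        if (unlikely(cidx == -1)) {   /* non-EAL/unregistered thread */
            cidx = RTE_MAX_LCORE;
            rte_spinlock_lock(&pool->nlcore_lock);
        }
        entry = _mlx5_ipool_get_cache(pool, cidx, idx);
        if (unlikely(cidx == RTE_MAX_LCORE))
            rte_spinlock_unlock(&pool->nlcore_lock);
        return entry;
    }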

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_utils.c | 75 +--
 drivers/net/mlx5/mlx5_utils.h |  3 +-
 2 files changed, 56 insertions(+), 22 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_utils.c b/drivers/net/mlx5/mlx5_utils.c
index 94abe79860..c34d6d62a8 100644
--- a/drivers/net/mlx5/mlx5_utils.c
+++ b/drivers/net/mlx5/mlx5_utils.c
@@ -114,6 +114,7 @@ mlx5_ipool_create(struct mlx5_indexed_pool_config *cfg)
mlx5_trunk_idx_offset_get(pool, TRUNK_MAX_IDX + 1);
if (!cfg->per_core_cache)
pool->free_list = TRUNK_INVALID;
+   rte_spinlock_init(&pool->nlcore_lock);
return pool;
 }
 
@@ -354,20 +355,14 @@ mlx5_ipool_allocate_from_global(struct mlx5_indexed_pool 
*pool, int cidx)
 }
 
 static void *
-mlx5_ipool_get_cache(struct mlx5_indexed_pool *pool, uint32_t idx)
+_mlx5_ipool_get_cache(struct mlx5_indexed_pool *pool, int cidx, uint32_t idx)
 {
struct mlx5_indexed_trunk *trunk;
struct mlx5_indexed_cache *lc;
uint32_t trunk_idx;
uint32_t entry_idx;
-   int cidx;
 
MLX5_ASSERT(idx);
-   cidx = rte_lcore_index(rte_lcore_id());
-   if (unlikely(cidx == -1)) {
-   rte_errno = ENOTSUP;
-   return NULL;
-   }
lc = mlx5_ipool_update_global_cache(pool, cidx);
idx -= 1;
trunk_idx = mlx5_trunk_idx_get(pool, idx);
@@ -378,15 +373,27 @@ mlx5_ipool_get_cache(struct mlx5_indexed_pool *pool, 
uint32_t idx)
 }
 
 static void *
-mlx5_ipool_malloc_cache(struct mlx5_indexed_pool *pool, uint32_t *idx)
+mlx5_ipool_get_cache(struct mlx5_indexed_pool *pool, uint32_t idx)
 {
+   void *entry;
int cidx;
 
cidx = rte_lcore_index(rte_lcore_id());
if (unlikely(cidx == -1)) {
-   rte_errno = ENOTSUP;
-   return NULL;
+   cidx = RTE_MAX_LCORE;
+   rte_spinlock_lock(&pool->nlcore_lock);
}
+   entry = _mlx5_ipool_get_cache(pool, cidx, idx);
+   if (unlikely(cidx == RTE_MAX_LCORE))
+   rte_spinlock_unlock(&pool->nlcore_lock);
+   return entry;
+}
+
+
+static void *
+_mlx5_ipool_malloc_cache(struct mlx5_indexed_pool *pool, int cidx,
+uint32_t *idx)
+{
if (unlikely(!pool->cache[cidx])) {
pool->cache[cidx] = pool->cfg.malloc(MLX5_MEM_ZERO,
sizeof(struct mlx5_ipool_per_lcore) +
@@ -399,29 +406,40 @@ mlx5_ipool_malloc_cache(struct mlx5_indexed_pool *pool, 
uint32_t *idx)
} else if (pool->cache[cidx]->len) {
pool->cache[cidx]->len--;
*idx = pool->cache[cidx]->idx[pool->cache[cidx]->len];
-   return mlx5_ipool_get_cache(pool, *idx);
+   return _mlx5_ipool_get_cache(pool, cidx, *idx);
}
/* Not enough idx in global cache. Keep fetching from global. */
*idx = mlx5_ipool_allocate_from_global(pool, cidx);
if (unlikely(!(*idx)))
return NULL;
-   return mlx5_ipool_get_cache(pool, *idx);
+   return _mlx5_ipool_get_cache(pool, cidx, *idx);
 }
 
-static void
-mlx5_ipool_free_cache(struct mlx5_indexed_pool *pool, uint32_t idx)
+static void *
+mlx5_ipool_malloc_cache(struct mlx5_indexed_pool *pool, uint32_t *idx)
 {
+   void *entry;
int cidx;
+
+   cidx = rte_lcore_index(rte_lcore_id());
+   if (unlikely(cidx == -1)) {
+   cidx = RTE_MAX_LCORE;
+   rte_spinlock_lock(&pool->nlcore_lock);
+   }
+   entry = _mlx5_ipool_malloc_cache(pool, cidx, idx);
+   if (unlikely(cidx == RTE_MAX_LCORE))
+   rte_spinlock_unlock(&pool->nlcore_lock);
+   return entry;
+}
+
+static void
+_mlx5_ipool_free_cache(struct mlx5_indexed_pool *pool, int cidx, uint32_t idx)
+{
struct mlx5_ipool_per_lcore *ilc;
struct mlx5_indexed_cache *gc, *olc = NULL;
uint32_t reclaim_num = 0;
 
MLX5_ASSERT(idx);
-   cidx = rte_lcore_index(rte_lcore_id());
-   if (unlikely(cidx == -1)) {
-   rte_errno = ENOTSUP;
-   return;
-   }
/*
 * When index was allocated on core A but freed on core B. In this
 * case check if local cache on core B was allocated before.
@@ -464,6 +482,21 @@ mlx5_ipool_free_cache(struct mlx5_indexed_pool *pool, 
uint32_t idx)
pool->cache[cidx]->len++;
 }
 
+static void
+mlx5_ipool_free_cache(struct mlx5_indexed_pool *pool, uint32_t idx)
+{
+   int cidx;
+
+   cidx = rte_lcore_index(rte_lcore_id());
+   if (unlikely(cidx == -1)) {
+   cidx = RTE_MAX_LCORE;
+   rte_spinlock_lock(&pool->nlcore_lock);
+   }
+   _mlx5_ipool_free_cache(pool, cidx, idx);
+   if (unlikely(cidx == RTE_MAX_LCORE))
+   rte_spinlock_unlock(&pool->nlcore_lock);
+}
+
 void *
 mlx5_ipool_

[dpdk-dev] [PATCH v2 21/22] net/mlx5: support list non-local core operations

2021-06-30 Thread Suanming Mou
This commit supports list operations from non-local cores
(threads that are not EAL lcores) with an extra sub-list.
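
For orientation, the cache slot layout this change relies on can be
sketched as follows (the exact definitions are inferred from the diff;
the authoritative ones live in mlx5_common_utils.h):

    /* Per-list cache slots:
     *   cache[0 .. RTE_MAX_LCORE - 1]  per-lcore caches
     *   cache[MLX5_LIST_NLCORE]        shared slot for non-EAL threads,
     *                                  serialized by l_const->nlcore_lock
     *   cache[MLX5_LIST_GLOBAL]        global cache (lcores_share mode)
     */
    #define MLX5_LIST_NLCORE (RTE_MAX_LCORE)
    #define MLX5_LIST_GLOBAL (RTE_MAX_LCORE + 1)

    struct mlx5_list_cache *cache[MLX5_LIST_GLOBAL + 1];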

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/common/mlx5/mlx5_common_utils.c | 92 +
 drivers/common/mlx5/mlx5_common_utils.h |  9 ++-
 2 files changed, 71 insertions(+), 30 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common_utils.c 
b/drivers/common/mlx5/mlx5_common_utils.c
index 858c8d8164..d58d0d08ab 100644
--- a/drivers/common/mlx5/mlx5_common_utils.c
+++ b/drivers/common/mlx5/mlx5_common_utils.c
@@ -20,8 +20,8 @@ mlx5_list_init(struct mlx5_list_inconst *l_inconst,
 {
rte_rwlock_init(&l_inconst->lock);
if (l_const->lcores_share) {
-   l_inconst->cache[RTE_MAX_LCORE] = gc;
-   LIST_INIT(&l_inconst->cache[RTE_MAX_LCORE]->h);
+   l_inconst->cache[MLX5_LIST_GLOBAL] = gc;
+   LIST_INIT(&l_inconst->cache[MLX5_LIST_GLOBAL]->h);
}
DRV_LOG(DEBUG, "mlx5 list %s initialized.", l_const->name);
return 0;
@@ -59,6 +59,7 @@ mlx5_list_create(const char *name, void *ctx, bool 
lcores_share,
list->l_const.cb_remove = cb_remove;
list->l_const.cb_clone = cb_clone;
list->l_const.cb_clone_free = cb_clone_free;
+   rte_spinlock_init(&list->l_const.nlcore_lock);
if (lcores_share)
gc = (struct mlx5_list_cache *)(list + 1);
if (mlx5_list_init(&list->l_inconst, &list->l_const, gc) != 0) {
@@ -85,11 +86,11 @@ __list_lookup(struct mlx5_list_inconst *l_inconst,
DRV_LOG(DEBUG, "mlx5 list %s entry %p ref: %u.",
l_const->name, (void *)entry,
entry->ref_cnt);
-   } else if (lcore_index < RTE_MAX_LCORE) {
+   } else if (lcore_index < MLX5_LIST_GLOBAL) {
ret = __atomic_load_n(&entry->ref_cnt,
  __ATOMIC_RELAXED);
}
-   if (likely(ret != 0 || lcore_index == RTE_MAX_LCORE))
+   if (likely(ret != 0 || lcore_index == MLX5_LIST_GLOBAL))
return entry;
if (reuse && ret == 0)
entry->ref_cnt--; /* Invalid entry. */
@@ -107,10 +108,11 @@ _mlx5_list_lookup(struct mlx5_list_inconst *l_inconst,
int i;
 
rte_rwlock_read_lock(&l_inconst->lock);
-   for (i = 0; i < RTE_MAX_LCORE; i++) {
+   for (i = 0; i < MLX5_LIST_GLOBAL; i++) {
if (!l_inconst->cache[i])
continue;
-   entry = __list_lookup(l_inconst, l_const, i, ctx, false);
+   entry = __list_lookup(l_inconst, l_const, i,
+ ctx, false);
if (entry)
break;
}
@@ -170,18 +172,11 @@ __list_cache_clean(struct mlx5_list_inconst *l_inconst,
 static inline struct mlx5_list_entry *
 _mlx5_list_register(struct mlx5_list_inconst *l_inconst,
struct mlx5_list_const *l_const,
-   void *ctx)
+   void *ctx, int lcore_index)
 {
struct mlx5_list_entry *entry, *local_entry;
volatile uint32_t prev_gen_cnt = 0;
-   int lcore_index = rte_lcore_index(rte_lcore_id());
-
MLX5_ASSERT(l_inconst);
-   MLX5_ASSERT(lcore_index < RTE_MAX_LCORE);
-   if (unlikely(lcore_index == -1)) {
-   rte_errno = ENOTSUP;
-   return NULL;
-   }
if (unlikely(!l_inconst->cache[lcore_index])) {
l_inconst->cache[lcore_index] = mlx5_malloc(0,
sizeof(struct mlx5_list_cache),
@@ -202,7 +197,7 @@ _mlx5_list_register(struct mlx5_list_inconst *l_inconst,
if (l_const->lcores_share) {
/* 2. Lookup with read lock on global list, reuse if found. */
rte_rwlock_read_lock(&l_inconst->lock);
-   entry = __list_lookup(l_inconst, l_const, RTE_MAX_LCORE,
+   entry = __list_lookup(l_inconst, l_const, MLX5_LIST_GLOBAL,
  ctx, true);
if (likely(entry)) {
rte_rwlock_read_unlock(&l_inconst->lock);
@@ -241,7 +236,7 @@ _mlx5_list_register(struct mlx5_list_inconst *l_inconst,
if (unlikely(prev_gen_cnt != l_inconst->gen_cnt)) {
struct mlx5_list_entry *oentry = __list_lookup(l_inconst,
   l_const,
-  RTE_MAX_LCORE,
+  MLX5_LIST_GLOBAL,
   ctx, true);
 
if (unlikely(oentry)) {
@@ -255,7 +250,7 @@ _mlx5_list_register(struct mlx5_list_inconst *l_inconst,
}
 

[dpdk-dev] [PATCH v2 22/22] net/mlx5: optimize Rx queue match

2021-06-30 Thread Suanming Mou
Since the hrxq struct holds a pointer to its indirect table, it is
better to match against that table directly when matching an hrxq,
instead of searching for it in the list.

This commit optimizes the hrxq indirect table matching accordingly.
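
The match callback thus reduces to a stateless field comparison, with
no list lookup and no get/release pair on the indirection table. A
sketch of the resulting check (0 means match, per the mlx5 list
callback convention):

    return (hrxq->rss_key_len != rss_desc->key_len ||       /* key size */
            memcmp(hrxq->rss_key, rss_desc->key,
                   rss_desc->key_len) ||                    /* key bytes */
            hrxq->hash_fields != rss_desc->hash_fields ||   /* hash config */
            hrxq->ind_table->queues_n != rss_desc->queue_num ||
            memcmp(hrxq->ind_table->queues, rss_desc->queue,
                   rss_desc->queue_num * sizeof(rss_desc->queue[0])));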

Signed-off-by: Suanming Mou 
Acked-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_rxq.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 7893b3edd4..23685d7654 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2094,23 +2094,19 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 }
 
 int
-mlx5_hrxq_match_cb(void *tool_ctx, struct mlx5_list_entry *entry, void *cb_ctx)
+mlx5_hrxq_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
+  void *cb_ctx)
 {
-   struct rte_eth_dev *dev = tool_ctx;
struct mlx5_flow_cb_ctx *ctx = cb_ctx;
struct mlx5_flow_rss_desc *rss_desc = ctx->data;
struct mlx5_hrxq *hrxq = container_of(entry, typeof(*hrxq), entry);
-   struct mlx5_ind_table_obj *ind_tbl;
 
-   if (hrxq->rss_key_len != rss_desc->key_len ||
+   return (hrxq->rss_key_len != rss_desc->key_len ||
memcmp(hrxq->rss_key, rss_desc->key, rss_desc->key_len) ||
-   hrxq->hash_fields != rss_desc->hash_fields)
-   return 1;
-   ind_tbl = mlx5_ind_table_obj_get(dev, rss_desc->queue,
-rss_desc->queue_num);
-   if (ind_tbl)
-   mlx5_ind_table_obj_release(dev, ind_tbl, hrxq->standalone);
-   return ind_tbl != hrxq->ind_table;
+   hrxq->hash_fields != rss_desc->hash_fields ||
+   hrxq->ind_table->queues_n != rss_desc->queue_num ||
+   memcmp(hrxq->ind_table->queues, rss_desc->queue,
+   rss_desc->queue_num * sizeof(rss_desc->queue[0])));
 }
 
 /**
-- 
2.25.1



[dpdk-dev] [PATCH] test: fix crypto_op length for sessionless case

2021-06-30 Thread Abhinandan Gujjar
Currently, private_data_offset for the sessionless case is computed
incorrectly: it includes extra bytes because it uses
sizeof(struct rte_crypto_sym_xform) * 2 instead of
sizeof(union rte_event_crypto_metadata). The resulting buffer
overflow corrupted memory and led to a test application crash
while freeing the ops mempool.
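
In other words, the op's private area must be laid out for a
union rte_event_crypto_metadata placed right after the IV region. A
sketch of the corrected computation (constants as defined in the test
file):

    /* The metadata written at private_data_offset is a
     * union rte_event_crypto_metadata; sizing the area as two
     * symmetric xforms made later writes run past the mempool
     * element, per the commit message above. */
    uint32_t len = IV_OFFSET + MAXIMUM_IV_LENGTH +
                   sizeof(union rte_event_crypto_metadata);

    op->private_data_offset = len;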

Fixes: 3c2c535ecfc0 ("test: add event crypto adapter auto-test")
Reported-by: ciara.po...@intel.com

Signed-off-by: Abhinandan Gujjar 
---
 app/test/test_event_crypto_adapter.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/app/test/test_event_crypto_adapter.c 
b/app/test/test_event_crypto_adapter.c
index f689bc1f2..688ac0b2f 100644
--- a/app/test/test_event_crypto_adapter.c
+++ b/app/test/test_event_crypto_adapter.c
@@ -229,7 +229,7 @@ test_op_forward_mode(uint8_t session_less)
first_xform = &cipher_xform;
sym_op->xform = first_xform;
uint32_t len = IV_OFFSET + MAXIMUM_IV_LENGTH +
-   (sizeof(struct rte_crypto_sym_xform) * 2);
+   (sizeof(union rte_event_crypto_metadata));
op->private_data_offset = len;
/* Fill in private data information */
rte_memcpy(&m_data.response_info, &response_info,
@@ -424,7 +424,7 @@ test_op_new_mode(uint8_t session_less)
first_xform = &cipher_xform;
sym_op->xform = first_xform;
uint32_t len = IV_OFFSET + MAXIMUM_IV_LENGTH +
-   (sizeof(struct rte_crypto_sym_xform) * 2);
+   (sizeof(union rte_event_crypto_metadata));
op->private_data_offset = len;
/* Fill in private data information */
rte_memcpy(&m_data.response_info, &response_info,
-- 
2.25.1



[dpdk-dev] [PATCH v6] build: use platform for generic and native builds

2021-06-30 Thread Juraj Linkeš
The current meson option 'machine' should only specify the ISA, which is
not sufficient for Arm, where setting ISA implies other settings as well
(and is used in Arm configuration as such).
Use the existing 'platform' meson option to differentiate the type of
the build (native/generic) and set ISA accordingly, unless the user
chooses to override it with a new option, 'cpu_instruction_set'.
The 'machine' option used to set the ISA in x86 builds and the
native/default 'build type' in aarch64 builds. The two new options,
'platform' and 'cpu_instruction_set', now properly set both ISA and
build type for all architectures in a uniform manner.
The 'machine' option also doesn't describe very well what it sets. The
new option, 'cpu_instruction_set', is much more descriptive. Keep
'machine' for backwards compatibility.
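
As a usage illustration (hypothetical invocations; only the
native/generic values described above are shown):

    meson setup -Dplatform=native build      # build for this machine
    meson setup -Dplatform=generic build     # portable build, generic ISA
    meson setup -Dplatform=generic -Dcpu_instruction_set=native build
                                             # generic platform, native ISA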

Signed-off-by: Juraj Linkeš 
---
 config/arm/meson.build  | 33 ++---
 config/meson.build  | 55 +
 config/ppc/meson.build  |  2 +-
 devtools/test-meson-builds.sh   |  9 ++---
 doc/guides/linux_gsg/build_dpdk.rst | 33 -
 meson_options.txt   | 10 +++---
 6 files changed, 104 insertions(+), 38 deletions(-)

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 9b147c0b93..77ee5fabfc 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -380,19 +380,23 @@ if dpdk_conf.get('RTE_ARCH_32')
 machine_args += '-mfpu=neon'
 else
 # aarch64 build
-soc = get_option('platform')
+# for backwards compatibility:
+#   machine=native is the same behavior as platform=native
+#   machine=generic/default is the same as platform=generic
+if machine != 'auto'
+# cpu_instruction_set holds the proper value - native, generic or cpu
+# the old behavior only distinguished between generic and native build
+if cpu_instruction_set == 'generic'
+soc = 'generic'
+else
+soc = 'native'
+endif
+else
+soc = platform
+endif
 soc_config = {}
 if not meson.is_cross_build()
-if machine == 'generic'
-# generic build
-if soc != ''
-error('Building for a particular platform is unsupported with 
generic build.')
-endif
-implementer_id = 'generic'
-part_number = 'generic'
-elif soc != ''
-soc_config = socs.get(soc, {'not_supported': true})
-else
+if soc == 'native'
 # native build
 # The script returns ['Implementer', 'Variant', 'Architecture',
 # 'Primary Part number', 'Revision']
@@ -406,6 +410,9 @@ else
 else
 error('Error when getting Arm Implementer ID and part number.')
 endif
+else
+# SoC build
+soc_config = socs.get(soc, {'not_supported': true})
 endif
 else
 # cross build
@@ -437,7 +444,7 @@ else
 else
 error('Unsupported Arm implementer: @0@. '.format(implementer_id) +
   'Please add support for it or use the generic ' +
-  '(-Dmachine=generic) build.')
+  '(-Dplatform=generic) build.')
 endif
 
 message('Arm implementer: ' + implementer_config['description'])
@@ -452,7 +459,7 @@ else
 error('Unsupported part number @0@ of implementer @1@. '
   .format(part_number, implementer_id) +
   'Please add support for it or use the generic ' +
-  '(-Dmachine=generic) build.')
+  '(-Dplatform=generic) build.')
 endif
 
 # add/overwrite flags in the proper order
diff --git a/config/meson.build b/config/meson.build
index 017bb2efbb..77826452b4 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -65,43 +65,68 @@ endif
 disable_drivers = ''
 enable_drivers = ''
 
-# set the machine type and cflags for it
+platform = get_option('platform')
+
+# set the cpu_instruction_set and cflags for it
 if meson.is_cross_build()
-machine = host_machine.cpu()
+cpu_instruction_set = host_machine.cpu()
 else
+cpu_instruction_set = get_option('cpu_instruction_set')
 machine = get_option('machine')
+if machine != 'auto'
+warning('The "machine" option is deprecated. ' +
+'Please use "cpu_instruction_set" instead.')
+if cpu_instruction_set != 'auto'
+error('Setting both "machine" and ' +
+'"cpu_instruction_set" is unsupported.')
+endif
+cpu_instruction_set = machine
+if cpu_instruction_set == 'default'
+cpu_instruction_set = 'generic'
+endif
+endif
+endif
+
+if platform == 'native'
+if cpu_instruction_set == 'auto'
+cpu_instruction_set = 'native'
+endif
+elif platform == 'generic'
+if cpu_instruction_set == 'auto'
+cpu_instruction_set = 'generic'
+endif
 endif
 
-# machine type 'generic' is spec

[dpdk-dev] [PATCH v2 0/4] net/tap: fix Rx cksum

2021-06-30 Thread Olivier Matz
This patchset fixes the Rx checksum flags in the net/tap
driver. The first two patches are the effective fixes.

The last 2 patches introduce a new checksum API to
verify an L4 checksum, plus its unit test, in order to
simplify the net/tap code or any other code that has
the same needs.

v2:

* clarify why RTE_PTYPE_L3_IPV4_EXT_UNKNOWN cannot happen in
  tap_verify_csum() (patch 1)
* align style of rte_ipv6_udptcp_cksum_verify() to
  rte_ipv4_udptcp_cksum_verify() (patch 3)
* clarify comment above rte_ipv4_udptcp_cksum_verify() and
  rte_ipv6_udptcp_cksum_verify() (patch 3)


Olivier Matz (4):
  net/tap: fix Rx cksum flags on IP options packets
  net/tap: fix Rx cksum flags on TCP packets
  net: introduce functions to verify L4 checksums
  test/cksum: new test for L3/L4 checksum API

 MAINTAINERS   |   1 +
 app/test/autotest_data.py |   6 +
 app/test/meson.build  |   2 +
 app/test/test_cksum.c | 271 ++
 drivers/net/tap/rte_eth_tap.c |  23 ++-
 lib/net/rte_ip.h  | 127 +---
 6 files changed, 398 insertions(+), 32 deletions(-)
 create mode 100644 app/test/test_cksum.c

-- 
2.29.2



[dpdk-dev] [PATCH v2 1/4] net/tap: fix Rx cksum flags on IP options packets

2021-06-30 Thread Olivier Matz
When the packet type is IPV4_EXT, the checksum is always marked as good
in the mbuf offload flags.

Since we know the header lengths, we can easily call
rte_ipv4_udptcp_cksum() in this case too.

Fixes: 8ae3023387e9 ("net/tap: add Rx/Tx checksum offload support")
Cc: sta...@dpdk.org

Signed-off-by: Olivier Matz 
---
 drivers/net/tap/rte_eth_tap.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 5735988e7c..5513cfd2d7 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -342,7 +342,11 @@ tap_verify_csum(struct rte_mbuf *mbuf)
rte_pktmbuf_data_len(mbuf))
return;
} else {
-   /* IPv6 extensions are not supported */
+   /* - RTE_PTYPE_L3_IPV4_EXT_UNKNOWN cannot happen because
+*   mbuf->packet_type is filled by rte_net_get_ptype() which
+*   never returns this value.
+* - IPv6 extensions are not supported.
+*/
return;
}
if (l4 == RTE_PTYPE_L4_UDP || l4 == RTE_PTYPE_L4_TCP) {
@@ -350,7 +354,7 @@ tap_verify_csum(struct rte_mbuf *mbuf)
/* Don't verify checksum for multi-segment packets. */
if (mbuf->nb_segs > 1)
return;
-   if (l3 == RTE_PTYPE_L3_IPV4) {
+   if (l3 == RTE_PTYPE_L3_IPV4 || l3 == RTE_PTYPE_L3_IPV4_EXT) {
if (l4 == RTE_PTYPE_L4_UDP) {
udp_hdr = (struct rte_udp_hdr *)l4_hdr;
if (udp_hdr->dgram_cksum == 0) {
@@ -364,7 +368,7 @@ tap_verify_csum(struct rte_mbuf *mbuf)
}
}
cksum = ~rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
-   } else if (l3 == RTE_PTYPE_L3_IPV6) {
+   } else { /* l3 == RTE_PTYPE_L3_IPV6, checked above */
cksum = ~rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
}
mbuf->ol_flags |= cksum ?
-- 
2.29.2



[dpdk-dev] [PATCH v2 2/4] net/tap: fix Rx cksum flags on TCP packets

2021-06-30 Thread Olivier Matz
Since commit d5df2ae0428a ("net: fix unneeded replacement of TCP
checksum 0"), the functions rte_ipv4_udptcp_cksum() or
rte_ipv6_udptcp_cksum() can return either 0x0000 or 0xffff when used to
verify a packet containing a valid checksum.

This new behavior broke the checksum verification in the tap driver for TCP
packets: these packets are marked with PKT_RX_L4_CKSUM_BAD.

Fix this by checking the 2 possible values. The next commit will introduce
a checksum verification helper to simplify this a bit.

Fixes: d5df2ae0428a ("net: fix unneeded replacement of TCP checksum 0")
Cc: sta...@dpdk.org

Signed-off-by: Olivier Matz 
Acked-by: Andrew Rybchenko 
---
 drivers/net/tap/rte_eth_tap.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 5513cfd2d7..5429f611c1 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -350,6 +350,8 @@ tap_verify_csum(struct rte_mbuf *mbuf)
return;
}
if (l4 == RTE_PTYPE_L4_UDP || l4 == RTE_PTYPE_L4_TCP) {
+   int cksum_ok;
+
l4_hdr = rte_pktmbuf_mtod_offset(mbuf, void *, l2_len + l3_len);
/* Don't verify checksum for multi-segment packets. */
if (mbuf->nb_segs > 1)
@@ -367,13 +369,13 @@ tap_verify_csum(struct rte_mbuf *mbuf)
return;
}
}
-   cksum = ~rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum = rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
} else { /* l3 == RTE_PTYPE_L3_IPV6, checked above */
-   cksum = ~rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum = rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
}
-   mbuf->ol_flags |= cksum ?
-   PKT_RX_L4_CKSUM_BAD :
-   PKT_RX_L4_CKSUM_GOOD;
+   cksum_ok = (cksum == 0) || (cksum == 0xffff);
+   mbuf->ol_flags |= cksum_ok ?
+   PKT_RX_L4_CKSUM_GOOD : PKT_RX_L4_CKSUM_BAD;
}
 }
 
-- 
2.29.2



[dpdk-dev] [PATCH v2 3/4] net: introduce functions to verify L4 checksums

2021-06-30 Thread Olivier Matz
Since commit d5df2ae0428a ("net: fix unneeded replacement of TCP
checksum 0"), the functions rte_ipv4_udptcp_cksum() and
rte_ipv6_udptcp_cksum() can return either 0x0000 or 0xffff when used to
verify a packet containing a valid checksum.

Since these functions should be used to calculate the checksum to set in
a packet, introduce 2 new helpers for checksum verification. They return
0 if the checksum is valid in the packet.

Use these new helpers in the net/tap driver.
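
A minimal usage sketch of the new IPv4 helper (the wrapper function is
hypothetical; for UDP, callers must first check that
udp_hdr->dgram_cksum is not 0, which means "no checksum present"):

    #include <rte_ip.h>

    static int
    l4_cksum_is_valid(const struct rte_ipv4_hdr *ip4, const void *l4_hdr)
    {
            /* The helper returns 0 when the embedded checksum is valid. */
            return rte_ipv4_udptcp_cksum_verify(ip4, l4_hdr) == 0;
    }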

Signed-off-by: Olivier Matz 
Acked-by: Morten Brørup 
---
 drivers/net/tap/rte_eth_tap.c |   7 +-
 lib/net/rte_ip.h  | 127 +++---
 2 files changed, 107 insertions(+), 27 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 5429f611c1..2229eef059 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -369,11 +369,12 @@ tap_verify_csum(struct rte_mbuf *mbuf)
return;
}
}
-   cksum = rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum_ok = !rte_ipv4_udptcp_cksum_verify(l3_hdr,
+l4_hdr);
} else { /* l3 == RTE_PTYPE_L3_IPV6, checked above */
-   cksum = rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum_ok = !rte_ipv6_udptcp_cksum_verify(l3_hdr,
+l4_hdr);
}
-   cksum_ok = (cksum == 0) || (cksum == 0xffff);
mbuf->ol_flags |= cksum_ok ?
PKT_RX_L4_CKSUM_GOOD : PKT_RX_L4_CKSUM_BAD;
}
diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index 4b728969c1..05948b69b7 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -344,20 +344,10 @@ rte_ipv4_phdr_cksum(const struct rte_ipv4_hdr *ipv4_hdr, 
uint64_t ol_flags)
 }
 
 /**
- * Process the IPv4 UDP or TCP checksum.
- *
- * The IP and layer 4 checksum must be set to 0 in the packet by
- * the caller.
- *
- * @param ipv4_hdr
- *   The pointer to the contiguous IPv4 header.
- * @param l4_hdr
- *   The pointer to the beginning of the L4 header.
- * @return
- *   The complemented checksum to set in the IP packet.
+ * @internal Calculate the non-complemented IPv4 L4 checksum
  */
 static inline uint16_t
-rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr *ipv4_hdr, const void *l4_hdr)
+__rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr *ipv4_hdr, const void 
*l4_hdr)
 {
uint32_t cksum;
uint32_t l3_len, l4_len;
@@ -374,16 +364,65 @@ rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr 
*ipv4_hdr, const void *l4_hdr)
cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
 
cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
-   cksum = (~cksum) & 0xffff;
+
+   return (uint16_t)cksum;
+}
+
+/**
+ * Process the IPv4 UDP or TCP checksum.
+ *
+ * The IP and layer 4 checksum must be set to 0 in the packet by
+ * the caller.
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @param l4_hdr
+ *   The pointer to the beginning of the L4 header.
+ * @return
+ *   The complemented checksum to set in the IP packet.
+ */
+static inline uint16_t
+rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr *ipv4_hdr, const void *l4_hdr)
+{
+   uint16_t cksum = __rte_ipv4_udptcp_cksum(ipv4_hdr, l4_hdr);
+
+   cksum = ~cksum;
+
/*
-* Per RFC 768:If the computed checksum is zero for UDP,
+* Per RFC 768: If the computed checksum is zero for UDP,
 * it is transmitted as all ones
 * (the equivalent in one's complement arithmetic).
 */
if (cksum == 0 && ipv4_hdr->next_proto_id == IPPROTO_UDP)
cksum = 0xffff;
 
-   return (uint16_t)cksum;
+   return cksum;
+}
+
+/**
+ * Validate the IPv4 UDP or TCP checksum.
+ *
+ * In case of UDP, the caller must first check if udp_hdr->dgram_cksum is 0
+ * (i.e. no checksum).
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @param l4_hdr
+ *   The pointer to the beginning of the L4 header.
+ * @return
+ *   Return 0 if the checksum is correct, else -1.
+ */
+__rte_experimental
+static inline int
+rte_ipv4_udptcp_cksum_verify(const struct rte_ipv4_hdr *ipv4_hdr,
+const void *l4_hdr)
+{
+   uint16_t cksum = __rte_ipv4_udptcp_cksum(ipv4_hdr, l4_hdr);
+
+   if (cksum != 0xffff)
+   return -1;
+
+   return 0;
 }
 
 /**
@@ -448,6 +487,25 @@ rte_ipv6_phdr_cksum(const struct rte_ipv6_hdr *ipv6_hdr, 
uint64_t ol_flags)
return __rte_raw_cksum_reduce(sum);
 }
 
+/**
+ * @internal Calculate the non-complemented IPv6 L4 checksum
+ */
+static inline uint16_t
+__rte_ipv6_udptcp_cksum(const struct rte_ipv6_hdr *ipv6_hdr, const void 
*l4_hdr)
+{
+   uint32_t cksum;
+   uint32_t l4_len;
+
+   l4_len = rte_be_to_cp

[dpdk-dev] [PATCH v2 4/4] test/cksum: new test for L3/L4 checksum API

2021-06-30 Thread Olivier Matz
Add a simple unit test for the checksum API.

Signed-off-by: Olivier Matz 
---
 MAINTAINERS   |   1 +
 app/test/autotest_data.py |   6 +
 app/test/meson.build  |   2 +
 app/test/test_cksum.c | 271 ++
 4 files changed, 280 insertions(+)
 create mode 100644 app/test/test_cksum.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a16971..4347555ebc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1314,6 +1314,7 @@ Packet processing
 Network headers
 M: Olivier Matz 
 F: lib/net/
+F: app/test/test_cksum.c
 
 Packet CRC
 M: Jasvinder Singh 
diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 11f9c8640c..302d6374c1 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -567,6 +567,12 @@
 "Func":default_autotest,
 "Report":  None,
 },
+{
+"Name":"Checksum autotest",
+"Command": "cksum_autotest",
+"Func":default_autotest,
+"Report":  None,
+},
 #
 #Please always keep all dump tests at the end and together!
 #
diff --git a/app/test/meson.build b/app/test/meson.build
index 0a5f425578..ef90b16f16 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -17,6 +17,7 @@ test_sources = files(
 'test_bitmap.c',
 'test_bpf.c',
 'test_byteorder.c',
+'test_cksum.c',
 'test_cmdline.c',
 'test_cmdline_cirbuf.c',
 'test_cmdline_etheraddr.c',
@@ -188,6 +189,7 @@ fast_tests = [
 ['atomic_autotest', false],
 ['bitops_autotest', true],
 ['byteorder_autotest', true],
+['cksum_autotest', true],
 ['cmdline_autotest', true],
 ['common_autotest', true],
 ['cpuflags_autotest', true],
diff --git a/app/test/test_cksum.c b/app/test/test_cksum.c
new file mode 100644
index 00..cd983d7c01
--- /dev/null
+++ b/app/test/test_cksum.c
@@ -0,0 +1,271 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 6WIND S.A.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define MEMPOOL_CACHE_SIZE  0
+#define MBUF_DATA_SIZE  256
+#define NB_MBUF 128
+
+/*
+ * Test L3/L4 checksum API.
+ */
+
+#define GOTO_FAIL(str, ...) do {   \
+   printf("cksum test FAILED (l.%d): <" str ">\n", \
+  __LINE__,  ##__VA_ARGS__);   \
+   goto fail;  \
+   } while (0)
+
+/* generated in scapy with Ether()/IP()/TCP())) */
+static const char test_cksum_ipv4_tcp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x45, 0x00,
+   0x00, 0x28, 0x00, 0x01, 0x00, 0x00, 0x40, 0x06,
+   0x7c, 0xcd, 0x7f, 0x00, 0x00, 0x01, 0x7f, 0x00,
+   0x00, 0x01, 0x00, 0x14, 0x00, 0x50, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x50, 0x02,
+   0x20, 0x00, 0x91, 0x7c, 0x00, 0x00,
+
+};
+
+/* generated in scapy with Ether()/IPv6()/TCP()) */
+static const char test_cksum_ipv6_tcp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x60, 0x00,
+   0x00, 0x00, 0x00, 0x14, 0x06, 0x40, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x14,
+   0x00, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x50, 0x02, 0x20, 0x00, 0x8f, 0x7d,
+   0x00, 0x00,
+};
+
+/* generated in scapy with Ether()/IP()/UDP()/Raw('x')) */
+static const char test_cksum_ipv4_udp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x45, 0x00,
+   0x00, 0x1d, 0x00, 0x01, 0x00, 0x00, 0x40, 0x11,
+   0x7c, 0xcd, 0x7f, 0x00, 0x00, 0x01, 0x7f, 0x00,
+   0x00, 0x01, 0x00, 0x35, 0x00, 0x35, 0x00, 0x09,
+   0x89, 0x6f, 0x78,
+};
+
+/* generated in scapy with Ether()/IPv6()/UDP()/Raw('x')) */
+static const char test_cksum_ipv6_udp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x86, 0xdd, 0x60, 0x00,
+   0x00, 0x00, 0x00, 0x09, 0x11, 0x40, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x35,
+   0x00, 0x35, 0x00, 0x09, 0x87, 0x70, 0x78,
+};
+
+/* generated in scapy with Ether()/IP(options='\x00')/UDP()/Raw('x')) */
+static const char test_cksum_ipv4_opts_udp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x46, 0x00,
+   0x00, 0x21, 0x00, 0x01, 0x00, 0x00, 0x40, 0x11,
+   0x7b, 0xc9, 0x7f, 0x00, 0x

Re: [dpdk-dev] [PATCH] doc: announce common prefix for ethdev

2021-06-30 Thread Andrew Rybchenko

On 6/30/21 12:21 PM, Ferruh Yigit wrote:

Announce adding 'RTE_ETH_' prefix to all public ethdev macros/enums on
v21.11.
Backward compatibility macros will be added on v21.11 and they will be
removed on v22.11.

Signed-off-by: Ferruh Yigit 
---
Cc: Andrew Rybchenko 
Cc: Thomas Monjalon 
Cc: David Marchand 
Cc: Qi Z Zhang 
Cc: Raslan Darawsheh 
Cc: Ajit Khaparde 
Cc: Jerin Jacob Kollanukkaran 
---
  doc/guides/rel_notes/deprecation.rst | 5 +
  1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd723..ae79673e37e3 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -118,6 +118,11 @@ Deprecation Notices
consistent with existing outer header checksum status flag naming, which
should help in reducing confusion about its usage.
  
+* ethdev: Will add 'RTE_ETH_' prefix to all ethdev macros/enums in v21.11. Macros
+  will be added for backward compatibility. Backward compatibility macros will be
+  removed on v22.11. A few old backward compatibility macros from 2013 that do
+  not have a proper prefix will be removed on v21.11.
+
  * i40e: As there are both i40evf and iavf pmd, the functions of them are
duplicated. And now more and more advanced features are developed on iavf.
To keep consistent with kernel driver's name



Acked-by: Andrew Rybchenko 


[dpdk-dev] [PATCH v2] test/crypto: fix autotest function parameters

2021-06-30 Thread Rebecca Troy
Fixed parameters on autotest functions by removing comments.

Fixes: 202d375c60bc ("app/test: add cryptodev unit and performance tests")
Fixes: 4ed1e79e7819 ("test/crypto: add tests for virtio-crypto")
Fixes: 3aafc423cf4d ("snow3g: add driver for SNOW 3G library")
Fixes: 27a1c4714d54 ("app/test: add KASUMI crypto")
Fixes: 4c99481f49c4 ("app/test: add ZUC")
Fixes: c8e69fce7046 ("crypto/scheduler: add unit test")
Fixes: ae002048bbea ("test/crypto: add DPAA2 crypto functional test")
Fixes: b674d6d0381a ("test/crypto: add dpaa crypto test cases")
Fixes: a8dbd44d6b4c ("test/crypto: add CAAM JR validation cases")
Fixes: 4868f6591c6f ("test/crypto: add cases for raw datapath API")

Cc: declan.dohe...@intel.com
Cc: jianjay.z...@huawei.com
Cc: pablo.de.lara.gua...@intel.com
Cc: roy.fan.zh...@intel.com
Cc: gak...@marvell.com
Cc: hemant.agra...@nxp.com
Cc: sta...@dpdk.org

Signed-off-by: Rebecca Troy 
Acked-by: Ciara Power 
---
 app/test/test_cryptodev.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index 39db52b17a..c18730f138 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -14521,19 +14521,19 @@ run_cryptodev_testsuite(const char *pmd_name)
 }
 
 static int
-test_cryptodev_qat(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_qat(void)
 {
return run_cryptodev_testsuite(RTE_STR(CRYPTODEV_NAME_QAT_SYM_PMD));
 }
 
 static int
-test_cryptodev_virtio(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_virtio(void)
 {
return run_cryptodev_testsuite(RTE_STR(CRYPTODEV_NAME_VIRTIO_PMD));
 }
 
 static int
-test_cryptodev_aesni_mb(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_aesni_mb(void)
 {
return run_cryptodev_testsuite(RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD));
 }
@@ -14579,19 +14579,19 @@ test_cryptodev_null(void)
 }
 
 static int
-test_cryptodev_sw_snow3g(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_sw_snow3g(void)
 {
return run_cryptodev_testsuite(RTE_STR(CRYPTODEV_NAME_SNOW3G_PMD));
 }
 
 static int
-test_cryptodev_sw_kasumi(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_sw_kasumi(void)
 {
return run_cryptodev_testsuite(RTE_STR(CRYPTODEV_NAME_KASUMI_PMD));
 }
 
 static int
-test_cryptodev_sw_zuc(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_sw_zuc(void)
 {
return run_cryptodev_testsuite(RTE_STR(CRYPTODEV_NAME_ZUC_PMD));
 }
@@ -14611,7 +14611,7 @@ test_cryptodev_mrvl(void)
 #ifdef RTE_CRYPTO_SCHEDULER
 
 static int
-test_cryptodev_scheduler(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_scheduler(void)
 {
uint8_t ret, sched_i, j, i = 0, blk_start_idx = 0;
const enum blockcipher_test_type blk_suites[] = {
@@ -14719,13 +14719,13 @@ REGISTER_TEST_COMMAND(cryptodev_scheduler_autotest, 
test_cryptodev_scheduler);
 #endif
 
 static int
-test_cryptodev_dpaa2_sec(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_dpaa2_sec(void)
 {
return run_cryptodev_testsuite(RTE_STR(CRYPTODEV_NAME_DPAA2_SEC_PMD));
 }
 
 static int
-test_cryptodev_dpaa_sec(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_dpaa_sec(void)
 {
return run_cryptodev_testsuite(RTE_STR(CRYPTODEV_NAME_DPAA_SEC_PMD));
 }
@@ -14749,7 +14749,7 @@ test_cryptodev_octeontx2(void)
 }
 
 static int
-test_cryptodev_caam_jr(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_caam_jr(void)
 {
return run_cryptodev_testsuite(RTE_STR(CRYPTODEV_NAME_CAAM_JR_PMD));
 }
@@ -14767,7 +14767,7 @@ test_cryptodev_bcmfs(void)
 }
 
 static int
-test_cryptodev_qat_raw_api(void /*argv __rte_unused, int argc __rte_unused*/)
+test_cryptodev_qat_raw_api(void)
 {
int ret;
 
-- 
2.25.1



Re: [dpdk-dev] [PATCH v2] net: prepare the outer ipv4 hdr for checksum

2021-06-30 Thread Olivier Matz
Hi Mohsin,

Hope you are fine!
Please see my comments below.

On Wed, Jun 30, 2021 at 01:04:04PM +0200, Mohsin Kazmi wrote:
> Re: [PATCH v2] net: prepare the outer ipv4 hdr for checksum

I suggest highlighting that this is the Intel-specific tx-prepare
function in the commit title. What about:

  net: fix Intel-specific Tx preparation for outer checksums

> Preparing the headers for hardware offload
> misses the outer ipv4 checksum offload.
> It results in a bad checksum computed by the hardware NIC.
> 
> This patch fixes the issue by setting the outer ipv4
> checksum field to 0.
> 
> Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Mohsin Kazmi 
> Acked-by: Qi Zhang 
> ---
> 
> v2:
> * Update the commit message with Fixes.
> ---
>  lib/net/rte_net.h | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/net/rte_net.h b/lib/net/rte_net.h
> index 434435ffa2..e47365099e 100644
> --- a/lib/net/rte_net.h
> +++ b/lib/net/rte_net.h
> @@ -128,8 +128,18 @@ rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, 
> uint64_t ol_flags)
>   if (!(ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK | PKT_TX_TCP_SEG)))
>   return 0;

I think this test should be updated too with PKT_TX_OUTER_IP_CKSUM.

>  
> - if (ol_flags & (PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IPV6))
> + if (ol_flags & (PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IPV6)) {
>   inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> + /*
> +  * prepare outer ipv4 header checksum by setting it to 0,
> +  * in order to be computed by hardware NICs.
> +  */
> + if (ol_flags & PKT_TX_OUTER_IP_CKSUM) {
> + ipv4_hdr = rte_pktmbuf_mtod_offset(m,
> + struct rte_ipv4_hdr *, m->outer_l2_len);
> + ipv4_hdr->hdr_checksum = 0;
> + }
> + }

What about the outer L4 checksum? Does it require the same handling as the inner one?

>  
>   /*
>* Check if headers are fragmented.
> -- 
> 2.17.1
> 


Re: [dpdk-dev] [EXT] [PATCH] event/cnxk: fix clang warning on Arm

2021-06-30 Thread Jerin Jacob
On Fri, Jun 18, 2021 at 12:51 PM Pavan Nikhilesh Bhagavatula
 wrote:
>
>
>
> >-Original Message-
> >From: Ruifeng Wang 
> >Sent: Thursday, June 10, 2021 12:55 PM
> >To: Pavan Nikhilesh Bhagavatula ; Shijith
> >Thotton 
> >Cc: dev@dpdk.org; n...@arm.com; Ruifeng Wang
> >
> >Subject: [EXT] [PATCH] event/cnxk: fix clang warning on Arm
> >
> >External Email
> >
> >--
> >Build with Clang-10 has warning:
> >drivers/event/cnxk/cnxk_tim_worker.h:372:23: warning: value size
> >does not match register size specified by the constraint and modifier [-
> >Wasm-operand-widths]
> > : [rem] "=&r"(rem)
> >   ^
> >drivers/event/cnxk/cnxk_tim_worker.h:365:17: note: use constraint
> >modifier "w"
> > "  ldxr %[rem], [%[crem]]  \n"
> > ^~
> > %w[rem]
> >
> >Changed variable type to match register size, which placates clang.
> >
> >Fixes: 300b796262a1 ("event/cnxk: add timer arm routine")
> >Cc: pbhagavat...@marvell.com
> >
> >Signed-off-by: Ruifeng Wang 
>
> LGTM, thanks.
>
> Acked-by: Pavan Nikhilesh 


Applied to dpdk-next-net-eventdev/for-main. Thanks


>
> >---
> > drivers/event/cnxk/cnxk_tim_worker.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/drivers/event/cnxk/cnxk_tim_worker.h
> >b/drivers/event/cnxk/cnxk_tim_worker.h
> >index 7caeb1a8fb..78e36ffafe 100644
> >--- a/drivers/event/cnxk/cnxk_tim_worker.h
> >+++ b/drivers/event/cnxk/cnxk_tim_worker.h
> >@@ -320,7 +320,7 @@ cnxk_tim_add_entry_mp(struct cnxk_tim_ring
> >*const tim_ring,
> >   struct cnxk_tim_ent *chunk;
> >   struct cnxk_tim_bkt *bkt;
> >   uint64_t lock_sema;
> >-  int16_t rem;
> >+  int64_t rem;
> >
> > __retry:
> >   cnxk_tim_get_target_bucket(tim_ring, rel_bkt, &bkt,
> >&mirr_bkt);
> >--
> >2.25.1
>


Re: [dpdk-dev] [PATCH] doc: announce common prefix for ethdev

2021-06-30 Thread Jerin Jacob
On Wed, Jun 30, 2021 at 7:28 PM Andrew Rybchenko
 wrote:
>
> On 6/30/21 12:21 PM, Ferruh Yigit wrote:
> > Announce adding 'RTE_ETH_' prefix to all public ethdev macros/enums on
> > v21.11.
> > Backward compatibility macros will be added on v21.11 and they will be
> > removed on v22.11.
> >
> > Signed-off-by: Ferruh Yigit 
> > ---
> > Cc: Andrew Rybchenko 
> > Cc: Thomas Monjalon 
> > Cc: David Marchand 
> > Cc: Qi Z Zhang 
> > Cc: Raslan Darawsheh 
> > Cc: Ajit Khaparde 
> > Cc: Jerin Jacob Kollanukkaran 
> > ---
> >   doc/guides/rel_notes/deprecation.rst | 5 +
> >   1 file changed, 5 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst 
> > b/doc/guides/rel_notes/deprecation.rst
> > index 9584d6bfd723..ae79673e37e3 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -118,6 +118,11 @@ Deprecation Notices
> > consistent with existing outer header checksum status flag naming, which
> > should help in reducing confusion about its usage.
> >
> > +* ethdev: Will add 'RTE_ETH_' prefix to all ethdev macros/enums in v21.11. Macros
> > +  will be added for backward compatibility. Backward compatibility macros will be
> > +  removed on v22.11. A few old backward compatibility macros from 2013 that do
> > +  not have a proper prefix will be removed on v21.11.
> > +
> >   * i40e: As there are both i40evf and iavf pmd, the functions of them are
> > duplicated. And now more and more advanced features are developed on 
> > iavf.
> > To keep consistent with kernel driver's name
> >
>
> Acked-by: Andrew Rybchenko 
Acked-by: Jerin Jacob 


Re: [dpdk-dev] [PATCH] doc: announce common prefix for ethdev

2021-06-30 Thread Raslan Darawsheh

> -Original Message-
> From: Ajit Khaparde 
> Sent: Wednesday, June 30, 2021 6:12 PM
> To: Jerin Jacob 
> Cc: Andrew Rybchenko ; Ferruh Yigit
> ; Ray Kinsella ; Neil Horman
> ; dpdk-dev ; Andrew
> Rybchenko ; NBU-Contact-Thomas Monjalon
> ; David Marchand ;
> Qi Z Zhang ; Raslan Darawsheh
> ; Jerin Jacob Kollanukkaran 
> Subject: Re: [dpdk-dev] [PATCH] doc: announce common prefix for ethdev
> 
> On Wed, Jun 30, 2021 at 8:03 AM Jerin Jacob 
> wrote:
> >
> > On Wed, Jun 30, 2021 at 7:28 PM Andrew Rybchenko
> >  wrote:
> > >
> > > On 6/30/21 12:21 PM, Ferruh Yigit wrote:
> > > > Announce adding 'RTE_ETH_' prefix to all public ethdev macros/enums
> on
> > > > v21.11.
> > > > Backward compatibility macros will be added on v21.11 and they will be
> > > > removed on v22.11.
> > > >
> > > > Signed-off-by: Ferruh Yigit 
> > > > ---
> > > > Cc: Andrew Rybchenko 
> > > > Cc: Thomas Monjalon 
> > > > Cc: David Marchand 
> > > > Cc: Qi Z Zhang 
> > > > Cc: Raslan Darawsheh 
> > > > Cc: Ajit Khaparde 
> > > > Cc: Jerin Jacob Kollanukkaran 
> > > > ---
> > > >   doc/guides/rel_notes/deprecation.rst | 5 +
> > > >   1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> > > > index 9584d6bfd723..ae79673e37e3 100644
> > > > --- a/doc/guides/rel_notes/deprecation.rst
> > > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > > @@ -118,6 +118,11 @@ Deprecation Notices
> > > > consistent with existing outer header checksum status flag naming,
> which
> > > > should help in reducing confusion about its usage.
> > > >
> > > > +* ethdev: Will add 'RTE_ETH_' prefix to all ethdev macros/enums in v21.11. Macros
> > > > +  will be added for backward compatibility. Backward compatibility macros will be
> > > > +  removed on v22.11. A few old backward compatibility macros from 2013 that do
> > > > +  not have a proper prefix will be removed on v21.11.
> > > > +
> > > >   * i40e: As there are both i40evf and iavf pmd, the functions of them
> are
> > > > duplicated. And now more and more advanced features are
> developed on iavf.
> > > > To keep consistent with kernel driver's name
> > > >
> > >
> > > Acked-by: Andrew Rybchenko 
> > Acked-by: Jerin Jacob 
> Acked-by: Ajit Khaparde 
Acked-by: Raslan Darawsheh 


Re: [dpdk-dev] [PATCH] event/dsw: flag proper eventdev adapter capabilities

2021-06-30 Thread Jerin Jacob
On Mon, Jun 14, 2021 at 3:54 PM Mattias Rönnblom
 wrote:
>
> Set the appropriate capability flags for the RX, crypto and timer
> eventdev adapters to use.
>
> Signed-off-by: Mattias Rönnblom 
> Tested-by: Heng Wang 

Applied to dpdk-next-net-eventdev/for-main. Thanks


> ---
>  drivers/event/dsw/dsw_evdev.c | 31 +++
>  1 file changed, 31 insertions(+)
>
> diff --git a/drivers/event/dsw/dsw_evdev.c b/drivers/event/dsw/dsw_evdev.c
> index 320a3784c..2301a4b7a 100644
> --- a/drivers/event/dsw/dsw_evdev.c
> +++ b/drivers/event/dsw/dsw_evdev.c
> @@ -370,6 +370,34 @@ dsw_close(struct rte_eventdev *dev)
> return 0;
>  }
>
> +static int
> +dsw_eth_rx_adapter_caps_get(const struct rte_eventdev *dev __rte_unused,
> +   const struct rte_eth_dev *eth_dev __rte_unused,
> +   uint32_t *caps)
> +{
> +   *caps = RTE_EVENT_ETH_RX_ADAPTER_SW_CAP;
> +   return 0;
> +}
> +
> +static int
> +dsw_timer_adapter_caps_get(const struct rte_eventdev *dev __rte_unused,
> +  uint64_t flags  __rte_unused, uint32_t *caps,
> +  const struct rte_event_timer_adapter_ops **ops)
> +{
> +   *caps = 0;
> +   *ops = NULL;
> +   return 0;
> +}
> +
> +static int
> +dsw_crypto_adapter_caps_get(const struct rte_eventdev *dev  __rte_unused,
> +   const struct rte_cryptodev *cdev  __rte_unused,
> +   uint32_t *caps)
> +{
> +   *caps = RTE_EVENT_CRYPTO_ADAPTER_SW_CAP;
> +   return 0;
> +}
> +
>  static struct rte_eventdev_ops dsw_evdev_ops = {
> .port_setup = dsw_port_setup,
> .port_def_conf = dsw_port_def_conf,
> @@ -384,6 +412,9 @@ static struct rte_eventdev_ops dsw_evdev_ops = {
> .dev_start = dsw_start,
> .dev_stop = dsw_stop,
> .dev_close = dsw_close,
> +   .eth_rx_adapter_caps_get = dsw_eth_rx_adapter_caps_get,
> +   .timer_adapter_caps_get = dsw_timer_adapter_caps_get,
> +   .crypto_adapter_caps_get = dsw_crypto_adapter_caps_get,
> .xstats_get = dsw_xstats_get,
> .xstats_get_names = dsw_xstats_get_names,
> .xstats_get_by_name = dsw_xstats_get_by_name
> --
> 2.25.1
>


[dpdk-dev] [PATCH] test/crypto: rename slave to worker

2021-06-30 Thread Rebecca Troy
Modifies the scheduler tests in the crypto unit test suite
to replace the usage of the word 'slave' with the more
appropriate word 'worker'.

The scheduler test functions were modified as follows:
test_scheduler_attach_slave_op is now called
test_scheduler_attach_worker_op,
test_scheduler_detach_slave_op is
test_scheduler_detach_worker_op.

Signed-off-by: Rebecca Troy 
---
 app/test/test_cryptodev.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index 39db52b17a..1725d6154c 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -13474,7 +13474,7 @@ scheduler_testsuite_setup(void)
 }
 
 static int
-test_scheduler_attach_slave_op(void)
+test_scheduler_attach_worker_op(void)
 {
struct crypto_testsuite_params *ts_params = &testsuite_params;
uint8_t sched_id = ts_params->valid_devs[0];
@@ -13584,7 +13584,7 @@ test_scheduler_attach_slave_op(void)
 }
 
 static int
-test_scheduler_detach_slave_op(void)
+test_scheduler_detach_worker_op(void)
 {
struct crypto_testsuite_params *ts_params = &testsuite_params;
uint8_t sched_id = ts_params->valid_devs[0];
@@ -13650,7 +13650,7 @@ test_scheduler_mode_pkt_size_distr_op(void)
 static int
 scheduler_multicore_testsuite_setup(void)
 {
-   if (test_scheduler_attach_slave_op() < 0)
+   if (test_scheduler_attach_worker_op() < 0)
return TEST_SKIPPED;
if (test_scheduler_mode_op(CDEV_SCHED_MODE_MULTICORE) < 0)
return TEST_SKIPPED;
@@ -13660,7 +13660,7 @@ scheduler_multicore_testsuite_setup(void)
 static int
 scheduler_roundrobin_testsuite_setup(void)
 {
-   if (test_scheduler_attach_slave_op() < 0)
+   if (test_scheduler_attach_worker_op() < 0)
return TEST_SKIPPED;
if (test_scheduler_mode_op(CDEV_SCHED_MODE_ROUNDROBIN) < 0)
return TEST_SKIPPED;
@@ -13670,7 +13670,7 @@ scheduler_roundrobin_testsuite_setup(void)
 static int
 scheduler_failover_testsuite_setup(void)
 {
-   if (test_scheduler_attach_slave_op() < 0)
+   if (test_scheduler_attach_worker_op() < 0)
return TEST_SKIPPED;
if (test_scheduler_mode_op(CDEV_SCHED_MODE_FAILOVER) < 0)
return TEST_SKIPPED;
@@ -13680,7 +13680,7 @@ scheduler_failover_testsuite_setup(void)
 static int
 scheduler_pkt_size_distr_testsuite_setup(void)
 {
-   if (test_scheduler_attach_slave_op() < 0)
+   if (test_scheduler_attach_worker_op() < 0)
return TEST_SKIPPED;
if (test_scheduler_mode_op(CDEV_SCHED_MODE_PKT_SIZE_DISTR) < 0)
return TEST_SKIPPED;
@@ -13690,7 +13690,7 @@ scheduler_pkt_size_distr_testsuite_setup(void)
 static void
 scheduler_mode_testsuite_teardown(void)
 {
-   test_scheduler_detach_slave_op();
+   test_scheduler_detach_worker_op();
 }
 
 #endif /* RTE_CRYPTO_SCHEDULER */
@@ -14652,12 +14652,12 @@ test_cryptodev_scheduler(void /*argv __rte_unused, 
int argc __rte_unused*/)
static struct unit_test_suite scheduler_config = {
.suite_name = "Crypto Device Scheduler Config Unit Test Suite",
.unit_test_cases = {
-   TEST_CASE(test_scheduler_attach_slave_op),
+   TEST_CASE(test_scheduler_attach_worker_op),
TEST_CASE(test_scheduler_mode_multicore_op),
TEST_CASE(test_scheduler_mode_roundrobin_op),
TEST_CASE(test_scheduler_mode_failover_op),
TEST_CASE(test_scheduler_mode_pkt_size_distr_op),
-   TEST_CASE(test_scheduler_detach_slave_op),
+   TEST_CASE(test_scheduler_detach_worker_op),
 
TEST_CASES_END() /**< NULL terminate array */
}
-- 
2.25.1



Re: [dpdk-dev] [PATCH] app/eventdev: add option to enable per port pool

2021-06-30 Thread Jerin Jacob
On Tue, Jun 15, 2021 at 4:02 PM  wrote:
>
> From: Pavan Nikhilesh 
>
> Add option to configure unique mempool for each ethernet device
> port. Can be used with `pipeline_atq` and `pipeline_queue` tests.
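
For example, a hypothetical invocation enabling the option (device and
core arguments are placeholders):

    dpdk-test-eventdev -l 0-3 -a 0002:02:00.0 -- \
            --test=pipeline_queue --wlcores=2-3 \
            --prod_type_ethdev --per_port_pool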
>
> Signed-off-by: Pavan Nikhilesh 
> ---
>  app/test-eventdev/evt_common.h   |  1 +
>  app/test-eventdev/evt_options.c  |  9 
>  app/test-eventdev/evt_options.h  |  1 +
>  app/test-eventdev/test_pipeline_common.c | 52 +---
>  app/test-eventdev/test_pipeline_common.h |  2 +-
>  doc/guides/tools/testeventdev.rst|  8 
>  6 files changed, 58 insertions(+), 15 deletions(-)
>
> diff --git a/app/test-eventdev/evt_common.h b/app/test-eventdev/evt_common.h
> index 0e228258e7..28afb114b3 100644
> --- a/app/test-eventdev/evt_common.h
> +++ b/app/test-eventdev/evt_common.h
> @@ -55,6 +55,7 @@ struct evt_options {
> uint8_t timdev_cnt;
> uint8_t nb_timer_adptrs;
> uint8_t timdev_use_burst;
> +   uint8_t per_port_pool;
> uint8_t sched_type_list[EVT_MAX_STAGES];
> uint16_t mbuf_sz;
> uint16_t wkr_deq_dep;
> diff --git a/app/test-eventdev/evt_options.c b/app/test-eventdev/evt_options.c
> index 061b63e12e..bfa3840dbc 100644
> --- a/app/test-eventdev/evt_options.c
> +++ b/app/test-eventdev/evt_options.c
> @@ -297,6 +297,12 @@ evt_parse_eth_queues(struct evt_options *opt, const char 
> *arg)
> return ret;
>  }
>
> +evt_parse_per_port_pool(struct evt_options *opt, const char *arg 
> __rte_unused)

Syntax issue with the old compiler on return value.
http://mails.dpdk.org/archives/test-report/2021-June/198858.html


> +{
> +   opt->per_port_pool = 1;
> +   return 0;
> +}
> +
>  static void
>  usage(char *program)
>  {
> @@ -333,6 +339,7 @@ usage(char *program)
> "\t--enable_vector: enable event vectorization.\n"
> "\t--vector_size  : Max vector size.\n"
> "\t--vector_tmo_ns: Max vector timeout in nanoseconds\n"
> +   "\t--per_port_pool: Configure unique pool per ethdev 
> port\n"
> );
> printf("available tests:\n");
> evt_test_dump_names();
> @@ -408,6 +415,7 @@ static struct option lgopts[] = {
> { EVT_ENA_VECTOR,  0, 0, 0 },
> { EVT_VECTOR_SZ,   1, 0, 0 },
> { EVT_VECTOR_TMO,  1, 0, 0 },
> +   { EVT_PER_PORT_POOL,   0, 0, 0 },
> { EVT_HELP,0, 0, 0 },
> { NULL,0, 0, 0 }
>  };
> @@ -446,6 +454,7 @@ evt_opts_parse_long(int opt_idx, struct evt_options *opt)
> { EVT_ENA_VECTOR, evt_parse_ena_vector},
> { EVT_VECTOR_SZ, evt_parse_vector_size},
> { EVT_VECTOR_TMO, evt_parse_vector_tmo_ns},
> +   { EVT_PER_PORT_POOL, evt_parse_per_port_pool},
> };
>
> for (i = 0; i < RTE_DIM(parsermap); i++) {
> diff --git a/app/test-eventdev/evt_options.h b/app/test-eventdev/evt_options.h
> index 1cea2a3e11..6436200b40 100644
> --- a/app/test-eventdev/evt_options.h
> +++ b/app/test-eventdev/evt_options.h
> @@ -46,6 +46,7 @@
>  #define EVT_ENA_VECTOR   ("enable_vector")
>  #define EVT_VECTOR_SZ("vector_size")
>  #define EVT_VECTOR_TMO   ("vector_tmo_ns")
> +#define EVT_PER_PORT_POOL   ("per_port_pool")
>  #define EVT_HELP ("help")
>
>  void evt_options_default(struct evt_options *opt);
> diff --git a/app/test-eventdev/test_pipeline_common.c 
> b/app/test-eventdev/test_pipeline_common.c
> index d5ef90500f..6ee530d4cd 100644
> --- a/app/test-eventdev/test_pipeline_common.c
> +++ b/app/test-eventdev/test_pipeline_common.c
> @@ -259,9 +259,10 @@ pipeline_ethdev_setup(struct evt_test *test, struct 
> evt_options *opt)
> }
>
> for (j = 0; j < opt->eth_queues; j++) {
> -   if (rte_eth_rx_queue_setup(i, j, NB_RX_DESC,
> -  rte_socket_id(), &rx_conf,
> -  t->pool) < 0) {
> +   if (rte_eth_rx_queue_setup(
> +   i, j, NB_RX_DESC, rte_socket_id(), 
> &rx_conf,
> +   opt->per_port_pool ? t->pool[i] :
> + t->pool[0]) < 
> 0) {
> evt_err("Failed to setup eth port [%d] 
> rx_queue: %d.",
> i, 0);
> return -EINVAL;
> @@ -569,18 +570,35 @@ pipeline_mempool_setup(struct evt_test *test, struct 
> evt_options *opt)
> if (data_size  > opt->mbuf_sz)
> opt->mbuf_sz = data_size;
> }
> +   if (opt->per_port_pool) {
> +   char name[RTE_MEMPOOL_NAMESIZE];
> +
> +   snprintf(name, RTE_MEMPOOL_NAMESIZE, "%s-%d"

Re: [dpdk-dev] [PATCH v2] test/crypto: fix autotest function parameters

2021-06-30 Thread Hemant Agrawal
Acked-by: Hemant Agrawal 


[dpdk-dev] [PATCH] doc: fix build on Windows with meson 0.58

2021-06-30 Thread Dmitry Kozlyuk
The `doc` target used `echo` as its command.
On Windows, `echo` is always a shell built-in, there is no binary.
Starting from meson 0.58, `run_target()` always searches for command
executable and no longer accepts `echo` as such on Windows.
Replace plain `echo` with a Python one-liner.
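
The replacement behaves like echo for this purpose; a quick
demonstration (not part of the patch):

    $ python3 -c "import sys; print(*sys.argv[1:])" Building docs: doc-html
    Building docs: doc-html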

Fixes: d02a2dab2dfb ("doc: support building HTML guides with meson")
Cc: Bruce Richardson 
Cc: Luca Boccassi 
Cc: sta...@dpdk.org

Reported-by: Rob Scheepens 
Signed-off-by: Dmitry Kozlyuk 
---
Sorry for the noise, sent to stable@ instead of dev@ first.

 doc/meson.build | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/meson.build b/doc/meson.build
index 959606b965..abd7d70421 100644
--- a/doc/meson.build
+++ b/doc/meson.build
@@ -11,5 +11,6 @@ if doc_targets.length() == 0
 else
 message = 'Building docs:'
 endif
-run_target('doc', command: ['echo', message, doc_target_names],
+echo = [py3, '-c', 'import sys; print(*sys.argv[1:])']
+run_target('doc', command: [echo, message, doc_target_names],
 depends: doc_targets)
-- 
2.29.3



Re: [dpdk-dev] [dpdk-ci] [PATCH v2 2/2] drivers: add octeontx crypto adapter data path

2021-06-30 Thread Brandon Lo
Hi Akhil,

I believe the FreeBSD 13 failure appeared because new requirements
were added for drivers/event/octeontx.
The ABI reference was taken at the v21.05 release which was able to
build this driver at the time.
I will try to look for a way to produce a real ABI test.

Thanks,
Brandon

On Wed, Jun 30, 2021 at 4:54 AM Akhil Goyal  wrote:
>
> > Added support for crypto adapter OP_FORWARD mode.
> >
> > As OcteonTx CPT crypto completions could be out of order, each crypto op
> > is enqueued to CPT, dequeued from CPT and enqueued to SSO one-by-one.
> >
> > Signed-off-by: Shijith Thotton 
> > ---
> This patch shows a CI warning for FreeBSD, but I was not able to locate the
> error/warning in the logs.
> Can anybody confirm what is the issue?
>
> http://mails.dpdk.org/archives/test-report/2021-June/200637.html
>
> Regards,
> Akhil



-- 

Brandon Lo

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

b...@iol.unh.edu

www.iol.unh.edu


Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test

2021-06-30 Thread Tyler Retzlaff
On Tue, Jun 29, 2021 at 05:04:55PM +, Honnappa Nagarahalli wrote:
> 
> > 
> > 18/06/2021 01:26, Honnappa Nagarahalli:
> > > > On Tue, Jun 15, 2021 at 09:54:51PM -0500, Joyce Kong wrote:
> > > > > Since C11 memory model is adopted in DPDK now[1], use GCC's atomic
> > > > > builtins in test cases.
> > > >
> > > > as previously discussed these atomics are not "C11" they are direct
> > > > use of gcc builtins. please don't incorporate C11 into the title of
> > > > the patches or commit messages since it isn't.
> > >
> > > GCC supports 2 types of built-in atomics, __atomic_xxx[1] and
> > > __sync_xxx [2].
> > > We need a way to distinguish between them.
> > > We are using "C11" as [1] says they match C++11 memory model.
> > 
> > I agree it would be more correct to mention "compiler builtin"
> > as it is not strictly the C11 API.
> The log already mentions "GCC's C11 atomic builtins". I think that is correct 
> enough and represents the change correctly.

it's misleading and does not attract the correct reviewers particularly
due to prominence in the commit/mail subject.

please change it to "Use GCC atomic builtins" which describes clearly
the actual change without ambiguity.  using "C11" implies the patch is
adding code that uses C11 stdatomic.h and it doesn't.



Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test

2021-06-30 Thread Honnappa Nagarahalli


> > >
> > > 18/06/2021 01:26, Honnappa Nagarahalli:
> > > > > On Tue, Jun 15, 2021 at 09:54:51PM -0500, Joyce Kong wrote:
> > > > > > Since C11 memory model is adopted in DPDK now[1], use GCC's
> > > > > > atomic builtins in test cases.
> > > > >
> > > > > as previously discussed these atomics are not "C11" they are
> > > > > direct use of gcc builtins. please don't incorporate C11 into
> > > > > the title of the patches or commit messages since it isn't.
> > > >
> > > > GCC supports 2 types of built-in atomics, __atomic_xxx[1] and
> > > > __sync_xxx [2].
> > > > We need a way to distinguish between them.
> > > > We are using "C11" as [1] says they match C++11 memory model.
> > >
> > > I agree it would be more correct to mention "compiler builtin"
> > > as it is not strictly the C11 API.
> > The log already mentions "GCC's C11 atomic builtins". I think that is 
> > correct
> enough and represents the change correctly.
> 
> it's misleading and does not attract the correct reviewers particularly due to
> prominence in the commit/mail subject.
> 
> please change it to "Use GCC atomic builtins" which describes clearly the
> actual change without ambiguity.  using "C11" implies the patch is adding
> code that uses C11 stdatomic.h and it doesn't.
As I mentioned earlier in this thread, GCC supports 2 types of atomics. "Use 
GCC atomic builtins" does not help distinguish between them. In "GCC's C11 
atomic builtins" - "C11" indicates which atomics we are using, "atomic 
builtins" indicates that we are NOT using APIs from stdatomic.h


Re: [dpdk-dev] [PATCH] doc: announce common prefix for ethdev

2021-06-30 Thread Ajit Khaparde
On Wed, Jun 30, 2021 at 8:03 AM Jerin Jacob  wrote:
>
> On Wed, Jun 30, 2021 at 7:28 PM Andrew Rybchenko
>  wrote:
> >
> > On 6/30/21 12:21 PM, Ferruh Yigit wrote:
> > > Announce adding 'RTE_ETH_' prefix to all public ethdev macros/enums on
> > > v21.11.
> > > Backward compatibility macros will be added on v21.11 and they will be
> > > removed on v22.11.
> > >
> > > Signed-off-by: Ferruh Yigit 
> > > ---
> > > Cc: Andrew Rybchenko 
> > > Cc: Thomas Monjalon 
> > > Cc: David Marchand 
> > > Cc: Qi Z Zhang 
> > > Cc: Raslan Darawsheh 
> > > Cc: Ajit Khaparde 
> > > Cc: Jerin Jacob Kollanukkaran 
> > > ---
> > >   doc/guides/rel_notes/deprecation.rst | 5 +
> > >   1 file changed, 5 insertions(+)
> > >
> > > diff --git a/doc/guides/rel_notes/deprecation.rst 
> > > b/doc/guides/rel_notes/deprecation.rst
> > > index 9584d6bfd723..ae79673e37e3 100644
> > > --- a/doc/guides/rel_notes/deprecation.rst
> > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > @@ -118,6 +118,11 @@ Deprecation Notices
> > > consistent with existing outer header checksum status flag naming, 
> > > which
> > > should help in reducing confusion about its usage.
> > >
> > > +* ethdev: Will add 'RTE_ETH_' prefix to all ethdev macros/enums in v21.11. Macros
> > > +  will be added for backward compatibility. Backward compatibility macros will be
> > > +  removed on v22.11. A few old backward compatibility macros from 2013 that do
> > > +  not have a proper prefix will be removed on v21.11.
> > > +
> > >   * i40e: As there are both i40evf and iavf pmd, the functions of them are
> > > duplicated. And now more and more advanced features are developed on 
> > > iavf.
> > > To keep consistent with kernel driver's name
> > >
> >
> > Acked-by: Andrew Rybchenko 
> Acked-by: Jerin Jacob 
Acked-by: Ajit Khaparde 
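
For context, the backward-compatibility scheme described in the notice
usually boils down to aliasing the old name to the new prefixed one; a
hypothetical sketch (the value is invented):

/* v21.11 public header: new prefixed name... */
#define RTE_ETH_RSS_IP 0x1 /* illustrative value */

/* ...plus a compatibility alias, kept until v22.11. */
#define ETH_RSS_IP RTE_ETH_RSS_IP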


Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test

2021-06-30 Thread Tyler Retzlaff
On Wed, Jun 30, 2021 at 07:06:31PM +, Honnappa Nagarahalli wrote:
> 
> 
> As I mentioned earlier in this thread, GCC supports 2 types of atomics. "Use 
> GCC atomic builtins" does not help distinguish between them. In "GCC's C11 
> atomic builtins" - "C11" indicates which atomics we are using, "atomic 
> builtins" indicates that we are NOT using APIs from stdatomic.h

if you need a term to distinguish the two sets of atomics in gcc you can
qualify it with "Memory Model Aware" which is straight from the gcc
manual.


Re: [dpdk-dev] [PATCH] doc: announce common prefix for ethdev

2021-06-30 Thread Tyler Retzlaff
On Wed, Jun 30, 2021 at 03:14:51PM +, Raslan Darawsheh wrote:
> 
> > -Original Message-
> > From: Ajit Khaparde 
> > Sent: Wednesday, June 30, 2021 6:12 PM
> > To: Jerin Jacob 
> > Cc: Andrew Rybchenko ; Ferruh Yigit
> > ; Ray Kinsella ; Neil Horman
> > ; dpdk-dev ; Andrew
> > Rybchenko ; NBU-Contact-Thomas Monjalon
> > ; David Marchand ;
> > Qi Z Zhang ; Raslan Darawsheh
> > ; Jerin Jacob Kollanukkaran 
> > Subject: Re: [dpdk-dev] [PATCH] doc: announce common prefix for ethdev
> > 
> > On Wed, Jun 30, 2021 at 8:03 AM Jerin Jacob 
> > wrote:
> > >
> > > On Wed, Jun 30, 2021 at 7:28 PM Andrew Rybchenko
> > >  wrote:
> > > >
> > > > On 6/30/21 12:21 PM, Ferruh Yigit wrote:
> > > > > Announce adding 'RTE_ETH_' prefix to all public ethdev macros/enums
> > on
> > > > > v21.11.
> > > > > Backward compatibility macros will be added on v21.11 and they will be
> > > > > removed on v22.11.
> > > > >
> > > > > Signed-off-by: Ferruh Yigit 
> > > > > ---
> > > > > Cc: Andrew Rybchenko 
> > > > > Cc: Thomas Monjalon 
> > > > > Cc: David Marchand 
> > > > > Cc: Qi Z Zhang 
> > > > > Cc: Raslan Darawsheh 
> > > > > Cc: Ajit Khaparde 
> > > > > Cc: Jerin Jacob Kollanukkaran 
> > > > > ---
> > > > >   doc/guides/rel_notes/deprecation.rst | 5 +
> > > > >   1 file changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > > > > index 9584d6bfd723..ae79673e37e3 100644
> > > > > --- a/doc/guides/rel_notes/deprecation.rst
> > > > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > > > @@ -118,6 +118,11 @@ Deprecation Notices
> > > > > consistent with existing outer header checksum status flag naming,
> > which
> > > > > should help in reducing confusion about its usage.
> > > > >
> > > > > +* ethdev: Will add 'RTE_ETH_' prefix to all ethdev macros/enums in
> > v21.11. Macros
> > > > > +  will be added for backward compatibility. Backward compatibility
> > macros will be
> > > > > +  removed on v22.11. A few old backward compatibility macros from
> > 2013 that do
> > > > > +  not have a proper prefix will be removed on v21.11.
> > > > > +
> > > > >   * i40e: As there are both i40evf and iavf pmd, the functions of them
> > are
> > > > > duplicated. And now more and more advanced features are
> > developed on iavf.
> > > > > To keep consistent with kernel driver's name
> > > > >
> > > >
> > > > Acked-by: Andrew Rybchenko 
> > > Acked-by: Jerin Jacob 
> > Acked-by: Ajit Khaparde 
> Acked-by: Raslan Darawsheh 
Acked-by: Tyler Retzlaff 


Re: [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs

2021-06-30 Thread Tyler Retzlaff
On Tue, Jun 29, 2021 at 07:38:05PM +0100, Kinsella, Ray wrote:
> 
> 
> >> +Promotion to stable
> >> +~~~
> >> +
> >> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable 
> >> API
> >> +once a maintainer and/or the original contributor is satisfied that the 
> >> API is
> >> +reasonably mature. In exceptional circumstances, should an API still be
> > 
> > this seems vague and arbitrary. is there a way we can have a more
> > quantitative metric for what "reasonably mature" means.
> > 
> >> +classified as ``experimental`` after two years and is without any 
> >> prospect of
> >> +becoming part of the stable API. The API will then become a candidate for
> >> +removal, to avoid the accumulation of abandoned symbols.
> > 
> > i think with the above comment the basis for removal then depends on
> > whatever metric is used to determine maturity. 
> > if it is still changing
> > then it seems like it is useful and still evolving so perhaps should not
> > be removed but hasn't changed but doesn't meet the metric for being made
> > stable then perhaps it becomes a candidate for removal.
> 
> Good idea. 
> 
> I think it is reasonable to add a clause that indicates that any change 
> to the "API signature" would reset the clock.

a time based strategy works but i guess the follow-on to that is how is
the clock tracked and how does it get updated? i don't think trying to
troll through git history will be effective.

one nit: i think "api signature" doesn't cover all cases of what i would
regard as change. i would prefer to define it as "no change where api/abi
compatibility or semantic change occurred", which is a lot more strict
but in practice is necessary to support binaries when the abi/api is stable.

i.e. if a recompile is necessary with or without code change then it's a
change.
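
to make that concrete, a hypothetical example (all names invented) of a
change that touches no exported symbol and no function signature, yet
still forces a recompile:

#include <stdint.h>

/* Version N of a public header. */
struct widget_conf {
	uint32_t depth;
	uint32_t flags;
};

int widget_configure(const struct widget_conf *conf);

/*
 * Suppose version N+1 appends a field:
 *
 *     struct widget_conf {
 *             uint32_t depth;
 *             uint32_t flags;
 *             uint64_t timeout_ns;
 *     };
 *
 * The symbol list and the signature of widget_configure() are
 * unchanged, but sizeof(struct widget_conf) grew, so a binary built
 * against version N now passes a too-small object: an abi/semantic
 * change with no "api signature" change.
 */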

> 
> However equally any changes to the implementation do not reset the clock.
> 
> Would that work?

that works for me.

> 
> > 
> >> +
> >> +The promotion or removal of symbols will typically form part of a 
> >> conversation
> >> +between the maintainer and the original contributor.
> > 
> > this should extend beyond just symbols. there are other changes that
> > impact the abi where exported symbols don't change. e.g. additions to
> > return values sets.> 
> > thanks for working on this.
> > 


Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test

2021-06-30 Thread Honnappa Nagarahalli


> >
> > As I mentioned earlier in this thread, GCC supports 2 types of
> > atomics. "Use GCC atomic builtins" does not help distinguish between
> > them. In "GCC's C11 atomic builtins" - "C11" indicates which atomics
> > we are using, "atomic builtins" indicates that we are NOT using APIs
> > from stdatomic.h
> 
> if you need a term to distinguish the two sets of atomics in gcc you can 
> qualify
> it with "Memory Model Aware" which is straight from the gcc manual.
"Memory model aware" sounds too generic. The same page [1] also makes it clear 
that the built-in functions match the requirements for the C11 memory model.

There are also several patches merged in the past which do not use the term 
"memory model aware". I would prefer to be consistent.

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html


Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test

2021-06-30 Thread Tyler Retzlaff
On Wed, Jun 30, 2021 at 08:25:44PM +, Honnappa Nagarahalli wrote:
> 
> 
> > >
> > > As I mentioned earlier in this thread, GCC supports 2 types of
> > > atomics. "Use GCC atomic builtins" does not help distinguish between
> > > them. In "GCC's C11 atomic builtins" - "C11" indicates which atomics
> > > we are using, "atomic builtins" indicates that we are NOT using APIs
> > > from stdatomic.h
> > 
> > if you need a term to distinguish the two sets of atomics in gcc you can 
> > qualify
> > it with "Memory Model Aware" which is straight from the gcc manual.
> "Memory model aware" sounds too generic. The same page [1] also makes it 
> clear that the built-in functions match the requirements for the C11 memory 
> model.

allow me to put your interpretation of the manual that you linked side
by side with what the manual text actually says verbatim.

your text from above
  "built-in functions match the requirements for the C11 memory model."

the actual text from your link
  "built-in functions approximately match the requirements for the C++11 memory 
model."

* you've chosen to drop approximately from the wording to try and make
  your argument.

* you've also chosen to substitute C11 in place of C++11. again
  presumably for the same reason.

in fact the entire page does not mention C11 even once, it also goes on
to highlight a specific deviation from C++11 with this excerpt "because
of a deficiency in C++11's semantics for memory_order_consume"

> There are also several patches merged in the past which do not use the term 
> "memory model aware". I would prefer to be consistent.

i prefer the history represent the change. that previous submitters and
reviewers lacked precision is not my concern nor is consistency a reason
to continue documenting history incorrectly.

i'm waiting to ack the change, it's up to you. you've already spent more
time arguing than it would have taken to submit a v2 correcting the
problem.

> 
> [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html


Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test

2021-06-30 Thread Honnappa Nagarahalli


> >
> > > >
> > > > As I mentioned earlier in this thread, GCC supports 2 types of
> > > > atomics. "Use GCC atomic builtins" does not help distinguish
> > > > between them. In "GCC's C11 atomic builtins" - "C11" indicates
> > > > which atomics we are using, "atomic builtins" indicates that we
> > > > are NOT using APIs from stdatomic.h
> > >
> > > if you need a term to distinguish the two sets of atomics in gcc you
> > > can qualify it with "Memory Model Aware" which is straight from the gcc
> manual.
> > "Memory model aware" sounds too generic. The same page [1] also makes
> it clear that the built-in functions match the requirements for the C11
> memory model.
> 
> allow me to put your interpretation of the manual that you linked side by side
> with what the manual text actually says verbatim.
> 
> your text from above
>   "built-in functions match the requirements for the C11 memory model."
> 
> the actual text from your link
>   "built-in functions approximately match the requirements for the C++11
> memory model."
> 
> * you've chosen to drop approximately from the wording to try and make
>   your argument.
I am not sure how this makes a difference to our arguments. For example, there 
are no other built-in functions supported by GCC that "exactly" match the 
C++11 memory model.

> 
> * you've also chosen to substitute C11 in place of C++11. again
>   presumably for the same reason.
> 
> in fact the entire page does not mention C11 even once, it also goes on to
> highlight a specific deviation from C++11 with this excerpt "because of a
> deficiency in C++11's semantics for memory_order_consume"
I do not have a problem calling it C++11. IMO, calling it "GCC's C++11 ..." 
will address this deviation and the approximation.

> 
> > There are also several patches merged in the past which do not use the term
> "memory model aware". I would prefer to be consistent.
> 
> i prefer the history represent the change. that previous submitters and
> reviewers lacked precision is not my concern nor is consistency a reason to
> continue documenting history incorrectly.
Ok. As I mentioned, it is just my preference.

> 
> i'm waiting to ack the change, it's up to you. you've already spent more time
> arguing than it would have taken to submit a v2 correcting the problem.
I am not arguing for the sake of arguing. You are trying to correct a few 
mistakes here (I truly appreciate that), and I am trying to explain my POV and 
make corrections as needed. I am sure we will conclude soon.

> 
> >
> > [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html


Re: [dpdk-dev] [PATCH v1] lib/eal: enforce alarm APIs parameters check

2021-06-30 Thread Dmitry Kozlyuk
Hi Jie,

2021-06-23 17:36 (UTC-0700), Jie Zhou:
> From: Jie Zhou 
> 
> lib/eal alarm APIs rte_eal_alarm_set and rte_eal_alarm_cancel
> on Windows do not check parameters to fail fast for invalid
> parameters, which captured by DPDK UT alarm_autotest.

Please use the past tense to describe the situation before the patch.
A nit, but browsing the log, I see that errors are usually "caught"
rather than "captured"; consistency would be nice.

> 
> Enforce Windows lib/eal alarm APIs parameters check and log
> invalid parameter info.

Fixes tag needed.

> Signed-off-by: Jie Zhou 
> Signed-off-by: Jie Zhou 
> 
> ---
>  lib/eal/windows/eal_alarm.c | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/lib/eal/windows/eal_alarm.c b/lib/eal/windows/eal_alarm.c
> index f5bf88715a..7bb79ae869 100644
> --- a/lib/eal/windows/eal_alarm.c
> +++ b/lib/eal/windows/eal_alarm.c
> @@ -4,6 +4,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -91,6 +92,22 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback 
> cb_fn, void *cb_arg)
>   LARGE_INTEGER deadline;
>   int ret;
>  
> + /* Check if us is valid */
> + if (us < 1 || us >(UINT64_MAX - US_PER_S)) {

This condition is specific to the Linux EAL. In fact, it's not very useful even
there, because the actual upper bound for `us` depends on the current time.
No bounds are specified in the API description at all.
The Windows check would be different, but these considerations remain valid.

Maybe it's alarm_autotest or the API description that needs adjustment,
but not the implementation. I understand that you're enabling UT for Windows
and not correcting the tests themselves, but I'm against inserting checks known
to be incorrect.

> + RTE_LOG(ERR, EAL, "Invalid us: %" PRIu64 "\n"
> + "Valid us range is 1 to (UINT64_MAX - US_PER_S)\n",
> + us);

Why does Windows need these messages, while Linux and FreeBSD don't?
How will printing the API contract here help the user who gets the message?

> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + /* Check if callback is not NULL */
> + if (!cb_fn) {

Pointers (`cb_fn`) must be checked for `NULL` explicitly.
You won't need an obvious comment after that.

> + RTE_LOG(ERR, EAL, "NULL callback\n");
> + ret = -EINVAL;
> + goto exit;
> + }
> +
>   /* Calculate deadline ASAP, unit of measure = 100ns. */
>   GetSystemTimePreciseAsFileTime(&ft);
>   deadline.LowPart = ft.dwLowDateTime;
> @@ -180,6 +197,12 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void 
> *cb_arg)
>   bool executing;
>  
>   removed = 0;
> +
> + if (!cb_fn) {
> + RTE_LOG(ERR, EAL, "NULL callback\n");
> + return -EINVAL;
> + }
> +
>   do {
>   executing = false;
>  

Please also fix other style issues:
http://mails.dpdk.org/archives/test-report/2021-June/200580.html
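
A minimal sketch of the validation style suggested above -- explicit NULL
comparison and no upper bound borrowed from the Linux implementation
(plain fprintf() stands in for RTE_LOG() to keep the example
self-contained; the helper name is made up):

#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef void (*alarm_callback)(void *arg);

/* Fail fast on invalid parameters: compare the pointer explicitly
 * against NULL and reject a zero timeout, without inventing an upper
 * bound that the API description never promises. */
static int
alarm_check_args(uint64_t us, alarm_callback cb_fn)
{
	if (cb_fn == NULL) {
		fprintf(stderr, "EAL: NULL callback\n");
		return -EINVAL;
	}
	if (us < 1) {
		fprintf(stderr, "EAL: invalid timeout: %" PRIu64 " us\n", us);
		return -EINVAL;
	}
	return 0;
}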


Re: [dpdk-dev] Question about 'rxm->hash.rss' and 'mb->hash.fdir'

2021-06-30 Thread Min Hu (Connor)

Hi, Slava

On 2021/6/30 19:44, Slava Ovsiienko wrote:

Hi,


-Original Message-
From: Min Hu (Connor) 
Sent: Wednesday, June 30, 2021 14:22
To: Ferruh Yigit ; dev@dpdk.org; NBU-Contact-Thomas
Monjalon ; Andrew Rybchenko

Cc: Beilei Xing ; Matan Azrad ;
Shahaf Shuler ; Slava Ovsiienko

Subject: Re: Question about 'rxm->hash.rss' and 'mb->hash.fdir'

Hi, Beilei, Matan, Shahaf, Viacheslav,

how about your opinion?

On 2021/6/30 17:34, Ferruh Yigit wrote:

On 6/30/2021 3:45 AM, Min Hu (Connor) wrote:

Hi, all
  one question about 'rxm->hash.rss' and 'mb->hash.fdir'.

  In Rx recv packets function,
  'rxm->hash.rss' will report rss hash result from Rx desc.
  'rxm->hash.fdir' will report filter identifier from Rx desc.

  But function implementation differs from some PMDs. for example:
  i40e, MLX5 report the two at the same time if pkt_flags is set,like:
**
      if (pkt_flags & PKT_RX_RSS_HASH) {
      rxm->hash.rss =
rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
      }
      if (pkt_flags & PKT_RX_FDIR) {
      mb->hash.fdir.hi =
      rte_le_to_cpu_32(rxdp->wb.qword3.hi_dword.fd_id);
      }


  While, ixgbe only report one of the two. like:
**
      if (likely(pkt_flags & PKT_RX_RSS_HASH))
      mb->hash.rss = rte_le_to_cpu_32(
      rxdp[j].wb.lower.hi_dword.rss);
      else if (pkt_flags & PKT_RX_FDIR) {
      mb->hash.fdir.hash = rte_le_to_cpu_16(
      rxdp[j].wb.lower.hi_dword.csum_ip.csum) &
      IXGBE_ATR_HASH_MASK;
      mb->hash.fdir.id = rte_le_to_cpu_16(
      rxdp[j].wb.lower.hi_dword.csum_ip.ip_id);
      }

  So, what is application scenario for 'rxm->hash.rss' and
'mb->hash.fdir', that is, why the two should be reported? How about
reporting the two at the same time?
  Thanks for  your reply.



Hi Connor,

mbuf->hash is union, so it is not possible to set both 'hash.rss' & 'hash.fdir'.


hash.rss is uint32_t and shares the memory with hash.fdir.lo.
hash.fdir.hi is untouched by access to hash.rss.
Hence, IIUC, we can provide both valid hash.rss and hash.fdir.hi at the same
time.


if reported at the same time, what do users (or the app) use them for?

At least mlx5 provides both (at least if CQE compression option allows it).
RSS hash is provided in the hash.rss, and MARK RTE Flow action result is
reported in hash.fdir.hi in independent way.



I assume for the i40e & mlx5 case 'pkt_flags' indicates which one is valid
and only one is set in practice. Cc'ed driver maintainers for more comments.


Thanks Ferruh,
another question: why does the user need this information, rxm->hash.rss
or mb->hash.fdir.hi? What is its function?


IIRC, hash.rss is the lower bits of the hash function result calculated over 
the packet.
hash.fdir.hi is the result of the MARK RTE Flow action (at least for mlx5).
hash.fdir.hi is the result of MARK RTE Flow action (at least for mlx5).


Thanks for your reply.

With best regards,
Slava
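
To visualize the layout discussed above, a condensed sketch of the mbuf
hash union (simplified from rte_mbuf; the sched and other members are
omitted). Since rss overlays only fdir.lo, fdir.hi survives a write to
rss:

#include <stdint.h>

/* Condensed view of the rte_mbuf 'hash' union. */
union pkt_hash {
	uint32_t rss; /* RSS hash result, shares storage with fdir.lo */
	struct {
		union {
			struct {
				uint16_t hash; /* e.g. ixgbe ATR hash */
				uint16_t id;   /* filter identifier */
			};
			uint32_t lo;
		};
		uint32_t hi; /* e.g. fd_id / MARK result, independent of rss */
	} fdir;
};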



Re: [dpdk-dev] Question about 'rxm->hash.rss' and 'mb->hash.fdir'

2021-06-30 Thread Xing, Beilei


> -Original Message-
> From: Min Hu (Connor) 
> Sent: Wednesday, June 30, 2021 7:22 PM
> To: Yigit, Ferruh ; dev@dpdk.org; Thomas Monjalon
> ; Andrew Rybchenko
> 
> Cc: Xing, Beilei ; Matan Azrad
> ; shah...@nvidia.com; viachesl...@nvidia.com
> Subject: Re: Question about 'rxm->hash.rss' and 'mb->hash.fdir'
> 
> Hi, Beilei, Matan, Shahaf, Viacheslav,
> 
>   how about your opinion?

Agree with Ferruh.

> 
> On 2021/6/30 17:34, Ferruh Yigit wrote:
> > On 6/30/2021 3:45 AM, Min Hu (Connor) wrote:
> >> Hi, all
> >>  one question about 'rxm->hash.rss' and 'mb->hash.fdir'.
> >>
> >>  In Rx recv packets function,
> >>  'rxm->hash.rss' will report rss hash result from Rx desc.
> >>  'rxm->hash.fdir' will report filter identifier from Rx desc.
> >>
> >>  But function implementation differs from some PMDs. for example:
> >>  i40e, MLX5 report the two at the same time if pkt_flags is set,like:
> >> **
> >>      if (pkt_flags & PKT_RX_RSS_HASH) {
> >>      rxm->hash.rss =
> >> rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
> >>      }
> >>      if (pkt_flags & PKT_RX_FDIR) {
> >>      mb->hash.fdir.hi =
> >>      rte_le_to_cpu_32(rxdp->wb.qword3.hi_dword.fd_id);
> >>      }
> >> 
> >>
> >>  While, ixgbe only report one of the two. like:
> >> **
> >>      if (likely(pkt_flags & PKT_RX_RSS_HASH))
> >>      mb->hash.rss = rte_le_to_cpu_32(
> >>      rxdp[j].wb.lower.hi_dword.rss);
> >>      else if (pkt_flags & PKT_RX_FDIR) {
> >>      mb->hash.fdir.hash = rte_le_to_cpu_16(
> >>      rxdp[j].wb.lower.hi_dword.csum_ip.csum) &
> >>      IXGBE_ATR_HASH_MASK;
> >>      mb->hash.fdir.id = rte_le_to_cpu_16(
> >>      rxdp[j].wb.lower.hi_dword.csum_ip.ip_id);
> >>      }
> >> 
> >>  So, what is application scenario for 'rxm->hash.rss' and
> >> 'mb->hash.fdir', that is, why the two should be reported? How about
> >> reporting the two at the same time?
> >>  Thanks for  your reply.
> >
> >
> > Hi Connor,
> >
> > mbuf->hash is union, so it is not possible to set both 'hash.rss' & 
> > 'hash.fdir'.
> >
> > I assume for the i40e & mlx5 case 'pkt_flags' indicates which one is valid
> > and only one is set in practice. Cc'ed driver maintainers for more comments.
> 
> Thanks Ferruh,
>   another question: why does the user need this information, rxm->hash.rss
>   or mb->hash.fdir.hi? What is its function?
> 
> > .
> >


Re: [dpdk-dev] [PATCH v3] vhost: enable IOMMU for async vhost

2021-06-30 Thread Ding, Xuan
Hi Maxime,

> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, June 29, 2021 5:23 PM
> To: Ding, Xuan ; Xia, Chenbo 
> Cc: dev@dpdk.org; Hu, Jiayu ; Pai G, Sunil
> ; Richardson, Bruce ; Van
> Haaren, Harry ; Liu, Yong ;
> Jiang, Cheng1 
> Subject: Re: [PATCH v3] vhost: enable IOMMU for async vhost
> 
> Hi Xuan,
> 
> On 6/22/21 8:18 AM, Ding, Xuan wrote:
> > Hi Maxime,
> >
> > Replies are inline.
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Saturday, June 19, 2021 12:18 AM
> >> To: Ding, Xuan ; Xia, Chenbo 
> >> Cc: dev@dpdk.org; Hu, Jiayu ; Pai G, Sunil
> >> ; Richardson, Bruce ;
> Van
> >> Haaren, Harry ; Liu, Yong 
> >> Subject: Re: [PATCH v3] vhost: enable IOMMU for async vhost
> >>
> >> Hi Xuan,
> >>
> >> On 6/3/21 7:30 PM, xuan.d...@intel.com wrote:
> >>> From: Xuan Ding 
> >>>
> >>> For async copy, it is unsafe to directly use the physical address.
> >>> And current address translation from GPA to HPA via SW also takes
> >>> CPU cycles, these can all benefit from IOMMU.
> >>>
> >>> Since the existing DMA engine supports to use platform IOMMU,
> >>> this patch enables IOMMU for async vhost, which defines IOAT
> >>> devices to use virtual address instead of physical address.
> >>
> >> We have to keep in mind a generic DMA api is coming, and maybe we want
> >> a SW implementation of a dmadev based on memcpy at least for
> >> testing/debugging purpose.
> >
> > I noticed the generic dmadev model is under discussion. To support a SW
> > implementation, the VA mode support is needed, this is also the problem
> > that this patch hopes to solve. Traditionally, DMA engine can only use
> > physical address in PA mode.
> >
> >>
> >>> When set memory table, the frontend's memory will be mapped
> >>> to the default container of DPDK where IOAT devices have been
> >>> added into. When DMA copy fails, the virtual address provided
> >>> to IOAT devices also allow us fallback to SW copy or PA copy.
> >>>
> >>> With IOMMU enabled, to use IOAT devices:
> >>> 1. IOAT devices must be binded to vfio-pci, rather than igb_uio.
> >>> 2. DPDK must use "--iova-mode=va".
> >>
> >> I think this is problematic, at least we need to check the right iova
> >> mode has been selected, but even with doing that it is limiting.
> >
> > As a library, vhost is not aware of the device selected address(PA or VA)
> > and current DPDK iova mode. To some extent, this patch is a proposal.
> 
> If I'm not mistaken, the DMA device driver init should fail if it does
> not support the DPDK IOVA mode.
> 
> Then, on Vhost lib side, you should be able to get the IOVA mode by
> using the rte_eal_iova_mode() API.

Got your point. If the Vhost lib is able to get the IOVA mode, I think it is
possible to be compatible with the different DPDK IOVA modes. I will work out
a new patch to pass the IOVA to the callback instead of the virtual address only.

> 
> > With device fed with VA, SW fallback can be supported.
> > And VA can also be translated to PA through rte_mem_virt2iova().
> > Finally, the address selected by the device is determined by callback.
> > Not vice versa.
> >
> > If the DMA callback implementer follows this design, SW fallback can be
> supported.
> > I would be very grateful if you could provide some insights for this 
> > design. :)
> 
> TBH, I find the async design too much complicated.
> Having some descriptors handled by the DMA engine, others by the CPU
> makes it extremely hard to debug. Also, it makes Vhost library use less
> deterministic.

This is to account for the difference in copy efficiency. For small packets
(below the threshold), CPU copy performs better. When the packet length is
bigger than the threshold, the DMA engine copy brings a big performance
improvement.
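
A sketch of the hybrid copy policy described above (the threshold value
and the enqueue callback are illustrative, not the actual vhost async
API):

#include <stdint.h>
#include <string.h>

#define DMA_COPY_THRESHOLD 512 /* illustrative cutoff, tuned per platform */

/* Copy small packets synchronously on the CPU; offload larger ones to
 * a DMA engine through the supplied enqueue callback. */
static int
hybrid_copy(void *dst, const void *src, uint32_t len,
	    int (*dma_enqueue)(void *dst, const void *src, uint32_t len))
{
	if (len < DMA_COPY_THRESHOLD) {
		memcpy(dst, src, len); /* CPU copy wins below the threshold */
		return 0;
	}
	return dma_enqueue(dst, src, len); /* DMA copy wins above it */
}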

> 
> >>
> >> What prevent us to reuse add_guest_pages() alogrithm to implement
> >> IOVA_AS_PA?
> >
> > If IOVA is PA, it's not easy to translate PA to VA to support SW 
> > implementation.
> 
> What prevent you to use dev->guest_pages[] in that case to do the
> translations?

Yes, you are right, using rte_mem_iova2virt() can help to do so.
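
For reference, a small sketch of how a library could branch on the EAL
IOVA mode with the APIs mentioned in this thread (the helper name is
made up; error handling elided):

#include <stdint.h>

#include <rte_eal.h>
#include <rte_memory.h>

/* Hypothetical helper: the address to feed a DMA engine for a buffer,
 * depending on the IOVA mode the EAL was started with. */
static rte_iova_t
buf_dma_addr(void *virt)
{
	if (rte_eal_iova_mode() == RTE_IOVA_VA)
		return (rte_iova_t)(uintptr_t)virt; /* IOVA == VA */
	return rte_mem_virt2iova(virt); /* IOVA == PA: translate */
}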

> 
> > Until now, I don't have any good ideas to be compatible with IOVA_AS_PA
> > and IOVA_AS_VA at the same time, because it requires vhost library to
> > select PA/VA for DMA device according to different DPDK iova mode.
> 
> If the DMA device claims to support IOVA_AS_PA at probe time, it should
> be useable by the vhost library. It might not be the more efficient
> mode, but we cannot just have a comment in the documenation saying that
> IOVA_AS_VA is the only supported mode, without any safety check in the
> code itself.

Compatibility with IOVA_AS_PA is possible; the reason for the IOVA_AS_VA design
is that VA is easier to operate on, so some compatibility was sacrificed.

I will adopt your suggestion, thanks very much!

Regards,
Xuan

> 
> Regards,
> Maxime
> 
> > Thanks,
> > Xuan
> >
> >>
> >>>
> >>> Signed-off-by: Xuan Ding 
> >>> ---
> >>>
> >>> v3:
> >>> * Fixed som

[dpdk-dev] [PATCH] net/mlx5: fix match MPLS over GRE with key

2021-06-30 Thread Xiaoyu Min
Currently the PMD needs the previous layer information in order to set
the corresponding match field for MPLSoGRE or MPLSoUDP.

The GRE_KEY item is missing as a supported previous layer when translating
the MPLS item, which causes flow [1] to fail to match MPLS over GRE traffic.

According to RFC 4023, MPLS over GRE tunnels with the optional key
field need to be supported too.

Add the missing GRE_KEY as a supported previous layer to fix the problem.

[1]:
flow create 0 ingress pattern eth / ipv6 / gre k_bit is 1 / gre_key /
mpls label is 966138 / end actions queue index 1 / mark id 0xa / end

Fixes: a7a0365565a4 ("net/mlx5: match GRE key and present bits")
Cc: sta...@dpdk.org

Signed-off-by: Xiaoyu Min 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index a04a3c2bb8..feeeaf6a1d 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -9027,6 +9027,8 @@ flow_dv_translate_item_mpls(void *matcher, void *key,
 MLX5_UDP_PORT_MPLS);
break;
case MLX5_FLOW_LAYER_GRE:
+   /* Fall-through. */
+   case MLX5_FLOW_LAYER_GRE_KEY:
MLX5_SET(fte_match_set_misc, misc_m, gre_protocol, 0x);
MLX5_SET(fte_match_set_misc, misc_v, gre_protocol,
 RTE_ETHER_TYPE_MPLS);
-- 
2.32.0



[dpdk-dev] [PATCH v2] app/eventdev: add option to enable per port pool

2021-06-30 Thread pbhagavatula
From: Pavan Nikhilesh 

Add an option to configure a unique mempool for each ethernet device
port. It can be used with the `pipeline_atq` and `pipeline_queue` tests.

Signed-off-by: Pavan Nikhilesh 
---
 v2 Changes:
 - Fix compilation.
 - Rebase on next-event.

 app/test-eventdev/evt_common.h   |  1 +
 app/test-eventdev/evt_options.c  | 10 +
 app/test-eventdev/evt_options.h  |  1 +
 app/test-eventdev/test_pipeline_common.c | 52 +---
 app/test-eventdev/test_pipeline_common.h |  2 +-
 doc/guides/tools/testeventdev.rst|  8 
 6 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/app/test-eventdev/evt_common.h b/app/test-eventdev/evt_common.h
index 0e228258e..28afb114b 100644
--- a/app/test-eventdev/evt_common.h
+++ b/app/test-eventdev/evt_common.h
@@ -55,6 +55,7 @@ struct evt_options {
uint8_t timdev_cnt;
uint8_t nb_timer_adptrs;
uint8_t timdev_use_burst;
+   uint8_t per_port_pool;
uint8_t sched_type_list[EVT_MAX_STAGES];
uint16_t mbuf_sz;
uint16_t wkr_deq_dep;
diff --git a/app/test-eventdev/evt_options.c b/app/test-eventdev/evt_options.c
index 061b63e12..b0bcbc6c9 100644
--- a/app/test-eventdev/evt_options.c
+++ b/app/test-eventdev/evt_options.c
@@ -297,6 +297,13 @@ evt_parse_eth_queues(struct evt_options *opt, const char 
*arg)
return ret;
 }

+static int
+evt_parse_per_port_pool(struct evt_options *opt, const char *arg __rte_unused)
+{
+   opt->per_port_pool = 1;
+   return 0;
+}
+
 static void
 usage(char *program)
 {
@@ -333,6 +340,7 @@ usage(char *program)
"\t--enable_vector: enable event vectorization.\n"
"\t--vector_size  : Max vector size.\n"
"\t--vector_tmo_ns: Max vector timeout in nanoseconds\n"
+   "\t--per_port_pool: Configure unique pool per ethdev port\n"
);
printf("available tests:\n");
evt_test_dump_names();
@@ -408,6 +416,7 @@ static struct option lgopts[] = {
{ EVT_ENA_VECTOR,  0, 0, 0 },
{ EVT_VECTOR_SZ,   1, 0, 0 },
{ EVT_VECTOR_TMO,  1, 0, 0 },
+   { EVT_PER_PORT_POOL,   0, 0, 0 },
{ EVT_HELP,0, 0, 0 },
{ NULL,0, 0, 0 }
 };
@@ -446,6 +455,7 @@ evt_opts_parse_long(int opt_idx, struct evt_options *opt)
{ EVT_ENA_VECTOR, evt_parse_ena_vector},
{ EVT_VECTOR_SZ, evt_parse_vector_size},
{ EVT_VECTOR_TMO, evt_parse_vector_tmo_ns},
+   { EVT_PER_PORT_POOL, evt_parse_per_port_pool},
};

for (i = 0; i < RTE_DIM(parsermap); i++) {
diff --git a/app/test-eventdev/evt_options.h b/app/test-eventdev/evt_options.h
index 1cea2a3e1..6436200b4 100644
--- a/app/test-eventdev/evt_options.h
+++ b/app/test-eventdev/evt_options.h
@@ -46,6 +46,7 @@
 #define EVT_ENA_VECTOR   ("enable_vector")
 #define EVT_VECTOR_SZ("vector_size")
 #define EVT_VECTOR_TMO   ("vector_tmo_ns")
+#define EVT_PER_PORT_POOL   ("per_port_pool")
 #define EVT_HELP ("help")

 void evt_options_default(struct evt_options *opt);
diff --git a/app/test-eventdev/test_pipeline_common.c 
b/app/test-eventdev/test_pipeline_common.c
index d5ef90500..6ee530d4c 100644
--- a/app/test-eventdev/test_pipeline_common.c
+++ b/app/test-eventdev/test_pipeline_common.c
@@ -259,9 +259,10 @@ pipeline_ethdev_setup(struct evt_test *test, struct 
evt_options *opt)
}

for (j = 0; j < opt->eth_queues; j++) {
-   if (rte_eth_rx_queue_setup(i, j, NB_RX_DESC,
-  rte_socket_id(), &rx_conf,
-  t->pool) < 0) {
+   if (rte_eth_rx_queue_setup(
+   i, j, NB_RX_DESC, rte_socket_id(), &rx_conf,
+   opt->per_port_pool ? t->pool[i] :
+ t->pool[0]) < 0) {
evt_err("Failed to setup eth port [%d] 
rx_queue: %d.",
i, 0);
return -EINVAL;
@@ -569,18 +570,35 @@ pipeline_mempool_setup(struct evt_test *test, struct 
evt_options *opt)
if (data_size  > opt->mbuf_sz)
opt->mbuf_sz = data_size;
}
+   if (opt->per_port_pool) {
+   char name[RTE_MEMPOOL_NAMESIZE];
+
+   snprintf(name, RTE_MEMPOOL_NAMESIZE, "%s-%d",
+test->name, i);
+   t->pool[i] = rte_pktmbuf_pool_create(
+   name, /* mempool name */
+   opt->pool_sz, /* number of elements*/
+   0,/* cache size*/
+  
