Re: Hardware timestamps

2023-04-12 Thread David Marchand
Hello,

On Wed, Apr 12, 2023 at 8:57 AM Игорь К  wrote:
>
> I have tried to get HW timestamps of packets on an Intel X550T (10G) and
> an AQC107 (10G), but RTE_ETH_RX_OFFLOAD_TIMESTAMP = 0 in
> dev_info.rx_offload_capa.
> Is it possible to get HW timestamps on these NICs with DPDK?

Looking at the doc and the code, neither of those drivers (net/ixgbe,
net/atlantic) seems to support this feature.
I'll let the maintainers reply on the feasibility.
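
For reference, an application can query this capability as follows (a
minimal sketch using the public ethdev API; error handling trimmed):

    #include <rte_ethdev.h>

    /* Return 1 if the port advertises Rx timestamp offload, 0 otherwise. */
    static int
    port_has_rx_timestamp(uint16_t port_id)
    {
        struct rte_eth_dev_info dev_info;

        if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
            return 0;
        return !!(dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_TIMESTAMP);
    }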


> Which 10G NICs support this option?

The documentation in the main repository provides a list of per-driver
features:
https://doc.dpdk.org/guides/nics/overview.html

For an already released DPDK version, the link becomes
https://doc.dpdk.org/guides-<version>/nics/overview.html.
For example: https://doc.dpdk.org/guides-22.11/nics/overview.html


-- 
David Marchand



RE: [EXT] [PATCH v2 25/44] net/octeontx: fix segment fault when parse devargs

2023-04-12 Thread Harman Kalra
Hi,

Thanks for fixing the seg fault.

Acked-by: Harman Kalra 

Thanks
Harman

> -Original Message-
> From: Chengwen Feng 
> Sent: Monday, March 20, 2023 2:51 PM
> To: tho...@monjalon.net; ferruh.yi...@amd.com; Harman Kalra
> ; Santosh Shukla
> ; Jerin Jacob
> 
> Cc: dev@dpdk.org
> Subject: [EXT] [PATCH v2 25/44] net/octeontx: fix segment fault when parse
> devargs
> 
> External Email
> 
> --
> rte_kvargs_process() is used to parse KV pairs; it also supports parsing
> 'only keys' (e.g. socket_id). The callback function parameter 'value' is
> NULL when an 'only key' is parsed.
> 
> This patch fixes a segmentation fault when input args contain 'only keys'.
> 
> Fixes: f7be70e5130e ("net/octeontx: add net device probe and remove")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Chengwen Feng 
> ---
>  drivers/net/octeontx/octeontx_ethdev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/octeontx/octeontx_ethdev.c
> b/drivers/net/octeontx/octeontx_ethdev.c
> index d52a3e73d5..d11f359c7b 100644
> --- a/drivers/net/octeontx/octeontx_ethdev.c
> +++ b/drivers/net/octeontx/octeontx_ethdev.c
> @@ -70,6 +70,9 @@ parse_integer_arg(const char *key __rte_unused,  {
>   int *i = (int *)extra_args;
> 
> + if (value == NULL)
> + return -EINVAL;
> +
>   *i = atoi(value);
>   if (*i < 0) {
>   octeontx_log_err("argument has to be positive.");
> --
> 2.17.1
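
For context, the fixed callback reads roughly as follows once the patch is
applied (a sketch reconstructed from the diff; the key point is that
rte_kvargs_process() passes value == NULL for a key given without '=value'):

    static int
    parse_integer_arg(const char *key __rte_unused,
                      const char *value, void *extra_args)
    {
        int *i = (int *)extra_args;

        if (value == NULL)      /* key-only devarg, e.g. "nr_port" */
            return -EINVAL;

        *i = atoi(value);
        if (*i < 0) {
            octeontx_log_err("argument has to be positive.");
            return -EINVAL;     /* assumed; the tail of the function
                                 * is not shown in the diff */
        }
        return 0;
    }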



Re: [PATCH v2] net: fix return type of IPv6 L4 packet checksum

2023-04-12 Thread Thomas Monjalon
06/04/2023 11:49, eagost...@nvidia.com:
> From: Elena Agostini 
> 
> The function returns 0 or -1, but the return type is uint16_t.
> 
> Fixes: d178f693bbfe ("net: add UDP/TCP checksum in mbuf segments")
> Cc: xiaoyun...@intel.com
> 
> Signed-off-by: Elena Agostini 

The title should be about IPv4, not IPv6.




Re: [PATCH v2] net: fix return type of IPv6 L4 packet checksum

2023-04-12 Thread Thomas Monjalon
12/04/2023 10:22, Thomas Monjalon:
> 06/04/2023 11:49, eagost...@nvidia.com:
> > From: Elena Agostini 
> > 
> > The function returns 0 or -1, but the return type is uint16_t.
> > 
> > Fixes: d178f693bbfe ("net: add UDP/TCP checksum in mbuf segments")
> > Cc: xiaoyun...@intel.com
> > 
> > Signed-off-by: Elena Agostini 
> 
> The title should be about IPv4, not IPv6.

Applied with this fix and adding Cc: sta...@dpdk.org




[PATCH v3] net: fix return type of IPv4 L4 packet checksum

2023-04-12 Thread eagostini
From: Elena Agostini 

The function returns 0 or -1, but the return type is uint16_t.

Fixes: d178f693bbfe ("net: add UDP/TCP checksum in mbuf segments")
Cc: xiaoyun...@intel.com

Signed-off-by: Elena Agostini 

---
V2:
   added Fixes line and fixed Cc address
V3:
   title changed
---
 lib/net/rte_ip.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index a310e9d498..e7106256aa 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -514,7 +514,7 @@ rte_ipv4_udptcp_cksum_verify(const struct rte_ipv4_hdr 
*ipv4_hdr,
  *   Return 0 if the checksum is correct, else -1.
  */
 __rte_experimental
-static inline uint16_t
+static inline int
 rte_ipv4_udptcp_cksum_mbuf_verify(const struct rte_mbuf *m,
  const struct rte_ipv4_hdr *ipv4_hdr,
  uint16_t l4_off)
-- 
2.34.1
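
The practical effect of the fix: with the old uint16_t return type, the -1
error value wrapped around to 0xFFFF, so a caller's negative-value check
could never fire. A minimal usage sketch (hypothetical caller):

    /* Verify the L4 checksum of a received IPv4 packet. */
    int ret = rte_ipv4_udptcp_cksum_mbuf_verify(m, ipv4_hdr, l4_off);
    if (ret < 0)        /* meaningful now that the return type is int */
        rte_pktmbuf_free(m);    /* e.g. drop packets with a bad checksum */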



[PATCH v2 0/3] Enable iavf Rx Timestamp offload on vector path

2023-04-12 Thread Zhichao Zeng
Enable timestamp offload with the '--enable-rx-timestamp' option. Note
that enabling Rx timestamp offload reduces performance.
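
For application writers, the equivalent of testpmd's option is to request
the offload at configure time and read the timestamp from the dynamic mbuf
field (a sketch based on the generic ethdev and mbuf-dyn APIs; error
handling omitted):

    #include <rte_bitops.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf_dyn.h>

    /* At configuration time: request the offload. */
    struct rte_eth_conf conf = {
        .rxmode = { .offloads = RTE_ETH_RX_OFFLOAD_TIMESTAMP },
    };
    /* ... rte_eth_dev_configure(port_id, 1, 1, &conf); ... */

    /* After start: resolve the dynamic field and flag once. */
    int ts_off = rte_mbuf_dynfield_lookup(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
    uint64_t ts_flag =
        RTE_BIT64(rte_mbuf_dynflag_lookup(RTE_MBUF_DYNFLAG_RX_TIMESTAMP_NAME, NULL));

    /* Per received mbuf: */
    if (mb->ol_flags & ts_flag) {
        rte_mbuf_timestamp_t ts =
            *RTE_MBUF_DYNFIELD(mb, ts_off, rte_mbuf_timestamp_t *);
        /* ts holds the timestamp written by the PMD */
    }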

---
v2: fix compile warning and SSE path

Zhichao Zeng (3):
  net/iavf: support Rx timestamp offload on AVX512
  net/iavf: support Rx timestamp offload on AVX2
  net/iavf: support Rx timestamp offload on SSE

 drivers/net/iavf/iavf_rxtx.h|   3 +-
 drivers/net/iavf/iavf_rxtx_vec_avx2.c   | 196 ++-
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 203 +++-
 drivers/net/iavf/iavf_rxtx_vec_common.h |   3 -
 drivers/net/iavf/iavf_rxtx_vec_sse.c| 161 ++-
 5 files changed, 546 insertions(+), 20 deletions(-)

-- 
2.25.1



[PATCH v2 1/3] net/iavf: support Rx timestamp offload on AVX512

2023-04-12 Thread Zhichao Zeng
This patch enables Rx timestamp offload on the AVX512 data path.

Enable timestamp offload with the '--enable-rx-timestamp' option. Note
that enabling Rx timestamp offload reduces performance.

Signed-off-by: Wenjun Wu 
Signed-off-by: Zhichao Zeng 

---
v2: fix compile warning
---
 drivers/net/iavf/iavf_rxtx.h|   3 +-
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 203 +++-
 drivers/net/iavf/iavf_rxtx_vec_common.h |   3 -
 3 files changed, 200 insertions(+), 9 deletions(-)

diff --git a/drivers/net/iavf/iavf_rxtx.h b/drivers/net/iavf/iavf_rxtx.h
index 09e2127db0..97b5e86f6e 100644
--- a/drivers/net/iavf/iavf_rxtx.h
+++ b/drivers/net/iavf/iavf_rxtx.h
@@ -44,7 +44,8 @@
RTE_ETH_RX_OFFLOAD_CHECKSUM |\
RTE_ETH_RX_OFFLOAD_SCTP_CKSUM |  \
RTE_ETH_RX_OFFLOAD_VLAN |\
-   RTE_ETH_RX_OFFLOAD_RSS_HASH)
+   RTE_ETH_RX_OFFLOAD_RSS_HASH |\
+   RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 
 /**
  * According to the vlan capabilities returned by the driver and FW, the vlan 
tci
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c 
b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
index bd2788121b..c0a4fce120 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
@@ -16,18 +16,20 @@
 /**
  * If user knows a specific offload is not enabled by APP,
  * the macro can be commented to save the effort of fast path.
- * Currently below 2 features are supported in RX path,
+ * Currently below 6 features are supported in RX path,
  * 1, checksum offload
  * 2, VLAN/QINQ stripping
  * 3, RSS hash
  * 4, packet type analysis
  * 5, flow director ID report
+ * 6, timestamp offload
  
**/
 #define IAVF_RX_CSUM_OFFLOAD
 #define IAVF_RX_VLAN_OFFLOAD
 #define IAVF_RX_RSS_OFFLOAD
 #define IAVF_RX_PTYPE_OFFLOAD
 #define IAVF_RX_FDIR_OFFLOAD
+#define IAVF_RX_TS_OFFLOAD
 
 static __rte_always_inline void
 iavf_rxq_rearm(struct iavf_rx_queue *rxq)
@@ -587,9 +589,9 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct 
iavf_rx_queue *rxq,
bool offload)
 {
struct iavf_adapter *adapter = rxq->vsi->adapter;
-
+#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
uint64_t offloads = adapter->dev_data->dev_conf.rxmode.offloads;
-
+#endif
 #ifdef IAVF_RX_PTYPE_OFFLOAD
const uint32_t *type_table = adapter->ptype_tbl;
 #endif
@@ -618,6 +620,25 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct 
iavf_rx_queue *rxq,
  rte_cpu_to_le_32(1 << IAVF_RX_FLEX_DESC_STATUS0_DD_S)))
return 0;
 
+#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
+#ifdef IAVF_RX_TS_OFFLOAD
+   uint8_t inflection_point = 0;
+   bool is_tsinit = false;
+   __m256i hw_low_last;
+
+   if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+   uint64_t sw_cur_time = rte_get_timer_cycles() / 
(rte_get_timer_hz() / 1000);
+
+   if (unlikely(sw_cur_time - rxq->hw_time_update > 4)) {
+   hw_low_last = _mm256_setzero_si256();
+   is_tsinit = 1;
+   } else {
+   hw_low_last = _mm256_set_epi32(0, 0, 0, 0, 0, 0, 0, 
(uint32_t)rxq->phc_time);
+   }
+   }
+#endif
+#endif
+
/* constants used in processing loop */
const __m512i crc_adjust =
_mm512_set_epi32
@@ -1081,12 +1102,13 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct 
iavf_rx_queue *rxq,
 
 #ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
if (offload) {
-#ifdef IAVF_RX_RSS_OFFLOAD
+#if defined(IAVF_RX_RSS_OFFLOAD) || defined(IAVF_RX_TS_OFFLOAD)
/**
 * needs to load 2nd 16B of each desc for RSS hash 
parsing,
 * will cause performance drop to get into this context.
 */
if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH ||
+   offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP ||
rxq->rx_flags & 
IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
/* load bottom half of every 32B desc */
const __m128i raw_desc_bh7 =
@@ -1138,6 +1160,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct 
iavf_rx_queue *rxq,

(_mm256_castsi128_si256(raw_desc_bh0),
 raw_desc_bh1, 1);
 
+#ifdef IAVF_RX_RSS_OFFLOAD
if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) {
/**
 * to shift the 32b RSS hash value to 
the
@@ -1278,7 +1301,125 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct 
iavf_rx_queue *rxq,
   

[PATCH v2 2/3] net/iavf: support Rx timestamp offload on AVX2

2023-04-12 Thread Zhichao Zeng
This patch enables Rx timestamp offload on the AVX2 data path.

Enable timestamp offload with the '--enable-rx-timestamp' option. Note
that enabling Rx timestamp offload reduces performance.

Signed-off-by: Zhichao Zeng 

---
v2: fix compile warning
---
 drivers/net/iavf/iavf_rxtx_vec_avx2.c | 196 +-
 1 file changed, 189 insertions(+), 7 deletions(-)

diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c 
b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
index b4ebac9d34..12bbfba431 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
@@ -525,8 +525,9 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue 
*rxq,
 #define IAVF_DESCS_PER_LOOP_AVX 8
 
struct iavf_adapter *adapter = rxq->vsi->adapter;
-
+#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
uint64_t offloads = adapter->dev_data->dev_conf.rxmode.offloads;
+#endif
const uint32_t *type_table = adapter->ptype_tbl;
 
const __m256i mbuf_init = _mm256_set_epi64x(0, 0,
@@ -553,6 +554,22 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue 
*rxq,
rte_cpu_to_le_32(1 << IAVF_RX_FLEX_DESC_STATUS0_DD_S)))
return 0;
 
+#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
+   bool is_tsinit = false;
+   uint8_t inflection_point = 0;
+   __m256i hw_low_last;
+   if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+   uint64_t sw_cur_time = rte_get_timer_cycles() / 
(rte_get_timer_hz() / 1000);
+
+   if (unlikely(sw_cur_time - rxq->hw_time_update > 4)) {
+   hw_low_last = _mm256_setzero_si256();
+   is_tsinit = 1;
+   } else {
+   hw_low_last = _mm256_set_epi32(0, 0, 0, 0, 0, 0, 0, 
rxq->phc_time);
+   }
+   }
+#endif
+
/* constants used in processing loop */
const __m256i crc_adjust =
_mm256_set_epi16
@@ -957,11 +974,12 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct 
iavf_rx_queue *rxq,
 
 #ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
/**
-* needs to load 2nd 16B of each desc for RSS hash parsing,
+* needs to load 2nd 16B of each desc,
 * will cause performance drop to get into this context.
 */
-   if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH ||
-   rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
+   if (offloads & (RTE_ETH_RX_OFFLOAD_RSS_HASH |
+   RTE_ETH_RX_OFFLOAD_TIMESTAMP) ||
+   rxq->rx_flags & 
IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
/* load bottom half of every 32B desc */
const __m128i raw_desc_bh7 =
_mm_load_si128
@@ -1043,7 +1061,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct 
iavf_rx_queue *rxq,
mb4_5 = _mm256_or_si256(mb4_5, rss_hash4_5);
mb2_3 = _mm256_or_si256(mb2_3, rss_hash2_3);
mb0_1 = _mm256_or_si256(mb0_1, rss_hash0_1);
-   }
+   } /* if() on RSS hash parsing */
 
if (rxq->rx_flags & 
IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
/* merge the status/error-1 bits into one 
register */
@@ -1122,8 +1140,121 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct 
iavf_rx_queue *rxq,
mb4_5 = _mm256_or_si256(mb4_5, vlan_tci4_5);
mb2_3 = _mm256_or_si256(mb2_3, vlan_tci2_3);
mb0_1 = _mm256_or_si256(mb0_1, vlan_tci0_1);
-   }
-   } /* if() on RSS hash parsing */
+   } /* if() on Vlan parsing */
+
+   if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+   uint32_t mask = 0x;
+   __m256i ts;
+   __m256i ts_low = _mm256_setzero_si256();
+   __m256i ts_low1;
+   __m256i ts_low2;
+   __m256i max_ret;
+   __m256i cmp_ret;
+   uint8_t ret = 0;
+   uint8_t shift = 8;
+   __m256i ts_desp_mask = _mm256_set_epi32(mask, 
0, 0, 0, mask, 0, 0, 0);
+   __m256i cmp_mask = _mm256_set1_epi32(mask);
+   __m256i ts_permute_mask = _mm256_set_epi32(7, 
3, 6, 2, 5, 1, 4, 0);
+
+   ts = _mm256_and_si256(raw_desc_bh0_1, 
ts_desp_mask);
+   ts_low = _mm256_or_si256(ts_low, 
_mm256_srli_si256(ts, 3 * 4));
+   ts = _mm256_and_si256(raw_desc_bh2_3, 
ts_desp_mask);
+

[PATCH v2 3/3] net/iavf: support Rx timestamp offload on SSE

2023-04-12 Thread Zhichao Zeng
This patch enables Rx timestamp offload on the SSE data path.

Enable timestamp offload with the '--enable-rx-timestamp' option. Note
that enabling Rx timestamp offload reduces performance.

Signed-off-by: Zhichao Zeng 

---
v2: fix compile warning and timestamp error
---
 drivers/net/iavf/iavf_rxtx_vec_sse.c | 161 ++-
 1 file changed, 157 insertions(+), 4 deletions(-)

diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c 
b/drivers/net/iavf/iavf_rxtx_vec_sse.c
index 3f30be01aa..f01fda1ec8 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_sse.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c
@@ -392,6 +392,11 @@ flex_desc_to_olflags_v(struct iavf_rx_queue *rxq, __m128i 
descs[4],
_mm_extract_epi32(fdir_id0_3, 3);
} /* if() on fdir_enabled */
 
+#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
+   if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+   flags = _mm_or_si128(flags, 
_mm_set1_epi32(iavf_timestamp_dynflag));
+#endif
+
/**
 * At this point, we have the 4 sets of flags in the low 16-bits
 * of each 32-bit value in flags.
@@ -723,7 +728,9 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
int pos;
uint64_t var;
struct iavf_adapter *adapter = rxq->vsi->adapter;
+#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
uint64_t offloads = adapter->dev_data->dev_conf.rxmode.offloads;
+#endif
const uint32_t *ptype_tbl = adapter->ptype_tbl;
__m128i crc_adjust = _mm_set_epi16
(0, 0, 0,   /* ignore non-length fields */
@@ -793,6 +800,24 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
  rte_cpu_to_le_32(1 << IAVF_RX_FLEX_DESC_STATUS0_DD_S)))
return 0;
 
+#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
+   uint8_t inflection_point = 0;
+   bool is_tsinit = false;
+   __m128i hw_low_last = _mm_set_epi32(0, 0, 0, (uint32_t)rxq->phc_time);
+
+   if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+   uint64_t sw_cur_time = rte_get_timer_cycles() / 
(rte_get_timer_hz() / 1000);
+
+   if (unlikely(sw_cur_time - rxq->hw_time_update > 4)) {
+   hw_low_last = _mm_setzero_si128();
+   is_tsinit = 1;
+   } else {
+   hw_low_last = _mm_set_epi32(0, 0, 0, 
(uint32_t)rxq->phc_time);
+   }
+   }
+
+#endif
+
/**
 * Compile-time verify the shuffle mask
 * NOTE: some field positions already verified above, but duplicated
@@ -895,11 +920,12 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 
 #ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
/**
-* needs to load 2nd 16B of each desc for RSS hash parsing,
+* needs to load 2nd 16B of each desc,
 * will cause performance drop to get into this context.
 */
-   if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH ||
-   rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
+   if (offloads & (RTE_ETH_RX_OFFLOAD_RSS_HASH |
+   RTE_ETH_RX_OFFLOAD_TIMESTAMP) ||
+   rxq->rx_flags & 
IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
/* load bottom half of every 32B desc */
descs_bh[3] = _mm_load_si128
((void *)(&rxdp[3].wb.status_error1));
@@ -964,7 +990,94 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
pkt_mb2 = _mm_or_si128(pkt_mb2, vlan_tci2);
pkt_mb1 = _mm_or_si128(pkt_mb1, vlan_tci1);
pkt_mb0 = _mm_or_si128(pkt_mb0, vlan_tci0);
-   }
+   } /* if() on Vlan parsing */
+
+   if (offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+   uint32_t mask = 0x;
+   __m128i ts;
+   __m128i ts_low = _mm_setzero_si128();
+   __m128i ts_low1;
+   __m128i max_ret;
+   __m128i cmp_ret;
+   uint8_t ret = 0;
+   uint8_t shift = 4;
+   __m128i ts_desp_mask = _mm_set_epi32(mask, 0, 0, 0);
+   __m128i cmp_mask = _mm_set1_epi32(mask);
+
+   ts = _mm_and_si128(descs_bh[0], ts_desp_mask);
+   ts_low = _mm_or_si128(ts_low, _mm_srli_si128(ts, 3 * 
4));
+   ts = _mm_and_si128(descs_bh[1], ts_desp_mask);
+   ts_low = _mm_or_si128(ts_low, _mm_srli_si128(ts, 2 * 
4));
+   ts = _mm_and_si128(descs_bh[2], ts_desp_mask);
+   ts_low = _mm_or_si128(ts_low, _mm_srli_si128(ts, 1 * 
4));
+   ts = _mm_and_si128(descs_bh[3], ts_desp_mask);
+   ts_low = _mm_or_si128(ts_low, 

[DPDK] heap memory fragmentation issue

2023-04-12 Thread wuchangsheng (C)
Hello,

When using rte_malloc() and rte_free() to allocate and release memory
repeatedly, the usage of hugepages gradually increases.

The relevant source code shows that memory allocations and releases both
start from the head of the heap's free-element list. Memory fragmentation
seems to result from this. Presumably the design favors recently released
memory because it may still be in the cache, so handing it out again on the
next allocation can achieve higher performance?

How does the community view the heap's memory fragmentation issue? Is there
a plan to optimize for memory fragmentation?
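
One way to observe the effect is the malloc heap statistics API: if the
largest free element keeps shrinking relative to the total free space while
alloc/free cycles continue, the heap is fragmenting (a minimal sketch,
assuming socket 0):

    #include <stdio.h>
    #include <rte_malloc.h>

    static void
    dump_heap_fragmentation(void)
    {
        struct rte_malloc_socket_stats st;

        if (rte_malloc_get_socket_stats(0, &st) < 0)
            return;
        printf("free %zu bytes in %u elems, largest free elem %zu bytes\n",
               st.heap_freesz_bytes, st.free_count, st.greatest_free_size);
    }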



RE: [EXT] Re: [PATCH 1/1] app/mldev: add internal function for file read

2023-04-12 Thread Srikanth Yalavarthi
> -Original Message-
> From: Stephen Hemminger 
> Sent: 28 March 2023 21:22
> To: Srikanth Yalavarthi 
> Cc: Anup Prabhu ; dev@dpdk.org; Shivah Shankar
> Shankar Narayan Rao ; Prince Takkar
> ; Srikanth Yalavarthi 
> Subject: [EXT] Re: [PATCH 1/1] app/mldev: add internal function for file read
> 
> External Email
> 
> --
> On Thu, 23 Mar 2023 08:28:01 -0700
> Srikanth Yalavarthi  wrote:
> 
> > +   if (fseek(fp, 0, SEEK_END) == 0) {
> > +   file_size = ftell(fp);
> > +   if (file_size == -1) {
> > +   ret = -EIO;
> > +   goto error;
> > +   }
> > +
> > +   file_buffer = rte_malloc(NULL, file_size,
> RTE_CACHE_LINE_SIZE);
> > +   if (file_buffer == NULL) {
> > +   ml_err("Failed to allocate memory: %s\n", file);
> > +   ret = -ENOMEM;
> > +   goto error;
> > +   }
> > +
> > +   if (fseek(fp, 0, SEEK_SET) != 0) {
> > +   ret = -EIO;
> > +   goto error;
> > +   }
> > +
> > +   if (fread(file_buffer, sizeof(char), file_size, fp) != (unsigned
> long)file_size) {
> > +   ml_err("Failed to read file : %s\n", file);
> > +   ret = -EIO;
> > +   goto error;
> > +   }
> > +   fclose(fp);
> > +   } else {
> > +   ret = -EIO;
> > +   goto error;
> > +   }
> > +
> > +   *buffer = file_buffer;
> > +   *size = file_size;
> > +
> > +   return 0;
> 
> Granted, this is only test code, but this is the slowest way to do it.
> Stdio is buffered (in 4K chunks), and rte_malloc() allocates from
> hugepages.
> 
> Three levels of improvement are possible:
>   1. don't use rte_malloc(), use malloc() instead.
Agree on this. Will update in the next version.

>   2. use direct system call for I/O
>   3. use mmap() to directly map in the file instead of read
Agree on the improvements.
But considering that this is test code and these operations are done in the
slow path, I would prefer to keep the implementation based on C library
calls rather than using system calls.

Also, using system calls might make this code less portable, though as of
now we do not support this app on platforms other than Linux.
Please let me know what you think.
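
For reference, the mmap() route suggested above would look roughly like
this (a Linux-only sketch with a hypothetical helper name; not part of the
patch):

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int
    ml_map_file(const char *path, void **buffer, size_t *size)
    {
        struct stat st;
        void *addr;
        int fd, err;

        fd = open(path, O_RDONLY);
        if (fd < 0)
            return -errno;
        if (fstat(fd, &st) != 0) {
            err = errno;
            close(fd);
            return -err;
        }
        addr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (addr == MAP_FAILED) {
            err = errno;
            close(fd);
            return -err;
        }
        close(fd);      /* the mapping remains valid after close() */
        *buffer = addr;
        *size = st.st_size;
        return 0;       /* caller munmap()s instead of free()ing */
    }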


Re: [PATCH v3 11/11] telemetry: avoid expanding versioned symbol macros on msvc

2023-04-12 Thread Bruce Richardson
On Tue, Apr 11, 2023 at 01:34:14PM -0700, Tyler Retzlaff wrote:
> On Tue, Apr 11, 2023 at 11:24:07AM +0100, Bruce Richardson wrote:
> > On Wed, Apr 05, 2023 at 05:45:19PM -0700, Tyler Retzlaff wrote:
> > > Windows does not support versioned symbols. Fortunately Windows also
> > > doesn't have an exported stable ABI.
> > > 
> > > Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_24
> > > and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
> > > functions.
> > > 
> > > Windows does have a way to achieve similar versioning for symbols but it
> > > is not a simple #define so it will be done as a work package later.
> > > 
> > > Signed-off-by: Tyler Retzlaff 
> > > ---
> > >  lib/telemetry/telemetry_data.c | 16 
> > >  1 file changed, 16 insertions(+)
> > > 
> > > diff --git a/lib/telemetry/telemetry_data.c 
> > > b/lib/telemetry/telemetry_data.c
> > > index 2bac2de..284c16e 100644
> > > --- a/lib/telemetry/telemetry_data.c
> > > +++ b/lib/telemetry/telemetry_data.c
> > > @@ -82,8 +82,16 @@
> > >  /* mark the v23 function as the older version, and v24 as the default 
> > > version */
> > >  VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
> > >  BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
> > > +#ifndef RTE_TOOLCHAIN_MSVC
> > >  MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
> > >   int64_t x), rte_tel_data_add_array_int_v24);
> > > +#else
> > > +int
> > > +rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
> > > +{
> > > + return rte_tel_data_add_array_int_v24(d, x);
> > > +}
> > > +#endif
> > >  
> > 
> > Can't see any general way to do this from the versioning header file, so
> > agree that we need some changes here. Rather than defining a public
> > funcion, we could keep the diff reduced by just using a macro alias here,
> > right? For example:
> > 
> > #ifdef RTE_TOOLCHAIN_MSVC
> > #define rte_tel_data_add_array_int rte_tel_data_add_array_int_v24
> > #else
> > MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
> > int64_t x), rte_tel_data_add_array_int_v24);
> > #endif
> > 
> > If this is a temporary measure, I'd tend towards the shortest solution that
> > can work. However, no strong opinions, so, either using functions as you
> > have it, or macros:
> 
> so i have to leave it as it is, the reason being that the version.map ->
> exports.def generation does not handle this. the .def only contains the
> rte_tel_data_add_array_int symbol; if we expand it away to the _v24 name,
> the link will fail.
> 

Ah, thanks for clarifying

> let's consume the change as-is for now and i will work on the
> generalized solution when changes are integrated that actually make the
> windows dso/dll functional.
> 

Sure, good for now. Keep my ack on any future versions.
> > 
> > Acked-by: Bruce Richardson 


Re: [PATCH 1/1] net/gve: update base code for DQO

2023-04-12 Thread Ferruh Yigit
On 4/11/2023 7:51 AM, Guo, Junfeng wrote:

Hi Junfeng, message moved down.

> 
>> -Original Message-
>> From: Rushil Gupta 
>> Sent: Tuesday, April 11, 2023 12:59
>> To: Zhang, Qi Z ; ferruh.yi...@amd.com
>> Cc: Richardson, Bruce ; dev@dpdk.org;
>> Rushil Gupta ; Guo, Junfeng
>> 
>> Subject: [PATCH 1/1] net/gve: update base code for DQO
>>
>> Update gve base code to support DQO.
>>
>> This patch is based on this:
>> https://patchwork.dpdk.org/project/dpdk/list/?series=27647&state=*
>>
>> Signed-off-by: Rushil Gupta 
>> Signed-off-by: Junfeng Guo 
> Hi Ferruh & Bruce,
> 
> This patch contains a few lines of change to the MIT-licensed gve base
> code. Note that no new files are added, just some minor code updates.
> 
> Do we need to ask for special approval from the Tech Board for this?
> Please help give some advice and also help review this patch. Thanks!
> 

Once the MIT license exception is in place, as far as I know no more
approval is required per change.

> BTW, Google will also help replace all the base code under MIT license
> with the ones under BSD-3 license soon, which would make things easier.
> 

Is this different from the base code under DPDK changing license [1]?


[1]
https://patches.dpdk.org/project/dpdk/list/?series=27570&state=%2A&archive=both




Re: [PATCH v4 02/14] eal: use rtm and xtest intrinsics

2023-04-12 Thread Bruce Richardson
On Tue, Apr 11, 2023 at 02:12:16PM -0700, Tyler Retzlaff wrote:
> Inline assembly is not supported for MSVC x64. Convert code to use
> _xend, _xabort and _xtest intrinsics.
> 
> Signed-off-by: Tyler Retzlaff 
> ---

Subject to the CI not reporting any errors:

Acked-by: Bruce Richardson 

>  config/x86/meson.build|  6 ++
>  lib/eal/x86/include/rte_rtm.h | 18 +-
>  2 files changed, 11 insertions(+), 13 deletions(-)
> 
> diff --git a/config/x86/meson.build b/config/x86/meson.build
> index 54345c4..4c0b06c 100644
> --- a/config/x86/meson.build
> +++ b/config/x86/meson.build
> @@ -30,6 +30,12 @@ if cc.get_define('__SSE4_2__', args: machine_args) == ''
>  machine_args += '-msse4'
>  endif
>  
> +# enable restricted transactional memory intrinsics
> +# https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html
> +if cc.get_id() != 'msvc'
> +machine_args += '-mrtm'
> +endif
> +
>  base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2']
>  foreach f:base_flags
>  compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
> diff --git a/lib/eal/x86/include/rte_rtm.h b/lib/eal/x86/include/rte_rtm.h
> index 36bf498..b84e58e 100644
> --- a/lib/eal/x86/include/rte_rtm.h
> +++ b/lib/eal/x86/include/rte_rtm.h
> @@ -5,6 +5,7 @@
>  #ifndef _RTE_RTM_H_
>  #define _RTE_RTM_H_ 1
>  
> +#include 
>  
>  /* Official RTM intrinsics interface matching gcc/icc, but works
> on older gcc compatible compilers and binutils. */
> @@ -28,31 +29,22 @@
>  static __rte_always_inline
>  unsigned int rte_xbegin(void)
>  {
> - unsigned int ret = RTE_XBEGIN_STARTED;
> -
> - asm volatile(".byte 0xc7,0xf8 ; .long 0" : "+a" (ret) :: "memory");
> - return ret;
> + return _xbegin();
>  }
>  
>  static __rte_always_inline
>  void rte_xend(void)
>  {
> -  asm volatile(".byte 0x0f,0x01,0xd5" ::: "memory");
> + _xend();
>  }
>  
>  /* not an inline function to workaround a clang bug with -O0 */
> -#define rte_xabort(status) do { \
> - asm volatile(".byte 0xc6,0xf8,%P0" :: "i" (status) : "memory"); \
> -} while (0)
> +#define rte_xabort(status) _xabort(status)
>  
>  static __rte_always_inline
>  int rte_xtest(void)
>  {
> - unsigned char out;
> -
> - asm volatile(".byte 0x0f,0x01,0xd6 ; setnz %0" :
> - "=r" (out) :: "memory");
> - return out;
> + return _xtest();
>  }
>  
>  #ifdef __cplusplus
> -- 
> 1.8.3.1
> 
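
For readers unfamiliar with RTM, the usage contract of these wrappers is
unchanged by the conversion; the canonical pattern is roughly (a sketch,
with a hypothetical fallback lock):

    unsigned int status = rte_xbegin();
    if (status == RTE_XBEGIN_STARTED) {
        /* transactional region */
        do_critical_section();
        rte_xend();     /* commit */
    } else {
        /* aborted: 'status' carries the abort reason; fall back to
         * a conventional lock */
        rte_spinlock_lock(&fallback_lock);
        do_critical_section();
        rte_spinlock_unlock(&fallback_lock);
    }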


Re: [PATCH v4 03/14] eal: use barrier intrinsics

2023-04-12 Thread Bruce Richardson
On Tue, Apr 11, 2023 at 02:12:17PM -0700, Tyler Retzlaff wrote:
> Inline assembly is not supported for MSVC x64 instead expand
> rte_compiler_barrier as _ReadWriteBarrier and for rte_smp_mb
> _m_mfence intrinsics.
> 
> Signed-off-by: Tyler Retzlaff 

Acked-by: Bruce Richardson 

One whitespace line deletion below which can be dropped from diff if doing
a new revision.
> ---
>  lib/eal/include/generic/rte_atomic.h | 4 
>  lib/eal/x86/include/rte_atomic.h | 5 -
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/eal/include/generic/rte_atomic.h 
> b/lib/eal/include/generic/rte_atomic.h
> index 234b268..e973184 100644
> --- a/lib/eal/include/generic/rte_atomic.h
> +++ b/lib/eal/include/generic/rte_atomic.h
> @@ -116,9 +116,13 @@
>   * Guarantees that operation reordering does not occur at compile time
>   * for operations directly before and after the barrier.
>   */
> +#ifndef RTE_TOOLCHAIN_MSVC
>  #define  rte_compiler_barrier() do { \
>   asm volatile ("" : : : "memory");   \
>  } while(0)
> +#else
> +#define rte_compiler_barrier() _ReadWriteBarrier()
> +#endif
>  
>  /**
>   * Synchronization fence between threads based on the specified memory order.
> diff --git a/lib/eal/x86/include/rte_atomic.h 
> b/lib/eal/x86/include/rte_atomic.h
> index f2ee1a9..ca733c5 100644
> --- a/lib/eal/x86/include/rte_atomic.h
> +++ b/lib/eal/x86/include/rte_atomic.h
> @@ -28,7 +28,6 @@
>  #define  rte_rmb() _mm_lfence()
>  
>  #define rte_smp_wmb() rte_compiler_barrier()
> -

  ^^ unnecessary drop

>  #define rte_smp_rmb() rte_compiler_barrier()
>  
>  /*
> @@ -66,11 +65,15 @@
>  static __rte_always_inline void
>  rte_smp_mb(void)
>  {
> +#ifndef RTE_TOOLCHAIN_MSVC
>  #ifdef RTE_ARCH_I686
>   asm volatile("lock addl $0, -128(%%esp); " ::: "memory");
>  #else
>   asm volatile("lock addl $0, -128(%%rsp); " ::: "memory");
>  #endif
> +#else
> + _mm_mfence();
> +#endif
>  }
>  
>  #define rte_io_mb() rte_mb()
> -- 
> 1.8.3.1
> 


Re: [PATCH v4 06/14] eal: use prefetch intrinsics

2023-04-12 Thread Bruce Richardson
On Tue, Apr 11, 2023 at 02:12:20PM -0700, Tyler Retzlaff wrote:
> Inline assembly is not supported for MSVC x64 instead use _mm_prefetch
> and _mm_cldemote intrinsics.
> 
> Signed-off-by: Tyler Retzlaff 
> ---

Acked-by: Bruce Richardson 

One comment inline below for future consideration.

>  lib/eal/x86/include/rte_prefetch.h | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/lib/eal/x86/include/rte_prefetch.h 
> b/lib/eal/x86/include/rte_prefetch.h
> index 7fd01c4..1391af0 100644
> --- a/lib/eal/x86/include/rte_prefetch.h
> +++ b/lib/eal/x86/include/rte_prefetch.h
> @@ -13,6 +13,7 @@
>  #include 
>  #include "generic/rte_prefetch.h"
>  
> +#ifndef RTE_TOOLCHAIN_MSVC
>  static inline void rte_prefetch0(const volatile void *p)
>  {
>   asm volatile ("prefetcht0 %[p]" : : [p] "m" (*(const volatile char 
> *)p));
> @@ -43,6 +44,34 @@ static inline void rte_prefetch_non_temporal(const 
> volatile void *p)
>  {
>   asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
>  }
> +#else
> +static inline void rte_prefetch0(const volatile void *p)
> +{
> + _mm_prefetch(p, 1);
> +}
> +
> +static inline void rte_prefetch1(const volatile void *p)
> +{
> + _mm_prefetch(p, 2);
> +}
> +
> +static inline void rte_prefetch2(const volatile void *p)
> +{
> + _mm_prefetch(p, 3);
> +}
> +
> +static inline void rte_prefetch_non_temporal(const volatile void *p)
> +{
> + _mm_prefetch(p, 0);
> +}

For these prefetch instructions, I'm not sure there is any reason why we
can't drop the inline assembly versions. The instructions are very old at
this point and should be widely supported by all compilers we use.

Rather than using hard-coded 1, 2, 3 values in the prefetch calls, I
believe there should be defines for the levels: "_MM_HINT_T0",
"_MM_HINT_T1" etc.

> +__rte_experimental
> +static inline void
> +rte_cldemote(const volatile void *p)
> +{
> + _mm_cldemote(p);
> +}
> +#endif
> +
>  
>  #ifdef __cplusplus
>  }
> -- 
> 1.8.3.1
> 


RE: [PATCH 1/1] net/gve: update base code for DQO

2023-04-12 Thread Guo, Junfeng


> -Original Message-
> From: Ferruh Yigit 
> Sent: Wednesday, April 12, 2023 16:50
> To: Guo, Junfeng ; Richardson, Bruce
> 
> Cc: dev@dpdk.org; Zhang, Qi Z ; Rushil Gupta
> 
> Subject: Re: [PATCH 1/1] net/gve: update base code for DQO
> 
> On 4/11/2023 7:51 AM, Guo, Junfeng wrote:
> 
> Hi Junfeng, message moved down.
> 
> >
> >> -Original Message-
> >> From: Rushil Gupta 
> >> Sent: Tuesday, April 11, 2023 12:59
> >> To: Zhang, Qi Z ; ferruh.yi...@amd.com
> >> Cc: Richardson, Bruce ; dev@dpdk.org;
> >> Rushil Gupta ; Guo, Junfeng
> >> 
> >> Subject: [PATCH 1/1] net/gve: update base code for DQO
> >>
> >> Update gve base code to support DQO.
> >>
> >> This patch is based on this:
> >> https://patchwork.dpdk.org/project/dpdk/list/?series=27647&state=*
> >>
> >> Signed-off-by: Rushil Gupta 
> >> Signed-off-by: Junfeng Guo 
> > Hi Ferruh & Bruce,
> >
> > This patch contains a few lines of change to the MIT-licensed gve base
> > code. Note that no new files are added, just some minor code updates.
> >
> > Do we need to ask for special approval from the Tech Board for this?
> > Please help give some advice and also help review this patch. Thanks!
> >
> 
> Once the MIT license exception is in place, as far as I know no more
> approval is required per change.

Got it, thanks for the comment!

Then we may also need your help reviewing it, as well as the coming patch
set for GVE PMD enhancements for DPDK 23.07. Thanks in advance!

> 
> > BTW, Google will also help replace all the base code under MIT license
> > with the ones under BSD-3 license soon, which would make things easier.
> >
> 
> > Is this different from the base code under DPDK changing license [1]?
> 
> 
> [1]
> https://patches.dpdk.org/project/dpdk/list/?series=27570&state=%2A&ar
> chive=both
> 

The patch set in the above link only covers replacing the MIT-licensed base
code with the BSD-3-licensed base code. After some discussion, we think
Google is in the right place to do that work, and they are working on it
now.

This patch is mainly for feature upstreaming in DPDK 23.07. It contains
only the code changes, following the previous license statements, without
any license change.

This patch was separated out and sent by Google to ensure there is no
license violation.

BTW, regarding the GVE PMD enhancement feature, the rest of the code only
involves BSD-3-licensed files, and that patch set will be sent out soon.

Thanks!


Re: [PATCH 1/1] net/gve: update base code for DQO

2023-04-12 Thread Ferruh Yigit
On 4/12/2023 10:09 AM, Guo, Junfeng wrote:
> 
> 
>> -Original Message-
>> From: Ferruh Yigit 
>> Sent: Wednesday, April 12, 2023 16:50
>> To: Guo, Junfeng ; Richardson, Bruce
>> 
>> Cc: dev@dpdk.org; Zhang, Qi Z ; Rushil Gupta
>> 
>> Subject: Re: [PATCH 1/1] net/gve: update base code for DQO
>>
>> On 4/11/2023 7:51 AM, Guo, Junfeng wrote:
>>
>> Hi Junfeng, message moved down.
>>
>>>
 -Original Message-
 From: Rushil Gupta 
 Sent: Tuesday, April 11, 2023 12:59
 To: Zhang, Qi Z ; ferruh.yi...@amd.com
 Cc: Richardson, Bruce ; dev@dpdk.org;
 Rushil Gupta ; Guo, Junfeng
 
 Subject: [PATCH 1/1] net/gve: update base code for DQO

 Update gve base code to support DQO.

 This patch is based on this:
 https://patchwork.dpdk.org/project/dpdk/list/?series=27647&state=*

 Signed-off-by: Rushil Gupta 
 Signed-off-by: Junfeng Guo 
>>> Hi Ferruh & Bruce,
>>>
>>> This patch contains a few lines of change to the MIT-licensed gve base
>>> code. Note that no new files are added, just some minor code updates.
>>>
>>> Do we need to ask for special approval from the Tech Board for this?
>>> Please help give some advice and also help review this patch. Thanks!
>>>
>>
>> Once the MIT license exception is in place, as far as I know no more
>> approval is required per change.
> 
> Got it, thanks for the comment!
> 
> Then we may also need your help reviewing it, as well as the coming patch
> set for GVE PMD enhancements for DPDK 23.07. Thanks in advance!
> 
>>
>>> BTW, Google will also help replace all the base code under MIT license
>>> with the ones under BSD-3 license soon, which would make things easier.
>>>
>>
>> Is this different from the base code under DPDK changing license [1]?
>>
>>
>> [1]
>> https://patches.dpdk.org/project/dpdk/list/?series=27570&state=%2A&ar
>> chive=both
>>
> 
> The patch set in the above link only covers replacing the MIT-licensed
> base code with the BSD-3-licensed base code. After some discussion, we
> think Google is in the right place to do that work, and they are working
> on it now.
> 

Is the Google GVE driver [2] in the process of changing license from MIT
to BSD-3?


[2]
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve



> This patch is mainly for feature upstreaming in DPDK 23.07. It contains
> only the code changes, following the previous license statements, without
> any license change.
> 
> This patch was separated out and sent by Google to ensure there is no
> license violation.
> 
> BTW, regarding the GVE PMD enhancement feature, the rest of the code only
> involves BSD-3-licensed files, and that patch set will be sent out soon.
> 
> Thanks!



RE: [PATCH 1/1] net/gve: update base code for DQO

2023-04-12 Thread Guo, Junfeng


> -Original Message-
> From: Ferruh Yigit 
> Sent: Wednesday, April 12, 2023 17:35
> To: Guo, Junfeng ; Richardson, Bruce
> 
> Cc: dev@dpdk.org; Zhang, Qi Z ; Rushil Gupta
> 
> Subject: Re: [PATCH 1/1] net/gve: update base code for DQO
> 
> On 4/12/2023 10:09 AM, Guo, Junfeng wrote:
> >
> >
> >> -Original Message-
> >> From: Ferruh Yigit 
> >> Sent: Wednesday, April 12, 2023 16:50
> >> To: Guo, Junfeng ; Richardson, Bruce
> >> 
> >> Cc: dev@dpdk.org; Zhang, Qi Z ; Rushil Gupta
> >> 
> >> Subject: Re: [PATCH 1/1] net/gve: update base code for DQO
> >>
> >> On 4/11/2023 7:51 AM, Guo, Junfeng wrote:
> >>
> >> Hi Junfeng, message moved down.
> >>
> >>>
>  -Original Message-
>  From: Rushil Gupta 
>  Sent: Tuesday, April 11, 2023 12:59
>  To: Zhang, Qi Z ; ferruh.yi...@amd.com
>  Cc: Richardson, Bruce ;
> dev@dpdk.org;
>  Rushil Gupta ; Guo, Junfeng
>  
>  Subject: [PATCH 1/1] net/gve: update base code for DQO
> 
>  Update gve base code to support DQO.
> 
>  This patch is based on this:
> 
> https://patchwork.dpdk.org/project/dpdk/list/?series=27647&state=*
> 
>  Signed-off-by: Rushil Gupta 
>  Signed-off-by: Junfeng Guo 
> >>> Hi Ferruh & Bruce,
> >>>
> >>> This patch contains a few lines of change to the MIT-licensed gve base
> >>> code. Note that no new files are added, just some minor code updates.
> >>>
> >>> Do we need to ask for special approval from the Tech Board for this?
> >>> Please help give some advice and also help review this patch. Thanks!
> >>>
> >>
> >> Once the MIT license exception is in place, as far as I know no more
> >> approval is required per change.
> >
> > Got it, thanks for the comment!
> >
> > Then we may also need your help reviewing it, as well as the coming patch
> > set for GVE PMD enhancements for DPDK 23.07. Thanks in advance!
> >
> >>
> >>> BTW, Google will also help replace all the base code under MIT license
> >>> with the ones under BSD-3 license soon, which would make things easier.
> >>>
> >>
> >> Is this different from the base code under DPDK changing license [1]?
> >>
> >>
> >> [1]
> >>
> https://patches.dpdk.org/project/dpdk/list/?series=27570&state=%2A&ar
> >> chive=both
> >>
> >
> > The patch set in the above link only covers replacing the MIT-licensed
> > base code with the BSD-3-licensed base code. After some discussion, we
> > think Google is in the right place to do that work, and they are
> > working on it now.
> >
> 
> Is the Google GVE driver [2] in the process of changing license from MIT
> to BSD-3?
> 
> 
> [2]
> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
> linux/tree/v1.3.0/google/gve
> 

I'm not sure; I don't know much about Google's plans.
Maybe they could provide some info here. Thanks!

@Rushil Gupta 

> 
> 
> > This patch is mainly for feature upstreaming in DPDK 23.07. It contains
> > only the code changes, following the previous license statements,
> > without any license change.
> >
> > This patch was separated out and sent by Google to ensure there is no
> > license violation.
> >
> > BTW, regarding the GVE PMD enhancement feature, the rest of the code
> > only involves BSD-3-licensed files, and that patch set will be sent
> > out soon.
> >
> > Thanks!



[RFC] app/testpmd: use RSS conf from software when configuring DCB

2023-04-12 Thread Min Zhou
In the testpmd command line, we have to stop the port before configuring
DCB. However, some PMDs, such as ixgbe, may perform a hardware reset during
the port stop. Such reset operations can clear the RSS configuration in the
hardware registers. This would cause the loss of the RSS configuration that
was set during testpmd initialization. As a result, I find that I cannot
enable RSS and DCB at the same time in testpmd when using an Intel 82599
NIC.

Although this patch solves the problem I encountered, is there any risk in
using the RSS conf kept in software instead of reading it from the hardware
registers when configuring DCB?

Signed-off-by: Min Zhou 
---
 app/test-pmd/testpmd.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5cb6f92523..3c382267b8 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -4247,14 +4247,12 @@ const uint16_t vlan_tags[] = {
 };
 
 static  int
-get_eth_dcb_conf(portid_t pid, struct rte_eth_conf *eth_conf,
+get_eth_dcb_conf(portid_t pid __rte_unused, struct rte_eth_conf *eth_conf,
 enum dcb_mode_enable dcb_mode,
 enum rte_eth_nb_tcs num_tcs,
 uint8_t pfc_en)
 {
uint8_t i;
-   int32_t rc;
-   struct rte_eth_rss_conf rss_conf;
 
/*
 * Builds up the correct configuration for dcb+vt based on the vlan 
tags array
@@ -4296,12 +4294,6 @@ get_eth_dcb_conf(portid_t pid, struct rte_eth_conf 
*eth_conf,
struct rte_eth_dcb_tx_conf *tx_conf =
ð_conf->tx_adv_conf.dcb_tx_conf;
 
-   memset(&rss_conf, 0, sizeof(struct rte_eth_rss_conf));
-
-   rc = rte_eth_dev_rss_hash_conf_get(pid, &rss_conf);
-   if (rc != 0)
-   return rc;
-
rx_conf->nb_tcs = num_tcs;
tx_conf->nb_tcs = num_tcs;
 
@@ -4313,7 +4305,6 @@ get_eth_dcb_conf(portid_t pid, struct rte_eth_conf 
*eth_conf,
eth_conf->rxmode.mq_mode =
(enum rte_eth_rx_mq_mode)
(rx_mq_mode & RTE_ETH_MQ_RX_DCB_RSS);
-   eth_conf->rx_adv_conf.rss_conf = rss_conf;
eth_conf->txmode.mq_mode = RTE_ETH_MQ_TX_DCB;
}
 
-- 
2.31.1



Re: [PATCH] dmadev: add tracepoints

2023-04-12 Thread Bruce Richardson
On Wed, Apr 12, 2023 at 02:48:08AM +, Chengwen Feng wrote:
> Add tracepoints at important APIs for tracing support.
> 
> Signed-off-by: Chengwen Feng 
> ---
>  lib/dmadev/meson.build   |   2 +-
>  lib/dmadev/rte_dmadev.c  |  39 ++--
>  lib/dmadev/rte_dmadev.h  |  56 ---
>  lib/dmadev/rte_dmadev_trace.h| 133 +++
>  lib/dmadev/rte_dmadev_trace_fp.h | 113 +++
>  lib/dmadev/rte_dmadev_trace_points.c |  59 
>  lib/dmadev/version.map   |  10 ++
>  7 files changed, 391 insertions(+), 21 deletions(-)
>  create mode 100644 lib/dmadev/rte_dmadev_trace.h
>  create mode 100644 lib/dmadev/rte_dmadev_trace_fp.h
>  create mode 100644 lib/dmadev/rte_dmadev_trace_points.c
> 
For completeness, do you have any numbers for the performance impact (if
any) on the DMA dataplane APIs with this tracing added?

/Bruce


[PATCH v1] power: support amd-pstate cpufreq driver

2023-04-12 Thread Sivaprasad Tummala
amd-pstate introduces a new CPU frequency control mechanism for AMD
processors, using the ACPI Collaborative Processor Performance Control
feature for finer-grained frequency management.

This patch adds support for the amd-pstate driver.

Signed-off-by: Sivaprasad Tummala 
---
 app/test/test_power.c  |   1 +
 app/test/test_power_cpufreq.c  |   5 +-
 doc/guides/rel_notes/release_23_07.rst |   3 +
 examples/l3fwd-power/main.c|   1 +
 lib/power/meson.build  |   1 +
 lib/power/power_amd_pstate_cpufreq.c   | 698 +
 lib/power/power_amd_pstate_cpufreq.h   | 219 
 lib/power/rte_power.c  |  26 +
 lib/power/rte_power.h  |   3 +-
 lib/power/rte_power_pmd_mgmt.c |   6 +-
 10 files changed, 958 insertions(+), 5 deletions(-)
 create mode 100644 lib/power/power_amd_pstate_cpufreq.c
 create mode 100644 lib/power/power_amd_pstate_cpufreq.h

diff --git a/app/test/test_power.c b/app/test/test_power.c
index b7b5561348..11781a5866 100644
--- a/app/test/test_power.c
+++ b/app/test/test_power.c
@@ -134,6 +134,7 @@ test_power(void)
const enum power_management_env envs[] = {PM_ENV_ACPI_CPUFREQ,
PM_ENV_KVM_VM,
PM_ENV_PSTATE_CPUFREQ,
+   PM_ENV_AMD_PSTATE_CPUFREQ,
PM_ENV_CPPC_CPUFREQ};
 
unsigned int i;
diff --git a/app/test/test_power_cpufreq.c b/app/test/test_power_cpufreq.c
index 4d013cd7bb..9a14e6ad6a 100644
--- a/app/test/test_power_cpufreq.c
+++ b/app/test/test_power_cpufreq.c
@@ -85,7 +85,8 @@ check_cur_freq(unsigned int lcore_id, uint32_t idx, bool 
turbo)
freq_conv = cur_freq;
 
env = rte_power_get_env();
-   if (env == PM_ENV_CPPC_CPUFREQ || env == PM_ENV_PSTATE_CPUFREQ) 
{
+   if (env == PM_ENV_CPPC_CPUFREQ || env == PM_ENV_PSTATE_CPUFREQ 
||
+   env == PM_ENV_AMD_PSTATE_CPUFREQ) {
/* convert the frequency to nearest 10 value
 * Ex: if cur_freq=1396789 then freq_conv=140
 * Ex: if cur_freq=800030 then freq_conv=80
@@ -502,7 +503,7 @@ test_power_cpufreq(void)
/* Test environment configuration */
env = rte_power_get_env();
if ((env != PM_ENV_ACPI_CPUFREQ) && (env != PM_ENV_PSTATE_CPUFREQ) &&
-   (env != PM_ENV_CPPC_CPUFREQ)) {
+   (env != PM_ENV_CPPC_CPUFREQ) && (env != 
PM_ENV_AMD_PSTATE_CPUFREQ)) {
printf("Unexpectedly got an environment other than 
ACPI/PSTATE\n");
goto fail_all;
}
diff --git a/doc/guides/rel_notes/release_23_07.rst 
b/doc/guides/rel_notes/release_23_07.rst
index a9b1293689..9e714cacae 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -55,6 +55,9 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+   * **Added amd-pstate driver support to power management library.**
+
+Added support for amd-pstate driver which works on AMD Zen processors.
 
 Removed Items
 -
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 3f01cbd9e2..16495824e3 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -2245,6 +2245,7 @@ init_power_library(void)
env = rte_power_get_env();
if (env != PM_ENV_ACPI_CPUFREQ &&
env != PM_ENV_PSTATE_CPUFREQ &&
+   env != PM_ENV_AMD_PSTATE_CPUFREQ &&
env != PM_ENV_CPPC_CPUFREQ) {
RTE_LOG(ERR, POWER,
"Only ACPI, PSTATE and CPPC mode are 
supported\n");
diff --git a/lib/power/meson.build b/lib/power/meson.build
index 1ce8b7c07d..532aa4fbd6 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -18,6 +18,7 @@ sources = files(
 'power_cppc_cpufreq.c',
 'power_kvm_vm.c',
 'power_pstate_cpufreq.c',
+'power_amd_pstate_cpufreq.c',
 'rte_power.c',
 'rte_power_intel_uncore.c',
 'rte_power_pmd_mgmt.c',
diff --git a/lib/power/power_amd_pstate_cpufreq.c 
b/lib/power/power_amd_pstate_cpufreq.c
new file mode 100644
index 00..aa6868aaa2
--- /dev/null
+++ b/lib/power/power_amd_pstate_cpufreq.c
@@ -0,0 +1,698 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2021 Intel Corporation
+ * Copyright(c) 2021 Arm Limited
+ * Copyright(c) 2023 Amd Limited
+ */
+
+#include 
+
+#include 
+
+#include "power_amd_pstate_cpufreq.h"
+#include "power_common.h"
+
+/* macros used for rounding frequency to nearest 1000 */
+#define FREQ_ROUNDING_DELTA 500
+#define ROUND_FREQ_TO_N_1000 1000
+
+#define POWER_CONVERT_TO_DECIMAL 10
+
+#define POWER_GOVERNOR_USERSPACE "userspace"
+#define POWER_S
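
From an application's perspective, only the new env value matters; a
minimal sketch of selecting it explicitly (names from the existing
rte_power API plus the PM_ENV_AMD_PSTATE_CPUFREQ value this patch adds):

    #include <rte_power.h>

    /* Force the amd-pstate backend instead of auto-detection, then
     * initialize frequency control for one lcore. */
    static int
    use_amd_pstate(unsigned int lcore_id)
    {
        if (rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ) != 0)
            return -1;
        if (rte_power_init(lcore_id) != 0)
            return -1;
        return rte_power_freq_max(lcore_id); /* negative on failure */
    }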

[PATCH] net/ixgbe: consider DCB/VMDq conf when getting RSS conf

2023-04-12 Thread Min Zhou
The mrqe field of the MRQC register is an enum. From the Intel 82599
datasheet, we know that the following values of the mrqe field are all
related to the RSS configuration:
0000b = RSS disabled.
0001b = RSS only -- Single set of RSS 16 queues.
0010b = DCB enabled and RSS disabled -- 8 TCs, each allocated 1 queue.
0011b = DCB enabled and RSS disabled -- 4 TCs, each allocated 1 queue.
0100b = DCB and RSS -- 8 TCs, each allocated 16 RSS queues.
0101b = DCB and RSS -- 4 TCs, each allocated 16 RSS queues.
1000b = Virtualization only -- 64 pools, no RSS, each pool allocated
2 queues.
1010b = Virtualization and RSS -- 32 pools, each allocated 4 RSS queues.
1011b = Virtualization and RSS -- 64 pools, each allocated 2 RSS queues.

The ixgbe PMD checks whether RSS is enabled when getting the RSS conf. So,
besides comparing the value of the mrqe field with xxx0b and xxx1b, we also
need to consider other configurations, such as DCB + RSS or VMDQ + RSS.
Otherwise, we may not get the correct RSS conf in some cases, for example
when we use DCB and RSS with 8 TCs, which corresponds to 0100b in the mrqe
field.

Signed-off-by: Min Zhou 
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 91 ++
 1 file changed, 80 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index c9d6ca9efe..1eff0053ed 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -3461,18 +3461,89 @@ static uint8_t rss_intel_key[40] = {
0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA,
 };
 
+/*
+ * This function removes the RSS configuration from the mrqe field of the
+ * MRQC register and tries to maintain the other configurations in the
+ * field, such as DCB and Virtualization.
+ *
+ * The MRQC register is described in section 7.1.2.8.3 of the Intel 82599
+ * datasheet. From the datasheet, we know that the mrqe field is an enum.
+ * So, masking the mrqe field with '~IXGBE_MRQC_RSSEN' may not completely
+ * disable the RSS configuration. For example, the value of mrqe is 0101b
+ * when DCB and RSS with 4 TCs are configured; however, 'mrqe &= ~0x01'
+ * yields 0100b, which corresponds to DCB and RSS with 8 TCs.
+ */
+static void
+ixgbe_mrqc_rss_remove(struct ixgbe_hw *hw)
+{
+   uint32_t mrqc;
+   uint32_t mrqc_reg;
+   uint32_t mrqe_val;
+
+   mrqc_reg = ixgbe_mrqc_reg_get(hw->mac.type);
+   mrqc = IXGBE_READ_REG(hw, mrqc_reg);
+   mrqe_val = mrqc & IXGBE_MRQC_MRQE_MASK;
+
+   switch (mrqe_val) {
+   case IXGBE_MRQC_RSSEN:
+   /* Completely disable rss */
+   mrqe_val = 0;
+   break;
+   case IXGBE_MRQC_RTRSS8TCEN:
+   mrqe_val = IXGBE_MRQC_RT8TCEN;
+   break;
+   case IXGBE_MRQC_RTRSS4TCEN:
+   mrqe_val = IXGBE_MRQC_RT4TCEN;
+   break;
+   case IXGBE_MRQC_VMDQRSS64EN:
+   /* FIXME. Can 32 pools with rss convert to 64 pools without rss? */
+   case IXGBE_MRQC_VMDQRSS32EN:
+   mrqe_val = IXGBE_MRQC_VMDQEN;
+   break;
+   default:
+   /* No rss configured, leave it as it is */
+   break;
+   }
+   mrqc = (mrqc & ~IXGBE_MRQC_MRQE_MASK) | mrqe_val;
+   IXGBE_WRITE_REG(hw, mrqc_reg, mrqc);
+}
+
 static void
 ixgbe_rss_disable(struct rte_eth_dev *dev)
 {
struct ixgbe_hw *hw;
+
+   hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   /* Remove the rss configuration and maintain the other configurations */
+   ixgbe_mrqc_rss_remove(hw);
+}
+
+/*
+ * This function checks whether RSS is enabled by comparing the mrqe
+ * field with some RSS-related enums, and also considers the configurations
+ * for DCB + RSS and Virtualization + RSS. It is necessary for getting the
+ * correct RSS hash configuration from the RSS Field Enable field of the
+ * MRQC register when both RSS and DCB/VMDQ are used.
+ */
+static bool
+ixgbe_rss_enabled(struct ixgbe_hw *hw)
+{
uint32_t mrqc;
uint32_t mrqc_reg;
+   uint32_t mrqe_val;
 
-   hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
mrqc_reg = ixgbe_mrqc_reg_get(hw->mac.type);
mrqc = IXGBE_READ_REG(hw, mrqc_reg);
-   mrqc &= ~IXGBE_MRQC_RSSEN;
-   IXGBE_WRITE_REG(hw, mrqc_reg, mrqc);
+   mrqe_val = mrqc & IXGBE_MRQC_MRQE_MASK;
+
+   if (mrqe_val == IXGBE_MRQC_RSSEN ||
+   mrqe_val == IXGBE_MRQC_RTRSS8TCEN ||
+   mrqe_val == IXGBE_MRQC_RTRSS4TCEN ||
+   mrqe_val == IXGBE_MRQC_VMDQRSS64EN ||
+   mrqe_val == IXGBE_MRQC_VMDQRSS32EN)
+   return true;
+
+   return false;
 }
 
 static void
@@ -3530,9 +3601,7 @@ ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
  struct rte_eth_rss_conf *rss_conf)
 {
struct ixgbe_hw *hw;
-   uint32_t mrqc;
uint64_t rss_hf;
-   uint

RE: [PATCH v4 02/14] eal: use rtm and xtest intrinsics

2023-04-12 Thread Konstantin Ananyev



> Inline assembly is not supported for MSVC x64. Convert code to use
> _xend, _xabort and _xtest intrinsics.
> 
> Signed-off-by: Tyler Retzlaff 
> ---
>  config/x86/meson.build|  6 ++
>  lib/eal/x86/include/rte_rtm.h | 18 +-
>  2 files changed, 11 insertions(+), 13 deletions(-)
> 
> diff --git a/config/x86/meson.build b/config/x86/meson.build
> index 54345c4..4c0b06c 100644
> --- a/config/x86/meson.build
> +++ b/config/x86/meson.build
> @@ -30,6 +30,12 @@ if cc.get_define('__SSE4_2__', args: machine_args) == ''
>  machine_args += '-msse4'
>  endif
> 
> +# enable restricted transactional memory intrinsics
> +# https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html
> +if cc.get_id() != 'msvc'
> +machine_args += '-mrtm'
> +endif
> +
>  base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2']
>  foreach f:base_flags
>  compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
> diff --git a/lib/eal/x86/include/rte_rtm.h b/lib/eal/x86/include/rte_rtm.h
> index 36bf498..b84e58e 100644
> --- a/lib/eal/x86/include/rte_rtm.h
> +++ b/lib/eal/x86/include/rte_rtm.h
> @@ -5,6 +5,7 @@
>  #ifndef _RTE_RTM_H_
>  #define _RTE_RTM_H_ 1
> 
> +#include 
> 
>  /* Official RTM intrinsics interface matching gcc/icc, but works
> on older gcc compatible compilers and binutils. */
> @@ -28,31 +29,22 @@
>  static __rte_always_inline
>  unsigned int rte_xbegin(void)
>  {
> - unsigned int ret = RTE_XBEGIN_STARTED;
> -
> - asm volatile(".byte 0xc7,0xf8 ; .long 0" : "+a" (ret) :: "memory");
> - return ret;
> + return _xbegin();
>  }
> 
>  static __rte_always_inline
>  void rte_xend(void)
>  {
> -  asm volatile(".byte 0x0f,0x01,0xd5" ::: "memory");
> + _xend();
>  }
> 
>  /* not an inline function to workaround a clang bug with -O0 */
> -#define rte_xabort(status) do { \
> - asm volatile(".byte 0xc6,0xf8,%P0" :: "i" (status) : "memory"); \
> -} while (0)
> +#define rte_xabort(status) _xabort(status)
> 
>  static __rte_always_inline
>  int rte_xtest(void)
>  {
> - unsigned char out;
> -
> - asm volatile(".byte 0x0f,0x01,0xd6 ; setnz %0" :
> - "=r" (out) :: "memory");
> - return out;
> + return _xtest();
>  }
> 
>  #ifdef __cplusplus
> --

Acked-by: Konstantin Ananyev 
 

> 1.8.3.1



RE: [PATCH v4 01/14] eal: use rdtsc intrinsic

2023-04-12 Thread Konstantin Ananyev



> Inline assembly is not supported for MSVC x64. Convert code to use
> __rdtsc intrinsic.
> 
> Signed-off-by: Tyler Retzlaff 
> ---
>  lib/eal/x86/include/rte_cycles.h | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/eal/x86/include/rte_cycles.h 
> b/lib/eal/x86/include/rte_cycles.h
> index a461a4d..cca5122 100644
> --- a/lib/eal/x86/include/rte_cycles.h
> +++ b/lib/eal/x86/include/rte_cycles.h
> @@ -6,6 +6,12 @@
>  #ifndef _RTE_CYCLES_X86_64_H_
>  #define _RTE_CYCLES_X86_64_H_
> 
> +#ifndef RTE_TOOLCHAIN_MSVC
> +#include 
> +#else
> +#include 
> +#endif
> +
>  #ifdef __cplusplus
>  extern "C" {
>  #endif
> @@ -23,6 +29,7 @@
>  static inline uint64_t
>  rte_rdtsc(void)
>  {
> +#ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
>   union {
>   uint64_t tsc_64;
>   RTE_STD_C11
> @@ -32,7 +39,6 @@
>   };
>   } tsc;
> 
> -#ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
>   if (unlikely(rte_cycles_vmware_tsc_map)) {
>   /* ecx = 0x1 corresponds to the physical TSC for VMware */
>   asm volatile("rdpmc" :
> @@ -42,11 +48,7 @@
>   return tsc.tsc_64;
>   }
>  #endif
> -
> - asm volatile("rdtsc" :
> -  "=a" (tsc.lo_32),
> -  "=d" (tsc.hi_32));
> - return tsc.tsc_64;
> + return __rdtsc();
>  }
> 
>  static inline uint64_t
> --

Acked-by: Konstantin Ananyev 
 

> 1.8.3.1



RE: [PATCH] dmadev: add tracepoints

2023-04-12 Thread Morten Brørup
> From: Chengwen Feng [mailto:fengcheng...@huawei.com]
> Sent: Wednesday, 12 April 2023 04.48
> 
> Add tracepoints at important APIs for tracing support.
> 
> Signed-off-by: Chengwen Feng 
> ---

[...]

> diff --git a/lib/dmadev/rte_dmadev_trace.h b/lib/dmadev/rte_dmadev_trace.h
> new file mode 100644
> index 00..0dae78ca15
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev_trace.h
> @@ -0,0 +1,133 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 HiSilicon Limited
> + */
> +
> +#ifndef RTE_DMADEV_TRACE_H
> +#define RTE_DMADEV_TRACE_H
> +
> +/**
> + * @file
> + *
> + * API for dmadev trace support.
> + */
> +
> +#include 
> +
> +#include "rte_dmadev.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +RTE_TRACE_POINT(
> + rte_dma_trace_info_get,
> + RTE_TRACE_POINT_ARGS(int16_t dev_id, struct rte_dma_info *dev_info),
> + rte_trace_point_emit_i16(dev_id);
> + rte_trace_point_emit_string(dev_info->dev_name);
> + rte_trace_point_emit_u64(dev_info->dev_capa);
> + rte_trace_point_emit_u16(dev_info->max_vchans);
> + rte_trace_point_emit_u16(dev_info->max_desc);
> + rte_trace_point_emit_u16(dev_info->min_desc);
> + rte_trace_point_emit_u16(dev_info->max_sges);
> + rte_trace_point_emit_i16(dev_info->numa_node);
> + rte_trace_point_emit_u16(dev_info->nb_vchans);
> +)
> +
> +RTE_TRACE_POINT(
> + rte_dma_trace_configure,
> + RTE_TRACE_POINT_ARGS(int16_t dev_id, const struct rte_dma_conf
> *dev_conf,
> +  int ret),
> + int enable_silent = (int)dev_conf->enable_silent;
> + rte_trace_point_emit_i16(dev_id);
> + rte_trace_point_emit_u16(dev_conf->nb_vchans);
> + rte_trace_point_emit_int(enable_silent);
> + rte_trace_point_emit_int(ret);
> +)
> +
> +RTE_TRACE_POINT(
> + rte_dma_trace_start,
> + RTE_TRACE_POINT_ARGS(int16_t dev_id, int ret),
> + rte_trace_point_emit_i16(dev_id);
> + rte_trace_point_emit_int(ret);
> +)
> +
> +RTE_TRACE_POINT(
> + rte_dma_trace_stop,
> + RTE_TRACE_POINT_ARGS(int16_t dev_id, int ret),
> + rte_trace_point_emit_i16(dev_id);
> + rte_trace_point_emit_int(ret);
> +)
> +
> +RTE_TRACE_POINT(
> + rte_dma_trace_close,
> + RTE_TRACE_POINT_ARGS(int16_t dev_id, int ret),
> + rte_trace_point_emit_i16(dev_id);
> + rte_trace_point_emit_int(ret);
> +)
> +
> +RTE_TRACE_POINT(
> + rte_dma_trace_vchan_setup,
> + RTE_TRACE_POINT_ARGS(int16_t dev_id, uint16_t vchan,
> +  const struct rte_dma_vchan_conf *conf, int ret),
> + int src_port_type = conf->src_port.port_type;
> + int dst_port_type = conf->dst_port.port_type;
> + int direction = conf->direction;
> + uint64_t src_pcie_cfg;
> + uint64_t dst_pcie_cfg;
> + rte_trace_point_emit_i16(dev_id);
> + rte_trace_point_emit_u16(vchan);
> + rte_trace_point_emit_int(direction);
> + rte_trace_point_emit_u16(conf->nb_desc);
> + rte_trace_point_emit_int(src_port_type);
> + memcpy(&src_pcie_cfg, &conf->src_port.pcie, sizeof(uint64_t));
> + rte_trace_point_emit_u64(src_pcie_cfg);
> + memcpy(&dst_pcie_cfg, &conf->dst_port.pcie, sizeof(uint64_t));
> + rte_trace_point_emit_int(dst_port_type);
> + rte_trace_point_emit_u64(dst_pcie_cfg);
> + rte_trace_point_emit_int(ret);
> +)
> +
> +RTE_TRACE_POINT(
> + rte_dma_trace_stats_get,

This should be a fast path trace point.
For reference, ethdev considers rte_eth_stats_get() a fast path function.
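
For illustration (not part of the patch), the fast-path form would just
swap the registration macro; RTE_TRACE_POINT_FP points are compiled out
unless the build enables fast-path tracing (enable_trace_fp=true):

RTE_TRACE_POINT_FP(
	rte_dma_trace_stats_get,
	RTE_TRACE_POINT_ARGS(int16_t dev_id, uint16_t vchan,
			     struct rte_dma_stats *stats, int ret),
	rte_trace_point_emit_i16(dev_id);
	rte_trace_point_emit_u16(vchan);
	rte_trace_point_emit_u64(stats->submitted);
	rte_trace_point_emit_u64(stats->completed);
	rte_trace_point_emit_u64(stats->errors);
	rte_trace_point_emit_int(ret);
)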

> + RTE_TRACE_POINT_ARGS(int16_t dev_id, uint16_t vchan,
> +  struct rte_dma_stats *stats, int ret),
> + rte_trace_point_emit_i16(dev_id);
> + rte_trace_point_emit_u16(vchan);
> + rte_trace_point_emit_u64(stats->submitted);
> + rte_trace_point_emit_u64(stats->completed);
> + rte_trace_point_emit_u64(stats->errors);
> + rte_trace_point_emit_int(ret);
> +)
> +
> +RTE_TRACE_POINT(
> + rte_dma_trace_stats_reset,
> + RTE_TRACE_POINT_ARGS(int16_t dev_id, uint16_t vchan, int ret),
> + rte_trace_point_emit_i16(dev_id);
> + rte_trace_point_emit_u16(vchan);
> + rte_trace_point_emit_int(ret);
> +)
> +
> +RTE_TRACE_POINT(
> + rte_dma_trace_vchan_status,

This should be a fast path trace point.
For reference, ethdev considers rte_eth_link_get() a fast path function.

> + RTE_TRACE_POINT_ARGS(int16_t dev_id, uint16_t vchan,
> +  enum rte_dma_vchan_status *status, int ret),
> + int vchan_status = *status;
> + rte_trace_point_emit_i16(dev_id);
> + rte_trace_point_emit_u16(vchan);
> + rte_trace_point_emit_int(vchan_status);
> + rte_trace_point_emit_int(ret);
> +)
> +
> +RTE_TRACE_POINT(
> + rte_dma_trace_dump,
> + RTE_TRACE_POINT_ARGS(int16_t dev_id, FILE *f, int ret),
> + rte_trace_point_emit_i16(dev_id);
> + rte_trace_point_emit_ptr(f);
> + rte_trace_point_emit_int(ret);
> +)
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_DMADEV_

Re: [RFC 00/27] Add VDUSE support to Vhost library

2023-04-12 Thread Ferruh Yigit
On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
> 
> VDUSE stands for vDPA device in Userspace, it enables
> implementing a Virtio device in userspace and have it
> attached to the Kernel vDPA bus.
> 
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. Doing that, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
> 
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
> 
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> network device type:
> 
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
> 
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.

Hi Maxime,

Is this a replacement to the existing DPDK vDPA framework? What is the
plan for long term?


[PATCH] net/mlx5: fix lro update tcp header cksum error

2023-04-12 Thread jiangheng (G)
csum is the sum of three 16 bits value
it must be folded twice to ensure that the upper 16 bits are 0
---
 drivers/net/mlx5/mlx5_rx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index a2be523e9e..ae537dfffa 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -1090,6 +1090,7 @@ mlx5_lro_update_tcp_hdr(struct rte_tcp_hdr 
*__rte_restrict tcp,
tcp->cksum = 0;
csum += rte_raw_cksum(tcp, (tcp->data_off >> 4) * 4);
csum = ((csum & 0x) >> 16) + (csum & 0x);
+   csum = ((csum & 0x) >> 16) + (csum & 0x);
csum = (~csum) & 0x;
if (csum == 0)
csum = 0x;
--
2.27.0
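
For illustration, a small standalone sketch (not part of the patch) of
why the second fold matters: one fold of a 32-bit sum can itself carry
into the upper 16 bits.

#include <assert.h>
#include <stdint.h>

static uint16_t
fold_csum(uint32_t csum)
{
	csum = (csum >> 16) + (csum & 0xffff);	/* first fold */
	csum = (csum >> 16) + (csum & 0xffff);	/* catch the carry */
	return (uint16_t)csum;
}

int
main(void)
{
	/* 0x1ffff folds once to 0x10000, which still has a carry bit;
	 * the second fold reduces it to the correct 0x0001. */
	assert(fold_csum(0x1ffff) == 0x0001);
	return 0;
}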


Re: [PATCH v3] net/sfc: stop misuse of Rx ingress m-port metadata on EF100

2023-04-12 Thread Ferruh Yigit
On 3/12/2023 10:54 AM, Ivan Malov wrote:
> The driver supports representor functionality. In it,
> packets coming from VFs to the dedicated back-end Rx
> queue get demultiplexed into front-end Rx queues of
> representor ethdevs as per the per-packet metadata
> indicating logical HW ingress ports. On transmit,
> packets are provided with symmetrical metadata
> by front-end Tx queues, and the back-end queue
> transforms the data into so-called Tx override
> descriptors. These let the packets bypass flow
> lookup and go directly to the represented VFs.
> 
> However, in the Rx part, the driver extracts
> the said metadata on every HW Rx queue, that
> is, not just on the one used by representors.
> Doing so leads to a buggy behaviour. It is
> revealed by operating testpmd as follows:
> 
> dpdk-testpmd -a :c6:00.0 -a :c6:00.1 -- -i
> 
> testpmd> flow create 0 transfer pattern port_representor \
>  port_id is 0 / end actions port_representor port_id 1 / end
> Flow rule #0 created
> 
> testpmd> set fwd io
> testpmd> start tx_first
> 
> testpmd> flow destroy 0 rule 0
> Flow rule #0 destroyed
> 
> testpmd> stop
> 
>   -- Forward statistics for port 0  -
>   RX-packets: 19196498   RX-dropped: 0 RX-total: 19196498
>   TX-packets: 19196535   TX-dropped: 0 TX-total: 19196535
>   ---
> 
>   -- Forward statistics for port 1  -
>   RX-packets: 19196503   RX-dropped: 0 RX-total: 19196503
>   TX-packets: 19196530   TX-dropped: 0 TX-total: 19196530
>   ---
> 
> In this scenario, two physical functions of the adapter
> do not have any corresponding "back-to-back" forwarder
> on peer host. Packets transmitted from port 0 can only
> be forwarded to port 1 by means of a special flow rule.
> 
> The flow rule indeed works, but destroying it does not
> stop forwarding. Port statistics carry on incrementing.
> 
> Also, it is apparent that forwarding in the opposite
> direction must not have worked in this case as the
> flow is meant to target only one of the directions.
> 
> Because of the bug, the first 32 mbufs received
> as a result of the flow rule operation have the
> said metadata present. In io mode, testpmd does
> not tamper with mbufs and passes them directly
> to transmit path, so this data remains in them
> instructing the PMD to override destinations
> of the packets via Tx option descriptors.
> 
> Expected behaviour is as follows:
> 
>   -- Forward statistics for port 0  -
>   RX-packets: 0  RX-dropped: 0 RX-total: 0
>   TX-packets: 15787496   TX-dropped: 0 TX-total: 15787496
>   ---
> 
>   -- Forward statistics for port 1  -
>   RX-packets: 15787464   RX-dropped: 0 RX-total: 15787464
>   TX-packets: 32 TX-dropped: 0 TX-total: 32
>   ---
> 
> These figures show the rule work only for one direction.
> Also, removing the flow shall cause forwarding to cease.
> 
> Provided patch fixes the bug accordingly.
> 
> Fixes: d0f981a3efd8 ("net/sfc: handle ingress mport in EF100 Rx prefix")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ivan Malov 
> Reviewed-by: Andy Moreton 
> ---
> v3: extra rework after review feedback
> v2: address community review notes
> 
>  drivers/net/sfc/sfc_dp_rx.h|  1 +
>  drivers/net/sfc/sfc_ef100_rx.c | 18 ++
>  drivers/net/sfc/sfc_rx.c   |  3 +++
>  3 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/sfc/sfc_dp_rx.h b/drivers/net/sfc/sfc_dp_rx.h
> index 246adbd87c..8a504bdcf1 100644
> --- a/drivers/net/sfc/sfc_dp_rx.h
> +++ b/drivers/net/sfc/sfc_dp_rx.h
> @@ -69,6 +69,7 @@ struct sfc_dp_rx_qcreate_info {
>   /** Receive queue flags initializer */
>   unsigned intflags;
>  #define SFC_RXQ_FLAG_RSS_HASH0x1
> +#define SFC_RXQ_FLAG_INGRESS_MPORT   0x2
>  
>   /** Rx queue size */
>   unsigned intrxq_entries;
> diff --git a/drivers/net/sfc/sfc_ef100_rx.c b/drivers/net/sfc/sfc_ef100_rx.c
> index b7e3397f77..e323156a26 100644
> --- a/drivers/net/sfc/sfc_ef100_rx.c
> +++ b/drivers/net/sfc/sfc_ef100_rx.c
> @@ -823,6 +823,9 @@ sfc_ef100_rx_qcreate(uint16_t port_id, uint16_t queue_id,
>   if (rxq->nic_dma_info->nb_regions > 0)
>   rxq->flags |= SFC_EF100_RXQ_NIC_DMA_MAP;
>  
> + if (info->flags & SFC_RXQ_FLAG_INGRESS_MPORT)
> + rxq->flags |= SFC_EF100_RXQ_INGRESS_MPORT;
> +
>   sfc_ef100_rx_debug(rxq, "RxQ doorbell is %p", rxq->doorbell);
>  
>   *dp_rxqp = &rxq->dp;
> @@ -889,11 +892,18 @@ sfc_ef100_rx_qstart(struct sfc_d

RE: [PATCH v4 06/14] eal: use prefetch intrinsics

2023-04-12 Thread Konstantin Ananyev



> On Tue, Apr 11, 2023 at 02:12:20PM -0700, Tyler Retzlaff wrote:
> > Inline assembly is not supported for MSVC x64; instead, use the
> > _mm_prefetch and _mm_cldemote intrinsics.
> >
> > Signed-off-by: Tyler Retzlaff 
> > ---
> 
> Acked-by: Bruce Richardson 
> 
> One comment inline below for future consideration.
> 
> >  lib/eal/x86/include/rte_prefetch.h | 29 +
> >  1 file changed, 29 insertions(+)
> >
> > diff --git a/lib/eal/x86/include/rte_prefetch.h 
> > b/lib/eal/x86/include/rte_prefetch.h
> > index 7fd01c4..1391af0 100644
> > --- a/lib/eal/x86/include/rte_prefetch.h
> > +++ b/lib/eal/x86/include/rte_prefetch.h
> > @@ -13,6 +13,7 @@
> >  #include 
> >  #include "generic/rte_prefetch.h"
> >
> > +#ifndef RTE_TOOLCHAIN_MSVC
> >  static inline void rte_prefetch0(const volatile void *p)
> >  {
> > asm volatile ("prefetcht0 %[p]" : : [p] "m" (*(const volatile char 
> > *)p));
> > @@ -43,6 +44,34 @@ static inline void rte_prefetch_non_temporal(const 
> > volatile void *p)
> >  {
> > asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
> >  }
> > +#else
> > +static inline void rte_prefetch0(const volatile void *p)
> > +{
> > +   _mm_prefetch(p, 1);
> > +}
> > +
> > +static inline void rte_prefetch1(const volatile void *p)
> > +{
> > +   _mm_prefetch(p, 2);
> > +}
> > +
> > +static inline void rte_prefetch2(const volatile void *p)
> > +{
> > +   _mm_prefetch(p, 3);
> > +}
> > +
> > +static inline void rte_prefetch_non_temporal(const volatile void *p)
> > +{
> > +   _mm_prefetch(p, 0);
> > +}
> 
> For these prefetch instructions, I'm not sure there is any reason why we
> can't drop the inline assembly versions. The instructions are very old at
> this point and should be widely supported by all compilers we use.
> 
> Rather than using hard-coded 1, 2, 3 values in the prefetch calls, I
> believe there should be defines for the levels: "_MM_HINT_T0",
> "_MM_HINT_T1" etc.

+1
 

> 
> > +__rte_experimental
> > +static inline void
> > +rte_cldemote(const volatile void *p)
> > +{
> > +   _mm_cldemote(p);
> > +}
> > +#endif
> > +
> >
> >  #ifdef __cplusplus
> >  }
> > --
> > 1.8.3.1
> >



RE: [PATCH v4 03/14] eal: use barrier intrinsics

2023-04-12 Thread Konstantin Ananyev


> Inline assembly is not supported for MSVC x64; instead, expand
> rte_compiler_barrier as _ReadWriteBarrier and rte_smp_mb as the
> _mm_mfence intrinsic.
> 
> Signed-off-by: Tyler Retzlaff 
> ---
>  lib/eal/include/generic/rte_atomic.h | 4 
>  lib/eal/x86/include/rte_atomic.h | 5 -
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/eal/include/generic/rte_atomic.h 
> b/lib/eal/include/generic/rte_atomic.h
> index 234b268..e973184 100644
> --- a/lib/eal/include/generic/rte_atomic.h
> +++ b/lib/eal/include/generic/rte_atomic.h
> @@ -116,9 +116,13 @@
>   * Guarantees that operation reordering does not occur at compile time
>   * for operations directly before and after the barrier.
>   */
> +#ifndef RTE_TOOLCHAIN_MSVC
>  #define  rte_compiler_barrier() do { \
>   asm volatile ("" : : : "memory");   \
>  } while(0)
> +#else
> +#define rte_compiler_barrier() _ReadWriteBarrier()
> +#endif
> 
>  /**
>   * Synchronization fence between threads based on the specified memory order.
> diff --git a/lib/eal/x86/include/rte_atomic.h 
> b/lib/eal/x86/include/rte_atomic.h
> index f2ee1a9..ca733c5 100644
> --- a/lib/eal/x86/include/rte_atomic.h
> +++ b/lib/eal/x86/include/rte_atomic.h
> @@ -28,7 +28,6 @@
>  #define  rte_rmb() _mm_lfence()
> 
>  #define rte_smp_wmb() rte_compiler_barrier()
> -
>  #define rte_smp_rmb() rte_compiler_barrier()
> 
>  /*
> @@ -66,11 +65,15 @@
>  static __rte_always_inline void
>  rte_smp_mb(void)
>  {
> +#ifndef RTE_TOOLCHAIN_MSVC
>  #ifdef RTE_ARCH_I686
>   asm volatile("lock addl $0, -128(%%esp); " ::: "memory");
>  #else
>   asm volatile("lock addl $0, -128(%%rsp); " ::: "memory");
>  #endif
> +#else
> + _mm_mfence();
> +#endif
>  }
> 
>  #define rte_io_mb() rte_mb()
> --

Acked-by: Konstantin Ananyev 
 

> 1.8.3.1



[PATCH] common/mlx5: enable operation in iova virtual address mode

2023-04-12 Thread Viacheslav Ovsiienko
The ConnectX NIC series hardware provides the advanced internal
MMU option and can operate directly on virtual addresses; host
software should not do any virtual-to-physical address translations.
It means the mlx5 PMDs can operate in DPDK IOVA VA (virtual address)
mode transparently.

To force IOVA VA mode the DPDK should be built with meson option:

  enable_iova_as_pa=false

With this option only drivers supporting IOVA VA mode are enabled.
This patch marks mlx5 drivers with require_iova_in_mbuf flag,
thus allowing their compilation for IOVA VA mode.

Signed-off-by: Viacheslav Ovsiienko 
---
 drivers/common/mlx5/meson.build   | 2 ++
 drivers/compress/mlx5/meson.build | 2 ++
 drivers/crypto/mlx5/meson.build   | 2 ++
 drivers/net/mlx5/meson.build  | 2 ++
 drivers/regex/mlx5/meson.build| 2 ++
 5 files changed, 10 insertions(+)

diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 9dc809f192..26c6d80fe1 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -42,6 +42,8 @@ else
 cflags += [ '-UPEDANTIC' ]
 endif
 
+require_iova_in_mbuf = false
+
 mlx5_config = configuration_data()
 subdir(exec_env)
 configure_file(output: 'mlx5_autoconf.h', configuration: mlx5_config)
diff --git a/drivers/compress/mlx5/meson.build 
b/drivers/compress/mlx5/meson.build
index c906f2d7a2..26a0e0cd09 100644
--- a/drivers/compress/mlx5/meson.build
+++ b/drivers/compress/mlx5/meson.build
@@ -28,3 +28,5 @@ foreach option:cflags_options
 cflags += option
 endif
 endforeach
+
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/mlx5/meson.build b/drivers/crypto/mlx5/meson.build
index a830a4c7b9..a2691ec0f0 100644
--- a/drivers/crypto/mlx5/meson.build
+++ b/drivers/crypto/mlx5/meson.build
@@ -30,3 +30,5 @@ foreach option:cflags_options
 cflags += option
 endif
 endforeach
+
+require_iova_in_mbuf = false
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index dba911693e..fde7241f7e 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -77,6 +77,8 @@ else
 cflags += [ '-UPEDANTIC' ]
 endif
 
+require_iova_in_mbuf = false
+
 testpmd_sources += files('mlx5_testpmd.c')
 
 subdir(exec_env)
diff --git a/drivers/regex/mlx5/meson.build b/drivers/regex/mlx5/meson.build
index 87404101b9..0f4ca46f44 100644
--- a/drivers/regex/mlx5/meson.build
+++ b/drivers/regex/mlx5/meson.build
@@ -32,3 +32,5 @@ foreach option:cflags_options
 cflags += option
 endif
 endforeach
+
+require_iova_in_mbuf = false
-- 
2.18.1



RE: NVIDIA mlx5 PMD roadmap question

2023-04-12 Thread Slava Ovsiienko
Hi, Morten

Thank you for pointing out this opportunity - to enable IOVA VA mode support in 
mlx5 drivers.
Now we propose this patch:
http://patches.dpdk.org/project/dpdk/patch/20230412125320.8585-1-viachesl...@nvidia.com/

The mlx5 hardware has its own MMU facility; no extra effort is needed to 
support VA from the PMD side.

With best regards,
Slava

> -Original Message-
> From: Morten Brørup 
> Sent: среда, 7 декабря 2022 г. 10:48
> To: Matan Azrad ; Slava Ovsiienko
> 
> Cc: dev@dpdk.org
> Subject: NVIDIA mlx5 PMD roadmap question
> 
> NVIDIA mlx5 PMD maintainers,
> 
> Are there any plans for NVIDIA to support build time IOVA_AS_VA mode (i.e.
> building with -Denable_iova_as_pa=false) in the mlx5 PMD?
> 
> I expect it to provide higher performance under certain conditions.
> 
> 
> Med venlig hilsen / Kind regards,
> -Morten Brørup



RE: [PATCH] common/mlx5: enable operation in iova virtual address mode

2023-04-12 Thread Morten Brørup
> From: Viacheslav Ovsiienko [mailto:viachesl...@nvidia.com]
> Sent: Wednesday, 12 April 2023 14.53
> 
> The ConnectX NIC series hardware provides the advanced internal
> MMU option and can operate directly ob virtual addresses, host
> software should not do any virtual-to-physical address translations.
> It means the mlx5 PMDs can operate in DPDK IOVA VA (virtual address)
> mode transparently.
> 
> To force IOVA VA mode the DPDK should be built with meson option:
> 
>   enable_iova_as_pa=false
> 
> With this option only drivers supporting IOVA VA mode are enabled.
> This patch marks mlx5 drivers with require_iova_in_mbuf flag,
> thus allowing their compilation for IOVA VA mode.
> 
> Signed-off-by: Viacheslav Ovsiienko 
> ---

Excellent.

Acked-by: Morten Brørup 



Re: [PATCH v1 1/2] dts: fabric requirements

2023-04-12 Thread Juraj Linkeš
On Tue, Apr 11, 2023 at 4:48 PM Thomas Monjalon  wrote:
>
> 04/04/2023 13:51, Juraj Linkeš:
> > On Mon, Apr 3, 2023 at 5:18 PM Thomas Monjalon  wrote:
> >
> > > 03/04/2023 16:56, Juraj Linkeš:
> > > > On Mon, Apr 3, 2023 at 2:33 PM Thomas Monjalon 
> > > wrote:
> > > >
> > > > > 03/04/2023 13:46, Juraj Linkeš:
> > > > > > Replace pexpect with Fabric.
> > > > >
> > > > > You should squash these lines with the move to Fabric.
> > > > >
> > > > > > Signed-off-by: Juraj Linkeš 
> > > > > > ---
> > > > > >  dts/poetry.lock| 553
> > > +++--
> > > > >
> > > > > Do we really need *all* these lines?
> > > > > I see a lot of lines about Windows and MacOSX which are not supported
> > > in
> > > > > DTS.
> > > > > It is so long that it looks impossible to review.
> > > > >
> > > > >
> > > > This is a generated file and doesn't need to be reviewed.
> > >
> > > In general, I don't like storing generated files.
> > >
> >
> > Me neither, but this one is specifically designed to be stored in a
> > repository:
> > https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
> >
> >
> > >
> > > > I separated the
> > > > dependencies part so that the code part is easier to review. If you
> > > want, I
> > > > can squash the two commits.
> > >
> > > What happens if we manually remove the useless lines?
> > >
> > >
> > The lock file is there so that everyone installs exactly the same versions
> > of dependencies. We can specify the versions of dependencies in
> > pyproject.toml, but we won't control the versions of dependencies of
> > dependencies this way. If we remove the changes to the lock file, then we
> > won't be storing tested versions, everyone would be using slightly
> > different versions and we may potentially need to address versioning issues
> > in the future - best to prevent that with a lock file.
>
> You didn't answer about removing the useless lines, like unneeded Windows 
> support.
>

Do you mean the list of files from macos and windows? I tried removing
those from mypy and testing it and it looks like it didn't have an
impact, but I don't know the inner workings of poetry and the lock
file to test it properly (i.e. to rule out any breakages). What would
be the reason for removing those? Seems like it has more downsides (we
could potentially break something and it's extra work) than upsides
(as this is a generated file, I don't really see any).


[PATCH v4] net/sfc: stop misuse of Rx ingress m-port metadata on EF100

2023-04-12 Thread Ivan Malov
The driver supports representor functionality. In it,
packets coming from VFs to the dedicated back-end Rx
queue get demultiplexed into front-end Rx queues of
representor ethdevs as per the per-packet metadata
indicating logical HW ingress ports. On transmit,
packets are provided with symmetrical metadata
by front-end Tx queues, and the back-end queue
transforms the data into so-called Tx override
descriptors. These let the packets bypass flow
lookup and go directly to the represented VFs.

However, in the Rx part, the driver extracts
the said metadata on every HW Rx queue, that
is, not just on the one used by representors.
Doing so leads to a buggy behaviour. It is
revealed by operating testpmd as follows:

dpdk-testpmd -a :c6:00.0 -a :c6:00.1 -- -i

testpmd> flow create 0 transfer pattern port_representor \
 port_id is 0 / end actions port_representor port_id 1 / end
Flow rule #0 created

testpmd> set fwd io
testpmd> start tx_first

testpmd> flow destroy 0 rule 0
Flow rule #0 destroyed

testpmd> stop

  -- Forward statistics for port 0  -
  RX-packets: 19196498   RX-dropped: 0 RX-total: 19196498
  TX-packets: 19196535   TX-dropped: 0 TX-total: 19196535
  ---

  -- Forward statistics for port 1  -
  RX-packets: 19196503   RX-dropped: 0 RX-total: 19196503
  TX-packets: 19196530   TX-dropped: 0 TX-total: 19196530
  ---

In this scenario, two physical functions of the adapter
do not have any corresponding "back-to-back" forwarder
on peer host. Packets transmitted from port 0 can only
be forwarded to port 1 by means of a special flow rule.

The flow rule indeed works, but destroying it does not
stop forwarding. Port statistics carry on incrementing.

Also, it is apparent that forwarding in the opposite
direction must not have worked in this case as the
flow is meant to target only one of the directions.

Because of the bug, the first 32 mbufs received
as a result of the flow rule operation have the
said metadata present. In io mode, testpmd does
not tamper with mbufs and passes them directly
to transmit path, so this data remains in them
instructing the PMD to override destinations
of the packets via Tx option descriptors.

Expected behaviour is as follows:

  -- Forward statistics for port 0  -
  RX-packets: 0  RX-dropped: 0 RX-total: 0
  TX-packets: 15787496   TX-dropped: 0 TX-total: 15787496
  ---

  -- Forward statistics for port 1  -
  RX-packets: 15787464   RX-dropped: 0 RX-total: 15787464
  TX-packets: 32 TX-dropped: 0 TX-total: 32
  ---

These figures show the rule work only for one direction.
Also, removing the flow shall cause forwarding to cease.

Provided patch fixes the bug accordingly.

Fixes: d0f981a3efd8 ("net/sfc: handle ingress mport in EF100 Rx prefix")
Cc: sta...@dpdk.org

Signed-off-by: Ivan Malov 
Reviewed-by: Andy Moreton 
Acked-by: Andrew Rybchenko 
---
v4: a minor edit to ensure the patch applies correctly
v3: extra rework after review feedback
v2: address community review notes

 drivers/net/sfc/sfc_dp_rx.h|  1 +
 drivers/net/sfc/sfc_ef100_rx.c | 18 ++
 drivers/net/sfc/sfc_rx.c   |  3 +++
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/net/sfc/sfc_dp_rx.h b/drivers/net/sfc/sfc_dp_rx.h
index 246adbd87c..8a504bdcf1 100644
--- a/drivers/net/sfc/sfc_dp_rx.h
+++ b/drivers/net/sfc/sfc_dp_rx.h
@@ -69,6 +69,7 @@ struct sfc_dp_rx_qcreate_info {
/** Receive queue flags initializer */
unsigned intflags;
 #define SFC_RXQ_FLAG_RSS_HASH  0x1
+#define SFC_RXQ_FLAG_INGRESS_MPORT 0x2
 
/** Rx queue size */
unsigned intrxq_entries;
diff --git a/drivers/net/sfc/sfc_ef100_rx.c b/drivers/net/sfc/sfc_ef100_rx.c
index 16cd8524d3..37b754fa33 100644
--- a/drivers/net/sfc/sfc_ef100_rx.c
+++ b/drivers/net/sfc/sfc_ef100_rx.c
@@ -810,6 +810,9 @@ sfc_ef100_rx_qcreate(uint16_t port_id, uint16_t queue_id,
if (rxq->nic_dma_info->nb_regions > 0)
rxq->flags |= SFC_EF100_RXQ_NIC_DMA_MAP;
 
+   if (info->flags & SFC_RXQ_FLAG_INGRESS_MPORT)
+   rxq->flags |= SFC_EF100_RXQ_INGRESS_MPORT;
+
sfc_ef100_rx_debug(rxq, "RxQ doorbell is %p", rxq->doorbell);
 
*dp_rxqp = &rxq->dp;
@@ -876,11 +879,18 @@ sfc_ef100_rx_qstart(struct sfc_dp_rxq *dp_rxq, unsigned 
int evq_read_ptr,
else
rxq->flags &= ~SFC_EF100_RXQ_USER_MARK;
 
+
+   /*
+* At the moment, this feature is used only
+

RE: [PATCH] net/mlx5: fix lro update tcp header cksum error

2023-04-12 Thread Slava Ovsiienko
Hi,  Jiangheng

You are right, in the corner case the sum of the three 16-bit values can
still carry into the upper bits after a single fold, giving the wrong result.
Could you please format the patch according to the rules and send a v2?
- add Fixes: tag with reference to appropriate commit
- add Cc: sta...@dpdk.org
- fix typos in commit message - capitalize sentences, add trailing points, etc.

With best regards,
Slava

> -Original Message-
> From: jiangheng (G) 
> Sent: среда, 12 апреля 2023 г. 14:39
> To: dev@dpdk.org; Matan Azrad ; Slava Ovsiienko
> 
> Subject: [PATCH] net/mlx5: fix lro update tcp header cksum error
> 
> csum is the sum of three 16 bits value
> it must be folded twice to ensure that the upper 16 bits are 0
> ---
>  drivers/net/mlx5/mlx5_rx.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c index
> a2be523e9e..ae537dfffa 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -1090,6 +1090,7 @@ mlx5_lro_update_tcp_hdr(struct rte_tcp_hdr
> *__rte_restrict tcp,
> tcp->cksum = 0;
> csum += rte_raw_cksum(tcp, (tcp->data_off >> 4) * 4);
> csum = ((csum & 0x) >> 16) + (csum & 0x);
> +   csum = ((csum & 0x) >> 16) + (csum & 0x);
> csum = (~csum) & 0x;
> if (csum == 0)
> csum = 0x;
> --
> 2.27.0


RE: 20.11.8 patches review and test

2023-04-12 Thread Ali Alnubani
> -Original Message-
> From: luca.bocca...@gmail.com 
> Sent: Friday, March 31, 2023 9:20 PM
> To: sta...@dpdk.org
> Cc: dev@dpdk.org; Abhishek Marathe ;
> Ali Alnubani ; benjamin.wal...@intel.com; David
> Christensen ; Hemant Agrawal
> ; Ian Stokes ; Jerin
> Jacob ; John McNamara ;
> Ju-Hyoung Lee ; Kevin Traynor
> ; Luca Boccassi ; Pei Zhang
> ; qian.q...@intel.com; Raslan Darawsheh
> ; NBU-Contact-Thomas Monjalon (EXTERNAL)
> ; Yanghang Liu ;
> yuan.p...@intel.com; zhaoyan.c...@intel.com
> Subject: 20.11.8 patches review and test
> 
> Hi all,
> 
> Here is a list of patches targeted for stable release 20.11.8.
> 
> The planned date for the final release is April 17th.
> 
> Please help with testing and validation of your use cases and report
> any issues/results with reply-all to this mail. For the final release
> the fixes and reported validations will be added to the release notes.
> 
> A release candidate tarball can be found at:
> 
> https://dpdk.org/browse/dpdk-stable/tag/?id=v20.11.8-rc1
> 
> These patches are located at branch 20.11 of dpdk-stable repo:
> https://dpdk.org/browse/dpdk-stable/
> 
> Thanks.
> 
> Luca Boccassi
> 
> ---

Hello,

We ran the following functional tests with Nvidia hardware on v20.11.8-rc1:
- Basic functionality:
  Send and receive multiple types of traffic.
- testpmd xstats counter test.
- testpmd timestamp test.
- Changing/checking link status through testpmd.
- rte_flow tests.
- Some RSS tests.
- VLAN filtering, stripping and insertion tests.
- Checksum and TSO tests.
- ptype tests.
- link_status_interrupt example application tests.
- l3fwd-power example application tests.
- Multi-process example applications tests.
- Hardware LRO tests.

Functional tests ran on:
- NIC: ConnectX-6 Dx / OS: Ubuntu 20.04 / Driver: MLNX_OFED_LINUX-5.9-0.5.6.0 / 
Firmware: 22.36.1010
- NIC: ConnectX-7 / OS: Ubuntu 20.04 / Driver: MLNX_OFED_LINUX-5.9-0.5.6.0 / 
Firmware: 22.36.1010
- DPU: BlueField-2 / DOCA SW version: 1.5.1 / Firmware: 24.35.2000

Additionally, we ran compilation tests with multiple configurations in the 
following OS/driver combinations:
- Ubuntu 20.04.5 with MLNX_OFED_LINUX-5.9-0.5.6.0.
- Ubuntu 20.04.5 with rdma-core master (f0a079f).
- Ubuntu 20.04.5 with rdma-core v28.0.
- Ubuntu 18.04.6 with rdma-core v17.1.
- Ubuntu 18.04.6 with rdma-core master (f0a079f) (i386).
- Fedora 37 with rdma-core v41.0.
- Fedora 39 (Rawhide) with rdma-core v44.0.
- CentOS 7 7.9.2009 with rdma-core master (f0a079f).
- CentOS 7 7.9.2009 with MLNX_OFED_LINUX-5.9-0.5.6.0.
- CentOS 8 8.4.2105 with rdma-core master (f0a079f).
- OpenSUSE Leap 15.4 with rdma-core v38.1.
- Windows Server 2019 with Clang 11.0.0.

We don't see new issues caused by the changes in this release.

Please note that not all the functional tests mentioned above fall under "Basic 
functionality with testpmd" like reported in the release notes for previous 
releases:
https://git.dpdk.org/dpdk-stable/commit/?h=v20.11.7&id=62865fef48cb93042e8b9f85821eb02e1031e8f0
Some of them test other applications.

Thanks,
Ali


Re: [PATCH v4 06/14] eal: use prefetch intrinsics

2023-04-12 Thread Tyler Retzlaff
On Wed, Apr 12, 2023 at 10:05:57AM +0100, Bruce Richardson wrote:
> On Tue, Apr 11, 2023 at 02:12:20PM -0700, Tyler Retzlaff wrote:
> > Inline assembly is not supported for MSVC x64; instead, use the
> > _mm_prefetch and _mm_cldemote intrinsics.
> > 
> > Signed-off-by: Tyler Retzlaff 
> > ---
> 
> Acked-by: Bruce Richardson 
> 
> One comment inline below for future consideration.
> 
> >  lib/eal/x86/include/rte_prefetch.h | 29 +
> >  1 file changed, 29 insertions(+)
> > 
> > diff --git a/lib/eal/x86/include/rte_prefetch.h 
> > b/lib/eal/x86/include/rte_prefetch.h
> > index 7fd01c4..1391af0 100644
> > --- a/lib/eal/x86/include/rte_prefetch.h
> > +++ b/lib/eal/x86/include/rte_prefetch.h
> > @@ -13,6 +13,7 @@
> >  #include 
> >  #include "generic/rte_prefetch.h"
> >  
> > +#ifndef RTE_TOOLCHAIN_MSVC
> >  static inline void rte_prefetch0(const volatile void *p)
> >  {
> > asm volatile ("prefetcht0 %[p]" : : [p] "m" (*(const volatile char 
> > *)p));
> > @@ -43,6 +44,34 @@ static inline void rte_prefetch_non_temporal(const 
> > volatile void *p)
> >  {
> > asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
> >  }
> > +#else
> > +static inline void rte_prefetch0(const volatile void *p)
> > +{
> > +   _mm_prefetch(p, 1);
> > +}
> > +
> > +static inline void rte_prefetch1(const volatile void *p)
> > +{
> > +   _mm_prefetch(p, 2);
> > +}
> > +
> > +static inline void rte_prefetch2(const volatile void *p)
> > +{
> > +   _mm_prefetch(p, 3);
> > +}
> > +
> > +static inline void rte_prefetch_non_temporal(const volatile void *p)
> > +{
> > +   _mm_prefetch(p, 0);
> > +}
> 
> For these prefetch instructions, I'm not sure there is any reason why we
> can't drop the inline assembly versions. The instructions are very old at
> this point and should be widely supported by all compilers we use.
> 
> Rather than using hard-coded 1, 2, 3 values in the prefetch calls, I
> believe there should be defines for the levels: "_MM_HINT_T0",
> "_MM_HINT_T1" etc.

hm, i did not know about these and i bet they fix the problem i had.
i.e. if i use e.g. bare '1' i would not get the same prefetch codegen on
gcc/msvc but these defines probably resolve that problem.

let me take another look at this one.
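
for the record, a sketch of the portable form with the named macros;
the assumption is that <xmmintrin.h> maps each _MM_HINT_* to the right
per-compiler encoding, which is exactly what the bare integers got wrong:

#include <stdint.h>
#include <xmmintrin.h>

static inline void rte_prefetch0(const volatile void *p)
{
	_mm_prefetch((const char *)(uintptr_t)p, _MM_HINT_T0);
}

static inline void rte_prefetch1(const volatile void *p)
{
	_mm_prefetch((const char *)(uintptr_t)p, _MM_HINT_T1);
}

static inline void rte_prefetch2(const volatile void *p)
{
	_mm_prefetch((const char *)(uintptr_t)p, _MM_HINT_T2);
}

static inline void rte_prefetch_non_temporal(const volatile void *p)
{
	_mm_prefetch((const char *)(uintptr_t)p, _MM_HINT_NTA);
}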

> 
> > +__rte_experimental
> > +static inline void
> > +rte_cldemote(const volatile void *p)
> > +{
> > +   _mm_cldemote(p);
> > +}
> > +#endif
> > +
> >  
> >  #ifdef __cplusplus
> >  }
> > -- 
> > 1.8.3.1
> > 


Re: Consult the official release version of dpdk-kmod

2023-04-12 Thread Tyler Retzlaff
On Mon, Apr 10, 2023 at 12:34:28PM +, wangzengyuan wrote:
> Hi,
> 
> We are honored to use dpdk and dpdk-kmod, two outstanding open source 
> projects. However, according to the company's open source software usage 
> standards, only officially released versions can be introduced as open source 
> software. As far as I know, there is no official release of dpdk-kmod. Can 
> the community release an official version for users?

are you asking about the linux or windows components? i can speak to
plans for the windows components if that's what is being asked.

> 
> Thank you for your time.
> 
> Yours sincerely
> 
> Zengyuan Wang


Re: [PATCH v1 1/2] dts: fabric requirements

2023-04-12 Thread Thomas Monjalon
12/04/2023 15:42, Juraj Linkeš:
> On Tue, Apr 11, 2023 at 4:48 PM Thomas Monjalon  wrote:
> >
> > 04/04/2023 13:51, Juraj Linkeš:
> > > On Mon, Apr 3, 2023 at 5:18 PM Thomas Monjalon  
> > > wrote:
> > >
> > > > 03/04/2023 16:56, Juraj Linkeš:
> > > > > On Mon, Apr 3, 2023 at 2:33 PM Thomas Monjalon 
> > > > wrote:
> > > > >
> > > > > > 03/04/2023 13:46, Juraj Linkeš:
> > > > > > > Replace pexpect with Fabric.
> > > > > >
> > > > > > You should squash these lines with the move to Fabric.
> > > > > >
> > > > > > > Signed-off-by: Juraj Linkeš 
> > > > > > > ---
> > > > > > >  dts/poetry.lock| 553
> > > > +++--
> > > > > >
> > > > > > Do we really need *all* these lines?
> > > > > > I see a lot of lines about Windows and MacOSX which are not 
> > > > > > supported
> > > > in
> > > > > > DTS.
> > > > > > It is so long that it looks impossible to review.
> > > > > >
> > > > > >
> > > > > This is a generated file and doesn't need to be reviewed.
> > > >
> > > > In general, I don't like storing generated files.
> > > >
> > >
> > > Me neither, but this one is specifically designed to be stored in a
> > > repository:
> > > https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
> > >
> > >
> > > >
> > > > > I separated the
> > > > > dependencies part so that the code part is easier to review. If you
> > > > want, I
> > > > > can squash the two commits.
> > > >
> > > > What happens if we manually remove the useless lines?
> > > >
> > > >
> > > The lock file is there so that everyone installs exactly the same versions
> > > of dependencies. We can specify the versions of dependencies in
> > > pyproject.toml, but we won't control the versions of dependencies of
> > > dependencies this way. If we remove the changes to the lock file, then we
> > > won't be storing tested versions, everyone would be using slightly
> > > different versions and we may potentially need to address versioning 
> > > issues
> > > in the future - best to prevent that with a lock file.
> >
> > You didn't answer about removing the useless lines, like unneeded Windows 
> > support.
> >
> 
> Do you mean the list of files from macos and windows? I tried removing
> those from mypy and testing it and it looks like it didn't have an
> impact, but I don't know the inner workings of poetry and the lock
> file to test it properly (i.e. to rule out any breakages). What would
> be the reason for removing those? Seems like it has more downsides (we
> could potentially break something and it's extra work) than upsides
> (as this is a generated file, I don't really see any).

Yes this is what I mean.
Any other opinion?





Re: [RFC 00/27] Add VDUSE support to Vhost library

2023-04-12 Thread Maxime Coquelin

Hi Ferruh,

On 4/12/23 13:33, Ferruh Yigit wrote:

> On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
>> This series introduces a new type of backend, VDUSE,
>> to the Vhost library.
>>
>> VDUSE stands for vDPA device in Userspace, it enables
>> implementing a Virtio device in userspace and have it
>> attached to the Kernel vDPA bus.
>>
>> Once attached to the vDPA bus, it can be used either by
>> Kernel Virtio drivers, like virtio-net in our case, via
>> the virtio-vdpa driver. Doing that, the device is visible
>> to the Kernel networking stack and is exposed to userspace
>> as a regular netdev.
>>
>> It can also be exposed to userspace thanks to the
>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
>> passed to QEMU or Virtio-user PMD.
>>
>> While VDUSE support is already available in upstream
>> Kernel, a couple of patches are required to support
>> network device type:
>>
>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
>>
>> In order to attach the created VDUSE device to the vDPA
>> bus, a recent iproute2 version containing the vdpa tool is
>> required.
>
> Hi Maxime,
>
> Is this a replacement to the existing DPDK vDPA framework? What is the
> plan for long term?



No, this is not a replacement for DPDK vDPA framework.

We (Red Hat) don't have plans to support DPDK vDPA framework in our
products, but there are still contributions to DPDK vDPA by several vDPA
hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is going
to be deprecated soon.

Regards,
Maxime



RE: [PATCH v1 1/2] dts: fabric requirements

2023-04-12 Thread Honnappa Nagarahalli


> -Original Message-
> From: Thomas Monjalon 
> Sent: Wednesday, April 12, 2023 10:25 AM
> To: Juraj Linkeš 
> Cc: Wathsala Wathawana Vithanage ;
> jspew...@iol.unh.edu; pr...@iol.unh.edu; Honnappa Nagarahalli
> ; lijuan...@intel.com;
> bruce.richard...@intel.com; dev@dpdk.org
> Subject: Re: [PATCH v1 1/2] dts: fabric requirements
> 
> 12/04/2023 15:42, Juraj Linkeš:
> > On Tue, Apr 11, 2023 at 4:48 PM Thomas Monjalon 
> wrote:
> > >
> > > 04/04/2023 13:51, Juraj Linkeš:
> > > > On Mon, Apr 3, 2023 at 5:18 PM Thomas Monjalon
>  wrote:
> > > >
> > > > > 03/04/2023 16:56, Juraj Linkeš:
> > > > > > On Mon, Apr 3, 2023 at 2:33 PM Thomas Monjalon
> > > > > > 
> > > > > wrote:
> > > > > >
> > > > > > > 03/04/2023 13:46, Juraj Linkeš:
> > > > > > > > Replace pexpect with Fabric.
> > > > > > >
> > > > > > > You should squash these lines with the move to Fabric.
> > > > > > >
> > > > > > > > Signed-off-by: Juraj Linkeš 
> > > > > > > > ---
> > > > > > > >  dts/poetry.lock| 553
> > > > > +++--
> > > > > > >
> > > > > > > Do we really need *all* these lines?
> > > > > > > I see a lot of lines about Windows and MacOSX which are not
> > > > > > > supported
> > > > > in
> > > > > > > DTS.
> > > > > > > It is so long that it looks impossible to review.
> > > > > > >
> > > > > > >
> > > > > > This is a generated file and doesn't need to be reviewed.
> > > > >
> > > > > In general, I don't like storing generated files.
> > > > >
> > > >
> > > > Me neither, but this one is specifically designed to be stored in
> > > > a
> > > > repository:
> > > > https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock
> > > > -file-to-version-control
> > > >
> > > >
> > > > >
> > > > > > I separated the
> > > > > > dependencies part so that the code part is easier to review.
> > > > > > If you
> > > > > want, I
> > > > > > can squash the two commits.
> > > > >
> > > > > What happens if we manually remove the useless lines?
> > > > >
> > > > >
> > > > The lock file is there so that everyone installs exactly the same
> > > > versions of dependencies. We can specify the versions of
> > > > dependencies in pyproject.toml, but we won't control the versions
> > > > of dependencies of dependencies this way. If we remove the changes
> > > > to the lock file, then we won't be storing tested versions,
> > > > everyone would be using slightly different versions and we may
> > > > potentially need to address versioning issues in the future - best to 
> > > > prevent
> that with a lock file.
> > >
> > > You didn't answer about removing the useless lines, like unneeded Windows
> support.
> > >
> >
> > Do you mean the list of files from macos and windows? I tried removing
> > those from mypy and testing it and it looks like it didn't have an
> > impact, but I don't know the inner workings of poetry and the lock
> > file to test it properly (i.e. to rule out any breakages). What would
> > be the reason for removing those? Seems like it has more downsides (we
> > could potentially break something and it's extra work) than upsides
> > (as this is a generated file, I don't really see any).
> 
> Yes this is what I mean.
> Any other opinion?
> 
If it is a generated file, there might be an expectation from the tool that the 
file is not changed. It would be good to understand this.

Since it is a generated file, should we generate this during DTS run time 
rather than storing a generated file?

> 



Re: [PATCH 1/1] net/gve: update base code for DQO

2023-04-12 Thread Rushil Gupta
On Wed, Apr 12, 2023 at 2:41 AM Guo, Junfeng  wrote:

>
>
> > -Original Message-
> > From: Ferruh Yigit 
> > Sent: Wednesday, April 12, 2023 17:35
> > To: Guo, Junfeng ; Richardson, Bruce
> > 
> > Cc: dev@dpdk.org; Zhang, Qi Z ; Rushil Gupta
> > 
> > Subject: Re: [PATCH 1/1] net/gve: update base code for DQO
> >
> > On 4/12/2023 10:09 AM, Guo, Junfeng wrote:
> > >
> > >
> > >> -Original Message-
> > >> From: Ferruh Yigit 
> > >> Sent: Wednesday, April 12, 2023 16:50
> > >> To: Guo, Junfeng ; Richardson, Bruce
> > >> 
> > >> Cc: dev@dpdk.org; Zhang, Qi Z ; Rushil Gupta
> > >> 
> > >> Subject: Re: [PATCH 1/1] net/gve: update base code for DQO
> > >>
> > >> On 4/11/2023 7:51 AM, Guo, Junfeng wrote:
> > >>
> > >> Hi Junfeng, message moved down.
> > >>
> > >>>
> >  -Original Message-
> >  From: Rushil Gupta 
> >  Sent: Tuesday, April 11, 2023 12:59
> >  To: Zhang, Qi Z ; ferruh.yi...@amd.com
> >  Cc: Richardson, Bruce ;
> > dev@dpdk.org;
> >  Rushil Gupta ; Guo, Junfeng
> >  
> >  Subject: [PATCH 1/1] net/gve: update base code for DQO
> > 
> >  Update gve base code to support DQO.
> > 
> >  This patch is based on this:
> > 
> > https://patchwork.dpdk.org/project/dpdk/list/?series=27647&state=*
> > 
> >  Signed-off-by: Rushil Gupta 
> >  Signed-off-by: Junfeng Guo 
> > >>> Hi Ferruh & Bruce,
> > >>>
> > >>> This patch contains few lines change for the MIT licensed gve base
> > code.
> > >>> Note that there is no new files added, just some minor code update.
> > >>>
> > >>> Do we need to ask for special approval from the Tech Board for this?
> > >>> Please help give some advice and also help review this patch. Thanks!
> > >>>
> > >>
> > >> Once the MIT license exception is in place, as far as I know no more
> > >> approval is required per change.
> > >
> > > Got it, thanks for the comment!
> > >
> > > Then we may also need your help to review, as well as the coming patch
> > > set for GVE PMD enhancement for DPDK 23.07. Thanks in advance!
> > >
> > >>
> > >>> BTW, Google will also help replace all the base code under MIT
> > license
> > >>> with the ones under BSD-3 license soon, which would make things
> > more
> > >>> easier.
> > >>>
> > >>
> > >> Is this different from base code under DPDK is changing license [1] ?
> > >>
> > >>
> > >> [1]
> > >>
> > https://patches.dpdk.org/project/dpdk/list/?series=27570&state=%2A&ar
> > >> chive=both
> > >>
> > >
> > > The patch set of the above link only contains the processing of replace
> > the
> > > MIT licensed base code with the BSD-3 licensed base code. After some
> > > discussion, we think Google is in the right place to do that work. And
> > they
> > > are working on that now.
> > >
> >
> > Is the Google GVE driver [2] in the process of changing license from MIT
> > to BSD-3?
> >
> >
> > [2]
> > https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
> > linux/tree/v1.3.0/google/gve
> >
>
> I'm not sure, I don't know much about Google's plans.
> Maybe they could provide some info here. Thanks!
>
> @Rushil Gupta
>
> >
> >
> > > This patch is mainly for the feature upstreaming of DPDK 23.07. It
> > contains
> > > only the code part, following previous license statements, without any
> > > license change.
> > >
> > > This patch is separated and sent by Google, to ensure there is no
> license
> > > violation.
> > >
> > > BTW, about the feature of GVE PMD enhancement, the rest code are all
> > > about BSD-3 licensed files, and that patch set will be sent out soon.
> > >
> > > Thanks!
>
> I have got the green light internally to switch to BSD-3 license for code
under base directory. If it is ok with the tech board, I can send a patch
right away with all of the base files changed to BSD-3 which can be merged
after this patch. Please let me know what you think.
We are also about to upstream driver code:
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-freebsd to
FreeBSD as well so you will see similar code under BSD license soon in
freebsd repo.


Re: [PATCH 1/1] net/gve: update base code for DQO

2023-04-12 Thread Ferruh Yigit
On 4/12/2023 4:42 PM, Rushil Gupta wrote:
> 
> 
> On Wed, Apr 12, 2023 at 2:41 AM Guo, Junfeng  wrote:
> 
> 
> 
> > -Original Message-
> > From: Ferruh Yigit 
> > Sent: Wednesday, April 12, 2023 17:35
> > To: Guo, Junfeng ; Richardson, Bruce 
> > Cc: dev@dpdk.org; Zhang, Qi Z ; Rushil Gupta 
> > Subject: Re: [PATCH 1/1] net/gve: update base code for DQO
> >
> > On 4/12/2023 10:09 AM, Guo, Junfeng wrote:
> > >
> > >
> > >> -Original Message-
> > >> From: Ferruh Yigit 
> > >> Sent: Wednesday, April 12, 2023 16:50
> > >> To: Guo, Junfeng ; Richardson, Bruce 
> > >> Cc: dev@dpdk.org; Zhang, Qi Z ; Rushil Gupta 
> > >> Subject: Re: [PATCH 1/1] net/gve: update base code for DQO
> > >>
> > >> On 4/11/2023 7:51 AM, Guo, Junfeng wrote:
> > >>
> > >> Hi Junfeng, message moved down.
> > >>
> > >>>
> >  -Original Message-
> >  From: Rushil Gupta 
> >  Sent: Tuesday, April 11, 2023 12:59
> >  To: Zhang, Qi Z ; ferruh.yi...@amd.com
> >  Cc: Richardson, Bruce ; dev@dpdk.org;
> >  Rushil Gupta ; Guo, Junfeng 
> >  Subject: [PATCH 1/1] net/gve: update base code for DQO
> > 
> >  Update gve base code to support DQO.
> > 
> >  This patch is based on this:
> > 
> > https://patchwork.dpdk.org/project/dpdk/list/?series=27647&state=*
> 
> > 
> >  Signed-off-by: Rushil Gupta 
> >  Signed-off-by: Junfeng Guo 
> > >>> Hi Ferruh & Bruce,
> > >>>
> > >>> This patch contains few lines change for the MIT licensed gve base
> > code.
> > >>> Note that there is no new files added, just some minor code
> update.
> > >>>
> > >>> Do we need to ask for special approval from the Tech Board for
> this?
> > >>> Please help give some advice and also help review this patch.
> Thanks!
> > >>>
> > >>
> > >> Once the MIT license exception is in place, as far as I know no
> more
> > >> approval is required per change.
> > >
> > > > Got it, thanks for the comment!
> > >
> > > Then we may also need your help to review, as well as the coming
> patch
> > > set for GVE PMD enhancement for DPDK 23.07. Thanks in advance!
> > >
> > >>
> > >>> BTW, Google will also help replace all the base code under MIT
> > license
> > >>> with the ones under BSD-3 license soon, which would make things
> > more
> > >>> easier.
> > >>>
> > >>
> > >> Is this different from base code under DPDK is changing license
> [1] ?
> > >>
> > >>
> > >> [1]
> > >>
> > >> https://patches.dpdk.org/project/dpdk/list/?series=27570&state=%2A&archive=both
> > >>
> > >
> > > The patch set of the above link only contains the processing of
> replace
> > the
> > > MIT licensed base code with the BSD-3 licensed base code. After some
> > > discussion, we think Google is in the right place to do that
> work. And
> > they
> > > are working on that now.
> > >
> >
> > Is the Google GVE driver [2] in the process of changing license
> from MIT
> > to BSD-3?
> >
> >
> > [2]
> > https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
> > linux/tree/v1.3.0/google/gve
> >
> 
> I'm not sure, I don't know much about Google's plans.
> Maybe they could provide some info here. Thanks!
> 
> @Rushil Gupta
> 
> >
> >
> > > This patch is mainly for the feature upstreaming of DPDK 23.07. It
> > contains
> > > only the code part, following previous license statements,
> without any
> > > license change.
> > >
> > > This patch is separated and sent by Google, to ensure there is
> no license
> > > violation.
> > >
> > > BTW, about the feature of GVE PMD enhancement, the rest code are all
> > > about BSD-3

[PATCH v2] common/mlx5: enable operation in iova virtual address mode

2023-04-12 Thread Viacheslav Ovsiienko
The ConnectX NIC series hardware provides an advanced internal
MMU option and can operate directly over virtual addresses;
the host software should not care about any virtual-to-physical
address translations. It means the mlx5 PMDs can operate in DPDK
IOVA VA (virtual address) mode transparently.

To force IOVA VA mode the DPDK should be built with meson option:

  enable_iova_as_pa=false

With this option only drivers supporting IOVA VA mode are enabled.
This patch marks the mlx5 drivers with a require_iova_in_mbuf flag
value of false, thus allowing their compilation for IOVA VA mode.

Signed-off-by: Viacheslav Ovsiienko 

---
v1: 
http://patches.dpdk.org/project/dpdk/patch/20230412125320.8585-1-viachesl...@nvidia.com/
v2: fixed typos in commit message, reworded a little bit
---
 drivers/common/mlx5/meson.build   | 2 ++
 drivers/compress/mlx5/meson.build | 2 ++
 drivers/crypto/mlx5/meson.build   | 2 ++
 drivers/net/mlx5/meson.build  | 2 ++
 drivers/regex/mlx5/meson.build| 2 ++
 5 files changed, 10 insertions(+)

diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 9dc809f192..26c6d80fe1 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -42,6 +42,8 @@ else
 cflags += [ '-UPEDANTIC' ]
 endif
 
+require_iova_in_mbuf = false
+
 mlx5_config = configuration_data()
 subdir(exec_env)
 configure_file(output: 'mlx5_autoconf.h', configuration: mlx5_config)
diff --git a/drivers/compress/mlx5/meson.build 
b/drivers/compress/mlx5/meson.build
index c906f2d7a2..26a0e0cd09 100644
--- a/drivers/compress/mlx5/meson.build
+++ b/drivers/compress/mlx5/meson.build
@@ -28,3 +28,5 @@ foreach option:cflags_options
 cflags += option
 endif
 endforeach
+
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/mlx5/meson.build b/drivers/crypto/mlx5/meson.build
index a830a4c7b9..a2691ec0f0 100644
--- a/drivers/crypto/mlx5/meson.build
+++ b/drivers/crypto/mlx5/meson.build
@@ -30,3 +30,5 @@ foreach option:cflags_options
 cflags += option
 endif
 endforeach
+
+require_iova_in_mbuf = false
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index dba911693e..fde7241f7e 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -77,6 +77,8 @@ else
 cflags += [ '-UPEDANTIC' ]
 endif
 
+require_iova_in_mbuf = false
+
 testpmd_sources += files('mlx5_testpmd.c')
 
 subdir(exec_env)
diff --git a/drivers/regex/mlx5/meson.build b/drivers/regex/mlx5/meson.build
index 87404101b9..0f4ca46f44 100644
--- a/drivers/regex/mlx5/meson.build
+++ b/drivers/regex/mlx5/meson.build
@@ -32,3 +32,5 @@ foreach option:cflags_options
 cflags += option
 endif
 endforeach
+
+require_iova_in_mbuf = false
-- 
2.18.1



[PATCH] eal: choose IOVA mode according to compilation flags

2023-04-12 Thread Viacheslav Ovsiienko
DPDK can be compiled to run in IOVA VA mode with the
'enable_iova_as_pa=false' meson option. If there is no
explicit EAL --iova-mode parameter specified on the command
line, rte_eal_init() tries to deduce VA or PA mode without
taking into account the above-mentioned compile-time option,
resulting in initialization failure.

Signed-off-by: Viacheslav Ovsiienko 
---
 lib/eal/linux/eal.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index c37868b7f0..4481bc4ad8 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1080,7 +1080,10 @@ rte_eal_init(int argc, char **argv)
if (iova_mode == RTE_IOVA_DC) {
RTE_LOG(DEBUG, EAL, "Buses did not request a specific 
IOVA mode.\n");
 
-   if (!phys_addrs) {
+   if (!RTE_IOVA_IN_MBUF) {
+   iova_mode = RTE_IOVA_VA;
+   RTE_LOG(DEBUG, EAL, "IOVA VA mode is forced by 
build option.\n");
+   } else if (!phys_addrs) {
/* if we have no access to physical addresses,
 * pick IOVA as VA mode.
 */
-- 
2.18.1
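
For completeness, a small sketch (not part of the patch) of how an
application can verify the mode EAL ended up selecting; with an
enable_iova_as_pa=false build this should now report VA mode even
without an explicit --iova-mode=va argument:

#include <rte_eal.h>
#include <rte_log.h>

static void
log_iova_mode(void)
{
	enum rte_iova_mode mode = rte_eal_iova_mode();

	RTE_LOG(INFO, USER1, "selected IOVA mode: %s\n",
		mode == RTE_IOVA_VA ? "VA" :
		mode == RTE_IOVA_PA ? "PA" : "DC");
}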



Re: [dpdk-dev][dpdk-users] A problem about memory may not be all-zero allocated by rte_zmalloc_socket()

2023-04-12 Thread Stephen Hemminger
On Wed, 23 Feb 2022 15:38:09 +
Honnappa Nagarahalli  wrote:

> I have a question: does the DPDK code ensure that memory is
> initialized to 0?
> [Ruifeng] Clearing of the memory should be done by the kernel. In section 
> 3.1.4.6 of Programmer's Guide, it says: "
> Hugepages are cleared by the kernel when a file in hugetlbfs or its part is 
> mapped for the first time system-wide to prevent data leaks from previous 
> users of the same hugepage".
> http://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html#memory-mapping-discovery-and-memory-reservation
> [Yunjian] Thanks. However, hugepages are not cleared by the kernel (version 
> 4.19.90) on the ARM platform.
> [Honnappa] I think that is besides the point we are discussing. 
> rte_zmalloc_socket should be able to zero the memory every time it is called 
> (not just the first time).
> 
> I see that rte_zmalloc_socket explicitly clears the memory using memset when 
> the RTE_MALLOC_DEBUG is enabled. Have you tested with RTE_MALLOC_DEBUG 
> enabled?
> 
> 
> Thanks,
> Yunjian

Normally.
  - hugepage memory is zero'd by kernel when mapped in.  DPDK assumes this 
because the overhead
of zeroing large amounts of memory can impact application startup time.
If kernel is not zeroing, then your kernel is buggy.
  - when memory is freed by rte_free() it is set to zero before returning to 
the pool.
  - when malloc gets memory it will be zero'd

RTE_MALLOC_DEBUG changes this so that:
  - when memory is freed it gets overwritten by a poison value
  - when malloc gets memory it will zero it.

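A minimal sketch of the check under discussion, allocating with
rte_zmalloc_socket() and verifying the buffer really comes back zeroed
(illustrative only):

#include <stddef.h>
#include <stdint.h>

#include <rte_malloc.h>

static int
check_zeroed(size_t len, int socket)
{
	uint8_t *buf = rte_zmalloc_socket("probe", len, 0, socket);
	size_t i;

	if (buf == NULL)
		return -1;
	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			break;		/* found a non-zero byte */
	rte_free(buf);
	return i == len ? 0 : -1;
}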


Re: [PATCH 1/1] net/gve: update base code for DQO

2023-04-12 Thread Rushil Gupta
Sorry for the confusion. I was talking about the same patch (titled
net/gve: update copyright holders); however, I am not able to find it on
patchwork.


On Wed, Apr 12, 2023 at 9:03 AM Ferruh Yigit  wrote:

> On 4/12/2023 4:42 PM, Rushil Gupta wrote:
> >
> >
> > On Wed, Apr 12, 2023 at 2:41 AM Guo, Junfeng  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Ferruh Yigit 
> > > Sent: Wednesday, April 12, 2023 17:35
> > > To: Guo, Junfeng ; Richardson, Bruce 
> > > Cc: dev@dpdk.org; Zhang, Qi Z ; Rushil Gupta 
> > > Subject: Re: [PATCH 1/1] net/gve: update base code for DQO
> > >
> > > On 4/12/2023 10:09 AM, Guo, Junfeng wrote:
> > > >
> > > >
> > > >> -Original Message-
> > > >> From: Ferruh Yigit 
> > > >> Sent: Wednesday, April 12, 2023 16:50
> > > >> To: Guo, Junfeng ; Richardson, Bruce 
> > > >> Cc: dev@dpdk.org; Zhang, Qi Z ; Rushil Gupta 
> > > >> Subject: Re: [PATCH 1/1] net/gve: update base code for DQO
> > > >>
> > > >> On 4/11/2023 7:51 AM, Guo, Junfeng wrote:
> > > >>
> > > >> Hi Junfeng, message moved down.
> > > >>
> > > >>>
> > >  -Original Message-
> > >  From: Rushil Gupta 
> > >  Sent: Tuesday, April 11, 2023 12:59
> > >  To: Zhang, Qi Z ; ferruh.yi...@amd.com
> > >  Cc: Richardson, Bruce ; dev@dpdk.org;
> > >  Rushil Gupta ; Guo, Junfeng 
> > >  Subject: [PATCH 1/1] net/gve: update base code for DQO
> > > 
> > >  Update gve base code to support DQO.
> > > 
> > >  This patch is based on this:
> > > 
> > > https://patchwork.dpdk.org/project/dpdk/list/?series=27647&state=*
> > 
> > > 
> > >  Signed-off-by: Rushil Gupta 
> > >  Signed-off-by: Junfeng Guo 
> > > >>> Hi Ferruh & Bruce,
> > > >>>
> > > >>> This patch contains few lines change for the MIT licensed gve
> base
> > > code.
> > > >>> Note that there is no new files added, just some minor code
> > update.
> > > >>>
> > > >>> Do we need to ask for special approval from the Tech Board for
> > this?
> > > >>> Please help give some advice and also help review this patch.
> > Thanks!
> > > >>>
> > > >>
> > > >> Once the MIT license exception is in place, as far as I know no
> > more
> > > >> approval is required per change.
> > > >
> > > > Got it, thanks for the comment!
> > > >
> > > > Then we may also need your help to review, as well as the coming
> > patch
> > > > set for GVE PMD enhancement for DPDK 23.07. Thanks in advance!
> > > >
> > > >>
> > > >>> BTW, Google will also help replace all the base code under MIT
> > > license
> > > >>> with the ones under BSD-3 license soon, which would make things
> > > more
> > > >>> easier.
> > > >>>
> > > >>
> > > >> Is this different from base code under DPDK is changing license
> > [1] ?
> > > >>
> > > >>
> > > >> [1]
> > > >>
> > >
> > > >> https://patches.dpdk.org/project/dpdk/list/?series=27570&state=%2A&archive=both
> > > >>
> > > >
> > > > The patch set of the above link only contains the processing of
> > replace
> > > the
> > > > MIT licensed base code with the BSD-3 licensed base code. After
> some
> > > > discussion, we think Google is in the right place to do that
> > work. And
> > > they
> > > > are working on that now.
> > > >
> > >
> > > Is the Google GVE driver [2] in the process of changing license
> > from MIT
> > > to BSD-3?
> > >
> > >
> > > [2]
> > > https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
> > 
> > > linux/tree/v1.3.0/google/gve
> > >
> >
> > I'm not sure, I don't know much about Google's plans.
> > Maybe they could provide some info here. Thanks!
> >
> >

[PATCH v4] app/testpmd: txonly multiflow port change support

2023-04-12 Thread Joshua Washington
Google Cloud routes traffic using IP addresses without the support of MAC
addresses, so changing the source IP address for txonly-multi-flow can have
negative performance implications for net/gve when using testpmd. This
patch updates txonly multi-flow mode to modify source UDP ports instead of
source IP addresses.

The change can be tested with the following command:
dpdk-testpmd -- --forward-mode=txonly --txonly-multi-flow \
--txip=,

Signed-off-by: Joshua Washington 
Reviewed-by: Rushil Gupta 
---
 app/test-pmd/txonly.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index b3d6873104..7fc743b508 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -56,7 +56,7 @@ uint32_t tx_ip_dst_addr = (198U << 24) | (18 << 16) | (0 << 
8) | 2;
 #define IP_DEFTTL  64   /* from RFC 1340. */
 
 static struct rte_ipv4_hdr pkt_ip_hdr; /**< IP header of transmitted packets. 
*/
-RTE_DEFINE_PER_LCORE(uint8_t, _ip_var); /**< IP address variation */
+RTE_DEFINE_PER_LCORE(uint16_t, _src_var); /**< Source port variation */
 static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. */
 
 static uint64_t timestamp_mask; /**< Timestamp dynamic flag mask */
@@ -230,28 +230,30 @@ pkt_burst_prepare(struct rte_mbuf *pkt, struct 
rte_mempool *mbp,
copy_buf_to_pkt(eth_hdr, sizeof(*eth_hdr), pkt, 0);
copy_buf_to_pkt(&pkt_ip_hdr, sizeof(pkt_ip_hdr), pkt,
sizeof(struct rte_ether_hdr));
+   copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
+   sizeof(struct rte_ether_hdr) +
+   sizeof(struct rte_ipv4_hdr));
if (txonly_multi_flow) {
-   uint8_t  ip_var = RTE_PER_LCORE(_ip_var);
-   struct rte_ipv4_hdr *ip_hdr;
-   uint32_t addr;
+   uint16_t src_var = RTE_PER_LCORE(_src_var);
+   struct rte_udp_hdr *udp_hdr;
+   uint16_t port;
 
-   ip_hdr = rte_pktmbuf_mtod_offset(pkt,
-   struct rte_ipv4_hdr *,
-   sizeof(struct rte_ether_hdr));
+   udp_hdr = rte_pktmbuf_mtod_offset(pkt,
+   struct rte_udp_hdr *,
+   sizeof(struct rte_ether_hdr) +
+   sizeof(struct rte_ipv4_hdr));
/*
-* Generate multiple flows by varying IP src addr. This
-* enables packets are well distributed by RSS in
+* Generate multiple flows by varying UDP source port.
+* This enables packets are well distributed by RSS in
 * receiver side if any and txonly mode can be a decent
 * packet generator for developer's quick performance
 * regression test.
 */
-   addr = (tx_ip_dst_addr | (ip_var++ << 8)) + rte_lcore_id();
-   ip_hdr->src_addr = rte_cpu_to_be_32(addr);
-   RTE_PER_LCORE(_ip_var) = ip_var;
+
+   port = src_var++;
+   udp_hdr->src_port = rte_cpu_to_be_16(port);
+   RTE_PER_LCORE(_src_var) = src_var;
}
-   copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
-   sizeof(struct rte_ether_hdr) +
-   sizeof(struct rte_ipv4_hdr));
 
if (unlikely(tx_pkt_split == TX_PKT_SPLIT_RND) || txonly_multi_flow)
update_pkt_header(pkt, pkt_len);
@@ -393,7 +395,7 @@ pkt_burst_transmit(struct fwd_stream *fs)
nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_pkt);
 
if (txonly_multi_flow)
-   RTE_PER_LCORE(_ip_var) -= nb_pkt - nb_tx;
+   RTE_PER_LCORE(_src_var) -= nb_pkt - nb_tx;
 
if (unlikely(nb_tx < nb_pkt)) {
if (verbose_level > 0 && fs->fwd_dropped == 0)
-- 
2.40.0.577.gac1e443424-goog
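
One usage note (an assumption about the receiver, not part of the patch):
varying the UDP source port only spreads load if the receiving port hashes
on L4 fields, along the lines of:

	struct rte_eth_conf conf = {0};

	conf.rxmode.mq_mode = RTE_ETH_MQ_RX_RSS;
	conf.rx_adv_conf.rss_conf.rss_hf = RTE_ETH_RSS_IP | RTE_ETH_RSS_UDP;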



RE: [PATCH] eal: choose IOVA mode according to compilation flags

2023-04-12 Thread Morten Brørup
> From: Viacheslav Ovsiienko [mailto:viachesl...@nvidia.com]
> Sent: Wednesday, 12 April 2023 19.20
> 
> DPDK can be compiled to run in IOVA VA mode with the
> 'enable_iova_as_pa=false' meson option. If there is no
> explicit EAL --iova-mode parameter specified on the command
> line, rte_eal_init() tries to deduce VA or PA mode without
> taking the above-mentioned compile-time option into account,
> resulting in initialization failure.
> 
> Signed-off-by: Viacheslav Ovsiienko 
> ---
>  lib/eal/linux/eal.c | 5 -

This patch is close to being a bugfix. Good catch!

You could also consider another patch, logging warnings in 
rte_bus_get_iommu_class() [1] for the busses that want IOVA as PA when 
!RTE_IOVA_IN_MBUF.

[1]: 
https://elixir.bootlin.com/dpdk/v23.03/source/lib/eal/common/eal_common_bus.c#L224
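
A minimal sketch of such a warning (the helper name and its placement inside
rte_bus_get_iommu_class() are assumptions; RTE_LOG() and rte_bus_name() are
existing APIs):

	static void
	warn_bus_wants_pa(const struct rte_bus *bus, enum rte_iova_mode bus_mode)
	{
		if (!RTE_IOVA_IN_MBUF && bus_mode == RTE_IOVA_PA)
			RTE_LOG(WARNING, EAL,
				"Bus %s requests IOVA as PA, but mbuf has no IOVA field (enable_iova_as_pa=false)\n",
				rte_bus_name(bus));
	}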

>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> index c37868b7f0..4481bc4ad8 100644
> --- a/lib/eal/linux/eal.c
> +++ b/lib/eal/linux/eal.c
> @@ -1080,7 +1080,10 @@ rte_eal_init(int argc, char **argv)
>   if (iova_mode == RTE_IOVA_DC) {
>   RTE_LOG(DEBUG, EAL, "Buses did not request a specific
> IOVA mode.\n");
> 
> - if (!phys_addrs) {
> + if (!RTE_IOVA_IN_MBUF) {
> + iova_mode = RTE_IOVA_VA;
> + RTE_LOG(DEBUG, EAL, "IOVA VA mode is forced by
> build option.\n");

Minor detail regarding conventions: "IOVA VA " -> "IOVA as VA"

> + } else if (!phys_addrs) {
>   /* if we have no access to physical addresses,
>* pick IOVA as VA mode.
>*/
> --
> 2.18.1

Reviewed-by: Morten Brørup 



RE: [PATCH v2] common/mlx5: enable operation in iova virtual address mode

2023-04-12 Thread Morten Brørup
> From: Viacheslav Ovsiienko [mailto:viachesl...@nvidia.com]
> Sent: Wednesday, 12 April 2023 19.07
> 
> The ConnectX NIC series hardware provides an advanced internal
> MMU option and can operate directly on virtual addresses;
> the host software does not need to care about virtual-to-physical
> address translations. This means the mlx5 PMDs can operate in DPDK
> IOVA VA (virtual address) mode transparently.
> 
> To force IOVA VA mode the DPDK should be built with meson option:
> 
>   enable_iova_as_pa=false
> 
> With this option only drivers supporting IOVA VA mode are enabled.
> This patch marks mlx5 drivers with require_iova_in_mbuf flag false
> value, thus allowing their compilation for IOVA VA mode.
> 
> Signed-off-by: Viacheslav Ovsiienko 

Acked-by: Morten Brørup 



RE: [RFC 00/27] Add VDUSE support to Vhost library

2023-04-12 Thread Morten Brørup
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Wednesday, 12 April 2023 17.28
> 
> Hi Ferruh,
> 
> On 4/12/23 13:33, Ferruh Yigit wrote:
> > On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
> >> This series introduces a new type of backend, VDUSE,
> >> to the Vhost library.
> >>
> >> VDUSE stands for vDPA device in Userspace, it enables
> >> implementing a Virtio device in userspace and have it
> >> attached to the Kernel vDPA bus.
> >>
> >> Once attached to the vDPA bus, it can be used either by
> >> Kernel Virtio drivers, like virtio-net in our case, via
> >> the virtio-vdpa driver. Doing that, the device is visible
> >> to the Kernel networking stack and is exposed to userspace
> >> as a regular netdev.
> >>
> >> It can also be exposed to userspace thanks to the
> >> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> >> passed to QEMU or Virtio-user PMD.
> >>
> >> While VDUSE support is already available in upstream
> >> Kernel, a couple of patches are required to support
> >> network device type:
> >>
> >> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
> >>
> >> In order to attach the created VDUSE device to the vDPA
> >> bus, a recent iproute2 version containing the vdpa tool is
> >> required.
> >
> > Hi Maxime,
> >
> > Is this a replacement to the existing DPDK vDPA framework? What is the
> > plan for long term?
> >
> 
> No, this is not a replacement for the DPDK vDPA framework.
> 
> We (Red Hat) don't have plans to support the DPDK vDPA framework in our
> products, but there are still contributions to DPDK vDPA by several vDPA
> hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is going
> to be deprecated soon.

Ferruh's question made me curious...

I don't know anything about VDUSE or vDPA, and don't use any of it, so consider 
me ignorant in this area.

Is VDUSE an alternative to the existing DPDK vDPA framework? What are the 
differences, e.g. in which cases would an application developer (or user) 
choose one or the other?

And if it is a better alternative, perhaps the documentation should mention 
that it is recommended over DPDK vDPA. Just like we started recommending 
alternatives to the KNI driver, so we could phase it out and eventually get rid 
of it.

> 
> Regards,
> Maxime



[PATCH] eventdev/timer: move buffer flush call

2023-04-12 Thread Erik Gabriel Carrillo
The SW event timer adapter attempts to flush its event buffer on every
adapter tick. If events remain in the buffer after the attempt, another
attempt to flush won't occur until the next adapter tick, which delays
the enqueue of those events to the event device unnecessarily.

Move the buffer flush call so that it happens with every invocation of
the service function, rather than on every adapter tick, to avoid the
delay.

Fixes: cc7b73ea9e3b ("eventdev: add new software timer adapter")
Cc: sta...@dpdk.org

Signed-off-by: Erik Gabriel Carrillo 
---
 lib/eventdev/rte_event_timer_adapter.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/lib/eventdev/rte_event_timer_adapter.c 
b/lib/eventdev/rte_event_timer_adapter.c
index 23eb1d4a7d..427c4c6287 100644
--- a/lib/eventdev/rte_event_timer_adapter.c
+++ b/lib/eventdev/rte_event_timer_adapter.c
@@ -855,17 +855,18 @@ swtim_service_func(void *arg)
 sw->n_expired_timers);
sw->n_expired_timers = 0;
 
-   event_buffer_flush(&sw->buffer,
-  adapter->data->event_dev_id,
-  adapter->data->event_port_id,
-  &nb_evs_flushed,
-  &nb_evs_invalid);
-
-   sw->stats.ev_enq_count += nb_evs_flushed;
-   sw->stats.ev_inv_count += nb_evs_invalid;
sw->stats.adapter_tick_count++;
}
 
+   event_buffer_flush(&sw->buffer,
+  adapter->data->event_dev_id,
+  adapter->data->event_port_id,
+  &nb_evs_flushed,
+  &nb_evs_invalid);
+
+   sw->stats.ev_enq_count += nb_evs_flushed;
+   sw->stats.ev_inv_count += nb_evs_invalid;
+
rte_event_maintain(adapter->data->event_dev_id,
   adapter->data->event_port_id, 0);
 
-- 
2.23.0



RE: [PATCH v2] eventdev/timer: fix timeout event wait behavior

2023-04-12 Thread Carrillo, Erik G
> -Original Message-
> From: Shijith Thotton 
> Sent: Tuesday, March 21, 2023 12:20 AM
> To: Carrillo, Erik G ; jer...@marvell.com
> Cc: Shijith Thotton ; dev@dpdk.org;
> pbhagavat...@marvell.com; sta...@dpdk.org
> Subject: [PATCH v2] eventdev/timer: fix timeout event wait behavior
> 
> Improved the accuracy and consistency of timeout event wait behavior by
> refactoring it. Previously, the delay function used for waiting could be
> inaccurate, leading to inconsistent results. This commit updates the wait
> behavior to use a timeout-based approach, enabling the wait for the exact
> number of timer ticks before proceeding.
> 
> The new function timeout_event_dequeue mimics the behavior of the
> tested systems closely. It dequeues timer expiry events until either the
> expected number of events have been dequeued or the specified time has
> elapsed. The WAIT_TICKS macro defines the waiting behavior based on the
> type of timer being used (software or hardware).
> 
> Fixes: d1f3385d0076 ("test: add event timer adapter auto-test")
> 
> Signed-off-by: Shijith Thotton 
Thanks for the update.

Acked-by: Erik Gabriel Carrillo 


RE: [PATCH] doc: fix event timer adapter guide

2023-04-12 Thread Carrillo, Erik G
> -Original Message-
> From: pbhagavat...@marvell.com 
> Sent: Friday, April 7, 2023 3:14 AM
> To: jer...@marvell.com; Carrillo, Erik G 
> Cc: dev@dpdk.org; Pavan Nikhilesh 
> Subject: [PATCH] doc: fix event timer adapter guide
> 
> From: Pavan Nikhilesh 
> 
> Remove incorrect spec definition from the programmer's guide; it is the
> application's responsibility to set ev.event_ptr to a valid value.
> 
> Fixes: 30e7fbd62839 ("doc: add event timer adapter guide")
> 
> Signed-off-by: Pavan Nikhilesh 
Acked-by: Erik Gabriel Carrillo 


RE: [PATCH v5] enhance NUMA affinity heuristic

2023-04-12 Thread You, KaisenX



> -Original Message-
> From: You, KaisenX
> Sent: March 9, 2023 9:58
> To: Thomas Monjalon 
> Cc: dev@dpdk.org; Zhou, YidingX ;
> david.march...@redhat.com; Matz, Olivier ;
> ferruh.yi...@amd.com; zhou...@loongson.cn; sta...@dpdk.org;
> Richardson, Bruce ; jer...@marvell.com;
> Burakov, Anatoly 
> Subject: RE: [PATCH v5] enhance NUMA affinity heuristic
> 
> 
> 
> > -Original Message-
> > From: Thomas Monjalon 
> > Sent: March 3, 2023 22:07
> > To: Burakov, Anatoly ; You, KaisenX
> > 
> > Cc: dev@dpdk.org; Zhou, YidingX ;
> > david.march...@redhat.com; Matz, Olivier ;
> > ferruh.yi...@amd.com; zhou...@loongson.cn; sta...@dpdk.org;
> > Richardson, Bruce ; jer...@marvell.com
> > Subject: Re: [PATCH v5] enhance NUMA affinity heuristic
> >
> > I'm not comfortable with this patch.
> >
> > First, there is no comment in the code which helps to understand the logic.
> > Second, I'm afraid changing the value of the per-core variable
> > _socket_id may have an impact on some applications.
> >
Hi Thomas, I'm sorry to bother you again, but we can't think of a better
solution for now. Would you please give me some suggestions? I will then
modify it accordingly.

> Thank you for your reply.
> First, about the comments: I can submit a new patch that adds comments to
> help understand the logic.
> Second, if the value of the per-core variable _socket_id is not changed, in
> /lib/eal/common/malloc_heap.c:
> malloc_get_numa_socket(void)
> {
> 	const struct internal_config *conf = eal_get_internal_configuration();
> 	unsigned int socket_id = rte_socket_id();  /* returns 1 here */
> 	unsigned int idx;
> 
> 	if (socket_id != (unsigned int)SOCKET_ID_ANY)
> 		return socket_id;  /* so it returns here */
> 
> the function returns early at this point, with the socket_id of a socket
> that has no allocated memory.
> 
> If you have a better solution, I can modify it.
> > 16/02/2023 03:50, You, KaisenX:
> > > From: Burakov, Anatoly 
> > > > On 2/1/2023 12:20 PM, Kaisen You wrote:
> > > > > Trying to allocate memory on the first detected numa node has
> > > > > less chance to find some memory actually available rather than
> > > > > on the main lcore numa node (especially when the DPDK
> > > > > application is started only on one numa node).
> > > > >
> > > > > Fixes: 705356f0811f ("eal: simplify control thread creation")
> > > > > Fixes: bb0bd346d5c1 ("eal: suggest using --lcores option")
> > > > > Cc: sta...@dpdk.org
> > > > >
> > > > > Signed-off-by: David Marchand 
> > > > > Signed-off-by: Kaisen You 
> > > > > ---
> > > > > Changes since v4:
> > > > > - mod the patch title,
> > > > >
> > > > > Changes since v3:
> > > > > - add the assignment of socket_id in thread initialization,
> > > > >
> > > > > Changes since v2:
> > > > > - add uncommitted local change and fix compilation,
> > > > >
> > > > > Changes since v1:
> > > > > - accommodate configurations with the main lcore running on multiple
> > > > >    physical cores belonging to different numa,
> > > > > ---
> > > > >   lib/eal/common/eal_common_thread.c | 1 +
> > > > >   lib/eal/common/malloc_heap.c   | 4 
> > > > >   2 files changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/lib/eal/common/eal_common_thread.c
> > > > > b/lib/eal/common/eal_common_thread.c
> > > > > index 38d83a6885..21bff971f8 100644
> > > > > --- a/lib/eal/common/eal_common_thread.c
> > > > > +++ b/lib/eal/common/eal_common_thread.c
> > > > > @@ -251,6 +251,7 @@ static void *ctrl_thread_init(void *arg)
> > > > >   void *routine_arg = params->arg;
> > > > >
> > > > >   __rte_thread_init(rte_lcore_id(), cpuset);
> > > > > + RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
> > > > >   params->ret =
> > > > > rte_thread_set_affinity_by_id(rte_thread_self(),
> > > > cpuset);
> > > > >   if (params->ret != 0) {
> > > > >   __atomic_store_n(¶ms->ctrl_thread_status,
> > > > > diff --git a/lib/eal/common/malloc_heap.c
> > > > > b/lib/eal/common/malloc_heap.c index d7c410b786..3ee19aee15
> > 100644
> > > > > --- a/lib/eal/common/malloc_heap.c
> > > > > +++ b/lib/eal/common/malloc_heap.c
> > > > > @@ -717,6 +717,10 @@ malloc_get_numa_socket(void)
> > > > >   return socket_id;
> > > > >   }
> > > > >
> > > > > + socket_id = rte_lcore_to_socket_id(rte_get_main_lcore());
> > > > > + if (socket_id != (unsigned int)SOCKET_ID_ANY)
> > > > > + return socket_id;
> > > > > +
> > > > >   return rte_socket_id_by_idx(0);
> > > > >   }
> > > > >
> > > >
> > > > I may be lacking context, but I don't quite get the suggested change.
> > > >  From what I understand, the original has to do with assigning
> > > > lcore cpusets in such a way that an lcore ends up having two
> > > > socket ID's (because it's been assigned to CPU's on different
> > > > sockets). Why is this
> > allowed in the first place?
> > > > It seems like a user error to me, as it breaks many of the
> > > > fundamental assumptions D

[PATCH v2] net/mlx5: fix lro update tcp header cksum error

2023-04-12 Thread jiangheng (G)
The variable csum is the sum of three 16-bit integers, so its max value
is 0x2FFFD. The corner case where the sum is 0x1FFFF gives the wrong
result after a single fold: 0x1 + 0xFFFF = 0x10000, so the upper 16 bits
are not 0. It must be folded again to ensure that the upper 16 bits are 0.

Fixes: e4c2a16eb1de ("net/mlx5: handle LRO packets in Rx queue")
Cc: sta...@dpdk.org

Signed-off-by: jiangheng 
---
 drivers/net/mlx5/mlx5_rx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index a2be523e9e..ae537dfffa 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -1090,6 +1090,7 @@ mlx5_lro_update_tcp_hdr(struct rte_tcp_hdr *__rte_restrict tcp,
 	tcp->cksum = 0;
 	csum += rte_raw_cksum(tcp, (tcp->data_off >> 4) * 4);
 	csum = ((csum & 0xffff0000) >> 16) + (csum & 0xffff);
+	csum = ((csum & 0xffff0000) >> 16) + (csum & 0xffff);
 	csum = (~csum) & 0xffff;
 	if (csum == 0)
 		csum = 0xffff;
-- 
2.27.0

> Hi,  Jiangheng
>
> You are right, the corner case where the sum is 0x1FFFF gives the wrong result.
> Could you, please, format the patch according to the rules and send v2?
> - add Fixes: tag with reference to appropriate commit
> - add Cc: sta...@dpdk.org
> - fix typos in commit message - capitalize sentences, add trailing points, 
> etc.
>
> With best regards,
> Slava
>
> > From: jiangheng (G) 
> > Sent: Wednesday, April 12, 2023 14:39
> > To: dev@dpdk.org; Matan Azrad ; Slava Ovsiienko 
> > 
> > Subject: [PATCH] net/mlx5: fix lro update tcp header cksum error
> > 
> > csum is the sum of three 16-bit values; it must be folded twice to
> > ensure that the upper 16 bits are 0.
> > ---
> >  drivers/net/mlx5/mlx5_rx.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c 
> > index a2be523e9e..ae537dfffa 100644
> > --- a/drivers/net/mlx5/mlx5_rx.c
> > +++ b/drivers/net/mlx5/mlx5_rx.c
> > @@ -1090,6 +1090,7 @@ mlx5_lro_update_tcp_hdr(struct rte_tcp_hdr *__rte_restrict tcp,
> >  	tcp->cksum = 0;
> >  	csum += rte_raw_cksum(tcp, (tcp->data_off >> 4) * 4);
> >  	csum = ((csum & 0xffff0000) >> 16) + (csum & 0xffff);
> > +	csum = ((csum & 0xffff0000) >> 16) + (csum & 0xffff);
> >  	csum = (~csum) & 0xffff;
> >  	if (csum == 0)
> >  		csum = 0xffff;
> > --
> > 2.27.0
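
For illustration, a tiny worked example of the corner case from the commit
message, traced through both folds (plain C, separate from the patch):

	uint32_t csum = 0xffff + 0xffff + 0x0001;             /* 0x1ffff */

	csum = ((csum & 0xffff0000) >> 16) + (csum & 0xffff); /* 0x1 + 0xffff = 0x10000 */
	csum = ((csum & 0xffff0000) >> 16) + (csum & 0xffff); /* 0x1 + 0x0000 = 0x0001 */
	csum = (~csum) & 0xffff;                              /* final checksum 0xfffe */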


[Bug 1215] Hotplug sigbus handler is using signal unsafe calls

2023-04-12 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1215

Bug ID: 1215
   Summary: Hotplug sigbus handler is using signal unsafe calls
   Product: DPDK
   Version: unspecified
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: minor
  Priority: Normal
 Component: core
  Assignee: dev@dpdk.org
  Reporter: step...@networkplumber.org
  Target Milestone: ---

The sigbus_handler() in lib/eal/linux/eal_dev.c uses many direct library
calls and DPDK APIs that are not safe to use in a signal handler.
Unsafe routines include RTE_LOG(), which uses sprintf and syslog, etc.

It is also possible that failure_handle_lock could deadlock if the handler
is called during a change of state by another thread.
The safer way would be to use signalfd() and a monitoring thread.

Handling signals in Linux and BSD is hard.
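
A minimal sketch of the signalfd() approach suggested above (names are
illustrative, not an actual DPDK patch):

	#include <sys/signalfd.h>
	#include <signal.h>
	#include <pthread.h>

	static int
	sigbus_fd_create(void)
	{
		sigset_t mask;

		sigemptyset(&mask);
		sigaddset(&mask, SIGBUS);
		/* Block normal delivery so the signal is read from the fd. */
		if (pthread_sigmask(SIG_BLOCK, &mask, NULL) != 0)
			return -1;
		return signalfd(-1, &mask, SFD_CLOEXEC);
	}

A monitoring thread can then read() struct signalfd_siginfo records from
this fd in a loop and run the failure handling in ordinary thread context,
where taking locks and calling RTE_LOG() are safe.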

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [PATCH] dmadev: add tracepoints

2023-04-12 Thread fengchengwen
On 2023/4/12 17:52, Bruce Richardson wrote:
> On Wed, Apr 12, 2023 at 02:48:08AM +, Chengwen Feng wrote:
>> Add tracepoints at important APIs for tracing support.
>>
>> Signed-off-by: Chengwen Feng 
>> ---
>>  lib/dmadev/meson.build   |   2 +-
>>  lib/dmadev/rte_dmadev.c  |  39 ++--
>>  lib/dmadev/rte_dmadev.h  |  56 ---
>>  lib/dmadev/rte_dmadev_trace.h| 133 +++
>>  lib/dmadev/rte_dmadev_trace_fp.h | 113 +++
>>  lib/dmadev/rte_dmadev_trace_points.c |  59 
>>  lib/dmadev/version.map   |  10 ++
>>  7 files changed, 391 insertions(+), 21 deletions(-)
>>  create mode 100644 lib/dmadev/rte_dmadev_trace.h
>>  create mode 100644 lib/dmadev/rte_dmadev_trace_fp.h
>>  create mode 100644 lib/dmadev/rte_dmadev_trace_points.c
>>
> For completeness, do you have any numbers for the performance impact (if
> any) to the DMA dataplane APIs with this tracing added?

No, because:

The dataplane trace points are disabled by default (unless RTE_ENABLE_TRACE_FP
is set), so no trace-point code is generated by default.

So I think it will not impact performance by default.
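
For reference, fast-path trace points are compiled in via the dedicated meson
option; a sketch of the build invocation:

	meson setup build -Denable_trace_fp=true
	ninja -C build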

> 
> /Bruce
> 
> .
> 


[PATCH 0/4] support UDP fragmentation offload

2023-04-12 Thread Zhichao Zeng
This patch set supports UDP fragmentation offload for ice and iavf.

Zhichao Zeng (4):
  net: calculate correct UDP pseudo header for UFO
  app/testpmd: support UFO in checksum engine
  net/ice: enable UDP fragmentation offload
  net/iavf: enable UDP fragmentation offload

 app/test-pmd/csumonly.c  | 17 -
 drivers/net/iavf/iavf_rxtx.c |  2 +-
 drivers/net/iavf/iavf_rxtx.h |  2 ++
 drivers/net/ice/ice_rxtx.c   | 15 ---
 lib/net/rte_ip.h |  4 ++--
 lib/net/rte_net.h|  5 +++--
 6 files changed, 32 insertions(+), 13 deletions(-)

-- 
2.25.1



[PATCH 1/4] net: calculate correct UDP pseudo header for UFO

2023-04-12 Thread Zhichao Zeng
This commit calculates the correct UDP pseudo-header checksum for
UDP fragmentation offload by also checking the UDP_SEG flag.

Signed-off-by: Zhichao Zeng 
---
 lib/net/rte_ip.h  | 4 ++--
 lib/net/rte_net.h | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index a310e9d498..8073c8e889 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -345,7 +345,7 @@ rte_ipv4_phdr_cksum(const struct rte_ipv4_hdr *ipv4_hdr, 
uint64_t ol_flags)
psd_hdr.dst_addr = ipv4_hdr->dst_addr;
psd_hdr.zero = 0;
psd_hdr.proto = ipv4_hdr->next_proto_id;
-   if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+   if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
psd_hdr.len = 0;
} else {
l3_len = rte_be_to_cpu_16(ipv4_hdr->total_length);
@@ -596,7 +596,7 @@ rte_ipv6_phdr_cksum(const struct rte_ipv6_hdr *ipv6_hdr, 
uint64_t ol_flags)
} psd_hdr;
 
psd_hdr.proto = (uint32_t)(ipv6_hdr->proto << 24);
-   if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+   if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
psd_hdr.len = 0;
} else {
psd_hdr.len = ipv6_hdr->payload_len;
diff --git a/lib/net/rte_net.h b/lib/net/rte_net.h
index 56611fc8f9..ef3ff4c6fd 100644
--- a/lib/net/rte_net.h
+++ b/lib/net/rte_net.h
@@ -121,7 +121,7 @@ rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, 
uint64_t ol_flags)
 * no offloads are requested.
 */
if (!(ol_flags & (RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_L4_MASK | 
RTE_MBUF_F_TX_TCP_SEG |
- RTE_MBUF_F_TX_OUTER_IP_CKSUM)))
+   RTE_MBUF_F_TX_UDP_SEG | 
RTE_MBUF_F_TX_OUTER_IP_CKSUM)))
return 0;
 
if (ol_flags & (RTE_MBUF_F_TX_OUTER_IPV4 | RTE_MBUF_F_TX_OUTER_IPV6)) {
@@ -154,7 +154,8 @@ rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, 
uint64_t ol_flags)
ipv4_hdr->hdr_checksum = 0;
}
 
-   if ((ol_flags & RTE_MBUF_F_TX_L4_MASK) == RTE_MBUF_F_TX_UDP_CKSUM) {
+   if ((ol_flags & RTE_MBUF_F_TX_L4_MASK) == RTE_MBUF_F_TX_UDP_CKSUM ||
+   (ol_flags & RTE_MBUF_F_TX_UDP_SEG)) {
if (ol_flags & RTE_MBUF_F_TX_IPV4) {
udp_hdr = (struct rte_udp_hdr *)((char *)ipv4_hdr +
m->l3_len);
-- 
2.25.1
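
As a usage sketch (a hypothetical Tx-prepare snippet, not part of this
patch, assuming an IPv4/UDP packet laid out per m->l2_len/l3_len): with
RTE_MBUF_F_TX_UDP_SEG set, the helper now computes the pseudo-header
checksum over a zero length field, since hardware rewrites the length per
segment:

	uint64_t ol = m->ol_flags;

	if (ol & RTE_MBUF_F_TX_UDP_SEG) {
		struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
				struct rte_ipv4_hdr *, m->l2_len);
		struct rte_udp_hdr *udp = (struct rte_udp_hdr *)
				((char *)ip + m->l3_len);

		/* len is treated as 0 because UDP_SEG is set */
		udp->dgram_cksum = rte_ipv4_phdr_cksum(ip, ol);
	}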



[PATCH 2/4] app/testpmd: support UFO in checksum engine

2023-04-12 Thread Zhichao Zeng
This commit supports UFO for both non-tunneled and tunneled packets.

Similar to TSO, the command "tso set  " or
"tunnel_tso set  " is used to enable UFO,
and the following conditions need to be met:
a. The NIC supports UFO;
b. For enabling UFO in tunnel packets, "csum parse_tunnel" must be set to
   recognize tunnel packets;
c. For IPv4 tunnel packets, "csum set outer-ip" must be set to hw, because
   UFO changes the total_len of the outer IP header and the checksum
   calculated by SW becomes incorrect; this is not necessary for IPv6
   tunnel packets since there is no checksum field to fill in.

Signed-off-by: Zhichao Zeng 
---
 app/test-pmd/csumonly.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index fc85c22a77..062eb09b36 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -505,7 +505,9 @@ process_inner_cksums(void *l3_hdr, const struct 
testpmd_offload_info *info,
udp_hdr = (struct rte_udp_hdr *)((char *)l3_hdr + info->l3_len);
/* do not recalculate udp cksum if it was 0 */
if (udp_hdr->dgram_cksum != 0) {
-   if (tx_offloads & RTE_ETH_TX_OFFLOAD_UDP_CKSUM) {
+   if (tso_segsz)
+   ol_flags |= RTE_MBUF_F_TX_UDP_SEG;
+   else if (tx_offloads & RTE_ETH_TX_OFFLOAD_UDP_CKSUM) {
ol_flags |= RTE_MBUF_F_TX_UDP_CKSUM;
} else {
if (info->is_tunnel)
@@ -590,8 +592,10 @@ process_outer_cksums(void *outer_l3_hdr, struct 
testpmd_offload_info *info,
udp_hdr = (struct rte_udp_hdr *)
((char *)outer_l3_hdr + info->outer_l3_len);
 
-   if (tso_enabled)
+   if (tso_enabled && info->l4_proto == IPPROTO_TCP)
ol_flags |= RTE_MBUF_F_TX_TCP_SEG;
+   else if (tso_enabled && info->l4_proto == IPPROTO_UDP)
+   ol_flags |= RTE_MBUF_F_TX_UDP_SEG;
 
/* Skip SW outer UDP checksum generation if HW supports it */
if (tx_offloads & RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM) {
@@ -991,7 +995,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
if (info.is_tunnel == 1) {
tx_ol_flags |= process_outer_cksums(outer_l3_hdr, &info,
tx_offloads,
-   !!(tx_ol_flags & RTE_MBUF_F_TX_TCP_SEG),
+   !!(tx_ol_flags & (RTE_MBUF_F_TX_TCP_SEG 
|
+   RTE_MBUF_F_TX_UDP_SEG)),
m);
}
 
@@ -1083,11 +1088,13 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
m->outer_l2_len,
m->outer_l3_len);
if (info.tunnel_tso_segsz != 0 &&
-   (m->ol_flags & 
RTE_MBUF_F_TX_TCP_SEG))
+   (m->ol_flags & 
(RTE_MBUF_F_TX_TCP_SEG |
+   RTE_MBUF_F_TX_UDP_SEG)))
printf("tx: m->tso_segsz=%d\n",
m->tso_segsz);
} else if (info.tso_segsz != 0 &&
-   (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG))
+   (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
+   RTE_MBUF_F_TX_UDP_SEG)))
printf("tx: m->tso_segsz=%d\n", m->tso_segsz);
rte_get_tx_ol_flag_list(m->ol_flags, buf, sizeof(buf));
printf("tx: flags=%s", buf);
-- 
2.25.1
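
A possible command sequence for tunneled UFO, using the command spellings
from the message above (port 0 and an 800-byte segment size are
illustrative):

	testpmd> port stop 0
	testpmd> csum set outer-ip hw 0
	testpmd> csum parse_tunnel on 0
	testpmd> tunnel_tso set 800 0
	testpmd> port start 0
	testpmd> start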



[PATCH 3/4] net/ice: enable UDP fragmentation offload

2023-04-12 Thread Zhichao Zeng
This commit enables transmit segmentation offload for UDP, including both
non-tunneled and tunneled packets.

The command "tso set  " or
"tunnel_tso set  " is used to enable UFO.

Signed-off-by: Zhichao Zeng 
---
 drivers/net/ice/ice_rxtx.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 0ea0045836..ed4d27389a 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -12,6 +12,7 @@
 #define ICE_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM | \
RTE_MBUF_F_TX_L4_MASK |  \
RTE_MBUF_F_TX_TCP_SEG |  \
+   RTE_MBUF_F_TX_UDP_SEG |  \
RTE_MBUF_F_TX_OUTER_IP_CKSUM)
 
 /**
@@ -2767,6 +2768,13 @@ ice_txd_enable_checksum(uint64_t ol_flags,
return;
}
 
+   if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
+   *td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+   *td_offset |= (tx_offload.l4_len >> 2) <<
+ ICE_TX_DESC_LEN_L4_LEN_S;
+   return;
+   }
+
/* Enable L4 checksum offloads */
switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
case RTE_MBUF_F_TX_TCP_CKSUM:
@@ -2858,6 +2866,7 @@ static inline uint16_t
 ice_calc_context_desc(uint64_t flags)
 {
static uint64_t mask = RTE_MBUF_F_TX_TCP_SEG |
+   RTE_MBUF_F_TX_UDP_SEG |
RTE_MBUF_F_TX_QINQ |
RTE_MBUF_F_TX_OUTER_IP_CKSUM |
RTE_MBUF_F_TX_TUNNEL_MASK |
@@ -2966,7 +2975,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
 * the mbuf data size exceeds max data size that hw allows
 * per tx desc.
 */
-   if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+   if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
nb_used = (uint16_t)(ice_calc_pkt_desc(tx_pkt) +
 nb_ctx);
else
@@ -3026,7 +3035,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
txe->mbuf = NULL;
}
 
-   if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+   if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | 
RTE_MBUF_F_TX_UDP_SEG))
cd_type_cmd_tso_mss |=
ice_set_tso_ctx(tx_pkt, tx_offload);
else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
@@ -3066,7 +3075,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
slen = m_seg->data_len;
buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
-   while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
+   while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | 
RTE_MBUF_F_TX_UDP_SEG)) &&
unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
txd->cmd_type_offset_bsz =
-- 
2.25.1
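
On the application side, a packet would request UFO roughly as below
(illustrative values; the flags and fields are the standard mbuf Tx
offload API):

	m->l2_len = sizeof(struct rte_ether_hdr);
	m->l3_len = sizeof(struct rte_ipv4_hdr);
	m->l4_len = sizeof(struct rte_udp_hdr);
	m->tso_segsz = 800;	/* payload bytes per segment */
	m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
		       RTE_MBUF_F_TX_UDP_SEG;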



[PATCH 4/4] net/iavf: enable UDP fragmentation offload

2023-04-12 Thread Zhichao Zeng
This commit enables transmit segmentation offload for UDP, including both
non-tunneled and tunneled packets.

The command "tso set  " or
"tunnel_tso set  " is used to enable UFO.

Signed-off-by: Zhichao Zeng 
---
 drivers/net/iavf/iavf_rxtx.c | 2 +-
 drivers/net/iavf/iavf_rxtx.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index b1d0fbceb6..8eca8aba3e 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -3622,7 +3622,7 @@ iavf_prep_pkts(__rte_unused void *tx_queue, struct 
rte_mbuf **tx_pkts,
ol_flags = m->ol_flags;
 
/* Check condition for nb_segs > IAVF_TX_MAX_MTU_SEG. */
-   if (!(ol_flags & RTE_MBUF_F_TX_TCP_SEG)) {
+   if (!(ol_flags & (RTE_MBUF_F_TX_TCP_SEG | 
RTE_MBUF_F_TX_UDP_SEG))) {
if (m->nb_segs > IAVF_TX_MAX_MTU_SEG) {
rte_errno = EINVAL;
return i;
diff --git a/drivers/net/iavf/iavf_rxtx.h b/drivers/net/iavf/iavf_rxtx.h
index 09e2127db0..5096868d87 100644
--- a/drivers/net/iavf/iavf_rxtx.h
+++ b/drivers/net/iavf/iavf_rxtx.h
@@ -73,6 +73,7 @@
RTE_MBUF_F_TX_IP_CKSUM | \
RTE_MBUF_F_TX_L4_MASK |  \
RTE_MBUF_F_TX_TCP_SEG |  \
+   RTE_MBUF_F_TX_UDP_SEG |  \
RTE_MBUF_F_TX_OUTER_IP_CKSUM |   \
RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
 
@@ -85,6 +86,7 @@
RTE_MBUF_F_TX_IP_CKSUM | \
RTE_MBUF_F_TX_L4_MASK |  \
RTE_MBUF_F_TX_TCP_SEG |  \
+   RTE_MBUF_F_TX_UDP_SEG |  \
RTE_MBUF_F_TX_TUNNEL_MASK | \
RTE_MBUF_F_TX_OUTER_IP_CKSUM |  \
RTE_MBUF_F_TX_OUTER_UDP_CKSUM | \
-- 
2.25.1



RE: 21.11.4 patches review and test

2023-04-12 Thread Xu, HailinX
> -Original Message-
> From: Kevin Traynor 
> Sent: Thursday, April 6, 2023 7:38 PM
> To: sta...@dpdk.org
> Cc: dev@dpdk.org; Abhishek Marathe ;
> Ali Alnubani ; Walker, Benjamin
> ; David Christensen ;
> Hemant Agrawal ; Stokes, Ian
> ; Jerin Jacob ; Mcnamara, John
> ; Ju-Hyoung Lee ; Kevin
> Traynor ; Luca Boccassi ; Pei
> Zhang ; Xu, Qian Q ; Raslan
> Darawsheh ; Thomas Monjalon
> ; yangh...@redhat.com; Peng, Yuan
> ; Chen, Zhaoyan 
> Subject: 21.11.4 patches review and test
> 
> Hi all,
> 
> Here is a list of patches targeted for stable release 21.11.4.
> 
> The planned date for the final release is 25th April.
> 
> Please help with testing and validation of your use cases and report any
> issues/results with reply-all to this mail. For the final release the fixes 
> and
> reported validations will be added to the release notes.
> 
> A release candidate tarball can be found at:
> 
> https://dpdk.org/browse/dpdk-stable/tag/?id=v21.11.4-rc1
> 
> These patches are located at branch 21.11 of dpdk-stable repo:
> https://dpdk.org/browse/dpdk-stable/
> 
> Thanks.
> 
> Kevin

Hi All,

Update on the test status for the Intel part. So far the dpdk 21.11.4-rc1
validation test execution rate is 85%. No critical issue has been found.
2 new bugs were found; 1 new issue is being confirmed by Intel dev.
New bugs:   -- 20.11.8-rc1 also has these two issues
  1. pvp_qemu_multi_paths_port_restart:perf_pvp_qemu_vector_rx_mac: performance
     drops about 23.5% when sending small packets
     https://bugs.dpdk.org/show_bug.cgi?id=1212 -- no fix yet
  2. some of the virtio tests are failing -- Intel dev is investigating
# Basic Intel(R) NIC testing
* Build & CFLAG compile: cover the build test combination with latest GCC/Clang 
version and the popular OS revision such as
  Ubuntu20.04, Ubuntu22.04, Fedora35, Fedora37, RHEL8.6, RHEL8.4, FreeBSD13.1, 
SUSE15, CentOS7.9, etc.
- All test done. No new dpdk issue is found.
* PF(i40e, ixgbe): test scenarios including RTE_FLOW/TSO/Jumboframe/checksum 
offload/VLAN/VXLAN, etc. 
- All test done. No new dpdk issue is found.
* VF(i40e, ixgbe): test scenarios including VF-RTE_FLOW/TSO/Jumboframe/checksum 
offload/VLAN/VXLAN, etc.
- All test done. No new dpdk issue is found.
* PF/VF(ice): test scenarios including Switch features/Package Management/Flow 
Director/Advanced Tx/Advanced RSS/ACL/DCF/Flexible Descriptor, etc.
- All test done. No new dpdk issue is found.
* Intel NIC single core/NIC performance: test scenarios including PF/VF single 
core performance test, etc.
- All test done. No new dpdk issue is found.
* IPsec: test scenarios including ipsec/ipsec-gw/ipsec library basic test - 
QAT&SW/FIB library, etc.
- On going.

# Basic cryptodev and virtio testing
* Virtio: both function and performance test are covered. Such as 
PVP/Virtio_loopback/virtio-user loopback/virtio-net VM2VM perf testing/VMAWARE 
ESXI 8.0, etc.
- All test done. found bug1.
* Cryptodev: 
  *Function test: test scenarios including Cryptodev API testing/CompressDev 
ISA-L/QAT/ZLIB PMD Testing/FIPS, etc.
- Execution rate is 90%. found bug2.
  *Performance test: test scenarios including Throughput Performance/Cryptodev 
Latency, etc.
- All test done. No new dpdk issue is found.

Regards,
Xu, Hailin



[PATCH 00/10] gve PMD enhancement

2023-04-12 Thread Junfeng Guo
This patch set includes two main enhancements for the gve PMD:
 - support basic data path with DQO queue format
 - support jumbo frame with GQI queue format

This patch set is based on this:
patchwork.dpdk.org/project/dpdk/list/?series=27653&state=*

Junfeng Guo (10):
  net/gve: add Tx queue setup for DQO
  net/gve: add Rx queue setup for DQO
  net/gve: support device start and close for DQO
  net/gve: support queue release and stop for DQO
  net/gve: support basic Tx data path for DQO
  net/gve: support basic Rx data path for DQO
  net/gve: support basic stats for DQO
  net/gve: enable Tx checksum offload for DQO
  net/gve: add maintainers for GVE
  net/gve: support jumbo frame for GQI

 MAINTAINERS  |   3 +
 drivers/net/gve/gve_ethdev.c |  88 +++-
 drivers/net/gve/gve_ethdev.h |  69 +-
 drivers/net/gve/gve_rx.c | 140 +
 drivers/net/gve/gve_rx_dqo.c | 353 +++
 drivers/net/gve/gve_tx.c |   3 +
 drivers/net/gve/gve_tx_dqo.c | 393 +++
 drivers/net/gve/meson.build  |   2 +
 8 files changed, 1005 insertions(+), 46 deletions(-)
 create mode 100644 drivers/net/gve/gve_rx_dqo.c
 create mode 100644 drivers/net/gve/gve_tx_dqo.c

-- 
2.34.1



[PATCH 01/10] net/gve: add Tx queue setup for DQO

2023-04-12 Thread Junfeng Guo
Add support for tx_queue_setup_dqo ops.

The DQO format has a submission and completion queue pair for each Tx/Rx
queue. Note that with the DQO format all descriptors and doorbells, as
well as counters, are written in little-endian.

Signed-off-by: Junfeng Guo 
Signed-off-by: Rushil Gupta 
Signed-off-by: Joshua Washington 
Signed-off-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.c |  21 +++-
 drivers/net/gve/gve_ethdev.h |  27 -
 drivers/net/gve/gve_tx_dqo.c | 185 +++
 drivers/net/gve/meson.build  |   1 +
 4 files changed, 230 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/gve/gve_tx_dqo.c

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index cf28a4a3b7..90345b193d 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -298,6 +298,7 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
 
dev_info->default_txconf = (struct rte_eth_txconf) {
.tx_free_thresh = GVE_DEFAULT_TX_FREE_THRESH,
+   .tx_rs_thresh = GVE_DEFAULT_TX_RS_THRESH,
.offloads = 0,
};
 
@@ -528,6 +529,21 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
.xstats_get_names = gve_xstats_get_names,
 };
 
+static const struct eth_dev_ops gve_eth_dev_ops_dqo = {
+   .dev_configure= gve_dev_configure,
+   .dev_start= gve_dev_start,
+   .dev_stop = gve_dev_stop,
+   .dev_close= gve_dev_close,
+   .dev_infos_get= gve_dev_info_get,
+   .tx_queue_setup   = gve_tx_queue_setup_dqo,
+   .link_update  = gve_link_update,
+   .stats_get= gve_dev_stats_get,
+   .stats_reset  = gve_dev_stats_reset,
+   .mtu_set  = gve_dev_mtu_set,
+   .xstats_get   = gve_xstats_get,
+   .xstats_get_names = gve_xstats_get_names,
+};
+
 static void
 gve_free_counter_array(struct gve_priv *priv)
 {
@@ -770,8 +786,6 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
rte_be32_t *db_bar;
int err;
 
-   eth_dev->dev_ops = &gve_eth_dev_ops;
-
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
 
@@ -807,10 +821,11 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
return err;
 
if (gve_is_gqi(priv)) {
+   eth_dev->dev_ops = &gve_eth_dev_ops;
eth_dev->rx_pkt_burst = gve_rx_burst;
eth_dev->tx_pkt_burst = gve_tx_burst;
} else {
-   PMD_DRV_LOG(ERR, "DQO_RDA is not implemented and will be added 
in the future");
+   eth_dev->dev_ops = &gve_eth_dev_ops_dqo;
}
 
eth_dev->data->mac_addrs = &priv->dev_addr;
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 0b825113f6..6c6defa045 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -28,7 +28,8 @@
 #define PCI_MSIX_FLAGS_QSIZE   0x07FF  /* Table size */
 
 #define GVE_DEFAULT_RX_FREE_THRESH  512
-#define GVE_DEFAULT_TX_FREE_THRESH  256
+#define GVE_DEFAULT_TX_FREE_THRESH   32
+#define GVE_DEFAULT_TX_RS_THRESH 32
 #define GVE_TX_MAX_FREE_SZ  512
 
 #define GVE_MIN_BUF_SIZE   1024
@@ -53,6 +54,13 @@ union gve_tx_desc {
struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+/* Tx desc for DQO format */
+union gve_tx_desc_dqo {
+   struct gve_tx_pkt_desc_dqo pkt;
+   struct gve_tx_tso_context_desc_dqo tso_ctx;
+   struct gve_tx_general_context_desc_dqo general_ctx;
+};
+
 /* Offload features */
 union gve_tx_offload {
uint64_t data;
@@ -100,8 +108,10 @@ struct gve_tx_queue {
uint32_t tx_tail;
uint16_t nb_tx_desc;
uint16_t nb_free;
+   uint16_t nb_used;
uint32_t next_to_clean;
uint16_t free_thresh;
+   uint16_t rs_thresh;
 
/* Only valid for DQO_QPL queue format */
uint16_t sw_tail;
@@ -128,7 +138,15 @@ struct gve_tx_queue {
struct gve_queue_resources *qres;
 
/* newly added for DQO */
+   volatile union gve_tx_desc_dqo *tx_ring;
+   struct gve_tx_compl_desc *compl_ring;
+   const struct rte_memzone *compl_ring_mz;
uint64_t compl_ring_phys_addr;
+   uint32_t complq_tail;
+   uint16_t sw_size;
+   uint8_t cur_gen_bit;
+   uint32_t last_desc_cleaned;
+   void **txqs;
 
/* Only valid for DQO_RDA queue format */
struct gve_tx_queue *complq;
@@ -342,4 +360,11 @@ gve_rx_burst(void *rxq, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts);
 uint16_t
 gve_tx_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
 
+/* Below functions are used for DQO */
+
+int
+gve_tx_queue_setup_dqo(struct rte_eth_dev *dev, uint16_t queue_id,
+  uint16_t nb_desc, unsigned int socket_id,
+  const struct rte_eth_txconf *conf);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/
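
From the application's perspective, the DQO Tx queue setup added by this
patch is reached through the standard ethdev API; a minimal sketch (queue 0;
the threshold values mirror the new defaults above, other names are
illustrative):

	struct rte_eth_txconf txconf = {
		.tx_free_thresh = 32,	/* GVE_DEFAULT_TX_FREE_THRESH */
		.tx_rs_thresh = 32,	/* GVE_DEFAULT_TX_RS_THRESH */
	};
	int ret;

	ret = rte_eth_tx_queue_setup(port_id, 0, nb_desc, rte_socket_id(),
				     &txconf);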

[PATCH 02/10] net/gve: add Rx queue setup for DQO

2023-04-12 Thread Junfeng Guo
Add support for rx_queue_setup_dqo ops.

Signed-off-by: Junfeng Guo 
Signed-off-by: Rushil Gupta 
Signed-off-by: Joshua Washington 
Signed-off-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.c |   1 +
 drivers/net/gve/gve_ethdev.h |  11 +++
 drivers/net/gve/gve_rx_dqo.c | 156 +++
 drivers/net/gve/meson.build  |   1 +
 4 files changed, 169 insertions(+)
 create mode 100644 drivers/net/gve/gve_rx_dqo.c

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 90345b193d..d387d7154b 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -535,6 +535,7 @@ static const struct eth_dev_ops gve_eth_dev_ops_dqo = {
.dev_stop = gve_dev_stop,
.dev_close= gve_dev_close,
.dev_infos_get= gve_dev_info_get,
+   .rx_queue_setup   = gve_rx_queue_setup_dqo,
.tx_queue_setup   = gve_tx_queue_setup_dqo,
.link_update  = gve_link_update,
.stats_get= gve_dev_stats_get,
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 6c6defa045..cb8cd62886 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -167,6 +167,7 @@ struct gve_rx_queue {
uint16_t nb_rx_desc;
uint16_t expected_seqno; /* the next expected seqno */
uint16_t free_thresh;
+   uint16_t nb_rx_hold;
uint32_t next_avail;
uint32_t nb_avail;
 
@@ -189,7 +190,12 @@ struct gve_rx_queue {
uint16_t rx_buf_len;
 
/* newly added for DQO */
+   volatile struct gve_rx_desc_dqo *rx_ring;
+   struct gve_rx_compl_desc_dqo *compl_ring;
+   const struct rte_memzone *compl_ring_mz;
uint64_t compl_ring_phys_addr;
+   uint8_t cur_gen_bit;
+   uint16_t bufq_tail;
 
/* Only valid for DQO_RDA queue format */
struct gve_rx_queue *bufq;
@@ -362,6 +368,11 @@ gve_tx_burst(void *txq, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts);
 
 /* Below functions are used for DQO */
 
+int
+gve_rx_queue_setup_dqo(struct rte_eth_dev *dev, uint16_t queue_id,
+  uint16_t nb_desc, unsigned int socket_id,
+  const struct rte_eth_rxconf *conf,
+  struct rte_mempool *pool);
 int
 gve_tx_queue_setup_dqo(struct rte_eth_dev *dev, uint16_t queue_id,
   uint16_t nb_desc, unsigned int socket_id,
diff --git a/drivers/net/gve/gve_rx_dqo.c b/drivers/net/gve/gve_rx_dqo.c
new file mode 100644
index 00..c419c4dd2f
--- /dev/null
+++ b/drivers/net/gve/gve_rx_dqo.c
@@ -0,0 +1,156 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022-2023 Google LLC
+ * Copyright (c) 2022-2023 Intel Corporation
+ */
+
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static void
+gve_reset_rxq_dqo(struct gve_rx_queue *rxq)
+{
+   struct rte_mbuf **sw_ring;
+   uint32_t size, i;
+
+   if (rxq == NULL) {
+   PMD_DRV_LOG(ERR, "pointer to rxq is NULL");
+   return;
+   }
+
+   size = rxq->nb_rx_desc * sizeof(struct gve_rx_desc_dqo);
+   for (i = 0; i < size; i++)
+   ((volatile char *)rxq->rx_ring)[i] = 0;
+
+   size = rxq->nb_rx_desc * sizeof(struct gve_rx_compl_desc_dqo);
+   for (i = 0; i < size; i++)
+   ((volatile char *)rxq->compl_ring)[i] = 0;
+
+   sw_ring = rxq->sw_ring;
+   for (i = 0; i < rxq->nb_rx_desc; i++)
+   sw_ring[i] = NULL;
+
+   rxq->bufq_tail = 0;
+   rxq->next_avail = 0;
+   rxq->nb_rx_hold = rxq->nb_rx_desc - 1;
+
+   rxq->rx_tail = 0;
+   rxq->cur_gen_bit = 1;
+}
+
+int
+gve_rx_queue_setup_dqo(struct rte_eth_dev *dev, uint16_t queue_id,
+  uint16_t nb_desc, unsigned int socket_id,
+  const struct rte_eth_rxconf *conf,
+  struct rte_mempool *pool)
+{
+   struct gve_priv *hw = dev->data->dev_private;
+   const struct rte_memzone *mz;
+   struct gve_rx_queue *rxq;
+   uint16_t free_thresh;
+   int err = 0;
+
+   if (nb_desc != hw->rx_desc_cnt) {
+   PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use 
hw nb_desc %u.",
+   hw->rx_desc_cnt);
+   }
+   nb_desc = hw->rx_desc_cnt;
+
+   /* Allocate the RX queue data structure. */
+   rxq = rte_zmalloc_socket("gve rxq",
+sizeof(struct gve_rx_queue),
+RTE_CACHE_LINE_SIZE,
+socket_id);
+   if (rxq == NULL) {
+   PMD_DRV_LOG(ERR, "Failed to allocate memory for rx queue 
structure");
+   return -ENOMEM;
+   }
+
+   /* check free_thresh here */
+   free_thresh = conf->rx_free_thresh ?
+   conf->rx_free_thresh : GVE_DEFAULT_RX_FREE_THRESH;
+   if (free_thresh >= nb_desc) {
+   PMD_DRV_LOG(ERR, "rx_free_thresh (%u

[PATCH 03/10] net/gve: support device start and close for DQO

2023-04-12 Thread Junfeng Guo
Add device start and close support for DQO.

Signed-off-by: Junfeng Guo 
Signed-off-by: Rushil Gupta 
Signed-off-by: Joshua Washington 
Signed-off-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.c | 43 +++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index d387d7154b..fc60db63c5 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -78,6 +78,9 @@ gve_free_qpls(struct gve_priv *priv)
uint16_t nb_rxqs = priv->max_nb_rxq;
uint32_t i;
 
+   if (priv->queue_format != GVE_GQI_QPL_FORMAT)
+   return;
+
for (i = 0; i < nb_txqs + nb_rxqs; i++) {
if (priv->qpl[i].mz != NULL)
rte_memzone_free(priv->qpl[i].mz);
@@ -138,6 +141,41 @@ gve_refill_pages(struct gve_rx_queue *rxq)
return 0;
 }
 
+static int
+gve_refill_dqo(struct gve_rx_queue *rxq)
+{
+   struct rte_mbuf *nmb;
+   uint16_t i;
+   int diag;
+
+   diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], 
rxq->nb_rx_desc);
+   if (diag < 0) {
+   for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+   nmb = rte_pktmbuf_alloc(rxq->mpool);
+   if (!nmb)
+   break;
+   rxq->sw_ring[i] = nmb;
+   }
+   if (i < rxq->nb_rx_desc - 1)
+   return -ENOMEM;
+   }
+
+   for (i = 0; i < rxq->nb_rx_desc; i++) {
+   if (i == rxq->nb_rx_desc - 1)
+   break;
+   nmb = rxq->sw_ring[i];
+   rxq->rx_ring[i].buf_addr = 
rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
+   rxq->rx_ring[i].buf_id = rte_cpu_to_le_16(i);
+   }
+
+   rxq->nb_rx_hold = 0;
+   rxq->bufq_tail = rxq->nb_rx_desc - 1;
+
+   rte_write32(rxq->bufq_tail, rxq->qrx_tail);
+
+   return 0;
+}
+
 static int
 gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 {
@@ -206,7 +244,10 @@ gve_dev_start(struct rte_eth_dev *dev)
 
rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), rxq->ntfy_addr);
 
-   err = gve_refill_pages(rxq);
+   if (gve_is_gqi(priv))
+   err = gve_refill_pages(rxq);
+   else
+   err = gve_refill_dqo(rxq);
if (err) {
PMD_DRV_LOG(ERR, "Failed to refill for RX");
goto err_rx;
-- 
2.34.1



[PATCH 05/10] net/gve: support basic Tx data path for DQO

2023-04-12 Thread Junfeng Guo
Add basic Tx data path support for DQO.

Signed-off-by: Junfeng Guo 
Signed-off-by: Rushil Gupta 
Signed-off-by: Joshua Washington 
Signed-off-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.c |   1 +
 drivers/net/gve/gve_ethdev.h |   4 +
 drivers/net/gve/gve_tx_dqo.c | 141 +++
 3 files changed, 146 insertions(+)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 340315a1a3..37bd8da12d 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -878,6 +878,7 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
eth_dev->tx_pkt_burst = gve_tx_burst;
} else {
eth_dev->dev_ops = &gve_eth_dev_ops_dqo;
+   eth_dev->tx_pkt_burst = gve_tx_burst_dqo;
}
 
eth_dev->data->mac_addrs = &priv->dev_addr;
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index c8e1dd1435..1b8f511668 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -147,6 +147,7 @@ struct gve_tx_queue {
uint8_t cur_gen_bit;
uint32_t last_desc_cleaned;
void **txqs;
+   uint16_t re_cnt;
 
/* Only valid for DQO_RDA queue format */
struct gve_tx_queue *complq;
@@ -390,4 +391,7 @@ gve_stop_tx_queues_dqo(struct rte_eth_dev *dev);
 void
 gve_stop_rx_queues_dqo(struct rte_eth_dev *dev);
 
+uint16_t
+gve_tx_burst_dqo(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_tx_dqo.c b/drivers/net/gve/gve_tx_dqo.c
index ea6d5ff85e..2ea38a8f8e 100644
--- a/drivers/net/gve/gve_tx_dqo.c
+++ b/drivers/net/gve/gve_tx_dqo.c
@@ -6,6 +6,147 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_tx_clean_dqo(struct gve_tx_queue *txq)
+{
+   struct gve_tx_compl_desc *compl_ring;
+   struct gve_tx_compl_desc *compl_desc;
+   struct gve_tx_queue *aim_txq;
+   uint16_t nb_desc_clean;
+   struct rte_mbuf *txe;
+   uint16_t compl_tag;
+   uint16_t next;
+
+   next = txq->complq_tail;
+   compl_ring = txq->compl_ring;
+   compl_desc = &compl_ring[next];
+
+   if (compl_desc->generation != txq->cur_gen_bit)
+   return;
+
+   compl_tag = rte_le_to_cpu_16(compl_desc->completion_tag);
+
+   aim_txq = txq->txqs[compl_desc->id];
+
+   switch (compl_desc->type) {
+   case GVE_COMPL_TYPE_DQO_DESC:
+   /* need to clean Descs from last_cleaned to compl_tag */
+   if (aim_txq->last_desc_cleaned > compl_tag)
+   nb_desc_clean = aim_txq->nb_tx_desc - 
aim_txq->last_desc_cleaned +
+   compl_tag;
+   else
+   nb_desc_clean = compl_tag - aim_txq->last_desc_cleaned;
+   aim_txq->nb_free += nb_desc_clean;
+   aim_txq->last_desc_cleaned = compl_tag;
+   break;
+   case GVE_COMPL_TYPE_DQO_REINJECTION:
+   PMD_DRV_LOG(DEBUG, "GVE_COMPL_TYPE_DQO_REINJECTION !!!");
+   /* FALLTHROUGH */
+   case GVE_COMPL_TYPE_DQO_PKT:
+   txe = aim_txq->sw_ring[compl_tag];
+   if (txe != NULL) {
+   rte_pktmbuf_free_seg(txe);
+   txe = NULL;
+   }
+   break;
+   case GVE_COMPL_TYPE_DQO_MISS:
+   rte_delay_us_sleep(1);
+   PMD_DRV_LOG(DEBUG, "GVE_COMPL_TYPE_DQO_MISS ignored !!!");
+   break;
+   default:
+   PMD_DRV_LOG(ERR, "unknown completion type.");
+   return;
+   }
+
+   next++;
+   if (next == txq->nb_tx_desc * DQO_TX_MULTIPLIER) {
+   next = 0;
+   txq->cur_gen_bit ^= 1;
+   }
+
+   txq->complq_tail = next;
+}
+
+uint16_t
+gve_tx_burst_dqo(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+   struct gve_tx_queue *txq = tx_queue;
+   volatile union gve_tx_desc_dqo *txr;
+   volatile union gve_tx_desc_dqo *txd;
+   struct rte_mbuf **sw_ring;
+   struct rte_mbuf *tx_pkt;
+   uint16_t mask, sw_mask;
+   uint16_t nb_to_clean;
+   uint16_t nb_tx = 0;
+   uint16_t nb_used;
+   uint16_t tx_id;
+   uint16_t sw_id;
+
+   sw_ring = txq->sw_ring;
+   txr = txq->tx_ring;
+
+   mask = txq->nb_tx_desc - 1;
+   sw_mask = txq->sw_size - 1;
+   tx_id = txq->tx_tail;
+   sw_id = txq->sw_tail;
+
+   for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+   tx_pkt = tx_pkts[nb_tx];
+
+   if (txq->nb_free <= txq->free_thresh) {
+   nb_to_clean = DQO_TX_MULTIPLIER * txq->rs_thresh;
+   while (nb_to_clean--)
+   gve_tx_clean_dqo(txq);
+   }
+
+   if (txq->nb_free < tx_pkt->nb_segs)
+   break;
+
+   nb_used = tx_pkt->nb_segs;
+
+   d

[PATCH 04/10] net/gve: support queue release and stop for DQO

2023-04-12 Thread Junfeng Guo
Add support for queue operations:
 - gve_tx_queue_release_dqo
 - gve_rx_queue_release_dqo
 - gve_stop_tx_queues_dqo
 - gve_stop_rx_queues_dqo

Signed-off-by: Junfeng Guo 
Signed-off-by: Rushil Gupta 
Signed-off-by: Joshua Washington 
Signed-off-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.c | 18 +---
 drivers/net/gve/gve_ethdev.h | 12 
 drivers/net/gve/gve_rx.c |  3 ++
 drivers/net/gve/gve_rx_dqo.c | 57 
 drivers/net/gve/gve_tx.c |  3 ++
 drivers/net/gve/gve_tx_dqo.c | 55 ++
 6 files changed, 144 insertions(+), 4 deletions(-)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index fc60db63c5..340315a1a3 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -292,11 +292,19 @@ gve_dev_close(struct rte_eth_dev *dev)
PMD_DRV_LOG(ERR, "Failed to stop dev.");
}
 
-   for (i = 0; i < dev->data->nb_tx_queues; i++)
-   gve_tx_queue_release(dev, i);
+   if (gve_is_gqi(priv)) {
+   for (i = 0; i < dev->data->nb_tx_queues; i++)
+   gve_tx_queue_release(dev, i);
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++)
+   gve_rx_queue_release(dev, i);
+   } else {
+   for (i = 0; i < dev->data->nb_tx_queues; i++)
+   gve_tx_queue_release_dqo(dev, i);
 
-   for (i = 0; i < dev->data->nb_rx_queues; i++)
-   gve_rx_queue_release(dev, i);
+   for (i = 0; i < dev->data->nb_rx_queues; i++)
+   gve_rx_queue_release_dqo(dev, i);
+   }
 
gve_free_qpls(priv);
rte_free(priv->adminq);
@@ -578,6 +586,8 @@ static const struct eth_dev_ops gve_eth_dev_ops_dqo = {
.dev_infos_get= gve_dev_info_get,
.rx_queue_setup   = gve_rx_queue_setup_dqo,
.tx_queue_setup   = gve_tx_queue_setup_dqo,
+   .rx_queue_release = gve_rx_queue_release_dqo,
+   .tx_queue_release = gve_tx_queue_release_dqo,
.link_update  = gve_link_update,
.stats_get= gve_dev_stats_get,
.stats_reset  = gve_dev_stats_reset,
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index cb8cd62886..c8e1dd1435 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -378,4 +378,16 @@ gve_tx_queue_setup_dqo(struct rte_eth_dev *dev, uint16_t 
queue_id,
   uint16_t nb_desc, unsigned int socket_id,
   const struct rte_eth_txconf *conf);
 
+void
+gve_tx_queue_release_dqo(struct rte_eth_dev *dev, uint16_t qid);
+
+void
+gve_rx_queue_release_dqo(struct rte_eth_dev *dev, uint16_t qid);
+
+void
+gve_stop_tx_queues_dqo(struct rte_eth_dev *dev);
+
+void
+gve_stop_rx_queues_dqo(struct rte_eth_dev *dev);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index 8d8f94efff..3dd3f578f9 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -359,6 +359,9 @@ gve_stop_rx_queues(struct rte_eth_dev *dev)
uint16_t i;
int err;
 
+   if (!gve_is_gqi(hw))
+   return gve_stop_rx_queues_dqo(dev);
+
err = gve_adminq_destroy_rx_queues(hw, dev->data->nb_rx_queues);
if (err != 0)
PMD_DRV_LOG(WARNING, "failed to destroy rxqs");
diff --git a/drivers/net/gve/gve_rx_dqo.c b/drivers/net/gve/gve_rx_dqo.c
index c419c4dd2f..7f58844839 100644
--- a/drivers/net/gve/gve_rx_dqo.c
+++ b/drivers/net/gve/gve_rx_dqo.c
@@ -7,6 +7,38 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_release_rxq_mbufs_dqo(struct gve_rx_queue *rxq)
+{
+   uint16_t i;
+
+   for (i = 0; i < rxq->nb_rx_desc; i++) {
+   if (rxq->sw_ring[i]) {
+   rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+   rxq->sw_ring[i] = NULL;
+   }
+   }
+
+   rxq->nb_avail = rxq->nb_rx_desc;
+}
+
+void
+gve_rx_queue_release_dqo(struct rte_eth_dev *dev, uint16_t qid)
+{
+   struct gve_rx_queue *q = dev->data->rx_queues[qid];
+
+   if (q == NULL)
+   return;
+
+   gve_release_rxq_mbufs_dqo(q);
+   rte_free(q->sw_ring);
+   rte_memzone_free(q->compl_ring_mz);
+   rte_memzone_free(q->mz);
+   rte_memzone_free(q->qres_mz);
+   q->qres = NULL;
+   rte_free(q);
+}
+
 static void
 gve_reset_rxq_dqo(struct gve_rx_queue *rxq)
 {
@@ -56,6 +88,12 @@ gve_rx_queue_setup_dqo(struct rte_eth_dev *dev, uint16_t 
queue_id,
}
nb_desc = hw->rx_desc_cnt;
 
+   /* Free memory if needed */
+   if (dev->data->rx_queues[queue_id]) {
+   gve_rx_queue_release_dqo(dev, queue_id);
+   dev->data->rx_queues[queue_id] = NULL;
+   }
+
/* Allocate the RX queue data structure. */
rxq = rte_zmalloc_socket("gve rxq",
 

[PATCH 06/10] net/gve: support basic Rx data path for DQO

2023-04-12 Thread Junfeng Guo
Add basic Rx data path support for DQO.

Signed-off-by: Junfeng Guo 
Signed-off-by: Rushil Gupta 
Signed-off-by: Joshua Washington 
Signed-off-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.c |   1 +
 drivers/net/gve/gve_ethdev.h |   3 +
 drivers/net/gve/gve_rx_dqo.c | 128 +++
 3 files changed, 132 insertions(+)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 37bd8da12d..a532b8a93a 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -878,6 +878,7 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
eth_dev->tx_pkt_burst = gve_tx_burst;
} else {
eth_dev->dev_ops = &gve_eth_dev_ops_dqo;
+   eth_dev->rx_pkt_burst = gve_rx_burst_dqo;
eth_dev->tx_pkt_burst = gve_tx_burst_dqo;
}
 
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 1b8f511668..617bb55a85 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -391,6 +391,9 @@ gve_stop_tx_queues_dqo(struct rte_eth_dev *dev);
 void
 gve_stop_rx_queues_dqo(struct rte_eth_dev *dev);
 
+uint16_t
+gve_rx_burst_dqo(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
 uint16_t
 gve_tx_burst_dqo(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
 
diff --git a/drivers/net/gve/gve_rx_dqo.c b/drivers/net/gve/gve_rx_dqo.c
index 7f58844839..d0eaea9c24 100644
--- a/drivers/net/gve/gve_rx_dqo.c
+++ b/drivers/net/gve/gve_rx_dqo.c
@@ -7,6 +7,134 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_rx_refill_dqo(struct gve_rx_queue *rxq)
+{
+   volatile struct gve_rx_desc_dqo *rx_buf_ring;
+   volatile struct gve_rx_desc_dqo *rx_buf_desc;
+   struct rte_mbuf *nmb[rxq->free_thresh];
+   uint16_t nb_refill = rxq->free_thresh;
+   uint16_t nb_desc = rxq->nb_rx_desc;
+   uint16_t next_avail = rxq->bufq_tail;
+   struct rte_eth_dev *dev;
+   uint64_t dma_addr;
+   uint16_t delta;
+   int i;
+
+   if (rxq->nb_rx_hold < rxq->free_thresh)
+   return;
+
+   rx_buf_ring = rxq->rx_ring;
+   delta = nb_desc - next_avail;
+   if (unlikely(delta < nb_refill)) {
+   if (likely(rte_pktmbuf_alloc_bulk(rxq->mpool, nmb, delta) == 0)) {
+   for (i = 0; i < delta; i++) {
+   rx_buf_desc = &rx_buf_ring[next_avail + i];
+   rxq->sw_ring[next_avail + i] = nmb[i];
+   dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb[i]));
+   rx_buf_desc->header_buf_addr = 0;
+   rx_buf_desc->buf_addr = dma_addr;
+   }
+   nb_refill -= delta;
+   next_avail = 0;
+   rxq->nb_rx_hold -= delta;
+   } else {
+   dev = &rte_eth_devices[rxq->port_id];
+   dev->data->rx_mbuf_alloc_failed += nb_desc - next_avail;
+   PMD_DRV_LOG(DEBUG, "RX mbuf alloc failed port_id=%u queue_id=%u",
+   rxq->port_id, rxq->queue_id);
+   return;
+   }
+   }
+
+   if (nb_desc - next_avail >= nb_refill) {
+   if (likely(rte_pktmbuf_alloc_bulk(rxq->mpool, nmb, nb_refill) == 0)) {
+   for (i = 0; i < nb_refill; i++) {
+   rx_buf_desc = &rx_buf_ring[next_avail + i];
+   rxq->sw_ring[next_avail + i] = nmb[i];
+   dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb[i]));
+   rx_buf_desc->header_buf_addr = 0;
+   rx_buf_desc->buf_addr = dma_addr;
+   }
+   next_avail += nb_refill;
+   rxq->nb_rx_hold -= nb_refill;
+   } else {
+   dev = &rte_eth_devices[rxq->port_id];
+   dev->data->rx_mbuf_alloc_failed += nb_desc - next_avail;
+   PMD_DRV_LOG(DEBUG, "RX mbuf alloc failed port_id=%u queue_id=%u",
+   rxq->port_id, rxq->queue_id);
+   }
+   }
+
+   rte_write32(next_avail, rxq->qrx_tail);
+
+   rxq->bufq_tail = next_avail;
+}
+
+uint16_t
+gve_rx_burst_dqo(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+   volatile struct gve_rx_compl_desc_dqo *rx_compl_ring;
+   volatile struct gve_rx_compl_desc_dqo *rx_desc;
+   struct gve_rx_queue *rxq;
+   struct rte_mbuf *rxm;
+   uint16_t rx_id_bufq;
+   uint16_t pkt_len;
+   uint16_t rx_id;
+   uint16_t nb_rx;
+
+   nb_rx = 0;
+   rxq = rx_queue;
+   rx_id = rxq->rx_tail;
+   rx_id_bufq = rxq->next_avail;
+   rx_compl_ring = rxq->compl_ring;
+
+   while (nb_r

[PATCH 07/10] net/gve: support basic stats for DQO

2023-04-12 Thread Junfeng Guo
Add basic stats support for DQO.

Signed-off-by: Junfeng Guo 
Signed-off-by: Rushil Gupta 
Signed-off-by: Joshua Washington 
Signed-off-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.c |  5 ++++-
 drivers/net/gve/gve_rx_dqo.c | 14 +++++++++++++-
 drivers/net/gve/gve_tx_dqo.c |  7 +++++++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index a532b8a93a..8b6861a24f 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -150,14 +150,17 @@ gve_refill_dqo(struct gve_rx_queue *rxq)
 
	diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], rxq->nb_rx_desc);
if (diag < 0) {
+   rxq->stats.no_mbufs_bulk++;
for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
nmb = rte_pktmbuf_alloc(rxq->mpool);
if (!nmb)
break;
rxq->sw_ring[i] = nmb;
}
-   if (i < rxq->nb_rx_desc - 1)
+   if (i < rxq->nb_rx_desc - 1) {
+   rxq->stats.no_mbufs += rxq->nb_rx_desc - 1 - i;
return -ENOMEM;
+   }
}
 
for (i = 0; i < rxq->nb_rx_desc; i++) {
diff --git a/drivers/net/gve/gve_rx_dqo.c b/drivers/net/gve/gve_rx_dqo.c
index d0eaea9c24..1d6b21359c 100644
--- a/drivers/net/gve/gve_rx_dqo.c
+++ b/drivers/net/gve/gve_rx_dqo.c
@@ -39,6 +39,8 @@ gve_rx_refill_dqo(struct gve_rx_queue *rxq)
next_avail = 0;
rxq->nb_rx_hold -= delta;
} else {
+   rxq->stats.no_mbufs_bulk++;
+   rxq->stats.no_mbufs += nb_desc - next_avail;
dev = &rte_eth_devices[rxq->port_id];
dev->data->rx_mbuf_alloc_failed += nb_desc - next_avail;
			PMD_DRV_LOG(DEBUG, "RX mbuf alloc failed port_id=%u queue_id=%u",
@@ -59,6 +61,8 @@ gve_rx_refill_dqo(struct gve_rx_queue *rxq)
next_avail += nb_refill;
rxq->nb_rx_hold -= nb_refill;
} else {
+   rxq->stats.no_mbufs_bulk++;
+   rxq->stats.no_mbufs += nb_desc - next_avail;
dev = &rte_eth_devices[rxq->port_id];
dev->data->rx_mbuf_alloc_failed += nb_desc - next_avail;
			PMD_DRV_LOG(DEBUG, "RX mbuf alloc failed port_id=%u queue_id=%u",
@@ -82,7 +86,9 @@ gve_rx_burst_dqo(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
uint16_t pkt_len;
uint16_t rx_id;
uint16_t nb_rx;
+   uint64_t bytes;
 
+   bytes = 0;
nb_rx = 0;
rxq = rx_queue;
rx_id = rxq->rx_tail;
@@ -96,8 +102,10 @@ gve_rx_burst_dqo(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
if (rx_desc->generation != rxq->cur_gen_bit)
break;
 
-   if (unlikely(rx_desc->rx_error))
+   if (unlikely(rx_desc->rx_error)) {
+   rxq->stats.errors++;
continue;
+   }
 
pkt_len = rx_desc->packet_len;
 
@@ -122,6 +130,7 @@ gve_rx_burst_dqo(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
rxm->hash.rss = rte_be_to_cpu_32(rx_desc->hash);
 
rx_pkts[nb_rx++] = rxm;
+   bytes += pkt_len;
}
 
if (nb_rx > 0) {
@@ -130,6 +139,9 @@ gve_rx_burst_dqo(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
rxq->next_avail = rx_id_bufq;
 
gve_rx_refill_dqo(rxq);
+
+   rxq->stats.packets += nb_rx;
+   rxq->stats.bytes += bytes;
}
 
return nb_rx;
diff --git a/drivers/net/gve/gve_tx_dqo.c b/drivers/net/gve/gve_tx_dqo.c
index 2ea38a8f8e..578a409616 100644
--- a/drivers/net/gve/gve_tx_dqo.c
+++ b/drivers/net/gve/gve_tx_dqo.c
@@ -81,10 +81,12 @@ gve_tx_burst_dqo(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
uint16_t nb_used;
uint16_t tx_id;
uint16_t sw_id;
+   uint64_t bytes;
 
sw_ring = txq->sw_ring;
txr = txq->tx_ring;
 
+   bytes = 0;
mask = txq->nb_tx_desc - 1;
sw_mask = txq->sw_size - 1;
tx_id = txq->tx_tail;
@@ -119,6 +121,7 @@ gve_tx_burst_dqo(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
tx_id = (tx_id + 1) & mask;
sw_id = (sw_id + 1) & sw_mask;
 
+   bytes += tx_pkt->pkt_len;
tx_pkt = tx_pkt->next;
} while (tx_pkt);
 
@@ -142,6 +145,10 @@ gve_tx_burst_dqo(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
rte_write32(tx_id, txq->qtx_tail);
txq->tx_tail = tx_id;
txq->sw_tail = sw_id;
+

[PATCH 08/10] net/gve: enable Tx checksum offload for DQO

2023-04-12 Thread Junfeng Guo
Enable Tx checksum offload when any L4 checksum flag is set.

Signed-off-by: Junfeng Guo 
Signed-off-by: Rushil Gupta 
Signed-off-by: Joshua Washington 
Signed-off-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.h | 4 ++++
 drivers/net/gve/gve_tx_dqo.c | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 617bb55a85..4a0e860afa 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -38,6 +38,10 @@
 #define GVE_MAX_MTU	RTE_ETHER_MTU
 #define GVE_MIN_MTU	RTE_ETHER_MIN_MTU
 
+#define GVE_TX_CKSUM_OFFLOAD_MASK (\
+   RTE_MBUF_F_TX_L4_MASK  |\
+   RTE_MBUF_F_TX_TCP_SEG)
+
 /* A list of pages registered with the device during setup and used by a queue
  * as buffers
  */
diff --git a/drivers/net/gve/gve_tx_dqo.c b/drivers/net/gve/gve_tx_dqo.c
index 578a409616..b38eeaea4b 100644
--- a/drivers/net/gve/gve_tx_dqo.c
+++ b/drivers/net/gve/gve_tx_dqo.c
@@ -78,6 +78,7 @@ gve_tx_burst_dqo(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
uint16_t mask, sw_mask;
uint16_t nb_to_clean;
uint16_t nb_tx = 0;
+   uint64_t ol_flags;
uint16_t nb_used;
uint16_t tx_id;
uint16_t sw_id;
@@ -104,6 +105,7 @@ gve_tx_burst_dqo(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
if (txq->nb_free < tx_pkt->nb_segs)
break;
 
+   ol_flags = tx_pkt->ol_flags;
nb_used = tx_pkt->nb_segs;
 
do {
@@ -128,6 +130,9 @@ gve_tx_burst_dqo(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
/* fill the last descriptor with End of Packet (EOP) bit */
txd->pkt.end_of_packet = 1;
 
+   if (ol_flags & GVE_TX_CKSUM_OFFLOAD_MASK)
+   txd->pkt.checksum_offload_enable = 1;
+
txq->nb_free -= nb_used;
txq->nb_used += nb_used;
}
-- 
2.34.1



[PATCH 09/10] net/gve: add maintainers for GVE

2023-04-12 Thread Junfeng Guo
Add maintainers from Google for GVE.

Signed-off-by: Junfeng Guo 
Signed-off-by: Rushil Gupta 
---
 MAINTAINERS | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8df23e5099..08001751b0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -713,6 +713,9 @@ F: doc/guides/nics/features/enic.ini
 
 Google Virtual Ethernet
 M: Junfeng Guo 
+M: Jeroen de Borst 
+M: Rushil Gupta 
+M: Joshua Washington 
 F: drivers/net/gve/
 F: doc/guides/nics/gve.rst
 F: doc/guides/nics/features/gve.ini
-- 
2.34.1



[PATCH 10/10] net/gve: support jumbo frame for GQI

2023-04-12 Thread Junfeng Guo
Add multi-segment support to enable GQI Rx Jumbo Frame.

Signed-off-by: Rushil Gupta 
Signed-off-by: Joshua Washington 
Signed-off-by: Junfeng Guo 
Signed-off-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.h |   8 ++
 drivers/net/gve/gve_rx.c | 137 +--
 2 files changed, 108 insertions(+), 37 deletions(-)

diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 4a0e860afa..53a75044c5 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -159,6 +159,13 @@ struct gve_tx_queue {
uint8_t is_gqi_qpl;
 };
 
+struct gve_rx_ctx {
+   struct rte_mbuf *mbuf_head;
+   struct rte_mbuf *mbuf_tail;
+   uint16_t total_frags;
+   bool drop_pkt;
+};
+
 struct gve_rx_queue {
volatile struct gve_rx_desc *rx_desc_ring;
volatile union gve_rx_data_slot *rx_data_ring;
@@ -167,6 +174,7 @@ struct gve_rx_queue {
uint64_t rx_ring_phys_addr;
struct rte_mbuf **sw_ring;
struct rte_mempool *mpool;
+   struct gve_rx_ctx ctx;
 
uint16_t rx_tail;
uint16_t nb_rx_desc;
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index 3dd3f578f9..f2f6202404 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -5,6 +5,8 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+#define GVE_PKT_CONT_BIT_IS_SET(x) (GVE_RXF_PKT_CONT & (x))
+
 static inline void
 gve_rx_refill(struct gve_rx_queue *rxq)
 {
@@ -87,43 +89,72 @@ gve_rx_refill(struct gve_rx_queue *rxq)
}
 }
 
-uint16_t
-gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+/*
+ * This method processes a single rte_mbuf and handles packet segmentation.
+ * In QPL mode it copies the packet data from the queue page list into the mbuf.
+ */
+static void
+gve_rx_mbuf(struct gve_rx_queue *rxq, struct rte_mbuf *rxe, uint16_t len,
+   uint16_t rx_id)
 {
-   volatile struct gve_rx_desc *rxr, *rxd;
-   struct gve_rx_queue *rxq = rx_queue;
-   uint16_t rx_id = rxq->rx_tail;
-   struct rte_mbuf *rxe;
-   uint16_t nb_rx, len;
-   uint64_t bytes = 0;
+   uint16_t padding = 0;
uint64_t addr;
-   uint16_t i;
-
-   rxr = rxq->rx_desc_ring;
-   nb_rx = 0;
 
-   for (i = 0; i < nb_pkts; i++) {
-   rxd = &rxr[rx_id];
-   if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
-   break;
-
-   if (rxd->flags_seq & GVE_RXF_ERR) {
-   rxq->stats.errors++;
-   continue;
-   }
-
-   len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
-   rxe = rxq->sw_ring[rx_id];
-   if (rxq->is_gqi_qpl) {
-   addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
-   rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
-  (void *)(size_t)addr, len);
-   }
+   rxe->data_len = len;
+   if (!rxq->ctx.mbuf_head) {
+   rxq->ctx.mbuf_head = rxe;
+   rxq->ctx.mbuf_tail = rxe;
+   rxe->nb_segs = 1;
rxe->pkt_len = len;
rxe->data_len = len;
rxe->port = rxq->port_id;
rxe->ol_flags = 0;
+   padding = GVE_RX_PAD;
+   } else {
+   rxq->ctx.mbuf_head->pkt_len += len;
+   rxq->ctx.mbuf_head->nb_segs += 1;
+   rxq->ctx.mbuf_tail->next = rxe;
+   rxq->ctx.mbuf_tail = rxe;
+   }
+   if (rxq->is_gqi_qpl) {
+   addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + padding;
+   rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
+   (void *)(size_t)addr, len);
+   }
+}
+
+/*
+ * This method processes a single packet fragment associated with the
+ * passed packet descriptor.
+ * This method returns whether the fragment is the last fragment
+ * of a packet.
+ */
+static bool
+gve_rx(struct gve_rx_queue *rxq, volatile struct gve_rx_desc *rxd, uint16_t rx_id)
+{
+   bool is_last_frag = !GVE_PKT_CONT_BIT_IS_SET(rxd->flags_seq);
+   uint16_t frag_size = rte_be_to_cpu_16(rxd->len);
+   struct gve_rx_ctx *ctx = &rxq->ctx;
+   bool is_first_frag = ctx->total_frags == 0;
+   struct rte_mbuf *rxe;
+
+   if (ctx->drop_pkt)
+   goto finish_frag;
 
+   if (rxd->flags_seq & GVE_RXF_ERR) {
+   ctx->drop_pkt = true;
+   rxq->stats.errors++;
+   goto finish_frag;
+   }
+
+   if (is_first_frag)
+   frag_size -= GVE_RX_PAD;
+
+   rxe = rxq->sw_ring[rx_id];
+   gve_rx_mbuf(rxq, rxe, frag_size, rx_id);
+   rxq->stats.bytes += frag_size;
+
+   if (is_first_frag) {
if (rxd->flags_seq & GVE_RXF_TCP)
rxe->packet_type |= RTE_PTYPE_L4_TCP;
if (rxd-

Re: [PATCH] dmadev: add tracepoints

2023-04-12 Thread fengchengwen
On 2023/4/12 19:00, Morten Brørup wrote:
>> From: Chengwen Feng [mailto:fengcheng...@huawei.com]
>> Sent: Wednesday, 12 April 2023 04.48
>>
>> Add tracepoints at important APIs for tracing support.
>>
>> Signed-off-by: Chengwen Feng 
>> ---
> 

...

>> +)
>> +
>> +RTE_TRACE_POINT(
>> +rte_dma_trace_stats_get,
> 
> This should be a fast path trace point.
> For reference, ethdev considers rte_eth_stats_get() a fast path function.

Hmm, I think this needs more discussion to make the criteria clear.

The cryptodev and dmadev trace-points both treat 'rte_xxx_trace_stats_get' as a
slow-path function, and they mainly follow the classification of the API itself
(meaning: if an API is a fast-path API, its trace-point is made a fast-path
trace-point).

But the ethdev trace-points treat 'calls made in a loop' (such as
rte_eth_trace_stats_get/rte_eth_trace_link_get/...)
as fast-path trace-points.

> 
>> +RTE_TRACE_POINT_ARGS(int16_t dev_id, uint16_t vchan,
>> + struct rte_dma_stats *stats, int ret),
>> +rte_trace_point_emit_i16(dev_id);
>> +rte_trace_point_emit_u16(vchan);
>> +rte_trace_point_emit_u64(stats->submitted);
>> +rte_trace_point_emit_u64(stats->completed);
>> +rte_trace_point_emit_u64(stats->errors);
>> +rte_trace_point_emit_int(ret);
>> +)
>> +

...

>> diff --git a/lib/dmadev/version.map b/lib/dmadev/version.map
>> index 7031d6b335..4ee1b3f74a 100644
>> --- a/lib/dmadev/version.map
>> +++ b/lib/dmadev/version.map
>> @@ -1,6 +1,16 @@
>>  EXPERIMENTAL {
>>  global:
>>
>> +# added in 23.07
>> +__rte_dma_trace_burst_capacity;
>> +__rte_dma_trace_completed;
>> +__rte_dma_trace_completed_status;
>> +__rte_dma_trace_copy;
>> +__rte_dma_trace_copy_sg;
>> +__rte_dma_trace_fill;
>> +__rte_dma_trace_submit;
>> +
> 
> Intuitively, I would suppose that the 23.07 functions should be listed after 
> the 21.11 functions, not before.

+1, will fix in v2

> 
>> +# added in 21.11
> 
> Good catch.
> 
>>  rte_dma_close;
>>  rte_dma_configure;
>>  rte_dma_count_avail;
>> --
>> 2.17.1
> 
> 
> .
> 


Re: [PATCH v1 1/2] dts: fabric requirements

2023-04-12 Thread Juraj Linkeš
On Wed, Apr 12, 2023 at 5:38 PM Honnappa Nagarahalli
 wrote:
>
>
>
> > -Original Message-
> > From: Thomas Monjalon 
> > Sent: Wednesday, April 12, 2023 10:25 AM
> > To: Juraj Linkeš 
> > Cc: Wathsala Wathawana Vithanage ;
> > jspew...@iol.unh.edu; pr...@iol.unh.edu; Honnappa Nagarahalli
> > ; lijuan...@intel.com;
> > bruce.richard...@intel.com; dev@dpdk.org
> > Subject: Re: [PATCH v1 1/2] dts: fabric requirements
> >
> > 12/04/2023 15:42, Juraj Linkeš:
> > > On Tue, Apr 11, 2023 at 4:48 PM Thomas Monjalon 
> > wrote:
> > > >
> > > > 04/04/2023 13:51, Juraj Linkeš:
> > > > > On Mon, Apr 3, 2023 at 5:18 PM Thomas Monjalon
> >  wrote:
> > > > >
> > > > > > 03/04/2023 16:56, Juraj Linkeš:
> > > > > > > On Mon, Apr 3, 2023 at 2:33 PM Thomas Monjalon
> > > > > > > 
> > > > > > wrote:
> > > > > > >
> > > > > > > > 03/04/2023 13:46, Juraj Linkeš:
> > > > > > > > > Replace pexpect with Fabric.
> > > > > > > >
> > > > > > > > You should squash these lines with the move to Fabric.
> > > > > > > >
> > > > > > > > > Signed-off-by: Juraj Linkeš 
> > > > > > > > > ---
> > > > > > > > >  dts/poetry.lock| 553
> > > > > > +++--
> > > > > > > >
> > > > > > > > Do we really need *all* these lines?
> > > > > > > > I see a lot of lines about Windows and MacOSX which are not
> > > > > > > > supported
> > > > > > in
> > > > > > > > DTS.
> > > > > > > > It is so long that it looks impossible to review.
> > > > > > > >
> > > > > > > >
> > > > > > > This is a generated file and doesn't need to be reviewed.
> > > > > >
> > > > > > In general, I don't like storing generated files.
> > > > > >
> > > > >
> > > > > Me neither, but this one is specifically designed to be stored in
> > > > > a
> > > > > repository:
> > > > > https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock
> > > > > -file-to-version-control
> > > > >
> > > > >
> > > > > >
> > > > > > > I separated the
> > > > > > > dependencies part so that the code part is easier to review.
> > > > > > > If you
> > > > > > want, I
> > > > > > > can squash the two commits.
> > > > > >
> > > > > > What happens if we manually remove the useless lines?
> > > > > >
> > > > > >
> > > > > The lock file is there so that everyone installs exactly the same
> > > > > versions of dependencies. We can specify the versions of
> > > > > dependencies in pyproject.toml, but we won't control the versions
> > > > > of dependencies of dependencies this way. If we remove the changes
> > > > > to the lock file, then we won't be storing tested versions,
> > > > > everyone would be using slightly different versions and we may
> > > > > potentially need to address versioning issues in the future - best to 
> > > > > prevent
> > that with a lock file.
> > > >
> > > > You didn't answer about removing the useless lines, like unneeded Windows support.
> > support.
> > > >
> > >
> > > Do you mean the list of files from macos and windows? I tried removing
> > > those from mypy and testing it and it looks like it didn't have an
> > > impact, but I don't know the inner workings of poetry and the lock
> > > file to test it properly (i.e. to rule out any breakages). What would
> > > be the reason for removing those? Seems like it has more downsides (we
> > > could potentially break something and it's extra work) than upsides
> > > (as this is a generated file, I don't really see any).
> >
> > Yes this is what I mean.
> > Any other opinion?
> >
> If it is a generated file, there might be an expectation from the tool that 
> the file is not changed. It would be good to understand this.
>
> Since it is a generated file, should we generate this during DTS run time 
> rather than storing a generated file?
>

The file is not used during runtime, but rather when installing
dependencies. It's supposed to be generated by maintainers (once every
time dependencies change or need updating) who verify the versions
defined in the generated lockfile so that everyone then uses the same
versions from that point on, preventing issues arising from different
users using different versions of dependencies. So it's maintainers
giving this file to other people.
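
For reference, the usual commands are: a maintainer runs "poetry lock"
after changing pyproject.toml and commits the regenerated poetry.lock,
while everyone else just runs "poetry install", which resolves the
dependencies strictly from the committed lock file.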

Juraj


Event device early back-pressure indication

2023-04-12 Thread Mattias Rönnblom
Hi.

Consider this situation:

An application EAL thread receives an eventdev event (or some other 
stimulus), which in turn triggers some action. This action results in a 
number of new events being prepared, and a number of associated state 
changes in the application.

On attempting to enqueue the newly created batch of RTE_EVENT_OP_NEW 
events, it turns out the system is very busy, and the event device back 
pressures (i.e., returns a short count in rte_event_enqueue_new_burst()).

The application may now be in a tough spot, in case:

A) The processing was expensive and/or difficult to reverse (e.g., 
destructive changes were made to a packet).
B) The application does not have the option to discard the events (and 
any related mbufs).

In this situation, it would be very beneficial to the application if the 
event device could give some assurance that a future enqueue 
operation will succeed (in its entirety).
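
For concreteness, the sequence looks roughly like this (a sketch;
prepare_events() stands in for whatever application logic creates the
RTE_EVENT_OP_NEW events and performs the related mbuf/state changes,
and BURST, dev_id and port_id are assumed to be defined):

	struct rte_event evs[BURST];
	uint16_t n, sent;

	n = prepare_events(evs, BURST); /* hard-to-undo side effects */
	sent = rte_event_enqueue_new_burst(dev_id, port_id, evs, n);
	if (sent < n) {
		/* Back pressure: evs[sent..n-1] are already backed by
		 * state changes (case A) and may not be droppable
		 * (case B). Now what?
		 */
	}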

From what I understand of today's Eventdev API, there are no good 
options. You *may* be able to do some heuristics based on an event 
device-specific xstat (to infer the event device load), but that is not 
even close to "good". You may also try some application-level buffering, 
but that assumes that the packets/state changes are going to be 
identical, if they are to be sent at a later time. It would drive 
complexity in the app.

One seemingly clean way to solve this issue is to allow pre-allocation 
of RTE_EVENT_OP_NEW credits. The eventdev API doesn't talk about credits, 
but at least the event device implementations I've come across use 
some kind of credit system internally.

uint16_t
rte_event_alloc_new_credits(uint8_t dev_id, uint8_t port_id, uint16_t count);

In addition to this function, the application would also need some way 
to indicate, at the point of enqueue, that the credits have already been 
allocated.

I don't see any need for pre-allocating credits for non-RTE_EVENT_OP_NEW 
events. (Some event devices don't even use credits to track such 
events.) Back pressure on RTE_EVENT_OP_FORWARD usually spells disaster, 
in one form or the other.

You could use a bit in the rte_event struct for the purpose of signaling 
if its credit is pre-allocated. That would allow this change to happen, 
without any changes to the enqueue function prototypes.

However, this would require the event device to scan the event array.

I'm not sure there is a use case for mixing pre-allocated and 
non-pre-allocated events in the same burst.

If this burst-level separation is good enough, one could either change 
the existing rte_enqueue_new_burst() or add a new one. Something like:

uint16_t
rte_event_enqueue_new_burst(uint8_t dev_id, uint8_t port_id,
   const struct rte_event ev[],
   uint16_t nb_events, uint32_t flags);

#define RTE_EVENT_FLAG_PRE_CREDITS_ALLOCATED (UINT32_C(1) << 0)

A related shortcoming of the current eventdev API is that the 
new_event_threshold is tied to a port, which is impractical for 
applications which require different threshold for different kinds of 
events enqueued on the same port. One can use different ports, but that 
approach does not scale, since there may be significant memory and/or 
event device hardware resources tied to ports, and thus you cannot allow 
for a combinatorial explosion of ports.

This issue could be solved by allowing the application to specify the 
new_event_threshold, either per burst, or per event.

Per event doesn't make a lot of sense in practice, I think, since mixing 
events with different back pressure points will create head-of-line 
blocking. An early low-threshold event may prevent a higher-indexed 
high-threshold event in the same enqueue burst from being enqueued. This 
is the same reason it usually doesn't make sense to mix RTE_EVENT_OP_NEW 
and RTE_EVENT_OP_FORWARD events in the same burst.

Although the new_event_threshold seems completely orthogonal to the port 
to me, it could still serve as the default.

In case you find this a useful feature, it could be added to the credit 
allocation function.

uint16_t
rte_event_alloc_new_credits(uint8_t dev_id, uint8_t port_id, uint32_t new_event_threshold, uint16_t count);

If that is the only change, the user is required to pre-allocate 
credits to use a flexible new_event_threshold.

It seems to me that might be something you can live with. Or, you 
add a new rte_event_enqueue_new_burst() variant where a 
new_event_threshold parameter is added.

It may also be useful to have a way to return credits, in case not all 
of the allocated credits were actually needed.

void
rte_event_return_new_credits(...);
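
Pulled together, the usage I have in mind looks something like this
(all of it is the proposed API from above, nothing here exists today;
I'm assuming the allocation may be granted only partially, that
rte_event_return_new_credits() takes a count, and that the flags
parameter is added to the existing enqueue-new function):

	uint16_t granted;

	granted = rte_event_alloc_new_credits(dev_id, port_id,
					      new_event_threshold, n);
	if (granted < n) {
		/* Early back pressure, before any state was touched. */
		rte_event_return_new_credits(dev_id, port_id, granted);
		return;
	}

	/* ...expensive and/or destructive processing... */

	rte_event_enqueue_new_burst(dev_id, port_id, evs, n,
				    RTE_EVENT_FLAG_PRE_CREDITS_ALLOCATED);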

Thoughts?

Best regards,
Mattias