> -----Original Message-----
> From: Guo, Jia <jia....@intel.com>
> Sent: Thursday, September 17, 2020 3:59 PM
> To: Yang, Qiming <qiming.y...@intel.com>; Xing, Beilei
> <beilei.x...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>; Wu, Jingjing
> <jingjing...@intel.com>; Wang, Haiyue <haiyue.w...@intel.com>
> Cc: Zhao1, Wei <wei.zh...@intel.com>; Richardson, Bruce
> <bruce.richard...@intel.com>; dev@dpdk.org; Guo, Jia <jia....@intel.com>;
> Zhang, Helin <helin.zh...@intel.com>; m...@smartsharesystems.com; Yigit,
> Ferruh <ferruh.yi...@intel.com>; step...@networkplumber.org;
> barbe...@kth.se; Han, YingyaX <yingyax....@intel.com>
> Subject: [PATCH v4 4/5] net/ice: fix vector rx burst for ice
>
> The limitation of burst size in vector rx was removed, since it should
> retrieve as
> much received packets as possible. And also the scattered receive path should
> use a wrapper function to achieve the goal of burst maximizing. And do some
> code cleaning for vector rx path.
>
> Bugzilla ID: 516
> Fixes: c68a52b8b38c ("net/ice: support vector SSE in Rx")
> Fixes: ae60d3c9b227 ("net/ice: support Rx AVX2 vector")
>
> Signed-off-by: Jeff Guo <jia....@intel.com>
> Tested-by: Yingya Han <yingyax....@intel.com>
> ---
> drivers/net/ice/ice_rxtx.h | 1 +
> drivers/net/ice/ice_rxtx_vec_avx2.c | 23 ++++++------
> drivers/net/ice/ice_rxtx_vec_sse.c | 56 +++++++++++++++++++----------
> 3 files changed, 49 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
> index 2fdcfb7d0..3ef5f300d 100644
> --- a/drivers/net/ice/ice_rxtx.h
> +++ b/drivers/net/ice/ice_rxtx.h
> @@ -35,6 +35,7 @@
> #define ICE_MAX_RX_BURST ICE_RXQ_REARM_THRESH
> #define ICE_TX_MAX_FREE_BUF_SZ 64
> #define ICE_DESCS_PER_LOOP 4
> +#define ICE_DESCS_PER_LOOP_AVX 8
No need to expose this in the header if nothing outside the AVX code uses it;
better to keep all AVX definitions inside ice_rxtx_vec_avx2.c.
>
> #define ICE_FDIR_PKT_LEN 512
>
> diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c
> b/drivers/net/ice/ice_rxtx_vec_avx2.c
> index be50677c2..843e4f32a 100644
> --- a/drivers/net/ice/ice_rxtx_vec_avx2.c
> +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
> @@ -29,7 +29,7 @@ ice_rxq_rearm(struct ice_rx_queue *rxq)
> __m128i dma_addr0;
>
> dma_addr0 = _mm_setzero_si128();
> - for (i = 0; i < ICE_DESCS_PER_LOOP; i++) {
> + for (i = 0; i < ICE_DESCS_PER_LOOP_AVX; i++) {
> rxep[i].mbuf = &rxq->fake_mbuf;
> _mm_store_si128((__m128i *)&rxdp[i].read,
> dma_addr0);
> @@ -132,12 +132,17 @@ ice_rxq_rearm(struct ice_rx_queue *rxq)
> ICE_PCI_REG_WRITE(rxq->qrx_tail, rx_id);
> }
>
> +/**
> + * vPMD raw receive routine, only accept(nb_pkts >= ICE_DESCS_PER_LOOP_AVX)
> + *
> + * Notice:
> + * - nb_pkts < ICE_DESCS_PER_LOOP_AVX, just return no packet
> + * - floor align nb_pkts to a ICE_DESCS_PER_LOOP_AVX power-of-two
> + */
The comment is misleading: it reads as if we are going to floor align nb_pkts
to 2^8. Better to reword it, e.g. "floor align nb_pkts to a multiple of
ICE_DESCS_PER_LOOP_AVX".
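To make the intended meaning concrete, here is a minimal sketch of what I
understand "floor align" to mean here. The names DESCS_PER_LOOP_AVX and
floor_align_burst are hypothetical stand-ins, not the driver's code; the mask
trick only works because the loop stride (8) happens to be a power of two.

```c
#include <stdint.h>

/* Hypothetical stand-in for the patch's ICE_DESCS_PER_LOOP_AVX. */
#define DESCS_PER_LOOP_AVX 8

/*
 * Round nb_pkts down to a multiple of DESCS_PER_LOOP_AVX.
 * Masking off the low bits is valid only because 8 is a power of two;
 * the result is a multiple of 8, not a multiple of 2^8.
 */
static inline uint16_t
floor_align_burst(uint16_t nb_pkts)
{
	return (uint16_t)(nb_pkts & ~(DESCS_PER_LOOP_AVX - 1));
}
```

So a request for 31 packets would be processed as 24, and anything below 8
becomes 0, which matches the "just return no packet" note.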
> static inline uint16_t
> _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf
> **rx_pkts,
> uint16_t nb_pkts, uint8_t *split_packet)
> {
> -#define ICE_DESCS_PER_LOOP_AVX 8
> -
> const uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl;
> const __m256i mbuf_init = _mm256_set_epi64x(0, 0,
> 0, rxq->mbuf_initializer);
> @@ -603,10 +608,6 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue
> *rxq, struct rte_mbuf **rx_pkts,
> return received;
> }
>
> -/*
> - * Notice:
> - * - nb_pkts < ICE_DESCS_PER_LOOP, just return no packet
> - */
> uint16_t
> ice_recv_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts,
> uint16_t nb_pkts)
> @@ -616,8 +617,6 @@ ice_recv_pkts_vec_avx2(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>
> /**
> * vPMD receive routine that reassembles single burst of 32 scattered
> packets
> - * Notice:
> - * - nb_pkts < ICE_DESCS_PER_LOOP, just return no packet
> */
Why do we need to remove this? Is it still true for this function?
> static uint16_t
> ice_recv_scattered_burst_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts,
> @@ -626,6 +625,9 @@ ice_recv_scattered_burst_vec_avx2(void *rx_queue,
> struct rte_mbuf **rx_pkts,
> struct ice_rx_queue *rxq = rx_queue;
> uint8_t split_flags[ICE_VPMD_RX_BURST] = {0};
>
> + /* split_flags only can support max of ICE_VPMD_RX_BURST */
> + nb_pkts = RTE_MIN(nb_pkts, ICE_VPMD_RX_BURST);
Is this necessary? The only consumer of this function is
ice_recv_scattered_pkts_vec_avx2, so I think nb_pkts <= ICE_VPMD_RX_BURST
is already guaranteed.
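For reference, this is the caller pattern I have in mind, as a simplified
sketch rather than the actual driver code. It assumes the wrapper splits a
large request into chunks; struct sketch_rxq, sketch_recv_burst,
sketch_recv_pkts and VPMD_RX_BURST are hypothetical stand-ins, and the mbuf
array is left out to keep it short.

```c
#include <stdint.h>

/* Hypothetical stand-in for ICE_VPMD_RX_BURST. */
#define VPMD_RX_BURST 32

/* Toy queue: packets still pending, plus the largest burst ever requested. */
struct sketch_rxq {
	uint16_t avail;
	uint16_t max_req;
};

/* Stand-in for the single-burst routine; returns how many packets it got. */
static uint16_t
sketch_recv_burst(struct sketch_rxq *q, uint16_t nb_pkts)
{
	uint16_t n = nb_pkts < q->avail ? nb_pkts : q->avail;

	if (nb_pkts > q->max_req)
		q->max_req = nb_pkts;
	q->avail = (uint16_t)(q->avail - n);
	return n;
}

/* Wrapper: never hands the burst routine more than VPMD_RX_BURST. */
static uint16_t
sketch_recv_pkts(struct sketch_rxq *q, uint16_t nb_pkts)
{
	uint16_t retval = 0;

	while (nb_pkts > VPMD_RX_BURST) {
		uint16_t burst = sketch_recv_burst(q, VPMD_RX_BURST);

		retval = (uint16_t)(retval + burst);
		nb_pkts = (uint16_t)(nb_pkts - burst);
		/* A short burst means the queue is drained; stop early. */
		if (burst < VPMD_RX_BURST)
			return retval;
	}

	/* The remainder is at most VPMD_RX_BURST by construction. */
	return (uint16_t)(retval + sketch_recv_burst(q, nb_pkts));
}
```

If the wrapper has this shape, the burst routine never sees nb_pkts above the
burst size, so the extra RTE_MIN would be redundant.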
> +
> /* get some new buffers */
> uint16_t nb_bufs = _ice_recv_raw_pkts_vec_avx2(rxq, rx_pkts, nb_pkts,
> split_flags);
> @@ -657,9 +659,6 @@ ice_recv_scattered_burst_vec_avx2(void *rx_queue,
> struct rte_mbuf **rx_pkts,
>
> /**
> * vPMD receive routine that reassembles scattered packets.
> - * Main receive routine that can handle arbitrary burst sizes
> - * Notice:
> - * - nb_pkts < ICE_DESCS_PER_LOOP, just return no packet
> */
Why do we need to remove this? Isn't this the main routine that is able to
handle arbitrary burst sizes?
Btw, I suggest moving all the AVX2 changes into a separate patch, because they
look like code cleanup and fixes.
They are not related to the main purpose of the patch set.