On Thu, 26 Mar 2026 10:28:28 +0800 Junlong Wang <[email protected]> wrote:
> This patch optimizes the ZXDH PMD's receive and transmit path for better
> performance through several improvements:
>
> - Add simple TX/RX burst functions (zxdh_xmit_pkts_simple and
>   zxdh_recv_single_pkts) for single-segment packet scenarios.
> - Remove RX software ring (sw_ring) to reduce memory allocation and
>   copy.
> - Optimize descriptor management with prefetching and simplified
>   cleanup.
> - Reorganize structure fields for better cache locality.
>
> These changes reduce CPU cycles and memory bandwidth consumption,
> resulting in improved packet processing throughput.
>
> Signed-off-by: Junlong Wang <[email protected]>

I spotted some of these in review; an AI-assisted pass found several more.

Errors:

1. zxdh_rxtx.c, pkt_padding(): The return value is never checked by the
   caller submit_to_backend_simple(). If rte_pktmbuf_prepend() fails and
   pkt_padding() returns -1, the descriptor is still written with the
   mbuf's iova and data_len, submitting a corrupt packet to the device.
   The return value must be checked and the packet skipped on failure.

2. zxdh_rxtx.c, zxdh_recv_single_pkts(): When zxdh_init_mbuf() fails, the
   loop does "break" instead of continuing or freeing the remaining
   mbufs. The mbufs at rcv_pkts[i+1] through rcv_pkts[num-1] were already
   dequeued from the virtqueue by zxdh_dequeue_burst_rx_packed() but are
   never freed, leaking them.

3. zxdh_rxtx.c, refill_desc_unwrap(): Descriptors are written with a
   plain store "start_dp[idx].flags = flags" instead of using
   zxdh_queue_store_flags_packed(). The original
   zxdh_enqueue_recv_refill_packed() uses the store-barrier version to
   ensure addr/len are visible before the flags. Without the barrier, the
   device could see the available flag before the descriptor data is
   committed.
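To make the ordering requirement in error 3 concrete, here is a minimal stand-alone sketch of the publish pattern: fill in addr/len/id, issue a write barrier, then store the flags. The struct layout and names below are illustrative mocks, not the driver's actual definitions, and io_wmb() is a compiler-barrier stand-in for rte_io_wmb():

```c
#include <assert.h>
#include <stdint.h>

/* Mock of a packed-ring descriptor; field names loosely mirror a
 * virtio packed descriptor (illustrative, not the zxdh definition). */
struct vring_packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Stand-in for rte_io_wmb(): a compiler barrier here; the real macro
 * emits the architecture's store fence. */
#define io_wmb() __asm__ volatile("" ::: "memory")

/* Publish the flags only after the data stores are ordered, so the
 * device cannot observe AVAIL before addr/len/id are committed. */
static void store_flags_packed(struct vring_packed_desc *dp, uint16_t flags)
{
	io_wmb();          /* order data stores before the flags store */
	dp->flags = flags; /* device may consume the descriptor now */
}

static void refill_one(struct vring_packed_desc *dp, uint64_t iova,
		       uint32_t len, uint16_t id, uint16_t avail_flags)
{
	dp->addr = iova;
	dp->len = len;
	dp->id = id;
	store_flags_packed(dp, avail_flags); /* not a plain flags store */
}
```

A single rte_io_wmb() after the whole refill loop cannot substitute for this: it only orders stores issued before that point, and every flags store earlier in the loop has already been made visible by then.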
   The rte_io_wmb() at the end of refill_que_descs() comes after all the
   flags have already been written, so it does not help.

4. zxdh_rxtx.c, zxdh_xmit_pkts_prepare(): The removal of
   rte_net_intel_cksum_prepare() means packets requesting checksum
   offload will not have their pseudo-headers prepared. If the HW
   expects a pseudo-header, transmitted checksums will be incorrect.

5. zxdh_queue.h, zxdh_queue_enable_intr(): This function checks
   "if (event_flags_shadow == DISABLE)" and then sets it to DISABLE
   again, so it never actually enables interrupts. This is a
   pre-existing bug, but the patch touches the function and should fix
   it.

6. zxdh_ethdev.c, zxdh_init_queue(): The hdr_mz NULL-check logic is
   contradictory. Lines 158-162 check "if (hdr_mz == NULL)" and goto
   fail_q_alloc, but line 169 then checks "if (hdr_mz)" before assigning
   zxdh_net_hdr_mem. If the first check fires, the second is
   unreachable; if it doesn't fire, the second is always true. Pick one
   guard and use it consistently.

Warnings:

1. zxdh_rxtx.c, zxdh_xmit_pkts_simple(): stats.bytes is never
   incremented. The packed path uses zxdh_update_packet_stats(), but the
   simple path only counts packets and idle, so the good_bytes xstat
   will always read zero on the simple TX path.

2. zxdh_rxtx.c, zxdh_recv_single_pkts(): Same issue -- stats.bytes is
   never incremented, so good_bytes will always be zero on the
   single-packet receive path.

3. zxdh_rxtx.c, zxdh_init_mbuf(): rte_pktmbuf_dump(stdout, rxm, 40)
   should not be in production code; it writes to stdout unconditionally
   on the error path. Use PMD_RX_LOG or remove it.

4. zxdh_ethdev.c, zxdh_dev_free_mbufs(): Changed from rte_pktmbuf_free()
   to rte_pktmbuf_free_seg(). If any mbufs in the TX queue are
   multi-segment (the packed path handles multi-seg via
   zxdh_xmit_enqueue_append), only the first segment will be freed,
   leaking the rest.

5. This patch is large (~800 lines, 8 files) and combines multiple
   independent changes: structure reorganization, new fast-path
   functions, sw_ring removal, descriptor management, removal of
   rte_net_intel_cksum_prepare, and MTU validation. Splitting it into
   separate patches would make review and bisection easier.
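On the free()-vs-free_seg() point about mbuf chains: freeing only the head segment leaks every chained segment behind it. A minimal mock of the distinction (this is not DPDK's struct rte_mbuf, just an illustration of the chain walk that rte_pktmbuf_free() performs and rte_pktmbuf_free_seg() does not):

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal mock of a chained mbuf (illustrative only). */
struct mbuf {
	struct mbuf *next;
};

static int freed; /* count of freed segments, for demonstration */

/* Analogue of rte_pktmbuf_free_seg(): frees exactly one segment. */
static void free_seg(struct mbuf *m)
{
	free(m);
	freed++;
}

/* Analogue of rte_pktmbuf_free(): walks and frees the whole chain. */
static void free_chain(struct mbuf *m)
{
	while (m != NULL) {
		struct mbuf *next = m->next;
		free_seg(m);
		m = next;
	}
}
```

Calling only free_seg() on a three-segment chain releases one segment and leaks two, which is exactly the hazard in zxdh_dev_free_mbufs() when the TX queue holds multi-segment packets.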

