On Sat, 28 Mar 2026 21:53:27 +0800 sunyuechi <[email protected]> wrote:
> On 2/6/26 4:16 PM, Sun Yuechi wrote:
> > Implement ip4_lookup_node_process_vec function for RISC-V architecture
> > using RISC-V Vector Extension instruction set
> >
> > Signed-off-by: Sun Yuechi <[email protected]>
> > Signed-off-by: Zijian <[email protected]>
> > ---
> >  doc/guides/rel_notes/release_26_03.rst |   4 +
> >  lib/eal/riscv/include/rte_vect.h       |   2 +-
> >  lib/node/ip4_lookup.c                  |   5 +-
> >  lib/node/ip4_lookup_rvv.h              | 167 +++++++++++++++++++++++++
> >  4 files changed, 176 insertions(+), 2 deletions(-)
> >  create mode 100644 lib/node/ip4_lookup_rvv.h
>
> ping

There was no ack yet. I ran it through AI for review and it had lots of feedback. The only item worth noting is the naming of rte_lpm_lookup_vec, which should match the other architectures.

---

This series adds RISC-V Vector Extension (RVV) support to the IPv4 LPM lookup node.

Patch 1/2 is a clean one-liner enabling the default SIMD bitwidth on RISC-V; cross-checked against the arm/ppc/x86 conventions in lib/eal/*/include/rte_vect.h, the change is correct and consistent with how those architectures handle the same define. No findings on patch 1/2. Findings on patch 2/2 below.

[PATCH v4 2/2] node: lookup with RISC-V vector extension
========================================================

Warnings
--------

* lib/node/ip4_lookup_rvv.h:14: the static inline helper is named rte_lpm_lookup_vec(). The rte_lpm_* prefix is reserved for the LPM library's API namespace (see lib/lpm/rte_lpm*.h). Defining a static inline with that prefix in a node-library private header is misleading -- it implies a public LPM API where there is none. For comparison, the SVE bulk lookup at lib/lpm/rte_lpm_sve.h:16 uses __rte_lpm_lookup_vec (double underscore, internal) and lives in the LPM library proper, exposed through rte_lpm.h's #undef/#define rte_lpm_lookup_bulk override.
  The NEON and SSE node paths (lib/node/ip4_lookup_neon.h:114, lib/node/ip4_lookup_sse.h:116) do not define their own helpers at all -- they call the public rte_lpm_lookupx4() from the LPM library. Other static helpers in lib/node/ use the node_* prefix (e.g. node_mbuf_priv1, node_mbuf_priv2 in lib/node/node_private.h).

  Two suggested options, in order of preference:

  1. Move the bulk lookup into lib/lpm/rte_lpm_rvv.h as __rte_lpm_lookup_vec() with the same signature pattern as the SVE version, and have lib/lpm/rte_lpm.h conditionally override rte_lpm_lookup_bulk for the RVV case. The node path then becomes a plain rte_lpm_lookup_bulk() call and the implementation is reusable by other consumers (FIB, l3fwd, etc.).

  2. Keep the helper local to the node header but rename it -- e.g. ip4_lookup_rvv_lpm_lookup() or just lpm_lookup_vec() -- so it does not occupy the rte_lpm_* namespace.

Info
----

* lib/node/ip4_lookup_rvv.h: unlike ip4_lookup_neon.h, the RVV path does no prefetching of upcoming mbufs or packet headers. NEON prefetches both the next line of objs[] and the next four packets' L3 headers. On RISC-V cores with hardware prefetchers this may be a wash, but on cores without one the per-iteration vl-wide gather over pkts[i] and the IPv4 header reads may stall. Worth measuring.

* lib/node/ip4_lookup_rvv.h: the per-mbuf metadata is written in two passes -- cksum/ttl in the first loop, nh in the second. The NEON path packs all three into a uint64_t and writes once via node_mbuf_priv1(mbuf, dyn)->u = ...; (the overlay struct is laid out as { uint16_t nh; uint16_t ttl; uint32_t cksum; } in rte_node_mbuf_dynfield.h:48). A single 64-bit store per mbuf would halve the store traffic to the dynfield region.

* The release-notes entry is correctly placed under "New Features". Consider mentioning the dependency on RTE_RISCV_FEATURE_V (i.e. that this only activates when the toolchain/-march reports the V extension), so users on non-V RISC-V builds know why they don't see a perf change.

Notes from cross-checking (no action needed)
--------------------------------------------

- The bswap32_vec() open-coded byte reversal is correct for the little-endian RISC-V configuration DPDK targets (rte_byteorder.h defines RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN unconditionally for riscv).

- The byte-offset arithmetic for vluxei32 into tbl24 and tbl8 matches the scalar lookup in lib/lpm/rte_lpm.h:295-320 (entry index * sizeof(uint32_t) via << 2; tbl8 group_idx * 256 + ip_low). The static_assert at rte_lpm.h:121 guarantees sizeof(rte_lpm_tbl_entry) == 4.

- The mu (mask-undisturbed) policy on the second vluxei32 correctly mirrors the scalar's "only follow tbl8 when the VALID_EXT bit is set", and per the V spec masked-off elements raise no exceptions, so the unconditional pre-computation of vtbl8_index for masked-off lanes is safe even when those lanes contain garbage offsets.

- vbool4_t is the correct mask type for SEW=32, LMUL=8 (ratio 4).

- RVV_MAX_BURST=64 with the outer `while (n_left_from > 0)` loop correctly chunks the full nb_objs (up to RTE_GRAPH_BURST_SIZE=256) through repeated vsetvl calls.

- The miss-counting heuristic `(res[i] >> 16) == (drop_nh >> 16)` matches what NEON does at lib/node/ip4_lookup_neon.h:117-120; it diverges from the scalar's "rc != 0" only when a user's LPM table legitimately resolves to the drop next-node, which is the same behavior already present in the existing vector paths.
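For reference, a minimal sketch of the single-store metadata write suggested in the Info section above. The union mirrors the { nh, ttl, cksum } layout quoted from rte_node_mbuf_dynfield.h; the helper name pack_ip4_priv() is hypothetical, and the packing assumes the little-endian layout DPDK targets on riscv:

```c
#include <stdint.h>

/* Mirrors the dynfield overlay quoted above: a { nh, ttl, cksum }
 * struct sharing storage with a uint64_t, as in
 * rte_node_mbuf_dynfield.h (names copied from the review text). */
union ip4_priv {
	uint64_t u;
	struct {
		uint16_t nh;
		uint16_t ttl;
		uint32_t cksum;
	};
};

/* Hypothetical helper: pack nh/ttl/cksum into one 64-bit value so the
 * per-mbuf dynfield can be written with a single store instead of two
 * passes (little-endian byte order assumed, as on DPDK's riscv). */
static inline uint64_t
pack_ip4_priv(uint16_t nh, uint16_t ttl, uint32_t cksum)
{
	return (uint64_t)nh | ((uint64_t)ttl << 16) | ((uint64_t)cksum << 32);
}
```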

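A scalar sketch of the tbl24/tbl8 byte-offset arithmetic that the vluxei32 indices were checked against. The constants follow the DIR-24-8 semantics described above (256 entries per tbl8 group, 4-byte entries); the function and macro names are illustrative, not the actual rte_lpm.h code:

```c
#include <stdint.h>

/* Illustrative constants matching the scalar LPM lookup semantics
 * referenced above: 256 entries per tbl8 group, 4-byte table entries
 * (sizeof(rte_lpm_tbl_entry) == 4 per the cited static_assert). */
#define TBL8_GROUP_NUM_ENTRIES 256
#define TBL_ENTRY_SHIFT 2 /* entry index -> byte offset via << 2 */

/* Byte offset of the tbl24 entry: the top 24 address bits index the
 * table, scaled to bytes by the entry size. */
static inline uint32_t
tbl24_byte_offset(uint32_t ip)
{
	return (ip >> 8) << TBL_ENTRY_SHIFT;
}

/* Byte offset of the tbl8 entry: group index (from the tbl24 entry)
 * times 256 entries per group, plus the low 8 address bits, scaled
 * to bytes. */
static inline uint32_t
tbl8_byte_offset(uint32_t group_idx, uint32_t ip)
{
	return (group_idx * TBL8_GROUP_NUM_ENTRIES + (ip & 0xff))
		<< TBL_ENTRY_SHIFT;
}
```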

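And the miss-counting heuristic quoted above, written out in scalar form. The comparison is exactly the `(res[i] >> 16) == (drop_nh >> 16)` test from the review; the function and array names are illustrative:

```c
#include <stdint.h>

/* Count lookup results whose upper 16 bits (the next-node id in the
 * packed result) match the drop next-hop's upper 16 bits -- the same
 * heuristic the NEON path uses, per the review text above. */
static inline uint32_t
count_drop_misses(const uint32_t *res, uint32_t n, uint32_t drop_nh)
{
	uint32_t i, miss = 0;

	for (i = 0; i < n; i++)
		if ((res[i] >> 16) == (drop_nh >> 16))
			miss++;
	return miss;
}
```

As the review notes, this over-counts only when a user's table legitimately resolves to the drop next-node, matching the existing vector paths.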