On Thu, 21 May 2026 18:56:31 +0000
Morten Brørup <[email protected]> wrote:

> The implementation for copying up to 64 bytes does not depend on address
> alignment with the size of the CPU's vector registers. Nonetheless, the
> exact same code for copying up to 64 bytes was present in both the aligned
> copy function and all the CPU vector register size specific variants of
> the unaligned copy functions.
> With this patch, the implementation for copying up to 64 bytes was
> consolidated into one instance, located in the common copy function,
> before checking alignment requirements.
> This provides three benefits:
> 1. No copy-paste in the source code.
> 2. A performance gain for copying up to 64 bytes, because the
> address alignment check is avoided in this case.
> 3. Reduced instruction memory footprint, because the compiler only
> generates one instance of the function for copying up to 64 bytes, instead
> of two instances (one in the unaligned copy function, and one in the
> aligned copy function).
> 
> Furthermore, __rte_restrict was added to source and destination addresses.
> 
> Also, the missing implementation of rte_mov48() was added.
> 
> Until recently, some drivers required disabling stringop-overflow warnings
> when using rte_memcpy().
> For some strange reason, these warnings were disabled in the rte_memcpy
> header file, instead of in the problematic drivers.
> With series-38174 ("remove use of rte_memcpy from net/intel"), the
> problematic drivers were updated to use memcpy() instead of rte_memcpy(),
> so disabling these warnings is no longer required, and was removed.
> 
> Regarding performance...
> The memcpy performance test (cache-to-cache copy) shows:
> Copying up to 15 bytes takes ca. 4.5 cycles, versus ca. 6.5 cycles before.
> Copying 8 bytes takes 4 cycles, versus 7 cycles before.
> Copying 16 bytes takes 2 cycles, versus 4 cycles before.
> Copying 64 bytes takes 4 cycles, versus 7 cycles before.
> 
> Depends-on: series-38174 ("remove use of rte_memcpy from net/intel")
> 
> Signed-off-by: Morten Brørup <[email protected]>
> Acked-by: Bruce Richardson <[email protected]>
> Acked-by: Konstantin Ananyev <[email protected]>
> ---

Used this as test case for comparing AI reviews.

Anthropic saw:

## Warnings

### 1. Missing rte_mov48() test coverage

**Issue:** The patch adds a new `rte_mov48()` function but does not mention 
adding tests for it.

**Why it matters:** New utility functions should have test coverage to verify 
correctness, especially for low-level copy operations that could silently 
corrupt data if implemented incorrectly.

**Suggested fix:** Add a test case in `app/test/test_memcpy.c` (or equivalent) 
that verifies `rte_mov48()` copies 48 bytes correctly and preserves data beyond 
the copy range.

### 2. Documentation comment inconsistency

**Issue:** The main `rte_memcpy()` function description comment was removed 
(lines 37-39 of original), but the opening documentation text about overlapping 
locations remains inconsistent across helper functions.

The original stated:
```c
/**
 * Copy bytes from one location to another. The locations must not overlap.
 *
 * @note This is implemented as a macro, so it's address should not be taken
 * and care is needed as parameter expressions may be evaluated multiple times.
```

The new version removes the `@note` about macro implementation (which is 
correct since it's now a function), but does not update the main function 
comment to include the same `@param` / `@return` documentation that other 
memory functions have.

**Suggested fix:** Add complete Doxygen documentation to `rte_memcpy()`:
```c
/**
 * Copy bytes from one location to another. The locations must not overlap.
 *
 * @param dst
 *   Pointer to the destination of the data.
 * @param src
 *   Pointer to the source data.
 * @param n
 *   Number of bytes to copy.
 * @return
 *   Pointer to the destination data.
 */
```

xAI (Grok) saw:
**Error**

- `rte_mov48()` implementation casts away `__rte_restrict` qualifiers (lines 
137-140 and 143-147).  
  Casting a restrict-qualified pointer to a non-restrict pointer is undefined 
behavior under the C standard and can break compiler alias analysis.  
  Fix: either make the callees (`rte_mov32`/`rte_mov16`) accept the restrict 
qualifiers, or use a local non-restrict pointer only inside `rte_mov48` after 
proper validation.

**Error**

- Multiple functions (`rte_mov16`, `rte_mov32`, `rte_mov48`, `rte_mov64`, 
`rte_mov128`, `rte_mov256`, `rte_mov128blocks`, `rte_mov512blocks`, 
`rte_memcpy_generic_more_than_64`, etc.) now take `uint8_t *__rte_restrict` 
parameters, but their callers inside the same file (e.g., the new `rte_memcpy` 
implementation and the various block-copy macros) pass `uint8_t *` without the 
restrict qualifier.  
  This creates implicit non-restrict paths that defeat the purpose of adding 
`__rte_restrict` and may trigger compiler warnings or incorrect optimizations.  
  Recommended fix: either propagate `__rte_restrict` consistently through all 
internal call sites or keep internal helpers non-restrict and only expose 
restrict at the public `rte_memcpy` boundary.

**Warning**

- `rte_memcpy_generic_more_than_64` and `rte_memcpy_aligned_more_than_64` are 
now the only implementations for >64-byte copies, but the old 
`rte_memcpy_generic` and `rte_memcpy_aligned` names have been removed.  
  Any external code (or test) that took the address of the old functions will 
break.  
  The patch should either keep the old names as aliases or document the removal 
in release notes.

**Warning**

- The new `rte_memcpy` implementation for 17–64 bytes contains a compile-time 
constant check (`if (__rte_constant(n) && n == 64)`) followed by a fallback 
path that still executes `rte_mov16` three times for SSE when `n > 48`.  
  This path is correct but duplicates logic that already exists in `rte_mov48`. 
 
  Consider calling `rte_mov48` for the AVX case and a single `rte_mov48` + tail 
for SSE to reduce code duplication.

**Info**

- The removal of the `#pragma GCC diagnostic` block for `-Wstringop-overflow` 
is correct per the patch description (drivers were updated). No further action 
needed.

**Info**

- All other style, API-tag, and documentation requirements appear satisfied. No 
additional correctness bugs (leaks, use-after-free, races, or forbidden tokens) 
were identified.



Reply via email to