On Tue, Jan 20, 2026 at 02:58:44PM +0800, Feng Jiang wrote:
> This series provides optimized implementations of strnlen(), strchr(),
> and strrchr() for the RISC-V architecture. The strnlen implementation
> is derived from the existing optimized strlen. For strchr and strrchr,

strchr() and strrchr()

> the current versions use simple byte-by-byte assembly logic, which
> will serve as a baseline for future Zbb-based optimizations.
>
> The patch series is organized into three parts:
> 1. Correctness Testing: The first three patches add KUnit test cases
>    for strlen, strnlen, and strrchr to ensure the baseline and optimized

strlen(), strnlen(), and strrchr()

>    versions are functionally correct.
> 2. Benchmarking Tool: Patches 4 and 5 extend string_kunit to include
>    performance measurement capabilities, allowing for comparative
>    analysis within the KUnit environment.
> 3. Architectural Optimizations: The final three patches introduce the
>    RISC-V specific assembly implementations.
>
> Following suggestions from Andy Shevchenko, performance benchmarks have
> been added to string_kunit.c to provide quantifiable evidence of the
> improvements. Andy provided many specific comments on the implementation
> of the benchmark logic, which is also inspired by Eric Biggers'
> crc_benchmark(). Performance was measured in a QEMU TCG (rv64) environment,
> comparing the generic C implementation with the new RISC-V assembly versions.
>
> Performance Summary (Improvement %):
> ---------------------------------------------------------------
> Function  | 16 B (Short) | 512 B (Mid) | 4096 B (Long)
> ---------------------------------------------------------------
> strnlen   |    +64.0%    |   +346.2%   |    +410.7%

This is still suspicious.

> strchr    |     +4.0%    |     +6.4%   |      +1.5%
> strrchr   |     +6.6%    |     +2.8%   |      +0.0%
> ---------------------------------------------------------------
> The benchmarks can be reproduced by enabling CONFIG_STRING_KUNIT_BENCH
> and running: ./tools/testing/kunit/kunit.py run --arch=riscv \
> --cross_compile=riscv64-linux-gnu- --kunitconfig=my_string.kunitconfig \
> --raw_output
>
> The strnlen implementation leverages the Zbb 'orc.b' instruction and

strnlen()

> word-at-a-time logic, showing significant gains as the string length
> increases.

Hmm...
Have you tried to optimise the generic implementation to use
word-at-a-time logic and compare?

> For strchr and strrchr, the handwritten assembly reduces

strchr() and strrchr()

> fixed overhead by eliminating stack frame management. The gain is most
> prominent on short strings (1-16B) where function call overhead dominates,
> while the performance converges with the C implementation for longer
> strings in the TCG environment.

-- 
With Best Regards,
Andy Shevchenko
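
To make the word-at-a-time question concrete, here is a rough userspace sketch of what a generic C strnlen() along those lines might look like. It is only an illustration: wat_strnlen() and has_zero() are hypothetical names, it is not the kernel's generic code and not the assembly from this series, and it assumes 64-bit words and that an aligned 8-byte load which stays within maxlen is safe to perform even if it reads a few bytes past the terminating NUL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAT_ONES  UINT64_C(0x0101010101010101)
#define WAT_HIGHS UINT64_C(0x8080808080808080)

/* Nonzero if and only if at least one byte of w is 0x00. */
static inline uint64_t has_zero(uint64_t w)
{
	return (w - WAT_ONES) & ~w & WAT_HIGHS;
}

/* Hypothetical word-at-a-time strnlen(); illustration only. */
size_t wat_strnlen(const char *s, size_t maxlen)
{
	size_t i = 0;

	/* Byte loop until s + i is 8-byte aligned (or we are done). */
	while (i < maxlen && ((uintptr_t)(s + i) & 7)) {
		if (!s[i])
			return i;
		i++;
	}

	/*
	 * Word loop: only load when 8 full bytes remain within maxlen.
	 * Assumption: an aligned load never crosses a page boundary, so
	 * reading past the NUL inside the same word cannot fault.
	 */
	while (maxlen - i >= 8) {
		uint64_t w;

		memcpy(&w, s + i, sizeof(w));
		if (has_zero(w))
			break;
		i += 8;
	}

	/* Byte loop for the tail, or to locate the NUL inside the word. */
	while (i < maxlen && s[i])
		i++;

	return i;
}
```

Locating the NUL with a trailing byte loop keeps the sketch endian-neutral; a tuned version would derive the byte index from the has_zero() mask instead, which is roughly the job 'orc.b' plus a count-zeros instruction does in the Zbb path.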
