This series provides optimized implementations of strnlen(), strchr(), and strrchr() for the RISC-V architecture. The strnlen() implementation is derived from the existing optimized strlen(). For strchr() and strrchr(), the current versions use simple byte-by-byte assembly logic, which will serve as a baseline for future Zbb-based optimizations.
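For reference, the byte-by-byte behavior that the strchr() and strrchr() baselines implement can be modelled in C. This is a hypothetical sketch: model_strchr() and model_strrchr() are illustrative names, not code from the series. It mirrors the standard semantics, including the corner case that searching for '\0' returns a pointer to the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-by-byte strchr() model: returns a pointer to the first
 * occurrence of c in s, or NULL. Searching for '\0' returns the terminator. */
char *model_strchr(const char *s, int c)
{
	const char ch = (char)c;

	assert(s != NULL);
	for (;; s++) {
		if (*s == ch)
			return (char *)s;
		if (*s == '\0')
			return NULL;
	}
}

/* Hypothetical byte-by-byte strrchr() model: remembers the most recent match
 * while scanning up to and including the NUL terminator. */
char *model_strrchr(const char *s, int c)
{
	const char ch = (char)c;
	const char *last = NULL;

	assert(s != NULL);
	do {
		if (*s == ch)
			last = s;
	} while (*s++ != '\0');
	return (char *)last;
}
```

Checking the NUL byte against c before stopping is what makes model_strrchr(s, '\0') return the terminator rather than NULL.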
The patch series is organized into three parts:

1. Correctness Testing: The first three patches add KUnit test cases for strlen(), strnlen(), and strrchr() to ensure that the baseline and optimized versions are functionally correct.
2. Benchmarking Tool: Patches 4 and 5 extend string_kunit with performance measurement capabilities, allowing comparative analysis within the KUnit environment.
3. Architectural Optimizations: The final three patches introduce the RISC-V-specific assembly implementations.

Following suggestions from Andy Shevchenko, performance benchmarks have been added to string_kunit.c to provide quantifiable evidence of the improvements. Andy provided many specific comments on the implementation of the benchmark logic, which is also inspired by Eric Biggers' crc_benchmark().

Performance was measured in a QEMU TCG (rv64) environment, comparing the generic C implementation with the new RISC-V assembly versions.

Performance Summary (Improvement %):
---------------------------------------------------------------
Function | 16 B (Short) | 512 B (Mid) | 4096 B (Long)
---------------------------------------------------------------
strnlen  |    +72.6%    |   +350.1%   |    +427.5%
strchr   |     +3.6%    |     +3.5%   |     -0.3%
strrchr  |     +5.3%    |     +5.8%   |     +0.8%
---------------------------------------------------------------

The benchmarks can be reproduced by enabling CONFIG_STRING_KUNIT_BENCH and running:

  ./tools/testing/kunit/kunit.py run --arch=riscv \
      --cross_compile=riscv64-linux-gnu- --kunitconfig=my_string.kunitconfig \
      --raw_output

The strnlen() implementation leverages the Zbb 'orc.b' instruction and word-at-a-time logic. Its gains grow with string length, from +72.6% at 16 B to +427.5% at 4096 B. For strchr() and strrchr(), the handwritten assembly reduces fixed overhead by eliminating stack frame management. The resulting gain (about 3.5% to 5.8%) appears at 16 B and 512 B, where fixed per-call overhead is a larger share of the runtime. At 4096 B, performance converges with the C implementation in the TCG environment.
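To illustrate the word-at-a-time idea behind strnlen(), the following is a portable C model. It is a hypothetical sketch, not the series' assembly: model_orc_b(), load_le64() and model_strnlen() are illustrative names, and orc.b is emulated in software. It also assumes the buffer is readable for maxlen bytes, so 8-byte loads never leave it. The real implementation instead relies on aligned loads, which cannot cross a page boundary; this sketch does not reproduce that detail or the minu-based bound handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of Zbb orc.b: each result byte is 0xff if the
 * corresponding source byte is non-zero, otherwise 0x00. */
static uint64_t model_orc_b(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		if ((x >> (8 * i)) & 0xff)
			r |= (uint64_t)0xff << (8 * i);
	}
	return r;
}

/* Load 8 bytes in little-endian order, as an RV64 "ld" would. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t w = 0;

	for (int i = 0; i < 8; i++)
		w |= (uint64_t)p[i] << (8 * i);
	return w;
}

/* Index of the lowest set bit; x must be non-zero. */
static unsigned int ctz64(uint64_t x)
{
	unsigned int n = 0;

	assert(x != 0);
	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

/* Hypothetical word-at-a-time strnlen() model.
 * Assumption: s is readable for maxlen bytes. */
size_t model_strnlen(const char *s, size_t maxlen)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t i = 0;

	/* Fast path: examine 8 bytes per iteration. */
	while (maxlen - i >= 8) {
		/* ~orc.b turns each NUL byte into 0xff and every other byte into 0x00. */
		uint64_t zeros = ~model_orc_b(load_le64(p + i));

		if (zeros)
			return i + ctz64(zeros) / 8;	/* first NUL in this word */
		i += 8;
	}
	/* Tail: fewer than 8 bytes remain, check them one at a time. */
	while (i < maxlen && p[i])
		i++;
	return i;
}
```

A single orc.b plus a bit scan replaces eight byte compares per word. This is why the advantage grows with string length while short strings see a smaller gain.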
I would like to thank Andy Shevchenko for the suggestion to add benchmarks and for his detailed feedback on the test framework, and Eric Biggers for the benchmarking approach. I am also grateful to Kees Cook for suggesting vmalloc-based over-read detection, and to Qingfang Deng for providing the optimized implementation logic for strnlen(). Thanks also to Joel Stanley for testing support and feedback, and to David Laight for his suggestions regarding performance measurement.

Changes:

v7:
- Fix build error in 1/8 (call to undeclared function 'PAGE_ALIGN'), as reported by the kernel test robot.
- Link to v6: https://lore.kernel.org/all/[email protected]/

v6:
- Use vmalloc() and page-boundary alignment in the strlen(), strnlen(), and strrchr() correctness tests to ensure over-reads are detected, as suggested by Kees Cook. Consequently, the previous Acked-by and Tested-by tags for these tests have been dropped.
- Add <linux/minmax.h> include.
- Update STRING_BENCH_BUF macro to initialize variables inside the loop.
- Fix operator positioning and remove redundant blank lines.
- Add warm-up iteration comment.

v5:
- Include <linux/ktime.h> for ktime_get_ns() and <linux/time64.h> for NSEC_PER_SEC.
- Use #if IS_ENABLED(CONFIG_STRING_KUNIT_BENCH) to define the macro instead of using if (!IS_ENABLED(...)) inside the function.
- Declare variables inside for-loops.
- Simplify the for-loop logic in alloc_max_bench_buffer().
- Replace the magic number 1000 with (NSEC_PER_SEC / MEGA) to clarify the bytes/ns to MB/s conversion.

v4:
- Refine formatting and terminology:
  - Refer to '\0' as NUL.
  - Append parentheses () when referencing function names.
  - Ensure trailing commas are present in initializers.
  - Reorder local variable declarations to follow the "reverse Xmas tree" style.
  (Style-only change; kept existing Acked-by tags.)
- Improve documentation: Refine comments and commit messages for better clarity.
- Improve readability by using (1 * MEGA) instead of 1000000UL.
- Replace max_t() with max() where type-casting is unnecessary.
- Simplify the return value check for kunit_kzalloc() in alloc_max_bench_buffer().
- Remove redundant NUL-terminator handling in STRING_BENCH_BUF().
- Optimize the strnlen() implementation by replacing bleu/bgeu instructions with minu, as suggested by Qingfang Deng.
- Remove incorrect Suggested-by tags from certain patches.
- Drop Tested-by tags for benchmark-related patches due to significant framework changes since v3.
- Re-run all tests and update the performance data in the documentation.

v3:
- Re-implement benchmark logic inspired by crc_benchmark().
- Add 'len - 2' test case to strnlen() correctness tests.
- Incorporate detailed benchmark data into individual commit messages.

v2:
- Refactor lib/string.c to export __generic_* functions and add corresponding functional/performance tests for strnlen(), strchr(), and strrchr() (Andy Shevchenko).
- Replace magic numbers with STRING_TEST_MAX_LEN etc. (Andy Shevchenko).

v1: Initial submission.
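The v6 change to use vmalloc() and page-boundary alignment works because the string's terminating NUL can sit at the very end of a mapped page. In-kernel, vmalloc areas are normally followed by an unmapped guard page, so any read past the NUL faults instead of passing silently. The following userspace analog uses mmap() and mprotect() to build the same layout. It is a sketch under that assumption, not the test code from the series, and place_before_guard() is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy src so that its NUL terminator is the last
 * byte of a readable page that is immediately followed by a PROT_NONE
 * guard page. Returns the placed string, or NULL on failure; *map and
 * *map_len receive the mapping so the caller can munmap() it.
 */
char *place_before_guard(const char *src, void **map, size_t *map_len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	char *base;

	assert(src != NULL && map != NULL && map_len != NULL);
	len = strlen(src) + 1;	/* include the NUL */
	if (page <= 0 || len > (size_t)page)
		return NULL;

	/* One readable page followed by one guard page. */
	*map_len = 2 * (size_t)page;
	base = mmap(NULL, *map_len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;
	if (mprotect(base + page, (size_t)page, PROT_NONE) != 0) {
		munmap(base, *map_len);
		return NULL;
	}
	*map = base;

	/* The NUL lands at base + page - 1, right before the guard page. */
	memcpy(base + page - len, src, len);
	return base + page - len;
}
```

A correct strlen() on the returned pointer stops at the NUL. An implementation that loads past it, for example an unaligned 8-byte load starting near the end, touches the guard page and raises SIGSEGV.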
---
Feng Jiang (8):
  lib/string_kunit: add correctness test for strlen()
  lib/string_kunit: add correctness test for strnlen()
  lib/string_kunit: add correctness test for strrchr()
  lib/string_kunit: add performance benchmark for strlen()
  lib/string_kunit: extend benchmarks to strnlen() and chr searches
  riscv: lib: add strnlen() implementation
  riscv: lib: add strchr() implementation
  riscv: lib: add strrchr() implementation

 arch/riscv/include/asm/string.h |   9 ++
 arch/riscv/lib/Makefile         |   3 +
 arch/riscv/lib/strchr.S         |  35 ++++
 arch/riscv/lib/strnlen.S        | 164 +++++++++++++++++++
 arch/riscv/lib/strrchr.S        |  37 +++++
 arch/riscv/purgatory/Makefile   |  11 +-
 lib/Kconfig.debug               |  11 ++
 lib/tests/string_kunit.c        | 274 ++++++++++++++++++++++++++++++++
 8 files changed, 543 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/lib/strchr.S
 create mode 100644 arch/riscv/lib/strnlen.S
 create mode 100644 arch/riscv/lib/strrchr.S

--
2.25.1
