On Tue, Jan 13, 2026 at 04:27:42PM +0800, Feng Jiang wrote:
> Introduce a benchmark to compare the architecture-optimized strlen()
> implementation against the generic C version (__generic_strlen).
>
> The benchmark uses a table-driven approach to evaluate performance
> across different string lengths (short, medium, and long). It employs
> ktime_get() for timing and get_random_bytes() followed by null-byte
> filtering to generate test data that prevents early termination.
>
> This helps in quantifying the performance gains of architecture-specific
> optimizations on various platforms.
...
> +static void string_test_strlen_bench(struct kunit *test)
> +{
> +	char *buf;
> +	size_t buf_len, iters;
> +	ktime_t start, end;
> +	u64 time_arch, time_generic;
> +
> +	buf_len = get_max_bench_len(bench_cases, ARRAY_SIZE(bench_cases)) + 1;
> +
> +	buf = kunit_kzalloc(test, buf_len, GFP_KERNEL);
> +	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
> +
> +	for (size_t i = 0; i < ARRAY_SIZE(bench_cases); i++) {
> +		get_random_nonzero_bytes(buf, bench_cases[i].len);
> +		buf[bench_cases[i].len] = '\0';
> +
> +		iters = bench_cases[i].iterations;
> +
> +		/* 1. Benchmark the architecture-optimized version */
> +		start = ktime_get();
> +		for (unsigned int j = 0; j < iters; j++) {
> +			OPTIMIZER_HIDE_VAR(buf);
> +			(void)strlen(buf);

First Q: Are you sure the compiler doesn't replace this with __builtin_strlen()?
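
To make the concern concrete: the result of the call is discarded, so a
compiler that treats strlen() as a builtin may inline it or drop the call
altogether. A minimal userspace sketch of one way to rule that out, assuming
GCC or Clang inline asm. HIDE_VAR() and bench_strlen_sum() are hypothetical
names for illustration, not existing kernel helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in (assumption) for the kernel's OPTIMIZER_HIDE_VAR(). */
#define HIDE_VAR(var) __asm__ volatile("" : "+r"(var))

/*
 * Call strlen() 'iters' times and return the sum of the results.
 *
 * The call goes through a volatile function pointer, so the compiler
 * has to reload the pointer and emit an indirect call on every
 * iteration instead of expanding __builtin_strlen() inline. The sum is
 * consumed afterwards, so the calls cannot be discarded as dead code.
 */
static size_t bench_strlen_sum(const char *buf, size_t iters)
{
	size_t (*volatile fn)(const char *) = strlen;
	size_t sum = 0;

	assert(buf != NULL);

	for (size_t j = 0; j < iters; j++) {
		HIDE_VAR(buf);		/* the compiler may not assume buf is unchanged */
		sum += fn(buf);
	}

	HIDE_VAR(sum);			/* keep the accumulated result live */
	return sum;
}
```

Whether the kernel build actually performs the substitution depends on the
architecture and compiler flags, so checking the generated assembly for a
real call is still the definitive answer.
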
> +		}
> +		end = ktime_get();
> +		time_arch = ktime_to_ns(ktime_sub(end, start));
> +
> +		/* 2. Benchmark the generic C version */
> +		start = ktime_get();
> +		for (unsigned int j = 0; j < iters; j++) {
> +			OPTIMIZER_HIDE_VAR(buf);
> +			(void)__generic_strlen(buf);
> +		}

Are you sure the warmed-up caches do not affect the benchmark? I think you need
to flush the caches (or make them dirty) on each iteration.
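
As an illustration of what "making the caches dirty" could look like, here is
a rough userspace sketch. It assumes a 64-byte cache line and a 32 MiB scratch
area larger than the last-level cache; both values, and the names
dirty_caches() and cold_strlen(), are made up for this example:

```c
#include <stddef.h>
#include <string.h>

#define CACHE_LINE	64			/* assumed cache line size */
#define SCRATCH_SIZE	(32u * 1024 * 1024)	/* assumed to exceed the LLC */

static unsigned char scratch[SCRATCH_SIZE];

/*
 * Write one byte per cache line of the scratch area. Walking a buffer
 * larger than the cache pushes the test string out of it, so the next
 * strlen() call starts from cold data.
 */
static void dirty_caches(void)
{
	for (size_t off = 0; off < SCRATCH_SIZE; off += CACHE_LINE)
		scratch[off]++;

	/* Compiler barrier: the stores above must not be optimized away. */
	__asm__ volatile("" : : "r"(scratch) : "memory");
}

/* Evict the caches, then measure-worthy call: strlen() on cold data. */
static size_t cold_strlen(const char *buf)
{
	size_t (*volatile fn)(const char *) = strlen;

	dirty_caches();
	return fn(buf);
}
```

In a real benchmark the eviction would have to sit outside the timed region,
which means timing each call separately rather than the whole loop, and the
clock resolution then becomes a concern for short strings.
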
> +		end = ktime_get();
> +		time_generic = ktime_to_ns(ktime_sub(end, start));
> +
> +		string_bench_report(test, "strlen", &bench_cases[i],
> +				    time_arch, time_generic);
> +	}
> +}

--
With Best Regards,
Andy Shevchenko