The attached patch (which I haven't installed) simplifies
stdc_memreverse8 a bit, and I imagine it might make it a tad faster in
some cases on x86-64 with GCC 15, as the compiler generates one fewer
conditional branch in the function prologue. Also, though I doubt
it matters, the tight loop has 2 fewer bytes of instructions.
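
For anyone who wants to check that the two loop forms agree before
comparing the generated code, here is a minimal standalone sketch. The
names reverse_two_index, reverse_one_index and loops_agree are made up
for illustration and are not part of Gnulib; the loop bodies are copied
from the old and new versions of stdc_memreverse8 in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name: the two-index loop that the patch removes.  */
static void
reverse_two_index (size_t n, unsigned char *ptr)
{
  if (n > 0)
    {
      size_t i, j;
      for (i = 0, j = n - 1; i < j; i++, j--)
        {
          unsigned char xi = ptr[i];
          unsigned char xj = ptr[j];
          ptr[j] = xi;
          ptr[i] = xj;
        }
    }
}

/* Hypothetical name: the single-index loop that the patch adds.
   No N > 0 guard is needed, since N / 2 is 0 when N is 0 or 1.  */
static void
reverse_one_index (size_t n, unsigned char *ptr)
{
  for (size_t i = 0; i < n / 2; i++)
    {
      unsigned char xi = ptr[i];
      unsigned char xj = ptr[n - i - 1];
      ptr[n - i - 1] = xi;
      ptr[i] = xj;
    }
}

/* Return 1 if both loops give the same result when reversing the
   bytes 0, 1, ..., N-1, and 0 otherwise.  N must be at most 64.  */
static int
loops_agree (size_t n)
{
  unsigned char a[64], b[64];
  if (n > sizeof a)
    return 0;
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] = (unsigned char) i;
  reverse_two_index (n, a);
  reverse_one_index (n, b);
  return memcmp (a, b, n) == 0;
}
```

Calling loops_agree for every N from 0 through 64 covers the empty,
odd-length and even-length cases. It checks only equivalence and says
nothing about speed.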
Since the attached patch implements C2y almost word for word and the
patched code is therefore a bit simpler to verify, is there some reason
why it shouldn't be applied? Maybe it's really slow on some other
platform? If so, a comment to that effect would be helpful.

From d2cc68d8bb5aed53ce4ac3babe14e5cb4810fb55 Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Mon, 16 Mar 2026 16:34:49 -0700
Subject: [PATCH] stdc_memreverse8: simplify and slightly tune
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* lib/stdbit.in.h (stdc_memreverse8):
Use code that closely mimics draft C2y.
On x86-64 with gcc 15 -O2, this generates slightly-better code
for the non-inlined version, and doesn’t seem to hurt inlining.
---
ChangeLog | 6 ++++++
lib/stdbit.in.h | 24 ++++++++++--------------
2 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 1032063da7..342176863c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
2026-03-16 Paul Eggert <[email protected]>
+ stdc_memreverse8: simplify and slightly tune
+ * lib/stdbit.in.h (stdc_memreverse8):
+ Use code that closely mimics draft C2y.
+ On x86-64 with gcc 15 -O2, this generates slightly-better code
+ for the non-inlined version, and doesn’t seem to hurt inlining.
+
stdbit-h: don’t generate some dummy .o files
On recent GNU and other C23ish platforms, do not compile files
like lib/stdc_bit_ceil.c, as the corresponding .o files contain
diff --git a/lib/stdbit.in.h b/lib/stdbit.in.h
index 8b61300041..b8c99d0247 100644
--- a/lib/stdbit.in.h
+++ b/lib/stdbit.in.h
@@ -1307,21 +1307,17 @@ stdc_rotate_right_ull (unsigned long long int v, unsigned int c)
_GL_STDC_MEMREVERSE8_INLINE void
stdc_memreverse8 (size_t n, unsigned char *ptr)
{
-  if (n > 0)
+  /* There is no need to optimize the cases N == 1, N == 2, N == 4
+     specially using __builtin_constant_p, because GCC does the possible
+     optimizations already, taking into account the alignment of PTR:
+     GCC >= 3 for N == 1, GCC >= 8 for N == 2, GCC >= 13 for N == 4.
+     (Whereas clang >= 3, <= 22 optimizes only the case N == 1.)  */
+  for (size_t i = 0; i < n / 2; i++)
     {
-      /* There is no need to optimize the cases N == 1, N == 2, N == 4
-         specially using __builtin_constant_p, because GCC does the possible
-         optimizations already, taking into account the alignment of PTR:
-         GCC >= 3 for N == 1, GCC >= 8 for N == 2, GCC >= 13 for N == 4.
-         (Whereas clang >= 3, <= 22 optimizes only the case N == 1.)  */
-      size_t i, j;
-      for (i = 0, j = n-1; i < j; i++, j--)
-        {
-          unsigned char xi = ptr[i];
-          unsigned char xj = ptr[j];
-          ptr[j] = xi;
-          ptr[i] = xj;
-        }
+      unsigned char xi = ptr[i];
+      unsigned char xj = ptr[n - i - 1];
+      ptr[n - i - 1] = xi;
+      ptr[i] = xj;
     }
}
--
2.51.0