On Fri, Mar 27, 2026, at 4:22 PM, Greg Burd wrote:
> On Mon, Mar 23, 2026, at 11:09 AM, Nathan Bossart wrote:
>> On Sun, Mar 22, 2026 at 02:01:50PM -0400, Andres Freund wrote:
>>> I'm also pretty doubtful all the effort to e.g. add AVX 512 popcount was
>>> spent
>>> all that effectively - hard to believe there's any real world workloads
>>> where
>>> that gain is worth the squeeze. At least for aarch64 and x86-64 there's real
>>> world use of those platforms, making niche-y perf improvements somewhat
>>> worthwhile. Whereas there's afaict not yet a whole lot of riscv production
>>> adoption.
>
> Hey Nathan,
>
>> That work was partially motivated by vector stuff that used popcount
>> functions pretty heavily, but yeah, the complexity compared to the gains is
>> the main reason I've been pushing to just use simd.h elsewhere (i.e., SSE2
>> and Neon). I'd still consider using AVX-512, etc. for things if the impact
>> on real-world workloads was huge, though.
>
> Yes, that and by research done while trying to understand why my RISC-V
> build farm animal "greenfly" (OrangePi RV2 with a VisionFive 2 CPU:
> RISC-V RV64GC + Zba/Zbb/Zbc/Zbs) is failing consistently.
>
>> --
>> nathan
>
> Forgive me, while $subject only mentions popcount I couldn't help
> myself so I added a few more RISC-V patches including a bug fix that I
> hope makes greenfly happy again.
>
>
> 0001 - This is a bug fix for DES initialization when built with Clang on RISC-V.
>
> ------> Join me in "the rabbit hole" on this issue if you care to...
>
> The existing software DES (as shown by the build-farm animal "greenfly"
> [1]) fails because Clang 20 has an auto-vectorization bug that we
> trigger in the DES initialization code (des_init() function), not the
> DES encryption algorithm itself.
>
> I searched the LLVM issue tracker; here are the issues that caught my eye:
> 1. Issue #176001 - "RISC-V Wrong code at -O1"
> - Vector peephole optimization with vmerge folding
> - Fixed by PR #176077 (merged Jan 2026)
> - Link: https://github.com/llvm/llvm-project/issues/176001
> 2. Issue #187458 - "Wrong code for vector.extract.last.active"
> - Large index issues with zvl1024b
> - Partially fixed, work still ongoing
> - Link: https://github.com/llvm/llvm-project/issues/187458
> 3. Issue #171978 - "RISC-V Wrong code at -O2/O3"
> - Illegal instruction from mismatched EEW
> - Under investigation
> - Link: https://github.com/llvm/llvm-project/issues/171978
> 4. PR #176105 - "Fix i64 gather/scatter cost on rv32"
> - Cost model fixes for scatter/gather (merged Jan 2026)
> - Link: https://github.com/llvm/llvm-project/pull/176105
>
> My fix in 0001 is simply adding this in a few places in crypt-des.c:
>
> #if defined(__riscv) && defined(__clang__)
> pg_memory_barrier();
> #endif
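
For readers who want to see the loop shape in isolation, here is a minimal sketch of the scatter-write inversion with a barrier after each store. It is an illustration only, not the patch: the helper names (invert_permutation, permutation_is_inverted, COMPILER_BARRIER) are hypothetical, and it uses a compiler-only barrier (empty asm with a "memory" clobber) where the patch calls pg_memory_barrier().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compiler-only barrier (assumption: GCC or Clang).  The patch itself
 * uses pg_memory_barrier(); this macro is a stand-in for illustration.
 */
#if defined(__GNUC__) || defined(__clang__)
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#define COMPILER_BARRIER() ((void) 0)
#endif

/*
 * Invert a 1-based permutation table: inv[perm[i] - 1] = i.
 * This is the same scatter-write shape as the loops in des_init().
 */
static void
invert_permutation(const uint8_t *perm, uint8_t *inv, size_t n)
{
    assert(n <= 256);           /* indexes must fit in a uint8_t */

    for (size_t i = 0; i < n; i++)
    {
        inv[perm[i] - 1] = (uint8_t) i;
        /* Keep the compiler from merging stores across iterations. */
        COMPILER_BARRIER();
    }
}

/* Return 1 if inv is the inverse of perm, 0 otherwise. */
static int
permutation_is_inverted(const uint8_t *perm, const uint8_t *inv, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (inv[perm[i] - 1] != i)
            return 0;
    }
    return 1;
}
```

A check like permutation_is_inverted() is roughly what riscv-des.c does when it reports "Permutation tables are correct"; the barrier only matters on a compiler that miscompiles the loop.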
>
> While searching I ran across a different solution: adding `-mllvm
> -riscv-v-vector-bits-min=0` sets the minimum vector bit width for the
> RISC-V vector extension in LLVM to 0, which disables all vectorization
> and forces scalar code generation, so no RVV instructions are emitted.
> This would prevent the DES bug at the cost of any vectorization anywhere
> in the binary.
>
> While that might also fix the other intermittent bug we'd been seeing
> on greenfly (not tested), disabling all RVV optimizations seems too
> heavy-handed to me.
>
>
> ------> Moving on.
>
> 0002 - (was "0001" in v2) this is unchanged, it implements popcount
> using Zbb extension on RISC-V
>
> 0003 - is a small patch adapted from the Google Abseil project's
> RISC-V CRC32C implementation [1]. It is *a lot faster* than the
> software crc32c we fall back to now (see: riscv-crc32c.c). This
> algorithm requires the Zbc (or Zbkc) extension (for clmul), so the patch
> tests for that at build time and adds the '-march' flag when it is
> supported. However, as is the case for Zbb and popcount in 0002, the
> presence of Zbc (or Zbkc) must be detected at runtime. That's done
> following the pre-existing pattern used for ARM features. This does
> introduce some runtime overhead and complexity, not more than required
> I hope.
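
The runtime-selection pattern mentioned here can be sketched in a few lines of C. This is a simplified illustration, not the patch code: the function pointer starts out aimed at a "choose" function, which picks an implementation on first call and then forwards the request. The probe fast_popcount_available() is a hypothetical stand-in for the riscv_hwprobe() check, and all names below are made up for the example (assumes GCC or Clang for the builtins).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static uint64_t
popcount_portable(const unsigned char *buf, size_t bytes)
{
    uint64_t count = 0;

    for (size_t i = 0; i < bytes; i++)
        for (unsigned char b = buf[i]; b != 0; b &= (unsigned char) (b - 1))
            count++;
    return count;
}

/* Builtin path: 8-byte chunks, then a byte-at-a-time tail. */
static uint64_t
popcount_builtin(const unsigned char *buf, size_t bytes)
{
    uint64_t count = 0;

    for (; bytes >= 8; buf += 8, bytes -= 8)
    {
        uint64_t word;

        memcpy(&word, buf, sizeof(word));   /* avoids unaligned loads */
        count += (uint64_t) __builtin_popcountll(word);
    }
    for (; bytes > 0; buf++, bytes--)
        count += (uint64_t) __builtin_popcount(*buf);
    return count;
}

/* Hypothetical probe; the patch asks the kernel via riscv_hwprobe(). */
static int
fast_popcount_available(void)
{
#if defined(__riscv_zbb)
    return 1;
#else
    return 0;
#endif
}

typedef uint64_t (*popcount_fn) (const unsigned char *buf, size_t bytes);

static uint64_t popcount_choose(const unsigned char *buf, size_t bytes);

/* Starts at the chooser; replaced on first use. */
static popcount_fn popcount_optimized = popcount_choose;

static uint64_t
popcount_choose(const unsigned char *buf, size_t bytes)
{
    popcount_optimized = fast_popcount_available() ?
        popcount_builtin : popcount_portable;
    assert(popcount_optimized != popcount_choose);
    return popcount_optimized(buf, bytes);
}
```

After the first call, the cost of the runtime check is one indirect call per request, which is the overhead referred to above.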
>
> I attached test code, and results at the end of this email:
> * riscv-popcnt.c - unchanged
> * riscv-crc32c.c - new, based on work in the Google Abseil project
> * riscv-des.c - highlights the fix for DES using Clang on RISC-V
>
> I guess the question for 0002 and/or 0003 is if the "juice" is worth the
> "squeeze" or not. There is a lot of performance juice to be had IMO.
> But some might argue that RISC-V isn't widely adopted yet, and they'd
> be right. Others might point out that RISC-V is currently showing up
> in embedded systems more than server/desktop/laptop/cloud, also true.
> However, there is some evidence that this is changing: there are RISC-V
> servers [2][3], and there is a hosted (cloud) solution from Scaleway
> [4]. There exists a 64-core RISC-V desktop [5] and a Framework laptop
> mainboard [6] sporting a RISC-V CPU. And there is the OrangePi RV2
> [7] I have that is "greenfly".
>
> Is it early days? Certainly! But too early? That's up for debate. :)
>
> If nothing else, these patches can serve as a durable record, to be used
> later when RISC-V is a critical platform for Postgres, or as information
> for other projects.
Rebased and tested (v4). This adds better support for RISC-V, with a fix for
DES and faster popcount and CRC32C when the CPU supports them.
best.
-greg
> best.
>
> -greg
>
> [1] https://github.com/abseil/abseil-cpp/pull/1986
> absl/crc/internal/crc_riscv.cc
> [2]
> https://www.firefly.store/products/rs-sra120-risc-v-server-2u-computing-server-cloud-storage-large-model-sg2042
> [3]
> https://edgeaicomputer.com/our-products/servers/risc-v-compute-server-sra1-20/
> [4]
> https://www.scaleway.com/en/news/scaleway-launches-its-risc-v-servers-in-the-cloud-a-world-first-and-a-firm-commitment-to-technological-independence/
> [5] https://milkv.io/pioneer and
> https://www.crowdsupply.com/milk-v/milk-v-pioneer/updates/current-status-of-production
> [6] https://deepcomputing.io/product/dc-roma-risc-v-mainboard/
> [7]
> http://www.orangepi.org/html/hardWare/computerAndMicrocontrollers/details/Orange-Pi-RV2.html
>
>
> ---- TEST PROGRAM OUTPUT:
>
> gburd@rv:~/ws/postgres$ make -f Makefile.RISCV
> gcc -O2 riscv-des.c -o des-gcc-sw
> gcc -O2 riscv-des.c -march=rv64gcv -o des-gcc-hw
> clang-20 -O1 riscv-des.c -o des-clang-o1-sw
> clang-20 -O1 -march=rv64gcv riscv-des.c -o des-clang-o1-hw
> clang-20 -O2 riscv-des.c -o des-clang-o2-sw
> clang-20 -O2 -march=rv64gcv riscv-des.c -o des-clang-o2-hw
> gcc -O2 -o popcnt-gcc-o2-sw riscv-popcnt.c
> gcc -O2 -march=rv64gc_zbb -o popcnt-gcc-o2-hw riscv-popcnt.c
> clang-20 -O2 -o popcnt-clang-o2-sw riscv-popcnt.c
> clang-20 -O2 -march=rv64gc_zbb -o popcnt-clang-o2-hw riscv-popcnt.c
> gcc -O2 -o crc32c-gcc-o2-sw riscv-crc32c.c
> gcc -O2 -march=rv64gc_zbc -o crc32c-gcc-o2-hw riscv-crc32c.c
> clang-20 -O2 -o crc32c-clang-o2-sw riscv-crc32c.c
> clang-20 -O2 -march=rv64gc_zbc -o crc32c-clang-o2-hw riscv-crc32c.c
> gburd@rv:~/ws/postgres$ make -f Makefile.RISCV test
> ./des-gcc-sw
> Compiler: GCC 13.3.0
> Target: RISC-V 64-bit
> Vector extension: Not enabled
>
> Testing WITHOUT compiler barriers:
> PASS: Permutation tables are correct
>
> Testing WITH compiler barriers:
> PASS: Permutation tables are correct
>
> Performance Comparison (1000000 iterations):
> Without barriers: 0.409 seconds (409 ns/iter)
> With barriers: 0.416 seconds (416 ns/iter)
> Overhead: 1.6%
> ./des-gcc-hw
> Compiler: GCC 13.3.0
> Target: RISC-V 64-bit
> Vector extension: Enabled (RVV)
>
> Testing WITHOUT compiler barriers:
> PASS: Permutation tables are correct
>
> Testing WITH compiler barriers:
> PASS: Permutation tables are correct
>
> Performance Comparison (1000000 iterations):
> Without barriers: 0.410 seconds (410 ns/iter)
> With barriers: 0.410 seconds (410 ns/iter)
> Overhead: Negligible
> ./des-clang-o1-sw
> Compiler: Clang 20.1.2
> Target: RISC-V 64-bit
> Vector extension: Not enabled
>
> Testing WITHOUT compiler barriers:
> PASS: Permutation tables are correct
>
> Testing WITH compiler barriers:
> PASS: Permutation tables are correct
>
> Performance Comparison (1000000 iterations):
> Without barriers: 0.517 seconds (517 ns/iter)
> With barriers: 0.516 seconds (516 ns/iter)
> Overhead: Negligible
> ./des-clang-o1-hw
> Compiler: Clang 20.1.2
> Target: RISC-V 64-bit
> Vector extension: Enabled (RVV)
>
> Testing WITHOUT compiler barriers:
> PASS: Permutation tables are correct
>
> Testing WITH compiler barriers:
> PASS: Permutation tables are correct
>
> Performance Comparison (1000000 iterations):
> Without barriers: 0.405 seconds (405 ns/iter)
> With barriers: 0.405 seconds (405 ns/iter)
> Overhead: Negligible
> ./des-clang-o2-sw
> Compiler: Clang 20.1.2
> Target: RISC-V 64-bit
> Vector extension: Not enabled
>
> Testing WITHOUT compiler barriers:
> PASS: Permutation tables are correct
>
> Testing WITH compiler barriers:
> PASS: Permutation tables are correct
>
> Performance Comparison (1000000 iterations):
> Without barriers: 0.517 seconds (517 ns/iter)
> With barriers: 0.518 seconds (518 ns/iter)
> Overhead: Negligible
> ./des-clang-o2-hw
> Compiler: Clang 20.1.2
> Target: RISC-V 64-bit
> Vector extension: Enabled (RVV)
>
> Testing WITHOUT compiler barriers:
> ERROR: un_pbox mismatch:
> un_pbox[0] = 15, expected 8
> un_pbox[1] = 6, expected 16
> un_pbox[2] = 19, expected 22
> un_pbox[3] = 20, expected 30
> un_pbox[4] = 28, expected 12
> ... and 27 more errors
> FAIL: Permutation tables are incorrect
>
> Testing WITH compiler barriers:
> PASS: Permutation tables are correct
>
> Performance Comparison (1000000 iterations):
> Without barriers: 0.093 seconds (93 ns/iter)
> With barriers: 0.407 seconds (407 ns/iter)
> Overhead: 335.5%
> ./popcnt-gcc-o2-sw
> sw popcount: 0.183 sec ( 547.89 MB/s)
> hw popcount: 0.274 sec ( 365.40 MB/s)
>
> diff: 0.67x
> match: 406261900 bits counted
> ./popcnt-gcc-o2-hw
> sw popcount: 0.182 sec ( 548.17 MB/s)
> hw popcount: 0.044 sec ( 2287.82 MB/s)
>
> diff: 4.17x
> match: 406261900 bits counted
> ./popcnt-clang-o2-sw
> sw popcount: 0.188 sec ( 531.96 MB/s)
> hw popcount: 0.207 sec ( 482.84 MB/s)
>
> diff: 0.91x
> match: 406261900 bits counted
> ./popcnt-clang-o2-hw
> sw popcount: 0.224 sec ( 446.46 MB/s)
> hw popcount: 0.056 sec ( 1794.83 MB/s)
>
> diff: 4.02x
> match: 406261900 bits counted
> ./crc32c-gcc-o2-sw
> sw crc32c: 0.651 sec ( 153.68 MB/s)
> hw crc32c: 0.651 sec ( 153.72 MB/s)
>
> diff: 1.00x
> match: 0x0B141F2D
>
> validation: CRC32C("123456789") = 0xE3069283 (correct)
> ./crc32c-gcc-o2-hw
> sw crc32c: 0.651 sec ( 153.70 MB/s)
> hw crc32c: 0.000 sec ( 308052.33 MB/s)
>
> diff: 2004.21x
> match: 0x0B141F2D
>
> validation: CRC32C("123456789") = 0xE3069283 (correct)
> ./crc32c-clang-o2-sw
> sw crc32c: 0.584 sec ( 171.10 MB/s)
> hw crc32c: 0.584 sec ( 171.17 MB/s)
>
> diff: 1.00x
> match: 0x0B141F2D
>
> validation: CRC32C("123456789") = 0xE3069283 (correct)
> ./crc32c-clang-o2-hw
> sw crc32c: 0.584 sec ( 171.15 MB/s)
> hw crc32c: 0.000 sec ( 309282.38 MB/s)
>
> diff: 1807.08x
> match: 0x0B141F2D
>
> validation: CRC32C("123456789") = 0xE3069283 (correct)
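
The validation line in the output above can be reproduced with a small bit-at-a-time reference. This is a sketch for checking results, not the clmul-based algorithm in the patch and not the contents of riscv-crc32c.c; the function names are hypothetical. It uses the reflected Castagnoli polynomial 0x82F63B78, with the usual all-ones initial value and final XOR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (Castagnoli) polynomial in reflected (LSB-first) form. */
#define CRC32C_POLY_REFLECTED UINT32_C(0x82F63B78)

/*
 * Fold len bytes into a running CRC, one bit at a time.  Slow, but easy
 * to audit, which makes it a handy oracle for faster implementations.
 */
static uint32_t
crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;

    assert(p != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
        {
            /* mask is all ones when the low bit is set, else zero */
            uint32_t mask = UINT32_C(0) - (crc & UINT32_C(1));

            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
        }
    }
    return crc;
}

/* One-shot helper: initialize, update, and apply the final XOR. */
static uint32_t
crc32c(const void *data, size_t len)
{
    return crc32c_update(UINT32_C(0xFFFFFFFF), data, len) ^ UINT32_C(0xFFFFFFFF);
}
```

Calling crc32c("123456789", 9) yields the standard check value 0xE3069283 reported in the test output.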
> Attachments:
> * Makefile.RISCV
> * riscv-crc32c.c
> * riscv-des.c
> * riscv-popcnt.c
> * v3-0001-Avoid-Clang-RISC-V-auto-vectorization-bug-in-DES.patch
> * v3-0002-Add-RISC-V-popcount-using-Zbb-extension.patch
> * v3-0003-Add-RISC-V-CRC32C-using-the-Zbc-extension.patch
From 8fe3527f87e3b22edf1aaf0bb07092013c63c057 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Mon, 23 Mar 2026 11:26:24 -0400
Subject: [PATCH v4 1/3] Avoid Clang RISC-V auto-vectorization bug in DES
Clang 20.1.2 (and possibly earlier/later versions) miscompiles
scatter-write patterns like "array[perm[i]] = i" when compiling with
-O2. This causes incorrect DES permutation tables in
contrib/pgcrypto/crypt-des.c, resulting in wrong password hashes and
authentication failures.
Add memory barriers (pg_memory_barrier()) after the scatter writes in
des_init() to prevent auto-vectorization of the affected loops. The
barriers are compiled in only for Clang on RISC-V, so other compilers
and platforms are unaffected. The performance impact is negligible
since DES initialization occurs only once per connection.
---
contrib/pgcrypto/crypt-des.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/contrib/pgcrypto/crypt-des.c b/contrib/pgcrypto/crypt-des.c
index 98c30ea122e..0a698da7132 100644
--- a/contrib/pgcrypto/crypt-des.c
+++ b/contrib/pgcrypto/crypt-des.c
@@ -62,12 +62,14 @@
#include "postgres.h"
#include "miscadmin.h"
+#include "port/atomics.h"
#include "port/pg_bswap.h"
#include "px-crypt.h"
#define _PASSWORD_EFMT1 '_'
+
static const char _crypt_a64[] =
"./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
@@ -265,6 +267,10 @@ des_init(void)
for (i = 0; i < 64; i++)
{
init_perm[final_perm[i] = IP[i] - 1] = i;
+ /* This prevents a Clang bug related to auto-vectorization */
+#if defined(__riscv) && defined(__clang__)
+ pg_memory_barrier();
+#endif
inv_key_perm[i] = 255;
}
@@ -276,6 +282,10 @@ des_init(void)
{
u_key_perm[i] = key_perm[i] - 1;
inv_key_perm[key_perm[i] - 1] = i;
+ /* This prevents a Clang bug related to auto-vectorization */
+#if defined(__riscv) && defined(__clang__)
+ pg_memory_barrier();
+#endif
inv_comp_perm[i] = 255;
}
@@ -283,7 +293,13 @@ des_init(void)
* Invert the key compression permutation.
*/
for (i = 0; i < 48; i++)
+ {
inv_comp_perm[comp_perm[i] - 1] = i;
+ /* This prevents a Clang bug related to auto-vectorization */
+#if defined(__riscv) && defined(__clang__)
+ pg_memory_barrier();
+#endif
+ }
/*
* Set up the OR-mask arrays for the initial and final permutations, and
@@ -353,7 +369,13 @@ des_init(void)
* the output of the S-box arrays setup above.
*/
for (i = 0; i < 32; i++)
+ {
un_pbox[pbox[i] - 1] = i;
+ /* This prevents a Clang bug related to auto-vectorization */
+#if defined(__riscv) && defined(__clang__)
+ pg_memory_barrier();
+#endif
+ }
for (b = 0; b < 4; b++)
for (i = 0; i < 256; i++)
--
2.51.2
From 9b7f5a7be2123e79a71c781191681c8eb972d795 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Sun, 22 Mar 2026 11:15:41 -0400
Subject: [PATCH v4 2/3] Add RISC-V popcount using Zbb extension
Implement hardware popcount support for RISC-V using the Zbb (basic bit
manipulation) extension when present. The Zbb extension provides the
'cpop' instruction which GCC and Clang emit from __builtin_popcountll()
when compiling with -march=rv64gc_zbb.
This patch adds:
- Build-time detection of Zbb support (configure.ac, meson.build)
- Runtime detection using __riscv_hwprobe() on Linux
- Optimized popcount implementation using cpop instruction
The implementation follows the established pattern for hardware acceleration
(similar to x86 POPCNT and ARM SVE). Zbb-optimized code is compiled
separately with -march=rv64gc_zbb, while the main binary remains
portable across all RISC-V 64-bit systems.
---
configure.ac | 29 ++++++
meson.build | 32 ++++++
src/include/port/pg_bitutils.h | 2 +-
src/port/meson.build | 7 +-
src/port/pg_bitutils.c | 5 +-
src/port/pg_popcount_riscv.c | 183 +++++++++++++++++++++++++++++++++
6 files changed, 253 insertions(+), 5 deletions(-)
create mode 100644 src/port/pg_popcount_riscv.c
diff --git a/configure.ac b/configure.ac
index 8d176bd3468..da4d3bceb94 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2187,6 +2187,35 @@ if test x"$host_cpu" = x"aarch64"; then
fi
fi
+# Check for RISC-V Zbb bitmanip extension (provides 'cpop' for popcount).
+#
+# The Zbb extension provides the 'cpop' instruction for hardware popcount.
+# GCC/Clang emit the cpop instruction from __builtin_popcountll() when
+# -march=rv64gc_zbb is used. We test compilation with this flag, then
+# restore CFLAGS to avoid global march flags (for binary portability).
+# We define USE_RISCV_ZBB_WITH_RUNTIME_CHECK and use __riscv_hwprobe()
+# for runtime detection. We compile src/port/pg_popcount_riscv.c with
+# -march=rv64gc_zbb separately (like ARM SVE and x86 POPCNT).
+AC_MSG_CHECKING([for RISC-V Zbb extension (cpop/popcount)])
+if test x"$host_cpu" = x"riscv64"; then
+ pgac_save_CFLAGS_zbb="$CFLAGS"
+ CFLAGS="$CFLAGS -march=rv64gc_zbb"
+ AC_COMPILE_IFELSE(
+ [AC_LANG_PROGRAM(
+ [/* Test that the compiler will emit cpop from __builtin_popcountll */
+ static inline int test_cpop(unsigned long long x)
+ { return __builtin_popcountll(x); }],
+ [volatile int r = test_cpop(0xdeadbeefULL); (void) r;])],
+ [AC_DEFINE(USE_RISCV_ZBB_WITH_RUNTIME_CHECK, 1,
+ [Define to 1 to use RISC-V Zbb popcount with runtime detection.])
+ CFLAGS="$pgac_save_CFLAGS_zbb"
+ AC_MSG_RESULT([yes, with runtime check])],
+ [CFLAGS="$pgac_save_CFLAGS_zbb"
+ AC_MSG_RESULT([no])])
+else
+ AC_MSG_RESULT([not on RISC-V])
+fi
+
# Check for Intel SSE 4.2 intrinsics to do CRC calculations.
#
PGAC_SSE42_CRC32_INTRINSICS()
diff --git a/meson.build b/meson.build
index 20b887f1a1b..cf7f41715d8 100644
--- a/meson.build
+++ b/meson.build
@@ -2601,6 +2601,38 @@ int main(void)
endif
+# ---------------------------------------------------------------------------
+# Check for RISC-V Zbb bitmanip extension (provides 'cpop' for popcount).
+#
+# The Zbb extension provides the 'cpop' instruction for hardware popcount.
+# GCC/Clang emit the cpop instruction from __builtin_popcountll() when
+# -march=rv64gc_zbb is used. We test compilation with this flag, but
+# do NOT add it globally (for binary portability). Instead, we define
+# USE_RISCV_ZBB_WITH_RUNTIME_CHECK and compile src/port/pg_popcount_riscv.c
+# with -march=rv64gc_zbb separately (like ARM SVE and x86 POPCNT).
+# Runtime detection uses __riscv_hwprobe().
+# ---------------------------------------------------------------------------
+zbb_test_code = '''
+static inline int test_cpop(unsigned long long x)
+{ return __builtin_popcountll(x); }
+int main(void) {
+ volatile int r = test_cpop(0xdeadbeefULL);
+ (void) r;
+ return 0;
+}
+'''
+
+cflags_zbb = []
+if host_cpu == 'riscv64'
+ if cc.compiles(zbb_test_code,
+ args: ['-march=rv64gc_zbb'],
+ name: 'RISC-V Zbb cpop')
+ cdata.set('USE_RISCV_ZBB_WITH_RUNTIME_CHECK', 1)
+ # Flag will be added only to pg_popcount_riscv.c in src/port/meson.build
+ cflags_zbb = ['-march=rv64gc_zbb']
+ endif
+endif
+
###############################################################
# Select CRC-32C implementation.
diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index 7a00d197013..cb8d8b6e626 100644
--- a/src/include/port/pg_bitutils.h
+++ b/src/include/port/pg_bitutils.h
@@ -279,7 +279,7 @@ pg_ceil_log2_64(uint64 num)
extern uint64 pg_popcount_portable(const char *buf, int bytes);
extern uint64 pg_popcount_masked_portable(const char *buf, int bytes, uint8 mask);
-#if defined(HAVE_X86_64_POPCNTQ) || defined(USE_SVE_POPCNT_WITH_RUNTIME_CHECK)
+#if defined(HAVE_X86_64_POPCNTQ) || defined(USE_SVE_POPCNT_WITH_RUNTIME_CHECK) || defined(USE_RISCV_ZBB_WITH_RUNTIME_CHECK)
/*
* Attempt to use specialized CPU instructions, but perform a runtime check
* first.
diff --git a/src/port/meson.build b/src/port/meson.build
index 922b3f64676..2c0486f5373 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -100,12 +100,15 @@ replace_funcs_pos = [
# loongarch
['pg_crc32c_loongarch', 'USE_LOONGARCH_CRC32C'],
+ # riscv
+ ['pg_popcount_riscv', 'USE_RISCV_ZBB_WITH_RUNTIME_CHECK', 'zbb'],
+
# generic fallback
['pg_crc32c_sb8', 'USE_SLICING_BY_8_CRC32C'],
]
-pgport_cflags = {'crc': cflags_crc}
-pgport_sources_cflags = {'crc': []}
+pgport_cflags = {'crc': cflags_crc, 'zbb': cflags_zbb}
+pgport_sources_cflags = {'crc': [], 'zbb': []}
foreach f : replace_funcs_neg
func = f.get(0)
diff --git a/src/port/pg_bitutils.c b/src/port/pg_bitutils.c
index 7b11c38c417..23af6c54477 100644
--- a/src/port/pg_bitutils.c
+++ b/src/port/pg_bitutils.c
@@ -162,7 +162,7 @@ pg_popcount_masked_portable(const char *buf, int bytes, uint8 mask)
return popcnt;
}
-#if !defined(HAVE_X86_64_POPCNTQ) && !defined(USE_NEON)
+#if !defined(HAVE_X86_64_POPCNTQ) && !defined(USE_NEON) && !defined(USE_RISCV_ZBB_WITH_RUNTIME_CHECK)
/*
* When special CPU instructions are not available, there's no point in using
@@ -191,4 +191,5 @@ pg_popcount_masked_optimized(const char *buf, int bytes, uint8 mask)
return pg_popcount_masked_portable(buf, bytes, mask);
}
-#endif /* ! HAVE_X86_64_POPCNTQ && ! USE_NEON */
+#endif /* ! HAVE_X86_64_POPCNTQ && ! USE_NEON && !
+ * USE_RISCV_ZBB_WITH_RUNTIME_CHECK */
diff --git a/src/port/pg_popcount_riscv.c b/src/port/pg_popcount_riscv.c
new file mode 100644
index 00000000000..dce68d15c44
--- /dev/null
+++ b/src/port/pg_popcount_riscv.c
@@ -0,0 +1,183 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_popcount_riscv.c
+ * Holds the RISC-V Zbb popcount implementations.
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/port/pg_popcount_riscv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "c.h"
+
+#ifdef USE_RISCV_ZBB_WITH_RUNTIME_CHECK
+
+#if defined(__linux__)
+#include <sys/syscall.h>
+#include <unistd.h>
+
+/*
+ * Try to pull in <asm/hwprobe.h> for RISCV_HWPROBE_* / struct riscv_hwprobe.
+ * On older kernel-headers packages (or non-RISC-V Linux distros configured
+ * without multiarch headers) the file may be absent; provide minimal
+ * fallback definitions so this file still builds. The runtime check below
+ * will gracefully report "unavailable" if the syscall fails.
+ */
+#if defined(__has_include)
+#if __has_include(<asm/hwprobe.h>)
+#include <asm/hwprobe.h>
+#define HAVE_ASM_HWPROBE_H 1
+#endif
+#endif
+
+#ifndef HAVE_ASM_HWPROBE_H
+struct riscv_hwprobe
+{
+ int64 key;
+ uint64 value;
+};
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#define RISCV_HWPROBE_EXT_ZBB (UINT64CONST(1) << 4)
+#endif
+
+#ifndef __NR_riscv_hwprobe
+#define __NR_riscv_hwprobe 258
+#endif
+#endif /* __linux__ */
+
+#include "port/pg_bitutils.h"
+
+/*
+ * Hardware implementation using RISC-V Zbb cpop instruction.
+ */
+static uint64 pg_popcount_zbb(const char *buf, int bytes);
+static uint64 pg_popcount_masked_zbb(const char *buf, int bytes, uint8 mask);
+
+/*
+ * The function pointers are initially set to "choose" functions. These
+ * functions will first set the pointers to the right implementations (based on
+ * what the current CPU supports) and then will call the pointer to fulfill the
+ * caller's request.
+ */
+static uint64 pg_popcount_choose(const char *buf, int bytes);
+static uint64 pg_popcount_masked_choose(const char *buf, int bytes, uint8 mask);
+uint64 (*pg_popcount_optimized) (const char *buf, int bytes) = pg_popcount_choose;
+uint64 (*pg_popcount_masked_optimized) (const char *buf, int bytes, uint8 mask) = pg_popcount_masked_choose;
+
+static inline bool
+pg_popcount_zbb_available(void)
+{
+#if defined(__linux__)
+ struct riscv_hwprobe pair = {.key = RISCV_HWPROBE_KEY_IMA_EXT_0};
+
+ if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
+ return false;
+
+ return (pair.value & RISCV_HWPROBE_EXT_ZBB) != 0;
+#else
+ return false;
+#endif
+}
+
+static inline void
+choose_popcount_functions(void)
+{
+ if (pg_popcount_zbb_available())
+ {
+ pg_popcount_optimized = pg_popcount_zbb;
+ pg_popcount_masked_optimized = pg_popcount_masked_zbb;
+ }
+ else
+ {
+ pg_popcount_optimized = pg_popcount_portable;
+ pg_popcount_masked_optimized = pg_popcount_masked_portable;
+ }
+}
+
+static uint64
+pg_popcount_choose(const char *buf, int bytes)
+{
+ choose_popcount_functions();
+ return pg_popcount_optimized(buf, bytes);
+}
+
+static uint64
+pg_popcount_masked_choose(const char *buf, int bytes, uint8 mask)
+{
+ choose_popcount_functions();
+ return pg_popcount_masked_optimized(buf, bytes, mask);
+}
+
+/*
+ * pg_popcount64_zbb
+ * Return the number of 1 bits set in word
+ *
+ * Uses the RISC-V Zbb 'cpop' (count population) instruction via
+ * __builtin_popcountll(). When compiled with -march=rv64gc_zbb, GCC and
+ * Clang will emit the cpop instruction for this builtin.
+ */
+static inline int
+pg_popcount64_zbb(uint64 word)
+{
+ return __builtin_popcountll(word);
+}
+
+/*
+ * pg_popcount_zbb
+ * Returns number of 1 bits in buf
+ *
+ * Similar approach to x86 SSE4.2 POPCNT: process data in 8-byte chunks using
+ * the cpop instruction, with byte-by-byte fallback for remaining data.
+ */
+static uint64
+pg_popcount_zbb(const char *buf, int bytes)
+{
+ uint64 popcnt = 0;
+ const uint64 *words = (const uint64 *) buf;
+
+ /* Process 8-byte chunks */
+ while (bytes >= 8)
+ {
+ popcnt += pg_popcount64_zbb(*words++);
+ bytes -= 8;
+ }
+
+ buf = (const char *) words;
+
+ /* Process any remaining bytes */
+ while (bytes--)
+ popcnt += pg_number_of_ones[(unsigned char) *buf++];
+
+ return popcnt;
+}
+
+/*
+ * pg_popcount_masked_zbb
+ * Returns number of 1 bits in buf after applying the mask to each byte
+ */
+static uint64
+pg_popcount_masked_zbb(const char *buf, int bytes, uint8 mask)
+{
+ uint64 popcnt = 0;
+ uint64 maskv = ~UINT64CONST(0) / 0xFF * mask;
+ const uint64 *words = (const uint64 *) buf;
+
+ /* Process 8-byte chunks */
+ while (bytes >= 8)
+ {
+ popcnt += pg_popcount64_zbb(*words++ & maskv);
+ bytes -= 8;
+ }
+
+ buf = (const char *) words;
+
+ /* Process any remaining bytes */
+ while (bytes--)
+ popcnt += pg_number_of_ones[(unsigned char) *buf++ & mask];
+
+ return popcnt;
+}
+
+#endif /* USE_RISCV_ZBB_WITH_RUNTIME_CHECK */
--
2.51.2
From 7e78da689961fa8ca341c1a63bce1f7a0c8969a8 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Mon, 23 Mar 2026 12:31:58 +0000
Subject: [PATCH v4 3/3] Add RISC-V CRC32C using the Zbc extension
This adds hardware-accelerated CRC-32C computation for RISC-V platforms
with the Zbc (carry-less multiply) or Zbkc (crypto carry-less)
extension.
The implementation uses the clmul and clmulh instructions for polynomial
folding with Barrett reduction to compute CRC-32C checksums. This
provides approximately 20x speedup over the software slicing-by-8
implementation.
The algorithm is based on the Google Abseil project's RISC-V CRC32C
implementation (https://github.com/abseil/abseil-cpp/pull/1986 in
absl/crc/internal/crc_riscv.cc) that is Copyright 2025 The Abseil
Authors licensed under the Apache License, Version 2.0.
Runtime detection uses the Linux riscv_hwprobe syscall (kernel 6.4+) to
check for Zbc/Zbkc support, falling back gracefully to software on older
kernels or non-Linux platforms.
Similar to ARMv8 CRC Extension and x86 SSE 4.2 support, this is compiled
with '-march=rv64gc_zbc' and selected at runtime based on CPU
capabilities.
---
config/c-compiler.m4 | 41 +++++
configure.ac | 36 ++++-
meson.build | 36 +++++
src/include/port/pg_crc32c.h | 14 ++
src/port/meson.build | 3 +
src/port/pg_crc32c_riscv_choose.c | 101 ++++++++++++
src/port/pg_crc32c_riscv_zbc.c | 257 ++++++++++++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
8 files changed, 482 insertions(+), 7 deletions(-)
create mode 100644 src/port/pg_crc32c_riscv_choose.c
create mode 100644 src/port/pg_crc32c_riscv_zbc.c
diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3eab0da9cb6..00143c482c1 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -854,6 +854,47 @@ fi
undefine([Ac_cachevar])dnl
])# PGAC_LOONGARCH_CRC32C_INTRINSICS
+# PGAC_RISCV_ZBC_CRC32C_INTRINSICS
+# ---------------------------------
+# Check if the compiler supports RISC-V Zbc (carry-less multiply) instructions
+# for CRC-32C computation, using inline assembly for clmul instruction.
+#
+# An optional compiler flag can be passed as argument (e.g. -march=rv64gc_zbc).
+# If the intrinsics are supported, sets pgac_riscv_zbc_crc32c_intrinsics and
+# CFLAGS_CRC.
+#
+# The Zbc extension provides clmul and clmulh instructions which are used with
+# polynomial folding to compute CRC-32C. This implementation is based on the
+# algorithm from Google Abseil (https://github.com/abseil/abseil-cpp/pull/1986).
+AC_DEFUN([PGAC_RISCV_ZBC_CRC32C_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_riscv_zbc_crc32c_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for RISC-V Zbc clmul with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([
+#if !defined(__riscv) || !defined(__riscv_xlen) || __riscv_xlen != 64
+#error not RISC-V 64-bit
+#endif
+
+static inline unsigned long clmul_test(unsigned long a, unsigned long b)
+{
+ unsigned long result;
+ __asm__("clmul %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
+ return result;
+}],
+ [unsigned long result = clmul_test(0x123, 0x456);
+ /* return computed value, to prevent the above being optimized away */
+ return result == 0;])],
+ [Ac_cachevar=yes],
+ [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+ CFLAGS_CRC="$1"
+ pgac_riscv_zbc_crc32c_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_RISCV_ZBC_CRC32C_INTRINSICS
+
# PGAC_XSAVE_INTRINSICS
# ---------------------
# Check if the compiler supports the XSAVE instructions using the _xgetbv
diff --git a/configure.ac b/configure.ac
index da4d3bceb94..7154a578b7c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2248,6 +2248,17 @@ fi
# with the default compiler flags.
PGAC_LOONGARCH_CRC32C_INTRINSICS()
+# Check for RISC-V Zbc (carry-less multiply) for CRC calculations.
+#
+# The Zbc extension provides clmul and clmulh instructions for hardware-
+# accelerated CRC-32C computation using polynomial folding. Check if we
+# can compile with -march=rv64gc_zbc flag. CFLAGS_CRC is set if the flag
+# is required.
+#
+# This implementation is based on Google Abseil's algorithm:
+# https://github.com/abseil/abseil-cpp/pull/1986
+PGAC_RISCV_ZBC_CRC32C_INTRINSICS([-march=rv64gc_zbc])
+
AC_SUBST(CFLAGS_CRC)
# Select CRC-32C implementation.
@@ -2278,7 +2289,7 @@ AC_SUBST(CFLAGS_CRC)
#
# If we are targeting a LoongArch processor, CRC instructions are
# always available (at least on 64 bit), so no runtime check is needed.
-if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" && test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_ARMV8_CRC32C" = x"" && test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_LOONGARCH_CRC32C" = x""; then
+if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" && test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_ARMV8_CRC32C" = x"" && test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_LOONGARCH_CRC32C" = x"" && test x"$USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK" = x""; then
# Use Intel SSE 4.2 if available.
if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
USE_SSE42_CRC32C=1
@@ -2300,9 +2311,14 @@ if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" &&
if test x"$pgac_loongarch_crc32c_intrinsics" = x"yes"; then
USE_LOONGARCH_CRC32C=1
else
- # fall back to slicing-by-8 algorithm, which doesn't require any
- # special CPU support.
- USE_SLICING_BY_8_CRC32C=1
+ # RISC-V Zbc CRC, with runtime check.
+ if test x"$pgac_riscv_zbc_crc32c_intrinsics" = x"yes"; then
+ USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK=1
+ else
+ # fall back to slicing-by-8 algorithm, which doesn't require any
+ # special CPU support.
+ USE_SLICING_BY_8_CRC32C=1
+ fi
fi
fi
fi
@@ -2337,9 +2353,15 @@ else
PG_CRC32C_OBJS="pg_crc32c_loongarch.o"
AC_MSG_RESULT(LoongArch CRCC instructions)
else
- AC_DEFINE(USE_SLICING_BY_8_CRC32C, 1, [Define to 1 to use software CRC-32C implementation (slicing-by-8).])
- PG_CRC32C_OBJS="pg_crc32c_sb8.o"
- AC_MSG_RESULT(slicing-by-8)
+ if test x"$USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK" = x"1"; then
+ AC_DEFINE(USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK, 1, [Define to 1 to use RISC-V Zbc CRC instructions with a runtime check.])
+ PG_CRC32C_OBJS="pg_crc32c_riscv_zbc.o pg_crc32c_sb8.o pg_crc32c_riscv_choose.o"
+ AC_MSG_RESULT(RISC-V Zbc instructions with runtime check)
+ else
+ AC_DEFINE(USE_SLICING_BY_8_CRC32C, 1, [Define to 1 to use software CRC-32C implementation (slicing-by-8).])
+ PG_CRC32C_OBJS="pg_crc32c_sb8.o"
+ AC_MSG_RESULT(slicing-by-8)
+ fi
fi
fi
fi
diff --git a/meson.build b/meson.build
index cf7f41715d8..9d1460ff952 100644
--- a/meson.build
+++ b/meson.build
@@ -2835,6 +2835,42 @@ int main(void)
have_optimized_crc = true
endif
+elif host_cpu == 'riscv64'
+
+ # Check for RISC-V Zbc (carry-less multiply) extension for CRC-32C.
+ # The Zbc extension provides clmul and clmulh instructions used for
+ # hardware-accelerated CRC computation via polynomial folding.
+ #
+ # This implementation is based on Google Abseil's algorithm:
+ # https://github.com/abseil/abseil-cpp/pull/1986
+
+ prog = '''
+#if !defined(__riscv) || !defined(__riscv_xlen) || __riscv_xlen != 64
+#error not RISC-V 64-bit
+#endif
+
+static inline unsigned long clmul(unsigned long a, unsigned long b)
+{
+ unsigned long result;
+ __asm__("clmul %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
+ return result;
+}
+
+int main(void)
+{
+ unsigned long result = clmul(0x123, 0x456);
+ return result == 0;
+}
+'''
+
+ if cc.links(prog, name: 'RISC-V Zbc clmul with -march=rv64gc_zbc',
+ args: test_c_args + ['-march=rv64gc_zbc'])
+ # Use RISC-V Zbc CRC, with runtime check
+ cflags_crc += '-march=rv64gc_zbc'
+ cdata.set('USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK', 1)
+ have_optimized_crc = true
+ endif
+
endif
if not have_optimized_crc
diff --git a/src/include/port/pg_crc32c.h b/src/include/port/pg_crc32c.h
index 2f22e176a66..3e60a23b947 100644
--- a/src/include/port/pg_crc32c.h
+++ b/src/include/port/pg_crc32c.h
@@ -166,6 +166,20 @@ extern pg_crc32c pg_comp_crc32c_armv8(pg_crc32c crc, const void *data, size_t le
extern pg_crc32c pg_comp_crc32c_pmull(pg_crc32c crc, const void *data, size_t len);
#endif
+#elif defined(USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK)
+
+/*
+ * Use RISC-V Zbc instructions, but perform a runtime check first
+ * to check that they are available.
+ */
+#define COMP_CRC32C(crc, data, len) \
+ ((crc) = pg_comp_crc32c((crc), (data), (len)))
+#define FIN_CRC32C(crc) ((crc) ^= 0xFFFFFFFF)
+
+extern pg_crc32c pg_comp_crc32c_sb8(pg_crc32c crc, const void *data, size_t len);
+extern pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len);
+extern pg_crc32c pg_comp_crc32c_riscv_zbc(pg_crc32c crc, const void *data, size_t len);
+
#else
/*
* Use slicing-by-8 algorithm.
diff --git a/src/port/meson.build b/src/port/meson.build
index 2c0486f5373..c1427240511 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -101,6 +101,9 @@ replace_funcs_pos = [
['pg_crc32c_loongarch', 'USE_LOONGARCH_CRC32C'],
# riscv
+ ['pg_crc32c_riscv_zbc', 'USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK', 'crc'],
+ ['pg_crc32c_riscv_choose', 'USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK'],
+ ['pg_crc32c_sb8', 'USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK'],
['pg_popcount_riscv', 'USE_RISCV_ZBB_WITH_RUNTIME_CHECK', 'zbb'],
# generic fallback
diff --git a/src/port/pg_crc32c_riscv_choose.c b/src/port/pg_crc32c_riscv_choose.c
new file mode 100644
index 00000000000..18d105e5e12
--- /dev/null
+++ b/src/port/pg_crc32c_riscv_choose.c
@@ -0,0 +1,101 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_crc32c_riscv_choose.c
+ * Choose between RISC-V Zbc and software CRC-32C implementation.
+ *
+ * On first call, checks if the CPU supports the RISC-V Zbc (or Zbkc) extension.
+ * If it does, use carry-less multiply instructions for CRC-32C computation.
+ * Otherwise, fall back to the pure software implementation (slicing-by-8).
+ *
+ * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/port/pg_crc32c_riscv_choose.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include <sys/syscall.h>
+#include <unistd.h>
+
+#include "port/pg_crc32c.h"
+
+/*
+ * RISC-V hardware probing definitions
+ */
+#ifndef __NR_riscv_hwprobe
+#define __NR_riscv_hwprobe 258
+#endif
+
+#ifndef RISCV_HWPROBE_KEY_IMA_EXT_0
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#endif
+
+#ifndef RISCV_HWPROBE_EXT_ZBC
+#define RISCV_HWPROBE_EXT_ZBC (1ULL << 7)
+#endif
+
+#ifndef RISCV_HWPROBE_EXT_ZBKC
+#define RISCV_HWPROBE_EXT_ZBKC (1ULL << 9)
+#endif
+
+struct riscv_hwprobe
+{
+ int64 key;
+ uint64 value;
+};
+
+/*
+ * Check if RISC-V Zbc or Zbkc extension is available
+ *
+ * Uses the riscv_hwprobe syscall, which is available on Linux kernel 6.4+.
+ * Falls back to software if the syscall fails or neither extension is present.
+ */
+static bool
+pg_crc32c_riscv_zbc_available(void)
+{
+#if defined(__linux__) && defined(__riscv) && (__riscv_xlen == 64)
+ struct riscv_hwprobe pair = {.key = RISCV_HWPROBE_KEY_IMA_EXT_0};
+
+ /*
+ * Make the syscall. If it fails (e.g., old kernel, non-Linux), fall back
+ * to software.
+ */
+ if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
+ return false;
+
+ /*
+ * Check if either Zbc (general bitmanip carry-less) or Zbkc (crypto
+ * carry-less) is available. Both provide clmul/clmulh instructions.
+ */
+ return (pair.value & (RISCV_HWPROBE_EXT_ZBC | RISCV_HWPROBE_EXT_ZBKC)) != 0;
+#else
+ /* Not on RISC-V Linux, or not 64-bit - use software fallback */
+ return false;
+#endif
+}
+
+/*
+ * This gets called on the first call. It replaces the function pointer
+ * so that subsequent calls are routed directly to the chosen implementation.
+ */
+static pg_crc32c
+pg_comp_crc32c_choose(pg_crc32c crc, const void *data, size_t len)
+{
+ if (pg_crc32c_riscv_zbc_available())
+ pg_comp_crc32c = pg_comp_crc32c_riscv_zbc;
+ else
+ pg_comp_crc32c = pg_comp_crc32c_sb8;
+
+ return pg_comp_crc32c(crc, data, len);
+}
+
+pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len) = pg_comp_crc32c_choose;
diff --git a/src/port/pg_crc32c_riscv_zbc.c b/src/port/pg_crc32c_riscv_zbc.c
new file mode 100644
index 00000000000..9eb845dca69
--- /dev/null
+++ b/src/port/pg_crc32c_riscv_zbc.c
@@ -0,0 +1,257 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_crc32c_riscv_zbc.c
+ * Compute CRC-32C checksum using RISC-V Zbc carry-less multiply instructions
+ *
+ * This implementation uses the RISC-V Zbc (or Zbkc) extension for hardware-
+ * accelerated CRC-32C computation. It uses carry-less multiplication (clmul
+ * and clmulh) with polynomial folding and Barrett reduction.
+ *
+ * The algorithm is based on Google Abseil's implementation:
+ * https://github.com/abseil/abseil-cpp/pull/1986
+ * File: absl/crc/internal/crc_riscv.cc
+ *
+ * Copyright 2025 The Abseil Authors
+ * Licensed under the Apache License, Version 2.0
+ * Adapted for PostgreSQL under PostgreSQL license
+ *
+ * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/port/pg_crc32c_riscv_zbc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "c.h"
+
+#ifdef WORDS_BIGENDIAN
+#error "RISC-V Zbc CRC implementation does not support big-endian systems"
+#endif
+
+#include "port/pg_crc32c.h"
+
+/*
+ * 128-bit value for polynomial arithmetic
+ */
+typedef struct
+{
+ uint64 lo;
+ uint64 hi;
+} V128;
+
+/*
+ * Carry-less multiply instructions from RISC-V Zbc/Zbkc extension
+ */
+static inline uint64
+pg_clmul(uint64 a, uint64 b)
+{
+ uint64 _res;
+
+ __asm__(
+ " clmul %0, %1, %2\n"
+: "=r"(_res)
+: "r"(a), "r"(b));
+
+ return _res;
+}
+
+static inline uint64
+pg_clmulh(uint64 a, uint64 b)
+{
+ uint64 _res;
+
+ __asm__(
+ " clmulh %0, %1, %2"
+: "=r"(_res)
+: "r"(a), "r"(b));
+
+ return _res;
+}
+
+static inline V128
+pg_clmul128(uint64 a, uint64 b)
+{
+ V128 result;
+
+ result.lo = pg_clmul(a, b);
+ result.hi = pg_clmulh(a, b);
+ return result;
+}
+
+/*
+ * 128-bit operations
+ */
+static inline V128
+pg_v128_xor(V128 a, V128 b)
+{
+ V128 result;
+
+ result.lo = a.lo ^ b.lo;
+ result.hi = a.hi ^ b.hi;
+ return result;
+}
+
+static inline V128
+pg_v128_and_mask32(V128 a)
+{
+ V128 result;
+
+ result.lo = a.lo & UINT64CONST(0x00000000FFFFFFFF);
+ result.hi = a.hi & UINT64CONST(0x00000000FFFFFFFF);
+ return result;
+}
+
+static inline V128
+pg_v128_shift_right64(V128 a)
+{
+ V128 result;
+
+ result.lo = a.hi;
+ result.hi = 0;
+ return result;
+}
+
+static inline V128
+pg_v128_shift_right32(V128 a)
+{
+ V128 result;
+
+ result.lo = (a.lo >> 32) | (a.hi << 32);
+ result.hi = (a.hi >> 32);
+ return result;
+}
+
+static inline V128
+pg_v128_load(const unsigned char *p)
+{
+ V128 result;
+
+ /*
+ * Load 16 bytes as two 64-bit values. Use memcpy() rather than pointer
+ * casts, since p is not guaranteed to be 8-byte aligned. Big-endian
+ * builds are rejected above, so no byte swapping is needed.
+ */
+ memcpy(&result.lo, p, sizeof(uint64));
+ memcpy(&result.hi, p + 8, sizeof(uint64));
+ return result;
+}
+
+/*
+ * CRC-32C (Castagnoli) polynomial folding constants. These are computed
+ * for the polynomial 0x1EDC6F41 (normal form) or 0x82F63B78 (reflected).
+ */
+static const uint64 kK5 = UINT64CONST(0x0f20c0dfe); /* Folding constant */
+static const uint64 kK6 = UINT64CONST(0x14cd00bd6); /* Folding constant */
+static const uint64 kK7 = UINT64CONST(0x0dd45aab8); /* 64->32 reduction */
+static const uint64 kP1 = UINT64CONST(0x105ec76f0); /* Barrett reduction */
+static const uint64 kP2 = UINT64CONST(0x0dea713f1); /* Barrett reduction */
+
+/*
+ * Core CRC-32C computation using carry-less multiplication.
+ *
+ * Input: CRC in working form (already inverted with ~crc)
+ * Output: CRC in working form (still inverted)
+ *
+ * Precondition: len >= 32 and len % 16 == 0
+ */
+static uint32
+pg_crc32c_clmul_core(uint32 crc_inverted, const unsigned char *buf, uint64 len)
+{
+ V128 x;
+
+ /* Load first 16-byte block and XOR with inverted CRC */
+ x = pg_v128_load(buf);
+ x.lo ^= (uint64) crc_inverted;
+ buf += 16;
+ len -= 16;
+
+ /* Fold 16-byte blocks into 128-bit accumulator */
+ while (len >= 16)
+ {
+ V128 block = pg_v128_load(buf);
+ V128 lo = pg_clmul128(x.lo, kK5);
+ V128 hi = pg_clmul128(x.hi, kK6);
+
+ x = pg_v128_xor(pg_v128_xor(lo, hi), block);
+ buf += 16;
+ len -= 16;
+ }
+
+ /* Reduce 128-bit to 64-bit */
+ {
+ V128 tmp = pg_clmul128(kK6, x.lo);
+
+ x = pg_v128_xor(pg_v128_shift_right64(x), tmp);
+ }
+
+ /* Reduce 64-bit to 32-bit */
+ {
+ V128 tmp = pg_v128_shift_right32(x);
+
+ x = pg_v128_and_mask32(x);
+ x = pg_clmul128(kK7, x.lo);
+ x = pg_v128_xor(x, tmp);
+ }
+
+ /* Barrett reduction to final 32-bit CRC */
+ {
+ V128 tmp = pg_v128_and_mask32(x);
+
+ tmp = pg_clmul128(kP2, tmp.lo);
+ tmp = pg_v128_and_mask32(tmp);
+ tmp = pg_clmul128(kP1, tmp.lo);
+ x = pg_v128_xor(x, tmp);
+ }
+
+ /* Extract result from second 32-bit lane */
+ return (uint32) ((x.lo >> 32) & UINT64CONST(0xFFFFFFFF));
+}
+
+/*
+ * Main CRC-32C computation function with RISC-V Zbc acceleration
+ */
+pg_crc32c
+pg_comp_crc32c_riscv_zbc(pg_crc32c crc, const void *data, size_t len)
+{
+ const unsigned char *p = data;
+ const size_t kMinLen = 32;
+ const size_t kChunkLen = 16;
+ size_t tail;
+
+ /* Use software fallback for small buffers */
+ if (len < kMinLen)
+ return pg_comp_crc32c_sb8(crc, data, len);
+
+ /*
+ * Process the leading len % 16 bytes in software (Abseil approach), so
+ * that the remaining length is a multiple of 16 as the core loop
+ * requires. No particular pointer alignment is established here.
+ */
+ tail = len % kChunkLen;
+ if (tail)
+ {
+ crc = pg_comp_crc32c_sb8(crc, p, tail);
+ p += tail;
+ len -= tail;
+ }
+
+ /*
+ * Process remaining bytes (now a multiple of 16) with hardware. The core
+ * algorithm requires at least 32 bytes.
+ */
+ if (len >= 32)
+ {
+ /*
+ * The Abseil core algorithm expects to receive 0xFFFFFFFF as the
+ * initial CRC value (corresponding to Abseil's initial value of 0
+ * after inversion). PostgreSQL's convention already passes 0xFFFFFFFF
+ * initially, so pass it directly. The core returns a value that needs
+ * final XOR with 0xFFFFFFFF (done by the caller).
+ */
+ crc = pg_crc32c_clmul_core(crc, p, len);
+ }
+
+ return crc;
+}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8cf40c87043..372a80c067f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3393,6 +3393,7 @@ VirtualTransactionId
VirtualTupleTableSlot
VolatileFunctionStatus
Vsrt
+V128
WAIT_ORDER
WALAvailability
WALInsertLock
--
2.51.2
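
As a reading aid for the CRC patch above (not part of it): clmul and clmulh return the low and high 64 bits of a carry-less 64x64-bit product, i.e. a multiply where partial products are combined with XOR instead of addition. A portable sketch of both, plus a bit-at-a-time CRC-32C that could serve as a cross-check for the Zbc path, might look like the following. The names clmul_portable, clmulh_portable, and crc32c_bitwise are hypothetical helpers invented for illustration; only the reflected polynomial 0x82F63B78 and the 0xFFFFFFFF init/final-XOR convention are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32C (Castagnoli) polynomial, as quoted in the patch. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical helper: low 64 bits of the carry-less product a * b. */
static uint64_t
clmul_portable(uint64_t a, uint64_t b)
{
    uint64_t r = 0;

    for (int i = 0; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a << i;
    return r;
}

/* Hypothetical helper: high 64 bits of the carry-less product a * b. */
static uint64_t
clmulh_portable(uint64_t a, uint64_t b)
{
    uint64_t r = 0;

    /* Start at i = 1: bit 0 of b cannot contribute to the high half. */
    for (int i = 1; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a >> (64 - i);
    return r;
}

/*
 * Hypothetical reference: update a CRC-32C in "working form", one bit at a
 * time.  Same convention as COMP_CRC32C: the caller starts from 0xFFFFFFFF
 * and applies a final XOR with 0xFFFFFFFF.
 */
static uint32_t
crc32c_bitwise(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    }
    return crc;
}
```

Comparing crc32c_bitwise() against pg_comp_crc32c_riscv_zbc() over random buffers of varying length and alignment would be one cheap way to exercise the head-bytes and folding paths; that harness is left out here and is only a suggestion.
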
From 8fe3527f87e3b22edf1aaf0bb07092013c63c057 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Mon, 23 Mar 2026 11:26:24 -0400
Subject: [PATCH v4 1/3] Avoid Clang RISC-V auto-vectorization bug in DES

Clang 20.1.2 (and possibly earlier/later versions) miscompiles
scatter-write patterns like "array[perm[i]] = i" when compiling for
RISC-V with -O2. This causes incorrect DES permutation tables in
contrib/pgcrypto/crypt-des.c, resulting in wrong password hashes and
authentication failures.

Add pg_memory_barrier() calls after the scatter writes in des_init() to
prevent auto-vectorization of the affected loops. The barriers are
compiled only when both __riscv and __clang__ are defined, so other
compilers and platforms are unaffected. They have negligible performance
impact since DES initialization occurs only once per connection.
---
contrib/pgcrypto/crypt-des.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/contrib/pgcrypto/crypt-des.c b/contrib/pgcrypto/crypt-des.c
index 98c30ea122e..0a698da7132 100644
--- a/contrib/pgcrypto/crypt-des.c
+++ b/contrib/pgcrypto/crypt-des.c
@@ -62,12 +62,14 @@
#include "postgres.h"
#include "miscadmin.h"
+#include "port/atomics.h"
#include "port/pg_bswap.h"
#include "px-crypt.h"
#define _PASSWORD_EFMT1 '_'
+
static const char _crypt_a64[] =
"./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
@@ -265,6 +267,10 @@ des_init(void)
for (i = 0; i < 64; i++)
{
init_perm[final_perm[i] = IP[i] - 1] = i;
+ /* This prevents a Clang bug related to auto-vectorization */
+#if defined(__riscv) && defined(__clang__)
+ pg_memory_barrier();
+#endif
inv_key_perm[i] = 255;
}
@@ -276,6 +282,10 @@ des_init(void)
{
u_key_perm[i] = key_perm[i] - 1;
inv_key_perm[key_perm[i] - 1] = i;
+ /* This prevents a Clang bug related to auto-vectorization */
+#if defined(__riscv) && defined(__clang__)
+ pg_memory_barrier();
+#endif
inv_comp_perm[i] = 255;
}
@@ -283,7 +293,13 @@ des_init(void)
* Invert the key compression permutation.
*/
for (i = 0; i < 48; i++)
+ {
inv_comp_perm[comp_perm[i] - 1] = i;
+ /* This prevents a Clang bug related to auto-vectorization */
+#if defined(__riscv) && defined(__clang__)
+ pg_memory_barrier();
+#endif
+ }
/*
* Set up the OR-mask arrays for the initial and final permutations, and
@@ -353,7 +369,13 @@ des_init(void)
* the output of the S-box arrays setup above.
*/
for (i = 0; i < 32; i++)
+ {
un_pbox[pbox[i] - 1] = i;
+ /* This prevents a Clang bug related to auto-vectorization */
+#if defined(__riscv) && defined(__clang__)
+ pg_memory_barrier();
+#endif
+ }
for (b = 0; b < 4; b++)
for (i = 0; i < 256; i++)
--
2.51.2
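
For readers who want the shape of the workaround in isolation, here is a minimal sketch of the scatter-write pattern from des_init() with a barrier after each store. It makes two assumptions that differ from the patch: it uses a compiler-only barrier (an empty asm with a "memory" clobber, GCC/Clang syntax) rather than pg_memory_barrier(), and invert_perm is a hypothetical helper name. Whether the lighter barrier is enough to sidestep the Clang bug has not been tested here.

```c
#include <stdint.h>

/*
 * Compiler-only barrier (GCC/Clang syntax).  Assumption: the patch itself
 * uses pg_memory_barrier(), which also orders memory accesses at runtime.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical helper: invert a 1-based permutation table, the same
 * "inv[perm[i] - 1] = i" scatter write used in des_init().  Output slots
 * that no input maps to are left at 255.
 */
static void
invert_perm(const uint8_t *perm, uint8_t *inv, int n_in, int n_out)
{
    for (int i = 0; i < n_out; i++)
        inv[i] = 255;

    for (int i = 0; i < n_in; i++)
    {
        inv[perm[i] - 1] = (uint8_t) i;
        /* Keep the compiler from combining stores across iterations. */
        compiler_barrier();
    }
}
```

With perm = {3, 1, 2} and four output slots, this leaves inv = {1, 2, 0, 255}.
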
From 9b7f5a7be2123e79a71c781191681c8eb972d795 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Sun, 22 Mar 2026 11:15:41 -0400
Subject: [PATCH v4 2/3] Add RISC-V popcount using Zbb extension

Implement hardware popcount support for RISC-V using the Zbb (basic bit
manipulation) extension when present. The Zbb extension provides the
'cpop' instruction, which GCC and Clang emit from __builtin_popcountll()
when compiling with -march=rv64gc_zbb.

This patch adds:
- Build-time detection of Zbb support (configure.ac, meson.build)
- Runtime detection using the riscv_hwprobe syscall on Linux
- Optimized popcount implementation using the cpop instruction

The implementation follows the established pattern for hardware
acceleration (similar to x86 POPCNT and ARM SVE). Zbb-optimized code is
compiled separately with -march=rv64gc_zbb, while the main binary remains
portable across all RISC-V 64-bit systems.
---
configure.ac | 29 ++++++
meson.build | 32 ++++++
src/include/port/pg_bitutils.h | 2 +-
src/port/meson.build | 7 +-
src/port/pg_bitutils.c | 5 +-
src/port/pg_popcount_riscv.c | 183 +++++++++++++++++++++++++++++++++
6 files changed, 253 insertions(+), 5 deletions(-)
create mode 100644 src/port/pg_popcount_riscv.c
diff --git a/configure.ac b/configure.ac
index 8d176bd3468..da4d3bceb94 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2187,6 +2187,35 @@ if test x"$host_cpu" = x"aarch64"; then
fi
fi
+# Check for RISC-V Zbb bitmanip extension (provides 'cpop' for popcount).
+#
+# The Zbb extension provides the 'cpop' instruction for hardware popcount.
+# GCC/Clang emit the cpop instruction from __builtin_popcountll() when
+# -march=rv64gc_zbb is used. We test compilation with this flag, then
+# restore CFLAGS to avoid global march flags (for binary portability).
+# We define USE_RISCV_ZBB_WITH_RUNTIME_CHECK and use the riscv_hwprobe
+# syscall for runtime detection. We compile src/port/pg_popcount_riscv.c
+# with -march=rv64gc_zbb separately (like ARM SVE and x86 POPCNT).
+AC_MSG_CHECKING([for RISC-V Zbb extension (cpop/popcount)])
+if test x"$host_cpu" = x"riscv64"; then
+ pgac_save_CFLAGS_zbb="$CFLAGS"
+ CFLAGS="$CFLAGS -march=rv64gc_zbb"
+ AC_COMPILE_IFELSE(
+ [AC_LANG_PROGRAM(
+ [/* Test that the compiler will emit cpop from __builtin_popcountll */
+ static inline int test_cpop(unsigned long long x)
+ { return __builtin_popcountll(x); }],
+ [volatile int r = test_cpop(0xdeadbeefULL); (void) r;])],
+ [AC_DEFINE(USE_RISCV_ZBB_WITH_RUNTIME_CHECK, 1,
+ [Define to 1 to use RISC-V Zbb popcount with runtime detection.])
+ CFLAGS="$pgac_save_CFLAGS_zbb"
+ AC_MSG_RESULT([yes, with runtime check])],
+ [CFLAGS="$pgac_save_CFLAGS_zbb"
+ AC_MSG_RESULT([no])])
+else
+ AC_MSG_RESULT([not on RISC-V])
+fi
+
# Check for Intel SSE 4.2 intrinsics to do CRC calculations.
#
PGAC_SSE42_CRC32_INTRINSICS()
diff --git a/meson.build b/meson.build
index 20b887f1a1b..cf7f41715d8 100644
--- a/meson.build
+++ b/meson.build
@@ -2601,6 +2601,38 @@ int main(void)
endif
+# ---------------------------------------------------------------------------
+# Check for RISC-V Zbb bitmanip extension (provides 'cpop' for popcount).
+#
+# The Zbb extension provides the 'cpop' instruction for hardware popcount.
+# GCC/Clang emit the cpop instruction from __builtin_popcountll() when
+# -march=rv64gc_zbb is used. We test compilation with this flag, but
+# do NOT add it globally (for binary portability). Instead, we define
+# USE_RISCV_ZBB_WITH_RUNTIME_CHECK and compile src/port/pg_popcount_riscv.c
+# with -march=rv64gc_zbb separately (like ARM SVE and x86 POPCNT).
+# Runtime detection uses the riscv_hwprobe syscall.
+# ---------------------------------------------------------------------------
+zbb_test_code = '''
+static inline int test_cpop(unsigned long long x)
+{ return __builtin_popcountll(x); }
+int main(void) {
+ volatile int r = test_cpop(0xdeadbeefULL);
+ (void) r;
+ return 0;
+}
+'''
+
+cflags_zbb = []
+if host_cpu == 'riscv64'
+ if cc.compiles(zbb_test_code,
+ args: ['-march=rv64gc_zbb'],
+ name: 'RISC-V Zbb cpop')
+ cdata.set('USE_RISCV_ZBB_WITH_RUNTIME_CHECK', 1)
+ # Flag will be added only to pg_popcount_riscv.c in src/port/meson.build
+ cflags_zbb = ['-march=rv64gc_zbb']
+ endif
+endif
+
###############################################################
# Select CRC-32C implementation.
diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index 7a00d197013..cb8d8b6e626 100644
--- a/src/include/port/pg_bitutils.h
+++ b/src/include/port/pg_bitutils.h
@@ -279,7 +279,7 @@ pg_ceil_log2_64(uint64 num)
extern uint64 pg_popcount_portable(const char *buf, int bytes);
extern uint64 pg_popcount_masked_portable(const char *buf, int bytes, uint8 mask);
-#if defined(HAVE_X86_64_POPCNTQ) || defined(USE_SVE_POPCNT_WITH_RUNTIME_CHECK)
+#if defined(HAVE_X86_64_POPCNTQ) || defined(USE_SVE_POPCNT_WITH_RUNTIME_CHECK) || defined(USE_RISCV_ZBB_WITH_RUNTIME_CHECK)
/*
* Attempt to use specialized CPU instructions, but perform a runtime check
* first.
diff --git a/src/port/meson.build b/src/port/meson.build
index 922b3f64676..2c0486f5373 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -100,12 +100,15 @@ replace_funcs_pos = [
# loongarch
['pg_crc32c_loongarch', 'USE_LOONGARCH_CRC32C'],
+ # riscv
+ ['pg_popcount_riscv', 'USE_RISCV_ZBB_WITH_RUNTIME_CHECK', 'zbb'],
+
# generic fallback
['pg_crc32c_sb8', 'USE_SLICING_BY_8_CRC32C'],
]
-pgport_cflags = {'crc': cflags_crc}
-pgport_sources_cflags = {'crc': []}
+pgport_cflags = {'crc': cflags_crc, 'zbb': cflags_zbb}
+pgport_sources_cflags = {'crc': [], 'zbb': []}
foreach f : replace_funcs_neg
func = f.get(0)
diff --git a/src/port/pg_bitutils.c b/src/port/pg_bitutils.c
index 7b11c38c417..23af6c54477 100644
--- a/src/port/pg_bitutils.c
+++ b/src/port/pg_bitutils.c
@@ -162,7 +162,7 @@ pg_popcount_masked_portable(const char *buf, int bytes, uint8 mask)
return popcnt;
}
-#if !defined(HAVE_X86_64_POPCNTQ) && !defined(USE_NEON)
+#if !defined(HAVE_X86_64_POPCNTQ) && !defined(USE_NEON) && !defined(USE_RISCV_ZBB_WITH_RUNTIME_CHECK)
/*
* When special CPU instructions are not available, there's no point in using
@@ -191,4 +191,5 @@ pg_popcount_masked_optimized(const char *buf, int bytes, uint8 mask)
return pg_popcount_masked_portable(buf, bytes, mask);
}
-#endif /* ! HAVE_X86_64_POPCNTQ && ! USE_NEON */
+#endif /* ! HAVE_X86_64_POPCNTQ && ! USE_NEON && !
+ * USE_RISCV_ZBB_WITH_RUNTIME_CHECK */
diff --git a/src/port/pg_popcount_riscv.c b/src/port/pg_popcount_riscv.c
new file mode 100644
index 00000000000..dce68d15c44
--- /dev/null
+++ b/src/port/pg_popcount_riscv.c
@@ -0,0 +1,183 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_popcount_riscv.c
+ * Holds the RISC-V Zbb popcount implementations.
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/port/pg_popcount_riscv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "c.h"
+
+#ifdef USE_RISCV_ZBB_WITH_RUNTIME_CHECK
+
+#if defined(__linux__)
+#include <sys/syscall.h>
+#include <unistd.h>
+
+/*
+ * Try to pull in <asm/hwprobe.h> for RISCV_HWPROBE_* / struct riscv_hwprobe.
+ * On older kernel-headers packages (or non-RISC-V Linux distros configured
+ * without multiarch headers) the file may be absent; provide minimal
+ * fallback definitions so this file still builds. The runtime check below
+ * will gracefully report "unavailable" if the syscall fails.
+ */
+#if defined(__has_include)
+#if __has_include(<asm/hwprobe.h>)
+#include <asm/hwprobe.h>
+#define HAVE_ASM_HWPROBE_H 1
+#endif
+#endif
+
+#ifndef HAVE_ASM_HWPROBE_H
+struct riscv_hwprobe
+{
+ int64 key;
+ uint64 value;
+};
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#define RISCV_HWPROBE_EXT_ZBB (UINT64CONST(1) << 4)
+#endif
+
+#ifndef __NR_riscv_hwprobe
+#define __NR_riscv_hwprobe 258
+#endif
+#endif /* __linux__ */
+
+#include "port/pg_bitutils.h"
+
+/*
+ * Hardware implementation using RISC-V Zbb cpop instruction.
+ */
+static uint64 pg_popcount_zbb(const char *buf, int bytes);
+static uint64 pg_popcount_masked_zbb(const char *buf, int bytes, uint8 mask);
+
+/*
+ * The function pointers are initially set to "choose" functions. These
+ * functions will first set the pointers to the right implementations (based on
+ * what the current CPU supports) and then will call the pointer to fulfill the
+ * caller's request.
+ */
+static uint64 pg_popcount_choose(const char *buf, int bytes);
+static uint64 pg_popcount_masked_choose(const char *buf, int bytes, uint8 mask);
+uint64 (*pg_popcount_optimized) (const char *buf, int bytes) = pg_popcount_choose;
+uint64 (*pg_popcount_masked_optimized) (const char *buf, int bytes, uint8 mask) = pg_popcount_masked_choose;
+
+static inline bool
+pg_popcount_zbb_available(void)
+{
+#if defined(__linux__)
+ struct riscv_hwprobe pair = {.key = RISCV_HWPROBE_KEY_IMA_EXT_0};
+
+ if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
+ return false;
+
+ return (pair.value & RISCV_HWPROBE_EXT_ZBB) != 0;
+#else
+ return false;
+#endif
+}
+
+static inline void
+choose_popcount_functions(void)
+{
+ if (pg_popcount_zbb_available())
+ {
+ pg_popcount_optimized = pg_popcount_zbb;
+ pg_popcount_masked_optimized = pg_popcount_masked_zbb;
+ }
+ else
+ {
+ pg_popcount_optimized = pg_popcount_portable;
+ pg_popcount_masked_optimized = pg_popcount_masked_portable;
+ }
+}
+
+static uint64
+pg_popcount_choose(const char *buf, int bytes)
+{
+ choose_popcount_functions();
+ return pg_popcount_optimized(buf, bytes);
+}
+
+static uint64
+pg_popcount_masked_choose(const char *buf, int bytes, uint8 mask)
+{
+ choose_popcount_functions();
+ return pg_popcount_masked_optimized(buf, bytes, mask);
+}
+
+/*
+ * pg_popcount64_zbb
+ * Return the number of 1 bits set in word
+ *
+ * Uses the RISC-V Zbb 'cpop' (count population) instruction via
+ * __builtin_popcountll(). When compiled with -march=rv64gc_zbb, GCC and
+ * Clang will emit the cpop instruction for this builtin.
+ */
+static inline int
+pg_popcount64_zbb(uint64 word)
+{
+ return __builtin_popcountll(word);
+}
+
+/*
+ * pg_popcount_zbb
+ * Returns number of 1 bits in buf
+ *
+ * Similar approach to x86 SSE4.2 POPCNT: process data in 8-byte chunks using
+ * the cpop instruction, with byte-by-byte fallback for remaining data.
+ */
+static uint64
+pg_popcount_zbb(const char *buf, int bytes)
+{
+ uint64 popcnt = 0;
+ const uint64 *words = (const uint64 *) buf;
+
+ /* Process 8-byte chunks */
+ while (bytes >= 8)
+ {
+ popcnt += pg_popcount64_zbb(*words++);
+ bytes -= 8;
+ }
+
+ buf = (const char *) words;
+
+ /* Process any remaining bytes */
+ while (bytes--)
+ popcnt += pg_number_of_ones[(unsigned char) *buf++];
+
+ return popcnt;
+}
+
+/*
+ * pg_popcount_masked_zbb
+ * Returns number of 1 bits in buf after applying the mask to each byte
+ */
+static uint64
+pg_popcount_masked_zbb(const char *buf, int bytes, uint8 mask)
+{
+ uint64 popcnt = 0;
+ uint64 maskv = ~UINT64CONST(0) / 0xFF * mask;
+ const uint64 *words = (const uint64 *) buf;
+
+ /* Process 8-byte chunks */
+ while (bytes >= 8)
+ {
+ popcnt += pg_popcount64_zbb(*words++ & maskv);
+ bytes -= 8;
+ }
+
+ buf = (const char *) words;
+
+ /* Process any remaining bytes */
+ while (bytes--)
+ popcnt += pg_number_of_ones[(unsigned char) *buf++ & mask];
+
+ return popcnt;
+}
+
+#endif /* USE_RISCV_ZBB_WITH_RUNTIME_CHECK */
--
2.51.2
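
To make the masked variant above easier to follow on its own: the expression ~UINT64CONST(0) / 0xFF * mask replicates the one-byte mask into all eight bytes of a 64-bit word (0xFFFF...FF / 0xFF is 0x0101...01), so a single AND masks eight input bytes at once. A standalone sketch of that loop follows. It is an illustration only: popcount_masked_sketch is a hypothetical name, it loads words with memcpy() instead of casting the buffer pointer, and it uses __builtin_popcount() for the trailing bytes rather than PostgreSQL's pg_number_of_ones table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: count the 1 bits in buf after ANDing every byte
 * with mask.  Relies on the GCC/Clang popcount builtins.
 */
static uint64_t
popcount_masked_sketch(const char *buf, size_t bytes, uint8_t mask)
{
    uint64_t popcnt = 0;
    /* Replicate the byte mask into all eight byte lanes. */
    uint64_t maskv = ~UINT64_C(0) / 0xFF * mask;

    /* Process 8-byte chunks; memcpy() tolerates unaligned buffers. */
    while (bytes >= 8)
    {
        uint64_t word;

        memcpy(&word, buf, sizeof(word));
        popcnt += (uint64_t) __builtin_popcountll(word & maskv);
        buf += 8;
        bytes -= 8;
    }

    /* Process any remaining bytes one at a time. */
    while (bytes--)
        popcnt += (uint64_t) __builtin_popcount((unsigned char) *buf++ & mask);

    return popcnt;
}
```

For an 11-byte buffer of 0xFF bytes and mask 0x0F, the word loop contributes 32 and the three trailing bytes contribute 12, for a total of 44.
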
From 7e78da689961fa8ca341c1a63bce1f7a0c8969a8 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Mon, 23 Mar 2026 12:31:58 +0000
Subject: [PATCH v4 3/3] Add RISC-V CRC32C using the Zbc extension

This adds hardware-accelerated CRC-32C computation for RISC-V platforms
with the Zbc (carry-less multiply) or Zbkc (crypto carry-less)
extension.

The implementation uses the clmul and clmulh instructions for polynomial
folding with Barrett reduction to compute CRC-32C checksums. This
provides approximately 20x speedup over the software slicing-by-8
implementation.

The algorithm is based on the Google Abseil project's RISC-V CRC32C
implementation (https://github.com/abseil/abseil-cpp/pull/1986 in
absl/crc/internal/crc_riscv.cc) that is Copyright 2025 The Abseil
Authors licensed under the Apache License, Version 2.0.

Runtime detection uses the Linux riscv_hwprobe syscall (kernel 6.4+) to
check for Zbc/Zbkc support, falling back gracefully to software on older
kernels or non-Linux platforms.

Similar to ARMv8 CRC Extension and x86 SSE 4.2 support, this is compiled
with '-march=rv64gc_zbc' and selected at runtime based on CPU
capabilities.
---
config/c-compiler.m4 | 41 +++++
configure.ac | 36 ++++-
meson.build | 36 +++++
src/include/port/pg_crc32c.h | 14 ++
src/port/meson.build | 3 +
src/port/pg_crc32c_riscv_choose.c | 101 ++++++++++++
src/port/pg_crc32c_riscv_zbc.c | 257 ++++++++++++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
8 files changed, 482 insertions(+), 7 deletions(-)
create mode 100644 src/port/pg_crc32c_riscv_choose.c
create mode 100644 src/port/pg_crc32c_riscv_zbc.c
diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3eab0da9cb6..00143c482c1 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -854,6 +854,47 @@ fi
undefine([Ac_cachevar])dnl
])# PGAC_LOONGARCH_CRC32C_INTRINSICS
+# PGAC_RISCV_ZBC_CRC32C_INTRINSICS
+# ---------------------------------
+# Check if the compiler supports RISC-V Zbc (carry-less multiply) instructions
+# for CRC-32C computation, using inline assembly for the clmul instruction.
+#
+# An optional compiler flag can be passed as argument (e.g. -march=rv64gc_zbc).
+# If the instruction is accepted, sets pgac_riscv_zbc_crc32c_intrinsics and
+# CFLAGS_CRC.
+#
+# The Zbc extension provides clmul and clmulh instructions which are used with
+# polynomial folding to compute CRC-32C. This implementation is based on the
+# algorithm from Google Abseil (https://github.com/abseil/abseil-cpp/pull/1986).
+AC_DEFUN([PGAC_RISCV_ZBC_CRC32C_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_riscv_zbc_crc32c_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for RISC-V Zbc clmul with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([
+#if !defined(__riscv) || !defined(__riscv_xlen) || __riscv_xlen != 64
+#error not RISC-V 64-bit
+#endif
+
+static inline unsigned long clmul_test(unsigned long a, unsigned long b)
+{
+ unsigned long result;
+ __asm__("clmul %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
+ return result;
+}],
+ [unsigned long result = clmul_test(0x123, 0x456);
+ /* return computed value, to prevent the above being optimized away */
+ return result == 0;])],
+ [Ac_cachevar=yes],
+ [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+ CFLAGS_CRC="$1"
+ pgac_riscv_zbc_crc32c_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_RISCV_ZBC_CRC32C_INTRINSICS
+
# PGAC_XSAVE_INTRINSICS
# ---------------------
# Check if the compiler supports the XSAVE instructions using the _xgetbv
diff --git a/configure.ac b/configure.ac
index da4d3bceb94..7154a578b7c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2248,6 +2248,17 @@ fi
# with the default compiler flags.
PGAC_LOONGARCH_CRC32C_INTRINSICS()
+# Check for RISC-V Zbc (carry-less multiply) for CRC calculations.
+#
+# The Zbc extension provides clmul and clmulh instructions for hardware-
+# accelerated CRC-32C computation using polynomial folding. Check if we
+# can compile with -march=rv64gc_zbc flag. CFLAGS_CRC is set if the flag
+# is required.
+#
+# This implementation is based on Google Abseil's algorithm:
+# https://github.com/abseil/abseil-cpp/pull/1986
+PGAC_RISCV_ZBC_CRC32C_INTRINSICS([-march=rv64gc_zbc])
+
AC_SUBST(CFLAGS_CRC)
# Select CRC-32C implementation.
@@ -2278,7 +2289,7 @@ AC_SUBST(CFLAGS_CRC)
#
# If we are targeting a LoongArch processor, CRC instructions are
# always available (at least on 64 bit), so no runtime check is needed.
-if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" && test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_ARMV8_CRC32C" = x"" && test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_LOONGARCH_CRC32C" = x""; then
+if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" && test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_ARMV8_CRC32C" = x"" && test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_LOONGARCH_CRC32C" = x"" && test x"$USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK" = x""; then
# Use Intel SSE 4.2 if available.
if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
USE_SSE42_CRC32C=1
@@ -2300,9 +2311,14 @@ if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" &&
if test x"$pgac_loongarch_crc32c_intrinsics" = x"yes"; then
USE_LOONGARCH_CRC32C=1
else
- # fall back to slicing-by-8 algorithm, which doesn't require any
- # special CPU support.
- USE_SLICING_BY_8_CRC32C=1
+ # RISC-V Zbc CRC, with runtime check.
+ if test x"$pgac_riscv_zbc_crc32c_intrinsics" = x"yes"; then
+ USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK=1
+ else
+ # fall back to slicing-by-8 algorithm, which doesn't require any
+ # special CPU support.
+ USE_SLICING_BY_8_CRC32C=1
+ fi
fi
fi
fi
@@ -2337,9 +2353,15 @@ else
PG_CRC32C_OBJS="pg_crc32c_loongarch.o"
AC_MSG_RESULT(LoongArch CRCC instructions)
else
- AC_DEFINE(USE_SLICING_BY_8_CRC32C, 1, [Define to 1 to use software CRC-32C implementation (slicing-by-8).])
- PG_CRC32C_OBJS="pg_crc32c_sb8.o"
- AC_MSG_RESULT(slicing-by-8)
+ if test x"$USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK" = x"1"; then
+ AC_DEFINE(USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK, 1, [Define to 1 to use RISC-V Zbc CRC instructions with a runtime check.])
+ PG_CRC32C_OBJS="pg_crc32c_riscv_zbc.o pg_crc32c_sb8.o pg_crc32c_riscv_choose.o"
+ AC_MSG_RESULT(RISC-V Zbc instructions with runtime check)
+ else
+ AC_DEFINE(USE_SLICING_BY_8_CRC32C, 1, [Define to 1 to use software CRC-32C implementation (slicing-by-8).])
+ PG_CRC32C_OBJS="pg_crc32c_sb8.o"
+ AC_MSG_RESULT(slicing-by-8)
+ fi
fi
fi
fi
diff --git a/meson.build b/meson.build
index cf7f41715d8..9d1460ff952 100644
--- a/meson.build
+++ b/meson.build
@@ -2835,6 +2835,42 @@ int main(void)
have_optimized_crc = true
endif
+elif host_cpu == 'riscv64'
+
+ # Check for RISC-V Zbc (carry-less multiply) extension for CRC-32C.
+ # The Zbc extension provides clmul and clmulh instructions used for
+ # hardware-accelerated CRC computation via polynomial folding.
+ #
+ # This implementation is based on Google Abseil's algorithm:
+ # https://github.com/abseil/abseil-cpp/pull/1986
+
+ prog = '''
+#if !defined(__riscv) || !defined(__riscv_xlen) || __riscv_xlen != 64
+#error not RISC-V 64-bit
+#endif
+
+static inline unsigned long clmul(unsigned long a, unsigned long b)
+{
+ unsigned long result;
+ __asm__("clmul %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
+ return result;
+}
+
+int main(void)
+{
+ unsigned long result = clmul(0x123, 0x456);
+ return result == 0;
+}
+'''
+
+ if cc.links(prog, name: 'RISC-V Zbc clmul with -march=rv64gc_zbc',
+ args: test_c_args + ['-march=rv64gc_zbc'])
+ # Use RISC-V Zbc CRC, with runtime check
+ cflags_crc += '-march=rv64gc_zbc'
+ cdata.set('USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK', 1)
+ have_optimized_crc = true
+ endif
+
endif
if not have_optimized_crc
diff --git a/src/include/port/pg_crc32c.h b/src/include/port/pg_crc32c.h
index 2f22e176a66..3e60a23b947 100644
--- a/src/include/port/pg_crc32c.h
+++ b/src/include/port/pg_crc32c.h
@@ -166,6 +166,20 @@ extern pg_crc32c pg_comp_crc32c_armv8(pg_crc32c crc, const void *data, size_t le
extern pg_crc32c pg_comp_crc32c_pmull(pg_crc32c crc, const void *data, size_t len);
#endif
+#elif defined(USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK)
+
+/*
+ * Use RISC-V Zbc instructions, but perform a runtime check first
+ * to check that they are available.
+ */
+#define COMP_CRC32C(crc, data, len) \
+ ((crc) = pg_comp_crc32c((crc), (data), (len)))
+#define FIN_CRC32C(crc) ((crc) ^= 0xFFFFFFFF)
+
+extern pg_crc32c pg_comp_crc32c_sb8(pg_crc32c crc, const void *data, size_t len);
+extern pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len);
+extern pg_crc32c pg_comp_crc32c_riscv_zbc(pg_crc32c crc, const void *data, size_t len);
+
#else
/*
* Use slicing-by-8 algorithm.
diff --git a/src/port/meson.build b/src/port/meson.build
index 2c0486f5373..c1427240511 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -101,6 +101,9 @@ replace_funcs_pos = [
['pg_crc32c_loongarch', 'USE_LOONGARCH_CRC32C'],
# riscv
+ ['pg_crc32c_riscv_zbc', 'USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK', 'crc'],
+ ['pg_crc32c_riscv_choose', 'USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK'],
+ ['pg_crc32c_sb8', 'USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK'],
['pg_popcount_riscv', 'USE_RISCV_ZBB_WITH_RUNTIME_CHECK', 'zbb'],
# generic fallback
diff --git a/src/port/pg_crc32c_riscv_choose.c b/src/port/pg_crc32c_riscv_choose.c
new file mode 100644
index 00000000000..18d105e5e12
--- /dev/null
+++ b/src/port/pg_crc32c_riscv_choose.c
@@ -0,0 +1,101 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_crc32c_riscv_choose.c
+ * Choose between RISC-V Zbc and software CRC-32C implementation.
+ *
+ * On first call, checks if the CPU supports the RISC-V Zbc (or Zbkc) extension.
+ * If it does, use carry-less multiply instructions for CRC-32C computation.
+ * Otherwise, fall back to the pure software implementation (slicing-by-8).
+ *
+ * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/port/pg_crc32c_riscv_choose.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include <sys/syscall.h>
+#include <unistd.h>
+
+#include "port/pg_crc32c.h"
+
+/*
+ * RISC-V hardware probing definitions
+ */
+#ifndef __NR_riscv_hwprobe
+#define __NR_riscv_hwprobe 258
+#endif
+
+#ifndef RISCV_HWPROBE_KEY_IMA_EXT_0
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#endif
+
+#ifndef RISCV_HWPROBE_EXT_ZBC
+#define RISCV_HWPROBE_EXT_ZBC (1ULL << 7)
+#endif
+
+#ifndef RISCV_HWPROBE_EXT_ZBKC
+#define RISCV_HWPROBE_EXT_ZBKC (1ULL << 9)
+#endif
+
+struct riscv_hwprobe
+{
+ int64 key;
+ uint64 value;
+};
+
+/*
+ * Check if RISC-V Zbc or Zbkc extension is available
+ *
+ * Uses the riscv_hwprobe syscall, which is available on Linux 6.4 and later.
+ * Falls back to software if the syscall fails or neither extension is reported.
+ */
+static bool
+pg_crc32c_riscv_zbc_available(void)
+{
+#if defined(__linux__) && defined(__riscv) && (__riscv_xlen == 64)
+ struct riscv_hwprobe pair = {.key = RISCV_HWPROBE_KEY_IMA_EXT_0};
+
+ /*
+ * Make the syscall. If it fails (e.g., on a kernel that predates
+ * riscv_hwprobe), fall back to software.
+ */
+ if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
+ return false;
+
+ /*
+ * Check if either Zbc (general bitmanip carry-less) or Zbkc (crypto
+ * carry-less) is available. Both provide clmul/clmulh instructions.
+ */
+ return (pair.value & (RISCV_HWPROBE_EXT_ZBC | RISCV_HWPROBE_EXT_ZBKC)) != 0;
+#else
+ /* Not on RISC-V Linux, or not 64-bit - use software fallback */
+ return false;
+#endif
+}
+
+/*
+ * This gets called on the first call. It replaces the function pointer
+ * so that subsequent calls are routed directly to the chosen implementation.
+ */
+static pg_crc32c
+pg_comp_crc32c_choose(pg_crc32c crc, const void *data, size_t len)
+{
+ if (pg_crc32c_riscv_zbc_available())
+ pg_comp_crc32c = pg_comp_crc32c_riscv_zbc;
+ else
+ pg_comp_crc32c = pg_comp_crc32c_sb8;
+
+ return pg_comp_crc32c(crc, data, len);
+}
+
+pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len) = pg_comp_crc32c_choose;
diff --git a/src/port/pg_crc32c_riscv_zbc.c b/src/port/pg_crc32c_riscv_zbc.c
new file mode 100644
index 00000000000..9eb845dca69
--- /dev/null
+++ b/src/port/pg_crc32c_riscv_zbc.c
@@ -0,0 +1,257 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_crc32c_riscv_zbc.c
+ * Compute CRC-32C checksum using RISC-V Zbc carry-less multiply instructions
+ *
+ * This implementation uses the RISC-V Zbc (or Zbkc) extension for hardware-
+ * accelerated CRC-32C computation. It uses carry-less multiplication (clmul
+ * and clmulh) with polynomial folding and Barrett reduction.
+ *
+ * The algorithm is based on Google Abseil's implementation:
+ * https://github.com/abseil/abseil-cpp/pull/1986
+ * File: absl/crc/internal/crc_riscv.cc
+ *
+ * Copyright 2025 The Abseil Authors
+ * Licensed under the Apache License, Version 2.0
+ * Adapted for PostgreSQL under PostgreSQL license
+ *
+ * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/port/pg_crc32c_riscv_zbc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "c.h"
+
+#ifdef WORDS_BIGENDIAN
+#error "RISC-V Zbc CRC implementation does not support big-endian systems"
+#endif
+
+#include "port/pg_crc32c.h"
+
+/*
+ * 128-bit value for polynomial arithmetic
+ */
+typedef struct
+{
+ uint64 lo;
+ uint64 hi;
+} V128;
+
+/*
+ * Carry-less multiply instructions from RISC-V Zbc/Zbkc extension
+ */
+static inline uint64
+pg_clmul(uint64 a, uint64 b)
+{
+ uint64 _res;
+
+ __asm__(
+ " clmul %0, %1, %2\n"
+: "=r"(_res)
+: "r"(a), "r"(b));
+
+ return _res;
+}
+
+static inline uint64
+pg_clmulh(uint64 a, uint64 b)
+{
+ uint64 _res;
+
+ __asm__(
+ " clmulh %0, %1, %2"
+: "=r"(_res)
+: "r"(a), "r"(b));
+
+ return _res;
+}
+
+static inline V128
+pg_clmul128(uint64 a, uint64 b)
+{
+ V128 result;
+
+ result.lo = pg_clmul(a, b);
+ result.hi = pg_clmulh(a, b);
+ return result;
+}
+
+/*
+ * 128-bit operations
+ */
+static inline V128
+pg_v128_xor(V128 a, V128 b)
+{
+ V128 result;
+
+ result.lo = a.lo ^ b.lo;
+ result.hi = a.hi ^ b.hi;
+ return result;
+}
+
+static inline V128
+pg_v128_and_mask32(V128 a)
+{
+ V128 result;
+
+ result.lo = a.lo & UINT64CONST(0x00000000FFFFFFFF);
+ result.hi = a.hi & UINT64CONST(0x00000000FFFFFFFF);
+ return result;
+}
+
+static inline V128
+pg_v128_shift_right64(V128 a)
+{
+ V128 result;
+
+ result.lo = a.hi;
+ result.hi = 0;
+ return result;
+}
+
+static inline V128
+pg_v128_shift_right32(V128 a)
+{
+ V128 result;
+
+ result.lo = (a.lo >> 32) | (a.hi << 32);
+ result.hi = (a.hi >> 32);
+ return result;
+}
+
+static inline V128
+pg_v128_load(const unsigned char *p)
+{
+ V128 result;
+
+ /*
+ * Load 16 bytes as two 64-bit values. Use memcpy because p may not be
+ * 8-byte aligned; compilers turn it into plain loads where that is safe.
+ * This file does not build on big-endian, so no byte swapping is needed.
+ */
+ memcpy(&result.lo, p, sizeof(uint64));
+ memcpy(&result.hi, p + 8, sizeof(uint64));
+ return result;
+}
+
+/*
+ * CRC-32C (Castagnoli) polynomial folding constants. These are computed
+ * for the polynomial 0x1EDC6F41 (normal form) or 0x82F63B78 (reflected).
+ */
+static const uint64 kK5 = UINT64CONST(0x0f20c0dfe); /* Folding constant */
+static const uint64 kK6 = UINT64CONST(0x14cd00bd6); /* Folding constant */
+static const uint64 kK7 = UINT64CONST(0x0dd45aab8); /* 64->32 reduction */
+static const uint64 kP1 = UINT64CONST(0x105ec76f0); /* Barrett reduction */
+static const uint64 kP2 = UINT64CONST(0x0dea713f1); /* Barrett reduction */
+
+/*
+ * Core CRC-32C computation using carry-less multiplication.
+ *
+ * Input: CRC in working form (already inverted with ~crc)
+ * Output: CRC in working form (still inverted)
+ *
+ * Precondition: len >= 32 and len % 16 == 0
+ */
+static uint32
+pg_crc32c_clmul_core(uint32 crc_inverted, const unsigned char *buf, uint64 len)
+{
+ V128 x;
+
+ /* Load first 16-byte block and XOR with inverted CRC */
+ x = pg_v128_load(buf);
+ x.lo ^= (uint64) crc_inverted;
+ buf += 16;
+ len -= 16;
+
+ /* Fold 16-byte blocks into 128-bit accumulator */
+ while (len >= 16)
+ {
+ V128 block = pg_v128_load(buf);
+ V128 lo = pg_clmul128(x.lo, kK5);
+ V128 hi = pg_clmul128(x.hi, kK6);
+
+ x = pg_v128_xor(pg_v128_xor(lo, hi), block);
+ buf += 16;
+ len -= 16;
+ }
+
+ /* Reduce 128-bit to 64-bit */
+ {
+ V128 tmp = pg_clmul128(kK6, x.lo);
+
+ x = pg_v128_xor(pg_v128_shift_right64(x), tmp);
+ }
+
+ /* Reduce 64-bit to 32-bit */
+ {
+ V128 tmp = pg_v128_shift_right32(x);
+
+ x = pg_v128_and_mask32(x);
+ x = pg_clmul128(kK7, x.lo);
+ x = pg_v128_xor(x, tmp);
+ }
+
+ /* Barrett reduction to final 32-bit CRC */
+ {
+ V128 tmp = pg_v128_and_mask32(x);
+
+ tmp = pg_clmul128(kP2, tmp.lo);
+ tmp = pg_v128_and_mask32(tmp);
+ tmp = pg_clmul128(kP1, tmp.lo);
+ x = pg_v128_xor(x, tmp);
+ }
+
+ /* Extract result from second 32-bit lane */
+ return (uint32) ((x.lo >> 32) & UINT64CONST(0xFFFFFFFF));
+}
+
+/*
+ * Main CRC-32C computation function with RISC-V Zbc acceleration
+ */
+pg_crc32c
+pg_comp_crc32c_riscv_zbc(pg_crc32c crc, const void *data, size_t len)
+{
+ const unsigned char *p = data;
+ const size_t kMinLen = 32;
+ const size_t kChunkLen = 16;
+ size_t tail;
+
+ /* Use software fallback for small buffers */
+ if (len < kMinLen)
+ return pg_comp_crc32c_sb8(crc, data, len);
+
+ /*
+ * Process the leading len % 16 bytes with software (Abseil approach), so
+ * that the length handed to the core below is a multiple of 16. Note
+ * that this does not align the pointer to a 16-byte boundary.
+ */
+ tail = len % kChunkLen;
+ if (tail)
+ {
+ crc = pg_comp_crc32c_sb8(crc, p, tail);
+ p += tail;
+ len -= tail;
+ }
+
+ /*
+ * Process remaining bytes (now a multiple of 16) with hardware. The core
+ * algorithm requires at least 32 bytes.
+ */
+ if (len >= 32)
+ {
+ /*
+ * The core operates on the CRC in its inverted working form, which for
+ * Abseil is its initial value of 0 after inversion. PostgreSQL callers
+ * already keep the CRC in that form (INIT_CRC32C sets 0xFFFFFFFF), so
+ * pass it through unchanged. The value returned still needs the final
+ * XOR with 0xFFFFFFFF, which the caller applies via FIN_CRC32C.
+ */
+ crc = pg_crc32c_clmul_core(crc, p, len);
+ }
+
+ return crc;
+}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8cf40c87043..372a80c067f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3393,6 +3393,7 @@ VirtualTransactionId
VirtualTupleTableSlot
VolatileFunctionStatus
Vsrt
+V128
WAIT_ORDER
WALAvailability
WALInsertLock
--
2.51.2