On Mon, Jun 8, 2026, at 1:45 PM, Greg Burd wrote:
> Hello,
>
> The attached patch addresses the RISC-V instability seen when compiling
> with Clang < 22, which is related to auto-vectorization, by disabling that
> feature during configuration. This fix is verified on greenfly, which now
> passes tests given that it is now compiled with Clang 22.
>
> IMO, given that this silently breaks at least one important component,
> it feels like something we should adopt in v19 rather than wait.
>
> best.
>
> -greg
>
> Attachments:
> * v5-0001-Disable-auto-vectorization-on-RISC-V-with-Clang-older-than-22.patch
I almost forgot that this thread started off about optimizations for RISC-V, so
I'll re-attach those here now, along with the unchanged patch from v5.
Ideally all three could make it into v19; the RISC-V platform isn't huge, but it
is expanding quickly in the server market.
best.
-greg
From f000ea332cf40031f045f854540343a9fcf89afc Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Sun, 22 Mar 2026 11:15:41 -0400
Subject: [PATCH v6 1/3] Add RISC-V popcount using Zbb extension
Implement hardware popcount support for RISC-V using the Zbb (basic bit
manipulation) extension when present. The Zbb extension provides the
'cpop' instruction which GCC and Clang emit from __builtin_popcountll()
when compiling with -march=rv64gc_zbb.
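For illustration only, here is a portable sketch of the word-at-a-time loop that the Zbb path relies on. Each 8-byte chunk goes through __builtin_popcountll(), which GCC and Clang should lower to cpop under -march=rv64gc_zbb, and the 0-7 byte tail is handled one byte at a time. The sketch also shows the masked variant's trick of broadcasting a byte mask into every lane with ~0 / 0xFF * mask. The function name popcount_words() is hypothetical. Unlike the patch, it loads words with memcpy() so that unaligned input is well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: count the 1 bits in buf after ANDing each byte with
 * mask (pass 0xFF for a plain popcount).
 */
static uint64_t
popcount_words(const char *buf, int bytes, uint8_t mask)
{
	uint64_t	popcnt = 0;

	/* ~0 / 0xFF == 0x0101010101010101, so this copies mask into every byte */
	uint64_t	maskv = ~UINT64_C(0) / 0xFF * mask;

	/* Whole 8-byte words; cpop is expected here under -march=rv64gc_zbb */
	while (bytes >= 8)
	{
		uint64_t	word;

		memcpy(&word, buf, sizeof(word));	/* well defined for any alignment */
		popcnt += (uint64_t) __builtin_popcountll(word & maskv);
		buf += 8;
		bytes -= 8;
	}

	/* Remaining 0-7 bytes, one at a time */
	while (bytes-- > 0)
		popcnt += (uint64_t) __builtin_popcount((unsigned char) *buf++ & mask);

	return popcnt;
}
```

With data = {0xFF, 0x01, 0x03, 0, 0, 0, 0, 0, 0x0F, 0x80, 0x07}, a mask of 0xFF counts 19 bits and a mask of 0x0F counts 14.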
This patch adds:
- Build-time detection of Zbb support (configure.ac, meson.build)
- Runtime detection using the riscv_hwprobe syscall on Linux
- Optimized popcount implementation using cpop instruction
The implementation follows the established pattern for hardware
acceleration (similar to x86 POPCNT and ARM SVE): the Zbb-optimized code
is compiled separately with -march=rv64gc_zbb, while the main binary
remains portable across all RISC-V 64-bit systems.
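The runtime-dispatch pattern described above can be sketched as follows. The sketch assumes Linux kernel headers new enough (6.4+) to provide <asm/hwprobe.h> and __NR_riscv_hwprobe. A function pointer starts out pointing at a chooser. On the first call, the chooser probes the CPU once, rebinds the pointer, and forwards the call; later calls go straight to the chosen implementation. popcount_fast() is a hypothetical stand-in for code compiled separately with -march=rv64gc_zbb. Off RISC-V Linux, the probe simply reports false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__riscv)
#include <asm/hwprobe.h>		/* assumption: kernel headers >= 6.4 */
#include <sys/syscall.h>
#include <unistd.h>
#endif

typedef uint64_t (*popcount_fn) (const unsigned char *buf, size_t len);

static uint64_t popcount_choose(const unsigned char *buf, size_t len);

/* Starts at the chooser; rebound to a real implementation on first call. */
static popcount_fn popcount_impl = popcount_choose;

/* Portable fallback: one byte at a time. */
static uint64_t
popcount_portable(const unsigned char *buf, size_t len)
{
	uint64_t	n = 0;

	for (size_t i = 0; i < len; i++)
		n += (uint64_t) __builtin_popcount(buf[i]);
	return n;
}

/* Hypothetical stand-in for the file built with -march=rv64gc_zbb. */
static uint64_t
popcount_fast(const unsigned char *buf, size_t len)
{
	return popcount_portable(buf, len);
}

/* Ask the kernel whether this CPU implements Zbb. */
static bool
cpu_has_zbb(void)
{
#if defined(__linux__) && defined(__riscv)
	struct riscv_hwprobe pair = {.key = RISCV_HWPROBE_KEY_IMA_EXT_0};

	/* args: pairs, pair_count, cpusetsize, cpus, flags */
	if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
		return false;
	return (pair.value & RISCV_HWPROBE_EXT_ZBB) != 0;
#else
	return false;
#endif
}

static uint64_t
popcount_choose(const unsigned char *buf, size_t len)
{
	popcount_impl = cpu_has_zbb() ? popcount_fast : popcount_portable;
	return popcount_impl(buf, len);
}
```

This design keeps the main binary portable. Only the file holding the fast path needs the extra -march flag, and that code is reached only after the probe succeeds.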
---
configure.ac | 29 ++++++
meson.build | 32 ++++++
src/include/port/pg_bitutils.h | 2 +-
src/port/meson.build | 7 +-
src/port/pg_bitutils.c | 5 +-
src/port/pg_popcount_riscv.c | 183 +++++++++++++++++++++++++++++++++
6 files changed, 253 insertions(+), 5 deletions(-)
create mode 100644 src/port/pg_popcount_riscv.c
diff --git a/configure.ac b/configure.ac
index 61cee42daa7..e207d9c6d06 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2187,6 +2187,35 @@ if test x"$host_cpu" = x"aarch64"; then
fi
fi
+# Check for RISC-V Zbb bitmanip extension (provides 'cpop' for popcount).
+#
+# The Zbb extension provides the 'cpop' instruction for hardware popcount.
+# GCC/Clang emit the cpop instruction from __builtin_popcountll() when
+# -march=rv64gc_zbb is used. We test compilation with this flag, then
+# restore CFLAGS to avoid global march flags (for binary portability).
+# We define USE_RISCV_ZBB_WITH_RUNTIME_CHECK and use the riscv_hwprobe
+# syscall for runtime detection. We compile src/port/pg_popcount_riscv.c with
+# -march=rv64gc_zbb separately (like ARM SVE and x86 POPCNT).
+AC_MSG_CHECKING([for RISC-V Zbb extension (cpop/popcount)])
+if test x"$host_cpu" = x"riscv64"; then
+ pgac_save_CFLAGS_zbb="$CFLAGS"
+ CFLAGS="$CFLAGS -march=rv64gc_zbb"
+ AC_COMPILE_IFELSE(
+ [AC_LANG_PROGRAM(
+ [/* Test that the compiler accepts -march=rv64gc_zbb for this code */
+ static inline int test_cpop(unsigned long long x)
+ { return __builtin_popcountll(x); }],
+ [volatile int r = test_cpop(0xdeadbeefULL); (void) r;])],
+ [AC_DEFINE(USE_RISCV_ZBB_WITH_RUNTIME_CHECK, 1,
+ [Define to 1 to use RISC-V Zbb popcount with runtime detection.])
+ CFLAGS="$pgac_save_CFLAGS_zbb"
+ AC_MSG_RESULT([yes, with runtime check])],
+ [CFLAGS="$pgac_save_CFLAGS_zbb"
+ AC_MSG_RESULT([no])])
+else
+ AC_MSG_RESULT([not on RISC-V])
+fi
+
# Check for Intel SSE 4.2 intrinsics to do CRC calculations.
#
PGAC_SSE42_CRC32_INTRINSICS()
diff --git a/meson.build b/meson.build
index 568e0e150bf..5cf105a17ea 100644
--- a/meson.build
+++ b/meson.build
@@ -2601,6 +2601,38 @@ int main(void)
endif
+# ---------------------------------------------------------------------------
+# Check for RISC-V Zbb bitmanip extension (provides 'cpop' for popcount).
+#
+# The Zbb extension provides the 'cpop' instruction for hardware popcount.
+# GCC/Clang emit the cpop instruction from __builtin_popcountll() when
+# -march=rv64gc_zbb is used. We test compilation with this flag, but
+# do NOT add it globally (for binary portability). Instead, we define
+# USE_RISCV_ZBB_WITH_RUNTIME_CHECK and compile src/port/pg_popcount_riscv.c
+# with -march=rv64gc_zbb separately (like ARM SVE and x86 POPCNT).
+# Runtime detection uses the riscv_hwprobe syscall.
+# ---------------------------------------------------------------------------
+zbb_test_code = '''
+static inline int test_cpop(unsigned long long x)
+{ return __builtin_popcountll(x); }
+int main(void) {
+ volatile int r = test_cpop(0xdeadbeefULL);
+ (void) r;
+ return 0;
+}
+'''
+
+cflags_zbb = []
+if host_cpu == 'riscv64'
+ if cc.compiles(zbb_test_code,
+ args: ['-march=rv64gc_zbb'],
+ name: 'RISC-V Zbb cpop')
+ cdata.set('USE_RISCV_ZBB_WITH_RUNTIME_CHECK', 1)
+ # Flag will be added only to pg_popcount_riscv.c in src/port/meson.build
+ cflags_zbb = ['-march=rv64gc_zbb']
+ endif
+endif
+
###############################################################
# Select CRC-32C implementation.
diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index 7a00d197013..cb8d8b6e626 100644
--- a/src/include/port/pg_bitutils.h
+++ b/src/include/port/pg_bitutils.h
@@ -279,7 +279,7 @@ pg_ceil_log2_64(uint64 num)
extern uint64 pg_popcount_portable(const char *buf, int bytes);
extern uint64 pg_popcount_masked_portable(const char *buf, int bytes, uint8 mask);
-#if defined(HAVE_X86_64_POPCNTQ) || defined(USE_SVE_POPCNT_WITH_RUNTIME_CHECK)
+#if defined(HAVE_X86_64_POPCNTQ) || defined(USE_SVE_POPCNT_WITH_RUNTIME_CHECK) || defined(USE_RISCV_ZBB_WITH_RUNTIME_CHECK)
/*
* Attempt to use specialized CPU instructions, but perform a runtime check
* first.
diff --git a/src/port/meson.build b/src/port/meson.build
index 922b3f64676..2c0486f5373 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -100,12 +100,15 @@ replace_funcs_pos = [
# loongarch
['pg_crc32c_loongarch', 'USE_LOONGARCH_CRC32C'],
+ # riscv
+ ['pg_popcount_riscv', 'USE_RISCV_ZBB_WITH_RUNTIME_CHECK', 'zbb'],
+
# generic fallback
['pg_crc32c_sb8', 'USE_SLICING_BY_8_CRC32C'],
]
-pgport_cflags = {'crc': cflags_crc}
-pgport_sources_cflags = {'crc': []}
+pgport_cflags = {'crc': cflags_crc, 'zbb': cflags_zbb}
+pgport_sources_cflags = {'crc': [], 'zbb': []}
foreach f : replace_funcs_neg
func = f.get(0)
diff --git a/src/port/pg_bitutils.c b/src/port/pg_bitutils.c
index 7b11c38c417..23af6c54477 100644
--- a/src/port/pg_bitutils.c
+++ b/src/port/pg_bitutils.c
@@ -162,7 +162,7 @@ pg_popcount_masked_portable(const char *buf, int bytes, uint8 mask)
return popcnt;
}
-#if !defined(HAVE_X86_64_POPCNTQ) && !defined(USE_NEON)
+#if !defined(HAVE_X86_64_POPCNTQ) && !defined(USE_NEON) && !defined(USE_RISCV_ZBB_WITH_RUNTIME_CHECK)
/*
* When special CPU instructions are not available, there's no point in using
@@ -191,4 +191,5 @@ pg_popcount_masked_optimized(const char *buf, int bytes, uint8 mask)
return pg_popcount_masked_portable(buf, bytes, mask);
}
-#endif /* ! HAVE_X86_64_POPCNTQ && ! USE_NEON */
+#endif /* ! HAVE_X86_64_POPCNTQ && ! USE_NEON && !
+ * USE_RISCV_ZBB_WITH_RUNTIME_CHECK */
diff --git a/src/port/pg_popcount_riscv.c b/src/port/pg_popcount_riscv.c
new file mode 100644
index 00000000000..dce68d15c44
--- /dev/null
+++ b/src/port/pg_popcount_riscv.c
@@ -0,0 +1,183 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_popcount_riscv.c
+ * Holds the RISC-V Zbb popcount implementations.
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/port/pg_popcount_riscv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "c.h"
+
+#ifdef USE_RISCV_ZBB_WITH_RUNTIME_CHECK
+
+#if defined(__linux__)
+#include <sys/syscall.h>
+#include <unistd.h>
+
+/*
+ * Try to pull in <asm/hwprobe.h> for RISCV_HWPROBE_* / struct riscv_hwprobe.
+ * On older kernel-headers packages (or non-RISC-V Linux distros configured
+ * without multiarch headers) the file may be absent; provide minimal
+ * fallback definitions so this file still builds. The runtime check below
+ * will gracefully report "unavailable" if the syscall fails.
+ */
+#if defined(__has_include)
+#if __has_include(<asm/hwprobe.h>)
+#include <asm/hwprobe.h>
+#define HAVE_ASM_HWPROBE_H 1
+#endif
+#endif
+
+#ifndef HAVE_ASM_HWPROBE_H
+struct riscv_hwprobe
+{
+ int64 key;
+ uint64 value;
+};
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#define RISCV_HWPROBE_EXT_ZBB (UINT64CONST(1) << 4)
+#endif
+
+#ifndef __NR_riscv_hwprobe
+#define __NR_riscv_hwprobe 258
+#endif
+#endif /* __linux__ */
+
+#include "port/pg_bitutils.h"
+
+/*
+ * Hardware implementation using RISC-V Zbb cpop instruction.
+ */
+static uint64 pg_popcount_zbb(const char *buf, int bytes);
+static uint64 pg_popcount_masked_zbb(const char *buf, int bytes, uint8 mask);
+
+/*
+ * The function pointers are initially set to "choose" functions. These
+ * functions will first set the pointers to the right implementations (based on
+ * what the current CPU supports) and then will call the pointer to fulfill the
+ * caller's request.
+ */
+static uint64 pg_popcount_choose(const char *buf, int bytes);
+static uint64 pg_popcount_masked_choose(const char *buf, int bytes, uint8 mask);
+uint64 (*pg_popcount_optimized) (const char *buf, int bytes) = pg_popcount_choose;
+uint64 (*pg_popcount_masked_optimized) (const char *buf, int bytes, uint8 mask) = pg_popcount_masked_choose;
+
+static inline bool
+pg_popcount_zbb_available(void)
+{
+#if defined(__linux__)
+ struct riscv_hwprobe pair = {.key = RISCV_HWPROBE_KEY_IMA_EXT_0};
+
+ if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
+ return false;
+
+ return (pair.value & RISCV_HWPROBE_EXT_ZBB) != 0;
+#else
+ return false;
+#endif
+}
+
+static inline void
+choose_popcount_functions(void)
+{
+ if (pg_popcount_zbb_available())
+ {
+ pg_popcount_optimized = pg_popcount_zbb;
+ pg_popcount_masked_optimized = pg_popcount_masked_zbb;
+ }
+ else
+ {
+ pg_popcount_optimized = pg_popcount_portable;
+ pg_popcount_masked_optimized = pg_popcount_masked_portable;
+ }
+}
+
+static uint64
+pg_popcount_choose(const char *buf, int bytes)
+{
+ choose_popcount_functions();
+ return pg_popcount_optimized(buf, bytes);
+}
+
+static uint64
+pg_popcount_masked_choose(const char *buf, int bytes, uint8 mask)
+{
+ choose_popcount_functions();
+ return pg_popcount_masked_optimized(buf, bytes, mask);
+}
+
+/*
+ * pg_popcount64_zbb
+ * Return the number of 1 bits set in word
+ *
+ * Uses the RISC-V Zbb 'cpop' (count population) instruction via
+ * __builtin_popcountll(). When compiled with -march=rv64gc_zbb, GCC and
+ * Clang will emit the cpop instruction for this builtin.
+ */
+static inline int
+pg_popcount64_zbb(uint64 word)
+{
+ return __builtin_popcountll(word);
+}
+
+/*
+ * pg_popcount_zbb
+ * Returns number of 1 bits in buf
+ *
+ * Similar approach to x86 SSE4.2 POPCNT: process data in 8-byte chunks using
+ * the cpop instruction, with byte-by-byte fallback for remaining data.
+ */
+static uint64
+pg_popcount_zbb(const char *buf, int bytes)
+{
+ uint64 popcnt = 0;
+ const uint64 *words = (const uint64 *) buf;
+
+ /* Process 8-byte chunks */
+ while (bytes >= 8)
+ {
+ popcnt += pg_popcount64_zbb(*words++);
+ bytes -= 8;
+ }
+
+ buf = (const char *) words;
+
+ /* Process any remaining bytes */
+ while (bytes--)
+ popcnt += pg_number_of_ones[(unsigned char) *buf++];
+
+ return popcnt;
+}
+
+/*
+ * pg_popcount_masked_zbb
+ * Returns number of 1 bits in buf after applying the mask to each byte
+ */
+static uint64
+pg_popcount_masked_zbb(const char *buf, int bytes, uint8 mask)
+{
+ uint64 popcnt = 0;
+ uint64 maskv = ~UINT64CONST(0) / 0xFF * mask;
+ const uint64 *words = (const uint64 *) buf;
+
+ /* Process 8-byte chunks */
+ while (bytes >= 8)
+ {
+ popcnt += pg_popcount64_zbb(*words++ & maskv);
+ bytes -= 8;
+ }
+
+ buf = (const char *) words;
+
+ /* Process any remaining bytes */
+ while (bytes--)
+ popcnt += pg_number_of_ones[(unsigned char) *buf++ & mask];
+
+ return popcnt;
+}
+
+#endif /* USE_RISCV_ZBB_WITH_RUNTIME_CHECK */
--
2.50.1
From a9c8618f9d2ddec35568357cdf462a6a2ba0242f Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Mon, 23 Mar 2026 12:31:58 +0000
Subject: [PATCH v6 2/3] Add RISC-V CRC32C using the Zbc extension
This adds hardware-accelerated CRC-32C computation for RISC-V platforms
with the Zbc (carry-less multiply) or Zbkc (crypto carry-less)
extension.
The implementation uses the clmul and clmulh instructions for polynomial
folding with Barrett reduction to compute CRC-32C checksums. This
provides approximately 20x speedup over the software slicing-by-8
implementation.
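To make the clmul/clmulh pairing concrete, the following is a hypothetical software model of the two instructions; it is not taken from the patch or from Abseil. A carry-less product XORs shifted copies of one operand instead of adding them. The full 128-bit result is split so that clmul returns the low 64 bits and clmulh returns the high 64 bits, which is how pg_clmul128() in the patch combines them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the Zbc carry-less multiply. */
typedef struct
{
	uint64_t	lo;				/* what clmul returns */
	uint64_t	hi;				/* what clmulh returns */
} clmul128;

static clmul128
clmul_soft(uint64_t a, uint64_t b)
{
	clmul128	r = {0, 0};

	for (int i = 0; i < 64; i++)
	{
		if ((b >> i) & 1)
		{
			/* XOR (not add) a copy of a shifted left by i: no carries */
			r.lo ^= a << i;
			/* bits shifted past bit 63 land in the high word */
			if (i > 0)
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}
```

For example, clmul_soft(3, 3) yields lo = 5 because (x + 1)^2 = x^2 + 1 over GF(2). A model like this could also serve as a cross-check for the inline-assembly wrappers on real hardware.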
The algorithm is based on the Google Abseil project's RISC-V CRC32C
implementation (https://github.com/abseil/abseil-cpp/pull/1986, in
absl/crc/internal/crc_riscv.cc), which is Copyright 2025 The Abseil
Authors and licensed under the Apache License, Version 2.0.
Runtime detection uses the Linux riscv_hwprobe syscall (kernel 6.4+) to
check for Zbc/Zbkc support, falling back gracefully to software on older
kernels or non-Linux platforms.
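Whichever implementation the runtime check selects, its output must match the software path bit for bit. A minimal bit-at-a-time CRC-32C reference (reflected polynomial 0x82F63B78) is sketched below as a possible test oracle; crc32c_bitwise() is a hypothetical name. The function keeps the CRC in the same non-finalized form that PostgreSQL uses between INIT_CRC32C and FIN_CRC32C. That is also why pg_comp_crc32c_riscv_zbc() can send the leading len % 16 bytes to software and the rest to the clmul path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
 * crc is in non-finalized form: start from 0xFFFFFFFF and XOR the final
 * result with 0xFFFFFFFF, mirroring INIT_CRC32C and FIN_CRC32C.
 */
static uint32_t
crc32c_bitwise(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len-- > 0)
	{
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			/* shift right; XOR in the polynomial if the low bit was set */
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}
```

Starting from 0xFFFFFFFF and finalizing with an XOR of 0xFFFFFFFF, the string "123456789" gives the standard CRC-32C check value 0xE3069283. Splitting the input at any point and continuing from the intermediate value gives the same result.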
As with the ARMv8 CRC Extension and x86 SSE 4.2 support, this code is
compiled with '-march=rv64gc_zbc' and selected at runtime based on CPU
capabilities.
---
config/c-compiler.m4 | 41 +++++
configure.ac | 36 ++++-
meson.build | 36 +++++
src/include/port/pg_crc32c.h | 14 ++
src/port/meson.build | 3 +
src/port/pg_crc32c_riscv_choose.c | 101 ++++++++++++
src/port/pg_crc32c_riscv_zbc.c | 257 ++++++++++++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
8 files changed, 482 insertions(+), 7 deletions(-)
create mode 100644 src/port/pg_crc32c_riscv_choose.c
create mode 100644 src/port/pg_crc32c_riscv_zbc.c
diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3eab0da9cb6..00143c482c1 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -854,6 +854,47 @@ fi
undefine([Ac_cachevar])dnl
])# PGAC_LOONGARCH_CRC32C_INTRINSICS
+# PGAC_RISCV_ZBC_CRC32C_INTRINSICS
+# ---------------------------------
+# Check if the compiler supports RISC-V Zbc (carry-less multiply) instructions
+# for CRC-32C computation, using inline assembly for clmul instruction.
+#
+# An optional compiler flag can be passed as argument (e.g. -march=rv64gc_zbc).
+# If the intrinsics are supported, sets pgac_riscv_zbc_crc32c_intrinsics and
+# CFLAGS_CRC.
+#
+# The Zbc extension provides clmul and clmulh instructions which are used with
+# polynomial folding to compute CRC-32C. This implementation is based on the
+# algorithm from Google Abseil (https://github.com/abseil/abseil-cpp/pull/1986).
+AC_DEFUN([PGAC_RISCV_ZBC_CRC32C_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_riscv_zbc_crc32c_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for RISC-V Zbc clmul with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([
+#if !defined(__riscv) || !defined(__riscv_xlen) || __riscv_xlen != 64
+#error not RISC-V 64-bit
+#endif
+
+static inline unsigned long clmul_test(unsigned long a, unsigned long b)
+{
+ unsigned long result;
+ __asm__("clmul %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
+ return result;
+}],
+ [unsigned long result = clmul_test(0x123, 0x456);
+ /* return computed value, to prevent the above being optimized away */
+ return result == 0;])],
+ [Ac_cachevar=yes],
+ [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+ CFLAGS_CRC="$1"
+ pgac_riscv_zbc_crc32c_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_RISCV_ZBC_CRC32C_INTRINSICS
+
# PGAC_XSAVE_INTRINSICS
# ---------------------
# Check if the compiler supports the XSAVE instructions using the _xgetbv
diff --git a/configure.ac b/configure.ac
index e207d9c6d06..dc9c0fd247a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2248,6 +2248,17 @@ fi
# with the default compiler flags.
PGAC_LOONGARCH_CRC32C_INTRINSICS()
+# Check for RISC-V Zbc (carry-less multiply) for CRC calculations.
+#
+# The Zbc extension provides clmul and clmulh instructions for hardware-
+# accelerated CRC-32C computation using polynomial folding. Check if we
+# can compile with -march=rv64gc_zbc flag. CFLAGS_CRC is set if the flag
+# is required.
+#
+# This implementation is based on Google Abseil's algorithm:
+# https://github.com/abseil/abseil-cpp/pull/1986
+PGAC_RISCV_ZBC_CRC32C_INTRINSICS([-march=rv64gc_zbc])
+
AC_SUBST(CFLAGS_CRC)
# Select CRC-32C implementation.
@@ -2278,7 +2289,7 @@ AC_SUBST(CFLAGS_CRC)
#
# If we are targeting a LoongArch processor, CRC instructions are
# always available (at least on 64 bit), so no runtime check is needed.
-if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" && test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_ARMV8_CRC32C" = x"" && test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_LOONGARCH_CRC32C" = x""; then
+if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" && test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_ARMV8_CRC32C" = x"" && test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_LOONGARCH_CRC32C" = x"" && test x"$USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK" = x""; then
# Use Intel SSE 4.2 if available.
if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
USE_SSE42_CRC32C=1
@@ -2300,9 +2311,14 @@ if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" &&
if test x"$pgac_loongarch_crc32c_intrinsics" = x"yes"; then
USE_LOONGARCH_CRC32C=1
else
- # fall back to slicing-by-8 algorithm, which doesn't require any
- # special CPU support.
- USE_SLICING_BY_8_CRC32C=1
+ # RISC-V Zbc CRC, with runtime check.
+ if test x"$pgac_riscv_zbc_crc32c_intrinsics" = x"yes"; then
+ USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK=1
+ else
+ # fall back to slicing-by-8 algorithm, which doesn't require any
+ # special CPU support.
+ USE_SLICING_BY_8_CRC32C=1
+ fi
fi
fi
fi
@@ -2337,9 +2353,15 @@ else
PG_CRC32C_OBJS="pg_crc32c_loongarch.o"
AC_MSG_RESULT(LoongArch CRCC instructions)
else
- AC_DEFINE(USE_SLICING_BY_8_CRC32C, 1, [Define to 1 to use software CRC-32C implementation (slicing-by-8).])
- PG_CRC32C_OBJS="pg_crc32c_sb8.o"
- AC_MSG_RESULT(slicing-by-8)
+ if test x"$USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK" = x"1"; then
+ AC_DEFINE(USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK, 1, [Define to 1 to use RISC-V Zbc CRC instructions with a runtime check.])
+ PG_CRC32C_OBJS="pg_crc32c_riscv_zbc.o pg_crc32c_sb8.o pg_crc32c_riscv_choose.o"
+ AC_MSG_RESULT(RISC-V Zbc instructions with runtime check)
+ else
+ AC_DEFINE(USE_SLICING_BY_8_CRC32C, 1, [Define to 1 to use software CRC-32C implementation (slicing-by-8).])
+ PG_CRC32C_OBJS="pg_crc32c_sb8.o"
+ AC_MSG_RESULT(slicing-by-8)
+ fi
fi
fi
fi
diff --git a/meson.build b/meson.build
index 5cf105a17ea..286d3b01f15 100644
--- a/meson.build
+++ b/meson.build
@@ -2835,6 +2835,42 @@ int main(void)
have_optimized_crc = true
endif
+elif host_cpu == 'riscv64'
+
+ # Check for RISC-V Zbc (carry-less multiply) extension for CRC-32C.
+ # The Zbc extension provides clmul and clmulh instructions used for
+ # hardware-accelerated CRC computation via polynomial folding.
+ #
+ # This implementation is based on Google Abseil's algorithm:
+ # https://github.com/abseil/abseil-cpp/pull/1986
+
+ prog = '''
+#if !defined(__riscv) || !defined(__riscv_xlen) || __riscv_xlen != 64
+#error not RISC-V 64-bit
+#endif
+
+static inline unsigned long clmul(unsigned long a, unsigned long b)
+{
+ unsigned long result;
+ __asm__("clmul %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
+ return result;
+}
+
+int main(void)
+{
+ unsigned long result = clmul(0x123, 0x456);
+ return result == 0;
+}
+'''
+
+ if cc.links(prog, name: 'RISC-V Zbc clmul with -march=rv64gc_zbc',
+ args: test_c_args + ['-march=rv64gc_zbc'])
+ # Use RISC-V Zbc CRC, with runtime check
+ cflags_crc += '-march=rv64gc_zbc'
+ cdata.set('USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK', 1)
+ have_optimized_crc = true
+ endif
+
endif
if not have_optimized_crc
diff --git a/src/include/port/pg_crc32c.h b/src/include/port/pg_crc32c.h
index 2f22e176a66..3e60a23b947 100644
--- a/src/include/port/pg_crc32c.h
+++ b/src/include/port/pg_crc32c.h
@@ -166,6 +166,20 @@ extern pg_crc32c pg_comp_crc32c_armv8(pg_crc32c crc, const void *data, size_t le
extern pg_crc32c pg_comp_crc32c_pmull(pg_crc32c crc, const void *data, size_t len);
#endif
+#elif defined(USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK)
+
+/*
+ * Use RISC-V Zbc instructions, but perform a runtime check first
+ * to check that they are available.
+ */
+#define COMP_CRC32C(crc, data, len) \
+ ((crc) = pg_comp_crc32c((crc), (data), (len)))
+#define FIN_CRC32C(crc) ((crc) ^= 0xFFFFFFFF)
+
+extern pg_crc32c pg_comp_crc32c_sb8(pg_crc32c crc, const void *data, size_t len);
+extern pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len);
+extern pg_crc32c pg_comp_crc32c_riscv_zbc(pg_crc32c crc, const void *data, size_t len);
+
#else
/*
* Use slicing-by-8 algorithm.
diff --git a/src/port/meson.build b/src/port/meson.build
index 2c0486f5373..c1427240511 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -101,6 +101,9 @@ replace_funcs_pos = [
['pg_crc32c_loongarch', 'USE_LOONGARCH_CRC32C'],
# riscv
+ ['pg_crc32c_riscv_zbc', 'USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK', 'crc'],
+ ['pg_crc32c_riscv_choose', 'USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK'],
+ ['pg_crc32c_sb8', 'USE_RISCV_ZBC_CRC32C_WITH_RUNTIME_CHECK'],
['pg_popcount_riscv', 'USE_RISCV_ZBB_WITH_RUNTIME_CHECK', 'zbb'],
# generic fallback
diff --git a/src/port/pg_crc32c_riscv_choose.c b/src/port/pg_crc32c_riscv_choose.c
new file mode 100644
index 00000000000..18d105e5e12
--- /dev/null
+++ b/src/port/pg_crc32c_riscv_choose.c
@@ -0,0 +1,101 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_crc32c_riscv_choose.c
+ * Choose between RISC-V Zbc and software CRC-32C implementation.
+ *
+ * On first call, checks if the CPU supports the RISC-V Zbc (or Zbkc) extension.
+ * If it does, use carry-less multiply instructions for CRC-32C computation.
+ * Otherwise, fall back to the pure software implementation (slicing-by-8).
+ *
+ * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/port/pg_crc32c_riscv_choose.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include <sys/syscall.h>
+#include <unistd.h>
+
+#include "port/pg_crc32c.h"
+
+/*
+ * RISC-V hardware probing definitions
+ */
+#ifndef __NR_riscv_hwprobe
+#define __NR_riscv_hwprobe 258
+#endif
+
+#ifndef RISCV_HWPROBE_KEY_IMA_EXT_0
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#endif
+
+#ifndef RISCV_HWPROBE_EXT_ZBC
+#define RISCV_HWPROBE_EXT_ZBC (1ULL << 7)
+#endif
+
+#ifndef RISCV_HWPROBE_EXT_ZBKC
+#define RISCV_HWPROBE_EXT_ZBKC (1ULL << 9)
+#endif
+
+struct riscv_hwprobe
+{
+ int64 key;
+ uint64 value;
+};
+
+/*
+ * Check if RISC-V Zbc or Zbkc extension is available
+ *
+ * Uses the riscv_hwprobe syscall which is available on Linux kernel 6.4+
+ * Falls back to software if the syscall fails or extensions are not available.
+ */
+static bool
+pg_crc32c_riscv_zbc_available(void)
+{
+#if defined(__linux__) && defined(__riscv) && (__riscv_xlen == 64)
+ struct riscv_hwprobe pair = {.key = RISCV_HWPROBE_KEY_IMA_EXT_0};
+
+ /*
+ * Make the syscall. If it fails (e.g., old kernel, non-Linux), fall back
+ * to software.
+ */
+ if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
+ return false;
+
+ /*
+ * Check if either Zbc (general bitmanip carry-less) or Zbkc (crypto
+ * carry-less) is available. Both provide clmul/clmulh instructions.
+ */
+ return (pair.value & (RISCV_HWPROBE_EXT_ZBC | RISCV_HWPROBE_EXT_ZBKC)) != 0;
+#else
+ /* Not on RISC-V Linux, or not 64-bit - use software fallback */
+ return false;
+#endif
+}
+
+/*
+ * This gets called on the first call. It replaces the function pointer
+ * so that subsequent calls are routed directly to the chosen implementation.
+ */
+static pg_crc32c
+pg_comp_crc32c_choose(pg_crc32c crc, const void *data, size_t len)
+{
+ if (pg_crc32c_riscv_zbc_available())
+ pg_comp_crc32c = pg_comp_crc32c_riscv_zbc;
+ else
+ pg_comp_crc32c = pg_comp_crc32c_sb8;
+
+ return pg_comp_crc32c(crc, data, len);
+}
+
+pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len) = pg_comp_crc32c_choose;
diff --git a/src/port/pg_crc32c_riscv_zbc.c b/src/port/pg_crc32c_riscv_zbc.c
new file mode 100644
index 00000000000..9eb845dca69
--- /dev/null
+++ b/src/port/pg_crc32c_riscv_zbc.c
@@ -0,0 +1,257 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_crc32c_riscv_zbc.c
+ * Compute CRC-32C checksum using RISC-V Zbc carry-less multiply instructions
+ *
+ * This implementation uses the RISC-V Zbc (or Zbkc) extension for hardware-
+ * accelerated CRC-32C computation. It uses carry-less multiplication (clmul
+ * and clmulh) with polynomial folding and Barrett reduction.
+ *
+ * The algorithm is based on Google Abseil's implementation:
+ * https://github.com/abseil/abseil-cpp/pull/1986
+ * File: absl/crc/internal/crc_riscv.cc
+ *
+ * Copyright 2025 The Abseil Authors
+ * Licensed under the Apache License, Version 2.0
+ * Adapted for PostgreSQL under PostgreSQL license
+ *
+ * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/port/pg_crc32c_riscv_zbc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "c.h"
+
+#ifdef WORDS_BIGENDIAN
+#error "RISC-V Zbc CRC implementation does not support big-endian systems"
+#endif
+
+#include "port/pg_crc32c.h"
+
+/*
+ * 128-bit value for polynomial arithmetic
+ */
+typedef struct
+{
+ uint64 lo;
+ uint64 hi;
+} V128;
+
+/*
+ * Carry-less multiply instructions from RISC-V Zbc/Zbkc extension
+ */
+static inline uint64
+pg_clmul(uint64 a, uint64 b)
+{
+ uint64 _res;
+
+ __asm__(
+ " clmul %0, %1, %2\n"
+: "=r"(_res)
+: "r"(a), "r"(b));
+
+ return _res;
+}
+
+static inline uint64
+pg_clmulh(uint64 a, uint64 b)
+{
+ uint64 _res;
+
+ __asm__(
+ " clmulh %0, %1, %2"
+: "=r"(_res)
+: "r"(a), "r"(b));
+
+ return _res;
+}
+
+static inline V128
+pg_clmul128(uint64 a, uint64 b)
+{
+ V128 result;
+
+ result.lo = pg_clmul(a, b);
+ result.hi = pg_clmulh(a, b);
+ return result;
+}
+
+/*
+ * 128-bit operations
+ */
+static inline V128
+pg_v128_xor(V128 a, V128 b)
+{
+ V128 result;
+
+ result.lo = a.lo ^ b.lo;
+ result.hi = a.hi ^ b.hi;
+ return result;
+}
+
+static inline V128
+pg_v128_and_mask32(V128 a)
+{
+ V128 result;
+
+ result.lo = a.lo & UINT64CONST(0x00000000FFFFFFFF);
+ result.hi = a.hi & UINT64CONST(0x00000000FFFFFFFF);
+ return result;
+}
+
+static inline V128
+pg_v128_shift_right64(V128 a)
+{
+ V128 result;
+
+ result.lo = a.hi;
+ result.hi = 0;
+ return result;
+}
+
+static inline V128
+pg_v128_shift_right32(V128 a)
+{
+ V128 result;
+
+ result.lo = (a.lo >> 32) | (a.hi << 32);
+ result.hi = (a.hi >> 32);
+ return result;
+}
+
+static inline V128
+pg_v128_load(const unsigned char *p)
+{
+ V128 result;
+
+ /*
+ * Load 16 bytes as two 64-bit values. Use memcpy() because callers may
+ * pass unaligned buffers. WORDS_BIGENDIAN is rejected above, so no byte
+ * swapping is needed.
+ */
+ memcpy(&result.lo, p, sizeof(result.lo));
+ memcpy(&result.hi, p + 8, sizeof(result.hi));
+ return result;
+}
+
+/*
+ * CRC-32C (Castagnoli) polynomial folding constants. These are computed
+ * for the polynomial 0x1EDC6F41 (normal form) or 0x82F63B78 (reflected).
+ */
+static const uint64 kK5 = UINT64CONST(0x0f20c0dfe); /* Folding constant */
+static const uint64 kK6 = UINT64CONST(0x14cd00bd6); /* Folding constant */
+static const uint64 kK7 = UINT64CONST(0x0dd45aab8); /* 64->32 reduction */
+static const uint64 kP1 = UINT64CONST(0x105ec76f0); /* Barrett reduction */
+static const uint64 kP2 = UINT64CONST(0x0dea713f1); /* Barrett reduction */
+
+/*
+ * Core CRC-32C computation using carry-less multiplication.
+ *
+ * Input: CRC in working form (already inverted with ~crc)
+ * Output: CRC in working form (still inverted)
+ *
+ * Precondition: len >= 32 and len % 16 == 0
+ */
+static uint32
+pg_crc32c_clmul_core(uint32 crc_inverted, const unsigned char *buf, uint64 len)
+{
+ V128 x;
+
+ /* Load first 16-byte block and XOR with inverted CRC */
+ x = pg_v128_load(buf);
+ x.lo ^= (uint64) crc_inverted;
+ buf += 16;
+ len -= 16;
+
+ /* Fold 16-byte blocks into 128-bit accumulator */
+ while (len >= 16)
+ {
+ V128 block = pg_v128_load(buf);
+ V128 lo = pg_clmul128(x.lo, kK5);
+ V128 hi = pg_clmul128(x.hi, kK6);
+
+ x = pg_v128_xor(pg_v128_xor(lo, hi), block);
+ buf += 16;
+ len -= 16;
+ }
+
+ /* Reduce 128-bit to 64-bit */
+ {
+ V128 tmp = pg_clmul128(kK6, x.lo);
+
+ x = pg_v128_xor(pg_v128_shift_right64(x), tmp);
+ }
+
+ /* Reduce 64-bit to 32-bit */
+ {
+ V128 tmp = pg_v128_shift_right32(x);
+
+ x = pg_v128_and_mask32(x);
+ x = pg_clmul128(kK7, x.lo);
+ x = pg_v128_xor(x, tmp);
+ }
+
+ /* Barrett reduction to final 32-bit CRC */
+ {
+ V128 tmp = pg_v128_and_mask32(x);
+
+ tmp = pg_clmul128(kP2, tmp.lo);
+ tmp = pg_v128_and_mask32(tmp);
+ tmp = pg_clmul128(kP1, tmp.lo);
+ x = pg_v128_xor(x, tmp);
+ }
+
+ /* Extract result from second 32-bit lane */
+ return (uint32) ((x.lo >> 32) & UINT64CONST(0xFFFFFFFF));
+}
+
+/*
+ * Main CRC-32C computation function with RISC-V Zbc acceleration
+ */
+pg_crc32c
+pg_comp_crc32c_riscv_zbc(pg_crc32c crc, const void *data, size_t len)
+{
+ const unsigned char *p = data;
+ const size_t kMinLen = 32;
+ const size_t kChunkLen = 16;
+ size_t tail;
+
+ /* Use software fallback for small buffers */
+ if (len < kMinLen)
+ return pg_comp_crc32c_sb8(crc, data, len);
+
+ /*
+ * Process the leading len % 16 bytes in software (the Abseil approach)
+ * so that the hardware loop sees a multiple of 16 bytes. This does not
+ * align the pointer, so pg_v128_load() must tolerate unaligned input.
+ */
+ tail = len % kChunkLen;
+ if (tail)
+ {
+ crc = pg_comp_crc32c_sb8(crc, p, tail);
+ p += tail;
+ len -= tail;
+ }
+
+ /*
+ * Process remaining bytes (now a multiple of 16) with hardware. The core
+ * algorithm requires at least 32 bytes.
+ */
+ if (len >= 32)
+ {
+ /*
+ * PostgreSQL keeps the running CRC in non-finalized form:
+ * INIT_CRC32C() starts it at 0xFFFFFFFF, and FIN_CRC32C() applies the
+ * final XOR. The core expects and returns that form, so crc is passed
+ * through directly, including when continuing a CRC that an earlier
+ * call started.
+ */
+ crc = pg_crc32c_clmul_core(crc, p, len);
+ }
+
+ return crc;
+}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8cf40c87043..372a80c067f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3393,6 +3393,7 @@ VirtualTransactionId
VirtualTupleTableSlot
VolatileFunctionStatus
Vsrt
+V128
WAIT_ORDER
WALAvailability
WALInsertLock
--
2.50.1
From c624562a93080d9a1b1411d5a069aec3820338ad Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Mon, 8 Jun 2026 11:14:50 -0400
Subject: [PATCH v6 3/3] Disable auto-vectorization on RISC-V with Clang older
than 22
Clang's loop vectorizer miscompiles data-dependent scatter-store loops of
the form "dst[idx[i]] = expr" on RISC-V when the V extension is enabled and
auto-vectorization runs at -O2 or above. The indexed scatter is lowered to
a unit-stride store, silently dropping the permutation and producing wrong
results.
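For reference, the idiom in question looks like the loop below. This is a hypothetical sketch, not the code from des_init(). A correct vectorization of the loop needs an indexed (scatter) store, but the miscompiled code instead writes the values to consecutive addresses, which drops the permutation. scatter_ok() is an equally hypothetical self-check that would catch that failure.

```c
#include <assert.h>
#include <stddef.h>

/* The scatter-store idiom: element i goes to the slot named by idx[i]. */
static void
scatter(unsigned char *dst, const unsigned char *idx,
		const unsigned char *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[idx[i]] = src[i];
}

/* Returns 1 if dst really holds src permuted through idx, else 0. */
static int
scatter_ok(const unsigned char *dst, const unsigned char *idx,
		   const unsigned char *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (dst[idx[i]] != src[i])
			return 0;
	return 1;
}
```

With a reversing permutation (idx = {7, 6, ..., 0}), a correct build leaves src[0] in dst[7]. A unit-stride store would leave it in dst[0] instead.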
This was first observed as authentication failures from contrib/pgcrypto on
the riscv64 buildfarm animal greenfly: des_init() builds its permutation
tables with exactly this idiom, so the resulting DES tables were corrupt.
The same source idiom appears elsewhere in the tree (for example in
src/timezone/zic.c, which is not meaningfully exercised by the regression
tests), so a per-call-site workaround in crypt-des.c does not scale -- a
later Clang release could vectorize a site we have not annotated.
The bug is fixed in Clang 22; Clang 20 and 21 are affected and the fix was
not backported. Clang 20 still ships as the default in current
distributions for riscv64, so rather than refuse to build with affected
compilers, disable auto-vectorization globally for the affected combination
by adding -fno-vectorize. The configure/meson probe keys on the compiler
(__clang__ && __riscv && __clang_major__ < 22) so that a fixed Clang, GCC,
or any non-RISC-V target is unaffected and pays no cost.
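The probe's condition can be expressed as a small C function. needs_no_vectorize() is a hypothetical name; the preprocessor test is copied from the configure/meson probe.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True only for Clang older than 22 targeting RISC-V, i.e. exactly when
 * the probe fails to compile and -fno-vectorize gets added.
 */
static bool
needs_no_vectorize(void)
{
#if defined(__clang__) && defined(__riscv) && __clang_major__ < 22
	return true;
#else
	return false;
#endif
}
```

As written, the condition does not check __riscv_vector. It therefore also applies to Clang < 22 builds for RISC-V targets that do not enable the V extension.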
Author: Greg Burd <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
configure.ac | 19 +++++++++++++++++++
meson.build | 25 +++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/configure.ac b/configure.ac
index dc9c0fd247a..a2dba65457b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -804,6 +804,25 @@ choke me
[AC_MSG_ERROR([Compiling PostgreSQL with clang, on 32bit x86, requires SSE2 support. Use -msse2 or use gcc.])])
fi
+# Defend against a Clang loop-vectorizer wrong-code bug on RISC-V. Clang
+# versions before 22 miscompile data-dependent scatter-store loops of the
+# form "dst[idx[i]] = expr" when the V extension is enabled and auto-
+# vectorization runs (at -O2 and above): the indexed scatter is lowered to
+# a unit-stride store, silently dropping the permutation. We hit this in
+# des_init() (contrib/pgcrypto), and the same idiom appears elsewhere in
+# the tree (e.g. zic.c), so a per-site workaround does not scale. The bug
+# is fixed in Clang 22; rather than refuse such compilers outright (Clang
+# 20 still ships in current distributions for riscv64), disable auto-
+# vectorization globally on the affected combination. The test keys on the
+# compiler so that a fixed Clang, or any other compiler, is unaffected.
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [
+@%:@if defined(__clang__) && defined(__riscv) && __clang_major__ < 22
+choke me
+@%:@endif
+])], [],
+[PGAC_PROG_CC_CFLAGS_OPT([-fno-vectorize])
+ PGAC_PROG_CXX_CFLAGS_OPT([-fno-vectorize])])
+
AC_PROG_CPP
AC_SUBST(GCC)
diff --git a/meson.build b/meson.build
index 286d3b01f15..12c0244fc4d 100644
--- a/meson.build
+++ b/meson.build
@@ -2175,6 +2175,31 @@ choke me
endif
+# Defend against a Clang loop-vectorizer wrong-code bug on RISC-V. Clang
+# versions before 22 miscompile data-dependent scatter-store loops of the
+# form "dst[idx[i]] = expr" when the V extension is enabled and auto-
+# vectorization runs (at -O2 and above): the indexed scatter is lowered to
+# a unit-stride store, silently dropping the permutation. We hit this in
+# des_init() (contrib/pgcrypto), and the same idiom appears elsewhere in
+# the tree (e.g. zic.c), so a per-site workaround does not scale. The bug
+# is fixed in Clang 22; rather than refuse such compilers outright (Clang
+# 20 still ships in current distributions for riscv64), disable auto-
+# vectorization globally on the affected combination. The test keys on the
+# compiler so that a fixed Clang, or any other compiler, is unaffected.
+if not cc.compiles('''
+#if defined(__clang__) && defined(__riscv) && __clang_major__ < 22
+choke me
+#endif''',
+ name: 'whether Clang on RISC-V needs auto-vectorization disabled',
+ args: test_c_args)
+ no_vectorize_cflags = cc.get_supported_arguments(['-fno-vectorize'])
+ cflags += no_vectorize_cflags
+ if have_cxx
+ cxxflags += cxx.get_supported_arguments(['-fno-vectorize'])
+ endif
+endif
+
+
# Check whether the C++ compiler supports designated initializers.
# These are used by PG_MODULE_MAGIC, and we use the result of this
# test to decide whether to enable the test_cplusplusext test module.
--
2.50.1