Re: Add RISC-V Zbb popcount optimization

Greg Burd Mon, 01 Jun 2026 13:55:10 -0700


On 2026-05-28 10:17:16 -0400, Andres Freund wrote:
> How confident are we that this bug just affects DES?


It's a Clang LoopVectorize wrong-code bug on rv64gcv that we happen to hit in
des_init(); DES is just the spot with riscv build-farm coverage (greenfly).

I reduced it to ~15 lines with no DES semantics: a data-dependent scatter
store dst[idx[i]-1] = i over a directly-visible array, vectorized at
-O2 -march=rv64gcv, comes out wrong. -O1, -fno-vectorize, building without
the V extension, and GCC are all correct:

  clang -O2 -march=rv64gcv                -> wrong
  clang -O1 -march=rv64gcv                -> ok
  clang -O2 -march=rv64gcv -fno-vectorize -> ok
  clang -O2            (no V)             -> ok
  gcc   -O2 -march=rv64gcv                -> ok

In the reduced case the scatter is miscompiled into a contiguous copy of the
index array into the destination (vle8.v/vse8.v), dropping both the index and
the stored value; in the full DES code it instead emits an indexed vsoxei8.v.
Either way it's the loop vectorizer, confirmed by -fno-vectorize fixing it.

So any scatter-store loop the vectorizer chooses to handle could be affected,
not just this one. That argues against my per-site barrier approach.

> That seems like a pretty odd fix for the problem. If the problem is
> auto-vectorization, we should stop auto-vectorization, not sprinkle memory
> barriers around.

Agreed. pg_memory_barrier() only works by incidentally blocking the
vectorizer (the memory clobber), which is misleading -- it reads as a
concurrency primitive -- fragile, and per-site. I'll drop it.

> Have you confirmed that, by using a newer clang, the merging of the fixes
> actually fixes the problem?
> ISTM a perfectly viable patch would be to just reject building with a
> non-very-recent clang on riscv.

Confirming empirically now (building trunk clang on the riscv box). Two facts
worth recording first:

 - Clang 20.1.2 was tagged 2025-04-02. The LLVM fixes I cited earlier all
   merged in Jan 2026, so 20.1.2 contains none of them.

 - However, none of those cited issues/PRs is actually this bug. #176105 is a
   cost-model change and rv32-only; #176001/#176077 are RISC-V backend
   (register coalescer / vmerge COPY-sink) fixes from fuzzer vector_size code;
   #171978 and #187458 are unrelated lowering bugs. I can't find any existing
   LLVM issue matching an indexed-scatter auto-vectorization correctness
   miscompile -- closest but distinct are #80792 and #187402/PR#187802 (LAA
   WAW, strided, still open). So a newer clang is not guaranteed to fix it.

 - Confirmed: clang 20.1.2 and 21.1.8 both miscompile; clang 22.1.6 is correct
   (cross-compiled to riscv64 and run natively on the box). The scatter is
   lowered to a unit-stride vse8.v copy (dropping the permutation) on ≤21, and
   to a correct vsoxei8.v indexed scatter on 22. So the fix is in clang 22 and
   was not backported to 21.1.8.

Given that, the plan:

 - Adopt your suggestion: a configure/meson check keyed on __clang_major__ < 22
   for riscv. Simplest is to refuse that combination.

 - No new LLVM issue needed -- it's already fixed in clang 22. I can still file
   one to request a 21.x backport if that's worthwhile.

Either way I'll create a patch that gates the use of the optimization flags
based on the clang version for review ASAP.

The other two patches; Zbb popcount, and Zbc CRC32C, are feature patches that 
are
valuable but separate.  I'll update that thread with those.

best.

-greg

Re: Add RISC-V Zbb popcount optimization

Reply via email to