On 2026-05-28 10:17:16 -0400, Andres Freund wrote: > How confident are we that this bug just affects DES?
It's a Clang LoopVectorize wrong-code bug on rv64gcv that we happen to hit in des_init(); DES is just the spot with riscv build-farm coverage (greenfly). I reduced it to ~15 lines with no DES semantics: a data-dependent scatter store dst[idx[i]-1] = i over a directly-visible array, vectorized at -O2 -march=rv64gcv, comes out wrong. -O1, -fno-vectorize, building without the V extension, and GCC are all correct: clang -O2 -march=rv64gcv -> wrong clang -O1 -march=rv64gcv -> ok clang -O2 -march=rv64gcv -fno-vectorize -> ok clang -O2 (no V) -> ok gcc -O2 -march=rv64gcv -> ok In the reduced case the scatter is miscompiled into a contiguous copy of the index array into the destination (vle8.v/vse8.v), dropping both the index and the stored value; in the full DES code it instead emits an indexed vsoxei8.v. Either way it's the loop vectorizer, confirmed by -fno-vectorize fixing it. So any scatter-store loop the vectorizer chooses to handle could be affected, not just this one. That argues against my per-site barrier approach. > That seems like a pretty odd fix for the problem. If the problem is > auto-vectorization, we should stop auto-vectorization, not sprinkle memory > barriers around. Agreed. pg_memory_barrier() only works by incidentally blocking the vectorizer (the memory clobber), which is misleading -- it reads as a concurrency primitive -- fragile, and per-site. I'll drop it. > Have you confirmed that, by using a newer clang, the merging of the fixes > actually fixes the problem? > ISTM a perfectly viable patch would be to just reject building with a > non-very-recent clang on riscv. Confirming empirically now (building trunk clang on the riscv box). Two facts worth recording first: - Clang 20.1.2 was tagged 2025-04-02. The LLVM fixes I cited earlier all merged in Jan 2026, so 20.1.2 contains none of them. - However, none of those cited issues/PRs is actually this bug. #176105 is a cost-model change and rv32-only; #176001/#176077 are RISC-V backend (register coalescer / vmerge COPY-sink) fixes from fuzzer vector_size code; #171978 and #187458 are unrelated lowering bugs. I can't find any existing LLVM issue matching an indexed-scatter auto-vectorization correctness miscompile -- closest but distinct are #80792 and #187402/PR#187802 (LAA WAW, strided, still open). So a newer clang is not guaranteed to fix it. - Confirmed: clang 20.1.2 and 21.1.8 both miscompile; clang 22.1.6 is correct (cross-compiled to riscv64 and run natively on the box). The scatter is lowered to a unit-stride vse8.v copy (dropping the permutation) on ≤21, and to a correct vsoxei8.v indexed scatter on 22. So the fix is in clang 22 and was not backported to 21.1.8. Given that, the plan: - Adopt your suggestion: a configure/meson check keyed on __clang_major__ < 22 for riscv. Simplest is to refuse that combination. - No new LLVM issue needed -- it's already fixed in clang 22. I can still file one to request a 21.x backport if that's worthwhile. Either way I'll create a patch that gates the use of the optimization flags based on the clang version for review ASAP. The other two patches; Zbb popcount, and Zbc CRC32C, are feature patches that are valuable but separate. I'll update that thread with those. best. -greg
