Hi,

Thanks for taking a look at this.

On 2026-05-28 14:17:16 +0000, Andres Freund wrote:
> How confident are we that this bug just affects DES?

It doesn't.  I scanned master for the scatter-write idiom that des_init() uses
(`dst[idx[i]] = expr`) and then compiled each candidate on greenfly with both
gcc-13 and clang-20 at -O2 with -march=rv64gcv, diffing the disassembly for
indexed-store instructions.  The static scan turned up ~30 sites; clang-20
actually emits RVV scatter/strided stores in three of them, where gcc-13 emits
scalar code:

  contrib/pgcrypto/crypt-des.c   des_init        3x vsoxei8 (the known case)
  src/timezone/zic.c        ~L2330 omittype[]    2x vsoxei8 (byte dest, same 
shape)
  contrib/pg_trgm/trgm_op.c compactTrgm          3x vsse8   (strided, not 
indexed)

The zic.c hit is the one that bothers me.  It's the same shape as des_init() --
byte-sized destination, scatter store via an indexing array -- and it's not
exercised by `make check`, since zic runs at `make install` to compile the IANA
tz data.  If clang-20 miscompiles zic the same way it miscompiles des_init(),
the buildfarm wouldn't notice and the resulting tzdata would silently be wrong.
I haven't yet built and run that path on greenfly with clang-20 to confirm the
miscompile actually triggers there; I'll do that next.

The trgm_op.c hit is structurally different (strided store, not indexed
scatter), so it isn't the #176001 pattern.  I'm flagging it because it's in the
same family of clang-20 RVV transformations on byte destinations, not because I
have a known miscompile for it.

Several other sites that match the source pattern (ginlogic.c MAYBE- twiddle,
spi.c attribute scatter, spgtextproc.c, ...) survive clang-20 codegen
unvectorized -- the autovec heuristics don't fire on the variable-length,
control-flow-dependent loops where this idiom tends to live in our tree.  So the
*visible* exposure today appears bounded to crypt-des.c and zic.c, but I would
not bet money that the next clang release won't extend the autovec to those.

One more data point: the scatter stores only appear at -O2 and -O3, not -O1.
Our default is -O2.

> Have you confirmed that, by using a newer clang, the merging of the
> fixes actually fixes the problem?

Not yet.  apt.llvm.org doesn't ship riscv64 packages, so I'm building clang from
llvm-project main on greenfly now (currently mid-build, ~24h on this CPU under
nice).  Once it's installed I'll rerun the asm audit and the test suite under
it; if HEAD generates correct code for crypt-des.c and zic.c that's a real data
point for "reject affected clang" and gives us a version range for the cutoff.

> ISTM a perfectly viable patch would be to just reject building with a
> non-very-recent clang on riscv.

It's the cleanest answer if HEAD is in fact fixed, and I'd lean that way.  The
cost is excluding the clang that ships in current Debian/ Ubuntu stable on
riscv64 (clang 20 in noble / trixie), which is what most riscv64 users actually
have today.  I'd rather have the clang-HEAD bisect data before picking the
cutoff -- "reject < clang-N" is much more defensible with N pinned to a specific
fix.

> That seems like a pretty odd fix for the problem. If the problem is
> auto-vectorization, we should stop auto-vectorization, not sprinkle
  > memory barriers around.

Agreed.  pg_memory_barrier() was the first thing that made the test pass and I
shipped it without going back to the right primitive.  The audit makes per-loop
pragmas look worse, not better: the affected sites aren't all in one file, and
one of them (zic.c) is third-party- ish code we mostly don't touch.  In rough
order of locality:

  1. #pragma clang loop vectorize(disable) on the four affected loops
     in des_init() (and the equivalent in zic.c, if confirmed).  Doesn't
     scale if the next clang release vectorizes a fifth site.
  2. A pg_attribute_no_vectorize on des_init() / the affected zic
     function.  Coarser, defensive against new scatter writes inside
     the same function.
  3. Configure-time -fno-tree-vectorize on crypt-des.c (and zic.c)
     for clang on riscv.  File scope.  Simple and static.
  4. Configure-time -fno-tree-vectorize globally for clang+riscv64
     until clang-N, where N is the bisected fix.  Or refuse < N.
  5. Hard error in configure for clang versions in the affected range.

After the audit my preference shifted from (1) to (3) or (4): the bug isn't
DES-specific, the patch shouldn't be either, and decorating loops one at a time
as we trip over them is exactly the "sprinkle barriers around" complaint applied
a level up.  (4) would also fix the secondary intermittent issue we've seen on
greenfly that I haven't been able to attribute to any specific loop.

I'll send v5 once I have:
  - clang-HEAD on greenfly and the rerun audit/test results
  - confirmation (or refutation) that the zic.c miscompile actually
    triggers, not just that the scatter store is emitted

If both confirm what the asm audit suggests, v5 will drop the
pg_memory_barrier() approach in 0001 and replace it with either (3)
or (4)+(5), depending on whether clang-HEAD passes.

The other two patches do improve CRC32 and popcount on RISC-V, they are still
worth considering.

best.

-greg


Reply via email to