https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976
Bug ID: 118976
Summary: Correctness Issue: SVE vectorization results in data
corruption when cpu has 128bit vectors
Product: gcc
Version: 14.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: lrbison at amazon dot com
Target Milestone: ---
Created attachment 60555
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60555&action=edit
Standalone Reproducer
Hello Team,
A customer came to me with a sha1 implementation that was producing corrupt
values on Graviton4 with -O3.
I isolated the problem to the generation of the trailing byte count in
big-endian order, which is then included in the checksum. The original code
snippet is below; several variants of it can be found online:
for (i = 0; i < 8; i++) {
    finalcount[i] = (unsigned char)((context->count[(i >= 4 ? 0 : 1)]
        >> ((3-(i & 3)) * 8) ) & 255); /* Endian independent */
}
I've attached a stand-alone reproducer in which the problematic function is
called finalcount_av. I have found that GCC 11 and earlier do not vectorize
the loop and do not have the issue, while GCC 12.4 through GCC 14.2 produce
corrupt results. Although trunk doesn't exhibit the problem, I believe this is
because of changed optimization weights rather than because the error was fixed.
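For anyone reading without the attachment, the kind of check involved can be
sketched as below. This is a minimal illustration, not the attached
reproducer: the struct sha1_ctx and the helpers finalcount_loop,
finalcount_ref, finalcount_matches and finalcount_selfcheck are hypothetical
names, and the sketch assumes count[0] holds the low word and count[1] the
high word. It compares the loop against the same eight bytes written without
a loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the SHA-1 context: only the bit counter,
   stored as two 32-bit words, matters for this loop. */
typedef struct {
    uint32_t count[2]; /* assumed: count[0] = low word, count[1] = high word */
} sha1_ctx;

/* The loop from the report, unchanged apart from formatting. */
void finalcount_loop(const sha1_ctx *context, unsigned char finalcount[8])
{
    unsigned int i;
    for (i = 0; i < 8; i++) {
        finalcount[i] = (unsigned char)((context->count[(i >= 4 ? 0 : 1)]
                         >> ((3 - (i & 3)) * 8)) & 255);
    }
}

/* Reference: the same eight big-endian bytes, written without a loop. */
void finalcount_ref(const sha1_ctx *context, unsigned char finalcount[8])
{
    uint32_t hi = context->count[1];
    uint32_t lo = context->count[0];
    finalcount[0] = (unsigned char)(hi >> 24);
    finalcount[1] = (unsigned char)(hi >> 16);
    finalcount[2] = (unsigned char)(hi >> 8);
    finalcount[3] = (unsigned char)(hi);
    finalcount[4] = (unsigned char)(lo >> 24);
    finalcount[5] = (unsigned char)(lo >> 16);
    finalcount[6] = (unsigned char)(lo >> 8);
    finalcount[7] = (unsigned char)(lo);
}

/* Returns 1 when the loop and the reference produce the same bytes. */
int finalcount_matches(uint32_t lo, uint32_t hi)
{
    sha1_ctx ctx = { { lo, hi } };
    unsigned char from_loop[8], from_ref[8];
    finalcount_loop(&ctx, from_loop);
    finalcount_ref(&ctx, from_ref);
    return memcmp(from_loop, from_ref, sizeof from_loop) == 0;
}

/* Aborts if the two versions disagree on a few sample counters. */
void finalcount_selfcheck(void)
{
    assert(finalcount_matches(0x01234567u, 0x89abcdefu));
    assert(finalcount_matches(0x00000000u, 0x00000000u));
    assert(finalcount_matches(0xffffffffu, 0x00000080u));
}
```

Calling finalcount_selfcheck() from main is enough to run the comparison. With
a correct compiler the two versions always agree. Whether this particular
sketch triggers the miscompile depends on inlining, target flags and compiler
version, so the attached reproducer remains the authoritative test case.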
It is also worth noting that the corruption only occurs on hardware with
128-bit SVE vectors. On Graviton3, with 256-bit vectors, the generated machine
code can exit early and not execute the problematic second half.
Here is a link to Compiler Explorer with the same function:
https://godbolt.org/z/c99bMjene
Note that the value of NCOUNT can be set to either 2 or 4, with 4 preventing
the compiler from simply using the `rev` instruction on trunk. Notably, though,
setting NCOUNT to 4 generates correct code in all versions I tested.