| Issue | 179272 |
|---|---|
| Summary | [Clang] Miscompile with local array variable in OpenMP SIMD loop / `llvm.loop.parallel_accesses` |
| Labels | clang |
| Assignees | |
| Reporter | juliusikkala |

Compiler Explorer: https://godbolt.org/z/vnce5rE7e
```c
// Compile with `clang test.c -O3 -fopenmp -march=native`
// To observe correct behavior, omit `-fopenmp`.
#include <stdio.h>

void foo(int *A, int n) {
#pragma omp simd
    for (long i = 0; i < n; i++) {
        int t[32];
        for (int j = 0; j < 32; ++j)
            t[j] = i;
        A[i] = t[i];
    }
}

int main()
{
    int A[32];
    foo(A, 32);
    for (int i = 0; i < 32; ++i)
        printf("%d ", A[i]);
    return 0;
}
```
With `-fopenmp`, the program prints:
`7 7 7 7 7 7 7 7 15 15 15 15 15 15 15 15 23 23 23 23 23 23 23 23 31 31 31 31 31 31 31 31`
Without, it prints:
`0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31`
The result also depends on the length of the array `t`: with fewer than 16 elements, the program appears to behave correctly.
## Diagnosis
It appears that the `alloca` instruction for `t` is hoisted out of the loop, even though the loop is marked parallel and the same stack allocation must therefore not be shared between iterations. Ideally, I think `t` should get privatized, i.e. a larger `alloca` should be done upfront so that each parallel lane has its own instance of the array. Alternatively (and less preferably), this case should not get vectorized.
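To illustrate what privatization means here, the following is a minimal hand-written sketch in C, not what LLVM or pocl actually emit. The function name `foo_privatized` and the buffer `t_all` are hypothetical, and the sketch assumes one 32-element slice per iteration (a real pass would more likely size the buffer by the vector width) and `n <= 32` as in the reproducer.

```c
#include <stdlib.h>

/* Hypothetical hand-privatized variant of foo(): instead of one shared
 * t[32], every iteration i works on its own 32-int slice of t_all. */
void foo_privatized(int *A, int n) {
    /* Assumption: one slice per iteration, allocated upfront. */
    int *t_all = malloc((size_t)n * 32 * sizeof(int));
    if (!t_all)
        return;
#pragma omp simd
    for (long i = 0; i < n; i++) {
        int *t = &t_all[i * 32];   /* slice private to iteration i */
        for (int j = 0; j < 32; ++j)
            t[j] = (int)i;
        A[i] = t[i];               /* requires i < 32, as in the reproducer */
    }
    free(t_all);
}
```

Because no two iterations touch the same memory in this version, the loop should be safe to treat as parallel, and calling `foo_privatized(A, 32)` is expected to fill `A` with `0` through `31`.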
## Context
In this issue, I'm only using OpenMP as a means to set `llvm.loop.parallel_accesses` for the loop and get an easy C reproducer for the problem. The actual context is work item loops annotated with `llvm.loop.parallel_accesses` in [pocl](https://github.com/pocl/pocl) (tagging @pjaaskel) and [Slang](https://github.com/shader-slang/slang) (in both, when targeting CPU execution).
To my knowledge, `pocl` deals with this issue by manually privatizing the `t` array so that each parallel loop iteration has its own array, so this issue does not directly surface there. The fact that upstream LLVM does not perform privatization for the array prevents some vectorization opportunities. LoopAccessAnalysis/LoopVectorizationLegality seem to prevent vectorization when there are load-store dependencies on loop-invariant addresses, even when the loop has the `parallel_accesses` metadata. I suspect that this is done specifically to work around local variables getting incorrectly shared, but the check misses the case in this issue.
In the Slang case, this means that I can't safely annotate the work item loop with `parallel_accesses` as there is no privatization pass like in pocl. Local arrays in work items can cause this miscompile to occur.