https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122281
Tobias Burnus <burnus at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2025-10-30
Ever confirmed|0 |1
Summary|libgomp: cuCtxSynchronize error: an illegal memory access was encountered in code that reserves memory correctly. |[OpenMP][SIMT] libgomp: cuCtxSynchronize error: an illegal memory access was encountered in code that reserves memory correctly.
Keywords| |openmp
CC| |tschwinge at gcc dot gnu.org
--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> ---
This is an NVPTX-only issue.
With -O0, or with any -O0 and -foffload=disable / -foffload=amdgcn-amdhsa,
the result is:
In omplower:
[datablock.h:703:25] #pragma omp atomic_load relaxed
D.234487 = *&count
[datablock.h:703:25] D.234488 = D.234487 + 1;
[datablock.h:703:25] #pragma omp atomic_store relaxed (D.234488)
and then in ompexp:
<bb 10> :
[datablock.h:703:25] _32 = .omp_data_i_8(D)->count;
[datablock.h:703:25 discrim 1] __atomic_fetch_add_8 (_32, 1, 0);
This looks fine.
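The dumps come from datablock.h, which is not included in this report. As a rough illustration only, the C function below has the same shape as what the dumps show at datablock.h:698-703. It runs a worksharing loop inside a target region and atomically increments a shared 8-byte counter whenever an element compares equal to 0.0. The name count_zeros, its parameters, and the choice of combined construct are assumptions. This is not a confirmed reproducer, and the dumps do not show whether the original loop also involves a simd construct.

```c
#include <stdint.h>

/* Hypothetical stand-in for the loop at datablock.h:698-703 (not the
   original code): count the entries of DATA that compare equal to 0.0.
   COUNT is 8 bytes wide, matching the __atomic_fetch_add_8 calls in the
   dumps.  Without -fopenmp the pragmas are ignored and the loop runs
   serially on the host.  */
int64_t
count_zeros (const double *data, int n)
{
  int64_t count = 0;

#pragma omp target teams distribute parallel for \
  map(to: data[0:n]) map(tofrom: count)
  for (int i = 0; i < n; i++)
    if (data[i] == 0.0)
      {
        /* Default (relaxed) atomic update; in the omplower dump it is
           represented as the atomic_load / + 1 / atomic_store
           sequence shown above.  */
#pragma omp atomic
        count++;
      }

  return count;
}
```

For example, count_zeros applied to { 0.0, 1.5, 0.0, -2.0, 0.0 } with n = 5 returns 3, whether it runs on the host or is offloaded correctly.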
However, for -foffload=nvptx-none, omplower duplicates this code into two
copies:
[datablock.h:698:34] #pragma omp for nowait private(i.152)
for (i.152 = 0; i.152 < D.234916; i.152 = i.152 + 1)
....
[datablock.h:701:22] D.234805 = [datablock.h:701:22] *D.234804;
[datablock.h:701:13] if (D.234805 == 0.0) goto <D.234878>; else goto
<D.234879>;
<D.234878>:
[datablock.h:703:25] D.234913 = .omp_data_i->count;
[datablock.h:703:25] #pragma omp atomic_load relaxed
D.234808 = *D.234913
[datablock.h:703:25] D.234809 = D.234808 + 1;
[datablock.h:703:25] #pragma omp atomic_store relaxed (D.234809)
goto <D.234880>;
...
#pragma omp return(nowait)
}
goto <D.234873>;
<D.234872>:
...
[datablock.h:701:22] D.234805 = [datablock.h:701:22] *D.234804;
[datablock.h:701:13] if (D.234805 == 0.0) goto <D.234882>; else goto
<D.234883>;
<D.234882>:
[datablock.h:703:25] #pragma omp atomic_load relaxed
D.234808 = *&*D.234913
[datablock.h:703:25] D.234809 = D.234808 + 1;
[datablock.h:703:25] #pragma omp atomic_store relaxed (D.234809)
goto <D.234884>;
This still looks more or less okay, but in ompexp it gets converted to:
<bb 43> :
[datablock.h:703:25 discrim 3] __atomic_fetch_add_8 (D.235057, 1, 0);
--
<bb 37> :
[datablock.h:703:25] D.235057 = .omp_data_i->count;
[datablock.h:703:25 discrim 1] __atomic_fetch_add_8 (D.235057, 1, 0);
Obviously, the load 'D.235057 = .omp_data_i->count' is missing before the
atomic add in bb 43. The load could be hoisted, but as written it needs to be
there, especially as the loop is executed multiple times.
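In C terms, each of the two loop copies has to perform the two operations sketched below, and in bb 43 the first one (the load of the pointer) is missing. This assumes that .omp_data_i is the outlined region's argument block holding the address of the shared count, as the -O0 dump suggests. The struct and function names are hypothetical. __atomic_fetch_add and __ATOMIC_RELAXED are GCC built-ins, and the memory-order argument 0 in the dumps is __ATOMIC_RELAXED.

```c
#include <stdint.h>

/* Hypothetical mirror of the data block passed to the outlined region:
   the shared COUNT is passed by reference, so the block holds its
   address.  */
struct omp_data_s
{
  int64_t *count;
};

/* One execution of the loop body's increment after ompexp.  Both
   statements are needed on every path; bb 43 keeps only the second.  */
void
body_increment (const struct omp_data_s *omp_data_i)
{
  /* D.235057 = .omp_data_i->count;  */
  int64_t *p = omp_data_i->count;
  /* __atomic_fetch_add_8 (D.235057, 1, 0);  0 == __ATOMIC_RELAXED.  */
  __atomic_fetch_add (p, 1, __ATOMIC_RELAXED);
}
```

Because the load is redone on every iteration, p is always valid. If the first statement is dropped, p is an uninitialized temporary, which is consistent with the illegal memory access reported by cuCtxSynchronize.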
I wonder whether an 'unshare_expr' is missing here.
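The following toy model illustrates that suspicion. It is not GCC code and not a confirmed diagnosis. Suppose the two copies produced by omplower still share one operand tree. Lowering the first copy then rewrites that shared node in place to go through its temporary. The second copy sees an already-rewritten operand, emits no load of its own, and ends up using a temporary that is only assigned on the other path, which matches what the dumps show for D.234913 and D.235057. Calling GCC's unshare_expr (which returns an unshared copy of an expression tree) on the operand before the duplicate is lowered would give each copy its own node. The struct, the function names, and the string-based "lowering" are invented for this model.

```c
#include <stdio.h>
#include <string.h>

/* Toy expression node: the operand of the atomic update.  */
struct toy_expr
{
  char text[32];
};

/* Toy "lowering" of one copy of the body.  If the operand still names
   the shared variable, rewrite it IN PLACE to go through the temporary
   TMP and report that a load "TMP = .omp_data_i->count" was emitted
   into this copy.  An already-rewritten operand is left alone and no
   load is emitted.  */
static int
toy_lower (struct toy_expr *operand, const char *tmp)
{
  if (strcmp (operand->text, "count") != 0)
    return 0;
  snprintf (operand->text, sizeof operand->text, "*%s", tmp);
  return 1;
}

/* Lower both copies and return 1 if the second copy got its own load.
   With UNSHARE == 0 both copies point at the same node, mimicking a
   missing unshare_expr; with UNSHARE == 1 the second copy gets its own
   node before anything is lowered.  */
int
second_copy_has_load (int unshare)
{
  struct toy_expr shared = { "count" };
  struct toy_expr private_copy = shared;  /* toy analogue of unshare_expr */

  struct toy_expr *copy1 = &shared;
  struct toy_expr *copy2 = unshare ? &private_copy : &shared;

  toy_lower (copy1, "D.1");
  return toy_lower (copy2, "D.2");
}
```

With sharing, second_copy_has_load (0) returns 0, because the second copy references "*D.1" without a load of its own. With a private copy, second_copy_has_load (1) returns 1.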