https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123750
--- Comment #1 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
I have now applied the patch from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123597 to gcc-16. It turns out the ICE is still there for this pragma:

#pragma omp target teams distribute parallel for collapse(2) map(tofrom:count) shared(pooled_offsets_starts,pooled_offsets_flat) is_device_ptr(pd,pooled_offsets_flat,pooled_offsets_starts) device(devnum)

However, the loop will run in parallel if the shared and is_device_ptr clauses are dropped, i.e. if I write:

#pragma omp target teams distribute parallel for collapse(2) map(tofrom:count) device(devnum)

If I use only the shared clause, i.e.

#pragma omp target teams distribute parallel for collapse(2) map(tofrom:count) shared(pooled_offsets_starts,pooled_offsets_flat) device(devnum)

then I get:

/home/benni/projects/arraylibrary/openmp/datablockcontainer.h: In member function 'void BlockedDataView<T>::build_blocks_rank2(size_t, size_t, bool) [with T = double]':
/home/benni/projects/arraylibrary/openmp/datablockcontainer.h:251:21: internal compiler error: in gimplify_var_or_parm_decl, at gimplify.cc:3426
  251 | #pragma omp target teams distribute parallel for collapse(2) map(tofrom:count) shared(pooled_offsets_starts,pooled_offsets_flat) device(devnum)

If I use only the is_device_ptr clause:

#pragma omp target teams distribute parallel for collapse(2) map(tofrom:count) is_device_ptr(pd,pooled_offsets_flat,pooled_offsets_starts) device(devnum)

then it compiles, but at runtime I get:

libgomp: cuCtxSynchronize error: an illegal memory access was encountered
libgomp: cuModuleGetFunction (__do_global_dtors__entry) error: an illegal memory access was encountered
libgomp: cuMemFree_v2 error: an illegal memory access was encountered
libgomp: device finalization failed

None of this makes sense, however. The variables pd, pooled_offsets_flat and pooled_offsets_starts are all device pointers and were allocated with the right size by omp_target_alloc.
Also, marking the array pointers as shared in that loop should not cause a problem in the code either: all threads should read and fill different values...
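
For reference, the access pattern described above can be sketched roughly as follows. This is only a minimal illustration, not the code from datablockcontainer.h, and it is not claimed to reproduce either the ICE or the runtime error. The function name fill_grid and the parameters rows and cols are hypothetical; the count variable and the shared clause from the report are left out. It assumes a buffer obtained from omp_target_alloc that is passed via is_device_ptr, with every iteration of the collapsed loop writing a distinct element, so no two threads touch the same location. Without OpenMP support it falls back to a plain host loop.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not from the report): writes i*cols + j into element
// (i, j) of a rows x cols buffer and returns a host copy of that buffer.
std::vector<double> fill_grid(std::size_t rows, std::size_t cols)
{
    const std::size_t n = rows * cols;
    std::vector<double> host(n, 0.0);
    if (n == 0)
        return host;

#ifdef _OPENMP
    const int devnum = omp_get_default_device();
    const int hostnum = omp_get_initial_device();

    // Device buffer with exactly n elements, as in the report's description.
    double *pd = static_cast<double *>(omp_target_alloc(n * sizeof(double), devnum));
    if (pd == nullptr)
        throw std::bad_alloc();

    // Each (i, j) pair maps to a unique index, so iterations do not overlap.
    #pragma omp target teams distribute parallel for collapse(2) is_device_ptr(pd) device(devnum)
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            pd[i * cols + j] = static_cast<double>(i * cols + j);

    // Copy the result back to the host, then release the device buffer.
    const int rc = omp_target_memcpy(host.data(), pd, n * sizeof(double),
                                     0, 0, hostnum, devnum);
    omp_target_free(pd, devnum);
    if (rc != 0)
        throw std::runtime_error("omp_target_memcpy failed");
#else
    // Fallback without OpenMP: the same fill performed on the host.
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            host[i * cols + j] = static_cast<double>(i * cols + j);
#endif
    return host;
}
```

Calling fill_grid(3, 4) should return 12 values, each equal to its own index. If a loop of this shape works while the real one fails, that would suggest looking at how the three pointers in the report are allocated and indexed.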
