https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123597
--- Comment #12 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Hi, it may be that the problem with combined constructs is more general. Jakub's comment indicates that the bindings would work with a pure omp for, but not with omp loop. Tobias has a bug here where omp loop fails with a private clause while omp for works. This time no templates are involved, but the private clause also makes it some kind of combined statement: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117247
Notably, Tobias mentions that the code would "work" with his AMD card, but at first he noted the same correct results for the matrix multiplication, whose defect is apparent in the GIMPLE dump, long before any backend translation. It may well be that AMD just knows this and places atomics internally, or that the AMD hardware sees that the test cases are small and evaluates the loops serially.
Coincidentally, this new ICE on valid code here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123750 occurs in my math library, which uses a longer combined statement with shared and is_device_ptr clauses on teams distribute parallel for, and a pointer of templated type that is a device pointer and implicitly thread-private. In that case I get an ICE with a segfault in gcc-16, on valid code that compiles and runs with clang. If I remove the shared clauses, I get the same cuCtxSynchronize problem that Tobias reports in his bug, which goes away when I remove every clause that would execute the code in parallel, so that it runs on the target with a single thread. In that case, the GIMPLE dump looks normal to my eyes (I tried to evade the scoping problem of this bug with brackets, but the ICE also occurs with combined clauses and constructs). Perhaps there is still a problem with the binding if clauses like shared and is_device_ptr are used?
It may well be that all of these, namely this matrix multiplication bug with collapse(2) and templated types, Tobias's bug with a combined statement of shared and loop https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117247, and this ICE on valid code with a longer combined OpenMP construct and variables of templated type https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123750, are rooted in a similar issue: that GCC has difficulty evaluating combined OpenMP clauses correctly? And, due to late binding of the type, this may then be more easily provoked if one has variables of templated type?
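For anyone following the three reports, the two loop forms being compared (omp for versus omp loop, each with a private clause on a combined construct) can be sketched roughly as below. This is only an illustration in host-only C++ without any target offloading; the function names scale_with_for and scale_with_loop and the scaling kernel are made up here and are not taken from any of the bug reports, and the sketch is not claimed to reproduce the wrong code or the ICE.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example functions, not taken from any of the bug reports.

// Worksharing form: combined "parallel for" with a private scratch variable.
std::vector<double> scale_with_for(const std::vector<double>& in, double factor)
{
    const std::size_t n = in.size();
    std::vector<double> out(n);
    double tmp = 0.0;
    #pragma omp parallel for private(tmp)
    for (std::size_t i = 0; i < n; ++i) {
        tmp = in[i] * factor;   // each thread writes its own copy of tmp
        out[i] = tmp;
    }
    return out;
}

// Same loop written with the OpenMP 5.0 "loop" construct in a combined
// "parallel loop"; the only difference is the directive name.
std::vector<double> scale_with_loop(const std::vector<double>& in, double factor)
{
    const std::size_t n = in.size();
    std::vector<double> out(n);
    double tmp = 0.0;
    #pragma omp parallel loop private(tmp)
    for (std::size_t i = 0; i < n; ++i) {
        tmp = in[i] * factor;
        out[i] = tmp;
    }
    return out;
}
```

Both functions should return the same vector, with or without -fopenmp; comparing the GIMPLE dumps of the two (for example with -fdump-tree-gimple) is one way to see how the private clause is lowered in each form.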
