https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123597

--- Comment #12 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Hi, it may be that the problem with combined constructs is more general.

Jakub's comment indicates that the bindings would work with a plain omp for,
but not with omp loop. Tobias has a bug where omp loop fails with a private
clause while omp for works.

In that bug, no templates are involved, but the private clause also forms some
kind of combined construct:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117247
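
To make the two spellings concrete, here is a minimal host-side sketch of the
same loop written once with omp for and once with omp loop, both with a private
clause. It is not the reproducer from either bug report, and it is not expected
to show the failure; the function names squares_omp_for and squares_omp_loop
are made up for illustration. Without -fopenmp the pragmas are ignored and both
versions run serially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: worksharing-loop spelling (omp for) with private(tmp).
std::vector<int> squares_omp_for(int n) {
    std::vector<int> out(static_cast<std::size_t>(n));
    int* p = out.data();
    int tmp = 0;
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; ++i) {
        tmp = i * i;   // each thread writes its own copy of tmp
        p[i] = tmp;    // distinct i per iteration, so no data race
    }
    return out;
}

// Hypothetical helper: the same loop spelled with the loop construct.
std::vector<int> squares_omp_loop(int n) {
    std::vector<int> out(static_cast<std::size_t>(n));
    int* p = out.data();
    int tmp = 0;
    #pragma omp parallel loop private(tmp)
    for (int i = 0; i < n; ++i) {
        tmp = i * i;
        p[i] = tmp;
    }
    return out;
}
```

Both functions should return identical vectors; if they differ, comparing the
GIMPLE dumps of the two variants would be the next step.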

Notably, Tobias mentions that the code would "work" with his AMD card, but he
initially reported the same correct results for the matrix multiplication,
whose defect is apparent in the GIMPLE dump, long before any backend
translation.

It may well be that AMD just knows this and places atomics internally, or that
AMD hardware sees that the test cases are small and evaluates the loops
serially.

Coincidentally, this new ICE on valid code here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123750

occurs for my math library, which uses a longer combined statement with shared
and is_device_ptr clauses on a teams distribute parallel for construct... and a
pointer of templated type that is a device pointer and implicitly thread
private...
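
For readers who have not seen such a statement, the following is a rough sketch
of the general shape: a function template whose device pointer of type T* is
passed via is_device_ptr to a combined target teams distribute parallel for,
together with a shared clause. It is not the code from my library and not the
reproducer attached to the bug; the name scaled_iota and the variable names are
invented for illustration, and I do not claim that this particular snippet
triggers the ICE. Without -fopenmp it falls back to plain malloc and a serial
loop.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical example: fill a device buffer with scale * i and copy it back.
template <typename T>
std::vector<T> scaled_iota(std::size_t n, T scale) {
    std::vector<T> out(n);
#ifdef _OPENMP
    const int dev  = omp_get_default_device();
    const int host = omp_get_initial_device();
    T* d = static_cast<T*>(omp_target_alloc(n * sizeof(T), dev));
#else
    T* d = static_cast<T*>(std::malloc(n * sizeof(T)));
#endif
    if (!d) return out;  // allocation failed (or n == 0): return zeros

    // Combined construct: d is a device pointer of templated type,
    // scale is listed in a shared clause.
    #pragma omp target teams distribute parallel for is_device_ptr(d) shared(scale)
    for (std::size_t i = 0; i < n; ++i)
        d[i] = scale * static_cast<T>(i);

#ifdef _OPENMP
    // Copy from the device buffer back to host memory, then release it.
    omp_target_memcpy(out.data(), d, n * sizeof(T), 0, 0, host, dev);
    omp_target_free(d, dev);
#else
    std::memcpy(out.data(), d, n * sizeof(T));
    std::free(d);
#endif
    return out;
}
```

Calling scaled_iota<double>(4, 2.0) should give 0, 2, 4, 6. Dropping the shared
clause, or reducing the pragma to a bare omp target, gives the variants
discussed in the following paragraphs.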

In that case, gcc-16 hits an ICE with a segfault on valid code that
compiles and runs with clang.

If I remove the shared clauses, I get the same cuCtxSynchronize problem that
Tobias reports in his bug, which goes away when I remove every clause that
would execute the code in parallel and run on the target with a single thread.

In that case, the GIMPLE dump looks normal to my eyes (I tried to evade the
scoping problem of this bug with brackets, but the ICE also occurs with
combined clauses and constructs).

Perhaps there is still a problem with the binding if clauses like shared and
is_device_ptr are used?


It may well be that all this, 

this matrix multiplication bug with collapse(2) and templated types,

Tobias's bug with a combined statement of shared and loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117247

and this ICE on valid code with a longer combined OpenMP clause and template
typed variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123750

are all rooted in a similar issue, namely that GCC has difficulty evaluating
combined OpenMP clauses correctly? And due to late binding of the type, this
may then be more easily provoked if one has variables of templated type?
