Hi Thomas,

On 13.11.23 11:59, Thomas Schwinge wrote:
    - 'gcc/cp/pt.cc:tsubst_omp_clauses',
    - 'gcc/gimplify.cc:gimplify_scan_omp_clauses',
      'gcc/gimplify.cc:gimplify_adjust_omp_clauses'
    - 'gcc/omp-low.cc:scan_sharing_clauses' (twice)
    - 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses',
      'gcc/tree-nested.cc:convert_local_omp_clauses'
    - 'gcc/tree-pretty-print.cc:dump_omp_clause'
[...]
Most of those seem to relate to executable directives
(That remark I don't understand.)

OpenMP classifies directives into categories. The following exists
(grep + count of TR12's json/directive/ files.)

      2 meta
      2 subsidiary
      2 utility
      4 informational
     11 declarative
     40 executable

where "declare target' + 'indirect' (like declare variant, declare simd) is 
declarative,
i.e.

"declare target directive - A declarative directive that ensures that procedures
and/or variables can be executed or accessed on a device."

For those, we usually add an attribute to the function declaration. And 
'executive' is
defined as

"executable directive - A directive that appears in an executable context and 
results in
implementation code and/or prescribes the manner in which associate user code must 
execute."

Those either are turned into a libgomp call or transform some code - possibly 
including
calling the library. That would be, e.g. 'omp target', 'omp parallel', 'omp 
atomic'.

The listed functions all deal with executable code, i.e. some tree ..._EXPR, 
possibly associated
with a structured block (at least the scan_* only apply for block-associated 
directives as they
scan of usage inside that block - affecting the directive/the directive's 
clauses).

* * *

We use noclone,noinline attributes for 'declare target', thus, there
should be no issue on this side and regarding tsubst_omp_clauses, as the
clause is either present or not (either bare or with a parse-time
constant logical), there is not much post processing needed.
That's not obvious to the casual reader of GCC source code, though.

True, but that's the general problem with code - without background, you
don't always understand the flow in the code and when something is called.

I think there is no good way how this can be solved; or rather, it can
be solved for a specific question but the next person looks for
something else and then has the same problem and the previous "solution"
(like comment) doesn't help.

In some cases, I think it helps to add comments, but I have the feeling
that your question is to specific (you look at a single patch) to be
really helpful here. But I am happy to be proven wrong and see useful
code changes/comments.

Thus, I bet that there is nothing to do for those.

Please verify, and add handling as well as test cases as necessary, or,
as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to
'default: gcc_unreachable ();' etc., if indeed that clause is not
expected there.
What's the point of having it next to default if it is gcc_unreachable?
Instead of "bet", I suggest to document intentions: so that it's clear
that 'OMP_CLAUSE_INDIRECT' is not meant to be seen here vs. an accidental
omission.
Done - in this email thread, which can be found by patch archeology.
I mean there are several others which shouldn't be needed like
OMP_CLAUSE_DEVICE_TYPE which also does not show up at gcc/cp/pt.cc.
Quite possible.  :-) I certainly wouldn't object to "handling" those,
too.

Generally, in my opinion, we should usually see 'case's listed for all
clause codes where we 'switch' on them, for example.

If there are plenty of 'default:', I am no sure it makes sense.

But in general, the question is (for a switch where most 'enum' values are used)
whether it would make more sense to remove the 'default: gcc_unreachable();'.
In that case, we do not handle the others - but the -Wswitch-enum will warn 
about it.
Thus, a bootstrap build will ensure that all values are covered due to 
-Werror=switch-enum.

Downside is that without -Werror/bootstrap, it will silently fall through but 
for
normal FE/ME code, it is guaranteed to be bootstrapped and will show up.

* * *

Also, for my understanding: why is 'build_indirect_map' done at kernel
invocation time (here) instead of at image load time?
The splay_tree is generated on the device itself - and we currently do
not start a kernel during GOMP_OFFLOAD_load_image. We could, the
question is whether it makes sense. (Generating the splay_tree on the
host for the device is a hassle and error prone as it needs to use
device pointers at the end.)
Hmm.  It seems conceptually cleaner to me to set this up upfront, and
avoids potentially slowing down every device kernel invocation (at least
another function call, and 'gomp_mutex_lock' check).  Though, I agree
this may be "in the noise" with regards to all the other stuff going on
in 'gomp_gcn_enter_kernel' and elsewhere...

I think the most common case is GOMP_INDIRECT_ADDR_MAP == NULL.

The question is whether the lock should/could be moved inside  if 
(!indirect_array)
or not. Probably yes:
* doing an atomic load for the outer '!indirect array', work on a local array 
for
the build up and only assign it at the end - and just after the lock check again
whether '!indirect array'.

That way, it is lock free once build but when build there is no race.

What I just realize, what's also unclear to me is how the current
implementation works with regards to several images getting loaded --
don't we then overwrite 'GOMP_INDIRECT_ADDR_MAP' instead of
(conceptually) appending to it?

Yes, I think that will happen - but it looks as if the same issue exists
also the other code? I think that's not the first variable that has that
issue?

I think we should try to cleanup that handling, also to support calling
a device function in a shared library from a target region in the main
program, which currently also fails.

All device routines that are in normal static libraries and in the
object files of the main program should simply work thanks to offload
LTO such that there is only a single GOMP_offload_register_ver call (per
device type) and GOMP_OFFLOAD_load_image call (per device).

Likewise if the offloading is only done via a single shared library. —
Any mixing will currently fail, unfortunately. This patch just adds
another item which does not handle it properly.

(Not good but IMHO also not a showstopper for this patch.)

In the general case, additional images may also get loaded during
execution.  We thus need proper locking of the shared data structure, uh?
Or, can we have separate on-device data structures per image?  (I've not
yet thought about that in detail.)

I think we could - but in the main-program 'omp target' case that calls
a shared-library 'declare target' function means that we need to handle
multiple GOMP_offload_register_ver / load_image calls such that they can
work together.

Obviously, it gets harder if the user keeps doing dlopen() / dlclose()
of libraries containing offload code where a target/compute region is
run before, between, and after those calls (but hopefully not running
when calling dlopen/dlclose).

Relatedly then, when images are unloaded, we also need to remove stale
items from the table, and release resources (for example, the
'GOMP_OFFLOAD_alloc' for 'map_target_addr').

True. I think the general assumption is that images only get unloaded at
the very end, which matches most but not all code. Yet another work item.

I think we should open a new PR about this topic and collect work items
there.

+++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
Another thing regarding this test case: testing
'-foffload-options=amdgcn-amdhsa=-march=gfx900' offloading on our
amdnano4 system, I see:
...
Re-running this manually a few times, I got:
     pass: 5
     fail (as above): 3
     hang: 1

Looks like something that needs to be investigated.

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Reply via email to