Hi!

On Tue, 14 Jul 2015 11:36:24 +0200, I wrote:
> On Tue, 14 Jul 2015 13:46:04 +0800, Chung-Lin Tang <clt...@codesourcery.com> 
> wrote:
> > this patch provides a 'bool independent' field in struct loop, which
> > will be switched on by an "independent" clause in a #pragma acc loop 
> > directive.
> 
> Thanks!
> 
> 
> This patch has been developed in context of OpenACC kernels constructs,
> but, is there anything still to be done regarding OpenACC parallel
> constructs?  That is, are we currently *using* the "independent yes/no"
> information appropriately for these?

Tom mentioned:

| openacc spec:
| ...
| 2.7.9 independent clause
| In a kernels construct, the independent clause tells the implementation 
| that the iterations of this loop are data-independent with respect to 
| each other. This allows the implementation to generate code to execute 
| the iterations in parallel with no synchronization.
| In a parallel construct, the independent clause is implied on all loop 
| directives without a seq clause.
| ...
|
| I think you're sort of asking if the seq clause has been implemented.
|
| openacc spec:
| ...
| 2.7.5 seq clause
| The seq clause specifies that the associated loop or loops are to be 
| executed sequentially by the accelerator. This clause will override any 
| automatic parallelization or vectorization.
| ...

Thanks, and right, I also realized that.  ;-)

Yet, my request is still to properly "document" this.  That is, if the
idea is that gcc/omp-low.c:scan_omp_for makes sure to emit a diagnostic
for the invalid combination of a gang/worker/vector clause together with
a seq clause, then in combination with the code newly added to
gcc/omp-low.c:find_omp_for_region_data to set region->independent = true
inside OpenACC parallel constructs, can't we then
assert(region->independent == true) in expand_omp_for_static_chunk and
expand_omp_for_static_nochunk?

So far it looks to me as if for OpenACC parallel constructs, we only have
a producer of region->independent = true, but no consumer.  Even if the
latter one is "implicit", we should "document" (using an assertion) this
in some way, for clarity.

While looking at that code, are we convinced that the diagnostic
machinery and subsequent handling will do the right thing in such cases:

    [...]
    #pragma acc loop [gang/worker/vector]
      for [...]
    #pragma acc loop seq
        for[...]

To me it looks (but I have not verified) that in such cases, the inner
region's ctx->gwv_this will have been initialized from the outer_ctx, so
to some combination of gang/worker/vector, and will that then be used to
parallelize the inner loop, which shouldn't be done, as I'm understanding
this?

It seems to be as if this gang/worker/vector/seq/independent clause
handling code that is currently distributed over
gcc/omp-low.c:scan_omp_for and gcc/omp-low.c:find_omp_for_region_data
(and, worse, also the front ends; see below) should be merged into one
place.

gcc/fortran/openmp.c:resolve_oacc_loop_blocks emits a diagnostic for seq
with independent; gcc/omp-low.c:scan_omp_for doesn't.  Should it?  Then,
why do we need this Fortran-specific checking code?  Should the
additional checking being done there get moved to OMP lowering?

In the C and C++ front ends, there's related checking code for those
clause attached to OpenACC routine constructs; not sure if that checking
should also be handled during OMP lowering, in one place?  I couldn't
find similar code in the Fortran front end (but didn't look very hard).

This is reminiscent of the discussion started in
<http://news.gmane.org/find-root.php?message_id=%3C87mw0zirnq.fsf%40schwinge.name%3E>
and following messages, about using
gcc/omp-low.c:check_omp_nesting_restrictions to do such checking (which
Cesar has not yet followed up on, as far as I know).

A few more points:

Does OMP_CLAUSE_INDEPENDENT need to be handled in
gcc/c-family/c-omp.c:c_oacc_split_loop_clauses?

While looking at that, ;-) again a few more things...  Are others clauses
missing to be handled there: tile, device_type (probably not; has all
been processed before?)?  Why is the firstprivate clause being duplicated
for loop_clauses/non_loop_clauses?  In OpenACC, there is no firstprivate
clause for loop constructs.  Why is the private clause being duplicated
for loop_clauses/non_loop_clauses?  My understanding is that the private
clause in a combined OpenACC parallel loop construct is to be handled as
if it were attached to the loop construct (... which is in contrast to
firstprivate -- yes OpenACC 2.0a is a bit assymetric in that regard).

What about the corresponding Fortran code,
gcc/fortran/trans-openmp.c:gfc_trans_oacc_combined_directive?

Do we have test cases to cover all this?


Grüße,
 Thomas

Attachment: pgpBi5kHnGJoP.pgp
Description: PGP signature

Reply via email to