Hi! On Tue, 14 Jul 2015 11:36:24 +0200, I wrote: > On Tue, 14 Jul 2015 13:46:04 +0800, Chung-Lin Tang <clt...@codesourcery.com> > wrote: > > this patch provides a 'bool independent' field in struct loop, which > > will be switched on by an "independent" clause in a #pragma acc loop > > directive. > > Thanks! > > > This patch has been developed in context of OpenACC kernels constructs, > but, is there anything still to be done regarding OpenACC parallel > constructs? That is, are we currently *using* the "independent yes/no" > information appropriately for these?
Tom mentioned: | openacc spec: | ... | 2.7.9 independent clause | In a kernels construct, the independent clause tells the implementation | that the iterations of this loop are data-independent with respect to | each other. This allows the implementation to generate code to execute | the iterations in parallel with no synchronization. | In a parallel construct, the independent clause is implied on all loop | directives without a seq clause. | ... | | I think you're sort of asking if the seq clause has been implemented. | | openacc spec: | ... | 2.7.5 seq clause | The seq clause specifies that the associated loop or loops are to be | executed sequentially by the accelerator. This clause will override any | automatic parallelization or vectorization. | ... Thanks, and right, I also realized that. ;-) Yet, my request is still to properly "document" this. That is, if the idea is that gcc/omp-low.c:scan_omp_for makes sure to emit a diagnostic for the invalid combination of a gang/worker/vector clause together with a seq clause, then in combination with the code newly added to gcc/omp-low.c:find_omp_for_region_data to set region->independent = true inside OpenACC parallel constructs, can't we then assert(region->independent == true) in expand_omp_for_static_chunk and expand_omp_for_static_nochunk? So far it looks to me as if for OpenACC parallel constructs, we only have a producer of region->independent = true, but no consumer. Even if the latter one is "implicit", we should "document" (using an assertion) this in some way, for clarity. While looking at that code, are we convinced that the diagnostic machinery and subsequent handling will do the right thing in such cases: [...] #pragma acc loop [gang/worker/vector] for [...] #pragma acc loop seq for[...] To me it looks (but I have not verified) that in such cases, the inner region's ctx->gwv_this will have been initialized from the outer_ctx, so to some combination of gang/worker/vector, and will that then be used to parallelize the inner loop, which shouldn't be done, as I'm understanding this? It seems to be as if this gang/worker/vector/seq/independent clause handling code that is currently distributed over gcc/omp-low.c:scan_omp_for and gcc/omp-low.c:find_omp_for_region_data (and, worse, also the front ends; see below) should be merged into one place. gcc/fortran/openmp.c:resolve_oacc_loop_blocks emits a diagnostic for seq with independent; gcc/omp-low.c:scan_omp_for doesn't. Should it? Then, why do we need this Fortran-specific checking code? Should the additional checking being done there get moved to OMP lowering? In the C and C++ front ends, there's related checking code for those clause attached to OpenACC routine constructs; not sure if that checking should also be handled during OMP lowering, in one place? I couldn't find similar code in the Fortran front end (but didn't look very hard). This is reminiscent of the discussion started in <http://news.gmane.org/find-root.php?message_id=%3C87mw0zirnq.fsf%40schwinge.name%3E> and following messages, about using gcc/omp-low.c:check_omp_nesting_restrictions to do such checking (which Cesar has not yet followed up on, as far as I know). A few more points: Does OMP_CLAUSE_INDEPENDENT need to be handled in gcc/c-family/c-omp.c:c_oacc_split_loop_clauses? While looking at that, ;-) again a few more things... Are others clauses missing to be handled there: tile, device_type (probably not; has all been processed before?)? Why is the firstprivate clause being duplicated for loop_clauses/non_loop_clauses? In OpenACC, there is no firstprivate clause for loop constructs. Why is the private clause being duplicated for loop_clauses/non_loop_clauses? My understanding is that the private clause in a combined OpenACC parallel loop construct is to be handled as if it were attached to the loop construct (... which is in contrast to firstprivate -- yes OpenACC 2.0a is a bit assymetric in that regard). What about the corresponding Fortran code, gcc/fortran/trans-openmp.c:gfc_trans_oacc_combined_directive? Do we have test cases to cover all this? Grüße, Thomas
pgpBi5kHnGJoP.pgp
Description: PGP signature