On Tue, Jul 14, 2015 at 05:35:28PM +0800, Chung-Lin Tang wrote:
> The wording of OpenACC independent is more simple:
> "... the independent clause tells the implementation that the iterations
> of this loop are data-independent with respect to each other."
> -- OpenACC spec 2.7.9
> 
> I would say this implies even more relaxed conditions than OpenMP simd
> safelen, essentially saying that the compiler doesn't even need dependence
> analysis; just assume independence of iterations.

safelen also tells the compiler that it doesn't need dependence analysis.
The difference is that only some transformations of the loop are ok without
dependence analysis; others still require it.  Classical vectorization
(instead of doing one iteration at a time, doing up to safelen consecutive
iterations together for the first statement in the loop, then for the second
statement, etc.) is ok without dependence analysis, but e.g. reversing the
loop and running the last iteration first and so on down to the first, or
running the iterations in random order, is not ok.
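
To make the distinction above concrete, here is a minimal sketch.  The
function names and the dependence distance of 4 are assumptions picked for
illustration, not anything from the patch under discussion.  The loop has a
loop-carried dependence, yet handling up to 4 consecutive iterations together
keeps the result, while reversing the iteration order changes it.

```c
/* a[i] reads a[i - 4], so there is a loop-carried dependence of distance 4.
   safelen(4) promises only that up to 4 consecutive iterations may be
   executed together; it does not promise that any order is fine.
   Without OpenMP support enabled the pragma is simply ignored.  */
void forward_safelen4(int *a, int n)
{
    #pragma omp simd safelen(4)
    for (int i = 4; i < n; i++)
        a[i] = a[i - 4] + 1;
}

/* Same loop body, iterations run from last to first.  This is one of the
   transformations that is NOT valid without dependence analysis: each
   a[i] now reads a[i - 4] before that element has been updated.  */
void reversed_order(int *a, int n)
{
    for (int i = n - 1; i >= 4; i--)
        a[i] = a[i - 4] + 1;
}
```

Starting from 12 zeros, forward_safelen4 gives 0 0 0 0 1 1 1 1 2 2 2 2,
whereas reversed_order gives 0 0 0 0 1 1 1 1 1 1 1 1.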

> > So if OpenACC independent means there are no dependencies in between
> > iterations, the OpenMP counterpart here is
> > #pragma omp for simd schedule (auto)
> > or #pragma omp distribute parallel for simd schedule (auto).
> 
> schedule(auto) appears to correspond to the OpenACC 'auto' clause, or
> what is implied in a kernels compute construct, but I'm not sure it implies
> no dependencies between iterations?

By schedule(auto) I meant that the user tells the compiler it can
parallelize the loop with whatever schedule it wants.  The other schedules
are quite well defined: if the team has a given number of threads, it is
known which thread gets which iteration, so the user could rely on a
particular parallelization even though the loop iterations are not 100%
independent.  With schedule(auto) you say it is up to the compiler to
schedule them, thus they really have to be all independent.
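
As a rough illustration of the correspondence argued for here, the sketch
below writes the same fully independent loop twice.  The function names, the
combined "parallel for simd" form (used so the example does not need an
enclosing parallel region) and the OpenACC data clauses are assumptions for
illustration only.  Without OpenMP or OpenACC support enabled the pragmas
are ignored and both functions run sequentially.

```c
/* OpenMP: schedule(auto) leaves the mapping of iterations to threads to
   the implementation, so the code must be correct for any mapping.
   That holds here because c[i] depends only on a[i] and b[i].  */
void add_omp_auto(const float *a, const float *b, float *c, int n)
{
    #pragma omp parallel for simd schedule(auto)
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* OpenACC: independent asserts that the iterations are data-independent
   with respect to each other.  */
void add_acc_independent(const float *a, const float *b, float *c, int n)
{
    #pragma acc kernels loop independent copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```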

> Putting aside the semantic issues, as of currently safelen>0 turns on a
> certain amount of vectorization code that we are not currently using (and
> not likely at all for nvptx).
> Right now, we're just trying to pass the new flag to a kernels
> tree-parloops based pass.

In any case, when setting your flag you should also set safelen = INT_MAX,
as the OpenACC independent implies that you can vectorize the loop with any
vectorization factor without performing dependence analysis on the loop.
OpenACC is (hopefully) not just about PTX, and most other targets will want
to vectorize such loops.
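
A toy model of that suggestion is sketched below.  The struct, its field
names and both helpers are hypothetical stand-ins and not GCC's actual data
structures; the sketch only shows why setting safelen to INT_MAX together
with the flag lets any target-chosen vectorization factor through without
dependence analysis.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical per-loop record, for illustration only.  */
struct loop_info {
    bool independent;  /* hypothetical flag for OpenACC independent */
    int safelen;       /* 0: no promise; INT_MAX: no upper limit */
};

/* Setting the flag also sets safelen, as suggested above.  */
void mark_independent(struct loop_info *loop)
{
    loop->independent = true;
    loop->safelen = INT_MAX;
}

/* Largest vectorization factor usable without dependence analysis:
   the target's preferred factor, capped by safelen.  */
int max_vf_without_analysis(const struct loop_info *loop, int target_vf)
{
    if (loop->safelen <= 1)
        return 1;
    return target_vf < loop->safelen ? target_vf : loop->safelen;
}
```

With safelen left at 0 the helper falls back to a factor of 1; after
mark_independent any target factor is accepted unchanged.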

        Jakub
