The way Intel presents #pragma simd (to users, to the OpenMP committee, to the C 
and C++ committees, etc.) is that it is not a hint; it has a meaning.
The meaning is defined in terms of evaluation order.
Both C and C++ define an evaluation order for sequential programs. #pragma simd 
relaxes the sequential order into a partial order:
0. subsequent iterations of the loop are chunked together and execute in 
lockstep
1. there is no change in the order of evaluation of expressions within an 
iteration
2. if X and Y are expressions in the loop, and X(i) is the evaluation of X in 
iteration i, then for X sequenced before Y and iteration i evaluated before 
iteration j, X(i) is sequenced before Y(j).
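To make the three rules concrete, here is a minimal sketch in C (an illustration of the partial order, not Intel's implementation). It compares the sequential order of a two-statement loop body with one order the partial order permits: iterations are grouped into chunks of a hypothetical width `VL`, and inside each chunk statement X runs for every iteration before statement Y does. The names `seq_order`, `chunked_order` and `VL` are illustrative only.

```c
#include <stddef.h>

#define VL 4 /* hypothetical chunk (vector) width, for illustration only */

/* Sequential order: X(i) then Y(i), one iteration at a time. */
void seq_order(const float *a, float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        b[i] = a[i] * 2.0f; /* X */
        c[i] = b[i] + 1.0f; /* Y */
    }
}

/* One order the partial order allows: iterations are grouped into chunks
   of VL, and inside a chunk X is evaluated for every iteration before Y.
   Rule 1 still holds (X(i) precedes Y(i)) and so does rule 2 (X(i)
   precedes Y(j) whenever iteration i precedes iteration j), but Y(i) no
   longer precedes X(i+1) when both fall in the same chunk. */
void chunked_order(const float *a, float *b, float *c, size_t n)
{
    for (size_t base = 0; base < n; base += VL) {
        size_t end = (base + VL < n) ? base + VL : n;
        for (size_t i = base; i < end; i++)
            b[i] = a[i] * 2.0f; /* X(base) .. X(end-1) */
        for (size_t i = base; i < end; i++)
            c[i] = b[i] + 1.0f; /* Y(base) .. Y(end-1) */
    }
}
```

Because no iteration reads what another iteration writes, both functions leave identical values in `b` and `c`. A loop in which Y(i) fed X(i+1) would not survive this reordering, which is exactly what the programmer rules out by writing the pragma.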

A corollary is that the sequential order is always allowed, since it satisfies 
the partial order.
However, the partial order allows the compiler to group copies of the same 
expression next to each other, and then to combine the scalar instructions into 
a vector instruction.
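As a sketch of that last step, assuming GCC or Clang (the `vector_size` attribute is a GCC extension that Clang also accepts), four adjacent copies of the scalar expression `a[i] + b[i]` can be written as a single vector addition. The function names are hypothetical.

```c
#include <string.h>

/* GCC/Clang extension: a vector of four floats, 16 bytes wide. */
typedef float v4sf __attribute__((vector_size(16)));

/* Four copies of the same scalar expression, grouped next to each other. */
void add4_scalar(const float *a, const float *b, float *c)
{
    c[0] = a[0] + b[0];
    c[1] = a[1] + b[1];
    c[2] = a[2] + b[2];
    c[3] = a[3] + b[3];
}

/* The same four additions combined into one vector add. memcpy is used
   for the loads and the store so unaligned pointers are handled safely. */
void add4_vector(const float *a, const float *b, float *c)
{
    v4sf va, vb, vc;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vc = va + vb; /* typically one instruction on targets with 128-bit SIMD */
    memcpy(c, &vc, sizeof vc);
}
```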
There are other corollaries, such as that if multiple loop iterations write 
into an object defined outside of the loop, the behavior has to be undefined: 
the vector moral equivalent of a data race. That is why induction variables 
and reductions are necessary exceptions to this rule and require explicit 
support.
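A hedged sketch of what that explicit support looks like, using the OpenMP 4.0 `omp simd` spelling rather than Intel's `#pragma simd`. Compiled without `-fopenmp` or `-fopenmp-simd`, the pragmas are ignored and the loops simply run sequentially with the same results. The function names are hypothetical.

```c
#include <stddef.h>

/* Reduction: without reduction(+:sum), every iteration in a chunk would
   write the outer variable sum, the vector analogue of a data race. With
   the clause, each lane accumulates privately and the partial sums are
   combined after the loop. */
int sum_reduction(const int *a, size_t n)
{
    int sum = 0;
#pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Induction variable: j advances by 2 per iteration. linear(j:2) tells the
   compiler that in iteration i, j equals its initial value plus 2*i, so
   each lane can compute its own j instead of updating a shared one. */
void spread_even(const int *src, int *dst, size_t n)
{
    size_t j = 0;
#pragma omp simd linear(j:2)
    for (size_t i = 0; i < n; i++) {
        dst[j] = src[i];
        j += 2;
    }
}
```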

As far as correctness is concerned, by this definition the programmer has 
expressed that the loop is correct, and the compiler should not try to prove 
correctness.
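A small illustration of the programmer, not the compiler, vouching for correctness. In this hypothetical loop the compiler cannot prove vectorization safe, because the offset `off` is only known at run time; the pragma is the programmer's assertion that it is. `scale_shifted` and `off` are illustrative names, and the precondition is stated as an assumption in the code.

```c
#include <stddef.h>

/* a[i] = 2 * a[i + off].
   Assumed precondition (checked by no one but the programmer): off >= 0
   and a holds at least n + off elements.
   With off >= 0 every element is read before it is overwritten, in the
   sequential order and in any chunked lockstep order, so the pragma's
   claim is true. With off < 0 the loop carries a true dependence
   (iteration i needs a value written by an earlier iteration, possibly in
   the same chunk), and writing the pragma would be a programmer error. */
void scale_shifted(float *a, ptrdiff_t off, ptrdiff_t n)
{
#pragma omp simd
    for (ptrdiff_t i = 0; i < n; i++)
        a[i] = 2.0f * a[i + off];
}
```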

On the performance heuristics side, the Intel compiler tries not to second-guess 
the user. There are users who work much harder than just adding a #pragma simd to 
unmodified sequential loops. Various changes may be necessary, and users who 
worked hard to get their loops into good shape are unhappy if the compiler does 
second-guess them.

Robert.

-----Original Message-----
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Renato 
Golin
Sent: Monday, February 17, 2014 7:14 AM
To: tpri...@computer.org
Cc: gcc
Subject: Re: Vectorizer Pragmas

On 17 February 2014 14:47, Tim Prince <n...@aol.com> wrote:
> I'm continuing discussions with former Intel colleagues.  If you are 
> asking for insight into how Intel priorities vary over time, I don't 
> expect much, unless the next beta compiler provides some inferences.  
> They have talked about implementing all of OpenMP 4.0 except 
> user-defined reductions this year.  That would imply more activity in that 
> area than on cilkplus,

I'm expecting this. Any proposal to support Cilk in LLVM would be purely 
temporary and not endorsed in any way.


> although some fixes have come in the latter.  On the other hand I had 
> an issue on omp simd reduction(max: ) closed with the decision "will 
> not be fixed."

We still haven't got pragmas for induction/reduction logic, so I'm not too 
worried about them.
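For reference, the `reduction(max: )` clause Tim mentions is standard OpenMP (min and max reductions for C and C++ arrived in OpenMP 3.1). A minimal sketch of its use on a `simd` loop follows; `max_reduction` is a hypothetical name, and without OpenMP flags the pragma is ignored and the loop runs sequentially.

```c
#include <limits.h>
#include <stddef.h>

/* Maximum of an array with an OpenMP max reduction. Conceptually each
   SIMD lane keeps a private running maximum, and the lanes are combined
   after the loop. Returns INT_MIN for an empty array. */
int max_reduction(const int *a, size_t n)
{
    int m = INT_MIN;
#pragma omp simd reduction(max:m)
    for (size_t i = 0; i < n; i++)
        m = a[i] > m ? a[i] : m;
    return m;
}
```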


> I have an icc problem report in on fixing omp simd safelen so it is 
> more like the standard and less like the obsolete pragma simd vectorlength.

Our width metadata is slightly different in that it means "try to use that 
length" rather than "it's safe to use that length"; this is why I'm holding off 
on using safelen for the moment.
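The distinction can be sketched with two loops. OpenMP's `safelen` is a correctness limit on how many iterations may run concurrently; Clang's `#pragma clang loop vectorize_width(N)` is, to our understanding, a width hint of the "try to use that length" kind, and the vectorizer still has to prove the loop legal. Function names are hypothetical; compilers that do not recognize a pragma ignore it, so both loops compute the sequential result.

```c
#include <stddef.h>

/* a[i + 4] depends on a[i]: a dependence distance of 4. safelen(4) permits
   vector lengths of at most 4, so iterations i and i + 4 never run in the
   same chunk and the sequential result is preserved. This is a safety
   limit, not a request. Assumes a holds at least n + 4 elements. */
void recur_safelen(float *a, const float *b, size_t n)
{
#pragma omp simd safelen(4)
    for (size_t i = 0; i < n; i++)
        a[i + 4] = a[i] + b[i];
}

/* Clang loop hint: "try to vectorize with width 4". It makes no safety
   claim; here the loop is trivially independent anyway. */
void add_width_hint(float *a, const float *b, size_t n)
{
#pragma clang loop vectorize_width(4)
    for (size_t i = 0; i < n; i++)
        a[i] += b[i];
}
```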


> Also, I have some problem reports active attempting to get 
> clarification of their omp target implementation.

Same here... RTFM is not enough in this case. ;)


> You may have noticed that omp parallel for simd in current Intel 
> compilers can be used for combined thread and simd parallelism, 
> including the case where the outer loop is parallelizable and 
> vectorizable but the inner one is not.

That's my fear of going with omp simd directly. I don't want to be throwing 
threads all over the place when all I really want is vector code.
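The two spellings being contrasted look like this. Compiled with `-fopenmp`, the second function creates a team of threads and vectorizes each thread's share; the first never creates threads. GCC (from 4.9) and Clang also accept `-fopenmp-simd`, which, as we understand it, honors only the `simd` parts of OpenMP directives and needs no threading runtime. Without any OpenMP flag both loops run sequentially. The function names are hypothetical.

```c
#include <stddef.h>

/* Vector code only: no threads are created. */
void saxpy_simd(float *y, const float *x, float alpha, size_t n)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Threads and vectors: with -fopenmp the iterations are split across a
   thread team, and each thread executes its share with SIMD instructions. */
void saxpy_parallel_simd(float *y, const float *x, float alpha, size_t n)
{
#pragma omp parallel for simd
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```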

For the time being, my proposal is to use the legacy pragmas: vector/novector, 
unroll/nounroll and simd vectorlength, which map nicely to the metadata we 
already have and don't incur OpenMP overhead. Later on, if OpenMP ends up 
with simple non-threaded pragmas, we should use those and deprecate the legacy 
ones.
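For readers unfamiliar with them, the legacy pragmas in question look roughly like this in Intel compiler syntax (spellings as we understand icc's; the function names are hypothetical). Compilers that do not recognize a pragma skip it, with GCC and Clang warning under `-Wunknown-pragmas`, so every loop still computes the same result.

```c
#include <stddef.h>

void double_vector_always(float *a, size_t n)
{
    /* Vectorize even if the cost model says it is not profitable. */
#pragma vector always
    for (size_t i = 0; i < n; i++)
        a[i] *= 2.0f;
}

void double_novector(float *a, size_t n)
{
    /* Keep this loop scalar. */
#pragma novector
    for (size_t i = 0; i < n; i++)
        a[i] *= 2.0f;
}

void double_unroll4(float *a, size_t n)
{
    /* Unroll by a factor of 4; #pragma nounroll would forbid unrolling. */
#pragma unroll(4)
    for (size_t i = 0; i < n; i++)
        a[i] *= 2.0f;
}

void double_vectorlength4(float *a, size_t n)
{
    /* Intel's pragma simd with a requested vector length of 4. */
#pragma simd vectorlength(4)
    for (size_t i = 0; i < n; i++)
        a[i] *= 2.0f;
}
```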

If GCC is trying to do the same thing regarding non-threaded vector code, I'd 
be glad to be involved in the discussion. Some LLVM folks think this should be 
an OpenMP discussion; I personally think it's pushing the boundaries a bit too 
much on an inherently threaded library extension.

cheers,
--renato
