On Wed, 16 Oct 2013, Tobias Burnus wrote:

> Frederic Riss wrote:
> > Just one question. You describe the pragma in the doco patch as:
> >
> > +This pragma tells the compiler that the immediately following @code{for}
> > +loop can be executed in any loop index order without affecting the result.
> > +The pragma aids optimization and in particular vectorization as the
> > +compiler can then assume a vectorization safelen of infinity.
> >
> > I'm not a specialist, but I was always told that the 'original'
> > meaning of ivdep (which I believe was introduced by Cray), was that
> > the compiler could assume that there are only forward dependencies in
> > the loop, but not that it can be executed in any order.
>
> The nice thing about #pragma ivdep is that there is no real standard. And
> the explanation of the different vendors is also not completely clear.
>
> Some overview about this is given in the following file on pages 13-14 for
> Cray Research PVP, MIPSPRO & Open64, Intel ICC, Multiflow
> http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/GREPS2007-Benoit.pdf
>
> That's summarized as:
> - vector: ignore lexical upward dependencies (Cray PVP, Intel ICC)
> - parallel: ignore loop-carried dependencies (MIPSPRO, Open64)
> - liberal: ignore loop-variant dependencies (Multiflow)
>
> The quotes for Cray and Intel are below.
>
> Cray:
> http://docs.cray.com/books/004-2179-001/html-004-2179-001/brtlrwh.html#EKZ5MRWH
> "The ivdep directive tells the compiler to ignore vector dependencies for
> the loop immediately following the directive. Conditions other than vector
> dependencies can inhibit vectorization. If these conditions are satisfactory,
> the loop vectorizes. This directive is useful for some loops that contain
> pointers and indirect addressing. The format of this directive is as follows:
> #pragma _CRI ivdep"

Which suggests we use #pragma GCC ivdep, so as not to collide with possibly
different semantics in existing programs that use variants of this
pragma?

> Intel:
> http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-B25ABCC2-BE6F-4599-AEDF-2434F4676E1B.htm
> "The ivdep pragma instructs the compiler to ignore assumed vector
> dependencies.
> To ensure correct code, the compiler treats an assumed dependence as a proven
> dependence, which prevents vectorization. This pragma overrides that
> decision.
> Use this pragma only when you know that the assumed loop dependencies are
> safe to ignore."

This suggests that _known_ dependences are still treated as dependences.
But what is known obviously depends on the implementation, which may
not know that a[i] and a[i+1] depend but merely assume it. Not a
standard-proof definition of the pragma ;)

That said, safelen even overrides known dependences (but with unknown
distance vector)! (That looks like a bug to me, or at least a QOI issue.)

> > The Intel docs give this example:
> ...
> > Given your description, this loop wouldn't be a candidate for ivdep,
> > as reversing the loop index order changes the semantics. I believe
> > that the way you interpret it (ie. setting vectorization safe length
> > to INT_MAX) is correct with respect to this other definition, though.
>
> Do you have a suggestion for a better wording? My idea was to interpret
> this part similar to OpenMP's simd with safelen=infinity. (Actually, I
> believe loop->safelen was added for OpenMPv4's and/or Cilk Plus's "simd".)
>
> OpenMPv4.0, http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf , states
> for this (excerpt from page 70):
> "A SIMD loop has logical iterations numbered 0, 1,...,N-1 where N is the
> number of loop iterations, and the logical numbering denotes the sequence
> in which the iterations would be executed if the associated loop(s) were
> executed with no SIMD instructions.
> If the safelen clause is used then no
> two iterations executed concurrently with SIMD instructions can have a
> greater distance in the logical iteration space than its value. The
> parameter of the safelen clause must be a constant positive integer
> expression. The number of iterations that are executed concurrently at
> any given time is implementation defined. Each concurrent iteration will
> be executed by a different SIMD lane. Each set of concurrent iterations
> is a SIMD chunk."

OTOH, if we are mapping ivdep to safelen why not simply allow
#pragma GCC safelen 4 ?

> > Oh, and are there any plans to maintain this information in some way
> > till the back-end? Software pipelining could be another huge winner
> > for that kind of dependency analysis simplification.
>
> I don't know until when loop->safelen is kept. As it is late in the
> middle-end, providing the backend with this information should be
> simple.

It's kept as long as we preserve loops, which at the moment is until after
RTL loop optimizations are finished. Extending this isn't hard, I
just didn't see a reason to do that.

Richard.