On Wed, 27 Nov 2013, Jakub Jelinek wrote: > On Wed, Nov 27, 2013 at 10:53:56AM +0100, Richard Biener wrote: > > Hmm. I'm still thinking that we should handle this during the regular > > transform step. > > I wonder if it can't be done instead just in vectorizable_load, > if LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) and the load is > invariant, just emit the (broadcasted) load not inside of the loop, but on > the loop preheader edge.
It is safe even for !LOOP_REQUIRES_VERSIONING_FOR_ALIAS. It's just a missed optimization I even noted when originally implementing support for invariant loads ... > > *** gcc/tree-vect-data-refs.c (revision 205435) > > --- gcc/tree-vect-data-refs.c (working copy) > > *************** again: > > *** 3668,3673 **** > > --- 3668,3682 ---- > > } > > STMT_VINFO_STRIDE_LOAD_P (stmt_info) = true; > > } > > + else if (loop_vinfo > > + && integer_zerop (DR_STEP (dr))) > > + { > > + /* All loads from a non-varying address will be disambiguated > > + by data-ref analysis or via a runtime alias check and thus > > + they will become invariant. Force them to be vectorized > > + as external. */ > > + STMT_VINFO_DEF_TYPE (stmt_info) = vect_external_def; > > + } > > I think this is unsafe for simd loops. > I'd say: > int a[1024], b[1024]; > > int foo (void) > { > int i; > #pragma omp simd safelen(8) > for (i = 0; i < 1024; i++) > { > a[i] = i; > b[i] = a[0]; > } > } > > is valid testcase, the loop behaves the same if you execute it > sequentially, or vectorize using SIMD (hardware or emulated) instructions > with vectorization factor of 2, 4 or 8, as long as you do all memory > operations (either using scalar insns or simd instructions) in the order > they were written, which I believe the vectorizer right now handles > correctly, but the hoisting this patch wants to perform is not fine, > unless data ref analysis would prove that it can't alias. For non-simd > loops we of course perform that data ref analysis and either version for > alias, or prove that the drs can't alias, but for simd loops we take as > given that the loop is safe to be vectorized. It is, but not for hoisting. Ick. I hate this behind-the-back stuff - so safelen doesn't mean that a[i] and a[0] do not alias. Note that this will break with SLP stuff at least as that will re-order reads/writes. Not sure how safelen applies to SLP though. That is a[i] = i; b[i] = a[0]; a[i+1] = i+1; b[i+1] = a[1]; will eventually end up re-ordering reads/writes in non-obvious ways. > So, the above say with emulated SIMD is safe to be executed as: > for (i = 0; i < 1024; i += 8) > { > int tmp; > for (tmp = i; tmp < i + 8; tmp++) > a[tmp] = tmp; > for (tmp = i; tmp < i + 8; tmp++) > b[tmp] = a[0]; > } > but not as: > int tempa[8], tmp; > /* Hoisted HW or emulated load + splat. */ > for (tmp = 0; tmp < 8; tmp++) > tempa[tmp] = a[0]; > for (i = 0; i < 1024; i += 8) > { > for (tmp = i; tmp < i + 8; tmp++) > a[tmp] = tmp; > for (tmp = i; tmp < i + 8; tmp++) > b[tmp] = tempa[tmp]; > } > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer