On Wed, May 6, 2015 at 7:02 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Mon, May 4, 2015 at 9:47 PM, Abderrazek Zaafrani
> <az.zaafr...@gmail.com> wrote:
>> This is an old thread and we are still running into similar issues:
>> Code is not being vectorized on 64-bit target due to scev not being
>> able to optimally analyze overflow condition.
>>
>> While the original test case shown here seems to work now, it does not
>> work if the start value is not a constant and the loop index variable
>> is of unsigned type: Ex
>>
>> void loop2( double const * __restrict__ x_in, double * __restrict__
>> x_out, double const * __restrict__ c, unsigned int N, unsigned int
>> start) {
>>   for(unsigned int i=start; i!=N; ++i)
>>     x_out[i] = c[i]*x_in[i];
>> }
>>
>> Here is our unit test:
>>
>> int foo(int* A, int* B, unsigned start, unsigned B)
>> {
>>   int s;
>>   for (unsigned k = start; k <start+B; k++)
>>     s += A[k] * B[k];
>>   return s;
>> }
>>
>> Our unit test case is extracted from a matrix multiply of a
>> two-dimensional array and all loops are blocked by hand by a factor of
>> B. Even though a bit modified, above loop corresponds to the innermost
>> loop of the blocked matrix multiply.
>>
>> We worked on patch to solve the problem (see attachment.)
>> The attached patch passed bootstrap and make check on x86_64-linux.
>> Ok for trunk?
>
> Apart from coding style / API issues the case you handle is very special
> (IVs with step 1 only?!) I believe it is also wrong - the assumption that
> if there is a symbolic or constant expression for the number of iterations
> a BIV will not wrap is not true. niter analysis can very well compute
> the number of iterations for a loop with wrapping IVs. For your unit test
> this only works because of the special-casing of step 1 IVs.

I happen to be looking into a similar issue right now.
scev_probably_wraps_p and thus chrec_convert_1 should be improved using
niter information.
Actually, all the information (including the wrap behavior) has already
been computed in tree-ssa-loop-niter.c. We just need to find a way to
use it.
>
> Technically it might be more interesting to compute wrapping of IVs
> during niter analysis in some more generic way (we have iv->no_overflow
> computed by simple_iv, but that is rather not useful here).

As for iv->no_overflow, it is computed in simple_iv as below:

  tmp = analyze_scalar_evolution (use_loop, ev);
  ev = resolve_mixers (use_loop, tmp);

  if (folded_casts && tmp != ev)
    *folded_casts = true;

It's inaccurate because calling resolve_mixers doesn't mean the
resulting scev will wrap. resolve_mixers could have just done exactly
the same transformation as instantiate_parameters. Also,
chrec_convert_aggressive is incomplete and needs to be revised too.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> Abderrazek Zaafrani