Richard, The patch proposed by you does not work properly for g++.dg/vect/pr70729-nest.cc test since the reference for this->S_n has been cached as dependent for outer loop and loop is not vectorized:
g++ -Ofast -fopenmp -mavx2 pr70729-nest.cc -c -fdump-tree-vect-details grep 'LOOP VECTORIZED' pr70729-nest.cc.149t.vect <not found> You missed additional check I added before check on cached dependence. 2016-08-09 13:00 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: > On Tue, Aug 9, 2016 at 11:20 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Yes it is impossible since all basic blocks are handled from outer >> loops to innermost so we can not have the sequence with wrong >> dependence, i.e. we cached that reference is independent (due to >> safelen) but the same reference in outer loop must be evaluated as >> dependent. So we must re-evaluate only dependent references in loops >> having non-zero safelen attribute. > > Hmm. I don't like depending on this implementation detail. Does the > attached patch work > which simply avoids any positive/negative caching on safelen affected > refs? It also makes > the query cheaper by avoiding the dive into child loops. > > Richard. > >> 2016-08-09 11:44 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>> On Fri, Aug 5, 2016 at 4:57 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Richard, >>>> >>>> I added additional check before caching dependencies since (1) all >>>> statements in loop are handled in loop postorder order, i.e. form >>>> outer to inner; (2) we can change dependency for reference in subloops >>>> which have non-zero safelen attribute. So I propose to re-evaluate it >>>> in such cases. I don't see why we need to avoid dependence caching for >>>> all loop nests since pragma omp simd is used very rarely. >>> >>> You think it is impossible to construct a testcase which hits the >>> correctness issue? >>> "very rarely" is not a good argument to generate wrong code. >>> >>> Richard. >>> >>>> 2016-08-05 16:50 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>> On Fri, Aug 5, 2016 at 3:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Richard, >>>>>> >>>>>> Here is updated patch which implements your proposal - I pass loop >>>>>> instead of stmt to determine either REF is defined inside LOOP nest or >>>>>> not. I checked that for pr70729-nest.cc the reference this->S_n for >>>>>> statements which are out of innermost loop are not considered as >>>>>> independent as you pointed out. >>>>>> >>>>>> Regression testing did not show any new failures and both failed tests >>>>>> from libgomp.fortran suite now passed. >>>>>> >>>>>> Is it OK for trunk? >>>>> >>>>> I don't quite understand >>>>> >>>>> + /* Ignore dependence for loops having greater safelen. */ >>>>> + if (new_safelen == safelen >>>>> + && bitmap_bit_p (&ref->dep_loop, LOOP_DEP_BIT (loop->num, >>>>> stored_p))) >>>>> return false; >>>>> >>>>> this seems to suggest (correctly I think) that we cannot rely on the >>>>> caching >>>>> for safelen, neither for optimal results (you seem to address that) but >>>>> also >>>>> not for correctness (we cache the no-dep result from a safelen run and >>>>> then happily re-use that info for a ref that is not safe for safelen). >>>>> >>>>> It seems to me we need to avoid any caching if we made things independent >>>>> because of safelen and simply not do the dep test afterwards. this means >>>>> inlining ref_indep_loop_p_1 partly into _2 (not sure if there's a great >>>>> way >>>>> to do this w/o confusing the control flow). >>>>> >>>>> Richard. >>>>> >>>>>> ChangeLog: >>>>>> 2016-08-05 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> >>>>>> PR tree-optimization/71734 >>>>>> * tree-ssa-loop-im.c (ref_indep_loop_p): Add new argument REF_LOOP. >>>>>> (outermost_indep_loop): Pass LOOP argumnet where REF was defined to >>>>>> ref_indep_loop_p. >>>>>> (ref_indep_loop_p_1): Fix commentary. >>>>>> (ref_indep_loop_p_2): Add additional argument REF_LOOP, introduce new >>>>>> variable NEW_SAFELEN which may have new value for SAFELEN, ignore >>>>>> dependencde for loop having greater safelen value, pass REF_LOOP in >>>>>> recursive call. >>>>>> (can_sm_ref_p): Pass LOOP as additional argument to ref_indep_loop_p. >>>>>> >>>>>> 2016-08-03 16:44 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>> On Fri, Jul 29, 2016 at 4:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> wrote: >>>>>>>> Hi Richard. >>>>>>>> >>>>>>>> It turned out that the fix proposed by you does not work for liggomp >>>>>>>> tests simd3 and simd4. >>>>>>>> The reason is that we can't change safelen value for references not >>>>>>>> defined inside loop. So I add missed check on it to patch. >>>>>>>> Is it OK for trunk? >>>>>>> >>>>>>> Hmm, I don't like the walk of all subloops in ref_defined_in_loop_p as >>>>>>> that operation can end up being quadratic in the loop depth/width. >>>>>>> >>>>>>> But I also wonder about correctness given that LIM "commons" >>>>>>> references. So we can have >>>>>>> >>>>>>> for (;;) >>>>>>> .. = ref; (1) >>>>>>> for (;;) // safelen == 2 (2) >>>>>>> ... = ref; >>>>>>> >>>>>>> and when looking at the ref at (1) which according to you should not >>>>>>> have safelen applied your function will happily return that ref is >>>>>>> defined >>>>>>> in the inner loop. >>>>>>> >>>>>>> So it looks like to be able to apply safelen the caller of >>>>>>> ref_indep_loop_p >>>>>>> needs to pass down a ref plus a location (a stmt). In which case your >>>>>>> function can simply use flow_loop_nested_p (loop, gimple_bb >>>>>>> (stmt)->loop_father); >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> ChangeLog: >>>>>>>> 2016-07-29 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> >>>>>>>> PR tree-optimization/71734 >>>>>>>> * tree-ssa-loop-im.c (ref_defined_in_loop_p): New function. >>>>>>>> (ref_indep_loop_p_2): Change SAFELEN value for REF defined inside LOOP. >>>>>>>> >>>>>>>> 2016-07-29 13:08 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>: >>>>>>>>> Sorry H.J. >>>>>>>>> >>>>>>>>> I checked both these tests manually but forgot to pass "-fopenmp" >>>>>>>>> option. >>>>>>>>> I will fix the issue asap. >>>>>>>>> >>>>>>>>> 2016-07-29 0:33 GMT+03:00 H.J. Lu <hjl.to...@gmail.com>: >>>>>>>>>> On Thu, Jul 28, 2016 at 6:49 AM, Yuri Rumyantsev >>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>> Richard, >>>>>>>>>>> >>>>>>>>>>> I prepare a patch which is based on yours. New test is also >>>>>>>>>>> included. >>>>>>>>>>> Bootstrapping and regression testing did not show any new failures. >>>>>>>>>>> Is it OK for trunk? >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> ChangeLog: >>>>>>>>>>> 2016-07-28 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>> >>>>>>>>>>> PR tree-optimization/71734 >>>>>>>>>>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Pass value of safelen >>>>>>>>>>> attribute instead of REF_LOOP and use it. >>>>>>>>>>> (ref_indep_loop_p_2): Use SAFELEN argument instead of REF_LOOP and >>>>>>>>>>> set it for Loops having non-zero safelen attribute. >>>>>>>>>>> (ref_indep_loop_p): Pass zero as initial value for safelen. >>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>> * g++.dg/vect/pr70729-nest.cc: New test. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Does this cause >>>>>>>>>> >>>>>>>>>> FAIL: libgomp.fortran/pr71734-1.f90 -O3 -fomit-frame-pointer >>>>>>>>>> -funroll-loops -fpeel-loops -ftracer -finline-functions execution >>>>>>>>>> test >>>>>>>>>> FAIL: libgomp.fortran/pr71734-1.f90 -O3 -g execution test >>>>>>>>>> FAIL: libgomp.fortran/pr71734-2.f90 -O3 -fomit-frame-pointer >>>>>>>>>> -funroll-loops -fpeel-loops -ftracer -finline-functions execution >>>>>>>>>> test >>>>>>>>>> FAIL: libgomp.fortran/pr71734-2.f90 -O3 -g execution test >>>>>>>>>> >>>>>>>>>> on AVX machines and >>>>>>>>>> >>>>>>>>>> FAIL: libgomp.fortran/simd3.f90 -O3 -fomit-frame-pointer >>>>>>>>>> -funroll-loops -fpeel-loops -ftracer -finline-functions execution >>>>>>>>>> test >>>>>>>>>> FAIL: libgomp.fortran/simd3.f90 -O3 -g execution test >>>>>>>>>> FAIL: libgomp.fortran/simd4.f90 -O3 -fomit-frame-pointer >>>>>>>>>> -funroll-loops -fpeel-loops -ftracer -finline-functions execution >>>>>>>>>> test >>>>>>>>>> FAIL: libgomp.fortran/simd4.f90 -O3 -g execution test >>>>>>>>>> >>>>>>>>>> on non-AVX machines? >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> H.J.