On Tue, Aug 9, 2016 at 2:05 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > I checked that this move helps. > Does it mean that I've got approval to integrate it to trunk?
Yes, if it survives bootstrap & regtest. Richard. > 2016-08-09 14:33 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Tue, Aug 9, 2016 at 1:26 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> The patch proposed by you does not work properly for >>> g++.dg/vect/pr70729-nest.cc test since the reference for this->S_n has >>> been cached as dependent for outer loop and loop is not vectorized: >>> >>> g++ -Ofast -fopenmp -mavx2 pr70729-nest.cc -c >>> -fdump-tree-vect-details >>> grep 'LOOP VECTORIZED' pr70729-nest.cc.149t.vect >>> <not found> >>> >>> You missed additional check I added before check on cached dependence. >> >> Ok, but it should get the correctness right? >> >> I suppose that if you move the cache checks inside the else clause it >> would work? >> I'd be ok with that change. >> >> Richard. >> >>> 2016-08-09 13:00 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Tue, Aug 9, 2016 at 11:20 AM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Yes it is impossible since all basic blocks are handled from outer >>>>> loops to innermost so we can not have the sequence with wrong >>>>> dependence, i.e. we cached that reference is independent (due to >>>>> safelen) but the same reference in outer loop must be evaluated as >>>>> dependent. So we must re-evaluate only dependent references in loops >>>>> having non-zero safelen attribute. >>>> >>>> Hmm. I don't like depending on this implementation detail. Does the >>>> attached patch work >>>> which simply avoids any positive/negative caching on safelen affected >>>> refs? It also makes >>>> the query cheaper by avoiding the dive into child loops. >>>> >>>> Richard. >>>> >>>>> 2016-08-09 11:44 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Fri, Aug 5, 2016 at 4:57 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> I added additional check before caching dependencies since (1) all >>>>>>> statements in loop are handled in loop postorder order, i.e. form >>>>>>> outer to inner; (2) we can change dependency for reference in subloops >>>>>>> which have non-zero safelen attribute. So I propose to re-evaluate it >>>>>>> in such cases. I don't see why we need to avoid dependence caching for >>>>>>> all loop nests since pragma omp simd is used very rarely. >>>>>> >>>>>> You think it is impossible to construct a testcase which hits the >>>>>> correctness issue? >>>>>> "very rarely" is not a good argument to generate wrong code. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> 2016-08-05 16:50 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Fri, Aug 5, 2016 at 3:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> Here is updated patch which implements your proposal - I pass loop >>>>>>>>> instead of stmt to determine either REF is defined inside LOOP nest or >>>>>>>>> not. I checked that for pr70729-nest.cc the reference this->S_n for >>>>>>>>> statements which are out of innermost loop are not considered as >>>>>>>>> independent as you pointed out. >>>>>>>>> >>>>>>>>> Regression testing did not show any new failures and both failed tests >>>>>>>>> from libgomp.fortran suite now passed. >>>>>>>>> >>>>>>>>> Is it OK for trunk? >>>>>>>> >>>>>>>> I don't quite understand >>>>>>>> >>>>>>>> + /* Ignore dependence for loops having greater safelen. */ >>>>>>>> + if (new_safelen == safelen >>>>>>>> + && bitmap_bit_p (&ref->dep_loop, LOOP_DEP_BIT (loop->num, >>>>>>>> stored_p))) >>>>>>>> return false; >>>>>>>> >>>>>>>> this seems to suggest (correctly I think) that we cannot rely on the >>>>>>>> caching >>>>>>>> for safelen, neither for optimal results (you seem to address that) >>>>>>>> but also >>>>>>>> not for correctness (we cache the no-dep result from a safelen run and >>>>>>>> then happily re-use that info for a ref that is not safe for safelen). >>>>>>>> >>>>>>>> It seems to me we need to avoid any caching if we made things >>>>>>>> independent >>>>>>>> because of safelen and simply not do the dep test afterwards. this >>>>>>>> means >>>>>>>> inlining ref_indep_loop_p_1 partly into _2 (not sure if there's a >>>>>>>> great way >>>>>>>> to do this w/o confusing the control flow). >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>>> ChangeLog: >>>>>>>>> 2016-08-05 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> >>>>>>>>> PR tree-optimization/71734 >>>>>>>>> * tree-ssa-loop-im.c (ref_indep_loop_p): Add new argument REF_LOOP. >>>>>>>>> (outermost_indep_loop): Pass LOOP argumnet where REF was defined to >>>>>>>>> ref_indep_loop_p. >>>>>>>>> (ref_indep_loop_p_1): Fix commentary. >>>>>>>>> (ref_indep_loop_p_2): Add additional argument REF_LOOP, introduce new >>>>>>>>> variable NEW_SAFELEN which may have new value for SAFELEN, ignore >>>>>>>>> dependencde for loop having greater safelen value, pass REF_LOOP in >>>>>>>>> recursive call. >>>>>>>>> (can_sm_ref_p): Pass LOOP as additional argument to ref_indep_loop_p. >>>>>>>>> >>>>>>>>> 2016-08-03 16:44 GMT+03:00 Richard Biener >>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>> On Fri, Jul 29, 2016 at 4:00 PM, Yuri Rumyantsev >>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>> Hi Richard. >>>>>>>>>>> >>>>>>>>>>> It turned out that the fix proposed by you does not work for liggomp >>>>>>>>>>> tests simd3 and simd4. >>>>>>>>>>> The reason is that we can't change safelen value for references not >>>>>>>>>>> defined inside loop. So I add missed check on it to patch. >>>>>>>>>>> Is it OK for trunk? >>>>>>>>>> >>>>>>>>>> Hmm, I don't like the walk of all subloops in ref_defined_in_loop_p >>>>>>>>>> as >>>>>>>>>> that operation can end up being quadratic in the loop depth/width. >>>>>>>>>> >>>>>>>>>> But I also wonder about correctness given that LIM "commons" >>>>>>>>>> references. So we can have >>>>>>>>>> >>>>>>>>>> for (;;) >>>>>>>>>> .. = ref; (1) >>>>>>>>>> for (;;) // safelen == 2 (2) >>>>>>>>>> ... = ref; >>>>>>>>>> >>>>>>>>>> and when looking at the ref at (1) which according to you should not >>>>>>>>>> have safelen applied your function will happily return that ref is >>>>>>>>>> defined >>>>>>>>>> in the inner loop. >>>>>>>>>> >>>>>>>>>> So it looks like to be able to apply safelen the caller of >>>>>>>>>> ref_indep_loop_p >>>>>>>>>> needs to pass down a ref plus a location (a stmt). In which case >>>>>>>>>> your >>>>>>>>>> function can simply use flow_loop_nested_p (loop, gimple_bb >>>>>>>>>> (stmt)->loop_father); >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> ChangeLog: >>>>>>>>>>> 2016-07-29 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>> >>>>>>>>>>> PR tree-optimization/71734 >>>>>>>>>>> * tree-ssa-loop-im.c (ref_defined_in_loop_p): New function. >>>>>>>>>>> (ref_indep_loop_p_2): Change SAFELEN value for REF defined inside >>>>>>>>>>> LOOP. >>>>>>>>>>> >>>>>>>>>>> 2016-07-29 13:08 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>: >>>>>>>>>>>> Sorry H.J. >>>>>>>>>>>> >>>>>>>>>>>> I checked both these tests manually but forgot to pass "-fopenmp" >>>>>>>>>>>> option. >>>>>>>>>>>> I will fix the issue asap. >>>>>>>>>>>> >>>>>>>>>>>> 2016-07-29 0:33 GMT+03:00 H.J. Lu <hjl.to...@gmail.com>: >>>>>>>>>>>>> On Thu, Jul 28, 2016 at 6:49 AM, Yuri Rumyantsev >>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>> Richard, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I prepare a patch which is based on yours. New test is also >>>>>>>>>>>>>> included. >>>>>>>>>>>>>> Bootstrapping and regression testing did not show any new >>>>>>>>>>>>>> failures. >>>>>>>>>>>>>> Is it OK for trunk? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>> 2016-07-28 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>> >>>>>>>>>>>>>> PR tree-optimization/71734 >>>>>>>>>>>>>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Pass value of safelen >>>>>>>>>>>>>> attribute instead of REF_LOOP and use it. >>>>>>>>>>>>>> (ref_indep_loop_p_2): Use SAFELEN argument instead of REF_LOOP >>>>>>>>>>>>>> and >>>>>>>>>>>>>> set it for Loops having non-zero safelen attribute. >>>>>>>>>>>>>> (ref_indep_loop_p): Pass zero as initial value for safelen. >>>>>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>>>>> * g++.dg/vect/pr70729-nest.cc: New test. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Does this cause >>>>>>>>>>>>> >>>>>>>>>>>>> FAIL: libgomp.fortran/pr71734-1.f90 -O3 -fomit-frame-pointer >>>>>>>>>>>>> -funroll-loops -fpeel-loops -ftracer -finline-functions execution >>>>>>>>>>>>> test >>>>>>>>>>>>> FAIL: libgomp.fortran/pr71734-1.f90 -O3 -g execution test >>>>>>>>>>>>> FAIL: libgomp.fortran/pr71734-2.f90 -O3 -fomit-frame-pointer >>>>>>>>>>>>> -funroll-loops -fpeel-loops -ftracer -finline-functions execution >>>>>>>>>>>>> test >>>>>>>>>>>>> FAIL: libgomp.fortran/pr71734-2.f90 -O3 -g execution test >>>>>>>>>>>>> >>>>>>>>>>>>> on AVX machines and >>>>>>>>>>>>> >>>>>>>>>>>>> FAIL: libgomp.fortran/simd3.f90 -O3 -fomit-frame-pointer >>>>>>>>>>>>> -funroll-loops -fpeel-loops -ftracer -finline-functions execution >>>>>>>>>>>>> test >>>>>>>>>>>>> FAIL: libgomp.fortran/simd3.f90 -O3 -g execution test >>>>>>>>>>>>> FAIL: libgomp.fortran/simd4.f90 -O3 -fomit-frame-pointer >>>>>>>>>>>>> -funroll-loops -fpeel-loops -ftracer -finline-functions execution >>>>>>>>>>>>> test >>>>>>>>>>>>> FAIL: libgomp.fortran/simd4.f90 -O3 -g execution test >>>>>>>>>>>>> >>>>>>>>>>>>> on non-AVX machines? >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> H.J.