On Tue, Sep 22, 2020 at 10:30 AM HAO CHEN GUI <guih...@linux.ibm.com> wrote:
>
> Bin,
>
> I just tested your patch on current trunk. Here is my summary.
>
> 1. About some IVs not being moved out of the inner loop (Lijia mentioned
> this in his last email):
>
> <bb 13> [local count: 955630226]:
> # l_32 = PHI <1(12), l_54(21)>
> # ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
> _26 = (integer(kind=8)) l_32;
> _27 = _25 + _26;
> y__I_lsm.119_136 = (*y_135(D))[_27];
> y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
> _37 = _36 * stride.88_111;
> _38 = _35 + _37;
> _39 = _26 + _38;
> _40 = (*a_137(D))[_39];
>
> The offset _39 is not loop-invariant, as it depends on _26. But _38 and
> _37 should be loop-invariant, so Lijia thought they should be moved
> out of the loop.
>
> I checked the subsequent passes and found that these statements are
> eliminated after vectorization and DCE.
>
> In the vect dump:
>
> simple.F:27:23: note: ------>vectorizing statement: _37 = _36 * stride.88_111;
> simple.F:27:23: note: ------>vectorizing statement: _38 = _35 + _37;
> simple.F:27:23: note: ------>vectorizing statement: _39 = _26 + _38;
> simple.F:27:23: note: ------>vectorizing statement: _40 = (*a_137(D))[_39];
> simple.F:27:23: note: transform statement.
> simple.F:27:23: note: transform load. ncopies = 1
> simple.F:27:23: note: create vector_type-pointer variable to type: vector(2) real(kind=8) vectorizing an array ref: (*a_137(D))
> simple.F:27:23: note: created vectp_a.131_383
> simple.F:27:23: note: add new stmt: vect__40.132_374 = MEM <vector(2) real(kind=8)> [(real(kind=8) *)vectp_a.130_376];
>
> In the dce dump:
>
> Deleting : _39 = _26 + _38;
>
> Deleting : _38 = _35 + _37;
>
> Deleting : _37 = _36 * stride.88_111;
>
> So it's reasonable to only consider data references after loop
> interchange. Other statements may be eliminated, or moved out of the
> loop by the last lim pass if they're really expensive.
>
> 2. I tested SPEC on powerpc64le-linux-gnu.
> 503.bwaves_r got a 6.77% performance improvement with this patch. It has
> no impact on the other benchmarks.
>
> 3. The patch passed bootstrap and regression testing on
> powerpc64le-linux-gnu.
>
> I think the patch works fine. Could you please add it to trunk? Thanks
> a lot.

Hmm, IIRC the patch was intended to show what the missing transform is,
and I think it has latent bugs which I haven't had time to fix. As
Richard mentioned, could you please explore this with the existing LIM
facility, rather than introducing new code that implements existing
transforms?
Thanks,
bin

>
>
> On 8/9/2020 6:18 PM, Bin.Cheng wrote:
> > On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI <guih...@linux.ibm.com> wrote:
> >> Hi,
> >>
> >> I want to follow up on Lijia's work, as I gained a performance benefit
> >> on some SPEC workloads by adding an im pass after loop interchange.
> >> Could you send me the latest patches? I could do further testing.
> >> Thanks a lot.
> > Hi,
> > Hmm, not sure if this refers to me? I only provided an example patch
> > (which isn't complete) before Lijia's. Unfortunately I don't have any
> > later patch for this either.
> > As Richard suggested, maybe you (if you work on this) can simplify the
> > implementation. Anyway, we only need to hoist memory references here.
> >
> > Thanks,
> > bin
> >> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
> >>
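To make the address arithmetic in item 1 concrete outside the compiler, here is a minimal C sketch. It is not code from the patch or from simple.F: the function names, the outer index `j`, and the `base`/`stride` parameters are hypothetical, chosen only to mirror the shape of the quoted GIMPLE, where the offset `_39 = _26 + (_35 + _36 * stride)` has an `l`-independent part (`_37`, `_38`) and an `l`-dependent part (`_26`). The second function shows by hand what hoisting the invariant part out of the inner loop looks like; in the quoted dumps the vectorizer and DCE removed those statements instead.

```c
/* Hypothetical kernel shaped like the quoted GIMPLE: element
   a[base + j * stride + l] is read, and only l changes in the inner
   loop.  Names and parameters are illustrative assumptions. */
double sum_naive(const double *a, long n_outer, long n_inner,
                 long base, long stride)
{
    double s = 0.0;
    for (long j = 0; j < n_outer; j++) {
        for (long l = 0; l < n_inner; l++) {
            long t37 = j * stride;  /* like _37: invariant in l */
            long t38 = base + t37;  /* like _38: invariant in l */
            long t39 = l + t38;     /* like _39: depends on l   */
            s += a[t39];
        }
    }
    return s;
}

/* The same loop nest after hand-applied loop-invariant motion: the
   l-independent part of the address is computed once per j iteration. */
double sum_hoisted(const double *a, long n_outer, long n_inner,
                   long base, long stride)
{
    double s = 0.0;
    for (long j = 0; j < n_outer; j++) {
        const double *row = a + base + j * stride; /* hoisted out of l loop */
        for (long l = 0; l < n_inner; l++)
            s += row[l];
    }
    return s;
}
```

Both functions perform the same additions in the same order, so they return identical results; only the per-iteration address arithmetic in the inner loop differs.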