On Tue, Sep 22, 2020 at 10:30 AM HAO CHEN GUI <guih...@linux.ibm.com> wrote:
>
> Bin,
>
> I just tested your patch on current trunk.  Here is my summary.
>
> 1. About some iv aren't moved out of inner loop (Lijia mentioned in his
> last email)
>
>   <bb 13> [local count: 955630226]:
>    # l_32 = PHI <1(12), l_54(21)>
>    # ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
>    _26 = (integer(kind=8)) l_32;
>    _27 = _25 + _26;
>    y__I_lsm.119_136 = (*y_135(D))[_27];
>    y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
>    _37 = _36 * stride.88_111;
>    _38 = _35 + _37;
>    _39 = _26 + _38;
>    _40 = (*a_137(D))[_39];
>
> The offset _39 is not loop independent as it relies on _26. But _38 and
> _37 should be loop independent. So Lijia thought they should be moved
> out of loop.
>
> I checked the following pass and found that these  statements are
> eliminated after vertorizing and dce.
>
> In vect dump,
>
> simple.F:27:23: note:  ------>vectorizing statement: _37 = _36 *
> stride.88_111;
> simple.F:27:23: note:  ------>vectorizing statement: _38 = _35 + _37;
> simple.F:27:23: note:  ------>vectorizing statement: _39 = _26 + _38;
> simple.F:27:23: note:  ------>vectorizing statement: _40 = (*a_137(D))[_39];
> simple.F:27:23: note:  transform statement.
> simple.F:27:23: note:  transform load. ncopies = 1
> simple.F:27:23: note:  create vector_type-pointer variable to type:
> vector(2) real(kind=8)  vectorizing an array ref: (*a_137(D))
> simple.F:27:23: note:  created vectp_a.131_383
> simple.F:27:23: note:  add new stmt: vect__40.132_374 = MEM <vector(2)
> real(kind=8)> [(real(kind=8) *)vectp_a.130_376];
>
> In dce dump,
>
> Deleting : _39 = _26 + _38;
>
> Deleting : _38 = _35 + _37;
>
> Deleting : _37 = _36 * stride.88_111;
>
> So it's reasonable to only consider data reference after loop
> interchange. Other statements may be eliminated or be moved out of loop
> in last lim pass if they're real expensive.
>
> 2. I tested the SPEC on powerpc64le-linux-gnu. 503.bwaves_r got 6.77%
> performance improvement with this patch. It has no impact on other
> benchmarks.
>
> 3. The patch passed bootstrapped and regression test on
> powerpc64le-linux-gnu.
>
> I think the patch works fine. Could you please add it into trunk? Thanks
> a lot.
Hmm, IIRC the patch was intended to show what the missing transform
is, and I think it has latent bugs which I haven't got time to refine.
As Richard mentioned, could you please explore this with the existing
LIM facility, rather than introducing new code implementing existing
transforms?

Thanks,
bin
>
>
> On 8/9/2020 下午 6:18, Bin.Cheng wrote:
> > On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI <guih...@linux.ibm.com> wrote:
> >> Hi,
> >>
> >> I want to follow Lijia's work as I gained the performance benefit on
> >> some SPEC workloads by adding a im pass after loop interchange.  Could
> >> you send me the latest patches? I could do further testing. Thanks a lot.
> > Hi,
> > Hmm, not sure if this refers to me?  I only provided an example patch
> > (which isn't complete) before Lijia's.  Unfortunately I don't have any
> > latest patch about this either.
> > As Richard suggested, maybe you (if you work on this) can simplify the
> > implementation.  Anyway, we only need to hoist memory references here.
> >
> > Thanks,
> > bin
> >> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
> >>

Reply via email to