On Tue, Sep 22, 2020 at 4:31 AM HAO CHEN GUI via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Bin,
>
> I just tested your patch on current trunk.  Here is my summary.
>
> 1. About some iv aren't moved out of inner loop (Lijia mentioned in his
> last email)
>
>   <bb 13> [local count: 955630226]:
>    # l_32 = PHI <1(12), l_54(21)>
>    # ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
>    _26 = (integer(kind=8)) l_32;
>    _27 = _25 + _26;
>    y__I_lsm.119_136 = (*y_135(D))[_27];
>    y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
>    _37 = _36 * stride.88_111;
>    _38 = _35 + _37;
>    _39 = _26 + _38;
>    _40 = (*a_137(D))[_39];
>
> The offset _39 is not loop independent as it relies on _26. But _38 and
> _37 should be loop independent. So Lijia thought they should be moved
> out of loop.
>
> I checked the following pass and found that these  statements are
> eliminated after vertorizing and dce.
>
> In vect dump,
>
> simple.F:27:23: note:  ------>vectorizing statement: _37 = _36 *
> stride.88_111;
> simple.F:27:23: note:  ------>vectorizing statement: _38 = _35 + _37;
> simple.F:27:23: note:  ------>vectorizing statement: _39 = _26 + _38;
> simple.F:27:23: note:  ------>vectorizing statement: _40 = (*a_137(D))[_39];
> simple.F:27:23: note:  transform statement.
> simple.F:27:23: note:  transform load. ncopies = 1
> simple.F:27:23: note:  create vector_type-pointer variable to type:
> vector(2) real(kind=8)  vectorizing an array ref: (*a_137(D))
> simple.F:27:23: note:  created vectp_a.131_383
> simple.F:27:23: note:  add new stmt: vect__40.132_374 = MEM <vector(2)
> real(kind=8)> [(real(kind=8) *)vectp_a.130_376];
>
> In dce dump,
>
> Deleting : _39 = _26 + _38;
>
> Deleting : _38 = _35 + _37;
>
> Deleting : _37 = _36 * stride.88_111;
>
> So it's reasonable to only consider data reference after loop
> interchange. Other statements may be eliminated or be moved out of loop
> in last lim pass if they're real expensive.
>
> 2. I tested the SPEC on powerpc64le-linux-gnu. 503.bwaves_r got 6.77%
> performance improvement with this patch. It has no impact on other
> benchmarks.
>
> 3. The patch passed bootstrapped and regression test on
> powerpc64le-linux-gnu.
>
> I think the patch works fine. Could you please add it into trunk? Thanks
> a lot.

I'd like to see us instead apply the existing LIM code (w/o store-motion)
to the loop nests we applied interchange to instead of adding yet another
invariant motion code.  It should be reasonably "easy" to make the code
have the complexity of the region it operates on (again, without store-motion
which would introduce some complexities).

Btw, with an extra LIM pass scheduled after interchange I observed ~20%
improvement on bwaves for x86_64, mainly due to better induction variable
selection IIRC.

Richard.

>
> On 8/9/2020 下午 6:18, Bin.Cheng wrote:
> > On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI <guih...@linux.ibm.com> wrote:
> >> Hi,
> >>
> >> I want to follow Lijia's work as I gained the performance benefit on
> >> some SPEC workloads by adding a im pass after loop interchange.  Could
> >> you send me the latest patches? I could do further testing. Thanks a lot.
> > Hi,
> > Hmm, not sure if this refers to me?  I only provided an example patch
> > (which isn't complete) before Lijia's.  Unfortunately I don't have any
> > latest patch about this either.
> > As Richard suggested, maybe you (if you work on this) can simplify the
> > implementation.  Anyway, we only need to hoist memory references here.
> >
> > Thanks,
> > bin
> >> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
> >>

Reply via email to