Bin,

I just tested your patch on current trunk.  Here is my summary.

1. About some iv computations not being moved out of the inner loop (Lijia mentioned this in his last email)

 <bb 13> [local count: 955630226]:
  # l_32 = PHI <1(12), l_54(21)>
  # ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
  _26 = (integer(kind=8)) l_32;
  _27 = _25 + _26;
  y__I_lsm.119_136 = (*y_135(D))[_27];
  y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
  _37 = _36 * stride.88_111;
  _38 = _35 + _37;
  _39 = _26 + _38;
  _40 = (*a_137(D))[_39];

The offset _39 is not loop invariant, as it depends on _26. But _38 and _37 should be loop invariant, so Lijia thought they should be moved out of the loop.
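To make the point concrete, here is a minimal C sketch of the same address computation. It only mirrors the shape of the GIMPLE above; the function and variable names are hypothetical and this is not the bwaves source. The temporaries t37 and t38 play the roles of _37 and _38 (no dependence on the inner index), while t39 plays the role of _39.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; this only mirrors the shape of the GIMPLE dump. */

/* Naive form: the whole offset is recomputed on every iteration. */
double sum_naive(const double *a, long base, long j, long stride, long n)
{
    double sum = 0.0;
    for (long l = 1; l <= n; l++) {
        long t37 = j * stride;   /* like _37: does not depend on l */
        long t38 = base + t37;   /* like _38: does not depend on l */
        long t39 = l + t38;      /* like _39: depends on l */
        sum += a[t39];
    }
    return sum;
}

/* Hoisted form: only the l-dependent add stays inside the loop. */
double sum_hoisted(const double *a, long base, long j, long stride, long n)
{
    const long t38 = base + j * stride;  /* invariant part, computed once */
    double sum = 0.0;
    for (long l = 1; l <= n; l++)
        sum += a[l + t38];
    return sum;
}

/* Small self-check: both forms read the same elements in the same order. */
void self_check(void)
{
    double a[32];
    for (size_t i = 0; i < 32; i++)
        a[i] = (double)i * 0.5;
    assert(sum_naive(a, 2, 3, 4, 8) == sum_hoisted(a, 2, 3, 4, 8));
}
```

Both functions access a[15] through a[22] for the arguments used in self_check, so the results are identical; the only difference is where the invariant part of the offset is computed.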

I checked the following passes and found that these statements are eliminated after vectorization and dce.

In vect dump,

simple.F:27:23: note:  ------>vectorizing statement: _37 = _36 * stride.88_111;
simple.F:27:23: note:  ------>vectorizing statement: _38 = _35 + _37;
simple.F:27:23: note:  ------>vectorizing statement: _39 = _26 + _38;
simple.F:27:23: note:  ------>vectorizing statement: _40 = (*a_137(D))[_39];
simple.F:27:23: note:  transform statement.
simple.F:27:23: note:  transform load. ncopies = 1
simple.F:27:23: note:  create vector_type-pointer variable to type: vector(2) real(kind=8)  vectorizing an array ref: (*a_137(D))
simple.F:27:23: note:  created vectp_a.131_383
simple.F:27:23: note:  add new stmt: vect__40.132_374 = MEM <vector(2) real(kind=8)> [(real(kind=8) *)vectp_a.130_376];

In dce dump,

Deleting : _39 = _26 + _38;

Deleting : _38 = _35 + _37;

Deleting : _37 = _36 * stride.88_111;

So it's reasonable to consider only data references after loop interchange. Other statements may be eliminated, or moved out of the loop by the last lim pass if they're really expensive.

2. I tested SPEC on powerpc64le-linux-gnu. 503.bwaves_r got a 6.77% performance improvement with this patch. It has no impact on the other benchmarks.

3. The patch passed bootstrap and regression testing on powerpc64le-linux-gnu.

I think the patch works fine. Could you please add it into trunk? Thanks a lot.


On 8/9/2020 6:18 PM, Bin.Cheng wrote:
On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI <guih...@linux.ibm.com> wrote:
Hi,

I want to follow up on Lijia's work, as I got a performance benefit on
some SPEC workloads by adding an lim pass after loop interchange.  Could
you send me the latest patches? I could do further testing. Thanks a lot.
Hi,
Hmm, not sure if this refers to me?  I only provided an example patch
(which isn't complete) before Lijia's.  Unfortunately I don't have a
newer patch for this either.
As Richard suggested, maybe you (if you work on this) can simplify the
implementation.  Anyway, we only need to hoist memory references here.

Thanks,
bin
https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
