Hi,

As discussed in the thread https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
(original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html), I'm working on
teaching IVOPTs to consider D-form group accesses during unrolling. The difference
between D-form and other forms during unrolling is that with D-form we can put the
stride into the displacement field and so avoid additional step increments. For example:
With X-form (uf step increments):

  ...
  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride
  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride
  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride
  ...

With D-form (one step increment for each base):

  ...
  LD A = baseA, OFF
  LD B = baseB, OFF
  ST C = baseC, OFF
  LD A = baseA, OFF+stride
  LD B = baseB, OFF+stride
  ST C = baseC, OFF+stride
  LD A = baseA, OFF+2*stride
  LD B = baseB, OFF+2*stride
  ST C = baseC, OFF+2*stride
  ...
  baseA += stride * uf
  baseB += stride * uf
  baseC += stride * uf

If the loop gets unrolled 8 times, that is 3 step updates with D-form vs. 8 step
updates with X-form. Here we only need to check that the stride meets the D-form
displacement field requirement, since if OFF doesn't, we can construct baseA' as
baseA + OFF.

This patch set consists of four parts:

[PATCH 1/4 GCC11] Add middle-end unroll factor estimation

  Add unroll factor estimation to the middle-end. It mainly follows the current RTL
  unroll factor determination in function decide_unrolling and its callees. As
  Richard B. suggested, we could probably force the unroll factor with this and avoid
  duplicate unroll factor calculation, but I think that needs more benchmarking work
  and should be handled separately.

[PATCH 2/4 GCC11] Add target hook stride_dform_valid_p

  Add a target hook to determine whether the current memory access, with the given
  mode, stride and other flags, has D-form support available.

[PATCH 3/4 GCC11] IVOPTs Consider cost_step on different forms during unrolling

  Teach IVOPTs to identify address type iv groups where D-form is preferred, and set
  dform_p on their derived iv cands. Taking the unroll factor into account, increase
  the iv cost by (uf - 1) * cost_step if it's not a D-form iv cand.

[PATCH 4/4 GCC11] rs6000: P9 D-form test cases

  Add some test cases, mainly copied from Kelvin's patch.

Bootstrapped and regression tested on powerpc64le-linux-gnu.

I'll be on leave for two weeks soon, so please expect late responses.
Thanks a lot in advance!
BR,
Kewen
------------
 gcc/cfgloop.h                                       |   3 +
 gcc/config/rs6000/rs6000.c                          |  56 ++++++++++++++++-
 gcc/doc/tm.texi                                     |  14 +++++
 gcc/doc/tm.texi.in                                  |   4 ++
 gcc/target.def                                      |  21 ++++++-
 gcc/testsuite/gcc.target/powerpc/p9-dform-0.c       |  43 +++++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-dform-1.c       |  55 +++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-dform-2.c       |  12 ++++
 gcc/testsuite/gcc.target/powerpc/p9-dform-3.c       |  15 +++++
 gcc/testsuite/gcc.target/powerpc/p9-dform-4.c       |  12 ++++
 gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h |  34 +++++++++++
 gcc/tree-ssa-loop-ivopts.c                          |  84 +++++++++++++++++++++++++-
 gcc/tree-ssa-loop-manip.c                           | 254 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/tree-ssa-loop-manip.h                           |   3 +-
 gcc/tree-ssa-loop.c                                 |  33 ++++++++++
 gcc/tree-ssa-loop.h                                 |   2 +
 16 files changed, 640 insertions(+), 5 deletions(-)