https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-05-11
             Status|UNCONFIRMED                 |NEW

--- Comment #8 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
The patch which caused the problem was

commit 6d099a76a0f6a040a3e678f2bce7fc69cc3257d8
Author: Jiufu Guo <guoji...@linux.ibm.com>
Date:   Mon Oct 28 05:23:24 2019 +0000

    rs6000: Enable limited unrolling at -O2

    In PR88760, there was some discussion about improving or tuning the
    unroller for targets, and we agreed to enable the unroller for small
    loops at -O2 first. We see a performance improvement (~10%) for the
    code below:
    ```
      subroutine foo (i, i1, block)
        integer :: i, i1
        integer :: block(9, 9, 9)
        block(i:9,1,i1) = block(i:9,1,i1) - 10
      end subroutine foo

    ```
    This kind of code occurs a few times in exchange2 benchmark.

    Similar C code:
    ```
      for (i = 0; i < n; i++)
        arr[i] = arr[i] - 10;
    ```

    On powerpcle, enabling -funroll-loops at -O2 and limiting
    PARAM_MAX_UNROLL_TIMES to 2 and PARAM_MAX_UNROLLED_INSNS to 20 gives
    a >2% overall improvement for SPEC2017.

    This patch is only for rs6000 in which we see visible performance
    improvement.

    gcc/
    2019-10-25  Jiufu Guo  <guoji...@linux.ibm.com>

        PR tree-optimization/88760
        * config/rs6000/rs6000-common.c (rs6000_option_optimization_table):
        Enable -funroll-loops for -O2 and above.
        * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
        PARAM_MAX_UNROLL_TIMES to 2 and PARAM_MAX_UNROLLED_INSNS to 20, and
        do not turn on web and rngreg implicitly, if the unroller is not
        explicitly enabled.

    gcc.testsuite/
    2019-10-25  Jiufu Guo  <guoji...@linux.ibm.com>

        PR tree-optimization/88760
        * gcc.target/powerpc/small-loop-unroll.c: New test.
        * c-c++-common/tsan/thread_leak2.c: Update test.
        * gcc.dg/pr59643.c: Update test.
        * gcc.target/powerpc/loop_align.c: Update test.
        * gcc.target/powerpc/ppc-fma-1.c: Update test.
        * gcc.target/powerpc/ppc-fma-2.c: Update test.
        * gcc.target/powerpc/ppc-fma-3.c: Update test.
        * gcc.target/powerpc/ppc-fma-4.c: Update test.
        * gcc.target/powerpc/pr78604.c: Update test.

    From-SVN: r277501

So, this patch seems to have exposed a problem with the unrolling in
general, or with the parameters for POWER.
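
For readers unfamiliar with the transformation, here is a minimal
source-level sketch of what unrolling the C loop from the commit message
by a factor of 2 (the PARAM_MAX_UNROLL_TIMES limit mentioned above) amounts
to. The function names sub10_rolled and sub10_unrolled2 are hypothetical
and chosen only for illustration. GCC's unroller works on its internal
representation, so the code it actually emits on POWER may look quite
different from this hand-written version.

```c
#include <stddef.h>

/* The loop as written in the commit message, wrapped in a function. */
void sub10_rolled(int *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        arr[i] = arr[i] - 10;
}

/* Hand-unrolled by a factor of 2; illustrative only, not GCC output. */
void sub10_unrolled2(int *arr, size_t n)
{
    size_t i = 0;

    /* Main body: two elements per iteration, half as many branches. */
    for (; i + 1 < n; i += 2) {
        arr[i]     = arr[i]     - 10;
        arr[i + 1] = arr[i + 1] - 10;
    }

    /* Remainder: at most one leftover element when n is odd. */
    if (i < n)
        arr[i] = arr[i] - 10;
}
```

Both functions should leave the array in the same state for any n. The
unrolled form trades a larger loop body for fewer loop-control
instructions, which is why the insn limit matters for small loops.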
