On 20.11.19 at 21:45, Janne Blomqvist wrote:
BTW, since this is done for the purpose of optimization, have you
tested it on a suitable benchmark suite such as Polyhedron, to see
whether it a) generates any different code and b) makes it go faster?

I haven't run any actual benchmarks.

However, there is a simple example which shows its advantages.
Consider

      subroutine foo(n,m)
      m = 0
      do 100 i=1,100
        call bar
        m = m + n
 100  continue
      end

(I used old-style DO loops just because :-)

Without the optimization, the loop is translated to

.L2:
        xorl    %eax, %eax
        call    bar_
        movl    (%r12), %eax
        addl    %eax, 0(%rbp)
        subl    $1, %ebx
        jne     .L2

and with the optimization to

.L2:
        xorl    %eax, %eax
        call    bar_
        addl    %r12d, 0(%rbp)
        subl    $1, %ebx
        jne     .L2

so the load of n through its address is gone from the loop.  (Why do
we zero %eax before each call? It should not be a variadic call, right?)

Of course, Fortran language rules specify that the call to bar
cannot do anything to n, but apparently we do not tell the gcc
middle end that this is the case, or maybe there is no way to
really specify this.

(Actually, I just tried out

      subroutine foo(n,m)
      integer :: dummy_n, dummy_m
      dummy_n = n
      dummy_m = 0
      do 100 i=1,100
         call bar
         dummy_m = dummy_m + dummy_n
 100  continue
      m = dummy_m
      end

This is optimized even further:

.L2:
        xorl    %eax, %eax
        call    bar_
        subl    $1, %ebx
        jne     .L2
        imull   $100, %r12d, %r12d

So, a copy in / copy out for variables where we cannot be sure that
no value is assigned?  Does anybody see a downside to that?)
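
In case somebody wants to play with this, here is a self-contained
free-form sketch that puts the direct version and the hand-written
copy in / copy out version side by side and checks that they agree.
The module name copy_demo, the counter calls, the stand-in body of bar
and the driver program are all made up for illustration; they are not
part of the patch. Note that bar is visible to the compiler here, so it
may get inlined and the generated code will probably not match the
listings above. To look at the assembly, bar should stay external.

```fortran
module copy_demo
  implicit none
  integer :: calls = 0          ! hypothetical state, only touched by bar
contains
  ! Stand-in for the external bar; it never touches the arguments of foo.
  subroutine bar()
    calls = calls + 1
  end subroutine bar

  ! Direct version: the loop works on the dummy arguments themselves.
  subroutine foo_direct(n, m)
    integer :: n, m
    integer :: i
    m = 0
    do i = 1, 100
      call bar()
      m = m + n
    end do
  end subroutine foo_direct

  ! Copy in / copy out by hand: the loop only sees local variables.
  subroutine foo_copy(n, m)
    integer :: n, m
    integer :: i, local_n, local_m
    local_n = n                 ! copy in
    local_m = 0
    do i = 1, 100
      call bar()
      local_m = local_m + local_n
    end do
    m = local_m                 ! copy out
  end subroutine foo_copy
end module copy_demo

program check
  use copy_demo
  implicit none
  integer :: k, m1, m2
  k = 7
  call foo_direct(k, m1)
  call foo_copy(k, m2)
  ! Both variants add k one hundred times, so both must give 100*k.
  if (m1 /= 100*k .or. m2 /= m1) error stop "variants disagree"
  if (calls /= 200) error stop "bar was not called 200 times"
  print *, "both variants agree, m =", m1
end program check
```

The two variants can only give the same result as long as bar leaves
n and m alone, which is what the language rules require of a
conforming program anyway.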

Is there a risk of performance regressions due to higher register pressure?

I don't think so. Either the compiler realizes that it can
keep the variable in a register (then it makes no difference),
or it has to load it fresh from its address (then there is
one additional register needed).

Regards

        Thomas
