On Wed, Nov 20, 2019 at 11:35 PM Thomas König <t...@tkoenig.net> wrote:
>
> On 20.11.19 at 21:45, Janne Blomqvist wrote:
> > BTW, since this is done for the purpose of optimization, have you done
> > testing on some suitable benchmark suite such as polyhedron, whether
> > it a) generates any different code b) does it make it go faster?
>
> I haven't run any actual benchmarks.
>
> However, there is a simple example which shows its advantages.
> Consider
>
>       subroutine foo(n,m)
>       m = 0
>       do 100 i=1,100
>         call bar
>         m = m + n
>  100  continue
>       end
>
> (I used old-style DO loops just because :-)
>
> Without the optimization, the inner loop is translated to
>
> .L2:
>         xorl    %eax, %eax
>         call    bar_
>         movl    (%r12), %eax
>         addl    %eax, 0(%rbp)
>         subl    $1, %ebx
>         jne     .L2
>
> and with the optimization to
>
> .L2:
>         xorl    %eax, %eax
>         call    bar_
>         addl    %r12d, 0(%rbp)
>         subl    $1, %ebx
>         jne     .L2
>
> so the load of the address is missing.  (Why do we zero %eax
> before each call? It should not be a variadic call right?)

Not sure. Maybe some belt and suspenders thing? I guess someone better
versed in ABI minutiae knows better. It's not Fortran-specific though,
the C frontend does the same when calling a void function.

AFAIK on reasonably current OoO CPUs, xor'ing a register with itself
is handled by the renamer and doesn't consume an execute slot, so it's
in effect a zero-cycle instruction. Still bloats the code slightly,
though.

> Of course, Fortran language rules specify that the call to bar
> cannot do anything to n

Hmm, does it? What about the following modification to your testcase:

module nmod
  integer :: n
end module nmod

subroutine foo(n,m)
  m = 0
  do 100 i=1,100
     call bar
     m = m + n
100 continue
end subroutine foo

subroutine bar()
  use nmod
  n = 0
end subroutine bar

program main
  use nmod
  implicit none
  integer :: m
  n = 1
  m = 0
  call foo(n, m)
  print *, m
end program main

> So, a copy in / copy out for variables where we can not be sure that
> no value is assigned? Does anybody see a downside for that?)

In principle sounds good, unless my concerns above are real and affect
this case too.

> > Is there a risk of performance regressions due to higher register pressure?
>
> I don't think so. Either the compiler realizes that it can
> keep the variable in a register (then it makes no difference),
> or it has to load it fresh from its address (then there is
> one additional register needed).

Yes, true. Good point.

-- 
Janne Blomqvist