On 20.11.19 at 21:45, Janne Blomqvist wrote:
> BTW, since this is done for the purpose of optimization, have you tested it on a suitable benchmark suite such as Polyhedron, to see whether it a) generates any different code and b) makes it go faster?
I haven't run any actual benchmarks. However, there is a simple example which shows its advantages. Consider

    subroutine foo(n,m)
      m = 0
      do 100 i=1,100
        call bar
        m = m + n
    100 continue
    end

(I used old-style DO loops just because :-)

Without the optimization, the inner loop is translated to

    .L2:
            xorl    %eax, %eax
            call    bar_
            movl    (%r12), %eax
            addl    %eax, 0(%rbp)
            subl    $1, %ebx
            jne     .L2

and with the optimization to

    .L2:
            xorl    %eax, %eax
            call    bar_
            addl    %r12d, 0(%rbp)
            subl    $1, %ebx
            jne     .L2

so the reload of n from its address (the movl (%r12), %eax) is gone.

(Why do we zero %eax before each call? It should not be a variadic call, right?)

Of course, the Fortran language rules specify that the call to bar cannot do anything to n, but apparently we do not tell the gcc middle end that this is the case, or maybe there is no way to really specify this.

(Actually, I just tried out

    subroutine foo(n,m)
      integer :: dummy_n, dummy_m
      dummy_n = n
      dummy_m = 0
      do 100 i=1,100
        call bar
        dummy_m = dummy_m + dummy_n
    100 continue
      m = dummy_m
    end

This is optimized even further:

    .L2:
            xorl    %eax, %eax
            call    bar_
            subl    $1, %ebx
            jne     .L2
            imull   $100, %r12d, %r12d

Here the addition has been moved out of the loop entirely and replaced by a single multiplication by 100.

So, should we do a copy-in/copy-out for variables where we cannot be sure that no value is assigned? Does anybody see a downside to that?)
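As a quick sanity check that the manual copy-in/copy-out variant computes the same result as the original, here is a minimal sketch that puts both versions side by side in one module. The names foo_orig, foo_copy and copy_in_out_demo, and the empty bar, are hypothetical stand-ins for illustration; loops use modern do/end do syntax. Since bar is visible in the same module, the compiler may inline it away, so the generated code will not match the listings above. The equivalence relies on bar not modifying n, which is what the Fortran argument rules require of a conforming program.

```fortran
! Hypothetical demo module: foo_orig is the original loop, foo_copy the
! version with explicit copy-in/copy-out through local variables.
module copy_in_out_demo
  implicit none
contains
  subroutine bar()
    ! Empty stand-in for the external routine; it does not touch n or m.
  end subroutine bar

  subroutine foo_orig(n, m)
    integer :: n, m
    integer :: i
    m = 0
    do i = 1, 100
      call bar
      m = m + n            ! n is read through the dummy argument each time
    end do
  end subroutine foo_orig

  subroutine foo_copy(n, m)
    integer :: n, m
    integer :: i, dummy_n, dummy_m
    dummy_n = n            ! copy in
    dummy_m = 0
    do i = 1, 100
      call bar
      dummy_m = dummy_m + dummy_n
    end do
    m = dummy_m            ! copy out
  end subroutine foo_copy
end module copy_in_out_demo
```

Calling both subroutines with the same n should give m = 100*n in each case.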
> Is there a risk of performance regressions due to higher register pressure?
I don't think so. Either the compiler realizes that it can keep the variable in a register (in which case it makes no difference), or it has to load it fresh from its address (in which case one additional register is needed).

Regards,
Thomas