Re: [fpc-devel] Register allocation question
On 10.04.2011 12:38, Sergei Gorelkin wrote:
> By now I have run the test suite on x86_64-linux, without regressions.

Feel free to commit it then.

___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Register allocation question
On 10.04.2011 00:49, Florian Klämpfl wrote:
> On 09.04.2011 22:22, Sergei Gorelkin wrote:
>> On 09.04.2011 23:10, Florian Klämpfl wrote:
>>> Problem is, this might hurt non-leaf functions. Maybe the register
>>> allocators can be initialized differently for leaf and non-leaf
>>> functions?
>>
>> I understand the concern, but it should be handled somehow already. If
>> we consider a non-leaf function that is complex enough to consume all 14
>> registers, what difference does the order of allocation make?
>
> It is not necessary to use all 14, but it might be more beneficial to use
> those which are preserved across a function call.
>
>> When making a call, the compiler must know which registers will be
>> destroyed and which won't, otherwise the result will be wrong anyway.
>> What I see confirms what I think: non-leaf functions continue to use
>> rbx, rsi and rdi, not r8..r11.
>
> So the code for those does not change?

Some do not change (that's why I initially wrote that it doesn't work); others change, but never for the worse. For example, if a value was in rbx but its live range did not intersect a call, it is moved to a volatile register such as r8. If its live range does intersect a call, it may be moved to another nonvolatile register such as rdi. Likewise, registers within the volatile group are interchanged. But I don't see a nonvolatile register being replaced with a volatile one where that would require adding spill instructions.

By now I have run the test suite on x86_64-linux, without regressions.

Sergei
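The rule described in this message (a value whose live range crosses a call wants a nonvolatile register, anything else can sit in a volatile one) can be illustrated with a small Python sketch. This is a toy model, not FPC's allocator: the instruction positions, the value names and both helper functions are hypothetical and exist only for illustration.

```python
# Toy model only; the positions and names below are hypothetical.

def crosses_call(live_range, call_sites):
    """True if some call lies strictly inside the (definition, last use) range."""
    start, end = live_range
    return any(start < pos < end for pos in call_sites)


def preferred_class(live_range, call_sites):
    """Register class that needs no save/restore around calls for this value."""
    return "nonvolatile" if crosses_call(live_range, call_sites) else "volatile"


call_sites = [10]  # a single call at instruction 10
live_ranges = {"a": (2, 8), "b": (5, 14), "c": (11, 20)}

for name, rng in live_ranges.items():
    print(name, preferred_class(rng, call_sites))
# a volatile
# b nonvolatile
# c volatile
```

Only `b` is live across the call, so only `b` benefits from a register that the callee preserves. In a leaf function `call_sites` is empty and every value falls into the volatile class.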
Re: [fpc-devel] Register allocation question
On 09.04.2011 22:22, Sergei Gorelkin wrote:
> On 09.04.2011 23:10, Florian Klämpfl wrote:
>> Problem is, this might hurt non-leaf functions. Maybe the register
>> allocators can be initialized differently for leaf and non-leaf
>> functions?
>
> I understand the concern, but it should be handled somehow already. If
> we consider a non-leaf function that is complex enough to consume all 14
> registers, what difference does the order of allocation make?

It is not necessary to use all 14, but it might be more beneficial to use those which are preserved across a function call.

> When making a call, the compiler must know which registers will be
> destroyed and which won't, otherwise the result will be wrong anyway.
> What I see confirms what I think: non-leaf functions continue to use
> rbx, rsi and rdi, not r8..r11.

So the code for those does not change?
Re: [fpc-devel] Register allocation question
On 09.04.2011 23:10, Florian Klämpfl wrote:
> Problem is, this might hurt non-leaf functions. Maybe the register
> allocators can be initialized differently for leaf and non-leaf
> functions?

I understand the concern, but it should be handled somehow already. If we consider a non-leaf function that is complex enough to consume all 14 registers, what difference does the order of allocation make?

When making a call, the compiler must know which registers will be destroyed and which won't, otherwise the result will be wrong anyway. What I see confirms what I think: non-leaf functions continue to use rbx, rsi and rdi, not r8..r11.

I must admit I don't understand how this happens: trgobj.preserved_by_proc is not read anywhere, and saved_standard_registers is only encountered in the prolog and epilog generation code.

Sergei
Re: [fpc-devel] Register allocation question
On 09.04.2011 21:34, Daniël Mantione wrote:
> I think the challenge is to design some generic infrastructure to tell
> the register allocator about the biasing it should do, and then to add
> some heuristics somewhere else (like leaf/non-leaf) to give the register
> allocator the proper instructions.

True, but we don't even find the time to extend the register allocator to handle overlapping registers better, so starting with different register allocation initializations is a good approach imo.
Re: [fpc-devel] Register allocation question
On Sat, 9 Apr 2011, Florian Klämpfl wrote:
> On 09.04.2011 21:04, Sergei Gorelkin wrote:
>> On 09.04.2011 22:26, Sergei Gorelkin wrote:
>>> On 09.04.2011 22:13, Jonas Maebe wrote:
>>>> Simply changing the register order in the array passed to
>>>> trgcpu.create in Tcgx86_64.init_register_allocators should do it.
>>>
>>> Hmm, that was the first thing I tried, but it doesn't seem to make
>>> any difference :(
>>
>> No, it works, I simply looked in the wrong place. As usual :-/
>>
>> The right place to look is a function that does not call other
>> functions, not just any function that is simple enough.
>> Attached are assembler listings of system.indexqword() compiled for
>> win64 with -O2, with and without the change. Note that the prolog and
>> epilog are (almost) gone.
>>
>> This is of course a very quick test, and I'll run the testsuite to
>> check more thoroughly.
>> If no issues pop up, is it OK to commit?
>
> Problem is, this might hurt non-leaf functions. Maybe the register
> allocators can be initialized differently for leaf and non-leaf
> functions?

This is a form of "biasing": the register allocator is biased to put certain values in certain registers. It's a very old trick for getting better register allocations, and the iterated coalescing we do gets much better results than the old biased algorithms. However, I have noticed that in many cases iterated coalescing still leaves a lot of freedom during the actual allocation, and adding some biasing at that point may be helpful.

I think the challenge is to design some generic infrastructure to tell the register allocator about the biasing it should do, and then to add some heuristics somewhere else (like leaf/non-leaf) to give the register allocator the proper instructions.

Daniël
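The split proposed in this message (a generic way to hand the allocator a bias, with the heuristic living elsewhere) could look roughly like the Python sketch below. It is a toy greedy colouring, not iterated coalescing and not FPC code. The function names, the two cost levels and the rule "non-leaf prefers nonvolatile" are assumptions made for illustration, and that last rule is deliberately crude. The register lists are the 14 Win64 general-purpose registers without rsp and rbp.

```python
# Toy model only; not FPC's allocator. All names here are hypothetical.
VOLATILE = ("rax", "rcx", "rdx", "r8", "r9", "r10", "r11")
NONVOLATILE = ("rbx", "rsi", "rdi", "r12", "r13", "r14", "r15")


def leaf_bias(is_leaf):
    """Heuristic kept outside the allocator: returns a cost per register."""
    cheap, costly = (VOLATILE, NONVOLATILE) if is_leaf else (NONVOLATILE, VOLATILE)
    bias = {reg: 0 for reg in cheap}
    bias.update({reg: 1 for reg in costly})
    return bias


def select_colors(interference, bias):
    """Give each node the cheapest register not held by a coloured neighbour."""
    assignment = {}
    for node in sorted(interference):
        taken = {assignment[n] for n in interference[node] if n in assignment}
        candidates = [reg for reg in bias if reg not in taken]
        if not candidates:
            raise ValueError(f"{node} would have to be spilled")
        # min() keeps the first candidate among equal costs
        assignment[node] = min(candidates, key=lambda reg: bias[reg])
    return assignment


# Two values that are live at the same time, so they interfere.
graph = {"psrc": {"pend"}, "pend": {"psrc"}}
print(select_colors(graph, leaf_bias(is_leaf=True)))
# {'pend': 'rax', 'psrc': 'rcx'}
print(select_colors(graph, leaf_bias(is_leaf=False)))
# {'pend': 'rbx', 'psrc': 'rsi'}
```

The allocator only ever sees a cost table, so a different heuristic (for example one that looks at individual live ranges rather than the whole function) can be swapped in without touching `select_colors`.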
Re: [fpc-devel] Register allocation question
On 09.04.2011 21:04, Sergei Gorelkin wrote:
> On 09.04.2011 22:26, Sergei Gorelkin wrote:
>> On 09.04.2011 22:13, Jonas Maebe wrote:
>>> Simply changing the register order in the array passed to
>>> trgcpu.create in Tcgx86_64.init_register_allocators should do it.
>>
>> Hmm, that was the first thing I tried, but it doesn't seem to make
>> any difference :(
>
> No, it works, I simply looked in the wrong place. As usual :-/
>
> The right place to look is a function that does not call other
> functions, not just any function that is simple enough.
> Attached are assembler listings of system.indexqword() compiled for
> win64 with -O2, with and without the change. Note that the prolog and
> epilog are (almost) gone.
>
> This is of course a very quick test, and I'll run the testsuite to check
> more thoroughly.
> If no issues pop up, is it OK to commit?

Problem is, this might hurt non-leaf functions. Maybe the register allocators can be initialized differently for leaf and non-leaf functions?
Re: [fpc-devel] Register allocation question
On 09.04.2011 22:15, Florian Klämpfl wrote:
> The registers are allocated in the order defined in
> tcgx86_64.init_register_allocators. However, rax etc. come before
> rbx etc. there. The reason why rbx etc. are used might be calls to
> other procedures. Can you give an example that is affected by the
> problem mentioned above?

I attached an example to my answer to Jonas, in the adjacent branch of this thread.

Sergei
Re: [fpc-devel] Register allocation question
On 09.04.2011 22:26, Sergei Gorelkin wrote:
> On 09.04.2011 22:13, Jonas Maebe wrote:
>> Simply changing the register order in the array passed to
>> trgcpu.create in Tcgx86_64.init_register_allocators should do it.
>
> Hmm, that was the first thing I tried, but it doesn't seem to make any
> difference :(

No, it works, I simply looked in the wrong place. As usual :-/

The right place to look is a function that does not call other functions, not just any function that is simple enough.
Attached are assembler listings of system.indexqword() compiled for win64 with -O2, with and without the change. Note that the prolog and epilog are (almost) gone.

This is of course a very quick test, and I'll run the testsuite to check more thoroughly.
If no issues pop up, is it OK to commit?

Sergei

SYSTEM_INDEXQWORD$formal$INT64$QWORD$$INT64:
; Temps allocated between rsp+32 and rsp+56
; [377] begin
        sub     rsp,104
; Var buf located in register rcx
; Var len located in register rdx
; Var b located in register r8
; Var $result located in register rax
; Var psrc located in register rbx
; Var pend located in register rsi
        mov     qword ptr [rsp+32],rbx
        mov     qword ptr [rsp+40],rdi
        mov     qword ptr [rsp+48],rsi
; [378] psrc:=@buf;
        mov     rbx,rcx
; [381] if (len < 0) or
        mov     rax,rdx
        cmp     rax,0
        jl      @@j373
; [382] (len > high(PtrInt) div 4) or
        mov     rax,rdx
        mov     rsi,2305843009213693951
        cmp     rax,rsi
        jg      @@j373
; [383] (psrc+len < psrc) then
        mov     rax,rdx
        shl     rax,3
        add     rax,rbx
        cmp     rax,rbx
        jnb     @@j374
@@j373:
; [384] pend:=pqword(high(PtrUInt)-sizeof(qword))
        mov     rsi,-9
        jmp     @@j383
@@j374:
; [386] pend:=psrc+len;
        shl     rdx,3
        add     rdx,rbx
        mov     rsi,rdx
; [400] while psrc

SYSTEM_INDEXQWORD$formal$INT64$QWORD$$INT64:
; Temps allocated between rsp+32 and rsp+32
; [377] begin
        sub     rsp,72
; Var buf located in register rcx
; Var len located in register rdx
; Var b located in register r8
; Var $result located in register rax
; Var psrc located in register r9
; Var pend located in register r10
; [378] psrc:=@buf;
        mov     r9,rcx
; [381] if (len < 0) or
        mov     rax,rdx
        cmp     rax,0
        jl      @@j373
; [382] (len > high(PtrInt) div 4) or
        mov     rax,rdx
        mov     r10,2305843009213693951
        cmp     rax,r10
        jg      @@j373
; [383] (psrc+len < psrc) then
        mov     rax,rdx
        shl     rax,3
        add     rax,r9
        cmp     rax,r9
        jnb     @@j374
@@j373:
; [384] pend:=pqword(high(PtrUInt)-sizeof(qword))
        mov     r10,-9
        jmp     @@j383
@@j374:
; [386] pend:=psrc+len;
        shl     rdx,3
        add     rdx,r9
        mov     r10,rdx
; [400] while psrc
Re: [fpc-devel] Register allocation question
On 09.04.2011 22:13, Jonas Maebe wrote:
> On 09 Apr 2011, at 20:08, Sergei Gorelkin wrote:
>> I wonder whether it is possible to assign a priority (or order) of
>> registers for FPC's register allocator. Currently registers are
>> allocated in the order of the ordinals defined in cpubase.pas. On i386
>> it doesn't make any difference, but on x86_64 the 'nonvolatile' rbx
>> (and on Win64 also rsi and rdi) are always used before the 'volatile'
>> ones r8..r11. Reversing this order would help avoid stack frames in
>> simple procedures, resulting in nicer code.
>>
>> Maybe somebody could share some clues about whether this is possible
>> and where to start looking?
>
> Simply changing the register order in the array passed to
> trgcpu.create in Tcgx86_64.init_register_allocators should do it.

Hmm, that was the first thing I tried, but it doesn't seem to make any difference :(

Sergei
Re: [fpc-devel] Register allocation question
On 09.04.2011 20:08, Sergei Gorelkin wrote:
> Hello,
>
> I wonder whether it is possible to assign a priority (or order) of
> registers for FPC's register allocator. Currently registers are
> allocated in the order of the ordinals defined in cpubase.pas. On i386
> it doesn't make any difference, but on x86_64 the 'nonvolatile' rbx
> (and on Win64 also rsi and rdi) are always used before the 'volatile'
> ones r8..r11. Reversing this order would help avoid stack frames in
> simple procedures, resulting in nicer code.
>
> Maybe somebody could share some clues about whether this is possible
> and where to start looking?

The registers are allocated in the order defined in tcgx86_64.init_register_allocators. However, rax etc. come before rbx etc. there. The reason why rbx etc. are used might be calls to other procedures. Can you give an example that is affected by the problem mentioned above?
Re: [fpc-devel] Register allocation question
On 09 Apr 2011, at 20:08, Sergei Gorelkin wrote:
> I wonder whether it is possible to assign a priority (or order) of
> registers for FPC's register allocator. Currently registers are
> allocated in the order of the ordinals defined in cpubase.pas. On i386
> it doesn't make any difference, but on x86_64 the 'nonvolatile' rbx
> (and on Win64 also rsi and rdi) are always used before the 'volatile'
> ones r8..r11. Reversing this order would help avoid stack frames in
> simple procedures, resulting in nicer code.
>
> Maybe somebody could share some clues about whether this is possible
> and where to start looking?

Simply changing the register order in the array passed to trgcpu.create in Tcgx86_64.init_register_allocators should do it.

Jonas
[fpc-devel] Register allocation question
Hello,

I wonder whether it is possible to assign a priority (or order) of registers for FPC's register allocator. Currently registers are allocated in the order of the ordinals defined in cpubase.pas. On i386 it doesn't make any difference, but on x86_64 the 'nonvolatile' rbx (and on Win64 also rsi and rdi) are always used before the 'volatile' ones r8..r11. Reversing this order would help avoid stack frames in simple procedures, resulting in nicer code.

Maybe somebody could share some clues about whether this is possible and where to start looking?

Regards,
Sergei
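The effect described in this message can be illustrated with a small Python sketch. It is a toy model, not FPC's register allocator: the two allocation orders and both helper functions are hypothetical. The set of Win64 nonvolatile registers follows the Win64 calling convention.

```python
# Toy model only; this is not FPC's register allocator.
# Win64 integer registers that a callee must preserve if it uses them.
WIN64_NONVOLATILE = {"rbx", "rbp", "rsi", "rdi", "r12", "r13", "r14", "r15"}

# Two hypothetical allocation orders for local values.
NONVOLATILE_FIRST = ["rax", "rbx", "rsi", "rdi", "r8", "r9", "r10", "r11"]
VOLATILE_FIRST = ["rax", "r8", "r9", "r10", "r11", "rbx", "rsi", "rdi"]


def allocate(order, in_use, count):
    """Pick `count` free registers, scanning `order` from the front."""
    free = [reg for reg in order if reg not in in_use]
    if count > len(free):
        raise ValueError("not enough registers; a real allocator would spill")
    return free[:count]


def registers_to_save(chosen):
    """Registers a leaf function must save in its prolog and restore in its epilog."""
    return [reg for reg in chosen if reg in WIN64_NONVOLATILE]


# A leaf function with three parameters and a result already in registers
# that needs two more registers for locals.
busy = {"rax", "rcx", "rdx", "r8"}

for order in (NONVOLATILE_FIRST, VOLATILE_FIRST):
    chosen = allocate(order, busy, 2)
    print(chosen, "-> must save:", registers_to_save(chosen))
# ['rbx', 'rsi'] -> must save: ['rbx', 'rsi']
# ['r9', 'r10'] -> must save: []
```

With the volatile registers tried first, the leaf function touches no register that the caller expects to survive, so it has nothing to save or restore.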