[fpc-devel] Register allocation question
Hello,

I wonder whether it is possible to assign a priority (or order) to registers for FPC's register allocator. Currently registers are allocated in the order of the ordinals defined in cpubase.pas. On i386 it doesn't make any difference, but on x86_64 the 'nonvolatile' rbx (and on Win64 also rsi and rdi) is always used before the 'volatile' r8..r11. Reversing this order would help avoid stack frames in simple procedures, resulting in nicer code.

Could somebody share some clues about whether this is possible and where to start looking?

Regards,
Sergei
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Register allocation question
On 09 Apr 2011, at 20:08, Sergei Gorelkin wrote:
> I wonder whether it is possible to assign a priority (or order) to registers for FPC's register allocator. Currently registers are allocated in the order of the ordinals defined in cpubase.pas. On i386 it doesn't make any difference, but on x86_64 the 'nonvolatile' rbx (and on Win64 also rsi and rdi) is always used before the 'volatile' r8..r11. Reversing this order would help avoid stack frames in simple procedures, resulting in nicer code. Could somebody share some clues about whether this is possible and where to start looking?

Simply changing the register order in the array passed to trgcpu.create in Tcgx86_64.init_register_allocators should do it.

Jonas
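To make the effect of the register order concrete, here is a minimal sketch of a "first free register in preference order" policy. It is not FPC's allocator: the function names, the two order lists and the set of registers already in use are assumptions chosen for illustration. Only the Win64 nonvolatile set (rbx, rsi, rdi) and the volatile r8..r11 come from the thread.

```python
# Hypothetical model of order-driven register selection (not FPC code).
NONVOLATILE = {"rbx", "rsi", "rdi"}  # callee-saved on Win64, per the thread

# Assumed preference orders, for illustration only.
OLD_ORDER = ["rax", "rcx", "rdx", "rbx", "rsi", "rdi", "r8", "r9", "r10", "r11"]
NEW_ORDER = ["rax", "rcx", "rdx", "r8", "r9", "r10", "r11", "rbx", "rsi", "rdi"]


def allocate(order, in_use, count):
    """Return the first `count` registers of `order` that are not in use."""
    free = [reg for reg in order if reg not in in_use]
    if len(free) < count:
        raise ValueError("not enough free registers")
    return free[:count]


def registers_to_save(picked):
    """Registers the prologue must save because the callee has to preserve them."""
    return [reg for reg in picked if reg in NONVOLATILE]


# A leaf routine whose parameters and result already occupy rcx, rdx, r8, rax
# and which needs three more registers.
in_use = {"rax", "rcx", "rdx", "r8"}
old_pick = allocate(OLD_ORDER, in_use, 3)
new_pick = allocate(NEW_ORDER, in_use, 3)
print(old_pick, registers_to_save(old_pick))
print(new_pick, registers_to_save(new_pick))
```

With the nonvolatile registers listed first, all three picks need a save and restore; with the volatile ones first, none do, which is the prologue and epilogue saving the thread is after.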
Re: [fpc-devel] Register allocation question
On 09.04.2011 20:08, Sergei Gorelkin wrote:
> Hello, I wonder whether it is possible to assign a priority (or order) to registers for FPC's register allocator. Currently registers are allocated in the order of the ordinals defined in cpubase.pas. On i386 it doesn't make any difference, but on x86_64 the 'nonvolatile' rbx (and on Win64 also rsi and rdi) is always used before the 'volatile' r8..r11. Reversing this order would help avoid stack frames in simple procedures, resulting in nicer code. Could somebody share some clues about whether this is possible and where to start looking?

The registers are allocated in the order defined in tcgx86_64.init_register_allocators. However, rax etc. come before rbx etc. in that order. The reason why rbx etc. are used might be calls to other procedures. Can you give an example that is affected by the problem mentioned above?
Re: [fpc-devel] Register allocation question
On 09.04.2011 22:26, Sergei Gorelkin wrote:
> On 09.04.2011 22:13, Jonas Maebe wrote:
>> Simply changing the register order in the array passed to trgcpu.create in Tcgx86_64.init_register_allocators should do it.
> Hmm, that was the first thing I tried, but it doesn't seem to make any difference :(

No, it works, I simply looked at the wrong place. As usual :-/
The right place to look is a function that does not call other functions, not just any function that is simple enough.

Attached are assembler listings of system.indexqword() compiled for win64 with -O2, with and without the change. Note that the prolog and epilog are (almost) gone.

This is of course a very quick test, and I'll run the testsuite to check more thoroughly. If no issues pop up, is it ok to commit?

Sergei

SYSTEM_INDEXQWORD$formal$INT64$QWORD$$INT64:
; Temps allocated between rsp+32 and rsp+56
; [377] begin
    sub rsp,104
; Var buf located in register rcx
; Var len located in register rdx
; Var b located in register r8
; Var $result located in register rax
; Var psrc located in register rbx
; Var pend located in register rsi
    mov qword ptr [rsp+32],rbx
    mov qword ptr [rsp+40],rdi
    mov qword ptr [rsp+48],rsi
; [378] psrc:=@buf;
    mov rbx,rcx
; [381] if (len < 0) or
    mov rax,rdx
    cmp rax,0
    jl @@j373
; [382] (len > high(PtrInt) div 4) or
    mov rax,rdx
    mov rsi,2305843009213693951
    cmp rax,rsi
    jg @@j373
; [383] (psrc+len < psrc) then
    mov rax,rdx
    shl rax,3
    add rax,rbx
    cmp rax,rbx
    jnb @@j374
@@j373:
; [384] pend:=pqword(high(PtrUInt)-sizeof(qword))
    mov rsi,-9
    jmp @@j383
@@j374:
; [386] pend:=psrc+len;
    shl rdx,3
    add rdx,rbx
    mov rsi,rdx
; [400] while psrc<pend do
    jmp @@j383
    ALIGN 8
@@j382:
; [402] if psrc^=b then
    mov rdx,qword ptr [rbx]
    cmp rdx,r8
    jne @@j386
; [404] result:=psrc-pqword(@buf);
    mov rdx,rcx
    mov rdi,rbx
    sub rdi,rdx
    mov rdx,rdi
    mov rdi,rdx
    sar rdi,63
    and rdi,7
    add rdx,rdi
    sar rdx,3
    mov rax,rdx
; [405] exit;
    jmp @@j369
@@j386:
; [407] inc(psrc);
    add rbx,8
@@j383:
    mov rdx,rbx
    cmp rdx,rsi
    jb @@j382
; [409] result:=-1;
    mov rax,-1
@@j369:
; [410] end;
    mov rbx,qword ptr [rsp+32]
    mov rdi,qword ptr [rsp+40]
    mov rsi,qword ptr [rsp+48]
    add rsp,104
    ret
_TEXT ENDS

SYSTEM_INDEXQWORD$formal$INT64$QWORD$$INT64:
; Temps allocated between rsp+32 and rsp+32
; [377] begin
    sub rsp,72
; Var buf located in register rcx
; Var len located in register rdx
; Var b located in register r8
; Var $result located in register rax
; Var psrc located in register r9
; Var pend located in register r10
; [378] psrc:=@buf;
    mov r9,rcx
; [381] if (len < 0) or
    mov rax,rdx
    cmp rax,0
    jl @@j373
; [382] (len > high(PtrInt) div 4) or
    mov rax,rdx
    mov r10,2305843009213693951
    cmp rax,r10
    jg @@j373
; [383] (psrc+len < psrc) then
    mov rax,rdx
    shl rax,3
    add rax,r9
    cmp rax,r9
    jnb @@j374
@@j373:
; [384] pend:=pqword(high(PtrUInt)-sizeof(qword))
    mov r10,-9
    jmp @@j383
@@j374:
; [386] pend:=psrc+len;
    shl rdx,3
    add rdx,r9
    mov r10,rdx
; [400] while psrc<pend do
    jmp @@j383
    ALIGN 8
@@j382:
; [402] if psrc^=b then
    mov rdx,qword ptr [r9]
    cmp rdx,r8
    jne @@j386
; [404] result:=psrc-pqword(@buf);
    mov rdx,rcx
    mov r11,r9
    sub r11,rdx
    mov rdx,r11
    mov r11,rdx
    sar r11,63
    and r11,7
    add rdx,r11
    sar rdx,3
    mov rax,rdx
; [405] exit;
    jmp @@j369
@@j386:
; [407] inc(psrc);
    add r9,8
@@j383:
    mov rdx,r9
    cmp rdx,r10
    jb @@j382
; [409] result:=-1;
    mov
Re: [fpc-devel] Register allocation question
On 09.04.2011 22:15, Florian Klämpfl wrote:
> The registers are allocated in the order defined in tcgx86_64.init_register_allocators. However, rax etc. come before rbx etc. in that order. The reason why rbx etc. are used might be calls to other procedures. Can you give an example that is affected by the problem mentioned above?

I attached an example to my reply to Jonas, in the adjacent branch of this thread.

Sergei
Re: [fpc-devel] Register allocation question
On 09.04.2011 21:04, Sergei Gorelkin wrote:
> On 09.04.2011 22:26, Sergei Gorelkin wrote:
>> On 09.04.2011 22:13, Jonas Maebe wrote:
>>> Simply changing the register order in the array passed to trgcpu.create in Tcgx86_64.init_register_allocators should do it.
>> Hmm, that was the first thing I tried, but it doesn't seem to make any difference :(
> No, it works, I simply looked at the wrong place. As usual :-/ The right place to look is a function that does not call other functions, not just any function that is simple enough. Attached are assembler listings of system.indexqword() compiled for win64 with -O2, with and without the change. Note that the prolog and epilog are (almost) gone. This is of course a very quick test, and I'll run the testsuite to check more thoroughly. If no issues pop up, is it ok to commit?

The problem is that this might hurt non-leaf functions. Maybe the register allocators can be initialized differently for leaf and non-leaf functions?
Re: [fpc-devel] Register allocation question
On Sat, 9 Apr 2011, Florian Klämpfl wrote:
> On 09.04.2011 21:04, Sergei Gorelkin wrote:
>> On 09.04.2011 22:26, Sergei Gorelkin wrote:
>>> On 09.04.2011 22:13, Jonas Maebe wrote:
>>>> Simply changing the register order in the array passed to trgcpu.create in Tcgx86_64.init_register_allocators should do it.
>>> Hmm, that was the first thing I tried, but it doesn't seem to make any difference :(
>> No, it works, I simply looked at the wrong place. As usual :-/ The right place to look is a function that does not call other functions, not just any function that is simple enough. Attached are assembler listings of system.indexqword() compiled for win64 with -O2, with and without the change. Note that the prolog and epilog are (almost) gone. This is of course a very quick test, and I'll run the testsuite to check more thoroughly. If no issues pop up, is it ok to commit?
> The problem is that this might hurt non-leaf functions. Maybe the register allocators can be initialized differently for leaf and non-leaf functions?

This is a form of biasing: the register allocator is biased to put certain values in certain registers. It's a very old trick to get better register allocations, and the iterated coalescing we do gets much better results than the old biased algorithms.

However, I have noted that in many cases the iterated coalescing still leaves a lot of freedom during the actual allocation, and adding some biasing at this point may be helpful.

I think the challenge is to design some generic infrastructure to tell the register allocator which biasing it should do, and then to add some heuristics somewhere else (like leaf/non-leaf) to give the register allocator the proper instructions.

Daniël
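The split described above, a generic bias input to the allocator plus a separate heuristic that fills it in, can be sketched as follows. This is only an illustration under stated assumptions: `leaf_bias` and `pick` are hypothetical names, the penalty values are arbitrary, and nothing here reflects how FPC's iterated-coalescing allocator is actually structured.

```python
# Hypothetical sketch: a bias table built by a heuristic, consumed by the
# final register choice. Not FPC code; all names are made up.
VOLATILE = ("r8", "r9", "r10", "r11")
NONVOLATILE = ("rbx", "rsi", "rdi")


def leaf_bias(is_leaf):
    """Heuristic: penalty per register, lower means preferred.

    Leaf routines prefer volatile registers (nothing to save);
    non-leaf routines prefer registers that survive calls.
    """
    bias = {}
    for reg in VOLATILE:
        bias[reg] = 0 if is_leaf else 1
    for reg in NONVOLATILE:
        bias[reg] = 1 if is_leaf else 0
    return bias


def pick(candidates, bias):
    """Choose among registers the allocator considers equally valid.

    `candidates` models the freedom left after coalescing; ties in the
    bias are broken by the original candidate order.
    """
    return min(candidates, key=lambda reg: (bias.get(reg, 0), candidates.index(reg)))


candidates = ["rbx", "r9"]
print(pick(candidates, leaf_bias(is_leaf=True)))
print(pick(candidates, leaf_bias(is_leaf=False)))
```

Keeping the heuristic outside `pick` means other policies could supply a different table without touching the selection step, which is the kind of separation the message argues for.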
Re: [fpc-devel] Register allocation question
On 09.04.2011 21:34, Daniël Mantione wrote:
> I think the challenge is to design some generic infrastructure to tell the register allocator which biasing it should do, and then to add some heuristics somewhere else (like leaf/non-leaf) to give the register allocator the proper instructions.

True, but we don't even find the time to extend the register allocator to handle overlapping registers better, so starting with different register allocator initializations is a good approach imo.
Re: [fpc-devel] fcl-web is not copied by make install
On 4/9/2011 11:26, Joost van der Sluis wrote:
> On Sat, 2011-04-09 at 00:22 -0700, ABorka wrote:
>> Is it intentional that the fcl-web package is not copied when make install is called? make all compiles the units properly, they are just not copied by make install.
> Are you sure? Which files do you think are not copied? Do you cross-compile?
> Joost.

Well, I'm pretty sure. Not even the directory is created for it in the c:/pp/units/i386-win32/ directory when make install is executed. If I create an empty directory for it there, nothing changes; it remains empty. The other fcl packages are copied properly.

For FCL-web these are the make install outputs:

C:/pp/bin/i386-win32/make.EXE -C fcl-web distinstall
make.EXE[4]: Entering directory `C:/fpc_svn/packages/fcl-web'
C:/fpc_svn/compiler/ppc386.exe fpmake.pp -Ur -Xs -O2 -n -FuC:/fpc_svn/rtl/units/i386-win32 -FuC:/fpc_svn/packages/hash/units/i386-win32 -FuC:/fpc_svn/packages/paszlib/units/i386-win32 -FuC:/fpc_svn/packages/fcl-process/units/i386-win32 -FuC:/fpc_svn/packages/fpmkunit/units/i386-win32 -FE. -FUunits/i386-win32 -di386 -dRELEASE
.\fpmake.exe install --localunitdir=../.. --globalunitdir=.. --os=win32 --cpu=i386 -o -Ur -o -Xs -o -O2 -o -n -o -FuC:/fpc_svn/rtl/units/i386-win32 -o -FuC:/fpc_svn/packages/hash/units/i386-win32 -o -FuC:/fpc_svn/packages/paszlib/units/i386-win32 -o -FuC:/fpc_svn/packages/fcl-process/units/i386-win32 -o -FuC:/fpc_svn/packages/fpmkunit/units/i386-win32 -o -FE. -o -FUunits/i386-win32 -o -di386 -o -dRELEASE --compiler=C:/fpc_svn/compiler/ppc386.exe --prefix=
Installation package fcl-web for target i386-win32 succeeded
make.EXE[4]: Leaving directory `C:/fpc_svn/packages/fcl-web'

For fastcgi the make install lines are:

C:/pp/bin/i386-win32/make.EXE -C fastcgi distinstall
make.EXE[4]: Entering directory `C:/fpc_svn/packages/fastcgi'
C:/fpc_svn/utils/fpcm/fpcmake.exe -p -Ti386-win32 Makefile.fpc
Processing Makefile.fpc
Writing Package.fpc
C:/pp/bin/i386-win32/ginstall.exe -m 755 -d /pp/units/i386-win32/fastcgi
C:/pp/bin/i386-win32/cp.exe -fp Package.fpc /pp/units/i386-win32/fastcgi
C:/pp/bin/i386-win32/ginstall.exe -m 755 -d /pp/units/i386-win32/fastcgi
C:/pp/bin/i386-win32/cp.exe -fp units/i386-win32/fastcgi.ppu /pp/units/i386-win32/fastcgi
C:/pp/bin/i386-win32/cp.exe -fp units/i386-win32/fastcgi.o /pp/units/i386-win32/fastcgi
make.EXE[4]: Leaving directory `C:/fpc_svn/packages/fastcgi'

It seems something is missing for FCL-web, because cp.exe is never called to copy over the units.

AB
Re: [fpc-devel] fcl-web is not copied by make install
On 4/9/2011 12:43, ABorka wrote:
> On 4/9/2011 11:26, Joost van der Sluis wrote:
>> On Sat, 2011-04-09 at 00:22 -0700, ABorka wrote:
>>> Is it intentional that the fcl-web package is not copied when make install is called? make all compiles the units properly, they are just not copied by make install.
>> Are you sure? Which files do you think are not copied? Do you cross-compile?
>> Joost.
>
> Well, I'm pretty sure. Not even the directory is created for it in the c:/pp/units/i386-win32/ directory when make install is executed. If I create an empty directory for it there, nothing changes; it remains empty. The other fcl packages are copied properly.
>
> For FCL-web these are the make install outputs:
>
> C:/pp/bin/i386-win32/make.EXE -C fcl-web distinstall
> make.EXE[4]: Entering directory `C:/fpc_svn/packages/fcl-web'
> C:/fpc_svn/compiler/ppc386.exe fpmake.pp -Ur -Xs -O2 -n -FuC:/fpc_svn/rtl/units/i386-win32 -FuC:/fpc_svn/packages/hash/units/i386-win32 -FuC:/fpc_svn/packages/paszlib/units/i386-win32 -FuC:/fpc_svn/packages/fcl-process/units/i386-win32 -FuC:/fpc_svn/packages/fpmkunit/units/i386-win32 -FE. -FUunits/i386-win32 -di386 -dRELEASE
> .\fpmake.exe install --localunitdir=../.. --globalunitdir=.. --os=win32 --cpu=i386 -o -Ur -o -Xs -o -O2 -o -n -o -FuC:/fpc_svn/rtl/units/i386-win32 -o -FuC:/fpc_svn/packages/hash/units/i386-win32 -o -FuC:/fpc_svn/packages/paszlib/units/i386-win32 -o -FuC:/fpc_svn/packages/fcl-process/units/i386-win32 -o -FuC:/fpc_svn/packages/fpmkunit/units/i386-win32 -o -FE. -o -FUunits/i386-win32 -o -di386 -o -dRELEASE --compiler=C:/fpc_svn/compiler/ppc386.exe --prefix=
> Installation package fcl-web for target i386-win32 succeeded
> make.EXE[4]: Leaving directory `C:/fpc_svn/packages/fcl-web'
>
> For fastcgi the make install lines are:
>
> C:/pp/bin/i386-win32/make.EXE -C fastcgi distinstall
> make.EXE[4]: Entering directory `C:/fpc_svn/packages/fastcgi'
> C:/fpc_svn/utils/fpcm/fpcmake.exe -p -Ti386-win32 Makefile.fpc
> Processing Makefile.fpc
> Writing Package.fpc
> C:/pp/bin/i386-win32/ginstall.exe -m 755 -d /pp/units/i386-win32/fastcgi
> C:/pp/bin/i386-win32/cp.exe -fp Package.fpc /pp/units/i386-win32/fastcgi
> C:/pp/bin/i386-win32/ginstall.exe -m 755 -d /pp/units/i386-win32/fastcgi
> C:/pp/bin/i386-win32/cp.exe -fp units/i386-win32/fastcgi.ppu /pp/units/i386-win32/fastcgi
> C:/pp/bin/i386-win32/cp.exe -fp units/i386-win32/fastcgi.o /pp/units/i386-win32/fastcgi
> make.EXE[4]: Leaving directory `C:/fpc_svn/packages/fastcgi'
>
> It seems something is missing for FCL-web, because cp.exe is never called to copy over the units.
>
> AB

Actually, it seems it copies this one package to the wrong place, not to c:/pp/ like it does for the others.
[fpc-devel] make clean does not delete the fcl-web units
Just as make install does not copy the FCL-web units to the right place, make clean does not remove them either. Here is the output:

. . .
C:/pp/bin/i386-win32/make.EXE -C fcl-web distclean
make.EXE[2]: Entering directory `C:/fpc_svn/packages/fcl-web'
make.EXE[2]: Nothing to be done for `distclean'.
make.EXE[2]: Leaving directory `C:/fpc_svn/packages/fcl-web'
. . .

It leaves the units/... files in there.

Win XP 32bit, FPC 2.5.1 SVN trunk, everything is the default.
Re: [fpc-devel] Register allocation question
On 09.04.2011 23:10, Florian Klämpfl wrote:
> The problem is that this might hurt non-leaf functions. Maybe the register allocators can be initialized differently for leaf and non-leaf functions?

I understand the concern, but it should be handled somehow already. If we consider a non-leaf function that is complex enough to consume all 14 registers, what difference does the order of allocation make? When making a call, the allocator must know which registers will be destroyed and which won't, otherwise the result will be wrong anyway.

What I see confirms what I think: non-leaf functions continue to use rbx, rsi and rdi, not r8..r11. I must admit I don't understand how it happens: trgobj.preserved_by_proc is not read anywhere, and saved_standard_registers is only encountered in the prolog and epilog generation code.

Sergei
Re: [fpc-devel] Register allocation question
On 09.04.2011 22:22, Sergei Gorelkin wrote:
> On 09.04.2011 23:10, Florian Klämpfl wrote:
>> The problem is that this might hurt non-leaf functions. Maybe the register allocators can be initialized differently for leaf and non-leaf functions?
> I understand the concern, but it should be handled somehow already. If we consider a non-leaf function that is complex enough to consume all 14 registers, what difference does the order of allocation make?

A function does not need to use all 14 for this to matter: it might be more beneficial to use those registers which are preserved across a function call.

> When making a call, the allocator must know which registers will be destroyed and which won't, otherwise the result will be wrong anyway. What I see confirms what I think: non-leaf functions continue to use rbx, rsi and rdi, not r8..r11.

So the code for those does not change?
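The trade-off behind "registers which are preserved across a function call" can be illustrated with a deliberately crude cost model. The assumptions are mine, not the thread's: a value kept in a volatile register is stored before and reloaded after every call it lives across (two memory operations per call), while a value in a nonvolatile register costs one save in the prolog and one restore in the epilog, regardless of the number of calls. The function names are hypothetical.

```python
# Crude, assumed cost model in memory operations (not measured data).
def cost_volatile(calls_crossed):
    """Store before and reload after each call the value is live across."""
    return 2 * calls_crossed


def cost_nonvolatile():
    """One save in the prolog and one restore in the epilog."""
    return 2


def cheaper_class(calls_crossed):
    """Which register class is cheaper for a value live across that many calls."""
    volatile = cost_volatile(calls_crossed)
    nonvolatile = cost_nonvolatile()
    if volatile < nonvolatile:
        return "volatile"
    if nonvolatile < volatile:
        return "nonvolatile"
    return "tie"


for calls in (0, 1, 3):
    print(calls, cheaper_class(calls))
```

Under this model a leaf routine (no calls) always favours volatile registers, which matches the indexqword listings, while a value that stays live across several calls favours a nonvolatile register, which is the concern raised for non-leaf functions.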
Re: [fpc-devel] make clean does not delete the fcl-web units
On Sat, 2011-04-09 at 13:05 -0700, ABorka wrote:
> Just as make install does not copy the FCL-web units to the right place, make clean does not remove them either. Here is the output:
>
> . . .
> C:/pp/bin/i386-win32/make.EXE -C fcl-web distclean
> make.EXE[2]: Entering directory `C:/fpc_svn/packages/fcl-web'
> make.EXE[2]: Nothing to be done for `distclean'.
> make.EXE[2]: Leaving directory `C:/fpc_svn/packages/fcl-web'

make clean, or make distclean?

Joost
Re: [fpc-devel] fcl-web is not copied by make install
On Sat, 2011-04-09 at 12:43 -0700, ABorka wrote:
> On 4/9/2011 11:26, Joost van der Sluis wrote:
>> On Sat, 2011-04-09 at 00:22 -0700, ABorka wrote:
>>> Is it intentional that the fcl-web package is not copied when make install is called? make all compiles the units properly, they are just not copied by make install.
>> Are you sure? Which files do you think are not copied? Do you cross-compile?
>> Joost.
> Well, I'm pretty sure. Not even the directory is created for it in the c:/pp/units/i386-win32/ directory when make install is executed. If I create an empty directory for it there, nothing changes; it remains empty. The other fcl packages are copied properly.
>
> For FCL-web these are the make install outputs:
>
> C:/pp/bin/i386-win32/make.EXE -C fcl-web distinstall

distinstall? What kind of beast is that? I'll look into it.

Joost.
Re: [fpc-devel] make clean does not delete the fcl-web units
On 4/9/2011 14:37, Joost van der Sluis wrote:
> On Sat, 2011-04-09 at 13:05 -0700, ABorka wrote:
>> Just as make install does not copy the FCL-web units to the right place, make clean does not remove them either. Here is the output:
>>
>> . . .
>> C:/pp/bin/i386-win32/make.EXE -C fcl-web distclean
>> make.EXE[2]: Entering directory `C:/fpc_svn/packages/fcl-web'
>> make.EXE[2]: Nothing to be done for `distclean'.
>> make.EXE[2]: Leaving directory `C:/fpc_svn/packages/fcl-web'
> make clean, or make distclean?
> Joost

Yes, even though I enter make clean on the command line from the main FPC svn checkout directory, the output is still as I indicated for this package. distinstall and distclean now show up whenever one does a make install or make clean, and not just for this package.