[fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

Hello,

I wonder whether it is possible to assign a priority (or order) to registers in FPC's register
allocator. Currently registers are allocated in the order of the ordinals defined in cpubase.pas. On
i386 it doesn't make any difference, but on x86_64 the 'nonvolatile' rbx (and on Win64 also rsi and rdi)
is always used before the 'volatile' r8..r11. Reversing this order would help avoid stack frames
in simple procedures, resulting in nicer code.


Could somebody share some hints on whether this is possible and where to
start looking?

Regards,
Sergei
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Jonas Maebe

On 09 Apr 2011, at 20:08, Sergei Gorelkin wrote:

> I wonder whether it is possible to assign a priority (or order) of registers
> for FPC's register allocator. Currently registers are allocated in the order
> of ordinals defined in cpubase.pas. On i386 it doesn't make any difference,
> but on x86_64 'nonvolatile' rbx (and in Win64 also rsi and rdi) are always
> used before 'volatile' ones r8..r11. Reversing this order would help avoiding
> stackframes in simple procedures, resulting in nicer code.
>
> Maybe somebody could share some clues about if this is possible and where to
> start looking?

Simply changing the register order in the array passed to trgcpu.create in
Tcgx86_64.init_register_allocators should do it.


Jonas
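
The effect of reordering such a preference list can be illustrated with a minimal sketch. It is not FPC code: the two order lists and the helper names (allocate, registers_to_save) are assumptions made for illustration, and the allocator is reduced to "take the first free register in the list". The only facts relied on are the Win64 nonvolatile registers and the register roles visible in the indexqword listings in this thread.

```python
# Win64 callee-saved ("nonvolatile") general-purpose registers that a
# procedure must restore if it modifies them (rsp/rbp omitted here).
WIN64_NONVOLATILE = {"rbx", "rsi", "rdi", "r12", "r13", "r14", "r15"}

# Hypothetical allocation orders, for illustration only; the real lists
# live in the compiler source and may differ.
ORDER_BEFORE = ["rax", "rdx", "rcx", "rbx", "rsi", "rdi",
                "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
ORDER_AFTER = ["rax", "rdx", "rcx", "r8", "r9", "r10", "r11",
               "rbx", "rsi", "rdi", "r12", "r13", "r14", "r15"]


def allocate(order, occupied, count):
    """Return the first `count` registers in `order` that are not occupied."""
    free = [reg for reg in order if reg not in occupied]
    if count > len(free):
        raise ValueError("not enough free registers; a real allocator would spill")
    return free[:count]


def registers_to_save(assigned):
    """Nonvolatile registers among `assigned`: these need prolog/epilog code."""
    return sorted(set(assigned) & WIN64_NONVOLATILE)


# In the indexqword listings, rcx, rdx and r8 carry the parameters
# and rax carries the result.
occupied = {"rcx", "rdx", "r8", "rax"}

before = allocate(ORDER_BEFORE, occupied, 3)  # psrc, pend and one temporary
after = allocate(ORDER_AFTER, occupied, 3)

print(before, registers_to_save(before))
print(after, registers_to_save(after))
```

With the volatile registers listed first, none of the three extra values lands in a register that has to be saved, which is the situation in which a simple procedure can do without save and restore code.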


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 20:08, Sergei Gorelkin wrote:
> Hello,
>
> I wonder whether it is possible to assign a priority (or order) of
> registers for FPC's register allocator. Currently registers are
> allocated in the order of ordinals defined in cpubase.pas. On i386 it
> doesn't make any difference, but on x86_64 'nonvolatile' rbx (and in
> Win64 also rsi and rdi) are always used before 'volatile' ones r8..r11.
> Reversing this order would help avoiding stackframes in simple
> procedures, resulting in nicer code.
>
> Maybe somebody could share some clues about if this is possible and
> where to start looking?


The registers are allocated in the order defined in
tcgx86_64.init_register_allocators. However, rax etc. come before
rbx etc. there. The reason why rbx etc. are used might be calls to
other procedures. Can you give an example that is affected by the
problem mentioned above?


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

On 09.04.2011 22:26, Sergei Gorelkin wrote:
> On 09.04.2011 22:13, Jonas Maebe wrote:
>> Simply changing the register order in the array to trgcpu.create in
>> Tcgx86_64.init_register_allocators should do it.
>
> Hmm, that was the first thing I tried, but it doesn't seem to make any
> difference :(


No, it works; I simply looked in the wrong place. As usual :-/

The right place to look is a function that does not call other functions, not just any function
that is simple enough.
Attached are assembler listings of system.indexqword() compiled for win64 with -O2, first without and
then with the change. Note that the prolog and epilog are (almost) gone.


This is of course a very quick test, and I'll run the testsuite to check more
thoroughly.
If no issues pop up, is it ok to commit?

Sergei
SYSTEM_INDEXQWORD$formal$INT64$QWORD$$INT64:
; Temps allocated between rsp+32 and rsp+56
; [377] begin
sub rsp,104
; Var buf located in register rcx
; Var len located in register rdx
; Var b located in register r8
; Var $result located in register rax
; Var psrc located in register rbx
; Var pend located in register rsi
mov qword ptr [rsp+32],rbx
mov qword ptr [rsp+40],rdi
mov qword ptr [rsp+48],rsi
; [378] psrc:=@buf;
mov rbx,rcx
; [381] if (len < 0) or
mov rax,rdx
cmp rax,0
jl  @@j373
; [382] (len > high(PtrInt) div 4) or
mov rax,rdx
mov rsi,2305843009213693951
cmp rax,rsi
jg  @@j373
; [383] (psrc+len < psrc) then
mov rax,rdx
shl rax,3
add rax,rbx
cmp rax,rbx
jnb @@j374
@@j373:
; [384] pend:=pqword(high(PtrUInt)-sizeof(qword))
mov rsi,-9
jmp @@j383
@@j374:
; [386] pend:=psrc+len;
shl rdx,3
add rdx,rbx
mov rsi,rdx
; [400] while psrc<pend do
jmp @@j383
ALIGN 8
@@j382:
; [402] if psrc^=b then
mov rdx,qword ptr [rbx]
cmp rdx,r8
jne @@j386
; [404] result:=psrc-pqword(@buf);
mov rdx,rcx
mov rdi,rbx
sub rdi,rdx
mov rdx,rdi
mov rdi,rdx
sar rdi,63
and rdi,7
add rdx,rdi
sar rdx,3
mov rax,rdx
; [405] exit;
jmp @@j369
@@j386:
; [407] inc(psrc);
add rbx,8
@@j383:
mov rdx,rbx
cmp rdx,rsi
jb  @@j382
; [409] result:=-1;
mov rax,-1
@@j369:
; [410] end;
mov rbx,qword ptr [rsp+32]
mov rdi,qword ptr [rsp+40]
mov rsi,qword ptr [rsp+48]
add rsp,104
ret
_TEXT   ENDS
SYSTEM_INDEXQWORD$formal$INT64$QWORD$$INT64:
; Temps allocated between rsp+32 and rsp+32
; [377] begin
sub rsp,72
; Var buf located in register rcx
; Var len located in register rdx
; Var b located in register r8
; Var $result located in register rax
; Var psrc located in register r9
; Var pend located in register r10
; [378] psrc:=@buf;
mov r9,rcx
; [381] if (len < 0) or
mov rax,rdx
cmp rax,0
jl  @@j373
; [382] (len > high(PtrInt) div 4) or
mov rax,rdx
mov r10,2305843009213693951
cmp rax,r10
jg  @@j373
; [383] (psrc+len < psrc) then
mov rax,rdx
shl rax,3
add rax,r9
cmp rax,r9
jnb @@j374
@@j373:
; [384] pend:=pqword(high(PtrUInt)-sizeof(qword))
mov r10,-9
jmp @@j383
@@j374:
; [386] pend:=psrc+len;
shl rdx,3
add rdx,r9
mov r10,rdx
; [400] while psrc<pend do
jmp @@j383
ALIGN 8
@@j382:
; [402] if psrc^=b then
mov rdx,qword ptr [r9]
cmp rdx,r8
jne @@j386
; [404] result:=psrc-pqword(@buf);
mov rdx,rcx
mov r11,r9
sub r11,rdx
mov rdx,r11
mov r11,rdx
sar r11,63
and r11,7
add rdx,r11
sar rdx,3
mov rax,rdx
; [405] exit;
jmp @@j369
@@j386:
; [407] inc(psrc);
add r9,8
@@j383:
mov rdx,r9
cmp rdx,r10
jb  @@j382
; [409] result:=-1;
mov 

Re: [fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

On 09.04.2011 22:15, Florian Klämpfl wrote:
> The registers are allocated in the order defined in
> tcgx86_64.init_register_allocators. However, there are rax etc. in
> front of rbx etc. The reason why rbx etc. are used might be calls to
> other procedures. Can you give an example which is affected by the
> problem mentioned above?


I attached an example to my reply to Jonas, in the adjacent branch of this thread.

Sergei



Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 21:04, Sergei Gorelkin wrote:
> No, it works, I simply looked at the wrong place. As usual :-/
>
> The right place to look is the function not calling other functions, not
> just any function simple enough.
> Attached are assembler listings of system.indexqword() compiled for
> win64 with -O2, with and without the change. Note the prolog and epilog
> (almost) gone.
>
> This is of course a very quick test, and I'll run the testsuite to check
> more thoroughly.
> If no issues pop up, it is ok to commit?

The problem is that this might hurt non-leaf functions. Maybe the register
allocators can be initialized differently for leaf and non-leaf functions?
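
Initializing the allocator differently for leaf and non-leaf procedures could look roughly like the following sketch. It is only an illustration of the idea: the makes_calls flag and the allocation_order function are hypothetical, and the choice to put the nonvolatile registers first in non-leaf procedures is one possible heuristic, not something the thread settles.

```python
# Win64 register classes (stack and frame registers left out).
VOLATILE = ["rax", "rcx", "rdx", "r8", "r9", "r10", "r11"]
NONVOLATILE = ["rbx", "rsi", "rdi", "r12", "r13", "r14", "r15"]


def allocation_order(makes_calls):
    """Choose the order in which the allocator tries registers.

    makes_calls is a hypothetical per-procedure flag: False for a leaf.
    """
    if makes_calls:
        # Non-leaf: values tend to live across calls, so try the
        # registers that a callee preserves first.
        return NONVOLATILE + VOLATILE
    # Leaf: no callee can clobber anything, so try the registers that
    # need no save/restore in the prolog and epilog first.
    return VOLATILE + NONVOLATILE


print(allocation_order(makes_calls=False)[:3])
print(allocation_order(makes_calls=True)[:3])
```

The attraction of this approach is that the allocator itself stays unchanged; only the list it is created with depends on a property of the procedure that is known before allocation starts.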


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Daniël Mantione



On Sat, 9 Apr 2011, Florian Klämpfl wrote:

> Problem is, this might hurt non leaf functions. Maybe the register
> allocators can be initialized differently for leaf and non-leaf functions?


This is a form of biasing: the register allocator is biased to put
certain values in certain registers. It's a very old trick to get better
register allocations, and the iterated coalescing we do gets much better
results than the old biased algorithms.

However, I have noticed that in many cases the iterated coalescing still
leaves a lot of freedom during the actual allocation, and adding some
biasing at that point may be helpful.

I think the challenge is to design some generic infrastructure to tell the
register allocator which biasing it should do, and then to add some
heuristics somewhere else (like leaf/non-leaf) to give the register
allocator the proper instructions.


Daniël
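
The freedom mentioned here shows up in the colour-assignment ("select") step of a graph-colouring allocator: once the neighbours' registers are excluded, any remaining register is legal, and a bias only decides which legal one is tried first. The sketch is a simplified stand-in, not FPC's allocator; select_colors, the preferred mapping and the tiny interference graph are all assumptions made for illustration.

```python
def select_colors(stack, interference, colors, preferred=None):
    """Assign a register to every node, taking nodes from the top of `stack`.

    interference maps a node to the set of nodes it conflicts with.
    preferred maps a node to registers to try first (the bias); it never
    overrides an interference, it only breaks ties among legal choices.
    """
    preferred = preferred or {}
    assignment = {}
    for node in reversed(stack):  # the end of the list is the top of the stack
        taken = {assignment[n] for n in interference.get(node, ()) if n in assignment}
        legal = [c for c in colors if c not in taken]
        if not legal:
            raise RuntimeError(f"{node} would have to be spilled")
        biased = [c for c in preferred.get(node, []) if c in legal]
        assignment[node] = (biased or legal)[0]
    return assignment


colors = ["rbx", "rsi", "r9", "r10"]
graph = {"psrc": {"pend"}, "pend": {"psrc"}}  # both values are live at once
stack = ["pend", "psrc"]

plain = select_colors(stack, graph, colors)
biased = select_colors(stack, graph, colors,
                       preferred={"psrc": ["r9", "r10"], "pend": ["r9", "r10"]})
print(plain)
print(biased)
```

A generic infrastructure in the sense described above would be whatever fills in the preferred mapping; the leaf/non-leaf heuristic is one possible source for it.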


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 21:34, Daniël Mantione wrote:
> I think the challenge is to design some generic infrastructure to tell
> the register allocator about biasing it should do, and then to add some
> heuristics somewhere else (like leaf/non-leaf) to give the register
> allocator the proper instructions.

True, but we don't even find the time to extend the register allocator to
handle overlapping registers better, so starting with different register
allocator initializations is a good approach imo.


Re: [fpc-devel] fcl-web is not copied by make install

2011-04-09 Thread ABorka

On 4/9/2011 11:26, Joost van der Sluis wrote:
> On Sat, 2011-04-09 at 00:22 -0700, ABorka wrote:
>> Is it intentional that the fcl-web package is not copied when make
>> install is called?
>> make all compiles the units properly, they are just not copied by
>> make install.
>
> Are you sure? Which files do you think are not copied? Do you
> cross-compile?
>
> Joost.


Well, I'm pretty sure. Not even the directory is created for it in the
c:/pp/units/i386-win32/ directory when make install is executed.
If I create an empty directory for it there, nothing changes; it remains empty.
The other fcl packages are copied properly.

For fcl-web, this is the make install output:

C:/pp/bin/i386-win32/make.EXE -C fcl-web distinstall
make.EXE[4]: Entering directory `C:/fpc_svn/packages/fcl-web'
C:/fpc_svn/compiler/ppc386.exe fpmake.pp -Ur -Xs -O2 -n 
-FuC:/fpc_svn/rtl/units/i386-win32 
-FuC:/fpc_svn/packages/hash/units/i386-win32 
-FuC:/fpc_svn/packages/paszlib/units/i386-win32 
-FuC:/fpc_svn/packages/fcl-process/units/i386-win32 
-FuC:/fpc_svn/packages/fpmkunit/units/i386-win32 -FE. 
-FUunits/i386-win32 -di386 -dRELEASE
.\fpmake.exe install --localunitdir=../.. --globalunitdir=.. --os=win32 
--cpu=i386 -o -Ur -o -Xs -o -O2 -o -n -o 
-FuC:/fpc_svn/rtl/units/i386-win32 -o 
-FuC:/fpc_svn/packages/hash/units/i386-win32 -o 
-FuC:/fpc_svn/packages/paszlib/units/i386-win32 -o 
-FuC:/fpc_svn/packages/fcl-process/units/i386-win32 -o 
-FuC:/fpc_svn/packages/fpmkunit/units/i386-win32 -o -FE. -o 
-FUunits/i386-win32 -o -di386 -o -dRELEASE 
--compiler=C:/fpc_svn/compiler/ppc386.exe --prefix=

Installation package fcl-web for target i386-win32 succeeded
make.EXE[4]: Leaving directory `C:/fpc_svn/packages/fcl-web'


For fastcgi the make install lines are:

C:/pp/bin/i386-win32/make.EXE -C fastcgi distinstall
make.EXE[4]: Entering directory `C:/fpc_svn/packages/fastcgi'
C:/fpc_svn/utils/fpcm/fpcmake.exe -p -Ti386-win32 Makefile.fpc
Processing Makefile.fpc
 Writing Package.fpc
C:/pp/bin/i386-win32/ginstall.exe -m 755 -d /pp/units/i386-win32/fastcgi
C:/pp/bin/i386-win32/cp.exe -fp Package.fpc /pp/units/i386-win32/fastcgi
C:/pp/bin/i386-win32/ginstall.exe -m 755 -d /pp/units/i386-win32/fastcgi
C:/pp/bin/i386-win32/cp.exe -fp units/i386-win32/fastcgi.ppu 
/pp/units/i386-win32/fastcgi
C:/pp/bin/i386-win32/cp.exe -fp units/i386-win32/fastcgi.o 
/pp/units/i386-win32/fastcgi

make.EXE[4]: Leaving directory `C:/fpc_svn/packages/fastcgi'


It seems something is missing for fcl-web, because cp.exe is not
called at all to copy over the units.


AB



Re: [fpc-devel] fcl-web is not copied by make install

2011-04-09 Thread ABorka

On 4/9/2011 12:43, ABorka wrote:
> It seems, something is missing for FCL-web, because there is no cp.exe
> called at all to copy over the units.


Actually, it seems this one package is copied to the wrong place, not to
c:/pp/ like the others.




[fpc-devel] make clean does not delete the fcl-web units

2011-04-09 Thread ABorka
Just as make install does not copy the fcl-web units to the
right place, make clean does not remove them either. Here is the output:


.
.
.
C:/pp/bin/i386-win32/make.EXE -C fcl-web distclean
make.EXE[2]: Entering directory `C:/fpc_svn/packages/fcl-web'
make.EXE[2]: Nothing to be done for `distclean'.
make.EXE[2]: Leaving directory `C:/fpc_svn/packages/fcl-web'
.
.
.


It leaves the units/... files in there.
Win XP 32bit, FPC 2.5.1 SVN trunk, everything is the default



Re: [fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

On 09.04.2011 23:10, Florian Klämpfl wrote:
> Problem is, this might hurt non leaf functions. Maybe the register
> allocators can be initialized differently for leaf and non-leaf functions?


I understand the concern, but it should be handled somehow already. If we consider a non-leaf
function that is complex enough to consume all 14 registers, what difference does the order of
allocation make? When making a call, the compiler must know which registers will be destroyed and
which won't, otherwise the result will be wrong anyway.

What I see confirms what I think: non-leaf functions continue to use rbx, rsi
and rdi, not r8..r11.
I must admit I don't understand how this happens: trgobj.preserved_by_proc is not read anywhere, and
saved_standard_registers is only used in the prolog and epilog generation code.


Sergei
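
One possible explanation, which the thread itself does not confirm, is the usual treatment of calls in graph-colouring allocators: a call counts as overwriting every volatile register, so a value that is live across the call conflicts with all of them regardless of the allocation order. The sketch only illustrates that reasoning; the candidates function, the ORDER list and the instruction indexes are hypothetical.

```python
# Registers a Win64 callee may overwrite.
VOLATILE = {"rax", "rcx", "rdx", "r8", "r9", "r10", "r11"}

# Hypothetical volatile-first allocation order, for illustration only.
ORDER = ["rax", "rcx", "rdx", "r8", "r9", "r10", "r11", "rbx", "rsi", "rdi"]


def candidates(live_range, call_sites, order=ORDER):
    """Registers a value may use, in order of preference.

    live_range is (first_def, last_use) as instruction indexes and
    call_sites lists the indexes of call instructions; both are
    illustrative inputs, not a real compiler data structure.
    """
    first_def, last_use = live_range
    live_across_call = any(first_def < pos < last_use for pos in call_sites)
    if live_across_call:
        # The call clobbers every volatile register, so the value
        # conflicts with all of them and only preserved ones remain.
        return [reg for reg in order if reg not in VOLATILE]
    return list(order)


calls = [5]
print(candidates((2, 9), calls)[0])  # value that is live across the call
print(candidates((6, 8), calls)[0])  # short-lived temporary after the call
```

Under this model a changed order affects only values that never cross a call, which would be consistent with non-leaf functions continuing to use rbx, rsi and rdi.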


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 22:22, Sergei Gorelkin wrote:
> On 09.04.2011 23:10, Florian Klämpfl wrote:
>> Problem is, this might hurt non leaf functions. Maybe the register
>> allocators can be initialized differently for leaf and non-leaf
>> functions?
>
> I understand the concern, but it should be handled somehow already. If
> we consider a non-leaf function that is complex enough to consume all 14
> registers, what difference does the order of allocation make?

It doesn't need to use all 14; it might still be more beneficial to use
those which are preserved across a function call.

> When
> making a call, it must know which registers will be destroyed and which
> won't, otherwise result will be wrong anyway.
> What I see confirms what I think: non-leaf functions continue to use
> rbx, rsi and rdi, not r8..r11.

So the code for those does not change?


Re: [fpc-devel] make clean does not delete the fcl-web units

2011-04-09 Thread Joost van der Sluis
On Sat, 2011-04-09 at 13:05 -0700, ABorka wrote:
> Just like the make install does not copy the FCL-web units to the
> right place, make clean does not remove them either. Here is the output:
>
> .
> .
> .
> C:/pp/bin/i386-win32/make.EXE -C fcl-web distclean
> make.EXE[2]: Entering directory `C:/fpc_svn/packages/fcl-web'
> make.EXE[2]: Nothing to be done for `distclean'.
> make.EXE[2]: Leaving directory `C:/fpc_svn/packages/fcl-web'

make clean <> make distclean?

Joost



Re: [fpc-devel] fcl-web is not copied by make install

2011-04-09 Thread Joost van der Sluis
On Sat, 2011-04-09 at 12:43 -0700, ABorka wrote:
> On 4/9/2011 11:26, Joost van der Sluis wrote:
>> On Sat, 2011-04-09 at 00:22 -0700, ABorka wrote:
>>> Is it intentional that the fcl-web package is not copied when make
>>> install is called?
>>> make all compiles the units properly, they are just not copied by
>>> make install.
>>
>> Are you sure? Which files do you think are not copied? Do you
>> cross-compile?
>>
>> Joost.
>
> Well, I'm pretty sure. Not even the directory is created for it in the
> c:/pp/units/i386-win32/ directory when make install is executed.
> If I create an empty directory for it there, no change, it remains empty.
> The other fcl packages are copied properly.
>
> For FCL-web these are the make install outputs:
>
> C:/pp/bin/i386-win32/make.EXE -C fcl-web distinstall

distinstall? What kind of beast is that?

I'll look into it.

Joost.




Re: [fpc-devel] make clean does not delete the fcl-web units

2011-04-09 Thread ABorka

On 4/9/2011 14:37, Joost van der Sluis wrote:
> On Sat, 2011-04-09 at 13:05 -0700, ABorka wrote:
>> Just like the make install does not copy the FCL-web units to the
>> right place, make clean does not remove them either. Here is the output:
>>
>> .
>> .
>> .
>> C:/pp/bin/i386-win32/make.EXE -C fcl-web distclean
>> make.EXE[2]: Entering directory `C:/fpc_svn/packages/fcl-web'
>> make.EXE[2]: Nothing to be done for `distclean'.
>> make.EXE[2]: Leaving directory `C:/fpc_svn/packages/fcl-web'
>
> make clean <> make distclean?
>
> Joost


Yes, even though I enter make clean on the command line from the
main FPC svn checkout directory, the output is still as I indicated for
this package.
distinstall and distclean are what now gets run when one does a make
install or make clean, and not just for this package.

