Re: [fpc-devel] Register allocation question

2011-04-10 Thread Florian Klämpfl
On 10.04.2011 12:38, Sergei Gorelkin wrote:
> 
> By now I have run the test suite on x86_64-linux, without regressions.

Feel free to commit it then.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Register allocation question

2011-04-10 Thread Sergei Gorelkin

10.04.2011 00:49, Florian Klämpfl wrote:
> On 09.04.2011 22:22, Sergei Gorelkin wrote:
>> 09.04.2011 23:10, Florian Klämpfl wrote:
>>>
>>> Problem is, this might hurt non-leaf functions. Maybe the register
>>> allocators can be initialized differently for leaf and non-leaf
>>> functions?
>>
>> I understand the concern, but it should be handled somehow already. If
>> we consider a non-leaf function that is complex enough to consume all 14
>> registers, what difference does the order of allocation make?
>
> It is not needed to use all 14, but it might be more beneficial to use
> those which are preserved across a function call.
>
>> When
>> making a call, it must know which registers will be destroyed and which
>> won't, otherwise the result will be wrong anyway.
>> What I see confirms what I think: non-leaf functions continue to use
>> rbx, rsi and rdi, not r8..r11.
>
> So the code for those does not change?


Some do not change (that's why I initially wrote that it doesn't work), others change, but
never for the worse. For example, if a register was rbx but its live range did not intersect
the call, it is replaced by a volatile one like r8. If its live range intersects the call, it can
be changed to another nonvolatile register like rdi. Likewise, registers within the volatile group are
interchanged. But I don't see it replacing a nonvolatile register with a volatile one if that would
require adding spilling instructions.
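The behaviour described above is what one would expect if a call is modelled as clobbering every volatile register: a value live across the call then cannot take any of them, while other values are free to use the volatile ones first. The following is a minimal Python sketch of that rule, not FPC code; it assumes the Win64 register sets, a volatile-first priority order and a hypothetical live-range representation (start/end instruction indices), and all names in it are made up for illustration.

```python
# Win64 volatile (caller-saved) general-purpose registers.
WIN64_VOLATILE = {"rax", "rcx", "rdx", "r8", "r9", "r10", "r11"}
# Illustrative volatile-first priority order of the 14 allocatable GPRs.
VOLATILE_FIRST = ["rax", "rcx", "rdx", "r8", "r9", "r10", "r11",
                  "rbx", "rsi", "rdi", "r12", "r13", "r14", "r15"]


def crosses_call(live_range, call_sites):
    """True if a call happens strictly inside the value's live range."""
    start, end = live_range
    return any(start < call < end for call in call_sites)


def pick_register(live_range, call_sites, busy, order=VOLATILE_FIRST):
    """Return the first free register the value may legally occupy.

    A value live across a call must avoid the volatile registers (the call
    would destroy them), which leaves only nonvolatile ones. Returns None
    when nothing is left, i.e. the value would have to be spilled.
    """
    forbidden = set(busy)
    if crosses_call(live_range, call_sites):
        forbidden |= WIN64_VOLATILE
    for reg in order:
        if reg not in forbidden:
            return reg
    return None


calls = [10]                                         # one call at instruction 10
print(pick_register((2, 8), calls, busy={"rax"}))    # rcx: range ends before the call
print(pick_register((2, 15), calls, busy={"rax"}))   # rbx: range spans the call
print(pick_register((2, 15), calls, busy={"rbx"}))   # rsi: another nonvolatile register
```

Under this rule, reordering the priority list can only move call-crossing values between nonvolatile registers, which matches the observation that nonvolatile registers are never traded for volatile ones that would need spilling.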


By now I have run the test suite on x86_64-linux, without regressions.


Sergei



Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 22:22, Sergei Gorelkin wrote:
> 09.04.2011 23:10, Florian Klämpfl wrote:
>>
>> Problem is, this might hurt non-leaf functions. Maybe the register
>> allocators can be initialized differently for leaf and non-leaf
>> functions?
> 
> I understand the concern, but it should be handled somehow already. If
> we consider a non-leaf function that is complex enough to consume all 14
> registers, what difference does the order of allocation make? 

It is not needed to use all 14, but it might be more beneficial to use
those which are preserved across a function call.

> When
> making a call, it must know which registers will be destroyed and which
> won't, otherwise the result will be wrong anyway.
> What I see confirms what I think: non-leaf functions continue to use
> rbx, rsi and rdi, not r8..r11.

So the code for those does not change?


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

09.04.2011 23:10, Florian Klämpfl wrote:
>
> Problem is, this might hurt non-leaf functions. Maybe the register
> allocators can be initialized differently for leaf and non-leaf functions?


I understand the concern, but it should be handled somehow already. If we consider a non-leaf 
function that is complex enough to consume all 14 registers, what difference does the order of 
allocation make? When making a call, it must know which registers will be destroyed and which won't, 
otherwise the result will be wrong anyway.

What I see confirms what I think: non-leaf functions continue to use rbx, rsi 
and rdi, not r8..r11.
I must admit I don't understand how it happens: trgobj.preserved_by_proc is never read,
and saved_standard_registers only appears in the prolog and epilog generation code.


Sergei


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 21:34, Daniël Mantione wrote:
> I think the challenge is to design some generic infrastructure to tell
> the register allocator about biasing it should do, and then to add some
> heuristics somewhere else (like leaf/non-leaf) to give the register
> allocator the proper instructions.

True, but we don't even find the time to extend the reg. allocator to
handle overlapping registers better, so starting with different register
allocation initializations is a good approach imo.


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Daniël Mantione



On Sat, 9 Apr 2011, Florian Klämpfl wrote:

> On 09.04.2011 21:04, Sergei Gorelkin wrote:
>> 09.04.2011 22:26, Sergei Gorelkin wrote:
>>> 09.04.2011 22:13, Jonas Maebe wrote:
>>>>
>>>> Simply changing the register order in the array to trgcpu.create in
>>>> Tcgx86_64.init_register_allocators should do it.
>>>>
>>> Hmm, that was the first thing I tried, but it doesn't seem to make
>>> any difference :(
>>>
>> No, it works, I simply looked at the wrong place. As usual :-/
>>
>> The right place to look is the function not calling other functions, not
>> just any function simple enough.
>> Attached are assembler listings of system.indexqword() compiled for
>> win64 with -O2, with and without the change. Note the prolog and epilog
>> (almost) gone.
>>
>> This is of course a very quick test, and I'll run the testsuite to check
>> more thoroughly.
>> If no issues pop up, is it ok to commit?
>
> Problem is, this might hurt non-leaf functions. Maybe the register
> allocators can be initialized differently for leaf and non-leaf functions?


This is a form of "biasing": the register allocator is biased to put
certain values in certain registers. It's a very old trick to get better 
register allocations, and the iterated coalescing we do gets much better 
results than old biased algorithms.


However, I have noted that in many cases the iterated coalescing still
leaves a lot of freedom during the actual allocation, and adding some
biasing at this point may be helpful.


I think the challenge is to design some generic infrastructure to tell the
register allocator about the biasing it should do, and then to add some
heuristics somewhere else (like leaf/non-leaf) to give the register 
allocator the proper instructions.
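One way to read this proposal: keep the coloring step generic, let it accept a per-node preference order, and compute that order with a separate heuristic. The Python sketch below shows only the final select step of an iterated-coalescing (George/Appel-style) allocator, where nodes popped from the simplify stack get the first legal color in their preferred order. The graph, the bias heuristic and every name here are hypothetical illustrations, not part of FPC's trgobj.

```python
# Win64 nonvolatile (callee-saved) general-purpose registers.
WIN64_NONVOLATILE = {"rbx", "rbp", "rsi", "rdi", "r12", "r13", "r14", "r15"}


def select_colors(stack, interference, colors, bias):
    """Assign registers to nodes popped from the simplify stack.

    `interference` maps each node to the set of nodes it conflicts with.
    `bias(node, legal)` gets the legal registers in default order and returns
    them in preferred order; the coloring loop itself knows nothing about
    leaf/non-leaf or calls.
    """
    assignment = {}
    for node in reversed(stack):              # last pushed is colored first
        used = {assignment[n] for n in interference[node] if n in assignment}
        legal = [c for c in colors if c not in used]
        if not legal:
            assignment[node] = None           # actual spill
            continue
        assignment[node] = bias(node, legal)[0]
    return assignment


def make_call_bias(live_across_call):
    """Heuristic kept outside the allocator: values live across a call
    prefer nonvolatile registers, all others prefer volatile ones."""
    def bias(node, legal):
        want_nonvolatile = node in live_across_call
        # sorted() is stable, so the default order survives within each group.
        return sorted(legal, key=lambda reg: (reg in WIN64_NONVOLATILE) != want_nonvolatile)
    return bias


colors = ["rax", "rbx", "rsi", "r8", "r9"]          # deliberately tiny register file
graph = {"a": {"c"}, "b": {"c"}, "c": {"a", "b"}}   # symmetric interference
stack = ["a", "b", "c"]                             # as left by simplify
print(select_colors(stack, graph, colors, lambda node, legal: legal))
# {'c': 'rax', 'b': 'rbx', 'a': 'rbx'}
print(select_colors(stack, graph, colors, make_call_bias({"b"})))
# {'c': 'rax', 'b': 'rbx', 'a': 'r8'}
```

Without bias, node a lands in nonvolatile rbx even though it never crosses a call; with the bias, the same legal choice set yields r8 and only b keeps a nonvolatile register.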


Daniël


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 21:04, Sergei Gorelkin wrote:
> 09.04.2011 22:26, Sergei Gorelkin wrote:
>> 09.04.2011 22:13, Jonas Maebe wrote:
>>>
>>> Simply changing the register order in the array to trgcpu.create in
>>> Tcgx86_64.init_register_allocators should do it.
>>>
>> Hmm, that was the first thing I tried, but it doesn't seem to make
>> any difference :(
>>
> No, it works, I simply looked at the wrong place. As usual :-/
> 
> The right place to look is the function not calling other functions, not
> just any function simple enough.
> Attached are assembler listings of system.indexqword() compiled for
> win64 with -O2, with and without the change. Note the prolog and epilog
> (almost) gone.
> 
> This is of course a very quick test, and I'll run the testsuite to check
> more thoroughly.
> If no issues pop up, is it ok to commit?

Problem is, this might hurt non-leaf functions. Maybe the register
allocators can be initialized differently for leaf and non-leaf functions?
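A minimal sketch of that idea in Python, assuming a hypothetical instruction list in which calls are recognisable by their opcode: decide whether the procedure is a leaf, then hand the allocator a volatile-first order for leaves and a nonvolatile-first order otherwise. The register orders follow the Win64 convention and, like the function names, are illustrative rather than taken from FPC.

```python
# Illustrative Win64 register groups (rsp and rbp are not allocatable here).
WIN64_VOLATILE_ORDER = ["rax", "rcx", "rdx", "r8", "r9", "r10", "r11"]
WIN64_NONVOLATILE_ORDER = ["rbx", "rsi", "rdi", "r12", "r13", "r14", "r15"]


def is_leaf(instructions):
    """A procedure is a leaf if none of its instructions is a call.

    Instructions are hypothetical (opcode, operand...) tuples.
    """
    return not any(op == "call" for op, *_ in instructions)


def initial_register_order(instructions):
    """Pick the allocator's priority order for one procedure.

    Leaf procedures try volatile registers first, so simple code needs no
    register saves in the prolog/epilog. Non-leaf procedures try nonvolatile
    registers first, so values tend to survive calls without spilling.
    """
    if is_leaf(instructions):
        return WIN64_VOLATILE_ORDER + WIN64_NONVOLATILE_ORDER
    return WIN64_NONVOLATILE_ORDER + WIN64_VOLATILE_ORDER


leaf = [("mov", "r9", "rcx"), ("shl", "rdx", 3), ("ret",)]
non_leaf = [("mov", "rbx", "rcx"), ("call", "some_proc"), ("ret",)]
print(initial_register_order(leaf)[:4])      # ['rax', 'rcx', 'rdx', 'r8']
print(initial_register_order(non_leaf)[:3])  # ['rbx', 'rsi', 'rdi']
```

A per-procedure order like this is coarser than per-value biasing: in a non-leaf procedure it also pushes values that never cross a call towards nonvolatile registers.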


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

09.04.2011 22:15, Florian Klämpfl wrote:
>
> The registers are allocated in the order defined in
> tcgx86_64.init_register_allocators. However, there are rax etc. in
> front of rbx etc. The reason why rbx etc. are used might be calls to
> other procedures. Can you give an example which is affected by the
> problem mentioned above?


I attached an example to my reply to Jonas, in the adjacent branch.

Sergei



Re: [fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

09.04.2011 22:26, Sergei Gorelkin wrote:
> 09.04.2011 22:13, Jonas Maebe wrote:
>>
>> Simply changing the register order in the array to trgcpu.create in
>> Tcgx86_64.init_register_allocators should do it.
>>
> Hmm, that was the first thing I tried, but it doesn't seem to make any
> difference :(


No, it works, I simply looked at the wrong place. As usual :-/

The right place to look is the function not calling other functions, not just any function simple 
enough.
Attached are assembler listings of system.indexqword() compiled for win64 with -O2, with and without 
the change. Note the prolog and epilog (almost) gone.


This is of course a very quick test, and I'll run the testsuite to check more 
thoroughly.
If no issues pop up, is it ok to commit?

Sergei

Without the change:

SYSTEM_INDEXQWORD$formal$INT64$QWORD$$INT64:
; Temps allocated between rsp+32 and rsp+56
; [377] begin
sub rsp,104
; Var buf located in register rcx
; Var len located in register rdx
; Var b located in register r8
; Var $result located in register rax
; Var psrc located in register rbx
; Var pend located in register rsi
mov qword ptr [rsp+32],rbx
mov qword ptr [rsp+40],rdi
mov qword ptr [rsp+48],rsi
; [378] psrc:=@buf;
mov rbx,rcx
; [381] if (len < 0) or
mov rax,rdx
cmp rax,0
jl  @@j373
; [382] (len > high(PtrInt) div 4) or
mov rax,rdx
mov rsi,2305843009213693951
cmp rax,rsi
jg  @@j373
; [383] (psrc+len < psrc) then
mov rax,rdx
shl rax,3
add rax,rbx
cmp rax,rbx
jnb @@j374
@@j373:
; [384] pend:=pqword(high(PtrUInt)-sizeof(qword))
mov rsi,-9
jmp @@j383
@@j374:
; [386] pend:=psrc+len;
shl rdx,3
add rdx,rbx
mov rsi,rdx
; [400] while psrc

With the change:

SYSTEM_INDEXQWORD$formal$INT64$QWORD$$INT64:
; Temps allocated between rsp+32 and rsp+32
; [377] begin
sub rsp,72
; Var buf located in register rcx
; Var len located in register rdx
; Var b located in register r8
; Var $result located in register rax
; Var psrc located in register r9
; Var pend located in register r10
; [378] psrc:=@buf;
mov r9,rcx
; [381] if (len < 0) or
mov rax,rdx
cmp rax,0
jl  @@j373
; [382] (len > high(PtrInt) div 4) or
mov rax,rdx
mov r10,2305843009213693951
cmp rax,r10
jg  @@j373
; [383] (psrc+len < psrc) then
mov rax,rdx
shl rax,3
add rax,r9
cmp rax,r9
jnb @@j374
@@j373:
; [384] pend:=pqword(high(PtrUInt)-sizeof(qword))
mov r10,-9
jmp @@j383
@@j374:
; [386] pend:=psrc+len;
shl rdx,3
add rdx,r9
mov r10,rdx
; [400] while psrc


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

09.04.2011 22:13, Jonas Maebe wrote:
>
> On 09 Apr 2011, at 20:08, Sergei Gorelkin wrote:
>
>> I wonder whether it is possible to assign a priority (or order) of registers
>> for FPC's register allocator. Currently registers are allocated in the order of
>> ordinals defined in cpubase.pas. On i386 it doesn't make any difference, but on
>> x86_64 'nonvolatile' rbx (and in Win64 also rsi and rdi) are always used before
>> 'volatile' ones r8..r11. Reversing this order would help avoid stack frames
>> in simple procedures, resulting in nicer code.
>>
>> Maybe somebody could share some clues about whether this is possible and where to
>> start looking?
>
> Simply changing the register order in the array to trgcpu.create in
> Tcgx86_64.init_register_allocators should do it.


Hmm, that was the first thing I tried, but it doesn't seem to make any
difference :(

Sergei




Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 20:08, Sergei Gorelkin wrote:
> Hello,
> 
> I wonder whether it is possible to assign a priority (or order) of
> registers for FPC's register allocator. Currently registers are
> allocated in the order of ordinals defined in cpubase.pas. On i386 it
> doesn't make any difference, but on x86_64 'nonvolatile' rbx (and in
> Win64 also rsi and rdi) are always used before 'volatile' ones r8..r11.
> Reversing this order would help avoid stack frames in simple
> procedures, resulting in nicer code.
> 
> Maybe somebody could share some clues about whether this is possible and
> where to start looking?


The registers are allocated in the order defined in
tcgx86_64.init_register_allocators. However, there are rax etc. in
front of rbx etc. The reason why rbx etc. are used might be calls to
other procedures. Can you give an example which is affected by the
problem mentioned above?


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Jonas Maebe

On 09 Apr 2011, at 20:08, Sergei Gorelkin wrote:

> I wonder whether it is possible to assign a priority (or order) of registers 
> for FPC's register allocator. Currently registers are allocated in the order 
> of ordinals defined in cpubase.pas. On i386 it doesn't make any difference, 
> but on x86_64 'nonvolatile' rbx (and in Win64 also rsi and rdi) are always 
> used before 'volatile' ones r8..r11. Reversing this order would help avoid
> stack frames in simple procedures, resulting in nicer code.
> 
> Maybe somebody could share some clues about whether this is possible and where to
> start looking?

Simply changing the register order in the array to trgcpu.create in 
Tcgx86_64.init_register_allocators should do it.
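A toy first-fit allocator shows why reordering that array can be enough for simple leaf code. This Python sketch is not the FPC implementation: it ignores interference and hands each new value the first free register in priority order, with parameters pre-assigned as in the Win64 indexqword listing (buf in rcx, len in rdx, b in r8, result in rax). Both orders are illustrative assumptions; the actual array in Tcgx86_64.init_register_allocators may differ.

```python
# Two illustrative priority orders for the 14 allocatable x86_64 GPRs
# (rsp and rbp excluded). The first follows the hardware encoding order,
# the second puts the Win64 volatile registers first.
ENCODING_ORDER = ["rax", "rcx", "rdx", "rbx", "rsi", "rdi",
                  "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
VOLATILE_FIRST = ["rax", "rcx", "rdx", "r8", "r9", "r10", "r11",
                  "rbx", "rsi", "rdi", "r12", "r13", "r14", "r15"]


def allocate_first_fit(values, order, precolored):
    """Give each value the first register in `order` that is still free.

    Simplification: every value is treated as live for the whole procedure,
    so no register is ever reused.
    """
    assignment = dict(precolored)
    taken = set(precolored.values())
    for value in values:
        for reg in order:
            if reg not in taken:
                assignment[value] = reg
                taken.add(reg)
                break
        else:
            raise RuntimeError(f"no free register for {value}; a spill is needed")
    return assignment


params = {"buf": "rcx", "len": "rdx", "b": "r8", "result": "rax"}
old = allocate_first_fit(["psrc", "pend"], ENCODING_ORDER, params)
new = allocate_first_fit(["psrc", "pend"], VOLATILE_FIRST, params)
print(old["psrc"], old["pend"])   # rbx rsi
print(new["psrc"], new["pend"])   # r9 r10
```

This reproduces the register choices visible in the two indexqword listings (psrc/pend in rbx/rsi before the change, in r9/r10 after it); the real allocator also has to respect interference, which this toy model leaves out.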


Jonas


[fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

Hello,

I wonder whether it is possible to assign a priority (or order) of registers for FPC's register 
allocator. Currently registers are allocated in the order of ordinals defined in cpubase.pas. On 
i386 it doesn't make any difference, but on x86_64 'nonvolatile' rbx (and in Win64 also rsi and rdi) 
are always used before 'volatile' ones r8..r11. Reversing this order would help avoid stack frames
in simple procedures, resulting in nicer code.


Maybe somebody could share some clues about whether this is possible and where to
start looking?
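To make the volatile/nonvolatile distinction concrete, here is a small Python sketch (not FPC code) that computes which registers a procedure would have to save in its prolog and restore in its epilog, given the general-purpose registers it uses. The register sets follow the Win64 and System V x86_64 calling conventions (XMM registers are ignored); the function and constant names are made up for illustration.

```python
# Callee-saved ("nonvolatile") general-purpose registers for each x86_64
# calling convention. rsp is left out: it is never handed out by the allocator.
NONVOLATILE = {
    "win64": {"rbx", "rbp", "rdi", "rsi", "r12", "r13", "r14", "r15"},
    "sysv": {"rbx", "rbp", "r12", "r13", "r14", "r15"},
}


def registers_to_save(used_registers, abi):
    """Return, sorted, the registers the prolog must save and the epilog restore.

    Only nonvolatile registers that the procedure actually uses need saving;
    the volatile ones (rax, rcx, rdx, r8..r11, and rsi/rdi on System V)
    may be clobbered freely by the callee.
    """
    return sorted(set(used_registers) & NONVOLATILE[abi])


# A hypothetical leaf procedure that keeps temporaries in rbx, rsi and rdi:
used = {"rax", "rcx", "rdx", "r8", "rbx", "rsi", "rdi"}
print(registers_to_save(used, "win64"))  # ['rbx', 'rdi', 'rsi']
print(registers_to_save(used, "sysv"))   # ['rbx']

# The same procedure with its temporaries in r9/r10 needs no saves at all.
print(registers_to_save({"rax", "rcx", "rdx", "r8", "r9", "r10"}, "win64"))  # []
```

This also shows why the effect is larger on Win64: rsi and rdi are nonvolatile there, so each of them costs a save/restore pair, whereas on System V only rbx does.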

Regards,
Sergei