Only two changes appear in the compiled RTL, both in SysUtils, but one is pretty significant.  Here's the "IsLeapYear" function under i386-win32:

Trunk:

.section .text.n_sysutils_$$_isleapyear$word$$boolean,"ax"
    .balign 16,0x90.globl    SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN
SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN:
    pushl    %ebx
# Peephole Optimization: Mov2Nop 3 done
# Peephole Optimization: %cx = %ax; changed to minimise pipeline stall (MovXXX2MovXXX)
    movzwl    %ax,%ecx
# Peephole Optimization: MovAndTest2Test done
    testl    $3,%ecx
    jne    .Lj6455
    movl    %ecx,%ebx
    movl    $1374389535,%eax
    mull    %ebx
    shrl    $5,%edx
    imull    $100,%edx
    subl    %edx,%ebx
    jne    .Lj6456
    movl    $1374389535,%eax
    mull    %ecx
    shrl    $7,%edx
    imull    $400,%edx
    subl    %edx,%ecx
    jne    .Lj6455
.Lj6456:
    movb    $1,%al
    jmp    .Lj6460
.Lj6455:
    xorb    %al,%al
.Lj6460:
    popl    %ebx
    ret

New algorithm:

.section .text.n_sysutils_$$_isleapyear$word$$boolean,"ax"
    .balign 16,0x90
.globl    SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN
SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN:
    movzwl    %ax,%edx
    andl    $3,%edx
    jne    .Lj6455
    imulw    $23593,%ax,%dx
    rorw    $2,%dx
    cmpw    $655,%dx
    ja    .Lj6456
    imulw    $23593,%ax,%ax
    rorw    $4,%ax
    cmpw    $163,%ax
    jnbe    .Lj6455
.Lj6456:
    movb    $1,%al
    ret
.Lj6455:
    xorb    %al,%al
    ret

Besides the significantly smaller code, there's significantly less register strain, which may open up some new peephole optimisations (e.g. I could theoretically use the volatile %cx to store the result of "imulw $23593,%ax,%dx" (via "movw %dx,%cx" right after it, which would execute simultaneously with "rorw $2,%dx") and then call "mov %cx,%ax" in place of the second multiplication, after which the peephole optimizer could have a lot of fun with simplification!

(Note: The reason why the multiplication by 23593 appears twice is because that number, or $5C29, is the multiplicative inverse of 25 modulo 2^16. The function is checking for "(Year mod 100) = 0" and "(Year mod 400) <> 0", and 100 = 25 * 2^2 and 400 = 25 * 2^4)

The most optimal code I can theorise so far is:

.section .text.n_sysutils_$$_isleapyear$word$$boolean,"ax"
    .balign 16,0x90
.globl    SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN
SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN:
    testl   $3,%ax
    jne    .Lj6455
    imulw    $23593,%ax,%ax
    rorw    $2,%ax
    cmpw    $655,%ax
    ja    .Lj6456
    rorw    $2,%ax
    cmpw    $163,%ax
    setbeb  %al
    ret
.Lj6456:
    movb    $1,%al
    ret
.Lj6455:
    xorb    %al,%al
    ret

Changing "jnbe .Lj6455" to "setbeb %al; ret" is the easiest to implement, but it's a start!

Gareth aka. Kit


--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

Reply via email to