[fpc-devel] New deep optimisation

J. Gareth Moreton via fpc-devel Fri, 01 Oct 2021 09:11:14 -0700

Hi everyone,

I've started playing around with an optimisation on x86 platforms that looks for common instructions that appear on both branches of a Jcc instruction (i.e. after the label it jumps to and after the jump itself), and so far I'm having a lot of success. For example, in the Math unit - before:

...

# Peephole Optimization: %rdx = %rdi; removed unnecessary instruction (MovMov2MovNop 6b}

    call    fpc_do_is
    testb    %al,%al
    je    .Lj196
    movq    %rdi,%rdx
    movq    %rsi,%rcx
    call    CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
    movb    %al,%bl
    jmp    .Lj197
    .p2align 4,,10
    .p2align 3
.Lj196:
    movq    %rdi,%rdx
    movq    %rsi,%rcx
    call    SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
    movb    %al,%bl
.Lj197:
    movb    %bl,%al
    ...

After:

    ...

# Peephole Optimization: %rdx = %rdi; removed unnecessary instruction (MovMov2MovNop 6b}

    call    fpc_do_is

# Peephole Optimization: Swapped test and mov instructions to improve optimisation potential

    movq    %rdi,%rdx

# Peephole Optimization: Swapped test and mov instructions to improve optimisation potential

    movq    %rsi,%rcx
    testb    %al,%al

# Peephole Optimization: Moved mov instruction common to both branches to before jump
# Peephole Optimization: Moved mov instruction common to both branches to before jump
# Peephole Optimization: Moved destination label ahead of common instructions

    je    .Lj198
    call    CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
    movb    %al,%bl
    jmp    .Lj197
    .p2align 4,,10
    .p2align 3
.Lj198:
    call    SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
    movb    %al,%bl
.Lj197:
    movb    %bl,%al
    ...

In the above example, the parameter setup before the two CALL instructions is identical on both branches, so the optimisation can move those instructions to before the conditional jump.
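As a rough illustration of the idea, here is a toy Python model (not the compiler's actual code). It treats instructions as plain strings and hoists the instructions that start both branches identically. The function name, the string representation and the FLAG_NEUTRAL table are assumptions made for this sketch; only flag-neutral opcodes are hoisted, so the Jcc still sees the flags set by the preceding TEST.

```python
# Toy model: each instruction is an AT&T-syntax string such as "movq %rdi,%rdx".
# Assumption: only these flag-neutral opcodes may be moved ahead of the Jcc.
FLAG_NEUTRAL = {"movq", "movl", "movw", "movb", "leaq", "leal"}


def opcode(instr):
    """Return the mnemonic (first word) of an instruction string."""
    return instr.split()[0]


def hoist_common_prefix(fallthrough, target):
    """Split off the instructions that begin both branches identically.

    fallthrough: instructions directly after the Jcc.
    target:      instructions directly after the label the Jcc jumps to.
    Returns (hoisted, new_fallthrough, new_target).
    """
    hoisted = []
    for a, b in zip(fallthrough, target):
        # Stop at the first mismatch or at anything that might touch the flags.
        if a != b or opcode(a) not in FLAG_NEUTRAL:
            break
        hoisted.append(a)
    n = len(hoisted)
    return hoisted, fallthrough[n:], target[n:]


# The two branches from the "before" listing above.
before_fallthrough = [
    "movq %rdi,%rdx",
    "movq %rsi,%rcx",
    "call CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN",
    "movb %al,%bl",
]
before_target = [
    "movq %rdi,%rdx",
    "movq %rsi,%rcx",
    "call SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN",
    "movb %al,%bl",
]
hoisted, fallthrough, target = hoist_common_prefix(before_fallthrough, before_target)
# hoisted now holds the two movq instructions; each branch starts at its call.
```

The sketch only looks at the start of each branch, so the trailing "movb %al,%bl" that both branches also share is left where it is.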

However, some existing optimisations no longer trigger because they expect, for example, a jump or SETcc instruction to appear directly after a TEST instruction. I can't just track the FLAGS register either, because the optimisation also has to check which condition is being used (e.g. "MovAndTest2Test" requires the condition to be C_E or C_NE).


There are a couple of solutions to this:

- Some optimisations, like those in the post-peephole stage, could be adapted to look further ahead for an appropriate instruction, stopping if they find one or if they find another instruction that modifies the flags. This will make the compiler code more complicated, though.

- Have a flag that tells the compiler to run pass 1 again after pass 2 (and have my common instruction optimisations in pass 2). This would allow deeper optimisations but may cause significant slowdown in the compiler, so I would only recommend this flag be honoured under -O3 and -O4.
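The look-ahead in the first option could be sketched roughly as follows, again as a toy Python model over instruction strings rather than real compiler code. The helper names and the prefix table are assumptions for illustration. The scan is deliberately conservative: it only steps over opcodes known to leave the flags alone, and gives up on anything else.

```python
# Assumption: opcodes starting with these prefixes neither read nor write FLAGS.
FLAG_NEUTRAL_PREFIXES = ("mov", "lea")


def condition_of(instr):
    """Return the condition suffix of a Jcc or SETcc ("e", "ne", ...), else None."""
    op = instr.split()[0]
    if op.startswith("set"):
        return op[3:]
    if op.startswith("j") and not op.startswith("jmp"):
        return op[1:]
    return None


def first_flag_consumer(instrs, start, allowed_conditions):
    """Scan forward from instrs[start] for the first Jcc/SETcc.

    Returns its index if its condition is in allowed_conditions.
    Returns None if the condition is not allowed, or if an instruction
    that may modify the flags is reached first.
    """
    for i in range(start, len(instrs)):
        cond = condition_of(instrs[i])
        if cond is not None:
            return i if cond in allowed_conditions else None
        if not instrs[i].split()[0].startswith(FLAG_NEUTRAL_PREFIXES):
            return None  # possible flag writer: stop looking
    return None


# A TEST separated from its jump by two flag-neutral MOVs.
code = ["testb %al,%al", "movq %rdi,%rdx", "movq %rsi,%rcx", "je .Lj196"]
jump_index = first_flag_consumer(code, 1, {"e", "ne"})
```

This only finds the first consumer. An optimisation that changes how the flags are set would also need to check any later Jcc or SETcc that reads the same flags before they are overwritten.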

I'm trying to weigh the pros and cons of each, not least because in some cases my common instruction optimisations aren't as efficient in pass 2, because other pass 1 optimisations ensure the instructions on either side of the branch are no longer identical.
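For comparison, the second option amounts to a small change in how the passes are driven. The sketch below is a toy Python model in which each pass is simply a function from an instruction list to an instruction list; the names run_peephole and rerun_pass1 are hypothetical and do not correspond to an existing compiler switch.

```python
def run_peephole(code, pass1, pass2, opt_level, rerun_pass1=False):
    """Apply pass 1, then pass 2, then optionally pass 1 once more.

    rerun_pass1 is a hypothetical flag; it is only honoured at
    optimisation level 3 or above to limit compile-time cost.
    """
    code = pass2(pass1(code))
    if rerun_pass1 and opt_level >= 3:
        code = pass1(code)
    return code
```

The extra pass 1 run costs compile time on every procedure, whether or not pass 2 changed anything, which is the slowdown the option warns about.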

Currently I'm seeing if I can avoid rerunning pass 1 and instead improve the problematic optimisations to be more flexible with the location of their SETcc and Jcc instructions.


Gareth aka. Kit

