Re: [fpc-devel] Prototype optimisation... Sliding Window

J. Gareth Moreton via fpc-devel Fri, 25 Feb 2022 06:19:34 -0800


On 25/02/2022 08:29, Marco Borsari via fpc-devel wrote:

This is very useful, thank you.
I think FPC has an excellent register allocator, but it is frustrated on
32-bit by scarce resources and by the lack of a reloading check.


Unfortunately the equivalent procedure isn't optimised on i386-win32:

.Lj679:
    movl    %eax,%edx
.Lj680:
    movl    %edx,-832(%ebp)
    leal    (,%edx,8),%ecx
    movl    -824(%ebp),%edx
    movl    76(%edx),%eax
    cltd
    idivl    %ecx
    imull    -832(%ebp),%eax
    movl    %eax,-828(%ebp)
    addl    8(%ebp),%eax
    movl    %eax,-828(%ebp)
    movl    -832(%ebp),%eax
    leal    (,%eax,8),%ecx
    movl    -824(%ebp),%edx
    movl    76(%edx),%eax
    cltd
    idivl    %ecx
    movl    %edx,%esi

The compiler has no way of knowing that -832(%ebp) contains the value
that %edx had at the start, and hence that this same value is what gets
loaded into %eax in the repeated sequence (%eax is used for the initial
address instead of %edx, although the optimisation would still fail even
if they used the same registers). A lot of these optimisations may
require a means of adding 'hints' to the assembly language list to
indicate the state of things.
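To illustrate the kind of bookkeeping such hints would make possible, here is a minimal sketch in Python. It is a toy value-numbering pass, not FPC's peephole optimiser and not the mechanism in any merge request; the function name, the tuple format and the "lea8" pseudo-opcode (standing in for "leal (,%reg,8)") are all made up for this example. It also assumes that nothing else writes to the stack slots in between, which a real compiler would have to prove.

```python
from itertools import count


def value_numbers(instructions):
    """Return the value number written by each (op, source, destination).

    Hypothetical toy model: operands are opaque strings, every
    instruction has exactly one source and one destination, and
    memory aliasing is ignored entirely.
    """
    fresh = count()
    location_vn = {}  # register or stack slot -> value number it holds
    expr_vn = {}      # (op, source value number) -> value number of result
    written = []
    for op, src, dst in instructions:
        if src not in location_vn:
            location_vn[src] = next(fresh)  # first sighting: unknown value
        src_vn = location_vn[src]
        if op == "mov":
            dst_vn = src_vn  # a plain copy keeps the same value
        else:
            # The same operation on the same value yields the same result.
            key = (op, src_vn)
            if key not in expr_vn:
                expr_vn[key] = next(fresh)
            dst_vn = expr_vn[key]
        location_vn[dst] = dst_vn
        written.append(dst_vn)
    return written


# A cut-down version of the i386 listing above.
SAMPLE = [
    ("mov", "%edx", "-832(%ebp)"),   # movl %edx,-832(%ebp)
    ("lea8", "%edx", "%ecx"),        # leal (,%edx,8),%ecx
    ("mov", "-824(%ebp)", "%edx"),   # movl -824(%ebp),%edx
    ("mov", "-832(%ebp)", "%eax"),   # movl -832(%ebp),%eax
    ("lea8", "%eax", "%ecx"),        # leal (,%eax,8),%ecx
]

print(value_numbers(SAMPLE))  # [0, 1, 2, 0, 1]
```

The two "lea8" entries receive the same value number because the store to -832(%ebp) was recorded, so the reload into %eax is known to carry the old %edx value. Without that record, the second leal looks like a fresh computation, which is the situation described above.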


A more minor example in the same unit (dbgdwarf):

    movl    %eax,%esi
    movl    60(%eax),%edx
    movl    -564(%ebp),%eax
    cmpl    72(%eax),%edx
    jl    .Lj359
    movl    60(%esi),%edx
    movl    -564(%ebp),%eax
    cmpl    76(%eax),%edx

This only gets optimised to:

    movl    %eax,%esi
    movl    60(%eax),%edx
    movl    -564(%ebp),%eax
    cmpl    72(%eax),%edx
    jl    .Lj359
    movl    60(%esi),%edx
    cmpl    76(%eax),%edx

This is because the peephole optimiser changes %esi to %eax in the
"movl 60(%eax),%edx" instruction on the grounds that it will minimise a
pipeline stall (it doesn't have to wait for %esi to get loaded when %eax
is definitely loaded). If there were a means of leaving a hint that
%esi = %eax at that point, then it might be possible to better optimise
it to the ideal:


    movl    %eax,%esi
    movl    60(%eax),%edx
    movl    -564(%ebp),%eax
    cmpl    72(%eax),%edx
    jl    .Lj359
    cmpl    76(%eax),%edx

This is what my proposed feature over at
https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/74 is
meant to help with (the showcase uses the "extra optimisation
information" to store information on the state of the upper 32 bits of
registers in x86_64, so it can make deeper optimisations knowing whether
they're set to zero or not).
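As a rough illustration of what a "%esi = %eax" hint buys in the dbgdwarf example, the sketch below tracks which value each register holds and flags loads that would reload a value the destination already contains. This is a hypothetical Python model, not the data structure used in the merge request; the function name and instruction encoding are invented for the example. It assumes straight-line fall-through code and no intervening stores to memory, and it treats cmp and jl as writing no general-purpose register.

```python
from itertools import count


def find_redundant_loads(instructions):
    """Return indices of "mov" instructions that reload a value the
    destination register already holds.

    Hypothetical toy model: a memory operand is a (displacement, base)
    tuple, a register operand is a string, and memory is assumed not
    to change between the instructions.
    """
    fresh = count()
    reg_vn = {}   # register -> value number it currently holds
    load_vn = {}  # (displacement, base value number) -> loaded value number
    redundant = []

    def vn_of(reg):
        if reg not in reg_vn:
            reg_vn[reg] = next(fresh)  # unknown incoming value
        return reg_vn[reg]

    for index, (op, src, dst) in enumerate(instructions):
        if op != "mov":
            continue  # cmp and jl leave the tracked registers alone
        if isinstance(src, tuple):
            disp, base = src
            key = (disp, vn_of(base))
            if key not in load_vn:
                load_vn[key] = next(fresh)
            new_vn = load_vn[key]
        else:
            new_vn = vn_of(src)  # register copy: this is the "hint"
        if reg_vn.get(dst) == new_vn:
            redundant.append(index)
        reg_vn[dst] = new_vn
    return redundant


DBGDWARF = [
    ("mov", "%eax", "%esi"),          # movl %eax,%esi
    ("mov", (60, "%eax"), "%edx"),    # movl 60(%eax),%edx
    ("mov", (-564, "%ebp"), "%eax"),  # movl -564(%ebp),%eax
    ("cmp", (72, "%eax"), "%edx"),    # cmpl 72(%eax),%edx
    ("jl", ".Lj359", None),           # jl .Lj359
    ("mov", (60, "%esi"), "%edx"),    # movl 60(%esi),%edx
    ("mov", (-564, "%ebp"), "%eax"),  # movl -564(%ebp),%eax
    ("cmp", (76, "%eax"), "%edx"),    # cmpl 76(%eax),%edx
]

print(find_redundant_loads(DBGDWARF))  # [5, 6]
```

Because the copy in the first instruction gives %esi and %eax the same value number, the load from 60(%esi) is recognised as the same load as the earlier one from 60(%eax), even though %eax has been overwritten by then. Removing instructions 5 and 6 gives the ideal listing above.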


Some other things might need some deeper thought:

    movl    -16(%ebp),%edx
    movl    (%edx),%eax
    movl    20(%eax),%eax
    movl    20(%eax),%eax
    movzbl    169(%eax),%eax
    pushl    %eax
    movl    -16(%ebp),%edx
    movl    (%edx),%eax

For some reason, the second "movl -16(%ebp),%edx" isn't removed. I'm not
sure yet whether this is because the sliding window is too small (the
first one gets removed due to another "movl -16(%ebp),%edx" that appears
earlier, so this entry does NOT appear in the sliding window, only the
earlier one) or because the compiler makes some incorrect assumptions
about PUSH instructions and hence thinks the value of %edx is unreliable.
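Both suspicions can be reproduced in a small model. The Python sketch below is only an assumption about how a bounded look-back might behave, not the prototype's actual algorithm; the function names, the (text, reads, writes) encoding and the window sizes are invented for illustration. It also assumes that the slot written by the push does not overlap -16(%ebp).

```python
def redundant_repeats(instructions, window):
    """Return indices of instructions that repeat an identical earlier
    instruction found within `window` entries, with nothing in between
    overwriting the registers involved.

    Hypothetical toy model: each entry is (text, reads, writes), where
    reads and writes are sets of register names.
    """
    redundant = []
    for i, (text, reads, writes) in enumerate(instructions):
        if reads & writes:
            continue  # e.g. "movl 20(%eax),%eax" changes its own input
        touched = reads | writes
        # Walk backwards, but no further than the window allows.
        for j in range(i - 1, max(i - window, 0) - 1, -1):
            prev_text, _, prev_writes = instructions[j]
            if prev_text == text:
                redundant.append(i)
                break
            if prev_writes & touched:
                break  # a register we depend on was clobbered
    return redundant


def sample(push_writes=frozenset({"%esp"})):
    """The listing above; push_writes models what PUSH is believed to clobber."""
    return [
        ("movl -16(%ebp),%edx", {"%ebp"}, {"%edx"}),
        ("movl (%edx),%eax", {"%edx"}, {"%eax"}),
        ("movl 20(%eax),%eax", {"%eax"}, {"%eax"}),
        ("movl 20(%eax),%eax", {"%eax"}, {"%eax"}),
        ("movzbl 169(%eax),%eax", {"%eax"}, {"%eax"}),
        ("pushl %eax", {"%eax", "%esp"}, set(push_writes)),
        ("movl -16(%ebp),%edx", {"%ebp"}, {"%edx"}),
        ("movl (%edx),%eax", {"%edx"}, {"%eax"}),
    ]


print(redundant_repeats(sample(), window=8))  # [6]
print(redundant_repeats(sample(), window=4))  # []
print(redundant_repeats(sample({"%esp", "%edx"}), window=8))  # []
```

With a large enough window and PUSH modelled as touching only %esp, the second "movl -16(%ebp),%edx" is found to be redundant. Shrinking the window, or pessimistically treating PUSH as if it clobbered %edx, makes the repeat go unnoticed, so either explanation would produce the observed output.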


Gareth aka. Kit

