On 25/02/2022 08:29, Marco Borsari via fpc-devel wrote:
> This is very useful, thank you.
> I think FPC has an excellent register allocator, but it is frustrated on
> 32-bit by scarce resources and by the lack of a reload check.

Unfortunately, the equivalent procedure isn't optimised on i386-win32:

.Lj679:
    movl    %eax,%edx
.Lj680:
    movl    %edx,-832(%ebp)
    leal    (,%edx,8),%ecx
    movl    -824(%ebp),%edx
    movl    76(%edx),%eax
    cltd
    idivl    %ecx
    imull    -832(%ebp),%eax
    movl    %eax,-828(%ebp)
    addl    8(%ebp),%eax
    movl    %eax,-828(%ebp)
    movl    -832(%ebp),%eax
    leal    (,%eax,8),%ecx
    movl    -824(%ebp),%edx
    movl    76(%edx),%eax
    cltd
    idivl    %ecx
    movl    %edx,%esi

The compiler has no way of knowing that -832(%ebp) contains the value that %edx held at the start, and hence that this same value is what gets loaded into %eax in the repeated sequence (%eax is used there for the initial address instead of %edx, although the optimisation would still fail even if both sequences used the same registers).  A lot of these optimisations may require a means of adding 'hints' to the assembly language list to indicate what is known about the contents of registers and memory at a given point.

A more minor example in the same unit (dbgdwarf):

    movl    %eax,%esi
    movl    60(%eax),%edx
    movl    -564(%ebp),%eax
    cmpl    72(%eax),%edx
    jl    .Lj359
    movl    60(%esi),%edx
    movl    -564(%ebp),%eax
    cmpl    76(%eax),%edx

This only gets optimised to:

    movl    %eax,%esi
    movl    60(%eax),%edx
    movl    -564(%ebp),%eax
    cmpl    72(%eax),%edx
    jl    .Lj359
    movl    60(%esi),%edx
    cmpl    76(%eax),%edx

This is because the peephole optimiser rewrites %esi as %eax in what is now the "movl 60(%eax),%edx" instruction, since this minimises a pipeline stall (the load doesn't have to wait for %esi to be written when %eax is definitely already loaded).  If there were a means of leaving a hint that %esi = %eax at that point, it might be possible to optimise it further, to the ideal:

    movl    %eax,%esi
    movl    60(%eax),%edx
    movl    -564(%ebp),%eax
    cmpl    72(%eax),%edx
    jl    .Lj359
    cmpl    76(%eax),%edx
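
As a rough illustration of the bookkeeping such hints would enable, here is a minimal value-numbering sketch in Python. It is not FPC's peephole optimiser or the mechanism in the merge request below; the function name remove_redundant_loads, the tuple encoding of instructions and the handful of modelled opcodes are all hypothetical. Every register and every known memory slot is mapped to a value number, so a copy such as "movl %eax,%esi" makes 60(%esi) and 60(%eax) the same load, and a store such as "movl %edx,-832(%ebp)" records that the slot now holds %edx's value. It assumes straight-line code (a conditional jump is treated as falling through) and that 4-byte slots off the same base only overlap when their displacements are less than 4 bytes apart.

```python
import itertools
import re

# Hypothetical encoding: AT&T-order tuples such as ("movl", "60(%eax)", "%edx").
MEM = re.compile(r"^(-?\d*)\(%(\w+)\)$")        # disp(%base)
LEA_SCALED = re.compile(r"^\(,%(\w+),(\d)\)$")  # (,%index,scale)
NO_WRITE = {"cmpl", "jl"}                       # flags are not tracked
IMPLICIT_WRITES = {"cltd": ["%edx"], "idivl": ["%eax", "%edx"]}


def remove_redundant_loads(block):
    """Drop movl/leal whose destination register already holds the value.

    Straight-line code only: no labels, and a conditional jump is assumed
    to fall through, so everything known before it stays valid after it.
    """
    fresh = itertools.count()
    reg = {}  # register name -> value number
    mem = {}  # (displacement, value number of base) -> value number stored
    lea = {}  # (value number of index, scale) -> value number of result

    def reg_vn(name):
        if name not in reg:  # incoming value: unknown but fixed
            reg[name] = next(fresh)
        return reg[name]

    def mem_key(operand):
        m = MEM.match(operand)
        return (int(m.group(1) or 0), reg_vn(m.group(2)))

    def store(key, vn):
        # Assumption: 4-byte slots off the same base value overlap only when
        # their displacements are < 4 apart; any other base may alias.
        disp, base = key
        for k in list(mem):
            if k[1] != base or abs(k[0] - disp) < 4:
                del mem[k]
        mem[key] = vn  # the "hint": this slot now holds value vn

    def set_reg(operand, vn):
        """Record vn in a register; return False if it was already there."""
        if reg.get(operand[1:]) == vn:
            return False
        reg[operand[1:]] = vn
        return True

    out = []
    for ins in block:
        op, *args = ins
        if op == "movl":
            src, dst = args
            if src.startswith("%"):
                vn = reg_vn(src[1:])
            else:
                key = mem_key(src)
                if key not in mem:
                    mem[key] = next(fresh)
                vn = mem[key]
            if dst.startswith("%"):
                if not set_reg(dst, vn):
                    continue  # redundant load: drop it
            else:
                store(mem_key(dst), vn)
        elif op == "leal":
            m = LEA_SCALED.match(args[0])
            key = (reg_vn(m.group(1)), int(m.group(2)))
            if key not in lea:
                lea[key] = next(fresh)
            if not set_reg(args[1], lea[key]):
                continue  # same address already in the register
        else:
            if op in IMPLICIT_WRITES:
                written = IMPLICIT_WRITES[op]
            elif op in NO_WRITE:
                written = []
            else:
                written = [args[-1]]  # e.g. addl/imull destination
            for w in written:
                if w.startswith("%"):
                    reg[w[1:]] = next(fresh)  # result value unknown
                else:
                    store(mem_key(w), next(fresh))
        out.append(ins)
    return out


DWARF_BLOCK = [
    ("movl", "%eax", "%esi"), ("movl", "60(%eax)", "%edx"),
    ("movl", "-564(%ebp)", "%eax"), ("cmpl", "72(%eax)", "%edx"),
    ("jl", ".Lj359"), ("movl", "60(%esi)", "%edx"),
    ("movl", "-564(%ebp)", "%eax"), ("cmpl", "76(%eax)", "%edx"),
]
print(remove_redundant_loads(DWARF_BLOCK))
```

Applied to DWARF_BLOCK, it keeps the first five instructions followed by "cmpl 76(%eax),%edx", i.e. the ideal sequence above. Applied to the i386-win32 fragment from "movl %edx,-832(%ebp)" through the repeated "leal (,%eax,8),%ecx", it drops that final leal, because the reload of -832(%ebp) gets the same value number as the original %edx and %ecx still holds the scaled result.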

This is what my proposed feature over at https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/74 is meant to help with (the showcase uses the "extra optimisation information" to record the state of the upper 32 bits of registers on x86_64, so that it can make deeper optimisations when it knows whether those bits are zero or not).
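
The x86_64 rule being exploited is that any instruction writing a 32-bit general-purpose register zeroes bits 32 to 63 of the full 64-bit register, whereas 8- and 16-bit writes leave those bits untouched. A minimal Python sketch of tracking that fact follows; it is not the data structure used in the merge request, the names and tuple encoding are hypothetical, the AT&T "l" suffix is used as a crude stand-in for 32-bit operand size, and implicit writes (such as those of idivl or cltq) are not modelled.

```python
# Hypothetical sketch: tuples (op, src, dst) in AT&T order. The "l" suffix
# is used as a crude stand-in for "32-bit operand size".
R32_TO_R64 = {"%eax": "%rax", "%ecx": "%rcx", "%edx": "%rdx", "%esi": "%rsi"}
R64 = set(R32_TO_R64.values())
NO_WRITE = {"cmpl", "testl", "cmpq", "testq"}  # only the flags change


def drop_redundant_zero_extends(block):
    """Remove "movl %reg,%reg" when the upper 32 bits are already known zero."""
    upper_zero = set()  # 64-bit registers whose bits 32..63 are known to be 0
    out = []
    for ins in block:
        op, src, dst = ins
        if op in NO_WRITE:
            pass
        elif dst in R32_TO_R64 and op.endswith("l"):
            full = R32_TO_R64[dst]
            if op == "movl" and src == dst and full in upper_zero:
                continue  # would only clear bits that are already clear
            upper_zero.add(full)  # x86_64: 32-bit writes zero-extend
        elif dst in R64:
            upper_zero.discard(dst)  # 64-bit result: upper half unknown
        out.append(ins)
    return out


BLOCK = [
    ("movq", "(%rdi)", "%rax"),  # 64-bit load: upper half unknown
    ("movl", "%eax", "%eax"),    # needed: clears bits 32..63
    ("addl", "$1", "%eax"),      # 32-bit write: upper half zeroed
    ("movl", "%eax", "%eax"),    # redundant
]
print(drop_redundant_zero_extends(BLOCK))
```

Only the second zero-extension is dropped; the first is kept because nothing is known about %rax after the 64-bit load.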

Some other cases may need deeper thought:

    movl    -16(%ebp),%edx
    movl    (%edx),%eax
    movl    20(%eax),%eax
    movl    20(%eax),%eax
    movzbl    169(%eax),%eax
    pushl    %eax
    movl    -16(%ebp),%edx
    movl    (%edx),%eax

For some reason, the second "movl -16(%ebp),%edx" isn't removed. I'm not sure yet whether this is because the sliding window is too small (the first one in this listing is itself removed because an identical "movl -16(%ebp),%edx" appears earlier, so it is that earlier instruction, not this one, that sits in the sliding window) or because the compiler makes an incorrect assumption about PUSH instructions and therefore treats the value of %edx as unreliable.
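
The PUSH hypothesis can be checked in isolation with a small Python sketch. It is hypothetical and says nothing about how FPC's optimiser actually models PUSH, and it does not model the sliding window at all. It records which operand each register was last loaded from and reports reloads that are redundant. With a conservative model in which PUSH forgets everything, the second "movl -16(%ebp),%edx" is kept; when PUSH is modelled as only changing %esp and writing below the old stack pointer (assumed not to alias %ebp-relative locals or data reached through pointers), it is reported as redundant.

```python
import re

REGS = re.compile(r"%\w+")


def redundant_reloads(block, precise_push):
    """Return indices of register loads whose destination already holds the value.

    Hypothetical sketch: "holds" maps a register to the operand it was last
    loaded from; writing a register invalidates every entry whose operand
    uses that register (as a source or as an address).
    """
    holds = {}
    redundant = []

    def clobber(reg):
        holds.pop(reg, None)
        for r, src in list(holds.items()):
            if reg in REGS.findall(src):
                del holds[r]

    for i, (op, *args) in enumerate(block):
        if op == "movl" and args[1].startswith("%"):
            src, dst = args
            if holds.get(dst) == src:
                redundant.append(i)
                continue
            clobber(dst)
            if dst not in REGS.findall(src):  # "movl 20(%eax),%eax" is not reusable
                holds[dst] = src
        elif op == "movl":  # store: conservatively, any memory may alias
            for r, src in list(holds.items()):
                if "(" in src:
                    del holds[r]
        elif op == "pushl":
            clobber("%esp")
            if not precise_push:
                holds.clear()  # treat PUSH as clobbering everything
            # precise: the store goes below the old %esp, assumed not to
            # alias %ebp-relative locals or data reached through pointers
        elif args and args[-1].startswith("%"):
            clobber(args[-1])  # other ops: assume the last register is written
    return redundant


LISTING = [
    ("movl", "-16(%ebp)", "%edx"), ("movl", "(%edx)", "%eax"),
    ("movl", "20(%eax)", "%eax"), ("movl", "20(%eax)", "%eax"),
    ("movzbl", "169(%eax)", "%eax"), ("pushl", "%eax"),
    ("movl", "-16(%ebp)", "%edx"), ("movl", "(%edx)", "%eax"),
]
print(redundant_reloads(LISTING, precise_push=True))
print(redundant_reloads(LISTING, precise_push=False))
```

The precise model reports index 6 (the second "movl -16(%ebp),%edx"), while the conservative model reports nothing. The following "movl (%edx),%eax" is needed in both cases, since %eax has been overwritten in between.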

Gareth aka. Kit


_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
