Re: [fpc-devel] Prototype optimisation... Sliding Window

J. Gareth Moreton via fpc-devel Fri, 11 Mar 2022 05:34:03 -0800

So far I haven't managed to do very accurate timing measurements - I'lltry to resolve that and get more accurate figures, especially as I'mstill writing up the whitepaper explaining everything before I make amerge request. Compilation time I'm not hugely concerned about sincethis sliding window feature is only enabled on -O3 and higher, but thefaster I can get it, the better, especially as this family of Pascalcompilers pride themselves on compilation speed.


Gareth aka. Kit


On 11/03/2022 12:43, Flávio Etrusco via fpc-devel wrote:

Kudos! In the end, did it make much of a difference in the compilation time?

Em sex., 25 de fev. de 2022 às 02:08, J. Gareth Moreton via fpc-devel
<fpc-devel@lists.freepascal.org> escreveu:

I did it!

After a good week of work and getting things wrong, I finally found a
solution that works nicely and is extensible, at least for x86.  A bit
of refactoring and it can be ported to other platforms.  I'm just
running the test suites to see if I can break things now.  Honestly the
hard part was making sure all the registers were tracked properly.

https://gitlab.com/CuriousKit/optimisations/-/commits/sliding-window

Please feel free to test it and break it!

It will likely undergo adjustments over time to refactor bits and
pieces, add extra instructions (especially as it doesn't support SHRX
and SHLX yet, for example) and see if I can make the sliding window more
efficient - I increased it in size from 8 to 16 and then 32 entries,
since even at 16, some optimisations were missed in the RTL, but this
depends on a number of factors.

It's gotten some pretty good optimisations.  On x86_64-win64 under -O4,
the three lazarus binaries... before:

lazarus.exe:

313,990,206
313,873,982

lazbuild.exe

59,704,391
59,725,895

startlazarus.exe

27,471,147
27,461,419

And the compiler itself, ppcx64.exe:

3,777,536
3,766,272

Remember that though I call the component a "sliding window", it's only
because it's very similar to how LZ77 finds start points for run-length
encoding, and the comments and variable names mention run-length
encoding as it scans sequences of identical instructions.  However, at
no point is it actually performing data compression, and the end-result
is common subexpression elimination.  All the space savings are from the
removal of redundant sequences of instructions, giving both a space and
a speed boost.

Almost every source file in the compiler and the RTL shows some kind of
improvement.  A lot of them are just redundant pointer deallocations, so
this will help with cache misses and the like - that aside though, here
are a couple of my favourites... one from dbgdwarf -
TDebugInfoDwarf.appendsym_fieldvar_with_name_offset:

Before:

      ...
.Lj682:
      leaq    (,%r13,8),%rcx
      movq    120(%rdi),%rax
      cqto
      idivq    %rcx
      imulq    %r13,%rax
      movq    %rax,%r12
      addq    56(%rbp),%r12
      leaq    (,%r13,8),%rcx
      movq    120(%rdi),%rax
      cqto
      idivq    %rcx
      movq    %rdx,%rsi
      cmpb    $0,U_$SYSTEMS_$$_TARGET_INFO+276(%rip)

After:

      ...
.Lj682:
      leaq    (,%r13,8),%rcx
      movq    120(%rdi),%rax
      cqto
      idivq    %rcx
      imulq    %r13,%rax
      movq    %rax,%r12
      addq    56(%rbp),%r12
      movq    %rdx,%rsi
      cmpb    $0,U_$SYSTEMS_$$_TARGET_INFO+276(%rip)

This one has successfully removed an IDIV instruction because the
algorithm was able to detect that the subroutine wanted the remainder in
%edx, and it was still available from the first IDIV call because it
hadn't been overwritten, and neither %r13 nor %rdi had changed values so
the references are the same.

This one is from SysUtils' IsLeapYear function and is one I have
personally wanted to optimise further ever since I first saw its
disassembly after I implemented the fast "x mod const = 0" algorithm.

Before:

      ...
      imulw    $23593,%cx,%ax
      rorw    $2,%ax
      cmpw    $655,%ax
      ja    .Lj6979
      imulw    $23593,%cx,%ax
      rorw    $4,%ax
      cmpw    $163,%ax
      setbeb    %al
      ret
.Lj6979:
      ...

After:

      ...
      imulw    $23593,%cx,%ax
      rorw    $2,%ax
      cmpw    $655,%ax
      ja    .Lj6979
      rorw    $2,%ax
      cmpw    $163,%ax
      setbeb    %al
      ret
.Lj6979:
      ...

In this case, the RLE doesn't produce an exact match since the end
result of %ax is different, but the important point to note is that as
long as %cx doesn't change value (which it doesn't). the first sequence
can be transformed into the second sequence via a "rorw $2,%ax"
instruction, so the end result is that "imulw $23593,%cx,%ax; rorw
$4,%ax" is transmuted into "rorw $2,%ax" based on the previous value of
%ax, thus removing a multiplication.

Because this has been a mammoth undertaking and is quite a major
addition to the compiler, I'm going to hold off making a merge request
until I write a document on it, like I've done in the past with some of
the other larger optimisations I've made.  The branch is available at
the link at the top of this e-mail though if anyone wants to take a look
at it.

One final note is that this optimisation is rather slow and specialist,
so much so that I've added a new optimizer switch named 'cs_opt_asmcse'
("Assembly CSE", to differentiate it from "Node CSE"); this is enabled
by default with -O3, and requires the peephole optimizer to also be
enabled in order to function.

Gareth aka. Kit


--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

Re: [fpc-devel] Prototype optimisation... Sliding Window

Reply via email to