Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-18 Thread Simon Kissel
Hi Florian,

> Compile the benchmark with (where fpcnew is the newly built fpc):

Bero has confirmed, works for us as well. This rocks!

> The changes also help on arm, and arm can be built using the same
> command line; however, at least on a Raspi3B+ the
> improvement is less significant than on i386 (still the old cache
> flush (?) issue, which is outside the scope of FPC?).

We'll try that next. And yes, on the bloody Kirkwood CPU which we use,
a context switch results in a CPU cache flush.

Cheers,

Simon

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-18 Thread Florian Klämpfl
On 17.11.2018 at 22:28, Florian Klämpfl wrote:
> On 17.11.2018 at 22:10, Simon Kissel wrote:
>> Hi Florian,
>>
>>> With some compiler tuning and a few tricks (two changes to the code
>>> and hand-simulated peephole optimizations, but I
>>> think the compiler can also do these tricks):
>>
>> Nice - what changes did you do?
>>
>> Changing the code of course is cheating, but there might be something
>> to learn for us, here.
> 
> I prevented the compiler from putting certain variables in registers by taking
> their address :) But I did so only to test whether this helps. On i386 it does,
> because deciding which variables go into registers is not that easy, but see below.
> 
>>
>> Would be great if whatever trick you did could be part of the
>> compiler.
> 
> Meanwhile the compiler can do it (not yet committed). Same VM as yesterday;
> all rates are a little bit lower, not sure why (probably too many VMs open :)),
> but this applies to all three executables.
> 
> florian@ubuntu32:~$ ./vipribenchmemcache_nodeps

With rev. 40346 I have committed my latest changes. As the code is still
experimental, it needs to be activated on the command line when building FPC:

make clean all "OPT=-Aas -dtls_threadvars -O4 -dSPILLING_NEW"

(add -Cp... -Op... options if the target system is known)

Compile the benchmark with (where fpcnew is the newly built fpc):

fpcnew -O4 -Sd -FWvipri.wpo -OWDEVIRTCALLS,OPTVMTS vipribenchmemcache_nodeps.dpr
fpcnew -O4 -Sd -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS vipribenchmemcache_nodeps.dpr
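
The two commands above differ only in the case of the -FW/-Fw and -OW/-Ow
options: the first pass writes whole-program-optimization feedback to
vipri.wpo, the second pass reads it back and applies the optimizations.
A minimal sketch of a wrapper that runs both passes in order, assuming a
POSIX shell; the names wpo_build, run, FPC and DRY_RUN are made up for
illustration and are not part of FPC:

```shell
#!/bin/sh
# Sketch only: wpo_build, run, FPC and DRY_RUN are hypothetical helper names.
# FPC selects the compiler binary and defaults to the freshly built "fpcnew".
FPC="${FPC:-fpcnew}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# wpo_build SOURCE FEEDBACKFILE
# Runs the two compiler passes with the options from the mail above.
wpo_build() {
    src="$1"        # program source, e.g. vipribenchmemcache_nodeps.dpr
    feedback="$2"   # feedback file, e.g. vipri.wpo
    opts="DEVIRTCALLS,OPTVMTS"

    # Pass 1: compile and store WPO feedback (-FW, -OW).
    run "$FPC" -O4 -Sd "-FW$feedback" "-OW$opts" "$src" || return 1
    # Pass 2: recompile using the stored feedback (-Fw, -Ow).
    run "$FPC" -O4 -Sd "-Fw$feedback" "-Ow$opts" "$src"
}
```

Usage would then be: wpo_build vipribenchmemcache_nodeps.dpr vipri.wpo
(set DRY_RUN=1 first to only print the two command lines).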

The changes also help on arm, and arm can be built using the same command line;
however, at least on a Raspi3B+ the improvement is less significant than on i386
(still the old cache flush (?) issue, which is outside the scope of FPC?).