Re: [fpc-devel] LEA instruction speed

Tomas Hajny via fpc-devel Tue, 10 Oct 2023 17:48:13 -0700

On 2023-10-10 13:24, J. Gareth Moreton via fpc-devel wrote:

I'm all for receiving results for all kinds of processor, as it helps
me to make more informed choices on flags as well as confirming that
Agner Fog's instruction tables are correct. Also, results for older
processors can be hard to come by sometimes.


Currently, most architectures have a fast LEA, and the default
"Athlon" option lines up with this. Among the Intel architectures, LEA
becomes slow from COREAVX onwards (COREI is fine), so I added a new
COREX option (for 10th-generation Core) between ZEN2 and ZEN3 to mark
the point where LEA is fast again (its 16-bit version is also fast,
unlike on Zen 3).

In the meantime I'll be looking at the benchmarking code that Stefan
provided to see if it can and should be integrated.

Thanks again everyone for the results you're giving.

Alright, fine (I modified your test to also include the CPU name where possible, and added an IFDEFed distinction between 32-bit and 64-bit builds):


32-bits:
CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
-----------------------------------------------------
   Pascal control case: 0.85 ns/call
 Using LEA instruction: 0.56 ns/call
Using ADD instructions: 0.84 ns/call

64-bits:
CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
-----------------------------------------------------
   Pascal control case: 0.85 ns/call
 Using LEA instruction: 0.56 ns/call
Using ADD instructions: 0.85 ns/call


32-bits:
CPU = AMD Athlon(tm) Processor
------------------------------
   Pascal control case: 6.10 ns/call
 Using LEA instruction: 3.40 ns/call
Using ADD instructions: 3.40 ns/call


32-bits:
(AMD DX4 100 MHz - no CPUID name)
   Pascal control case: 123 ns/call
 Using LEA instruction: 72 ns/call
Using ADD instructions: 73 ns/call
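The modified Pascal test that produces the header lines above is not shown in the thread. As a rough, hypothetical analogue only (not Tomas's code), the Python sketch below prints the same kind of header. The build's pointer width stands in for the 32-bit/64-bit IFDEF, and the CPU name is read from Linux's /proc/cpuinfo, which is an assumption; elsewhere it falls back to platform.processor() and then to "(no CPUID name)". It does not query CPUID directly, and the function names are made up for illustration.

```python
import platform
import struct


def build_bitness():
    """Pointer width of the running build in bits (32 or 64).

    Like a compile-time IFDEF, this reflects the build, not the OS:
    a 32-bit interpreter on a 64-bit OS reports 32.
    """
    return struct.calcsize("P") * 8


def cpu_name():
    """Best-effort CPU model string (hypothetical helper)."""
    try:
        # Assumption: Linux exposes the CPUID brand string as "model name".
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    value = line.split(":", 1)[1].strip()
                    if value:
                        return value
    except OSError:
        pass
    # Fallback for other platforms, or CPUs without a brand string.
    return platform.processor() or "(no CPUID name)"


def header():
    """Header in the style of the results above, underlined with dashes."""
    cpu_line = f"CPU = {cpu_name()}"
    return f"{build_bitness()}-bits:\n{cpu_line}\n{'-' * len(cpu_line)}"


if __name__ == "__main__":
    print(header())
```

On a machine like the DX4 above, which reports no brand string, the sketch would print "CPU = (no CPUID name)" instead of a model name.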

Tomas

On 10/10/2023 11:54, Tomas Hajny via fpc-devel wrote:
On 2023-10-10 12:19, Marco van de Voort via fpc-devel wrote:
On 10-10-2023 at 11:13, J. Gareth Moreton via fpc-devel wrote:
Thanks Tomas,

Nothing is broken, but the timing measurement isn't precise enough.
Normally I use a much higher iteration count (e.g. 1,000,000), but I had reduced it to 10,000 because, coupled with the 1,000 iterations in the subroutines themselves, it would have led to 1,000,000,000 passes and hence would take somewhere in the region of five to ten minutes to complete on a 16 MHz 386, for example. Rika's suggestion of running as many iterations as needed until, say, 5 seconds have elapsed would help, but the timing measurements themselves would add a lot of latency and would be imprecise for very slow routines. Still, let's see if 100,000 gives better results for you.
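One common way to implement a time-budgeted run like the one suggested above, while keeping timer overhead out of the measurement, is to double the iteration count until a single timed batch lasts at least the budget. The clock is then read only twice per batch, so its cost is amortised over many calls. The sketch below shows only that control flow; it is written in Python for brevity (the real test is Pascal with assembler routines), its absolute numbers would include interpreter overhead, and the function and parameter names are hypothetical.

```python
import time


def time_per_call(func, budget_s=5.0, calls_per_invocation=1):
    """Return (ns_per_call, invocations) for func.

    Doubles the number of invocations until one timed batch lasts at
    least budget_s seconds. calls_per_invocation divides the result
    further, e.g. 1000 if func itself loops 1,000 times internally.
    """
    budget_ns = int(budget_s * 1e9)
    n = 1
    while True:
        start = time.perf_counter_ns()   # one clock read before the batch
        for _ in range(n):
            func()
        elapsed = time.perf_counter_ns() - start  # and one after it
        if elapsed >= budget_ns:
            return elapsed / (n * calls_per_invocation), n
        n *= 2  # batch too short to trust: double it and retry
```

For a routine that loops 1,000 times internally, one would call `time_per_call(routine, 5.0, 1000)`. The trade-off is total run time: because every shorter batch is thrown away, the whole run takes between about two and four times the budget. In exchange, a very slow routine that exceeds the budget on its first call is timed exactly once rather than being run through a fixed, possibly huge, iteration count.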
I had the same problem, and now it is stable on a Ryzen 5700X (ZEN3):

   Pascal control case: 0.7 ns/call
 Using LEA instruction: 0.4 ns/call
Using ADD instructions: 0.7 ns/call
Indeed, it's much more consistent now; I've attached a new log for both the 32-bit and 64-bit versions from the Intel machine running Windows. Apparently, ADD is still somewhat faster on such "newer" Intel machines (at least if the potential parallelism of LEA discussed previously is not taken into account). I can try this version on my AMD machines later tonight if that would be useful. Please let me know which results would be relevant for you in that case (out of the ancient AMD DX4, the only slightly less ancient AMD Athlon 1 GHz and the still rather reasonable AMD A9).
Tomas

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel