The LEA and ADD times are close enough that I can consider them
identical. Braswell (the architecture behind that Celeron) doesn't
support AVX, as far as I know, which lines up with COREI, but not
COREAVX, having a fast LEA instruction.
Given the many different x86-compatible CPUs, I wonder if we need to
document the best compiler parameters for end users in some way (e.g. so
that a device driver installer can be coded to install the most
optimised binary for a given CPU architecture).
Kit
On 11/10/2023 05:56, Christo Crause wrote:
On Tue, Oct 10, 2023 at 11:13 AM J. Gareth Moreton via fpc-devel
<fpc-devel@lists.freepascal.org> wrote:
Thanks Tomas,
Nothing is broken, but the timing measurement isn't precise enough.
Normally I use a much higher iteration count (e.g. 1,000,000), but I
reduced it to 10,000 because the higher count, coupled with the 1,000
iterations in the subroutines themselves, would have led to
1,000,000,000 passes and hence would have taken in the region of five to
ten minutes to complete on a 16 MHz 386, for example. Rika's suggestion
of running as many iterations as needed until, say, 5 seconds have
elapsed would help, but the timing measurements would add a lot of
latency and would be imprecise on very slow routines. Still, let's see
if 100,000 gives better results for you.
Kit
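As a rough illustration of the time-bounded approach discussed above, here
is a minimal Python sketch (not the actual Pascal test program). The names
bench, budget_s, batch and clock are hypothetical. It assumes the clock is
read once per batch rather than once per call, so that the measurement
overhead is spread over many calls.

```python
import time


def bench(fn, budget_s=5.0, batch=1000, clock=time.perf_counter):
    """Call fn in batches until at least budget_s seconds have elapsed.

    The clock is read once per batch, not once per call, so the cost of
    each timing measurement is amortised over `batch` calls.
    Returns (total_calls, ns_per_call).
    """
    calls = 0
    start = clock()
    while True:
        for _ in range(batch):
            fn()
        calls += batch
        elapsed = clock() - start
        if elapsed >= budget_s:
            break
    return calls, elapsed * 1e9 / calls


if __name__ == "__main__":
    # Hypothetical workload; any zero-argument callable will do.
    total, ns = bench(lambda: 1 + 1, budget_s=0.2)
    print(f"{total} calls, {ns:.2f} ns/call")
```

The trade-off mentioned above is still visible here: a larger batch lowers
the timing overhead, but a very slow routine can overshoot the time budget
by up to one full batch.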
Results on a modest CPU:
CPU = Intel(R) Celeron(R) CPU N3050 @ 1.60GHz
-----------------------------------------------------
Pascal control case:     6.71 ns/call
Using LEA instruction:   2.09 ns/call
Using ADD instructions:  2.05 ns/call
32 bits:
Pascal control case:     6.78 ns/call
Using LEA instruction:   2.16 ns/call
Using ADD instructions:  2.09 ns/call
The results show a bit of variance; the numbers above are more or less
typical.
Christo
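To put the figures above into perspective, the following Python sketch
converts ns/call into approximate clock cycles and computes the relative
gap between the LEA and ADD timings. It assumes the CPU ran at the nominal
1.60 GHz shown in the CPU string (burst clocks would change the cycle
counts); the helper names and the "unlabelled" key are hypothetical.

```python
NOMINAL_GHZ = 1.60  # assumption: nominal clock from the CPU string, no burst

# ns/call, copied from the results above. The first block carries no
# label in the original post, so it is keyed as "unlabelled" here.
results_ns = {
    "unlabelled": {"Pascal": 6.71, "LEA": 2.09, "ADD": 2.05},
    "32 bits": {"Pascal": 6.78, "LEA": 2.16, "ADD": 2.09},
}


def cycles(ns, ghz=NOMINAL_GHZ):
    """Approximate cycles per call: nanoseconds times cycles per nanosecond."""
    return ns * ghz


def rel_diff_pct(a, b):
    """Relative difference of a over b, in percent."""
    return (a - b) / b * 100.0


if __name__ == "__main__":
    for label, r in results_ns.items():
        gap = rel_diff_pct(r["LEA"], r["ADD"])
        print(f"{label}: LEA ~{cycles(r['LEA']):.2f} cycles, "
              f"ADD ~{cycles(r['ADD']):.2f} cycles, gap {gap:.2f}%")
```

A gap of a few percent is comparable to the run-to-run variance mentioned
above, so these numbers alone do not separate LEA from ADD.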