I should have figured. Thank you!
Kit
On 27/10/2023 01:51, Nikolay Nikolov via fpc-devel wrote:
On 10/11/23 11:21, Tomas Hajny via fpc-devel wrote:
On 2023-10-11 04:15, J. Gareth Moreton via fpc-devel wrote:
Sweet, thank you. Would you be willing to share your modified test's
source? I was worried that if CPUID wasn't present it would cause a
SIGILL.
Sure, attached, but I didn't do anything special. I modified it so
that this detection can easily be disabled for x86 by removing the
definition of a conditional symbol added to the source, and I was
prepared to recompile with the functionality disabled on the old AMD
DX4 if needed. However, I didn't need to do so: the AMD DX4 machine
simply ignored it and chose the branch used when the particular
CPUID function is not supported. I have no idea whether this is due
to some protection in OS/2 Warp 4 (used for compiling and running the
test on that machine) masking that exception, or something else.
Apparently, it should be possible to detect CPUID availability
(albeit not 100% reliably), see https://wiki.osdev.org/CPUID, but I
didn't use that.
There's CPUID support detection code in the Free Pascal RTL for i8086
and i386. It's in unit cpu:
function cpuid_support: boolean;
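As a rough illustration only (not the attached test), a guard built on
that function might look like the sketch below. It assumes an i386
target, where unit cpu declares cpuid_support as quoted above. The
program name, the messages and the CPUI386 conditional are choices
made for this sketch; an i8086 build would need its own conditional,
which is not shown.

```pascal
program CpuidCheck;

{$mode objfpc}

{$ifdef CPUI386}
uses
  cpu;  { RTL unit that declares cpuid_support on i386 }
{$endif}

begin
{$ifdef CPUI386}
  { Ask the RTL whether CPUID can be used before executing it. }
  if cpuid_support then
    WriteLn('CPUID available: CPUID-based detection can run.')
  else
    WriteLn('CPUID not available: take the fallback branch.');
{$else}
  { Assumption: on other targets this sketch simply skips the check. }
  WriteLn('Not an i386 target: cpuid_support check skipped in this sketch.');
{$endif}
end.
```

Checking first means the CPUID instruction is never executed on a CPU
that lacks it, so the SIGILL concern raised earlier in the thread does
not depend on how the OS handles the exception.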
Nikolay
Tomas
On 11/10/2023 01:47, Tomas Hajny via fpc-devel wrote:
On 2023-10-10 13:24, J. Gareth Moreton via fpc-devel wrote:
I'm all for receiving results for all kinds of processor, as it helps
me to make more informed choices on flags as well as confirming that
Agner Fog's instruction tables are correct. Also, results for older
processors can be hard to come by sometimes.
Currently, most architectures have a fast LEA, and the default
"Athlon" option lines up with this. Of the Intel architectures, LEA
slows down from COREAVX onwards (COREI is fine), so I added a new
COREX (for 10th generation Core) option between ZEN2 and ZEN3 to mark
the point where LEA is fast again (its 16-bit version is also fast,
unlike on Zen 3).
In the meantime I'll be looking at the benchmarking code that Stefan
provided to see if it can and should be integrated.
Thanks again everyone for the results you're giving.
Alright, fine (I modified your test to include the CPU name as well,
if possible, and added an IFDEFed distinction of 32-bit versus
64-bit):
32-bits:
CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
-----------------------------------------------------
Pascal control case: 0.85 ns/call
Using LEA instruction: 0.56 ns/call
Using ADD instructions: 0.84 ns/call
64-bits:
CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
-----------------------------------------------------
Pascal control case: 0.85 ns/call
Using LEA instruction: 0.56 ns/call
Using ADD instructions: 0.85 ns/call
32-bits:
CPU = AMD Athlon(tm) Processor
------------------------------
Pascal control case: 6.10 ns/call
Using LEA instruction: 3.40 ns/call
Using ADD instructions: 3.40 ns/call
32-bits:
(AMD DX4 100 MHz - no CPUID name)
Pascal control case: 123 ns/call
Using LEA instruction: 72 ns/call
Using ADD instructions: 73 ns/call
Tomas
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel