Hi Nataraj

Which processor was that run on? (Although it's too close to call, it implies that LEA has a latency of 2 in that case.)
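
For readers without the attached test, the two assembly variants being compared are roughly of the following shape. This is only a hypothetical sketch, not the actual ttt.pas: the function names SumLea and SumAdd, the Int64 operands and the constant 8 are assumptions made for illustration. It targets x86_64 only and picks the argument registers for either the Win64 or the System V calling convention, which is the difference that broke the first 64-bit version of the test.

```pascal
program lea_vs_add_sketch;
{$mode objfpc}
{$asmmode intel}

{ Hypothetical illustration only (x86_64). Not the test attached to the thread. }

{ One LEA using base + index + displacement (all three input operands). }
function SumLea(a, b: Int64): Int64; assembler; nostackframe;
asm
{$ifdef WIN64}
  { Microsoft x64 ABI: a in RCX, b in RDX }
  lea rax, [rcx + rdx + 8]
{$else}
  { System V ABI: a in RDI, b in RSI }
  lea rax, [rdi + rsi + 8]
{$endif}
end;

{ The same arithmetic expressed as a MOV followed by two ADDs. }
function SumAdd(a, b: Int64): Int64; assembler; nostackframe;
asm
{$ifdef WIN64}
  mov rax, rcx
  add rax, rdx
{$else}
  mov rax, rdi
  add rax, rsi
{$endif}
  add rax, 8
end;

begin
  { Both calls compute a + b + 8, so they should print the same value. }
  WriteLn(SumLea(1, 2));
  WriteLn(SumAdd(1, 2));
end.
```

Timing each function over many calls and dividing by the call count would give ns/call figures comparable in kind to those quoted in this thread, although the real test may measure something different.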

Kit

On 08/10/2023 14:06, Nataraj S Narayan via fpc-devel wrote:
Hi

[nataraj@dflyHP ~]$ fpc ttt.pas
Free Pascal Compiler version 3.2.2 [2023/07/04] for x86_64
Copyright (c) 1993-2021 by Florian Klaempfl and others
Target OS: DragonFly for x86-64
Compiling ttt.pas
Linking ttt
/usr/local/bin/ld.bfd: warning: /usr/local/lib/fpc/3.2.2/units/x86_64-dragonfly/rtl/prt0.o: missing .note.GNU-stack section implies executable stack
/usr/local/bin/ld.bfd: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
121 lines compiled, 14.9 sec
[nataraj@dflyHP ~]$ ./ttt
   Pascal control case: 6.7 ns/call
 Using LEA instruction: 4.2 ns/call
Using ADD instructions: 4.0 ns/call


Nataraj S Narayan
Synergy Info Systems
Software & Technology Consultants
Ettumanoor, INDIA
Ph:+91 9443211326


On Sat, Oct 7, 2023 at 9:39 PM J. Gareth Moreton via fpc-devel <fpc-devel@lists.freepascal.org> wrote:

    That's interesting; I am interested to see the assembly output for
    the Pascal control cases.  As for the 64-bit version, that was my
    fault since the assembly language is for Microsoft's ABI rather than
    the System V ABI, so it was checking a register with an undefined
    value.  Find attached the fixed test.

    Kit

    P.S. Results on my Intel(R) Core(TM) i7-10750H

       Pascal control case: 2.0 ns/call
     Using LEA instruction: 1.7 ns/call
    Using ADD instructions: 1.3 ns/call

    On 07/10/2023 16:51, Tomas Hajny via fpc-devel wrote:
    > On 2023-10-07 03:57, J. Gareth Moreton via fpc-devel wrote:
    >
    >
    > Hi Kit,
    >
    >> Do you think this should suffice? Originally it ran for 1,000,000
    >> repetitions but I fear that will take way too long on a 486, so I
    >> reduced it to 10,000.
    >
    > OK, I tried it now. First of all, after turning on the old machine,
    > I realized that it wasn't an Intel but an AMD 486 DX4 - sorry for my
    > bad memory. :-( I compiled and ran the test under OS/2 there (I was
    > too lazy to boot it to DOS ;-) ), but I assume that it shouldn't make
    > any substantial difference. The ADD and LEA results were basically
    > the same there, both around 100 ns/call. The Pascal result was around
    > twice as long. Interestingly, the Pascal result for FPC 3.2.2 was
    > around 10% longer than the same source compiled with FPC 2.0.3 (the
    > assembler versions were obviously the same for both FPC versions; I
    > also tried compiling it with FPC 1.0.10, and the assembler versions
    > were more than three times slower due to missing support for the
    > nostackframe directive).
    >
    > I tested it on the AMD Athlon 1 GHz machine as well and again, the
    > results for LEA and ADD are basically equal (both 3.1 ns/call) and
    > the result for Pascal is slightly more than twice that (7.3 ns/call).
    > However, rather surprisingly for me, the overall test run was _much_
    > longer there?! Finally, I tried compiling the test on a 64-bit
    > machine (AMD A9-9425) with Linux (compiled for 64 bits with FPC 3.2.3
    > compiled from a fresh 3.2 branch). The Pascal version shows about
    > 4 ns/call, but the assembler version runs forever - well, certainly
    > much longer than my patience lasts. I haven't tried to analyze the
    > reasons, but that's what I get.
    >
    > Tomas
    >
    >
    >
    >>
    >> On 03/10/2023 06:30, Tomas Hajny via fpc-devel wrote:
    >>> On October 3, 2023 03:32:34 +0200, "J. Gareth Moreton via fpc-devel"
    >>> <fpc-devel@lists.freepascal.org> wrote:
    >>>
    >>>
    >>> Hi Kit,
    >>>
    >>>> This is mainly to Florian, but also to anyone else who can answer
    >>>> the question - at which point did a complex LEA instruction (using
    >>>> all three input operands and some other specific circumstances)
    >>>> get slow? Preliminary research suggests the 486 was when it gained
    >>>> extra latency, and then Sandy Bridge when it got particularly bad.
    >>>> Ice Lake seems to be the architecture where faster LEA instructions
    >>>> are reintroduced, but I'm not sure about AMD processors.
    >>> I cannot answer your question, but if you prepare a test program, I
    >>> can run it on Intel 486 DX2 100 MHz and AMD Athlon 1 GHz machines
    >>> if it helps you in any way (at least I hope the 486 DX2 machine
    >>> should still be able to start ;-) ).
    >>>
    >>> Tomas
    >>>
    >>> _______________________________________________
    >>> fpc-devel maillist  - fpc-devel@lists.freepascal.org
    >>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
    >>>