To show current progress, here is the .section .text.n_system.math.vectors$_$tvector3d_$__$$_$plus$tvector3d$tvector3d$$tvector3d,"ax" routine ("class operator TVector3D.+(const aVector1, aVector2: TVector3D): TVector3D;") on x86_64-win64. It performs a component-wise addition of two 4-component vectors; the inputs are pointed to by %rdx and %r8, and the result is pointed to by %rcx. Before:

    movss    (%rdx),%xmm0
    addss    (%r8),%xmm0
    movss    %xmm0,(%rcx)
    movss    4(%rdx),%xmm0
    addss    4(%r8),%xmm0
    movss    %xmm0,4(%rcx)
    movss    8(%rdx),%xmm0
    addss    8(%r8),%xmm0
    movss    %xmm0,8(%rcx)
    movss    12(%rdx),%xmm0
    addss    12(%r8),%xmm0
    movss    %xmm0,12(%rcx)

After:

    movups    (%rdx),%xmm0
    addps    (%r8),%xmm0
    movups    %xmm0,(%rcx)

(Note that this unit is NOT suitable for 3D graphics, because the W-coordinate is treated the same as X, Y and Z rather than being kept at 1, say.)
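
For readers who want the Pascal side of this, here is a minimal sketch of the kind of source that could produce the routine above. Only the operator signature comes from the listing; the record layout, the field names X, Y, Z and W, and the test values are assumptions, and the real unit may well differ.

```pascal
program VectorAddSketch;

{$mode objfpc}
{$modeswitch advancedrecords}

type
  { Hypothetical stand-in for the real TVector3D: four Single fields,
    16 bytes in total. The field names X, Y, Z, W are assumed. }
  TVector3D = record
    X, Y, Z, W: Single;
    class operator +(const aVector1, aVector2: TVector3D): TVector3D;
  end;

{ One scalar addition per component, W included. }
class operator TVector3D.+(const aVector1, aVector2: TVector3D): TVector3D;
begin
  Result.X := aVector1.X + aVector2.X;
  Result.Y := aVector1.Y + aVector2.Y;
  Result.Z := aVector1.Z + aVector2.Z;
  Result.W := aVector1.W + aVector2.W;
end;

var
  A, B, C: TVector3D;
begin
  A.X := 1.0; A.Y := 2.0; A.Z := 3.0; A.W := 1.0;
  B.X := 4.0; B.Y := 5.0; B.Z := 6.0; B.W := 1.0;
  C := A + B;
  { W is summed like any other component, so it ends up as 2, not 1. }
  WriteLn(C.X:0:1, ' ', C.Y:0:1, ' ', C.Z:0:1, ' ', C.W:0:1);
end.
```

The four assignments correspond to the four movss/addss/movss groups in the first listing, and the single movups/addps/movups sequence does the same work in one go. The sum of two vectors with W = 1 has W = 2, which is the problem the note above describes.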

It's not perfect though.  Where vectorcall or the System V ABI is concerned, it can produce worse code because it constantly needs to break up the vector to manipulate individual components, and there may be some unnecessary reads and writes between the stack and XMM registers, but I'm working on it, bit by bit.

My current branch can be found here:


On 23/08/2024 17:56, J. Gareth Moreton via fpc-devel wrote:

Hi everyone,

So I'm getting ready to showcase my current vector work to others.  I do have a question though...

Currently the feature is locked behind "-Sv", since this is specifically "support vector processing" and the feature is still experimental and inefficient in places, but is this the right approach?  I ask because the -S switches are specifically syntax options, not code generation options (I do wonder exactly what syntax it enables).  Also, at least with the "make" script, it skips whole program optimisation and some of the packages.

Should I use a compiler definition instead like "-dX86_VECTORS"?  That way, the feature can easily be turned off.
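
As a rough illustration of the second option, a symbol passed with -d can be tested with conditional compilation. The sketch below is a standalone program, not compiler source: X86_VECTORS is the symbol proposed above, while VectorsEnabled is a hypothetical helper invented for the example.

```pascal
program DefineSketch;

{$mode objfpc}

{ Hypothetical helper: reports whether this program was compiled
  with the X86_VECTORS symbol defined. }
function VectorsEnabled: Boolean;
begin
{$ifdef X86_VECTORS}
  Result := True;
{$else}
  Result := False;
{$endif}
end;

begin
  if VectorsEnabled then
    WriteLn('vector support compiled in')
  else
    WriteLn('vector support compiled out');
end.
```

Compiling with "fpc -dX86_VECTORS definesketch.pas" takes the first branch, and compiling without the define takes the second. One thing to bear in mind is that a define is fixed when the code testing it is compiled, so the choice is made at build time rather than per invocation.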


On 21/08/2024 15:59, J. Gareth Moreton via fpc-devel wrote:

Hi everyone,

Just thought I'd give a heads-up on what's happening with me and the compiler improvements.  Also, I've been busy with contract work and have just had some minor surgery, so I'm not running on all cylinders currently.

  * Still waiting on administrator comments and feedback on my
    assembly-level CSE feature (a couple of years old now) and the
    first part of pure functions.  Both of these should be ready to
    merge unless someone found a bug that breaks things (someone did
    find some examples with pure functions which have since been fixed).
  * Haven't solved the SEH unwinding problem on aarch64-win64 yet. 
    This is quite a tough one!
  * Also working on vectorisation for x86_64 platforms.  I've got it
    working on win64, and can vectorise two-operand commutative
    operations like addition and multiplication, although some of the
    generated code is less than optimal (unnecessarily copying
    vectors to the stack).  Linux (and other OSes that use the System
    V ABI) is taking a bit longer since it stores pairs of floats in
    single XMM registers even without vectorisation code, and some of
    the internal procedures can't properly handle these if the desire
    is to combine a pair of these such registers (so 4 singles) into
    a single XMM vector, especially where shuffling is involved.

I'll let you know the progress.




fpc-devel maillist
