[fpc-devel] Vectorization

2017-12-09 Thread J. Gareth Moreton
Hi everyone,

Since I'm masochistic in my desire to understand and improve the Free Pascal
Compiler, I would like to add some vectorisation support to its optimisation
cycle, since that is one thing that many other compilers attempt to do these
days.  But before I begin, does FPC support any kind of vectorisation already?
If it does I haven't been able to find it yet, and I don't want to end up
reinventing the wheel.

I recall cases, for example, where the following is not optimised even if the
compiler is set to use SSE:

type
  TVector4f = packed record
    X, Y, Z, W: Single;
  end;

function VectorAdd(A, B: TVector4f): TVector4f;
begin
  Result.X := A.X + B.X;
  Result.Y := A.Y + B.Y;
  Result.Z := A.Z + B.Z;
  Result.W := A.W + B.W;
end;

The resulting assembler code uses an individual MOVSS and arithmetic
instruction for each element, rather than combining the reads and writes into
a MOVUPS instruction and reducing the number of arithmetic instructions by a
factor of 4.  For clarity, this is the assembler produced with '-CfSSE64':

.section .text.n_p$testfile_$$_addvector$tvector4f$tvector4f$$tvector4f,"x"
.balign 16,0x90
.globl  P$TESTFILE_$$_ADDVECTOR$TVECTOR4F$TVECTOR4F$$TVECTOR4F
P$TESTFILE_$$_ADDVECTOR$TVECTOR4F$TVECTOR4F$$TVECTOR4F:
.Lc1:
.seh_proc P$TESTFILE_$$_ADDVECTOR$TVECTOR4F$TVECTOR4F$$TVECTOR4F
leaq    -56(%rsp),%rsp
.Lc3:
.seh_stackalloc 56
.seh_endprologue
movq    %rcx,%rax
movq    %rdx,(%rsp)
movq    %r8,8(%rsp)
movq    (%rsp),%rdx
movq    (%rdx),%rcx
movq    %rcx,16(%rsp)
movq    8(%rdx),%rdx
movq    %rdx,24(%rsp)
movq    8(%rsp),%rdx
movq    (%rdx),%rcx
movq    %rcx,32(%rsp)
movq    8(%rdx),%rdx
movq    %rdx,40(%rsp)
movss   16(%rsp),%xmm0
addss   32(%rsp),%xmm0
movss   %xmm0,(%rax)
movss   20(%rsp),%xmm0
addss   36(%rsp),%xmm0
movss   %xmm0,4(%rax)
movss   24(%rsp),%xmm0
addss   40(%rsp),%xmm0
movss   %xmm0,8(%rax)
movss   28(%rsp),%xmm0
addss   44(%rsp),%xmm0
movss   %xmm0,12(%rax)
leaq    56(%rsp),%rsp
ret
.seh_endproc
.Lc2:

A good vectoriser (for lack of a better name!) would be able to reduce the 12
movss/addss instructions to just three:

movups  16(%rsp),%xmm0
addps   32(%rsp),%xmm0
movups  %xmm0,(%rax)

Since the stack is aligned to a 16-byte boundary, it can swap out the first
movups for a movaps too.  I'm not sure what to do about everything being moved
to the stack first, though.
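
To make the intended transformation concrete, here is a minimal sketch in C
using SSE intrinsics, which correspond closely to the instructions named above
(_mm_loadu_ps to movups, _mm_add_ps to addps, _mm_storeu_ps to movups).  The
Vector4f struct and both function names are invented for this illustration
and are not FPC code; it only shows the scalar and packed forms side by side,
not how the compiler would perform the rewrite.

```c
#include <assert.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical C counterpart of the TVector4f record: four packed floats. */
typedef struct {
    float x, y, z, w;
} Vector4f;

/* The packed load below assumes the four fields are contiguous. */
static_assert(sizeof(Vector4f) == 4 * sizeof(float), "Vector4f must be packed");

/* Scalar form: one load, one add and one store per field. */
Vector4f vector_add_scalar(const Vector4f *a, const Vector4f *b)
{
    Vector4f r;
    r.x = a->x + b->x;
    r.y = a->y + b->y;
    r.z = a->z + b->z;
    r.w = a->w + b->w;
    return r;
}

/* Packed form: all four fields are loaded, added and stored at once. */
Vector4f vector_add_packed(const Vector4f *a, const Vector4f *b)
{
#if HAVE_SSE
    Vector4f r;
    /* Unaligned loads (movups); _mm_load_ps (movaps) would be valid only
       if the addresses were known to be 16-byte aligned. */
    __m128 va = _mm_loadu_ps((const float *)a);
    __m128 vb = _mm_loadu_ps((const float *)b);
    _mm_storeu_ps((float *)&r, _mm_add_ps(va, vb));
    return r;
#else
    /* No SSE available on this target: fall back to the scalar form. */
    return vector_add_scalar(a, b);
#endif
}
```

Calling both functions on the same inputs, for instance (1, 2, 3, 4) and
(10, 20, 30, 40), should give identical results; only the number of
instructions differs.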

I'm sure it's a mammoth task, but I would like to start somewhere with it.
However, are there any design plans that I should be adhering to so I don't
end up designing something that is disliked?

Kit
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Quickly recompiling fpc

2017-12-09 Thread Sven Barth via fpc-devel
On 09.12.2017 14:57, "Benito van der Zander" wrote:

> Hi,
> how do you recompile fpc after making a small change in the compiler, like
> enabling the debugmsg define in x86/aoptx86.pas?
>
> make buildbase says nothing was changed, and make clean; make buildbase
> recompiles not just the compiler, but also the rtl, which is a waste of
> time.


Build it in the Lazarus IDE using the corresponding project for your
platform (this will result in a compiler//pp(.exe) binary).
Other than that "make cycle" inside the compiler directory *is* the
recommended way to rebuild the compiler even if it rebuilds both the
compiler and the RTL multiple times.

Regards,
Sven


[fpc-devel] Quickly recompiling fpc

2017-12-09 Thread Benito van der Zander

Hi,

how do you recompile fpc after making a small change in the compiler, 
like enabling the debugmsg define in x86/aoptx86.pas?


make buildbase says nothing was changed, and make clean; make buildbase 
recompiles not just the compiler, but also the rtl, which is a waste of 
time.


Bye,

Benito

