Re: [fpc-devel] using sse2 packed doubles

Vincent Snijders Sun, 08 Oct 2006 05:10:22 -0700

Florian Klaempfl wrote:

Vincent Snijders wrote:
Daniël Mantione wrote:
On Fri, 6 Oct 2006, Micha Nelissen wrote:
Vincent Snijders wrote:
You could also start an assembler implementation of the matrix unit. I suppose using it is allowed, and a Tvector2_double looks a lot like such a double2.
Unless the compiler somehow helps, inlining the assembler implementation won't work and then the speedup might be lost again.
I started to add Vector Pascal-like support; currently only i386/x86_64 are supported (no generic support). The whole (currently implemented) functionality is demonstrated by the following example. Please give some feedback on whether it allows benchmark speedups.


Thanks Florian, for starting the vector support.

I think this would help speed up the benchmarks. I cannot give a real estimate of how much, maybe 20% or so.


There are still some bugs (or things not implemented).

Given the following program:

var
  ad1,ad2,ad3 : array[0..1] of double;

begin
  ad2[0] := 1;
  ad2[1] := 3;
  ad3[0] := 9;
  ad3[1] := 12;
  ad1:=ad2+ad3;
  writeln(ad1[1]);
end.

It writes:
 0.000000000000000E+000

Looking at the assembler, I see that ad1 in the writeln is read from memory, but the result for ad1 is still only in the %xmm0 register, so it is never stored back before it is printed.
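
For comparison, a plain scalar loop computes the element-wise sum that ad1:=ad2+ad3 is expected to produce. This is only a sketch: the program name and the loop variable are made up here, and it assumes the 12 is meant for ad3[1]. With these inputs ad1[1] should come out as 15, not 0.

```pascal
program scalar_reference;

var
  ad1, ad2, ad3 : array[0..1] of double;
  i : integer;

begin
  ad2[0] := 1;
  ad2[1] := 3;
  ad3[0] := 9;
  ad3[1] := 12;  { assumption: the 12 belongs in ad3[1] }

  { element-wise add, one double at a time, no vector support needed }
  for i := 0 to 1 do
    ad1[i] := ad2[i] + ad3[i];

  writeln(ad1[1]);  { 3 + 12, so this should print 15 }
end.
```

If the vectorized version prints something else for the same inputs, the difference points at the code generated for the array addition rather than at the test data.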


Furthermore, I ran into problems with alignment.

Vincent


_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
