Re: [fpc-devel] using sse2 packed doubles

Daniël Mantione Sun, 08 Oct 2006 01:41:28 -0700


On Sat, 7 Oct 2006, Florian Klaempfl wrote:

> Vincent Snijders wrote:
> > Daniël Mantione wrote:
> > > 
> > > On Fri, 6 Oct 2006, Micha Nelissen wrote:
> > > 
> > > 
> > > > Vincent Snijders wrote:
> > > > 
> > > You could also start an assembler implementation of the matrix unit.
> > > I suppose using it is allowed, and a Tvector2_double looks a lot like
> > > such a double2.
> > 
> > Unless the compiler somehow helps, inlining the assembler implementation
> > won't work and then the speedup might be lost again.
> 
> I started to add Vector Pascal-like support; currently only i386/x86_64 are
> supported (no generic support). The whole (currently implemented)
> functionality is demonstrated by the following example. Please give some
> feedback on whether it allows benchmark speedups.

To get a large speedup, I think that instead of making pairs of doubles
you should process the pixels in parallel. In this benchmark a row is 3000
pixels wide, so make an array of 3000 doubles and perform each operation
on whole arrays. With proper compiler optimization it should be possible
to achieve speeds close to 2 flops per clock cycle.
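A minimal sketch of this row-at-a-time layout follows. It is written in C rather than Pascal because the vector syntax from Florian's example is not shown here. It assumes the usual shootout mandelbrot setup: an n x n image over -1.5..0.5 by -1.0..1.0, 50 iterations, and escape radius 2. The function name mandel_row and its scratch-array parameters are hypothetical. Each iteration is applied to the whole row, so the inner loop is a branch-free pass over contiguous doubles. That is the kind of loop a vectorizing compiler can map onto packed SSE2 instructions.

```c
#include <stddef.h>

#define MAX_ITER 50   /* assumed iteration count, as in the shootout benchmark */
#define LIMIT_SQ 4.0  /* escape radius 2, compared as |z|^2 */

/* Hypothetical helper: classify one row y of an n x n Mandelbrot image.
   zr, zi, cr and esc are caller-provided scratch arrays of length n;
   inside[x] is set to 1 if pixel x never escaped, 0 otherwise. */
void mandel_row(size_t n, size_t y, double *zr, double *zi, double *cr,
                unsigned char *esc, unsigned char *inside)
{
    const double ci = 2.0 * (double)y / (double)n - 1.0;

    /* Initialise the whole row: c on the line Im(c) = ci, z = 0. */
    for (size_t x = 0; x < n; x++) {
        cr[x] = 2.0 * (double)x / (double)n - 1.5;
        zr[x] = 0.0;
        zi[x] = 0.0;
        esc[x] = 0;
    }

    /* Apply each iteration z = z*z + c to every pixel of the row.
       The body has no early exit and no dependence between different x,
       so neighbouring pixels can share one packed SSE2 register. */
    for (int i = 0; i < MAX_ITER; i++) {
        for (size_t x = 0; x < n; x++) {
            const double r = zr[x];
            const double im = zi[x];
            const double nr = r * r - im * im + cr[x];
            const double ni = (r + r) * im + ci;
            zr[x] = nr;
            zi[x] = ni;
            /* Sticky flag: once set it stays set, even after an escaped
               pixel's values later overflow to inf or NaN. */
            esc[x] |= (unsigned char)(nr * nr + ni * ni > LIMIT_SQ);
        }
    }

    for (size_t x = 0; x < n; x++)
        inside[x] = (unsigned char)!esc[x];
}
```

For the 3000-pixel benchmark, one would allocate the five arrays with 3000 elements once and call mandel_row for each y, packing inside[] into output bits. The trade-off is that escaped pixels keep being iterated. This wasted work buys loops without per-pixel early exits, which is what lets two (or more) pixels be processed per instruction.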

Daniël

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
