Re: [E-devel] [RFC]Evas common engine SSE3 blend op implementation

Jim Kukunas Fri, 30 Sep 2011 11:21:06 -0700

On Fri, Sep 30, 2011 at 12:39:20PM +0900, Carsten Haitzler wrote:
> On Thu, 29 Sep 2011 10:42:29 -0700 Jim Kukunas
> <james.t.kuku...@linux.intel.com> said:
> 
> well.. lucas committed this without me getting around to my review... i found
> several issues with it. A_MASK_SSE3 was being declared all the time and never
> used in the inline funcs. it was ONLY used in 1 of the c files. i moved it
> there. also you called the C init funcs for rel ops - not the sse3 ones. copy &
> paste bug. also unused return value warnings in cpu sse3 detection function.


Whoops. Thanks for fixing these issues.

> 
> i ran a full expedite run:
> 
> http://www.enlightenment.org/~raster/speed.html
> 
> (i5-2500 CPU @ 3.30GHz, GeForce GTS 450, e17 running with OpenGL compositor).
> 
> just as a comparison - after the sse3 speedups, speed vs the nvidia gpu:
> 
> http://www.enlightenment.org/~raster/speedgl.html
> 
> i haven't tested against an atom yet.

Cool. I think these patches really shine on the atom.

There is a much bigger difference between 21 frames and 46 frames than
between 179 frames and 397 frames.
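
One way to read the comparison above is in terms of time per frame rather
than frames per second. The sketch below is only an illustration of that
reading; the helper names frame_ms and ms_saved are hypothetical and do not
come from Evas or Expedite.

```c
/* Milliseconds spent on one frame at a given frame rate. */
static double frame_ms(double fps)
{
    return 1000.0 / fps;
}

/* Milliseconds saved per frame when the rate rises from
 * fps_before to fps_after (hypothetical helper, for illustration). */
static double ms_saved(double fps_before, double fps_after)
{
    return frame_ms(fps_before) - frame_ms(fps_after);
}
```

Under that reading, ms_saved(21.0, 46.0) is roughly 25.9 ms per frame, while
ms_saved(179.0, 397.0) is roughly 3.1 ms per frame, even though both are
about a 2.2x increase in frame rate.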

> 
> jim -> good work! if you want a line in AUTHORS, please send it along. in
> future add an AUTHORS line as part of your patch (so we get right "real name"
> and email).

Thanks. Looks like Caro already added me.

> 
> > Hi Folks,
> > 
> > This patch series introduces an SSE3 implementation of Evas's common
> > engine blending routines.
> > 
> > Why SSE3?: 
> > The lddqu instruction, introduced in SSE3, is faster than a typical
> > unaligned load in the situation where we load from, but not store to,
> > an unaligned address which crosses a cache line. This lends itself well
> > to the blending functions which operate on two separate arrays. We single
> > step until we obtain an aligned address for the destination array, and use
> > lddqu to load the other unaligned array.
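
The access pattern described in the paragraph above can be sketched roughly
as follows. This is a minimal illustration, not the code from the patch
series: the names add_sat_pixel and add_sat_span are hypothetical, and a
per-channel saturating add stands in for the real blend operations. The
target attribute is a GCC/Clang extension, used here so the sketch builds
without -msse3 on the command line.

```c
#include <pmmintrin.h>  /* SSE3: _mm_lddqu_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a blend op: per-channel saturating add
 * on one 32-bit pixel (four 8-bit channels). */
static uint32_t add_sat_pixel(uint32_t s, uint32_t d)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((s >> shift) & 0xff) + ((d >> shift) & 0xff);
        if (sum > 0xff)
            sum = 0xff;
        out |= sum << shift;
    }
    return out;
}

/* Apply the stand-in op over a span of len pixels: dst = sat(src + dst). */
__attribute__((target("sse3")))
static void add_sat_span(const uint32_t *src, uint32_t *dst, size_t len)
{
    /* Single-step until the destination is 16-byte aligned. */
    while (len > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }

    /* Main loop, four pixels at a time: the source may still be
     * unaligned, so it is loaded with lddqu; the destination is
     * aligned, so it uses aligned load and store. */
    while (len >= 4) {
        __m128i s = _mm_lddqu_si128((const __m128i *)src);
        __m128i d = _mm_load_si128((const __m128i *)dst);
        _mm_store_si128((__m128i *)dst, _mm_adds_epu8(s, d));
        src += 4; dst += 4; len -= 4;
    }

    /* Handle the remaining zero to three pixels one at a time. */
    while (len > 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }
}
```

Only the destination is brought to alignment because it is both read and
written; the source is read-only, which is the case the paragraph above
describes lddqu as being suited for.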
> > 
> > Why do we need an SSE implementation?:
> > GCC does perform some auto-vectorization, but misses a lot of
> > opportunities for leveraging SSE, specifically when operating on
> > packed integers, as opposed to floating-point. With GCC 4.6.0 and
> > the CFLAGS listed below, the C implementation isn't vectorized, and
> > the MMX implementation performance is suboptimal.
> > 
> > A few tests which demonstrate the performance impact:
> > 
> > Setup:
> >     Intel Atom N270, Intel 945GME, Expedite Xlib engine
> >     GCC 4.5.1  CFLAGS=-m32 -mtune=atom -O2 -msse3
> > 
> > Rect Blend:
> >     C:    21.80 FPS +/- 0.028674
> >     MMX:  27.41 FPS +/- 0.021344
> >     SSE3: 46.90 FPS +/- 0.376106
> > 
> > Image Blend Fade Unscaled:
> >     C:    15.46 FPS +/- 0.031314
> >     MMX:  24.92 FPS +/- 0.055902
> >     SSE3: 34.28 FPS +/- 0.099457
> > 
> > Image Blend Solid Fade Unscaled:
> >     C:    22.03 FPS +/- 0.097125
> >     MMX:  33.78 FPS +/- 0.190351
> >     SSE3: 46.86 FPS +/- 0.437874
> > 
> > Setup:
> >     Intel Atom N455, Intel GMA 3150, Expedite Xlib engine
> >     GCC 4.6.0 CFLAGS=-m32 -mtune=atom -O2 -msse3
> > 
> > Rect Blend:
> >     C:    32.68 FPS +/- 0.218510
> >     MMX:  29.75 FPS +/- 0.527105
> >     SSE3: 54.24 FPS +/- 0.870486
> > 
> > Image Blend Unscaled:
> >     C:    32.73 FPS +/- 0.359036
> >     MMX:  35.00 FPS +/- 1.099517
> >     SSE3: 50.93 FPS +/- 0.990806
> > 
> > Image Blend Occlude 3 Many:
> >     C:    24.25 FPS +/- 0.213135
> >     MMX:  25.87 FPS +/- 0.470124
> >     SSE3: 36.96 FPS +/- 0.505757
> > 
> > I'm sure there is further room for improvement.
> > 
> > Let me know what you guys think.
> > 
> > Thanks.
> > 
> > 
> > 
> > _______________________________________________
> > enlightenment-devel mailing list
> > enlightenment-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
> > 
> 
> 
> -- 
> ------------- Codito, ergo sum - "I code, therefore I am" --------------
> The Rasterman (Carsten Haitzler)    ras...@rasterman.com

-- 
Jim Kukunas
Intel Open Source Technology Center
