On Thursday, September 29, 2011, Jim Kukunas <james.t.kuku...@linux.intel.com> wrote:
> Hi Folks,
>
> This patch series introduces an SSE3 implementation of Evas's common
> engine blending routines.
>
> Why SSE3?:
> The lddqu instruction, introduced in SSE3, is faster than a typical
> unaligned load in the situation where we load from, but do not store to,
> an unaligned address which crosses a cache line. This lends itself well
> to the blending functions, which operate on two separate arrays. We
> single-step until we obtain an aligned address for the destination array,
> and use lddqu to load from the other, possibly unaligned, array.
>
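
The load pattern described above can be sketched roughly as follows. This is only an illustration, not the code from the patch series: the function names (blend_add_sketch, add_sat_pixel) are hypothetical, and a per-channel saturating add stands in for the real Evas blend operations. It assumes GCC or Clang on x86 for the vector path and falls back to the scalar loop elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <pmmintrin.h>  /* SSE3 intrinsics: _mm_lddqu_si128 */
#define HAVE_X86_SSE3_PATH 1
#endif

/* Hypothetical scalar stand-in for a blend op: per-channel saturating add. */
static uint32_t add_sat_pixel(uint32_t s, uint32_t d)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xff) + ((d >> shift) & 0xff);
        if (c > 0xff)
            c = 0xff;
        out |= c << shift;
    }
    return out;
}

#ifdef HAVE_X86_SSE3_PATH
__attribute__((target("sse3")))  /* lets this compile without -msse3 */
#endif
static void blend_add_sketch(const uint32_t *src, uint32_t *dst, size_t n)
{
    /* 1. Single-step until dst sits on a 16-byte boundary. */
    while (n > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; n--;
    }

#ifdef HAVE_X86_SSE3_PATH
    /* 2. Four pixels per iteration: dst is now aligned, so it can use
     *    aligned load/store; src may still be unaligned, so it is read
     *    with lddqu. */
    while (n >= 4) {
        __m128i s = _mm_lddqu_si128((const __m128i *)src);
        __m128i d = _mm_load_si128((const __m128i *)dst);
        _mm_store_si128((__m128i *)dst, _mm_adds_epu8(s, d));
        src += 4; dst += 4; n -= 4;
    }
#endif

    /* 3. Remaining pixels (or everything, on non-x86 builds). */
    while (n > 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; n--;
    }
}
```

Only the destination needs the peeling loop, because it is both loaded and stored; the source is read-only, which is the case where lddqu is said to help.
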
> Why do we need an SSE implementation?:
> GCC does perform some auto-vectorization, but misses a lot of
> opportunities for leveraging SSE, specifically when operating on
> packed integers, as opposed to floating-point. With GCC 4.6.0 and
> the CFLAGS listed below, the C implementation isn't vectorized, and
> the performance of the MMX implementation is suboptimal.
>
> A few tests which demonstrate the performance impact:
>
> Setup:
>    Intel Atom N270, Intel 945GME, Expedite Xlib engine
>    GCC 4.5.1  CFLAGS=-m32 -mtune=atom -O2 -msse3
>
> Rect Blend:
>    C:    21.80 FPS +/- 0.028674
>    MMX:  27.41 FPS +/- 0.021344
>    SSE3: 46.90 FPS +/- 0.376106
>
> Image Blend Fade Unscaled:
>    C:    15.46 FPS +/- 0.031314
>    MMX:  24.92 FPS +/- 0.055902
>    SSE3: 34.28 FPS +/- 0.099457
>
> Image Blend Solid Fade Unscaled:
>    C:    22.03 FPS +/- 0.097125
>    MMX:  33.78 FPS +/- 0.190351
>    SSE3: 46.86 FPS +/- 0.437874
>
> Setup:
>    Intel Atom N455, Intel GMA 3150, Expedite Xlib engine
>    GCC 4.6.0 CFLAGS=-m32 -mtune=atom -O2 -msse3
>
> Rect Blend:
>    C:    32.68 FPS +/- 0.218510
>    MMX:  29.75 FPS +/- 0.527105
>    SSE3: 54.24 FPS +/- 0.870486
>
> Image Blend Unscaled:
>    C:    32.73 FPS +/- 0.359036
>    MMX:  35.00 FPS +/- 1.099517
>    SSE3: 50.93 FPS +/- 0.990806
>
> Image Blend Occlude 3 Many:
>    C:    24.25 FPS +/- 0.213135
>    MMX:  25.87 FPS +/- 0.470124
>    SSE3: 36.96 FPS +/- 0.505757
>
> I'm sure there is further room for improvement.
>
> Let me know what you guys think.

I think it is amazing! We were already very fast, but this improves on it
and can be improved even more. Excellent to have Intel folks hacking EFL :-)

Now I wonder whether you'll try with icc, and whether it's supposed to
yield better performance than gcc.

Last but not least, what's your target driver for gl/composite? Is it
PowerVR-based? Or the Intel one with open drivers?


>
> Thanks.
>
> _______________________________________________
> enlightenment-devel mailing list
> enlightenment-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
>

-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--------------------------------------
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202
