On Friday, September 30, 2011, Jim Kukunas <james.t.kuku...@linux.intel.com>
wrote:
> On Fri, Sep 30, 2011 at 12:08:03AM -0300, Gustavo Sverzut Barbieri wrote:
>> On Thursday, September 29, 2011, Jim Kukunas <
>> james.t.kuku...@linux.intel.com> wrote:
>> > Hi Folks,
>> >
>> > This patch series introduces an SSE3 implementation of Evas's common
>> > engine blending routines.
>> >
>> > Why SSE3?:
>> > The lddqu instruction, introduced in SSE3, is faster than a typical
>> > unaligned load in the situation where we load from, but not store to,
>> > an unaligned address which crosses a cache line. This lends itself well
>> > to the blending functions, which operate on two separate arrays. We
>> > single-step until we obtain an aligned address for the destination
>> > array, and use lddqu to load from the other, possibly unaligned, array.
>> >
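
The load strategy described above can be sketched roughly as follows. This
is only an illustration, not code from the patch series: the function names
(add_sat_pixel, add_sat_span) are hypothetical, and a per-channel saturating
add stands in for the real Evas blend math, which is not shown in this
thread. It assumes GCC or Clang on x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <pmmintrin.h>  /* SSE3 intrinsics: _mm_lddqu_si128 */

/* Hypothetical stand-in operation (NOT the Evas blend): per-channel
 * saturating add of one 32-bit pixel, used for the unaligned head/tail. */
static uint32_t add_sat_pixel(uint32_t s, uint32_t d)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xff) + ((d >> shift) & 0xff);
        if (c > 0xff)
            c = 0xff;
        out |= c << shift;
    }
    return out;
}

/* Hypothetical span routine: dst[i] = sat_add(src[i], dst[i]).
 * The target attribute lets this build even without -msse3. */
__attribute__((target("sse3")))
static void add_sat_span(const uint32_t *src, uint32_t *dst, size_t len)
{
    /* Single-step until the destination is 16-byte aligned. */
    while (len > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }

    /* Main loop, 4 pixels per iteration: dst is aligned, so it gets
     * aligned loads and stores; src may be unaligned and is only ever
     * read, so it is loaded with lddqu. */
    while (len >= 4) {
        __m128i s = _mm_lddqu_si128((const __m128i *)src);
        __m128i d = _mm_load_si128((const __m128i *)dst);
        _mm_store_si128((__m128i *)dst, _mm_adds_epu8(s, d));
        src += 4; dst += 4; len -= 4;
    }

    /* Finish the remaining 0-3 pixels one at a time. */
    while (len > 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }
}
```

Aligning on the destination rather than the source means the stores, which
lddqu cannot help with, are always aligned, and only the read-only source
takes the unaligned path.
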
>> > Why do we need an SSE implementation?:
>> > GCC does perform some auto-vectorization, but misses a lot of
>> > opportunities for leveraging SSE, specifically when operating on
>> > packed integers, as opposed to floating-point. With GCC 4.6.0 and
>> > the CFLAGS listed below, the C implementation isn't vectorized, and
>> > the performance of the MMX implementation is suboptimal.
>> >
>> > A few tests which demonstrate the performance impact:
>> >
>> > Setup:
>> >    Intel Atom N270, Intel 945GME, Expedite Xlib engine
>> >    GCC 4.5.1  CFLAGS=-m32 -mtune=atom -O2 -msse3
>> >
>> > Rect Blend:
>> >    C:    21.80 FPS +/- 0.028674
>> >    MMX:  27.41 FPS +/- 0.021344
>> >    SSE3: 46.90 FPS +/- 0.376106
>> >
>> > Image Blend Fade Unscaled:
>> >    C:    15.46 FPS +/- 0.031314
>> >    MMX:  24.92 FPS +/- 0.055902
>> >    SSE3: 34.28 FPS +/- 0.099457
>> >
>> > Image Blend Solid Fade Unscaled:
>> >    C:    22.03 FPS +/- 0.097125
>> >    MMX:  33.78 FPS +/- 0.190351
>> >    SSE3: 46.86 FPS +/- 0.437874
>> >
>> > Setup:
>> >    Intel Atom N455, Intel GMA 3150, Expedite Xlib engine
>> >    GCC 4.6.0 CFLAGS=-m32 -mtune=atom -O2 -msse3
>> >
>> > Rect Blend:
>> >    C:    32.68 FPS +/- 0.218510
>> >    MMX:  29.75 FPS +/- 0.527105
>> >    SSE3: 54.24 FPS +/- 0.870486
>> >
>> > Image Blend Unscaled:
>> >    C:    32.73 FPS +/- 0.359036
>> >    MMX:  35.00 FPS +/- 1.099517
>> >    SSE3: 50.93 FPS +/- 0.990806
>> >
>> > Image Blend Occlude 3 Many:
>> >    C:    24.25 FPS +/- 0.213135
>> >    MMX:  25.87 FPS +/- 0.470124
>> >    SSE3: 36.96 FPS +/- 0.505757
>> >
>> > I'm sure there is further room for improvement.
>> >
>> > Let me know what you guys think.
>>
>> I think it is amazing! We were already very fast but it was improved and
>> can be improved even more. Excellent to have intel folks hacking EFL :-)
>
> Thanks.
>
>>
>> Now I wonder whether you'll try with icc and if it's supposed to yield
>> better performance than gcc
>
> I wasn't planning on trying with icc. There is definitely room for GCC
> to generate better code for the SSE3 routines, and I'm not sure if ICC
> does or not. Either way, optimizing for GCC reaches a wider audience.

Sure, just wondering about the results and if intel had plans to make EFL
work with ICC :-)
Likely most people will still use gcc anyway, but it's good to know.


>> Last but not least what's your target driver for gl/composite? Is it
>> powervr based? Or the intel one with open drivers?
>
> All of my tests were conducted with Intel integrated graphics running
> the open source drivers.

But you ran software engine, not gl.
I used to have an intel GPU and it was a pain with evas from time to
time. Now I'm using nvidia and it's basically stable and fast. Raster is the
one to praise, as he hacks on nvidia and gave me this insight. Would be
amazing to have better "evas on intel gpu", most would agree.

Again, just poking at what the public intel plans are :-)


-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--------------------------------------
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel