On 09/30/2011 08:24 PM, Vincent Torri wrote:
>
>
> On Fri, 30 Sep 2011, Gustavo Sverzut Barbieri wrote:
>
>> On Friday, September 30, 2011, Jim Kukunas <[email protected]> wrote:
>>> On Fri, Sep 30, 2011 at 12:08:03AM -0300, Gustavo Sverzut Barbieri wrote:
>>>> On Thursday, September 29, 2011, Jim Kukunas <[email protected]> wrote:
>>>>> Hi Folks,
>>>>>
>>>>> This patch series introduces an SSE3 implementation of Evas's common
>>>>> engine blending routines.
>>>>>
>>>>> Why SSE3?:
>>>>> The lddqu instruction, introduced in SSE3, is faster than a typical
>>>>> unaligned load when we load from, but do not store to, an unaligned
>>>>> address that crosses a cache line. This lends itself well to the
>>>>> blending functions, which operate on two separate arrays: we single-step
>>>>> until we reach an aligned address for the destination array, then use
>>>>> lddqu to load from the other, possibly unaligned, array.
>>>>>
>>>>> Why do we need an SSE implementation?:
>>>>> GCC does perform some auto-vectorization, but misses a lot of
>>>>> opportunities for leveraging SSE, specifically when operating on
>>>>> packed integers, as opposed to floating-point. With GCC 4.6.0 and
>>>>> the CFLAGS listed below, the C implementation isn't vectorized, and
>>>>> the MMX implementation performance is suboptimal.
>>>>>
>>>>> A few tests which demonstrate the performance impact:
>>>>>
>>>>> Setup:
>>>>>     Intel Atom N270, Intel 945GME, Expedite Xlib engine
>>>>>     GCC 4.5.1  CFLAGS=-m32 -mtune=atom -O2 -msse3
>>>>>
>>>>> Rect Blend:
>>>>>     C:    21.80 FPS +/- 0.028674
>>>>>     MMX:  27.41 FPS +/- 0.021344
>>>>>     SSE3: 46.90 FPS +/- 0.376106
>>>>>
>>>>> Image Blend Fade Unscaled:
>>>>>     C:    15.46 FPS +/- 0.031314
>>>>>     MMX:  24.92 FPS +/- 0.055902
>>>>>     SSE3: 34.28 FPS +/- 0.099457
>>>>>
>>>>> Image Blend Solid Fade Unscaled:
>>>>>     C:    22.03 FPS +/- 0.097125
>>>>>     MMX:  33.78 FPS +/- 0.190351
>>>>>     SSE3: 46.86 FPS +/- 0.437874
>>>>>
>>>>> Setup:
>>>>>     Intel Atom N455, Intel GMA 3150, Expedite Xlib engine
>>>>>     GCC 4.6.0 CFLAGS=-m32 -mtune=atom -O2 -msse3
>>>>>
>>>>> Rect Blend:
>>>>>     C:    32.68 FPS +/- 0.218510
>>>>>     MMX:  29.75 FPS +/- 0.527105
>>>>>     SSE3: 54.24 FPS +/- 0.870486
>>>>>
>>>>> Image Blend Unscaled:
>>>>>     C:    32.73 FPS +/- 0.359036
>>>>>     MMX:  35.00 FPS +/- 1.099517
>>>>>     SSE3: 50.93 FPS +/- 0.990806
>>>>>
>>>>> Image Blend Occlude 3 Many:
>>>>>     C:    24.25 FPS +/- 0.213135
>>>>>     MMX:  25.87 FPS +/- 0.470124
>>>>>     SSE3: 36.96 FPS +/- 0.505757
>>>>>
>>>>> I'm sure there is further room for improvement.
>>>>>
>>>>> Let me know what you guys think.
>>>>
>>>> I think it is amazing! We were already very fast, but it was improved and
>>>> can be improved even more. Excellent to have Intel folks hacking EFL :-)
>>>
>>> Thanks.
>>>
>>>>
>>>> Now I wonder whether you'll try with icc and if it's supposed to yield
>>>> better performance than gcc
>>>
>>> I wasn't planning on trying with icc. There is definitely room for GCC
>>> to generate better code for the SSE3 routines, and I'm not sure if ICC
>>> does or not. Either way, optimizing for GCC reaches a wider audience.
>>
>> Sure, just wondering about the results and if intel had plans to make EFL
>> work with ICC :-)
>> Likely most people will still do gcc anyway, but it's good to know
>
> well, i already compiled the EFL and e17 with suncc.
>
> I already tried a bit with icc, but as I had to register every month or
> so to get the right to use it, i gave up.

Actually, I have a valid licence on Linux so I can try to compile evas
with it if someone is interested in the results.

Mathieu
>
> Vincent
>
> ------------------------------------------------------------------------------
> All of the data generated in your IT infrastructure is seriously valuable.
> Why? It contains a definitive record of application performance, security
> threats, fraudulent activity, and more. Splunk takes this data and makes
> sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-d2dcopy2
> _______________________________________________
> enlightenment-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
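
For readers following the thread, the load pattern described in the patch
announcement quoted above (single-step until the destination is 16-byte
aligned, then aligned loads and stores on the destination and lddqu loads on
the source) can be sketched roughly as below. This is not the code from the
patch series: the function name sketch_add_sat is hypothetical, and a
saturating per-channel add stands in for the real Evas blend arithmetic. The
target("sse3") attribute is a GCC/Clang extension, used here as an assumption
so the sketch builds without -msse3; on non-x86 targets only the scalar path
is compiled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <pmmintrin.h>   /* SSE3 intrinsics, including _mm_lddqu_si128 */
#define SKETCH_X86 1
/* GCC/Clang-specific: enable SSE3 for one function without -msse3. */
#define SKETCH_SSE3 __attribute__((target("sse3")))
#else
#define SKETCH_X86 0
#define SKETCH_SSE3
#endif

/* Scalar reference: saturating add of each 8-bit channel of one pixel. */
static uint32_t sketch_add_one(uint32_t d, uint32_t s)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((d >> shift) & 0xffu) + ((s >> shift) & 0xffu);
        if (c > 0xffu)
            c = 0xffu;
        out |= c << shift;
    }
    return out;
}

/* Hypothetical stand-in for a blend routine: dst[i] = sat(dst[i] + src[i]). */
SKETCH_SSE3
void sketch_add_sat(uint32_t *dst, const uint32_t *src, size_t len)
{
    size_t i = 0;

    assert(len == 0 || (dst != NULL && src != NULL));

#if SKETCH_X86
    /* Single-step until the destination address is 16-byte aligned. */
    while (i < len && ((uintptr_t)(dst + i) & 15u) != 0) {
        dst[i] = sketch_add_one(dst[i], src[i]);
        i++;
    }

    /* Four pixels per iteration: the destination uses aligned load/store,
     * the source (alignment unknown) is loaded with lddqu. */
    for (; i + 4 <= len; i += 4) {
        __m128i s = _mm_lddqu_si128((const __m128i *)(src + i));
        __m128i d = _mm_load_si128((const __m128i *)(dst + i));
        _mm_store_si128((__m128i *)(dst + i), _mm_adds_epu8(d, s));
    }
#endif

    /* Remaining pixels (or all of them on non-x86 builds). */
    for (; i < len; i++)
        dst[i] = sketch_add_one(dst[i], src[i]);
}
```

Only the destination needs the scalar prologue, because it is both read and
written; the source is read-only, which is the case the announcement singles
out for lddqu.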


