On 09/30/2011 08:24 PM, Vincent Torri wrote:
>
>
> On Fri, 30 Sep 2011, Gustavo Sverzut Barbieri wrote:
>
>> On Friday, September 30, 2011, Jim Kukunas<[email protected]> wrote:
>>> On Fri, Sep 30, 2011 at 12:08:03AM -0300, Gustavo Sverzut Barbieri wrote:
>>>> On Thursday, September 29, 2011, Jim Kukunas<[email protected]> wrote:
>>>>> Hi Folks,
>>>>>
>>>>> This patch series introduces an SSE3 implementation of Evas's common
>>>>> engine blending routines.
>>>>>
>>>>> Why SSE3?:
>>>>> The lddqu instruction, introduced in SSE3, is faster than a typical
>>>>> unaligned load in the situation where we load from, but not store to,
>>>>> an unaligned address which crosses a cache line. This lends itself well
>>>>> to the blending functions which operate on two separate arrays. We single
>>>>> step until we obtain an aligned address for the destination array, and use
>>>>> lddqu to load the other unaligned array.
>>>>>
>>>>> Why do we need an SSE implementation?:
>>>>> GCC does perform some auto-vectorization, but misses a lot of
>>>>> opportunities for leveraging SSE, specifically when operating on
>>>>> packed integers, as opposed to floating-point. With GCC 4.6.0 and
>>>>> the CFLAGS listed below, the C implementation isn't vectorized, and
>>>>> the MMX implementation's performance is suboptimal.
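[Inline note on the "Why SSE3?" paragraph quoted above.] A minimal sketch of the load pattern it describes, not code from the patch series: single-step until the destination is 16-byte aligned, then use aligned loads/stores for the destination and lddqu for the possibly unaligned source. The names add_sat_pixel and add_sat_span are hypothetical, and the per-channel saturating add is only a stand-in for a real Evas blend operation. When the compiler is not given -msse3, the sketch falls back to the scalar path.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE3__)
#include <pmmintrin.h> /* SSE3 intrinsics, including _mm_lddqu_si128 */
#endif

/* Hypothetical stand-in for a blend op: per-channel saturating add. */
static uint32_t add_sat_pixel(uint32_t s, uint32_t d)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xff) + ((d >> shift) & 0xff);
        if (c > 0xff)
            c = 0xff;
        out |= c << shift;
    }
    return out;
}

/* Hypothetical helper (not an Evas function): dst[i] = sat(src[i] + dst[i]). */
static void add_sat_span(const uint32_t *src, uint32_t *dst, size_t len)
{
    /* Single-step until dst sits on a 16-byte boundary. */
    while (len > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }
#if defined(__SSE3__)
    /* dst is aligned now; src may not be, so it is loaded with lddqu. */
    while (len >= 4) {
        __m128i s = _mm_lddqu_si128((const __m128i *)src);
        __m128i d = _mm_load_si128((const __m128i *)dst);
        _mm_store_si128((__m128i *)dst, _mm_adds_epu8(s, d));
        src += 4; dst += 4; len -= 4;
    }
#endif
    /* Scalar tail (or the whole remainder when SSE3 is not enabled). */
    while (len > 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }
}
```

Only the destination needs the alignment prologue, because it is both loaded and stored; the source is read-only, which is the case the quoted paragraph singles out for lddqu.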
>>>>>
>>>>> A few tests which demonstrate the performance impact:
>>>>>
>>>>> Setup:
>>>>> Intel Atom N270, Intel 945GME, Expedite Xlib engine
>>>>> GCC 4.5.1 CFLAGS=-m32 -mtune=atom -O2 -msse3
>>>>>
>>>>> Rect Blend:
>>>>> C: 21.80 FPS +/- 0.028674
>>>>> MMX: 27.41 FPS +/- 0.021344
>>>>> SSE3: 46.90 FPS +/- 0.376106
>>>>>
>>>>> Image Blend Fade Unscaled:
>>>>> C: 15.46 FPS +/- 0.031314
>>>>> MMX: 24.92 FPS +/- 0.055902
>>>>> SSE3: 34.28 FPS +/- 0.099457
>>>>>
>>>>> Image Blend Solid Fade Unscaled:
>>>>> C: 22.03 FPS +/- 0.097125
>>>>> MMX: 33.78 FPS +/- 0.190351
>>>>> SSE3: 46.86 FPS +/- 0.437874
>>>>>
>>>>> Setup:
>>>>> Intel Atom N455, Intel GMA 3150, Expedite Xlib engine
>>>>> GCC 4.6.0 CFLAGS=-m32 -mtune=atom -O2 -msse3
>>>>>
>>>>> Rect Blend:
>>>>> C: 32.68 FPS +/- 0.218510
>>>>> MMX: 29.75 FPS +/- 0.527105
>>>>> SSE3: 54.24 FPS +/- 0.870486
>>>>>
>>>>> Image Blend Unscaled:
>>>>> C: 32.73 FPS +/- 0.359036
>>>>> MMX: 35.00 FPS +/- 1.099517
>>>>> SSE3: 50.93 FPS +/- 0.990806
>>>>>
>>>>> Image Blend Occlude 3 Many:
>>>>> C: 24.25 FPS +/- 0.213135
>>>>> MMX: 25.87 FPS +/- 0.470124
>>>>> SSE3: 36.96 FPS +/- 0.505757
>>>>>
>>>>> I'm sure there is further room for improvement.
>>>>>
>>>>> Let me know what you guys think.
>>>>
>>>> I think it is amazing! We were already very fast but it was improved and can
>>>> be improved even more. Excellent to have Intel folks hacking EFL :-)
>>>
>>> Thanks.
>>>
>>>>
>>>> Now I wonder whether you'll try with icc and if it's supposed to yield
>>>> better performance than gcc
>>>
>>> I wasn't planning on trying with icc. There is definitely room for GCC
>>> to generate better code for the SSE3 routines, and I'm not sure if ICC
>>> does or not. Either way, optimizing for GCC reaches a wider audience.
>>
>> Sure, just wondering about the results and if Intel had plans to make EFL
>> work with ICC :-)
>> Likely most people will still do gcc anyway, but it's good to know
>
> well, i already compiled the EFL and e17 with suncc.
>
> I already tried a bit with icc, but as I had to register every month or
> so to get the right to use it, i gave up.
Actually, I have a valid licence on Linux, so I can try to compile Evas with it if anyone is interested in the results.

Mathieu

>
> Vincent
>
> ------------------------------------------------------------------------------
> All of the data generated in your IT infrastructure is seriously valuable.
> Why? It contains a definitive record of application performance, security
> threats, fraudulent activity, and more. Splunk takes this data and makes
> sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-d2dcopy2
> _______________________________________________
> enlightenment-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
