Re: [E-devel] [RFC]Evas common engine SSE3 blend op implementation

Jim Kukunas Fri, 30 Sep 2011 13:32:47 -0700

On Fri, Sep 30, 2011 at 03:16:22PM -0300, Gustavo Sverzut Barbieri wrote:
> On Friday, September 30, 2011, Jim Kukunas <[email protected]>
> wrote:
> > On Fri, Sep 30, 2011 at 12:08:03AM -0300, Gustavo Sverzut Barbieri wrote:
> >> On Thursday, September 29, 2011, Jim Kukunas <
> >> [email protected]> wrote:
> >> > Hi Folks,
> >> >
> >> > This patch series introduces a SSE3 implementation of Evas's common
> >> > engine blending routines.
> >> >
> >> > Why SSE3?:
> >> > The lddqu instruction, introduced in SSE3, is faster than a typical
> >> > unaligned load in the situation where we load from, but not store to,
> >> > an unaligned address which crosses a cache line. This lends itself
> >> > well to the blending functions, which operate on two separate arrays.
> >> > We single-step until we obtain an aligned address for the destination
> >> > array, and use lddqu to load the other, unaligned array.
> >> >
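
A minimal sketch of the loop structure described above, not the code from the patch series. The function names are hypothetical, and a per-channel saturating add stands in for the real Evas blend math so that the alignment handling stays in focus: step one pixel at a time until the destination is 16-byte aligned, then use lddqu for the (possibly unaligned) source and aligned loads/stores for the destination.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics: _mm_lddqu_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar helper: saturating add of each 8-bit channel. */
static uint32_t add_sat_pixel(uint32_t s, uint32_t d)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xff) + ((d >> shift) & 0xff);
        if (c > 0xff)
            c = 0xff;
        out |= c << shift;
    }
    return out;
}

/* Hypothetical function; the target attribute lets GCC/Clang build it
 * even when -msse3 is not passed on the command line. */
__attribute__((target("sse3")))
void add_blend_sse3(const uint32_t *src, uint32_t *dst, size_t len)
{
    /* Single-step until dst reaches a 16-byte boundary. */
    while (len > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }

    /* Main loop, four pixels per iteration: src may be unaligned, so it
     * is loaded with lddqu; dst is now aligned, so the load and store
     * use the aligned forms. */
    while (len >= 4) {
        __m128i s = _mm_lddqu_si128((const __m128i *)src);
        __m128i d = _mm_load_si128((const __m128i *)dst);
        _mm_store_si128((__m128i *)dst, _mm_adds_epu8(s, d));
        src += 4; dst += 4; len -= 4;
    }

    /* Scalar tail for the remaining 0-3 pixels. */
    while (len > 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }
}
```

Only the destination is brought to alignment because it is both read and written; the source is read-only, which is the case where lddqu is said above to help.
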
> >> > Why do we need an SSE implementation?:
> >> > GCC does perform some auto-vectorization, but misses a lot of
> >> > opportunities for leveraging SSE, specifically when operating on
> >> > packed integers, as opposed to floating-point. With GCC 4.6.0 and
> >> > the CFLAGS listed below, the C implementation isn't vectorized, and
> >> > the MMX implementation performance is suboptimal.
> >> >
> >> > A few tests which demonstrate the performance impact:
> >> >
> >> > Setup:
> >> >    Intel Atom N270, Intel 945GME, Expedite Xlib engine
> >> >    GCC 4.5.1  CFLAGS=-m32 -mtune=atom -O2 -msse3
> >> >
> >> > Rect Blend:
> >> >    C:    21.80 FPS +/- 0.028674
> >> >    MMX:  27.41 FPS +/- 0.021344
> >> >    SSE3: 46.90 FPS +/- 0.376106
> >> >
> >> > Image Blend Fade Unscaled:
> >> >    C:    15.46 FPS +/- 0.031314
> >> >    MMX:  24.92 FPS +/- 0.055902
> >> >    SSE3: 34.28 FPS +/- 0.099457
> >> >
> >> > Image Blend Solid Fade Unscaled:
> >> >    C:    22.03 FPS +/- 0.097125
> >> >    MMX:  33.78 FPS +/- 0.190351
> >> >    SSE3: 46.86 FPS +/- 0.437874
> >> >
> >> > Setup:
> >> >    Intel Atom N455, Intel GMA 3150, Expedite Xlib engine
> >> >    GCC 4.6.0 CFLAGS=-m32 -mtune=atom -O2 -msse3
> >> >
> >> > Rect Blend:
> >> >    C:    32.68 FPS +/- 0.218510
> >> >    MMX:  29.75 FPS +/- 0.527105
> >> >    SSE3: 54.24 FPS +/- 0.870486
> >> >
> >> > Image Blend Unscaled:
> >> >    C:    32.73 FPS +/- 0.359036
> >> >    MMX:  35.00 FPS +/- 1.099517
> >> >    SSE3: 50.93 FPS +/- 0.990806
> >> >
> >> > Image Blend Occlude 3 Many:
> >> >    C:    24.25 FPS +/- 0.213135
> >> >    MMX:  25.87 FPS +/- 0.470124
> >> >    SSE3: 36.96 FPS +/- 0.505757
> >> >
> >> > I'm sure there is further room for improvement.
> >> >
> >> > Let me know what you guys think.
> >>
> >> I think it is amazing! We were already very fast but it was improved and
> >> can be improved even more. Excellent to have Intel folks hacking EFL :-)
> >
> > Thanks.
> >
> >>
> >> Now I wonder whether you'll try with icc and if it's supposed to yield
> >> better performance than gcc
> >
> > I wasn't planning on trying with icc. There is definitely room for GCC
> > to generate better code for the SSE3 routines, and I'm not sure if ICC
> > does or not. Either way, optimizing for GCC reaches a wider audience.
> 
> Sure, just wondering about the results and if intel had plans to make EFL
> work with ICC :-)
> Likely most people will still do gcc anyway, but it's good to know
>


I don't know.

> 
> >> Last but not least what's your target driver for gl/composite? Is it
> >> powervr based? Or the intel one with open drivers?
> >
> > All of my tests were conducted with Intel integrated graphics running
> > the open source drivers.
> 
> But you ran software engine, not gl.
> Once I used to have an intel GPU and it was a pain with evas from time to
> time. Now I'm using nvidia and it's basically stable and fast. Raster is the
> one to praise, as he hacks on nvidia and gave me this insight. Would be
> amazing to have better "evas on intel gpu", most would agree.

Ah, I misread your question.

I'm definitely interested in improving evas on Intel integrated
graphics.

> 
> Again, just poking what are public intel plans :-)

Keep in mind...

I don't speak for Intel; only for myself.

Thanks.

> 
> 
> -- 
> Gustavo Sverzut Barbieri
> http://profusion.mobi embedded systems
> --------------------------------------
> MSN: [email protected]
> Skype: gsbarbieri
> Mobile: +55 (19) 9225-2202
> ------------------------------------------------------------------------------
> All of the data generated in your IT infrastructure is seriously valuable.
> Why? It contains a definitive record of application performance, security
> threats, fraudulent activity, and more. Splunk takes this data and makes
> sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-d2dcopy2
> _______________________________________________
> enlightenment-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

-- 
Jim Kukunas
Intel Open Source Technology Center

