On Fri, Sep 30, 2011 at 12:39:20PM +0900, Carsten Haitzler wrote:
> On Thu, 29 Sep 2011 10:42:29 -0700 Jim Kukunas
> <james.t.kuku...@linux.intel.com> said:
>
> well.. lucas committed this without me getting around to my review... i found
> several issues with it. A_MASK_SSE3 was being declared all the time and never
> used in the inline funcs. it was ONLY used in 1 of the c files. i moved it
> there. also you called the C init funcs for rel ops - not the sse3 ones. copy
> & paste bug. also unused return value warnings in cpu sse3 detection function.

Whoops. Thanks for fixing these issues.

> i ran a full expedite run:
>
> http://www.enlightenment.org/~raster/speed.html
>
> (i5-2500 CPU @ 3.30GHz, GeForce GTS 450, e17 running with OpenGL compositor).
>
> just as a comparison - after the sse3 speedups, speed vs the nvidia gpu:
>
> http://www.enlightenment.org/~raster/speedgl.html
>
> i haven't tested against an atom yet.

Cool. I think these patches really shine on the Atom. There is a much bigger
difference between 21 frames and 46 frames than between 179 frames and 397
frames.

> jim -> good work! if you want a line in AUTHORS, please send it along. in
> future add an AUTHORS line as part of your patch (so we get right "real name"
> and email).

Thanks. Looks like Caro already added me.

> > Hi Folks,
> >
> > This patch series introduces an SSE3 implementation of Evas's common
> > engine blending routines.
> >
> > Why SSE3?:
> > The lddqu instruction, introduced in SSE3, is faster than a typical
> > unaligned load in the situation where we load from, but not store to,
> > an unaligned address which crosses a cache line. This lends itself well
> > to the blending functions, which operate on two separate arrays. We single
> > step until we obtain an aligned address for the destination array, and use
> > lddqu to load the other unaligned array.
> >
> > Why do we need an SSE implementation?:
> > GCC does perform some auto-vectorization, but misses a lot of
> > opportunities for leveraging SSE, specifically when operating on
> > packed integers, as opposed to floating-point. With GCC 4.6.0 and
> > the CFLAGS listed below, the C implementation isn't vectorized, and
> > the MMX implementation's performance is suboptimal.
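
For anyone following along, the load pattern described under "Why SSE3?" above
(single-step until the destination is aligned, then lddqu for the source) can
be sketched roughly as below. This is only an illustration, not the code from
the patch series: the names add_sat_pixel and add_sat_span are hypothetical,
and a saturating per-channel add stands in for the real blend math. It assumes
GCC or Clang on x86 (the target attribute is a compiler extension; building
with -msse3 should work as well) and pixel pointers that are at least 4-byte
aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <pmmintrin.h>  /* SSE3 intrinsics, including _mm_lddqu_si128 */

/* Hypothetical scalar helper: saturating add of one 32-bit pixel,
 * applied independently to each 8-bit channel. */
static uint32_t add_sat_pixel(uint32_t s, uint32_t d)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xffu) + ((d >> shift) & 0xffu);
        if (c > 0xffu)
            c = 0xffu;
        out |= c << shift;
    }
    return out;
}

/* Hypothetical span function: dst[i] = saturating_add(src[i], dst[i]).
 * Illustrates the load pattern only; it is not Evas's blend routine. */
__attribute__((target("sse3")))
static void add_sat_span(const uint32_t *src, uint32_t *dst, size_t len)
{
    /* Assumption: dst is 4-byte aligned, so stepping one pixel at a
     * time is guaranteed to reach a 16-byte boundary. */
    assert(((uintptr_t)dst & 3u) == 0);

    /* Head: single-step until the destination is 16-byte aligned. */
    while (len > 0 && ((uintptr_t)dst & 15u) != 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }

    /* Body: four pixels per iteration. The destination uses aligned
     * load/store; the source may be unaligned, so it goes through lddqu. */
    while (len >= 4) {
        __m128i s = _mm_lddqu_si128((const __m128i *)src);
        __m128i d = _mm_load_si128((const __m128i *)dst);
        _mm_store_si128((__m128i *)dst, _mm_adds_epu8(s, d));
        src += 4; dst += 4; len -= 4;
    }

    /* Tail: fewer than four pixels remain. */
    while (len > 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }
}
```

Calling something like add_sat_span(src + 1, dst + 3, n) exercises both the
scalar head and the unaligned source loads. Only the source ever sees an
unaligned access, which is the case lddqu is meant for.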
> >
> > A few tests which demonstrate the performance impact:
> >
> > Setup:
> > Intel Atom N270, Intel 945GME, Expedite Xlib engine
> > GCC 4.5.1 CFLAGS=-m32 -mtune=atom -O2 -msse3
> >
> > Rect Blend:
> > C:    21.80 FPS +/- 0.028674
> > MMX:  27.41 FPS +/- 0.021344
> > SSE3: 46.90 FPS +/- 0.376106
> >
> > Image Blend Fade Unscaled:
> > C:    15.46 FPS +/- 0.031314
> > MMX:  24.92 FPS +/- 0.055902
> > SSE3: 34.28 FPS +/- 0.099457
> >
> > Image Blend Solid Fade Unscaled:
> > C:    22.03 FPS +/- 0.097125
> > MMX:  33.78 FPS +/- 0.190351
> > SSE3: 46.86 FPS +/- 0.437874
> >
> > Setup:
> > Intel Atom N455, Intel GMA 3150, Expedite Xlib engine
> > GCC 4.6.0 CFLAGS=-m32 -mtune=atom -O2 -msse3
> >
> > Rect Blend:
> > C:    32.68 FPS +/- 0.218510
> > MMX:  29.75 FPS +/- 0.527105
> > SSE3: 54.24 FPS +/- 0.870486
> >
> > Image Blend Unscaled:
> > C:    32.73 FPS +/- 0.359036
> > MMX:  35.00 FPS +/- 1.099517
> > SSE3: 50.93 FPS +/- 0.990806
> >
> > Image Blend Occlude 3 Many:
> > C:    24.25 FPS +/- 0.213135
> > MMX:  25.87 FPS +/- 0.470124
> > SSE3: 36.96 FPS +/- 0.505757
> >
> > I'm sure there is further room for improvement.
> >
> > Let me know what you guys think.
> >
> > Thanks.
> >
> > _______________________________________________
> > enlightenment-devel mailing list
> > enlightenment-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
>
> --
> ------------- Codito, ergo sum - "I code, therefore I am" --------------
> The Rasterman (Carsten Haitzler)    ras...@rasterman.com

--
Jim Kukunas
Intel Open Source Technology Center