Hi Folks,
This patch series introduces an SSE3 implementation of Evas's common
engine blending routines.
Why SSE3?
The lddqu instruction, introduced in SSE3, is faster than a typical
unaligned load in the situation where we load from, but do not store to,
an unaligned address which crosses a cache line. This lends itself well
to the blending functions, which operate on two separate arrays. We single
step until we obtain an aligned address for the destination array, and use
lddqu to load from the other, possibly unaligned, array.
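To illustrate the access pattern described above, here is a minimal sketch in C. It is not code from the patch series: the function names add_sat_span and add_sat_pixel are hypothetical, and a per-channel saturating add stands in for the real blend math. The target attribute is a GCC/Clang extension, assumed here so the sketch builds without -msse3 on the command line.

```c
#include <stddef.h>
#include <stdint.h>
#include <pmmintrin.h>  /* SSE3 intrinsics: _mm_lddqu_si128 */

/* Hypothetical stand-in for a blend operation: saturating add of each
 * 8-bit channel of two 32-bit pixels. */
static uint32_t add_sat_pixel(uint32_t s, uint32_t d)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xffu) + ((d >> shift) & 0xffu);
        out |= (c > 0xffu ? 0xffu : c) << shift;
    }
    return out;
}

/* Hypothetical span function showing the load/store pattern only.
 * Assumption: GCC or Clang, which accept the target attribute. */
__attribute__((target("sse3")))
void add_sat_span(const uint32_t *src, uint32_t *dst, size_t len)
{
    /* Single-step until the destination is 16-byte aligned. */
    while (len > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }

    /* Four pixels per iteration: dst uses aligned load/store, while
     * src may be unaligned and is only ever read, so lddqu is used. */
    while (len >= 4) {
        __m128i s = _mm_lddqu_si128((const __m128i *)src);
        __m128i d = _mm_load_si128((const __m128i *)dst);
        _mm_store_si128((__m128i *)dst, _mm_adds_epu8(s, d));
        src += 4; dst += 4; len -= 4;
    }

    /* Handle the remaining 0 to 3 pixels one at a time. */
    while (len > 0) {
        *dst = add_sat_pixel(*src, *dst);
        src++; dst++; len--;
    }
}
```

The point of the sketch is the split: only the destination is brought to an aligned address, so every store is aligned, and the source is only read through lddqu whatever its alignment.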
Why do we need an SSE implementation?
GCC does perform some auto-vectorization, but misses a lot of
opportunities for leveraging SSE, specifically when operating on
packed integers, as opposed to floating-point. With GCC 4.6.0 and
the CFLAGS listed below, the C implementation isn't vectorized, and
the MMX implementation performance is suboptimal.
A few tests which demonstrate the performance impact:
Setup:
  Intel Atom N270, Intel 945GME, Expedite Xlib engine
  GCC 4.5.1 CFLAGS=-m32 -mtune=atom -O2 -msse3

  Rect Blend:
    C:    21.80 FPS +/- 0.028674
    MMX:  27.41 FPS +/- 0.021344
    SSE3: 46.90 FPS +/- 0.376106
  Image Blend Fade Unscaled:
    C:    15.46 FPS +/- 0.031314
    MMX:  24.92 FPS +/- 0.055902
    SSE3: 34.28 FPS +/- 0.099457
  Image Blend Solid Fade Unscaled:
    C:    22.03 FPS +/- 0.097125
    MMX:  33.78 FPS +/- 0.190351
    SSE3: 46.86 FPS +/- 0.437874
Setup:
  Intel Atom N455, Intel GMA 3150, Expedite Xlib engine
  GCC 4.6.0 CFLAGS=-m32 -mtune=atom -O2 -msse3

  Rect Blend:
    C:    32.68 FPS +/- 0.218510
    MMX:  29.75 FPS +/- 0.527105
    SSE3: 54.24 FPS +/- 0.870486
  Image Blend Unscaled:
    C:    32.73 FPS +/- 0.359036
    MMX:  35.00 FPS +/- 1.099517
    SSE3: 50.93 FPS +/- 0.990806
  Image Blend Occlude 3 Many:
    C:    24.25 FPS +/- 0.213135
    MMX:  25.87 FPS +/- 0.470124
    SSE3: 36.96 FPS +/- 0.505757
I'm sure there is further room for improvement.
Let me know what you guys think.
Thanks.
_______________________________________________
enlightenment-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel