Chris Wilson <ch...@chris-wilson.co.uk> writes:

> This path is being exercised by inplace compositing of trapezoids, for
> instance as used in the firefox-asteroids cairo-trace.
>
> core2 @ 2.66GHz,
>
> reference memcpy speed = 4898.2MB/s (1224.6MP/s for 32bpp fills)
>
> before: add_n_8888 = L1:   4.36 L2:   4.27 M:  1.61 ( 0.13%) HT:  1.65 VT:  1.63 R:  1.63 RT:  1.59 (  21Kops/s)
>
> after:  add_n_8888 = L1:2969.09 L2:3926.11 M:603.30 (49.27%) HT:524.69 VT:401.01 R:407.59 RT:210.34 ( 804Kops/s)
Just two brief comments, and then I'll disappear again (until the 11th or so):

- It looks like this function will work for abgr destinations as well as argb.

- I'm surprised that the new function is _that_ much better. The current code should hit an SSE2 combiner and noop iterators for both source and destination, so while I'd expect a solid improvement from a dedicated fast path, it is hard to believe that it would be 919 times faster than the old code (the L2 figure: 3926.11 vs. 4.27). If these numbers are real, there has to be something wrong with either the benchmark or the current code.

Soren

_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman