Re: [Pixman] [PATCH] sse2: Add a fast path for add_n_8888

Søren Sandmann Wed, 02 Jan 2013 10:41:04 -0800

Chris Wilson <ch...@chris-wilson.co.uk> writes:

> This path is being exercised by inplace compositing of trapezoids, for
> instance as used in the firefox-asteroids cairo-trace.
>
> core2 @ 2.66GHz,
>
> reference memcpy speed = 4898.2MB/s (1224.6MP/s for 32bpp fills)
>
> before: add_n_8888 = L1:   4.36  L2:   4.27  M:  1.61 (  0.13%)
>         HT:  1.65  VT:  1.63  R:  1.63  RT:  1.59 (  21Kops/s)
>
> after:  add_n_8888 = L1:2969.09  L2:3926.11  M:603.30 ( 49.27%)
>         HT:524.69  VT:401.01  R:407.59  RT:210.34 ( 804Kops/s)


Just two brief comments, and then I'll disappear again (until the 11th
or so):

- It looks like this function will work for abgr destinations as well as
  argb.

- I'm surprised that the new function is _that_ much better. The current
  code should hit an SSE2 combiner and noop iterators for both source
  and destination, so while I'd expect a solid improvement from a
  dedicated fast path, it is hard to believe that it would be 919 times
  faster than the old code (3926.11 vs. 4.27 in the L2 column). If these
  numbers are real, there has to be something wrong with either the
  benchmark or the current code.
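
To illustrate the first point: ADD with a solid source is a per-byte
saturating add, so it does not care which byte holds red and which holds
blue, provided the solid color is already in the destination's channel
order. The following is only a minimal sketch of that idea, not the code
from the patch; add_sat_8888, add_n_8888_sketch and swap_rb are
hypothetical names, and the real fast path also has to deal with strides,
alignment and fetching the solid color.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Scalar reference: saturating add of each 8-bit channel of two pixels. */
static uint32_t add_sat_8888(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        if (sum > 0xff)
            sum = 0xff;
        result |= sum << shift;
    }
    return result;
}

/* Hypothetical helper: dst[i] = saturate(dst[i] + src) for n pixels.
 * src is assumed to be in the same channel order as dst already. */
static void add_n_8888_sketch(uint32_t *dst, uint32_t src, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    /* Four pixels per iteration; unaligned accesses keep the sketch short. */
    const __m128i vsrc = _mm_set1_epi32((int)src);
    for (; i + 4 <= n; i += 4) {
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(d, vsrc));
    }
#endif
    /* Remaining pixels, or all of them when SSE2 is not available. */
    for (; i < n; i++)
        dst[i] = add_sat_8888(dst[i], src);
}

/* Swap the R and B bytes, i.e. convert a8r8g8b8 <-> a8b8g8r8. */
static uint32_t swap_rb(uint32_t p)
{
    return (p & 0xff00ff00u) | ((p >> 16) & 0xffu) | ((p & 0xffu) << 16);
}
```

Running the same routine on an argb buffer and on its abgr-swapped copy
(with the solid color swapped too) should give results that differ only
by that swap, which is why a single function could be registered for
both destination formats.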


Soren
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
