Re: [Pixman] [PATCH] sse2: Add a fast path for add_n_8888

Chris Wilson Wed, 02 Jan 2013 11:12:40 -0800

On Wed, 02 Jan 2013 19:40:58 +0100, sandm...@cs.au.dk (Søren Sandmann) wrote:
> Chris Wilson <ch...@chris-wilson.co.uk> writes:
> 
> > This path is being exercised by inplace compositing of trapezoids, for
> > instance as used in the firefox-asteroids cairo-trace.
> >
> > core2 @ 2.66GHz,
> >
> > reference memcpy speed = 4898.2MB/s (1224.6MP/s for 32bpp fills)
> >
> > before: add_n_8888 = L1:   4.36  L2:   4.27  M:  1.61 (  0.13%)  HT:  1.65  VT:  1.63  R:  1.63  RT:  1.59 (  21Kops/s)
> >
> > after:  add_n_8888 = L1:2969.09  L2:3926.11  M:603.30 ( 49.27%)  HT:524.69  VT:401.01  R:407.59  RT:210.34 ( 804Kops/s)
> 
> Just two brief comments, and then I'll disappear again (until the 11th
> or so):
> 
> - It looks like this function will work for abgr destinations as well as
>   argb.
> 
> - I'm surprised that the new function is _that_ much better. The current
>   code should hit an SSE2 combiner and noop iterators for both source
>   and destination, so while I'd expect a solid improvement from a
>   dedicated fast path, it is hard to believe that it would be 919 times
>   faster than the old. If these numbers are real, there has to be
>   something wrong with either the benchmark or the current code.


Judging from the perf profile of cairo-traces, the delta is closer to 5x.
All I did to gather the numbers was to run 
   ./test/lowlevel-blt-bench -n add_n_8888
which is dominated by general_composite_rect:

 if (repeat == PIXMAN_REPEAT_NORMAL)
 {
        while (*c >= size)
                *c -= size;
        while (*c < 0)
                *c += size;
 }

Special-casing size == 1 there boosts the L1 result from 4 to 70, but it
is still surprising that we hit that path at all.
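
For illustration only, a minimal sketch of what such a special case could
look like. The function name repeat_normal and its signature are made up
for this example; this is not the actual pixman code or the change that
was tested. With size == 1 every coordinate wraps to 0, so the subtraction
loop (one iteration per pixel of offset) can be skipped entirely.

```c
/* Hypothetical helper, named here only for illustration. */
static void
repeat_normal (int *c, int size)
{
    if (size == 1)
    {
        /* A 1-pixel-wide source: every coordinate wraps to 0,
         * so there is no need to loop *c times. */
        *c = 0;
        return;
    }

    /* General case: same loops as the snippet quoted above. */
    while (*c >= size)
        *c -= size;
    while (*c < 0)
        *c += size;
}
```

Without the early return, a coordinate of 1000000 with size 1 would take a
million iterations of the first loop to reach the same result.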

Ah, I read the options to lowlevel-blt-bench wrong...

./test/lowlevel-blt-bench add_n_8888:
   add_n_8888 =  L1:1131.58  L2:1112.37  M:530.11 ( 43.24%)  HT:108.01  VT: 99.03  R: 90.03  RT: 25.11 ( 306Kops/s)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
