Re: [Pixman] [PATCH] Fix arithmetic overflow in pointer arithmetic in ‘general_composite_rect’

2015-09-22 Thread Bill Spitzak
On Mon, Sep 21, 2015 at 10:07 PM, Søren Sandmann 
wrote:

> Sure. The extra width check can't harm.
>

Actually it can, because it implies that such values *can* arrive at this
function, leading programmers to add tests to the calling functions, thus
leading to a large number of tests for conditions that cannot happen.

I would prefer to see an assert(width > 0) there.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH v2] pixman-general: Tighten up calculation of temporary buffer sizes

2015-09-22 Thread Pekka Paalanen
On Tue, 22 Sep 2015 12:43:25 +0100
Ben Avison  wrote:

> Each of the aligns can only add a maximum of 15 bytes to the space
> requirement. This permits some edge cases to use the stack buffer where
> previously it would have deduced that a heap buffer was required.
> ---
> 
> This is an update of my previous patch (now posted over a year ago):
> https://patchwork.freedesktop.org/patch/49898/
> 
> which conflicts with the patch just pushed:
> http://patchwork.freedesktop.org/patch/60052/
> 
> Since this piece of code is fresh in people's minds, this might be a good
> time to get this one pushed as well.
> 
> Note that the scope of this change has been reduced: while the old code
> added space to align the end address to a cacheline boundary (which as I
> pointed out, was unnecessary), the new version works using buffer lengths
> only.
> 
> As a discussion point, wouldn't it be better for the ALIGN macro to
> assume 32-byte cache lines? Both 16-byte and 32-byte cachelines are
> currently in common use, but by aligning the buffers to 32-byte addresses
> we would simultaneously achieve 16-byte alignment.
> 
>  pixman/pixman-general.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
> index fa88463..6141cb0 100644
> --- a/pixman/pixman-general.c
> +++ b/pixman/pixman-general.c
> @@ -158,9 +158,9 @@ general_composite_rect  (pixman_implementation_t *imp,
>  if (width <= 0 || _pixman_multiply_overflows_int (width, Bpp * 3))
>   return;
>  
> -if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 32 * 3)
> +if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 15 * 3)
>  {
> - scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 32 * 3);
> + scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 15 * 3);
>  
>   if (!scanline_buffer)
>   return;

Reviewed-by: Pekka Paalanen 

I think Ben's explanation as seen in
https://patchwork.freedesktop.org/patch/49898/
covers all Søren's concerns (it quotes everything Søren said about the
patch), and I see no reason to reject this.

I'll push this on Friday if no-one objects.


Thanks,
pq


pgpL74xwNlDmG.pgp
Description: OpenPGP digital signature
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] pixman-general: Fix stack related pointer arithmetic overflow

2015-09-22 Thread Oded Gabbay
On Tue, Sep 22, 2015 at 4:25 AM, Siarhei Siamashka
 wrote:
> As https://bugs.freedesktop.org/show_bug.cgi?id=92027#c6 explains,
> the stack is allocated at the very top of the process address space
> in some configurations (32-bit x86 systems with ASLR disabled).
> And the careless computations done with the 'dest_buffer' pointer
> may overflow, failing the buffer upper limit check.
>
> The problem can be reproduced using the 'stress-test' program,
> which segfaults when executed via setarch:
>
> export CFLAGS="-O2 -m32" && ./autogen.sh
> ./configure --disable-libpng --disable-gtk && make
> setarch i686 -R test/stress-test
>
> This patch introduces the required corrections. The extra check
> for negative 'width' may be redundant (the invalid 'width' value
> is not supposed to reach here), but it's better to play safe
> when dealing with the buffers allocated on stack.
>
> Reported-by: Ludovic Courtès 
> Signed-off-by: Siarhei Siamashka 
> ---
>  pixman/pixman-general.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
> index 7cdea29..fa88463 100644
> --- a/pixman/pixman-general.c
> +++ b/pixman/pixman-general.c
> @@ -155,23 +155,21 @@ general_composite_rect  (pixman_implementation_t *imp,
>  #define ALIGN(addr)\
>  ((uint8_t *)uintptr_t)(addr)) + 15) & (~15)))
>
> -src_buffer = ALIGN (scanline_buffer);
> -mask_buffer = ALIGN (src_buffer + width * Bpp);
> -dest_buffer = ALIGN (mask_buffer + width * Bpp);
> +if (width <= 0 || _pixman_multiply_overflows_int (width, Bpp * 3))
> +   return;
>
> -if (ALIGN (dest_buffer + width * Bpp) >
> -   scanline_buffer + sizeof (stack_scanline_buffer))
> +if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 32 * 3)
>  {
> scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 32 * 3);
>
> if (!scanline_buffer)
> return;
> -
> -   src_buffer = ALIGN (scanline_buffer);
> -   mask_buffer = ALIGN (src_buffer + width * Bpp);
> -   dest_buffer = ALIGN (mask_buffer + width * Bpp);
>  }
>
> +src_buffer = ALIGN (scanline_buffer);
> +mask_buffer = ALIGN (src_buffer + width * Bpp);
> +dest_buffer = ALIGN (mask_buffer + width * Bpp);
> +
>  if (width_flag == ITER_WIDE)
>  {
> /* To make sure there aren't any NANs in the buffers */
> --
> 2.4.6
>

I tested the patch and it fixed the crash on my computer.

I pushed it to master:
4297e90..8b49d4b  master -> master

and to 0.32 branch:
d6a4a56..204fcd2  0.32 -> 0.32

   Oded
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [PATCH v2] pixman-general: Tighten up calculation of temporary buffer sizes

2015-09-22 Thread Ben Avison
Each of the aligns can only add a maximum of 15 bytes to the space
requirement. This permits some edge cases to use the stack buffer where
previously it would have deduced that a heap buffer was required.
---

This is an update of my previous patch (now posted over a year ago):
https://patchwork.freedesktop.org/patch/49898/

which conflicts with the patch just pushed:
http://patchwork.freedesktop.org/patch/60052/

Since this piece of code is fresh in people's minds, this might be a good
time to get this one pushed as well.

Note that the scope of this change has been reduced: while the old code
added space to align the end address to a cacheline boundary (which as I
pointed out, was unnecessary), the new version works using buffer lengths
only.

As a discussion point, wouldn't it be better for the ALIGN macro to
assume 32-byte cache lines? Both 16-byte and 32-byte cachelines are
currently in common use, but by aligning the buffers to 32-byte addresses
we would simultaneously achieve 16-byte alignment.

 pixman/pixman-general.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
index fa88463..6141cb0 100644
--- a/pixman/pixman-general.c
+++ b/pixman/pixman-general.c
@@ -158,9 +158,9 @@ general_composite_rect  (pixman_implementation_t *imp,
 if (width <= 0 || _pixman_multiply_overflows_int (width, Bpp * 3))
return;
 
-if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 32 * 3)
+if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 15 * 3)
 {
-   scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 32 * 3);
+   scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 15 * 3);
 
if (!scanline_buffer)
return;
-- 
1.7.5.4

___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [ANNOUNCE] pixman release 0.32.8 now available

2015-09-22 Thread Oded Gabbay
A new pixman release 0.32.8 is now available. This is a stable release that 
contains an important bug fix (buffer overflow), which can affect 32-bit 
systems.

I recommend to all the distributions that still use the 0.32.x version to 
upgrade to this release as soon as possible.

For those distributions that already moved to the new 0.33.2 version, I will 
release in the coming weeks a new version (0.33.4), which will include this 
bug fix, among other things.

Thanks,

 Oded

tar.gz:
http://cairographics.org/releases/pixman-0.32.8.tar.gz
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.32.8.tar.gz

tar.bz2:
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.32.8.tar.bz2

Hashes:
MD5:  e588622e375e7d578e9df320f1ace2de  pixman-0.32.8.tar.gz
MD5:  18d6b62abdb7bc0f8e6b0ddf48986b2c  pixman-0.32.8.tar.bz2
SHA1: c1119bbdb587c56009b653e6f81c083f98a20135  pixman-0.32.8.tar.gz
SHA1: 5c57045622265b877c9bf02d531973eadf942140  pixman-0.32.8.tar.bz2

GPG signature:
http://cairographics.org/releases/pixman-0.32.8.tar.gz.sha1.asc
(signed by Oded Gabbay )

Git:
git://git.freedesktop.org/git/pixman
tag: pixman-0.32.8

Log:
Oded Gabbay (1):
  Pre-release version bump to 0.32.8

Siarhei Siamashka (1):
  pixman-general: Fix stack related pointer arithmetic overflow

Søren Sandmann Pedersen (1):
  Post-release version bump to 0.32.7
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] pixman-general: Fix stack related pointer arithmetic overflow

2015-09-22 Thread Ludovic Courtès
Oded Gabbay  skribis:

> I tested the patch and it fixed the crash on my computer.
>
> I pushed it to master:
> 4297e90..8b49d4b  master -> master
>
> and to 0.32 branch:
> d6a4a56..204fcd2  0.32 -> 0.32

Great, thanks to all of you.

Ludo’.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 1/2] Remove the 8e extra safety margin in COVER_CLIP analysis

2015-09-22 Thread Ben Avison

On Mon, 21 Sep 2015 06:32:48 +0100, Siarhei Siamashka 
 wrote:

Since we are trying to justify the 8e extra safety margin removal in
the context of this patch, this is what I wanted to see explained in
the commit message. But maybe I'm just bad at math and it was perfectly
obvious to everyone else without any special proof.


I think it's just that if you've come on board since the old rounding
code was removed (as I have) it's hard to see why this would ever have
been a problem. If you express your P vector as a linear combination of
vectors

| frac(x_dst) |   | int(x_dst) |
| frac(y_dst) | + | int(y_dst) |
| 0x1 |   | 0  |

then it's clear that the first component is an invariant (0x8000, 0x8000,
0x1) irrespective of whether you reach P in a stepwise manner or not,
and that the other one uses only integers. Any integer multiplied by a
fixed-point number is a fixed-point number (of the same number of
fractional digits) without any rounding errors, so in the absence of any
intermediate rounding steps, the rounding error of the expression as a
whole is the same as the rounding error of the first component, and that
hasn't changed.


Proof:

All implementations must give the same numerical results as
bits_image_fetch_pixel_nearest() / bits_image_fetch_pixel_bilinear().

The former does
int x0 = pixman_fixed_to_int (x - pixman_fixed_e);
which maps directly to the new test for the nearest flag, when you consider
that x0 must fall in the interval [0,width).

The latter does
x1 = x - pixman_fixed_1 / 2;
x1 = pixman_fixed_to_int (x1);
x2 = x1 + 1;
but then x2 is brought back in range by the repeat() function, so it can't
stray beyond the end of the source line.


The wording does not look very good here. It seems to imply that the
repeat() function has some special use in the latter (BILINEAR) case.
But the repeat handling is exactly the same for NEAREST and BILINEAR.
Also for NONE repeat and fetching pixels from the outside of the source
image bounds, we are not bringing the coordinate back into range but
interpreting the pixel value as zero.


I can't follow your argument there - I don't think I implied that
repeat() acted differently for the bilinear case?

On NONE repeat, yes I neglected that detail, but I was generalising. How
about:

The latter does
x1 = x - pixman_fixed_1 / 2;
x1 = pixman_fixed_to_int (x1);
x2 = x1 + 1;
but any values of x2 that correspond to a pixel offset beyond the end of
the source line are never used to dereference the pixel array. In the
case of NONE repeat, a pixel value of zero is substituted, and otherwise
the action of the repeat() function, when applied to x2, is to select a
different pixel offset which *does* lie within the source line (the exact
choice depends upon the repeat type selected).

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] vmx: implement fast path vmx_composite_over_n_8888

2015-09-22 Thread Pekka Paalanen
On Mon, 21 Sep 2015 14:22:53 +0300
Oded Gabbay  wrote:

> On Thu, Sep 10, 2015 at 7:16 PM, Siarhei Siamashka
>  wrote:
> >
> > On Thu, 10 Sep 2015 12:27:18 +0300
> > Oded Gabbay  wrote:
> >
> > > On Sat, Sep 5, 2015 at 10:03 PM, Oded Gabbay  
> > > wrote:
> > > >
> > > > On Fri, Sep 4, 2015 at 3:39 PM, Siarhei Siamashka
> > > >  wrote:
> > > > > Running "lowlevel-blt-bench over_n_" on Playstation3 3.2GHz,
> > > > > Gentoo ppc (32-bit userland) gave the following results:
> > > > >
> > > > > before:  over_n_ =  L1: 147.47  L2: 205.86  M:121.07
> > > > > after:   over_n_ =  L1: 287.27  L2: 261.09  M:133.48
> > > > >
> > > > > Signed-off-by: Siarhei Siamashka 
> > > > > ---
> > > > >  pixman/pixman-vmx.c |   54 
> > > > > +++
> > > > >  1 files changed, 54 insertions(+), 0 deletions(-)
> > > > >
> > > > > diff --git a/pixman/pixman-vmx.c b/pixman/pixman-vmx.c
> > > > > index a9bd024..9e551b3 100644
> > > > > --- a/pixman/pixman-vmx.c
> > > > > +++ b/pixman/pixman-vmx.c
> > > > > @@ -2745,6 +2745,58 @@ vmx_composite_src_x888_ 
> > > > > (pixman_implementation_t *imp,
> > > > >  }
> > > > >
> > > > >  static void
> > > > > +vmx_composite_over_n_ (pixman_implementation_t *imp,
> > > > > +   pixman_composite_info_t *info)
> > > > > +{
> > > > > +PIXMAN_COMPOSITE_ARGS (info);
> > > > > +uint32_t *dst_line, *dst;
> > > > > +uint32_t src, ia;
> > > > > +int  i, w, dst_stride;
> > > > > +vector unsigned int vdst, vsrc, via;
> > > > > +
> > > > > +src = _pixman_image_get_solid (imp, src_image, 
> > > > > dest_image->bits.format);
> > > > > +
> > > > > +if (src == 0)
> > > > > +   return;
> > > > > +
> > > > > +PIXMAN_IMAGE_GET_LINE (
> > > > > +   dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 
> > > > > 1);
> > > > > +
> > > > > +vsrc = (vector unsigned int){src, src, src, src};
> > > > > +via = negate (splat_alpha (vsrc));
> > > > If we will use the over function (see my next comment), we need to
> > > > remove the negate() from the above statement, as it is done in the
> > > > over function.
> > > >
> > > > > +ia = ALPHA_8 (~src);
> > > > > +
> > > > > +while (height--)
> > > > > +{
> > > > > +   dst = dst_line;
> > > > > +   dst_line += dst_stride;
> > > > > +   w = width;
> > > > > +
> > > > > +   while (w && ((uintptr_t)dst & 15))
> > > > > +   {
> > > > > +   uint32_t d = *dst;
> > > > > +   UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src);
> > > > > +   *dst++ = d;
> > > > > +   w--;
> > > > > +   }
> > > > > +
> > > > > +   for (i = w / 4; i > 0; i--)
> > > > > +   {
> > > > > +   vdst = pix_multiply (load_128_aligned (dst), via);
> > > > > +   save_128_aligned (dst, pix_add (vsrc, vdst));
> > > >
> > > > Instead of the above two lines, I would simply use the over function
> > > > in vmx, which does exactly that. So:
> > > > vdst = over(vsrc, via, load_128_aligned(dst))
> > > > save_128_aligned (dst, vdst);
> > > >
> > > > I prefer this as it reuses an existing function which helps
> > > > maintainability, and using it has no impact on performance.
> > > >
> > > > > +   dst += 4;
> > > > > +   }
> > > > > +
> > > > > +   for (i = w % 4; --i >= 0;)
> > > > > +   {
> > > > > +   uint32_t d = dst[i];
> > > > > +   UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src);
> > > > > +   dst[i] = d;
> > > > > +   }
> > > > > +}
> > > > > +}
> > > > > +
> > > > > +static void
> > > > >  vmx_composite_over__ (pixman_implementation_t *imp,
> > > > > pixman_composite_info_t *info)
> > > > >  {
> > > > > @@ -3079,6 +3131,8 @@ FAST_NEAREST_MAINLOOP 
> > > > > (vmx___normal_OVER,
> > > > >
> > > > >  static const pixman_fast_path_t vmx_fast_paths[] =
> > > > >  {
> > > > > +PIXMAN_STD_FAST_PATH (OVER, solid,null, a8r8g8b8, 
> > > > > vmx_composite_over_n_),
> > > > > +PIXMAN_STD_FAST_PATH (OVER, solid,null, x8r8g8b8, 
> > > > > vmx_composite_over_n_),
> > > > >  PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, 
> > > > > vmx_composite_over__),
> > > > >  PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, 
> > > > > vmx_composite_over__),
> > > > >  PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, 
> > > > > vmx_composite_over__),
> > > > > --
> > > > > 1.7.8.6
> > > > >

> 
> I took your advice and run benchmarks with the *non* trimmed traces.
> It takes 144 minutes to run a benchmark on POWER8, 3.4GHz 8 Cores (a
> raw machine, not VM).
> 
> I compared with and without this patch and I got:
> 
> image  ocitysmap 659.69-> 611.71

Re: [Pixman] [PATCH] vmx: implement fast path vmx_composite_over_n_8888

2015-09-22 Thread Oded Gabbay
On Tue, Sep 22, 2015 at 3:59 PM, Pekka Paalanen  wrote:
> On Mon, 21 Sep 2015 14:22:53 +0300
> Oded Gabbay  wrote:
>
>> On Thu, Sep 10, 2015 at 7:16 PM, Siarhei Siamashka
>>  wrote:
>> >
>> > On Thu, 10 Sep 2015 12:27:18 +0300
>> > Oded Gabbay  wrote:
>> >
>> > > On Sat, Sep 5, 2015 at 10:03 PM, Oded Gabbay  
>> > > wrote:
>> > > >
>> > > > On Fri, Sep 4, 2015 at 3:39 PM, Siarhei Siamashka
>> > > >  wrote:
>> > > > > Running "lowlevel-blt-bench over_n_" on Playstation3 3.2GHz,
>> > > > > Gentoo ppc (32-bit userland) gave the following results:
>> > > > >
>> > > > > before:  over_n_ =  L1: 147.47  L2: 205.86  M:121.07
>> > > > > after:   over_n_ =  L1: 287.27  L2: 261.09  M:133.48
>> > > > >
>> > > > > Signed-off-by: Siarhei Siamashka 
>> > > > > ---
>> > > > >  pixman/pixman-vmx.c |   54 
>> > > > > +++
>> > > > >  1 files changed, 54 insertions(+), 0 deletions(-)
>> > > > >
>> > > > > diff --git a/pixman/pixman-vmx.c b/pixman/pixman-vmx.c
>> > > > > index a9bd024..9e551b3 100644
>> > > > > --- a/pixman/pixman-vmx.c
>> > > > > +++ b/pixman/pixman-vmx.c
>> > > > > @@ -2745,6 +2745,58 @@ vmx_composite_src_x888_ 
>> > > > > (pixman_implementation_t *imp,
>> > > > >  }
>> > > > >
>> > > > >  static void
>> > > > > +vmx_composite_over_n_ (pixman_implementation_t *imp,
>> > > > > +   pixman_composite_info_t *info)
>> > > > > +{
>> > > > > +PIXMAN_COMPOSITE_ARGS (info);
>> > > > > +uint32_t *dst_line, *dst;
>> > > > > +uint32_t src, ia;
>> > > > > +int  i, w, dst_stride;
>> > > > > +vector unsigned int vdst, vsrc, via;
>> > > > > +
>> > > > > +src = _pixman_image_get_solid (imp, src_image, 
>> > > > > dest_image->bits.format);
>> > > > > +
>> > > > > +if (src == 0)
>> > > > > +   return;
>> > > > > +
>> > > > > +PIXMAN_IMAGE_GET_LINE (
>> > > > > +   dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 
>> > > > > 1);
>> > > > > +
>> > > > > +vsrc = (vector unsigned int){src, src, src, src};
>> > > > > +via = negate (splat_alpha (vsrc));
>> > > > If we will use the over function (see my next comment), we need to
>> > > > remove the negate() from the above statement, as it is done in the
>> > > > over function.
>> > > >
>> > > > > +ia = ALPHA_8 (~src);
>> > > > > +
>> > > > > +while (height--)
>> > > > > +{
>> > > > > +   dst = dst_line;
>> > > > > +   dst_line += dst_stride;
>> > > > > +   w = width;
>> > > > > +
>> > > > > +   while (w && ((uintptr_t)dst & 15))
>> > > > > +   {
>> > > > > +   uint32_t d = *dst;
>> > > > > +   UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src);
>> > > > > +   *dst++ = d;
>> > > > > +   w--;
>> > > > > +   }
>> > > > > +
>> > > > > +   for (i = w / 4; i > 0; i--)
>> > > > > +   {
>> > > > > +   vdst = pix_multiply (load_128_aligned (dst), via);
>> > > > > +   save_128_aligned (dst, pix_add (vsrc, vdst));
>> > > >
>> > > > Instead of the above two lines, I would simply use the over function
>> > > > in vmx, which does exactly that. So:
>> > > > vdst = over(vsrc, via, load_128_aligned(dst))
>> > > > save_128_aligned (dst, vdst);
>> > > >
>> > > > I prefer this as it reuses an existing function which helps
>> > > > maintainability, and using it has no impact on performance.
>> > > >
>> > > > > +   dst += 4;
>> > > > > +   }
>> > > > > +
>> > > > > +   for (i = w % 4; --i >= 0;)
>> > > > > +   {
>> > > > > +   uint32_t d = dst[i];
>> > > > > +   UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src);
>> > > > > +   dst[i] = d;
>> > > > > +   }
>> > > > > +}
>> > > > > +}
>> > > > > +
>> > > > > +static void
>> > > > >  vmx_composite_over__ (pixman_implementation_t *imp,
>> > > > > pixman_composite_info_t *info)
>> > > > >  {
>> > > > > @@ -3079,6 +3131,8 @@ FAST_NEAREST_MAINLOOP 
>> > > > > (vmx___normal_OVER,
>> > > > >
>> > > > >  static const pixman_fast_path_t vmx_fast_paths[] =
>> > > > >  {
>> > > > > +PIXMAN_STD_FAST_PATH (OVER, solid,null, a8r8g8b8, 
>> > > > > vmx_composite_over_n_),
>> > > > > +PIXMAN_STD_FAST_PATH (OVER, solid,null, x8r8g8b8, 
>> > > > > vmx_composite_over_n_),
>> > > > >  PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, 
>> > > > > vmx_composite_over__),
>> > > > >  PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, 
>> > > > > vmx_composite_over__),
>> > > > >  PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, 
>> > > > > vmx_composite_over__),
>> > > > > --
>> > > > > 1.7.8.6
>> > > > >
>
>>
>> I took your advice and run benchmarks with the *non* trimmed traces.
>> It