Re: [Pixman] [PATCH] Fix arithmetic overflow in pointer arithmetic in ‘general_composite_rect’
On Mon, Sep 21, 2015 at 10:07 PM, Søren Sandmannwrote: > Sure. The extra width check can't harm. > Actually it can, because it implies that such values *can* arrive at this function, leading programmers to add tests to the calling functions, thus leading to a large number of tests for conditions that cannot happen. I would prefer to see an assert(width > 0) there. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH v2] pixman-general: Tighten up calculation of temporary buffer sizes
On Tue, 22 Sep 2015 12:43:25 +0100 Ben Avisonwrote: > Each of the aligns can only add a maximum of 15 bytes to the space > requirement. This permits some edge cases to use the stack buffer where > previously it would have deduced that a heap buffer was required. > --- > > This is an update of my previous patch (now posted over a year ago): > https://patchwork.freedesktop.org/patch/49898/ > > which conflicts with the patch just pushed: > http://patchwork.freedesktop.org/patch/60052/ > > Since this piece of code is fresh in people's minds, this might be a good > time to get this one pushed as well. > > Note that the scope of this change has been reduced: while the old code > added space to align the end address to a cacheline boundary (which as I > pointed out, was unnecessary), the new version works using buffer lengths > only. > > As a discussion point, wouldn't it be better for the ALIGN macro to > assume 32-byte cache lines? Both 16-byte and 32-byte cachelines are > currently in common use, but by aligning the buffers to 32-byte addresses > we would simultaneously achieve 16-byte alignment. > > pixman/pixman-general.c |4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c > index fa88463..6141cb0 100644 > --- a/pixman/pixman-general.c > +++ b/pixman/pixman-general.c > @@ -158,9 +158,9 @@ general_composite_rect (pixman_implementation_t *imp, > if (width <= 0 || _pixman_multiply_overflows_int (width, Bpp * 3)) > return; > > -if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 32 * 3) > +if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 15 * 3) > { > - scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 32 * 3); > + scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 15 * 3); > > if (!scanline_buffer) > return; Reviewed-by: Pekka Paalanen I think Ben's explanation as seen in https://patchwork.freedesktop.org/patch/49898/ covers all Søren's concerns (it quotes everything Søren said about the patch), and I see no reason to reject this. I'll push this on Friday if no-one objects. Thanks, pq pgpL74xwNlDmG.pgp Description: OpenPGP digital signature ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] pixman-general: Fix stack related pointer arithmetic overflow
On Tue, Sep 22, 2015 at 4:25 AM, Siarhei Siamashkawrote: > As https://bugs.freedesktop.org/show_bug.cgi?id=92027#c6 explains, > the stack is allocated at the very top of the process address space > in some configurations (32-bit x86 systems with ASLR disabled). > And the careless computations done with the 'dest_buffer' pointer > may overflow, failing the buffer upper limit check. > > The problem can be reproduced using the 'stress-test' program, > which segfaults when executed via setarch: > > export CFLAGS="-O2 -m32" && ./autogen.sh > ./configure --disable-libpng --disable-gtk && make > setarch i686 -R test/stress-test > > This patch introduces the required corrections. The extra check > for negative 'width' may be redundant (the invalid 'width' value > is not supposed to reach here), but it's better to play safe > when dealing with the buffers allocated on stack. > > Reported-by: Ludovic Courtès > Signed-off-by: Siarhei Siamashka > --- > pixman/pixman-general.c | 16 +++- > 1 file changed, 7 insertions(+), 9 deletions(-) > > diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c > index 7cdea29..fa88463 100644 > --- a/pixman/pixman-general.c > +++ b/pixman/pixman-general.c > @@ -155,23 +155,21 @@ general_composite_rect (pixman_implementation_t *imp, > #define ALIGN(addr)\ > ((uint8_t *)uintptr_t)(addr)) + 15) & (~15))) > > -src_buffer = ALIGN (scanline_buffer); > -mask_buffer = ALIGN (src_buffer + width * Bpp); > -dest_buffer = ALIGN (mask_buffer + width * Bpp); > +if (width <= 0 || _pixman_multiply_overflows_int (width, Bpp * 3)) > + return; > > -if (ALIGN (dest_buffer + width * Bpp) > > - scanline_buffer + sizeof (stack_scanline_buffer)) > +if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 32 * 3) > { > scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 32 * 3); > > if (!scanline_buffer) > return; > - > - src_buffer = ALIGN (scanline_buffer); > - mask_buffer = ALIGN (src_buffer + width * Bpp); > - dest_buffer = ALIGN (mask_buffer + width * Bpp); > } > > +src_buffer = ALIGN (scanline_buffer); > +mask_buffer = ALIGN (src_buffer + width * Bpp); > +dest_buffer = ALIGN (mask_buffer + width * Bpp); > + > if (width_flag == ITER_WIDE) > { > /* To make sure there aren't any NANs in the buffers */ > -- > 2.4.6 > I tested the patch and it fixed the crash on my computer. I pushed it to master: 4297e90..8b49d4b master -> master and to 0.32 branch: d6a4a56..204fcd2 0.32 -> 0.32 Oded ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH v2] pixman-general: Tighten up calculation of temporary buffer sizes
Each of the aligns can only add a maximum of 15 bytes to the space requirement. This permits some edge cases to use the stack buffer where previously it would have deduced that a heap buffer was required. --- This is an update of my previous patch (now posted over a year ago): https://patchwork.freedesktop.org/patch/49898/ which conflicts with the patch just pushed: http://patchwork.freedesktop.org/patch/60052/ Since this piece of code is fresh in people's minds, this might be a good time to get this one pushed as well. Note that the scope of this change has been reduced: while the old code added space to align the end address to a cacheline boundary (which as I pointed out, was unnecessary), the new version works using buffer lengths only. As a discussion point, wouldn't it be better for the ALIGN macro to assume 32-byte cache lines? Both 16-byte and 32-byte cachelines are currently in common use, but by aligning the buffers to 32-byte addresses we would simultaneously achieve 16-byte alignment. pixman/pixman-general.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c index fa88463..6141cb0 100644 --- a/pixman/pixman-general.c +++ b/pixman/pixman-general.c @@ -158,9 +158,9 @@ general_composite_rect (pixman_implementation_t *imp, if (width <= 0 || _pixman_multiply_overflows_int (width, Bpp * 3)) return; -if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 32 * 3) +if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 15 * 3) { - scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 32 * 3); + scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 15 * 3); if (!scanline_buffer) return; -- 1.7.5.4 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [ANNOUNCE] pixman release 0.32.8 now available
A new pixman release 0.32.8 is now available. This is a stable release that contains an important bug fix (buffer overflow), which can affect 32-bit systems. I recommend to all the distributions that still use the 0.32.x version to upgrade to this release as soon as possible. For those distributions that already moved to the new 0.33.2 version, I will release in the coming weeks a new version (0.33.4), which will include this bug fix, among other things. Thanks, Oded tar.gz: http://cairographics.org/releases/pixman-0.32.8.tar.gz http://xorg.freedesktop.org/archive/individual/lib/pixman-0.32.8.tar.gz tar.bz2: http://xorg.freedesktop.org/archive/individual/lib/pixman-0.32.8.tar.bz2 Hashes: MD5: e588622e375e7d578e9df320f1ace2de pixman-0.32.8.tar.gz MD5: 18d6b62abdb7bc0f8e6b0ddf48986b2c pixman-0.32.8.tar.bz2 SHA1: c1119bbdb587c56009b653e6f81c083f98a20135 pixman-0.32.8.tar.gz SHA1: 5c57045622265b877c9bf02d531973eadf942140 pixman-0.32.8.tar.bz2 GPG signature: http://cairographics.org/releases/pixman-0.32.8.tar.gz.sha1.asc (signed by Oded Gabbay) Git: git://git.freedesktop.org/git/pixman tag: pixman-0.32.8 Log: Oded Gabbay (1): Pre-release version bump to 0.32.8 Siarhei Siamashka (1): pixman-general: Fix stack related pointer arithmetic overflow Søren Sandmann Pedersen (1): Post-release version bump to 0.32.7 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] pixman-general: Fix stack related pointer arithmetic overflow
Oded Gabbayskribis: > I tested the patch and it fixed the crash on my computer. > > I pushed it to master: > 4297e90..8b49d4b master -> master > > and to 0.32 branch: > d6a4a56..204fcd2 0.32 -> 0.32 Great, thanks to all of you. Ludo’. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 1/2] Remove the 8e extra safety margin in COVER_CLIP analysis
On Mon, 21 Sep 2015 06:32:48 +0100, Siarhei Siamashkawrote: Since we are trying to justify the 8e extra safety margin removal in the context of this patch, this is what I wanted to see explained in the commit message. But maybe I'm just bad at math and it was perfectly obvious to everyone else without any special proof. I think it's just that if you've come on board since the old rounding code was removed (as I have) it's hard to see why this would ever have been a problem. If you express your P vector as a linear combination of vectors | frac(x_dst) | | int(x_dst) | | frac(y_dst) | + | int(y_dst) | | 0x1 | | 0 | then it's clear that the first component is an invariant (0x8000, 0x8000, 0x1) irrespective of whether you reach P in a stepwise manner or not, and that the other one uses only integers. Any integer multiplied by a fixed-point number is a fixed-point number (of the same number of fractional digits) without any rounding errors, so in the absence of any intermediate rounding steps, the rounding error of the expression as a whole is the same as the rounding error of the first component, and that hasn't changed. Proof: All implementations must give the same numerical results as bits_image_fetch_pixel_nearest() / bits_image_fetch_pixel_bilinear(). The former does int x0 = pixman_fixed_to_int (x - pixman_fixed_e); which maps directly to the new test for the nearest flag, when you consider that x0 must fall in the interval [0,width). The latter does x1 = x - pixman_fixed_1 / 2; x1 = pixman_fixed_to_int (x1); x2 = x1 + 1; but then x2 is brought back in range by the repeat() function, so it can't stray beyond the end of the source line. The wording does not look very good here. It seems to imply that the repeat() function has some special use in the latter (BILINEAR) case. But the repeat handling is exactly the same for NEAREST and BILINEAR. Also for NONE repeat and fetching pixels from the outside of the source image bounds, we are not bringing the coordinate back into range but interpreting the pixel value as zero. I can't follow your argument there - I don't think I implied that repeat() acted differently for the bilinear case? On NONE repeat, yes I neglected that detail, but I was generalising. How about: The latter does x1 = x - pixman_fixed_1 / 2; x1 = pixman_fixed_to_int (x1); x2 = x1 + 1; but any values of x2 that correspond to a pixel offset beyond the end of the source line are never used to dereference the pixel array. In the case of NONE repeat, a pixel value of zero is substituted, and otherwise the action of the repeat() function, when applied to x2, is to select a different pixel offset which *does* lie within the source line (the exact choice depends upon the repeat type selected). Ben ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] vmx: implement fast path vmx_composite_over_n_8888
On Mon, 21 Sep 2015 14:22:53 +0300 Oded Gabbaywrote: > On Thu, Sep 10, 2015 at 7:16 PM, Siarhei Siamashka > wrote: > > > > On Thu, 10 Sep 2015 12:27:18 +0300 > > Oded Gabbay wrote: > > > > > On Sat, Sep 5, 2015 at 10:03 PM, Oded Gabbay > > > wrote: > > > > > > > > On Fri, Sep 4, 2015 at 3:39 PM, Siarhei Siamashka > > > > wrote: > > > > > Running "lowlevel-blt-bench over_n_" on Playstation3 3.2GHz, > > > > > Gentoo ppc (32-bit userland) gave the following results: > > > > > > > > > > before: over_n_ = L1: 147.47 L2: 205.86 M:121.07 > > > > > after: over_n_ = L1: 287.27 L2: 261.09 M:133.48 > > > > > > > > > > Signed-off-by: Siarhei Siamashka > > > > > --- > > > > > pixman/pixman-vmx.c | 54 > > > > > +++ > > > > > 1 files changed, 54 insertions(+), 0 deletions(-) > > > > > > > > > > diff --git a/pixman/pixman-vmx.c b/pixman/pixman-vmx.c > > > > > index a9bd024..9e551b3 100644 > > > > > --- a/pixman/pixman-vmx.c > > > > > +++ b/pixman/pixman-vmx.c > > > > > @@ -2745,6 +2745,58 @@ vmx_composite_src_x888_ > > > > > (pixman_implementation_t *imp, > > > > > } > > > > > > > > > > static void > > > > > +vmx_composite_over_n_ (pixman_implementation_t *imp, > > > > > + pixman_composite_info_t *info) > > > > > +{ > > > > > +PIXMAN_COMPOSITE_ARGS (info); > > > > > +uint32_t *dst_line, *dst; > > > > > +uint32_t src, ia; > > > > > +int i, w, dst_stride; > > > > > +vector unsigned int vdst, vsrc, via; > > > > > + > > > > > +src = _pixman_image_get_solid (imp, src_image, > > > > > dest_image->bits.format); > > > > > + > > > > > +if (src == 0) > > > > > + return; > > > > > + > > > > > +PIXMAN_IMAGE_GET_LINE ( > > > > > + dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, > > > > > 1); > > > > > + > > > > > +vsrc = (vector unsigned int){src, src, src, src}; > > > > > +via = negate (splat_alpha (vsrc)); > > > > If we will use the over function (see my next comment), we need to > > > > remove the negate() from the above statement, as it is done in the > > > > over function. > > > > > > > > > +ia = ALPHA_8 (~src); > > > > > + > > > > > +while (height--) > > > > > +{ > > > > > + dst = dst_line; > > > > > + dst_line += dst_stride; > > > > > + w = width; > > > > > + > > > > > + while (w && ((uintptr_t)dst & 15)) > > > > > + { > > > > > + uint32_t d = *dst; > > > > > + UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src); > > > > > + *dst++ = d; > > > > > + w--; > > > > > + } > > > > > + > > > > > + for (i = w / 4; i > 0; i--) > > > > > + { > > > > > + vdst = pix_multiply (load_128_aligned (dst), via); > > > > > + save_128_aligned (dst, pix_add (vsrc, vdst)); > > > > > > > > Instead of the above two lines, I would simply use the over function > > > > in vmx, which does exactly that. So: > > > > vdst = over(vsrc, via, load_128_aligned(dst)) > > > > save_128_aligned (dst, vdst); > > > > > > > > I prefer this as it reuses an existing function which helps > > > > maintainability, and using it has no impact on performance. > > > > > > > > > + dst += 4; > > > > > + } > > > > > + > > > > > + for (i = w % 4; --i >= 0;) > > > > > + { > > > > > + uint32_t d = dst[i]; > > > > > + UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src); > > > > > + dst[i] = d; > > > > > + } > > > > > +} > > > > > +} > > > > > + > > > > > +static void > > > > > vmx_composite_over__ (pixman_implementation_t *imp, > > > > > pixman_composite_info_t *info) > > > > > { > > > > > @@ -3079,6 +3131,8 @@ FAST_NEAREST_MAINLOOP > > > > > (vmx___normal_OVER, > > > > > > > > > > static const pixman_fast_path_t vmx_fast_paths[] = > > > > > { > > > > > +PIXMAN_STD_FAST_PATH (OVER, solid,null, a8r8g8b8, > > > > > vmx_composite_over_n_), > > > > > +PIXMAN_STD_FAST_PATH (OVER, solid,null, x8r8g8b8, > > > > > vmx_composite_over_n_), > > > > > PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, > > > > > vmx_composite_over__), > > > > > PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, > > > > > vmx_composite_over__), > > > > > PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, > > > > > vmx_composite_over__), > > > > > -- > > > > > 1.7.8.6 > > > > > > > I took your advice and run benchmarks with the *non* trimmed traces. > It takes 144 minutes to run a benchmark on POWER8, 3.4GHz 8 Cores (a > raw machine, not VM). > > I compared with and without this patch and I got: > > image ocitysmap 659.69-> 611.71
Re: [Pixman] [PATCH] vmx: implement fast path vmx_composite_over_n_8888
On Tue, Sep 22, 2015 at 3:59 PM, Pekka Paalanenwrote: > On Mon, 21 Sep 2015 14:22:53 +0300 > Oded Gabbay wrote: > >> On Thu, Sep 10, 2015 at 7:16 PM, Siarhei Siamashka >> wrote: >> > >> > On Thu, 10 Sep 2015 12:27:18 +0300 >> > Oded Gabbay wrote: >> > >> > > On Sat, Sep 5, 2015 at 10:03 PM, Oded Gabbay >> > > wrote: >> > > > >> > > > On Fri, Sep 4, 2015 at 3:39 PM, Siarhei Siamashka >> > > > wrote: >> > > > > Running "lowlevel-blt-bench over_n_" on Playstation3 3.2GHz, >> > > > > Gentoo ppc (32-bit userland) gave the following results: >> > > > > >> > > > > before: over_n_ = L1: 147.47 L2: 205.86 M:121.07 >> > > > > after: over_n_ = L1: 287.27 L2: 261.09 M:133.48 >> > > > > >> > > > > Signed-off-by: Siarhei Siamashka >> > > > > --- >> > > > > pixman/pixman-vmx.c | 54 >> > > > > +++ >> > > > > 1 files changed, 54 insertions(+), 0 deletions(-) >> > > > > >> > > > > diff --git a/pixman/pixman-vmx.c b/pixman/pixman-vmx.c >> > > > > index a9bd024..9e551b3 100644 >> > > > > --- a/pixman/pixman-vmx.c >> > > > > +++ b/pixman/pixman-vmx.c >> > > > > @@ -2745,6 +2745,58 @@ vmx_composite_src_x888_ >> > > > > (pixman_implementation_t *imp, >> > > > > } >> > > > > >> > > > > static void >> > > > > +vmx_composite_over_n_ (pixman_implementation_t *imp, >> > > > > + pixman_composite_info_t *info) >> > > > > +{ >> > > > > +PIXMAN_COMPOSITE_ARGS (info); >> > > > > +uint32_t *dst_line, *dst; >> > > > > +uint32_t src, ia; >> > > > > +int i, w, dst_stride; >> > > > > +vector unsigned int vdst, vsrc, via; >> > > > > + >> > > > > +src = _pixman_image_get_solid (imp, src_image, >> > > > > dest_image->bits.format); >> > > > > + >> > > > > +if (src == 0) >> > > > > + return; >> > > > > + >> > > > > +PIXMAN_IMAGE_GET_LINE ( >> > > > > + dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, >> > > > > 1); >> > > > > + >> > > > > +vsrc = (vector unsigned int){src, src, src, src}; >> > > > > +via = negate (splat_alpha (vsrc)); >> > > > If we will use the over function (see my next comment), we need to >> > > > remove the negate() from the above statement, as it is done in the >> > > > over function. >> > > > >> > > > > +ia = ALPHA_8 (~src); >> > > > > + >> > > > > +while (height--) >> > > > > +{ >> > > > > + dst = dst_line; >> > > > > + dst_line += dst_stride; >> > > > > + w = width; >> > > > > + >> > > > > + while (w && ((uintptr_t)dst & 15)) >> > > > > + { >> > > > > + uint32_t d = *dst; >> > > > > + UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src); >> > > > > + *dst++ = d; >> > > > > + w--; >> > > > > + } >> > > > > + >> > > > > + for (i = w / 4; i > 0; i--) >> > > > > + { >> > > > > + vdst = pix_multiply (load_128_aligned (dst), via); >> > > > > + save_128_aligned (dst, pix_add (vsrc, vdst)); >> > > > >> > > > Instead of the above two lines, I would simply use the over function >> > > > in vmx, which does exactly that. So: >> > > > vdst = over(vsrc, via, load_128_aligned(dst)) >> > > > save_128_aligned (dst, vdst); >> > > > >> > > > I prefer this as it reuses an existing function which helps >> > > > maintainability, and using it has no impact on performance. >> > > > >> > > > > + dst += 4; >> > > > > + } >> > > > > + >> > > > > + for (i = w % 4; --i >= 0;) >> > > > > + { >> > > > > + uint32_t d = dst[i]; >> > > > > + UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src); >> > > > > + dst[i] = d; >> > > > > + } >> > > > > +} >> > > > > +} >> > > > > + >> > > > > +static void >> > > > > vmx_composite_over__ (pixman_implementation_t *imp, >> > > > > pixman_composite_info_t *info) >> > > > > { >> > > > > @@ -3079,6 +3131,8 @@ FAST_NEAREST_MAINLOOP >> > > > > (vmx___normal_OVER, >> > > > > >> > > > > static const pixman_fast_path_t vmx_fast_paths[] = >> > > > > { >> > > > > +PIXMAN_STD_FAST_PATH (OVER, solid,null, a8r8g8b8, >> > > > > vmx_composite_over_n_), >> > > > > +PIXMAN_STD_FAST_PATH (OVER, solid,null, x8r8g8b8, >> > > > > vmx_composite_over_n_), >> > > > > PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, >> > > > > vmx_composite_over__), >> > > > > PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, >> > > > > vmx_composite_over__), >> > > > > PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, >> > > > > vmx_composite_over__), >> > > > > -- >> > > > > 1.7.8.6 >> > > > > > >> >> I took your advice and run benchmarks with the *non* trimmed traces. >> It