Re: [Pixman] [PATCH v2] scaling-test: list more details when verbose

2015-08-27 Thread Ben Avison

On Thu, 27 Aug 2015 12:16:49 +0100, Pekka Paalanen  wrote:


[Pekka: redo whitespace and print src,dst,mask x and y.]
Signed-off-by: Pekka Paalanen 


Reviewed-by: Ben Avison 


Re: [Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)

2015-08-27 Thread Ben Avison

On Thu, 27 Aug 2015 10:59:52 +0100, Pekka Paalanen  wrote:

It would be *really* nice if we could somehow use a benchmark mode
where we could run an operation with every possible implementation and
compare them. I wonder, can we already do that with PIXMAN_DISABLE?


That would certainly help detect some issues, where we get worse
performance at supposedly more advanced implementation levels. Obviously
it wouldn't make any sense to bother with certain combinations, such as
PIXMAN_DISABLE=arm-neon when testing on an x86.

One thing it wouldn't be able to detect, though, would be where the fetch/
combine/writeback iterators are faster than fast paths for the *same*
implementation level - such as with the ARMv6 nearest-scaled patches I
was revisiting recently. In that specific case, my original solution of
bespoke C wrappers for the fetchers turned out to be even faster - but we
don't have any way at present of detecting whether there
are other cases where we would be better off deleting the fast paths and
letting the iterators do the work instead.

Ben


[Pixman] [PATCH] test: Add cover-test

2015-08-27 Thread Ben Avison
This test aims to verify both numerical correctness and the honouring of
array bounds for scaled plots (both nearest-neighbour and bilinear) at or
close to the boundary conditions for applicability of "cover" type fast paths
and iter fetch routines.

It has a secondary purpose: by setting the env var EXACT (to any value) it
will only test plots that are exactly on the boundary condition. This makes
it possible to ensure that "cover" routines are being used to the maximum,
although this requires the use of a debugger or code instrumentation to
verify.
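
For illustration, the test would then be run as "EXACT=1 ./cover-test". A
minimal sketch of how such a switch might be read inside the test (the real
cover-test.c in the patch below may do it differently):

    #include <stdlib.h>

    /* Hypothetical helper: non-zero when only exactly-on-boundary plots
     * should be generated; any value of the EXACT environment variable
     * turns it on. */
    static int exact_only;

    static void
    read_exact_switch (void)
    {
        exact_only = getenv ("EXACT") != NULL;
    }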
---
 test/Makefile.sources |1 +
 test/cover-test.c |  424 +
 2 files changed, 425 insertions(+), 0 deletions(-)
 create mode 100644 test/cover-test.c

diff --git a/test/Makefile.sources b/test/Makefile.sources
index 5b9ae84..aece143 100644
--- a/test/Makefile.sources
+++ b/test/Makefile.sources
@@ -27,6 +27,7 @@ TESTPROGRAMS =  \
glyph-test\
solid-test\
stress-test   \
+   cover-test\
blitters-test \
affine-test   \
scaling-test  \
diff --git a/test/cover-test.c b/test/cover-test.c
new file mode 100644
index 000..dbdf0f8
--- /dev/null
+++ b/test/cover-test.c
@@ -0,0 +1,424 @@
+/*
+ * Copyright © 2015 RISC OS Open Ltd
+ *
+ * Permission to use, copy, modify, distribute, and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the name of the copyright holders not be used in
+ * advertising or publicity pertaining to distribution of the software without
+ * specific, written prior permission.  The copyright holders make no
+ * representations about the suitability of this software for any purpose.  It
+ * is provided "as is" without express or implied warranty.
+ *
+ * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
+ * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
+ * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
+ * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
+ * SOFTWARE.
+ *
+ * Author:  Ben Avison (bavi...@riscosopen.org)
+ *
+ */
+
+/*
+ * This test aims to verify both numerical correctness and the honouring of
+ * array bounds for scaled plots (both nearest-neighbour and bilinear) at or
+ * close to the boundary conditions for applicability of "cover" type fast paths
+ * and iter fetch routines.
+ *
+ * It has a secondary purpose: by setting the env var EXACT (to any value) it
+ * will only test plots that are exactly on the boundary condition. This makes
+ * it possible to ensure that "cover" routines are being used to the maximum,
+ * although this requires the use of a debugger or code instrumentation to
+ * verify.
+ */
+
+#include "utils.h"
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Approximate limits for random scale factor generation - these ensure we can
+ * get at least 8x reduction and 8x enlargement.
+ */
+#define LOG2_MAX_FACTOR (3)
+
+/* 1/sqrt(2) (or sqrt(0.5), or 2^-0.5) as a 0.32 fixed-point number */
+#define INV_SQRT_2_0POINT32_FIXED (0xB504F334u)
+
+/* The largest increment that can be generated by random_scale_factor().
+ * This occurs when the "mantissa" part is 0x and the "exponent"
+ * part is -LOG2_MAX_FACTOR.
+ */
+#define MAX_INC ((pixman_fixed_t)(INV_SQRT_2_0POINT32_FIXED >> (31 - 16 - LOG2_MAX_FACTOR)))
+
+/* Minimum source width (in pixels) based on a typical page size of 4K and
+ * maximum colour depth of 32bpp.
+ */
+#define MIN_SRC_WIDTH (4096 / 4)
+
+/* Derive the destination width so that at max increment we fit within source */
+#define DST_WIDTH (MIN_SRC_WIDTH * pixman_fixed_1 / MAX_INC)
+
+/* Calculate heights the other way round - no limits due to page alignment here */
+#define DST_HEIGHT 3
+#define SRC_HEIGHT ((DST_HEIGHT * MAX_INC + pixman_fixed_1 - 1) / pixman_fixed_1)
+
+/* At the time of writing, all the scaled fast paths use SRC, OVER or ADD
+ * Porter-Duff operators. XOR is included in the list to ensure good
+ * representation of iter scanline fetch routines.
+ */
+static const pixman_op_t op_list[] = {
+PIXMAN_OP_SRC,
+PIXMAN_OP_OVER,
+PIXMAN_OP_ADD,
+PIXMAN_OP_XOR,
+};
+
+/* At the time of writing, all the scaled fast paths use a8r8g8b8, x8r8g8b8
+ * or r5g6b5, or red-blue swapped versions of the same. When a mask channel is
+ * used, it is always a8 (and so implicitly not component alpha). a1r5g5b5 is
+ * included because it is the only other

Re: [Pixman] [PATCH 5/7] armv7/mips/sse2: Fix bounds violations in bilinear cover scaled fast paths

2015-08-27 Thread Ben Avison

On Thu, 27 Aug 2015 04:44:58 +0100, Siarhei Siamashka wrote:

On Mon, 24 Aug 2015 21:42:04 +0100
Ben Avison  wrote:

The current COVER flag assumes that every pixel is fetched from the
source image as a 2x2 block (regardless of whether the fractional part
of the offset is zero or not) and this is explicitly expected to be
safe.


The situation isn't quite that simple though. With the bilinear fast
paths defined using FAST_BILINEAR_MAINLOOP_INT(), the height of the block
is already set to 1 when the fractional part is 0 - albeit as a side
effect of not being able to express a fixed-point weight of 1.0 in a
16-bit value:

http://cgit.freedesktop.org/pixman/tree/pixman/pixman-inlines.h?id=pixman-0.33.2#n947

My ARMv6 bilinear fetchers (the ones still waiting to be committed)
already contained an optimisation to avoid processing a scanline whose
vertical weight was 0. Patch 6 of this series is a tweak to
fast_fetch_bilinear_cover() so it skips such scanlines, and
ssse3_fetch_bilinear_cover() would benefit from the same optimisation.

In the horizontal direction, it was actually Søren's post (accidentally
or otherwise) which first prompted me to see there's an issue: if you're
going to say that you always fetch two source pixels even when the
coordinate is exactly aligned to a single source pixel, then who's to say
whether the zero-weighted pixel is the one to the right of the true pixel
rather than the one to its left? In practice, FAST_BILINEAR_MAINLOOP_INT
chooses the one on the right, and this happens to match your suggested
change to the calculation of the FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR flag,
where we simply remove the 8 * pixman_fixed_e border. As a reminder of
what I wrote earlier, this test corresponds to:

transformed.x1 >= pixman_fixed_1 / 2
transformed.y1 >= pixman_fixed_1 / 2
transformed.x2 < image->bits.width * pixman_fixed_1 - pixman_fixed_1 / 2
transformed.y2 < image->bits.height * pixman_fixed_1 - pixman_fixed_1 / 2

If we assume that zero-weighted pixels to the left are always loaded,
these would need to be the tests instead:

transformed.x1 > pixman_fixed_1 / 2
transformed.y1 > pixman_fixed_1 / 2
transformed.x2 <= image->bits.width * pixman_fixed_1 - pixman_fixed_1 / 2
transformed.y2 <= image->bits.height * pixman_fixed_1 - pixman_fixed_1 / 2

Søren talked about samples with index -1 being fetched - in other words,
the zero-weighted pixel being the one to the left of the true one. This
actually turns out to be more efficient, because to avoid out-of-bounds
accesses when transformed.x1 == pixman_fixed_1 / 2, you only need to test
the first pixel on each row, and that test can be moved outside the loop.
For any later pixel on the same row that fetches a zero-weighted pixel to
its left, you at least know that the zero-weighted pixel lies within the
pixel row. If the zero-weighted pixel were instead on the right, you'd
need to test every single pixel, which would normally be a significant
speed hit.
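
To illustrate the shape of that argument, here is a rough sketch (made-up
identifiers, not pixman's actual fetch code, and assuming the public
fixed-point helpers from pixman.h):

    #include <stdint.h>
    #include <pixman.h>

    /* With the zero-weighted pixel on the LEFT of the true pixel, only the
     * first sample of a row can index pixel -1 (when transformed.x1 is
     * exactly pixman_fixed_1 / 2), because x only increases along the row.
     * A real implementation would hoist the clamp out of the loop; it is
     * kept inside here purely to show where it logically applies. */
    static void
    fetch_pixel_pairs (const uint32_t *src,
                       pixman_fixed_t  x,
                       pixman_fixed_t  ux,
                       uint32_t       *left_out,
                       uint32_t       *right_out,
                       int             n)
    {
        int i;

        for (i = 0; i < n; i++)
        {
            int right = pixman_fixed_to_int (x); /* the "true" pixel       */
            int left  = right - 1;               /* possibly zero-weighted */

            if (left < 0)                        /* only possible at i == 0 */
                left = 0;

            left_out[i]  = src[left];
            right_out[i] = src[right];
            x += ux;
        }
    }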

It turns out to be surprisingly simple to cause the zero-weighted pixel
to be on the left - just negate the X increments, as demonstrated by
patch 6.

Things work even more smoothly with my ARMv6 bilinear fetchers (though I
haven't reposted them yet). I was able to take advantage of ARM
conditional execution to avoid reading zero-weighted pixels with zero
cycles overhead (and in fact some speed gain due to fewer cachelines
being fetched from RAM). The only necessary change to the core code:

    adds   accumulator, accumulator, increment_fractional
    ldrcs  in0, [source_ptr, #increment_integer + 1]!
    ldrcc  in0, [source_ptr, #increment_integer]!
    ldr    in1, [source_ptr, #1]

is to change the last ldr to an ldrne.

This is actually an exception to what I wrote above "If the zero-weighted
pixel were to be to the right, you'd need to test every single pixel,
which would normally be a significant speed hit." But since conditional
execution of arbitrary instructions is pretty much unique to ARM, it makes
sense to use the negated-increment method for the generic C
implementation.


However with your new proposed definition of the COVER_CLIP_BILINEAR
flag, the area is shrinked by "pixman_fixed_e" on the right side in


(enlarged by pixman_fixed_e, but I think that's what you meant)


order to allow a special corner case to pass through. And this is where
the bounds violations are likely coming from.

[...]

Is there actually a real bug in the
current pixman code? Because the commit summary looks kinda scary and
may be misinterpreted by the general public.


Don't worry, the existing Pixman code doesn't have any bounds violations.
In fact, there aren't any bounds violations at all, despite what the
commit summaries say.

To explain this: quite late on in development, I added the
FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR_APPROX flag, as an insurance policy
against me not finding a volunteer to adapt the SSSE3 bilinear fetcher.
But then I realised that if I added the flag early on, I could move
individual fast paths from

[Pixman] [PATCH v2] scaling-test: list more details when verbose

2015-08-27 Thread Pekka Paalanen
From: Ben Avison 

Add mask details to the output.

[Pekka: redo whitespace and print src,dst,mask x and y.]
Signed-off-by: Pekka Paalanen 
---
 test/scaling-test.c | 66 +++--
 1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/test/scaling-test.c b/test/scaling-test.c
index e2f7fa9..0ece611 100644
--- a/test/scaling-test.c
+++ b/test/scaling-test.c
@@ -73,7 +73,7 @@ test_composite (int  testnum,
pixman_op_t op;
pixman_repeat_t repeat = PIXMAN_REPEAT_NONE;
pixman_repeat_t mask_repeat = PIXMAN_REPEAT_NONE;
-pixman_format_code_t src_fmt, dst_fmt;
+pixman_format_code_t src_fmt, mask_fmt, dst_fmt;
 uint32_t * srcbuf;
 uint32_t * dstbuf;
 uint32_t * maskbuf;
@@ -145,6 +145,7 @@ test_composite (int  testnum,
 prng_randmemset (dstbuf, dst_stride * dst_height, 0);
 
 src_fmt = get_format (src_bpp);
+mask_fmt = PIXMAN_a8;
 dst_fmt = get_format (dst_bpp);
 
 if (prng_rand_n (2))
@@ -169,7 +170,7 @@ test_composite (int  testnum,
 src_fmt, src_width, src_height, srcbuf, src_stride);
 
 mask_img = pixman_image_create_bits (
-PIXMAN_a8, mask_width, mask_height, maskbuf, mask_stride);
+mask_fmt, mask_width, mask_height, maskbuf, mask_stride);
 
 dst_img = pixman_image_create_bits (
 dst_fmt, dst_width, dst_height, dstbuf, dst_stride);
@@ -255,21 +256,6 @@ test_composite (int  testnum,
 else
pixman_image_set_filter (mask_img, PIXMAN_FILTER_BILINEAR, NULL, 0);
 
-if (verbose)
-{
-   printf ("src_fmt=%s, dst_fmt=%s\n", 
-   format_name (src_fmt), format_name (dst_fmt));
-   printf ("op=%s, scale_x=%d, scale_y=%d, repeat=%d\n",
-   operator_name (op), scale_x, scale_y, repeat);
-   printf ("translate_x=%d, translate_y=%d\n",
-   translate_x, translate_y);
-   printf ("src_width=%d, src_height=%d, dst_width=%d, dst_height=%d\n",
-   src_width, src_height, dst_width, dst_height);
-   printf ("src_x=%d, src_y=%d, dst_x=%d, dst_y=%d\n",
-   src_x, src_y, dst_x, dst_y);
-   printf ("w=%d, h=%d\n", w, h);
-}
-
 if (prng_rand_n (8) == 0)
 {
pixman_box16_t clip_boxes[2];
@@ -352,10 +338,45 @@ test_composite (int  testnum,
 }
 
 if (prng_rand_n (2) == 0)
-   pixman_image_composite (op, src_img, NULL, dst_img,
-src_x, src_y, 0, 0, dst_x, dst_y, w, h);
-else
-   pixman_image_composite (op, src_img, mask_img, dst_img,
+{
+   mask_fmt = PIXMAN_null;
+   pixman_image_unref (mask_img);
+   mask_img = NULL;
+   mask_x = 0;
+   mask_y = 0;
+}
+
+if (verbose)
+{
+   printf ("op=%s, src_fmt=%s, mask_fmt=%s, dst_fmt=%s\n",
+   operator_name (op), format_name (src_fmt),
+   format_name (mask_fmt), format_name (dst_fmt));
+   printf ("scale_x=%d, scale_y=%d, repeat=%d, filter=%d\n",
+   scale_x, scale_y, repeat, src_img->common.filter);
+   printf ("translate_x=%d, translate_y=%d\n",
+   translate_x, translate_y);
+   if (mask_fmt != PIXMAN_null)
+   {
+   printf ("mask_scale_x=%d, mask_scale_y=%d, "
+   "mask_repeat=%d, mask_filter=%d\n",
+   mask_scale_x, mask_scale_y, mask_repeat,
+   mask_img->common.filter);
+   printf ("mask_translate_x=%d, mask_translate_y=%d\n",
+   mask_translate_x, mask_translate_y);
+   }
+   printf ("src_width=%d, src_height=%d, src_x=%d, src_y=%d\n",
+   src_width, src_height, src_x, src_y);
+   if (mask_fmt != PIXMAN_null)
+   {
+   printf ("mask_width=%d, mask_height=%d, mask_x=%d, mask_y=%d\n",
+   mask_width, mask_height, mask_x, mask_y);
+   }
+   printf ("dst_width=%d, dst_height=%d, dst_x=%d, dst_y=%d\n",
+   dst_width, dst_height, dst_x, dst_y);
+   printf ("w=%d, h=%d\n", w, h);
+}
+
+pixman_image_composite (op, src_img, mask_img, dst_img,
 src_x, src_y, mask_x, mask_y, dst_x, dst_y, w, h);
 
 crc32 = compute_crc32_for_image (0, dst_img);
@@ -364,7 +385,8 @@ test_composite (int  testnum,
print_image (dst_img);
 
 pixman_image_unref (src_img);
-pixman_image_unref (mask_img);
+if (mask_img != NULL)
+   pixman_image_unref (mask_img);
 pixman_image_unref (dst_img);
 
 if (src_stride < 0)
-- 
2.4.6



Re: [Pixman] [PATCH 1/7] Refactor calculation of cover flags

2015-08-27 Thread Ben Avison

On Thu, 27 Aug 2015 03:55:07 +0100, Siarhei Siamashka wrote:

-/* Expand the source area by a tiny bit so account of different rounding that
- * may happen during sampling. Note that (8 * pixman_fixed_e) is very far from
- * 0.5 so this won't cause the area computed to be overly pessimistic.
- */
-transformed.x1 -= 8 * pixman_fixed_e;
-transformed.y1 -= 8 * pixman_fixed_e;
-transformed.x2 += 8 * pixman_fixed_e;
-transformed.y2 += 8 * pixman_fixed_e;


Thanks for spotting this! I think that this code was used to compensate
matrix multiplication accuracy problems in older versions of pixman.
But we have already fixed these accuracy problems some time ago:

[...]

Now we can drop this "8 * pixman_fixed_e" adjustment because there
is no accuracy loss in the matrix multiplication for affine
transformations.


Ah, thank you! I couldn't work out what rounding it was that the comment
was referring to, and it seemed to have been quite deliberate. Søren
could only recall the related issue with bilinear scalers reading pixel
pairs where one source pixel might be read from outside the source array
if its weight was going to be 0.


And it is probably better to just do this simple fix
with a single patch. There is no need to spread it across multiple
patches.



Just as you do it in
http://lists.freedesktop.org/archives/pixman/2015-August/003877.html
this part becomes:

if (pixman_fixed_to_int (transformed.x1 - pixman_fixed_e) >= 0 &&
    pixman_fixed_to_int (transformed.y1 - pixman_fixed_e) >= 0 &&
    pixman_fixed_to_int (transformed.x2 - pixman_fixed_e) < image->bits.width &&
    pixman_fixed_to_int (transformed.y2 - pixman_fixed_e) < image->bits.height)


Glad you agree about the missing pixman_fixed_e offset, which was
disguised by the old 8 * pixman_fixed_e enlargement.

I originally wrote this stuff as a single patch, but I got the impression
that it was too much to digest in one go for most people, which is why I
split it up. Perhaps the compromise is to go for two patches, one to deal
with the 8 * pixman_fixed_e issue, and one to deal with bilinear scaling
with zero-weight pixels just beyond the edges of the image.

For those following the other thread where I'm describing the operation
of the cover-test program, you may want to note that if you multiply both
sides of the above inequalities by pixman_fixed_1, you get

transformed.x1 - pixman_fixed_e >= 0
transformed.y1 - pixman_fixed_e >= 0
transformed.x2 - pixman_fixed_e < image->bits.width * pixman_fixed_1
transformed.y2 - pixman_fixed_e < image->bits.height * pixman_fixed_1

which is equivalent to

transformed.x1 >= pixman_fixed_e
transformed.y1 >= pixman_fixed_e
transformed.x2 <= image->bits.width * pixman_fixed_1
transformed.y2 <= image->bits.height * pixman_fixed_1

which lines up nicely with my description of which coordinates correspond
to which source pixels.
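
(For anyone who wants to check that step mechanically, here is a tiny
standalone program - the usual 16.16 definitions are repeated locally so it
builds without the pixman headers, and it assumes an arithmetic right shift
for negative values, just as pixman's own pixman_fixed_to_int does:)

    #include <assert.h>
    #include <stdint.h>

    typedef int32_t pixman_fixed_t;
    #define pixman_fixed_e         ((pixman_fixed_t) 1)
    #define pixman_fixed_1         ((pixman_fixed_t) (1 << 16))
    #define pixman_fixed_to_int(f) ((int) ((f) >> 16))

    int
    main (void)
    {
        pixman_fixed_t x;

        /* Scan a small range around zero and confirm that
         * pixman_fixed_to_int (x - pixman_fixed_e) >= 0 is the same
         * condition as x >= pixman_fixed_e. */
        for (x = -3 * pixman_fixed_1; x <= 3 * pixman_fixed_1; x++)
            assert ((pixman_fixed_to_int (x - pixman_fixed_e) >= 0) ==
                    (x >= pixman_fixed_e));

        return 0;
    }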


And [the bilinear case] does not need any changes. At least as far as
dropping "8 * pixman_fixed_e" is concerned.


Reworking these in the same way, you get

transformed.x1 - pixman_fixed_1 / 2 >= 0
transformed.y1 - pixman_fixed_1 / 2 >= 0
transformed.x2 + pixman_fixed_1 / 2 < image->bits.width * pixman_fixed_1
transformed.y2 + pixman_fixed_1 / 2 < image->bits.height * pixman_fixed_1

which is equivalent to

transformed.x1 >= pixman_fixed_1 / 2
transformed.y1 >= pixman_fixed_1 / 2
transformed.x2 < image->bits.width * pixman_fixed_1 - pixman_fixed_1 / 2
transformed.y2 < image->bits.height * pixman_fixed_1 - pixman_fixed_1 / 2

and then my remaining changes basically boil down to arguing that these
last two rows should be <= not <. That could sensibly be a separate patch.
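
For what it's worth, one way to express that "<=" while keeping the
pixman_fixed_to_int style of the existing check would be to bias by
pixman_fixed_e - just a sketch, not necessarily what a patch would look like:

    if (pixman_fixed_to_int (transformed.x1 - pixman_fixed_1 / 2) >= 0 &&
        pixman_fixed_to_int (transformed.y1 - pixman_fixed_1 / 2) >= 0 &&
        pixman_fixed_to_int (transformed.x2 + pixman_fixed_1 / 2 - pixman_fixed_e) < image->bits.width &&
        pixman_fixed_to_int (transformed.y2 + pixman_fixed_1 / 2 - pixman_fixed_e) < image->bits.height)
    {
        /* set FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR here */
    }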


I believe that neither of FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and
FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR flags matters for projective
transformations, but this can be additionally checked just in case.


While I was trying to work out what the 8 * pixman_fixed_e rounding might
have been referring to, I surveyed all the existing fast paths and fetchers
and discovered that both cover flags are only used for affine transforms.

By check, do you mean we should ensure that the flags are not set for
projective transforms? If so, I see that compute_image_info() is called
before analyze_extent() so it could be achieved by adding an extra test
in analyze_extent() that FAST_PATH_AFFINE_TRANSFORM is set before setting
the cover flags.
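
In code terms, something along these lines (illustrative only - the real
analyze_extent() may well structure it differently):

    if (image->common.flags & FAST_PATH_AFFINE_TRANSFORM)
    {
        /* ... the existing checks that set FAST_PATH_SAMPLES_COVER_CLIP_NEAREST
         * and FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR would move in here ... */
    }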

Ben


Re: [Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)

2015-08-27 Thread Pekka Paalanen
On Wed, 26 Aug 2015 12:06:59 +0100
"Ben Avison"  wrote:

> On Wed, 26 Aug 2015 09:46:49 +0100, Pekka Paalanen wrote:
> > It's clearly controversial to add C fast paths, because it affects all
> > targets that don't have an asm fast path for the same, and we cannot
> > tell by just review whether it is going to be for better or (much)
> > worse.
> 
> Yes, it's always going to be a risk changing the cross-platform
> functions. Few developers are going to be able to test all the supported
> platforms, so we're always going to need help checking performance.
> Should that be a reason not to even try C fast paths though?

This is very much IMHO, but (to not even try):
- yes, if one is going to have a platform-specific fast path anyway
- yes, if one can reasonably write a platform-specific fast path, that
  is, you are working on just one or maybe two platforms
- ??, if it's something new that rarely any platform accelerates

If Oded hadn't benchmarked this, we would likely have landed this C
fast path. Then no-one would have realized afterwards that it's actually
slower than the generic path.

Whether this is an isolated case or not, I don't know. But since we
tend to be more safe than sorry with pixman, I would err on the side of
not writing C fast paths. Hmm, assuming you can beat the compiler with
hand-crafted asm...?

Or, we could say we want N amount of checking on different platforms
before we can land a new C fast path, but I suspect that is as good as
just dropping it, considering how little review we get on patches.


It would be *really* nice if we could somehow use a benchmark mode
where we could run an operation with every possible implementation and
compare them. I wonder, can we already do that with PIXMAN_DISABLE?

I see the code recognizes these names for PIXMAN_DISABLE:

pixman/pixman-arm.c:210:if (!_pixman_disabled ("arm-simd") && have_feature (ARM_V6))
pixman/pixman-arm.c:215:if (!_pixman_disabled ("arm-iwmmxt") && have_feature (ARM_IWMMXT))
pixman/pixman-arm.c:220:if (!_pixman_disabled ("arm-neon") && have_feature (ARM_NEON))
pixman/pixman-implementation.c:390:if (!_pixman_disabled ("fast"))
pixman/pixman-mips.c:73:if (!_pixman_disabled ("loongson-mmi") && have_feature ("Loongson"))
pixman/pixman-mips.c:78:if (!_pixman_disabled ("mips-dspr2"))
pixman/pixman-ppc.c:150:if (!_pixman_disabled ("vmx") && pixman_have_vmx ())
pixman/pixman-x86.c:233:if (!_pixman_disabled ("mmx") && have_feature (MMX_BITS))
pixman/pixman-x86.c:238:if (!_pixman_disabled ("sse2") && have_feature (SSE2_BITS))
pixman/pixman-x86.c:243:if (!_pixman_disabled ("ssse3") && have_feature (SSSE3_BITS))

Would that be everything we need to control to be able to test every
possible path on a platform?

This could be the way to test whether Pixman is choosing the optimal
implementation of what it already has. Some heavy scripting around
lowlevel-blt-bench could produce a nice table of results.
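
As a rough illustration of the idea, a throwaway harness along these lines
could do it (assuming lowlevel-blt-bench has been built in the current
directory, and that the PIXMAN_DISABLE keywords listed are the relevant
ones for the machine at hand):

    #include <stdio.h>
    #include <stdlib.h>

    int
    main (void)
    {
        /* Each entry disables progressively more implementations, forcing
         * the same operation down a different code path. */
        static const char *const disables[] = {
            "",                    /* everything enabled */
            "ssse3",
            "ssse3 sse2",
            "ssse3 sse2 mmx",
            "ssse3 sse2 mmx fast", /* general paths only */
        };
        unsigned int i;

        for (i = 0; i < sizeof (disables) / sizeof (disables[0]); i++)
        {
            printf ("=== PIXMAN_DISABLE=\"%s\" ===\n", disables[i]);
            fflush (stdout);

            setenv ("PIXMAN_DISABLE", disables[i], 1);
            if (system ("./lowlevel-blt-bench over_8888_8888") != 0)
                return EXIT_FAILURE;
        }

        return EXIT_SUCCESS;
    }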

Just an idea. Of course it doesn't remove the need to actually try
things on the various machines.


Thanks,
pq




Re: [Pixman] [PATCH] Resolve implementation-defined behaviour for division rounded to -infinity

2015-08-27 Thread Pekka Paalanen
On Wed, 26 Aug 2015 14:21:07 +0100
"Ben Avison"  wrote:

> I hadn't really investigated that, but having had a bit of a play with
> ARM GCC, I see that it fails to use the runtime function that returns
> both the quotient and remainder (__aeabi_idivmod) with the operations in
> macro form. I get more luck writing them as functions:
> 
> inline int
> DIV (int a, int b)
> {
> int q = a / b;
> int r = a - q * b;
> return q - (r < 0);
> }
> 
> inline int
> MOD (int a, int b)
> {
> int r = a % b;
> if (r < 0)
>   r += b;
> return r;
> }
> 
> with the caveat that these are based on the macros from my 2015-08-18
> post, which rely on b being positive. (Set aside for the moment whether
> an inline function with an all-caps name is a good idea...)

FWIW, when I looked at the macros (old and proposed), my head started
spinning. Looking at these inline functions, I feel I could actually
understand them without rewriting them.
- my 2c on readability

Not to mention that unlike the macros, these do not evaluate the
arguments multiple times.

Btw. I personally agree with Siarhei's testing argument. If there is
any uncertainty about whether existing code is correct, writing a test to
explicitly check for it is a nice way to settle it. If users come back reporting
test failures, then we have a problem. Otherwise no need to pay
attention.
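
For instance, a minimal standalone check of the flooring behaviour of the
DIV/MOD versions quoted above might look like this (definitions repeated so
the snippet compiles on its own; b is assumed positive, as noted):

    #include <assert.h>

    static inline int
    DIV (int a, int b)
    {
        int q = a / b;
        int r = a - q * b;
        return q - (r < 0);
    }

    static inline int
    MOD (int a, int b)
    {
        int r = a % b;
        if (r < 0)
            r += b;
        return r;
    }

    int
    main (void)
    {
        /* C's "/" truncates towards zero: -7 / 3 == -2.
         * DIV rounds towards -infinity instead, and MOD stays non-negative. */
        assert (DIV (7, 3)  ==  2 && MOD (7, 3)  == 1);
        assert (DIV (-7, 3) == -3 && MOD (-7, 3) == 2);
        assert (DIV (-6, 3) == -2 && MOD (-6, 3) == 0);

        return 0;
    }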

I just wish testing for performance was as reliable as for
correctness...


Thanks,
pq

