[Pixman] [PATCH] test: List more details of the operation when running scaling-test verbosely

2015-08-20 Thread Ben Avison
---
I've been doing a fair bit of work on scaled plots again recently, and I
found that scaling-test omitted a lot of useful information from its verbose
output (notably whether a mask was used, and what the mask parameters were).

 test/scaling-test.c |   63 +++++++++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/test/scaling-test.c b/test/scaling-test.c
index e2f7fa9..02bd0d0 100644
--- a/test/scaling-test.c
+++ b/test/scaling-test.c
@@ -73,7 +73,7 @@ test_composite (int  testnum,
 pixman_op_t op;
 pixman_repeat_t repeat = PIXMAN_REPEAT_NONE;
 pixman_repeat_t mask_repeat = PIXMAN_REPEAT_NONE;
-pixman_format_code_t src_fmt, dst_fmt;
+pixman_format_code_t src_fmt, mask_fmt, dst_fmt;
 uint32_t * srcbuf;
 uint32_t * dstbuf;
 uint32_t * maskbuf;
@@ -145,6 +145,7 @@ test_composite (int  testnum,
 prng_randmemset (dstbuf, dst_stride * dst_height, 0);
 
 src_fmt = get_format (src_bpp);
+mask_fmt = PIXMAN_a8;
 dst_fmt = get_format (dst_bpp);
 
 if (prng_rand_n (2))
@@ -169,7 +170,7 @@ test_composite (int  testnum,
 src_fmt, src_width, src_height, srcbuf, src_stride);
 
 mask_img = pixman_image_create_bits (
-PIXMAN_a8, mask_width, mask_height, maskbuf, mask_stride);
+mask_fmt, mask_width, mask_height, maskbuf, mask_stride);
 
 dst_img = pixman_image_create_bits (
 dst_fmt, dst_width, dst_height, dstbuf, dst_stride);
@@ -255,21 +256,6 @@ test_composite (int  testnum,
 else
pixman_image_set_filter (mask_img, PIXMAN_FILTER_BILINEAR, NULL, 0);
 
-if (verbose)
-{
-   printf ("src_fmt=%s, dst_fmt=%s\n",
-   format_name (src_fmt), format_name (dst_fmt));
-   printf ("op=%s, scale_x=%d, scale_y=%d, repeat=%d\n",
-   operator_name (op), scale_x, scale_y, repeat);
-   printf ("translate_x=%d, translate_y=%d\n",
-   translate_x, translate_y);
-   printf ("src_width=%d, src_height=%d, dst_width=%d, dst_height=%d\n",
-   src_width, src_height, dst_width, dst_height);
-   printf ("src_x=%d, src_y=%d, dst_x=%d, dst_y=%d\n",
-   src_x, src_y, dst_x, dst_y);
-   printf ("w=%d, h=%d\n", w, h);
-}
-
 if (prng_rand_n (8) == 0)
 {
pixman_box16_t clip_boxes[2];
@@ -352,10 +338,42 @@ test_composite (int  testnum,
 }
 
 if (prng_rand_n (2) == 0)
-   pixman_image_composite (op, src_img, NULL, dst_img,
-src_x, src_y, 0, 0, dst_x, dst_y, w, h);
-else
-   pixman_image_composite (op, src_img, mask_img, dst_img,
+{
+mask_fmt = PIXMAN_null;
+pixman_image_unref (mask_img);
+mask_img = NULL;
+mask_x = 0;
+mask_y = 0;
+}
+
+if (verbose)
+{
+printf ("op=%s, src_fmt=%s, mask_fmt=%s, dst_fmt=%s\n",
+operator_name (op), format_name (src_fmt),
+format_name (mask_fmt), format_name (dst_fmt));
+printf ("scale_x=%d, scale_y=%d, repeat=%d, filter=%d\n",
+scale_x, scale_y, repeat, src_img->common.filter);
+printf ("translate_x=%d, translate_y=%d\n",
+translate_x, translate_y);
+if (mask_fmt != PIXMAN_null)
+{
+printf ("mask_scale_x=%d, mask_scale_y=%d, mask_repeat=%d, mask_filter=%d\n",
+mask_scale_x, mask_scale_y, mask_repeat, mask_img->common.filter);
+printf ("mask_translate_x=%d, mask_translate_y=%d\n",
+mask_translate_x, mask_translate_y);
+}
+printf ("src_width=%d, src_height=%d, ",
+src_width, src_height);
+if (mask_fmt != PIXMAN_null)
+{
+printf ("mask_width=%d, mask_height=%d, ",
+mask_width, mask_height);
+}
+printf ("dst_width=%d, dst_height=%d\n",
+dst_width, dst_height);
+printf ("w=%d, h=%d\n", w, h);
+}
+
+pixman_image_composite (op, src_img, mask_img, dst_img,
 src_x, src_y, mask_x, mask_y, dst_x, dst_y, w, h);
 
 crc32 = compute_crc32_for_image (0, dst_img);
@@ -364,7 +382,8 @@ test_composite (int  testnum,
print_image (dst_img);
 
 pixman_image_unref (src_img);
-pixman_image_unref (mask_img);
+if (mask_img != NULL)
+pixman_image_unref (mask_img);
 pixman_image_unref (dst_img);
 
 if (src_stride  0)
-- 
1.7.5.4

_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/4] New fast paths and Raspberry Pi 1 benchmarking

2015-08-20 Thread Bill Spitzak
On Thu, Aug 20, 2015 at 6:58 AM, Pekka Paalanen ppaala...@gmail.com wrote:


 A thing that explains a great deal of these anomalies, but not all of it,
 has
 something to do with function addresses. There are hypotheses that it might
 have to do with the branch predictor and its cache. We made a test
 targeting
 exactly that idea: pick a fast path function that seems to be most
 susceptible
 to unexpected changes, pad it with x nops before the function start and N-x
 nops after the function end. We never execute those nops, but changing x
 changes the function start address while keeping everything else in the
 whole
 binary in the same place.

 The results were mind-boggling: depending on the function starting
 address, the
 src_8888_8888 L1 test of lowlevel-blt-bench went either 355 Mpx/s or 470
 Mpx/s.
 There does not seem to be any predictable pattern on which addresses are
 fast
 and which are slow. Obviously this will screw up our benchmarks, because
 a
 change in an unrelated function may cause another function's address to
 shift,
 and therefore change its performance. See [1] for the plot.

 [1] The plot of alignment vs. performance

 https://git.collabora.com/cgit/user/pq/pixman-benchmarking.git/plain/octave/figures/fig-src---L1.pdf


Could this be a matter of whether some bad instruction ends up next to, or
split by, a cache line boundary? That would produce a random-looking plot,
though it really is a plot of the location of the bad instructions in the
measured function.

If this really is a problem, then the ideal fix is for the compiler to
insert NOP instructions to move the bad instructions away from the
locations that make them bad. Yikes.


[Pixman] [PATCH 3/4] armv6: Add over_n_8888 fast path (disabled)

2015-08-20 Thread Pekka Paalanen
From: Ben Avison bavi...@riscosopen.org

This new fast path is initially disabled by putting the entries in the
lookup table after the sentinel. The compiler cannot tell the new code
is not used, so it cannot eliminate the code. Also the lookup table size
will include the new fast path. When the follow-up patch then enables
the new fast path, the binary layout (alignments, size, etc.) will stay
the same compared to the disabled case.

Keeping the binary layout identical is important for benchmarking on
Raspberry Pi 1. The addresses at which functions are loaded will have a
significant impact on benchmark results, causing unexpected performance
changes. Keeping all function addresses the same across the patch
enabling a new fast path improves the reliability of benchmarks.

Benchmark results are included in the patch enabling this fast path.

[Pekka: disabled the fast path, commit message]
Signed-off-by: Pekka Paalanen pekka.paala...@collabora.co.uk
---
 pixman/pixman-arm-simd-asm.S | 41 +++++++++++++++++++++++++++++++++++++++++
 pixman/pixman-arm-simd.c |  7 +++++++
 2 files changed, 48 insertions(+)

diff --git a/pixman/pixman-arm-simd-asm.S b/pixman/pixman-arm-simd-asm.S
index 7b0727b..a74a0a8 100644
--- a/pixman/pixman-arm-simd-asm.S
+++ b/pixman/pixman-arm-simd-asm.S
@@ -1136,3 +1136,44 @@ generate_composite_function \
 in_reverse_8888_8888_process_tail
 
 
 /******************************************************************************/
+
+.macro over_n_8888_init
+ldr SRC, [sp, #ARGS_STACK_OFFSET]
+/* Hold loop invariant in MASK */
+ldr MASK, =0x00800080
+/* Hold multiplier for destination in STRIDE_M */
+mov STRIDE_M, #255
+sub STRIDE_M, STRIDE_M, SRC, lsr #24
+/* Set GE[3:0] to 0101 so SEL instructions do what we want */
+uadd8   SCRATCH, MASK, MASK
+.endm
+
+.macro over_n_8888_process_head  cond, numbytes, firstreg, unaligned_src, unaligned_mask, preload
+pixld   , numbytes, firstreg, DST, 0
+.endm
+
+.macro over_n_8888_1pixel dst
+mul_8888_8  WK&dst, STRIDE_M, SCRATCH, MASK
+uqadd8  WK&dst, WK&dst, SRC
+.endm
+
+.macro over_n_8888_process_tail  cond, numbytes, firstreg
+ .set PROCESS_REG, firstreg
+ .rept numbytes / 4
+over_n_8888_1pixel %(PROCESS_REG)
+  .set PROCESS_REG, PROCESS_REG+1
+ .endr
+pixst   , numbytes, firstreg, DST
+.endm
+
+generate_composite_function \
+pixman_composite_over_n_8888_asm_armv6, 0, 0, 32, \
+FLAG_DST_READWRITE | FLAG_BRANCH_OVER | FLAG_PROCESS_DOES_STORE, \
+2, /* prefetch distance */ \
+over_n_8888_init, \
+nop_macro, /* newline */ \
+nop_macro, /* cleanup */ \
+over_n_8888_process_head, \
+over_n_8888_process_tail
+
+/******************************************************************************/
diff --git a/pixman/pixman-arm-simd.c b/pixman/pixman-arm-simd.c
index f40ff36..62c0f41 100644
--- a/pixman/pixman-arm-simd.c
+++ b/pixman/pixman-arm-simd.c
@@ -51,6 +51,8 @@ PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (armv6, over_8888_8888,
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (armv6, in_reverse_8888_8888,
 uint32_t, 1, uint32_t, 1)
 
+PIXMAN_ARM_BIND_FAST_PATH_N_DST (SKIP_ZERO_SRC, armv6, over_n_8888,
+ uint32_t, 1)
 PIXMAN_ARM_BIND_FAST_PATH_N_DST (0, armv6, over_reverse_n_8888,
  uint32_t, 1)
 
@@ -271,6 +273,11 @@ static const pixman_fast_path_t arm_simd_fast_paths[] =
 SIMPLE_NEAREST_FAST_PATH (SRC, x8b8g8r8, x8b8g8r8, armv6_8888_8888),
 
 { PIXMAN_OP_NONE },
+
+PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, armv6_composite_over_n_8888),
+PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, armv6_composite_over_n_8888),
+PIXMAN_STD_FAST_PATH (OVER, solid, null, a8b8g8r8, armv6_composite_over_n_8888),
+PIXMAN_STD_FAST_PATH (OVER, solid, null, x8b8g8r8, armv6_composite_over_n_8888),
 };
 
 pixman_implementation_t *
-- 
2.4.6



[Pixman] [PATCH 2/4] pixman-fast-path: enable over_n_8888

2015-08-20 Thread Pekka Paalanen
From: Pekka Paalanen pekka.paala...@collabora.co.uk

Enable the fast path added in the previous patch by moving the lookup
table entries to their proper locations.

Lowlevel-blt-bench benchmark statistics with 30 iterations, showing the
effect of adding this one patch on top of
pixman-fast-path: Add over_n_8888 fast path (disabled).

          Before              After
        Mean  StdDev      Mean  StdDev    Confidence   Change
L1      12.5    0.03      21.2    0.05      100.00%    +69.5%
L2      11.1    0.02      17.4    0.01      100.00%    +57.3%
M        9.4    0.00      13.6    0.00      100.00%    +45.1%
HT       8.5    0.02      12.2    0.02      100.00%    +43.0%
VT       8.4    0.02      11.9    0.02      100.00%    +41.7%
R        8.2    0.01      11.5    0.02      100.00%    +40.5%
RT       5.5    0.05       7.6    0.08      100.00%    +39.1%

Signed-off-by: Pekka Paalanen pekka.paala...@collabora.co.uk
---
 pixman/pixman-fast-path.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
index 6803037..1f96f56 100644
--- a/pixman/pixman-fast-path.c
+++ b/pixman/pixman-fast-path.c
@@ -1869,9 +1869,13 @@ static const pixman_fast_path_t c_fast_paths[] =
 PIXMAN_STD_FAST_PATH (OVER, x8r8g8b8, a8, a8r8g8b8, fast_composite_over_x888_8_8888),
 PIXMAN_STD_FAST_PATH (OVER, x8b8g8r8, a8, x8b8g8r8, fast_composite_over_x888_8_8888),
 PIXMAN_STD_FAST_PATH (OVER, x8b8g8r8, a8, a8b8g8r8, fast_composite_over_x888_8_8888),
+PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, fast_composite_over_n_8888),
+PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, fast_composite_over_n_8888),
 PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, fast_composite_over_8888_8888),
 PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, fast_composite_over_8888_8888),
 PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, r5g6b5, fast_composite_over_8888_0565),
+PIXMAN_STD_FAST_PATH (OVER, solid, null, a8b8g8r8, fast_composite_over_n_8888),
+PIXMAN_STD_FAST_PATH (OVER, solid, null, x8b8g8r8, fast_composite_over_n_8888),
 PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, fast_composite_over_8888_8888),
 PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, x8b8g8r8, fast_composite_over_8888_8888),
 PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, b5g6r5, fast_composite_over_8888_0565),
@@ -2003,11 +2007,6 @@ static const pixman_fast_path_t c_fast_paths[] =
 },
 
 {   PIXMAN_OP_NONE },
-
-PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, fast_composite_over_n_8888),
-PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, fast_composite_over_n_8888),
-PIXMAN_STD_FAST_PATH (OVER, solid, null, a8b8g8r8, fast_composite_over_n_8888),
-PIXMAN_STD_FAST_PATH (OVER, solid, null, x8b8g8r8, fast_composite_over_n_8888),
 };
 
 #ifdef WORDS_BIGENDIAN
-- 
2.4.6



[Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)

2015-08-20 Thread Pekka Paalanen
From: Ben Avison bavi...@riscosopen.org

This is a C fast path, useful for reference or for platforms that don't
have their own fast path for this operation.

This new fast path is initially disabled by putting the entries in the
lookup table after the sentinel. The compiler cannot tell the new code
is not used, so it cannot eliminate the code. Also the lookup table size
will include the new fast path. When the follow-up patch then enables
the new fast path, the binary layout (alignments, size, etc.) will stay
the same compared to the disabled case.
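The sentinel mechanism described above can be shown in miniature. This toy is my own illustration: the enum, struct, and names are invented stand-ins for pixman's real `pixman_fast_path_t` table terminated by a `PIXMAN_OP_NONE` entry.

```c
#include <stddef.h>

/* Toy model of a sentinel-terminated fast-path table. */
enum op { OP_OVER, OP_SRC, OP_NONE };

struct fast_path {
    enum op     op;
    const char *name;
};

static const struct fast_path table[] = {
    { OP_OVER, "over_8888_8888" },
    { OP_NONE, NULL },            /* sentinel: the scan stops here        */
    { OP_OVER, "over_n_8888" },   /* disabled: present in the binary and
                                     in sizeof (table), but never matched */
};

const char *
lookup (enum op op)
{
    const struct fast_path *p;

    for (p = table; p->op != OP_NONE; p++)
        if (p->op == op)
            return p->name;
    return NULL;                  /* fall back to a slower general path */
}
```

Because the disabled entry still occupies its slot, enabling it later (by moving it above the sentinel) changes no sizes or alignments elsewhere in the binary.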

Keeping the binary layout identical is important for benchmarking on
Raspberry Pi 1. The addresses at which functions are loaded will have a
significant impact on benchmark results, causing unexpected performance
changes. Keeping all function addresses the same across the patch
enabling a new fast path improves the reliability of benchmarks.

Benchmark results are included in the patch enabling this fast path.

[Pekka: disabled the fast path, commit message]
Signed-off-by: Pekka Paalanen pekka.paala...@collabora.co.uk
---
 pixman/pixman-fast-path.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
index 53d4a1f..6803037 100644
--- a/pixman/pixman-fast-path.c
+++ b/pixman/pixman-fast-path.c
@@ -1122,6 +1122,37 @@ fast_composite_over_n_1_0565 (pixman_implementation_t *imp,
 }
 }
 
+static void
+fast_composite_over_n_8888 (pixman_implementation_t *imp,
+pixman_composite_info_t *info)
+{
+PIXMAN_COMPOSITE_ARGS (info);
+uint32_t  src;
+uint32_t *dst_line, *dst;
+int   dst_stride;
+int32_t   w;
+
+src = _pixman_image_get_solid (imp, src_image, dest_image->bits.format);
+
+if (src == 0)
+return;
+
+PIXMAN_IMAGE_GET_LINE (dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 1);
+
+while (height--)
+{
+dst = dst_line;
+dst_line += dst_stride;
+w = width;
+
+while (w--)
+{
+*dst = over (src, *dst);
+dst++;
+}
+}
+}
+
 /*
  * Simple bitblt
  */
@@ -1972,6 +2003,11 @@ static const pixman_fast_path_t c_fast_paths[] =
 },
 
 {   PIXMAN_OP_NONE },
+
+PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, fast_composite_over_n_8888),
+PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, fast_composite_over_n_8888),
+PIXMAN_STD_FAST_PATH (OVER, solid, null, a8b8g8r8, fast_composite_over_n_8888),
+PIXMAN_STD_FAST_PATH (OVER, solid, null, x8b8g8r8, fast_composite_over_n_8888),
 };
 
 #ifdef WORDS_BIGENDIAN
-- 
2.4.6
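The `over (src, *dst)` call in the inner loop of the patch above is the premultiplied-alpha Porter-Duff OVER operator. A hedged sketch of the usual formulation follows; the helper name `mul_un8_x2` is mine (pixman's own version lives in pixman-combine32.h and saturates the final add), but the 0x00800080 round-to-nearest trick is the same one the ARMv6 assembly holds in its MASK register.

```c
#include <stdint.h>

/* Multiply two interleaved 8-bit channels (bits 0-7 and 16-23 of x) by an
 * 8-bit value a, with round-to-nearest division by 255. */
uint32_t
mul_un8_x2 (uint32_t x, uint8_t a)
{
    uint32_t t = (x & 0x00ff00ff) * a + 0x00800080;
    t = (t + ((t >> 8) & 0x00ff00ff)) >> 8;
    return t & 0x00ff00ff;
}

/* Composite one premultiplied a8r8g8b8 pixel over another:
 * result = src + dst * (255 - alpha(src)) / 255, per channel. */
uint32_t
over (uint32_t src, uint32_t dst)
{
    uint8_t  ia = 255 - (src >> 24);               /* 255 - alpha(src) */
    uint32_t rb = mul_un8_x2 (dst, ia);            /* red and blue     */
    uint32_t ag = mul_un8_x2 (dst >> 8, ia) << 8;  /* alpha and green  */

    /* For valid premultiplied pixels each channel sum fits in 8 bits,
     * so a plain add suffices in this sketch. */
    return src + (rb | ag);
}
```

Note how the solid-source case lets the ARMv6 code hoist `255 - alpha(src)` out of the loop entirely (the `sub STRIDE_M, STRIDE_M, SRC, lsr #24` in the init macro).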



Re: [Pixman] [PATCH 0/4] New fast paths and Raspberry Pi 1 benchmarking

2015-08-20 Thread Ben Avison

On Thu, 20 Aug 2015 19:34:37 +0100, Bill Spitzak spit...@gmail.com wrote:


Could this be a matter of whether some bad instruction ends up next to,
or split by, a cache line boundary? That would produce a random-looking
plot, though it really is a plot of the location of the bad instructions
in the measured function.

If this really is a problem, then the ideal fix is for the compiler to
insert NOP instructions to move the bad instructions away from
the locations that make them bad. Yikes.


Thought of that, tried it, still baffled at the results. In other words,
merely ensuring instructions retained the same alignment to cachelines
wasn't enough to ensure reproducibility - it could only be achieved by
ensuring the same absolute address (which isn't an option with shared
libraries in the presence of ASLR).

My best theory at the moment is that the branch predictor in the ARM11
uses a hash of both the source and destination addresses of a branch to
choose an index into the predictor cache. Because it's a direct-mapped
cache, any collisions due to the branch moving to a different address can
have major effects on very tight loops like src_8888_8888.

Ben
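Ben's direct-mapped-predictor theory can be illustrated with a toy model. Everything here is guesswork made concrete: the ARM1176's predictor indexing is not publicly documented at this level, so the table size and the XOR hash below are assumptions, chosen only to show how collisions appear and disappear as code moves.

```c
#include <stdint.h>
#include <stddef.h>

#define BTB_ENTRIES 128            /* assumed direct-mapped table size */

/* Hypothetical hash of a branch's source and destination word addresses. */
size_t
btb_index (uint32_t branch_addr, uint32_t target_addr)
{
    return ((branch_addr >> 2) ^ (target_addr >> 2)) & (BTB_ENTRIES - 1);
}

/* Direct-mapped: two branches hashing to the same entry evict each
 * other's prediction state, stalling tight loops. */
int
collides (uint32_t b1, uint32_t t1, uint32_t b2, uint32_t t2)
{
    return btb_index (b1, t1) == btb_index (b2, t2);
}
```

Under this model a loop branch at 0x1000 targeting 0x1010 collides with one at 0x2000 targeting 0x2010, while shifting just the first branch by a single word (0x1004 → 0x1010) removes the collision; tiny layout changes flip collisions on and off, which would look exactly as random as the plot in [1].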