[Pixman] [PATCH] test: List more details of the operation when running scaling-test verbosely
---
I've been doing a fair bit of work on scaled plots again recently, and I
was finding that scaling-test omitted to mention lots of useful
information (notably whether a mask was used, and what the mask
parameters were).

 test/scaling-test.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/test/scaling-test.c b/test/scaling-test.c
index e2f7fa9..02bd0d0 100644
--- a/test/scaling-test.c
+++ b/test/scaling-test.c
@@ -73,7 +73,7 @@ test_composite (int testnum,
     pixman_op_t          op;
     pixman_repeat_t      repeat = PIXMAN_REPEAT_NONE;
     pixman_repeat_t      mask_repeat = PIXMAN_REPEAT_NONE;
-    pixman_format_code_t src_fmt, dst_fmt;
+    pixman_format_code_t src_fmt, mask_fmt, dst_fmt;
     uint32_t *           srcbuf;
     uint32_t *           dstbuf;
     uint32_t *           maskbuf;
@@ -145,6 +145,7 @@ test_composite (int testnum,
     prng_randmemset (dstbuf, dst_stride * dst_height, 0);

     src_fmt = get_format (src_bpp);
+    mask_fmt = PIXMAN_a8;
     dst_fmt = get_format (dst_bpp);

     if (prng_rand_n (2))
@@ -169,7 +170,7 @@ test_composite (int testnum,
         src_fmt, src_width, src_height, srcbuf, src_stride);

     mask_img = pixman_image_create_bits (
-        PIXMAN_a8, mask_width, mask_height, maskbuf, mask_stride);
+        mask_fmt, mask_width, mask_height, maskbuf, mask_stride);

     dst_img = pixman_image_create_bits (
         dst_fmt, dst_width, dst_height, dstbuf, dst_stride);
@@ -255,21 +256,6 @@ test_composite (int testnum,
     else
         pixman_image_set_filter (mask_img, PIXMAN_FILTER_BILINEAR, NULL, 0);

-    if (verbose)
-    {
-        printf ("src_fmt=%s, dst_fmt=%s\n",
-                format_name (src_fmt), format_name (dst_fmt));
-        printf ("op=%s, scale_x=%d, scale_y=%d, repeat=%d\n",
-                operator_name (op), scale_x, scale_y, repeat);
-        printf ("translate_x=%d, translate_y=%d\n",
-                translate_x, translate_y);
-        printf ("src_width=%d, src_height=%d, dst_width=%d, dst_height=%d\n",
-                src_width, src_height, dst_width, dst_height);
-        printf ("src_x=%d, src_y=%d, dst_x=%d, dst_y=%d\n",
-                src_x, src_y, dst_x, dst_y);
-        printf ("w=%d, h=%d\n", w, h);
-    }
-
     if (prng_rand_n (8) == 0)
     {
         pixman_box16_t clip_boxes[2];
@@ -352,10 +338,42 @@ test_composite (int testnum,
     }

     if (prng_rand_n (2) == 0)
-        pixman_image_composite (op, src_img, NULL, dst_img,
-                                src_x, src_y, 0, 0, dst_x, dst_y, w, h);
-    else
-        pixman_image_composite (op, src_img, mask_img, dst_img,
+    {
+        mask_fmt = PIXMAN_null;
+        pixman_image_unref (mask_img);
+        mask_img = NULL;
+        mask_x = 0;
+        mask_y = 0;
+    }
+
+    if (verbose)
+    {
+        printf ("op=%s, src_fmt=%s, mask_fmt=%s, dst_fmt=%s\n",
+                operator_name (op), format_name (src_fmt),
+                format_name (mask_fmt), format_name (dst_fmt));
+        printf ("scale_x=%d, scale_y=%d, repeat=%d, filter=%d\n",
+                scale_x, scale_y, repeat, src_img->common.filter);
+        printf ("translate_x=%d, translate_y=%d\n",
+                translate_x, translate_y);
+        if (mask_fmt != PIXMAN_null)
+        {
+            printf ("mask_scale_x=%d, mask_scale_y=%d, mask_repeat=%d, mask_filter=%d\n",
+                    mask_scale_x, mask_scale_y, mask_repeat, mask_img->common.filter);
+            printf ("mask_translate_x=%d, mask_translate_y=%d\n",
+                    mask_translate_x, mask_translate_y);
+        }
+        printf ("src_width=%d, src_height=%d, ",
+                src_width, src_height);
+        if (mask_fmt != PIXMAN_null)
+        {
+            printf ("mask_width=%d, mask_height=%d, ",
+                    mask_width, mask_height);
+        }
+        printf ("dst_width=%d, dst_height=%d\n",
+                dst_width, dst_height);
+        printf ("w=%d, h=%d\n", w, h);
+    }
+
+    pixman_image_composite (op, src_img, mask_img, dst_img,
                             src_x, src_y, mask_x, mask_y, dst_x, dst_y, w, h);

     crc32 = compute_crc32_for_image (0, dst_img);
@@ -364,7 +382,8 @@ test_composite (int testnum,
         print_image (dst_img);

     pixman_image_unref (src_img);
-    pixman_image_unref (mask_img);
+    if (mask_img != NULL)
+        pixman_image_unref (mask_img);
     pixman_image_unref (dst_img);

     if (src_stride < 0)
--
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/4] New fast paths and Raspberry Pi 1 benchmarking
On Thu, Aug 20, 2015 at 6:58 AM, Pekka Paalanen <ppaala...@gmail.com> wrote:

> A thing that explains a great deal of these anomalies, but not all of
> it, has something to do with function addresses. There are hypotheses
> that it might have to do with the branch predictor and its cache.
>
> We made a test targeting exactly that idea: pick a fast path function
> that seems to be most susceptible to unexpected changes, pad it with x
> nops before the function start and N-x nops after the function end. We
> never execute those nops, but changing x changes the function start
> address while keeping everything else in the whole binary in the same
> place.
>
> The results were mind-boggling: depending on the function starting
> address, the src_8888_8888 L1 test of lowlevel-blt-bench went either
> 355 Mpx/s or 470 Mpx/s. There does not seem to be any predictable
> pattern on which addresses are fast and which are slow. Obviously this
> will screw up our benchmarks, because a change in an unrelated function
> may cause another function's address to shift, and therefore change its
> performance.
>
> See [1] for the plot.
>
> [1] The plot of alignment vs. performance
>     https://git.collabora.com/cgit/user/pq/pixman-benchmarking.git/plain/octave/figures/fig-src---L1.pdf

Could this be whether some bad instruction ends up next to or split by a
cache line boundary? That would produce a random-looking plot, though it
really is a plot of the location of the bad instructions in the measured
function.

If this really is a problem then the ideal fix is for the compiler to
insert NOP instructions in order to move the bad instructions away from
the locations that make them bad. Yike.
[Pixman] [PATCH 3/4] armv6: Add over_n_8888 fast path (disabled)
From: Ben Avison <bavi...@riscosopen.org>

This new fast path is initially disabled by putting the entries in the
lookup table after the sentinel. The compiler cannot tell the new code
is not used, so it cannot eliminate the code. Also the lookup table
size will include the new fast path. When the follow-up patch then
enables the new fast path, the binary layout (alignments, size, etc.)
will stay the same compared to the disabled case.

Keeping the binary layout identical is important for benchmarking on
Raspberry Pi 1. The addresses at which functions are loaded will have a
significant impact on benchmark results, causing unexpected performance
changes. Keeping all function addresses the same across the patch
enabling a new fast path improves the reliability of benchmarks.
Benchmark results are included in the patch enabling this fast path.

[Pekka: disabled the fast path, commit message]
Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk>
---
 pixman/pixman-arm-simd-asm.S | 41 +++++++++++++++++++++++++++++++++++++++++
 pixman/pixman-arm-simd.c     |  7 +++++++
 2 files changed, 48 insertions(+)

diff --git a/pixman/pixman-arm-simd-asm.S b/pixman/pixman-arm-simd-asm.S
index 7b0727b..a74a0a8 100644
--- a/pixman/pixman-arm-simd-asm.S
+++ b/pixman/pixman-arm-simd-asm.S
@@ -1136,3 +1136,44 @@ generate_composite_function \
     in_reverse_8888_8888_process_tail

 /******************************************************************************/
+
+.macro over_n_8888_init
+        ldr     SRC, [sp, #ARGS_STACK_OFFSET]
+        /* Hold loop invariant in MASK */
+        ldr     MASK, =0x00800080
+        /* Hold multiplier for destination in STRIDE_M */
+        mov     STRIDE_M, #255
+        sub     STRIDE_M, STRIDE_M, SRC, lsr #24
+        /* Set GE[3:0] to 0101 so SEL instructions do what we want */
+        uadd8   SCRATCH, MASK, MASK
+.endm
+
+.macro over_n_8888_process_head  cond, numbytes, firstreg, unaligned_src, unaligned_mask, preload
+        pixld   , numbytes, firstreg, DST, 0
+.endm
+
+.macro over_n_8888_1pixel dst
+        mul_8888_8  WK&dst, STRIDE_M, SCRATCH, MASK
+        uqadd8  WK&dst, WK&dst, SRC
+.endm
+
+.macro over_n_8888_process_tail  cond, numbytes, firstreg
+ .set PROCESS_REG, firstreg
+ .rept numbytes / 4
+        over_n_8888_1pixel %(PROCESS_REG)
+ .set PROCESS_REG, PROCESS_REG+1
+ .endr
+        pixst   , numbytes, firstreg, DST
+.endm
+
+generate_composite_function \
+    pixman_composite_over_n_8888_asm_armv6, 0, 0, 32, \
+    FLAG_DST_READWRITE | FLAG_BRANCH_OVER | FLAG_PROCESS_DOES_STORE, \
+    2, /* prefetch distance */ \
+    over_n_8888_init, \
+    nop_macro, /* newline */ \
+    nop_macro, /* cleanup */ \
+    over_n_8888_process_head, \
+    over_n_8888_process_tail
+
+/******************************************************************************/
diff --git a/pixman/pixman-arm-simd.c b/pixman/pixman-arm-simd.c
index f40ff36..62c0f41 100644
--- a/pixman/pixman-arm-simd.c
+++ b/pixman/pixman-arm-simd.c
@@ -51,6 +51,8 @@ PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (armv6, over_8888_8888,
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (armv6, in_reverse_8888_8888,
                                    uint32_t, 1, uint32_t, 1)

+PIXMAN_ARM_BIND_FAST_PATH_N_DST (SKIP_ZERO_SRC, armv6, over_n_8888,
+                                 uint32_t, 1)
 PIXMAN_ARM_BIND_FAST_PATH_N_DST (0, armv6, over_reverse_n_8888,
                                  uint32_t, 1)

@@ -271,6 +273,11 @@ static const pixman_fast_path_t arm_simd_fast_paths[] =
     SIMPLE_NEAREST_FAST_PATH (SRC, x8b8g8r8, x8b8g8r8, armv6_8888_8888),

     { PIXMAN_OP_NONE },
+
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, armv6_composite_over_n_8888),
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, armv6_composite_over_n_8888),
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, a8b8g8r8, armv6_composite_over_n_8888),
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, x8b8g8r8, armv6_composite_over_n_8888),
 };

 pixman_implementation_t *
--
2.4.6
[Pixman] [PATCH 2/4] pixman-fast-path: enable over_n_8888
From: Pekka Paalanen <pekka.paala...@collabora.co.uk>

Enable the fast path added in the previous patch by moving the lookup
table entries to their proper locations.

Lowlevel-blt-bench benchmark statistics with 30 iterations, showing the
effect of adding this one patch on top of "pixman-fast-path: Add
over_n_8888 fast path (disabled)":

          Before           After
       Mean  StdDev    Mean  StdDev   Confidence   Change
L1     12.5   0.03     21.2   0.05     100.00%     +69.5%
L2     11.1   0.02     17.4   0.01     100.00%     +57.3%
M       9.4   0.00     13.6   0.00     100.00%     +45.1%
HT      8.5   0.02     12.2   0.02     100.00%     +43.0%
VT      8.4   0.02     11.9   0.02     100.00%     +41.7%
R       8.2   0.01     11.5   0.02     100.00%     +40.5%
RT      5.5   0.05      7.6   0.08     100.00%     +39.1%

Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk>
---
 pixman/pixman-fast-path.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
index 6803037..1f96f56 100644
--- a/pixman/pixman-fast-path.c
+++ b/pixman/pixman-fast-path.c
@@ -1869,9 +1869,13 @@ static const pixman_fast_path_t c_fast_paths[] =
     PIXMAN_STD_FAST_PATH (OVER, x8r8g8b8, a8, a8r8g8b8, fast_composite_over_x888_8_8888),
     PIXMAN_STD_FAST_PATH (OVER, x8b8g8r8, a8, x8b8g8r8, fast_composite_over_x888_8_8888),
     PIXMAN_STD_FAST_PATH (OVER, x8b8g8r8, a8, a8b8g8r8, fast_composite_over_x888_8_8888),
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, fast_composite_over_n_8888),
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, fast_composite_over_n_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, fast_composite_over_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, fast_composite_over_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, r5g6b5, fast_composite_over_8888_0565),
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, a8b8g8r8, fast_composite_over_n_8888),
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, x8b8g8r8, fast_composite_over_n_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, fast_composite_over_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, x8b8g8r8, fast_composite_over_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, b5g6r5, fast_composite_over_8888_0565),
@@ -2003,11 +2007,6 @@ static const pixman_fast_path_t c_fast_paths[] =
     },
     { PIXMAN_OP_NONE },
-
-    PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, fast_composite_over_n_8888),
-    PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, fast_composite_over_n_8888),
-    PIXMAN_STD_FAST_PATH (OVER, solid, null, a8b8g8r8, fast_composite_over_n_8888),
-    PIXMAN_STD_FAST_PATH (OVER, solid, null, x8b8g8r8, fast_composite_over_n_8888),
 };

 #ifdef WORDS_BIGENDIAN
--
2.4.6
[Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)
From: Ben Avison <bavi...@riscosopen.org>

This is a C fast path, useful for reference or for platforms that don't
have their own fast path for this operation.

This new fast path is initially disabled by putting the entries in the
lookup table after the sentinel. The compiler cannot tell the new code
is not used, so it cannot eliminate the code. Also the lookup table
size will include the new fast path. When the follow-up patch then
enables the new fast path, the binary layout (alignments, size, etc.)
will stay the same compared to the disabled case.

Keeping the binary layout identical is important for benchmarking on
Raspberry Pi 1. The addresses at which functions are loaded will have a
significant impact on benchmark results, causing unexpected performance
changes. Keeping all function addresses the same across the patch
enabling a new fast path improves the reliability of benchmarks.
Benchmark results are included in the patch enabling this fast path.

[Pekka: disabled the fast path, commit message]
Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk>
---
 pixman/pixman-fast-path.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
index 53d4a1f..6803037 100644
--- a/pixman/pixman-fast-path.c
+++ b/pixman/pixman-fast-path.c
@@ -1122,6 +1122,37 @@ fast_composite_over_n_1_0565 (pixman_implementation_t *imp,
     }
 }

+static void
+fast_composite_over_n_8888 (pixman_implementation_t *imp,
+                            pixman_composite_info_t *info)
+{
+    PIXMAN_COMPOSITE_ARGS (info);
+    uint32_t src;
+    uint32_t *dst_line, *dst;
+    int dst_stride;
+    int32_t w;
+
+    src = _pixman_image_get_solid (imp, src_image, dest_image->bits.format);
+
+    if (src == 0)
+        return;
+
+    PIXMAN_IMAGE_GET_LINE (dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 1);
+
+    while (height--)
+    {
+        dst = dst_line;
+        dst_line += dst_stride;
+        w = width;
+
+        while (w--)
+        {
+            *dst = over (src, *dst);
+            dst++;
+        }
+    }
+}
+
 /*
  * Simple bitblt
  */
@@ -1972,6 +2003,11 @@ static const pixman_fast_path_t c_fast_paths[] =
     },
     { PIXMAN_OP_NONE },
+
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, fast_composite_over_n_8888),
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, fast_composite_over_n_8888),
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, a8b8g8r8, fast_composite_over_n_8888),
+    PIXMAN_STD_FAST_PATH (OVER, solid, null, x8b8g8r8, fast_composite_over_n_8888),
 };

 #ifdef WORDS_BIGENDIAN
--
2.4.6
Re: [Pixman] [PATCH 0/4] New fast paths and Raspberry Pi 1 benchmarking
On Thu, 20 Aug 2015 19:34:37 +0100, Bill Spitzak <spit...@gmail.com> wrote:

> Could this be whether some bad instruction ends up next to or split by
> a cache line boundary? That would produce a random-looking plot, though
> it really is a plot of the location of the bad instructions in the
> measured function.
>
> If this really is a problem then the ideal fix is for the compiler to
> insert NOP instructions in order to move the bad instructions away from
> the locations that make them bad. Yike.

Thought of that, tried it, still baffled at the results. In other words,
merely ensuring instructions retained the same alignment to cachelines
wasn't enough to ensure reproducibility - it could only be achieved by
ensuring the same absolute address (which isn't an option with shared
libraries in the presence of ASLR).

My best theory at the moment is that the branch predictor in the ARM11
uses a hash of both the source and destination addresses of a branch to
choose an index into the predictor cache. Because it's a direct-mapped
cache, any collisions due to the branch moving to a different address
can have major effects on very tight loops like src_8888_8888.

Ben