[Pixman] [PATCH 6/9] armv7: Use aligned memory writes in both copies of bilinear code
pixman-arm-neon-asm-bilinear.S contains duplicates of some macro definitions
from pixman-arm-neon-asm.S, but they were taken before commit 9638af9 added
the aligns and they have never been brought back into line. An equivalent
macro to load from the destination buffer (not applicable to the operations
implemented in pixman-arm-neon-asm.S) benefits from the same alignment
hints. Verified that the aligned versions don't cause memory faults for the
fast paths defined in pixman-arm-neon-asm-bilinear.S.

Signed-off-by: Ben Avison <bavi...@riscosopen.org>
---
 pixman/pixman-arm-neon-asm-bilinear.S |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm-bilinear.S b/pixman/pixman-arm-neon-asm-bilinear.S
index 0fd92d6..aba8d00 100644
--- a/pixman/pixman-arm-neon-asm-bilinear.S
+++ b/pixman/pixman-arm-neon-asm-bilinear.S
@@ -186,9 +186,9 @@
 .macro bilinear_store_8888 numpix, tmp1, tmp2
 .if numpix == 4
-    vst1.32   {d0, d1}, [OUT]!
+    vst1.32   {d0, d1}, [OUT, :128]!
 .elseif numpix == 2
-    vst1.32   {d0}, [OUT]!
+    vst1.32   {d0}, [OUT, :64]!
 .elseif numpix == 1
     vst1.32   {d0[0]}, [OUT, :32]!
 .else
@@ -203,11 +203,11 @@
     vuzp.u8   d0, d2
     convert_8888_to_0565 d2, d1, d0, q1, tmp1, tmp2
 .if numpix == 4
-    vst1.16   {d2}, [OUT]!
+    vst1.16   {d2}, [OUT, :64]!
 .elseif numpix == 2
-    vst1.32   {d2[0]}, [OUT]!
+    vst1.32   {d2[0]}, [OUT, :32]!
 .elseif numpix == 1
-    vst1.16   {d2[0]}, [OUT]!
+    vst1.16   {d2[0]}, [OUT, :16]!
 .else
     .error bilinear_store_0565 numpix is unsupported
 .endif
@@ -251,11 +251,11 @@
 .macro bilinear_load_dst_8888 numpix, dst0, dst1, dst01
 .if numpix == 4
-    vld1.32     {dst0, dst1}, [OUT]
+    vld1.32     {dst0, dst1}, [OUT, :128]
 .elseif numpix == 2
-    vld1.32     {dst0}, [OUT]
+    vld1.32     {dst0}, [OUT, :64]
 .elseif numpix == 1
-    vld1.32     {dst0[0]}, [OUT]
+    vld1.32     {dst0[0]}, [OUT, :32]
 .else
     .error bilinear_load_dst_8888 numpix is unsupported
 .endif
--
1.7.5.4

___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH 4/9] armv7: Simplify constant load
A minor point, but 0xFF000000 is already a valid immediate constant for
NEON, there's no need to construct it in two steps.

Signed-off-by: Ben Avison <bavi...@riscosopen.org>
---
 pixman/pixman-arm-neon-asm.S |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S
index 9a5d85a..97315d4 100644
--- a/pixman/pixman-arm-neon-asm.S
+++ b/pixman/pixman-arm-neon-asm.S
@@ -1186,8 +1186,7 @@ generate_composite_function \
 .endm

 .macro pixman_composite_src_x888_8888_init
-    vmov.u8  q2, #0xFF
-    vshl.u32 q2, q2, #24
+    vmov.u32 q2, #0xFF000000
 .endm

 generate_composite_function \
--
1.7.5.4
[Pixman] [PATCH 8/9 v2] armv7: More use of fast paths with localized destination alpha
There is a group of combiner types - SRC, OVER, IN_REVERSE, OUT_REVERSE and
ADD - where the destination alpha component is only used (if at all) to
determine the destination alpha component. This means that any such fast
paths with an a8r8g8b8 destination can also be applied to an x8r8g8b8
destination just by updating the fast path table, and likewise with
a8b8g8r8 and x8b8g8r8. The following operations are affected:

over_8888_8888_x888
add_n_8_x888
add_8888_8_x888
add_8888_8888_x888
add_8888_n_x888
add_8888_x888
out_reverse_8_x888

v2: Changed summary line to make it distinct from similar patch relating to
localized source alpha

Signed-off-by: Ben Avison <bavi...@riscosopen.org>
---
 pixman/pixman-arm-neon.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon.c b/pixman/pixman-arm-neon.c
index be761c9..5f0561a 100644
--- a/pixman/pixman-arm-neon.c
+++ b/pixman/pixman-arm-neon.c
@@ -331,6 +331,7 @@ static const pixman_fast_path_t arm_neon_fast_paths[] =
     PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, a8, b5g6r5, neon_composite_over_8888_8_0565),
     PIXMAN_STD_FAST_PATH (OVER, r5g6b5, a8, r5g6b5, neon_composite_over_0565_8_0565),
     PIXMAN_STD_FAST_PATH (OVER, b5g6r5, a8, b5g6r5, neon_composite_over_0565_8_0565),
+    PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, x8r8g8b8, neon_composite_over_8888_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, a8r8g8b8, neon_composite_over_8888_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, r5g6b5, neon_composite_over_8888_0565),
     PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, b5g6r5, neon_composite_over_8888_0565),
@@ -341,17 +342,26 @@ static const pixman_fast_path_t arm_neon_fast_paths[] =
     PIXMAN_STD_FAST_PATH (OVER, x8r8g8b8, null, a8r8g8b8, neon_composite_src_x888_8888),
     PIXMAN_STD_FAST_PATH (OVER, x8b8g8r8, null, a8b8g8r8, neon_composite_src_x888_8888),
     PIXMAN_STD_FAST_PATH (ADD, solid, a8, a8, neon_composite_add_n_8_8),
+    PIXMAN_STD_FAST_PATH (ADD, solid, a8, x8r8g8b8, neon_composite_add_n_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, solid, a8, a8r8g8b8, neon_composite_add_n_8_8888),
+    PIXMAN_STD_FAST_PATH (ADD, solid, a8, x8b8g8r8, neon_composite_add_n_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, solid, a8, a8b8g8r8, neon_composite_add_n_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8, a8, a8, neon_composite_add_8_8_8),
     PIXMAN_STD_FAST_PATH (ADD, r5g6b5, a8, r5g6b5, neon_composite_add_0565_8_0565),
     PIXMAN_STD_FAST_PATH (ADD, b5g6r5, a8, b5g6r5, neon_composite_add_0565_8_0565),
+    PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8, x8r8g8b8, neon_composite_add_8888_8_8888),
+    PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, a8, x8b8g8r8, neon_composite_add_8888_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8, a8r8g8b8, neon_composite_add_8888_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, a8, a8b8g8r8, neon_composite_add_8888_8_8888),
+    PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8r8g8b8, x8r8g8b8, neon_composite_add_8888_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8r8g8b8, a8r8g8b8, neon_composite_add_8888_8888_8888),
+    PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, solid, x8r8g8b8, neon_composite_add_8888_n_8888),
+    PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, solid, x8b8g8r8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, solid, a8r8g8b8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, solid, a8b8g8r8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8, null, a8, neon_composite_add_8_8),
+    PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, null, x8r8g8b8, neon_composite_add_8888_8888),
+    PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, x8b8g8r8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, null, a8r8g8b8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, a8b8g8r8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (IN, solid, null, a8, neon_composite_in_n_8),
@@ -359,7 +369,9 @@ static const pixman_fast_path_t arm_neon_fast_paths[] =
     PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8b8g8r8, neon_composite_over_reverse_n_8888),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, r5g6b5, neon_composite_out_reverse_8_0565),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, b5g6r5, neon_composite_out_reverse_8_0565),
+    PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, x8r8g8b8, neon_composite_out_reverse_8_8888),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, a8r8g8b8, neon_composite_out_reverse_8_8888),
+    PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, x8b8g8r8, neon_composite_out_reverse_8_8888),
     PIXMAN_STD_FAST_PATH (OUT_REVERS
[Pixman] [PATCH 3/9 repost] armv7: Use VLD-to-all-lanes
I noticed in passing that a number of opportunities to use the all-lanes
variant of VLD have been missed. I don't expect any measurable speedup
because these are all in init code, but this simplifies the code a bit.

Signed-off-by: Ben Avison <bavi...@riscosopen.org>
---
 pixman/pixman-arm-neon-asm.S |  142 ++++++++++++++----------------
 1 files changed, 58 insertions(+), 84 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S
index 7e949a3..9a5d85a 100644
--- a/pixman/pixman-arm-neon-asm.S
+++ b/pixman/pixman-arm-neon-asm.S
@@ -396,11 +396,10 @@ generate_composite_function \
 .macro pixman_composite_over_n_0565_init
     add         DUMMY, sp, #ARGS_STACK_OFFSET
-    vld1.32     {d3[0]}, [DUMMY]
-    vdup.8      d0, d3[0]
-    vdup.8      d1, d3[1]
-    vdup.8      d2, d3[2]
-    vdup.8      d3, d3[3]
+    vld1.8      {d0[]}, [DUMMY]!
+    vld1.8      {d1[]}, [DUMMY]!
+    vld1.8      {d2[]}, [DUMMY]!
+    vld1.8      {d3[]}, [DUMMY]!
     vmvn.8      d3, d3 /* invert source alpha */
 .endm

@@ -761,11 +760,10 @@ generate_composite_function_single_scanline \
 .macro pixman_composite_over_n_8888_init
     add         DUMMY, sp, #ARGS_STACK_OFFSET
-    vld1.32     {d3[0]}, [DUMMY]
-    vdup.8      d0, d3[0]
-    vdup.8      d1, d3[1]
-    vdup.8      d2, d3[2]
-    vdup.8      d3, d3[3]
+    vld1.8      {d0[]}, [DUMMY]!
+    vld1.8      {d1[]}, [DUMMY]!
+    vld1.8      {d2[]}, [DUMMY]!
+    vld1.8      {d3[]}, [DUMMY]!
     vmvn.8      d24, d3 /* get inverted alpha */
 .endm

@@ -813,11 +811,10 @@ generate_composite_function \
 .macro pixman_composite_over_reverse_n_8888_init
     add         DUMMY, sp, #ARGS_STACK_OFFSET
-    vld1.32     {d7[0]}, [DUMMY]
-    vdup.8      d4, d7[0]
-    vdup.8      d5, d7[1]
-    vdup.8      d6, d7[2]
-    vdup.8      d7, d7[3]
+    vld1.8      {d4[]}, [DUMMY]!
+    vld1.8      {d5[]}, [DUMMY]!
+    vld1.8      {d6[]}, [DUMMY]!
+    vld1.8      {d7[]}, [DUMMY]!
 .endm

 generate_composite_function \
@@ -956,11 +953,10 @@ generate_composite_function \
 .macro pixman_composite_over_n_8_0565_init
     add         DUMMY, sp, #ARGS_STACK_OFFSET
     vpush       {d8-d15}
-    vld1.32     {d11[0]}, [DUMMY]
-    vdup.8      d8, d11[0]
-    vdup.8      d9, d11[1]
-    vdup.8      d10, d11[2]
-    vdup.8      d11, d11[3]
+    vld1.8      {d8[]}, [DUMMY]!
+    vld1.8      {d9[]}, [DUMMY]!
+    vld1.8      {d10[]}, [DUMMY]!
+    vld1.8      {d11[]}, [DUMMY]!
 .endm

 .macro pixman_composite_over_n_8_0565_cleanup
@@ -981,10 +977,9 @@ generate_composite_function \
 /******************************************************************************/

 .macro pixman_composite_over_8888_n_0565_init
-    add         DUMMY, sp, #(ARGS_STACK_OFFSET + 8)
+    add         DUMMY, sp, #(ARGS_STACK_OFFSET + 11)
     vpush       {d8-d15}
-    vld1.32     {d24[0]}, [DUMMY]
-    vdup.8      d24, d24[3]
+    vld1.8      {d24[]}, [DUMMY]
 .endm

 .macro pixman_composite_over_8888_n_0565_cleanup
@@ -1049,12 +1044,8 @@ generate_composite_function \
 .macro pixman_composite_src_n_8_init
     add         DUMMY, sp, #ARGS_STACK_OFFSET
-    vld1.32     {d0[0]}, [DUMMY]
-    vsli.u64    d0, d0, #8
-    vsli.u64    d0, d0, #16
-    vsli.u64    d0, d0, #32
-    vorr        d1, d0, d0
-    vorr        q1, q0, q0
+    vld1.8      {d0[],d1[]}, [DUMMY]
+    vld1.8      {d2[],d3[]}, [DUMMY]
 .endm

 .macro pixman_composite_src_n_8_cleanup
@@ -1089,11 +1080,8 @@ generate_composite_function \
 .macro pixman_composite_src_n_0565_init
     add         DUMMY, sp, #ARGS_STACK_OFFSET
-    vld1.32     {d0[0]}, [DUMMY]
-    vsli.u64    d0, d0, #16
-    vsli.u64    d0, d0, #32
-    vorr        d1, d0, d0
-    vorr        q1, q0, q0
+    vld1.16     {d0[],d1[]}, [DUMMY]
+    vld1.16     {d2[],d3[]}, [DUMMY]
 .endm

 .macro pixman_composite_src_n_0565_cleanup
@@ -1128,10 +1116,8 @@ generate_composite_function \
 .macro pixman_composite_src_n_8888_init
     add         DUMMY, sp, #ARGS_STACK_OFFSET
-    vld1.32     {d0[0]}, [DUMMY]
-    vsli.u64    d0, d0, #32
-    vorr        d1, d0, d0
-    vorr        q1, q0, q0
+    vld1.32     {d0[],d1[]}, [DUMMY]
+    vld1.32     {d2[],d3[]}, [DUMMY]
 .endm

 .macro pixman_composite_src_n_8888_cleanup
@@ -1271,11 +1257,10 @@ generate_composite_function \
 .macro pixman_composite_src_n_8_8888_init
     add         DUMMY, sp, #ARGS_STACK_OFFSET
-    vld1.32     {d3[0]}, [DUMMY]
-    vdup.8      d0, d3[0]
-    vdup.8      d1, d3[1]
-    vdup.8      d2, d3[2]
-    vdup.8      d3, d3[3]
+    vld1.8      {d0[]}, [DUMMY]!
+    vld1.8      {d1[]}, [DUMMY]!
+    vld1.8      {d2[]}, [DUMMY]!
+    vld1.8      {d3[]}, [DUMMY]!
 .endm

 .macro pixman_composite_src_n_8_8888_cleanup
@@ -1339,9 +1324,8 @@ generate_composite_function \
 .endm

 .macro pixman_composite_src_n_8_8_init
-    add         DUMMY, sp, #ARGS_STACK_OFFSET
-    vld1.32     {d16[0]}, [DUMMY]
-    vdup.8      d16, d16[3]
+    add         DUM
[Pixman] [PATCH 0/9] Changes to existing ARMv7 routines
Since there are a few people around on the list at the moment who are
familiar with NEON, I'm hoping someone will be able to review my work so it
can make it into git. To keep the number of patches manageable, here are a
group which improve incrementally upon existing ARMv7 routines, without
adding any new ones yet. Most of these are reposts which have had no review
of the technical content. The patch numbers have been reassigned within
this series of 9 patches, and won't match the numbers used when originally
posted.

Ben Avison (9):
  armv7: Coalesce scalar accesses where possible
  armv7: Faster fill operations
  armv7: Use VLD-to-all-lanes
  armv7: Simplify constant load
  armv7: Use prefetch for small-width images too
  armv7: Use aligned memory writes in both copies of bilinear code
  armv7: Move common bilinear macro definitions to a new header file
  armv7: More use of fast paths with localized destination alpha
  armv7: More use of fast paths with localized source alpha

 pixman/Makefile.am                    |    3 +-
 pixman/pixman-arm-neon-asm-bilinear.S |  153 +-
 pixman/pixman-arm-neon-asm-bilinear.h |  165 +++
 pixman/pixman-arm-neon-asm.S          |  280 +++--
 pixman/pixman-arm-neon-asm.h          |   20 +++
 pixman/pixman-arm-neon.c              |   21 +++
 6 files changed, 272 insertions(+), 370 deletions(-)
 create mode 100644 pixman/pixman-arm-neon-asm-bilinear.h

--
1.7.5.4
[Pixman] [PATCH 5/9 repost] armv7: Use prefetch for small-width images too
After discovering that the ARMv6 optimised fast paths often out-performed
the ARMv7 ones on a Cortex-A7, particularly on the RT benchmark, I found
that the problem was due to the fact that the ARMv7 macros didn't attempt
any sort of prefetch for small images (fewer than pixblock_size * 2 pixels
across). Since a pixblock is chosen to be no larger than a cacheline, and
is in many cases smaller, it seemed a reasonable compromise to avoid adding
a lot of complexity by simply doing one prefetch for the start of a pixel
row when starting to process the preceding one, and that is what this patch
does.

I compared the effect of using LDRB (which is what is currently used at the
end of each long pixel row) against PLD for each of the source and
destination buffers for a selection of common operations: src_8888_8888,
over_8888_8888 and add_8888_8888, and in each case PLD of both buffers was
the most beneficial. PLDW didn't make any measurable difference.

The overall effect of this patch on the three operations is as follows (L1,
L2 and M tests can be ignored because they're known not to involve the use
of short rows):

src_8888_8888
      Before          After
      Mean   StdDev   Mean   StdDev   Confidence   Change
HT    60.8   0.1      61.1   0.1      100.0%       +0.6%
VT    61.0   0.3      62.6   0.2      100.0%       +2.6%
R     45.5   0.2      46.2   0.2      100.0%       +1.5%
RT    19.8   0.0      21.4   0.0      100.0%       +7.8%

over_8888_8888
      Before          After
      Mean   StdDev   Mean   StdDev   Confidence   Change
HT    40.2   0.1      40.7   0.4      100.0%       +1.0%
VT    35.5   0.2      37.9   0.3      100.0%       +6.7%
R     32.8   0.0      33.8   0.3      100.0%       +3.0%
RT    12.9   0.0      15.6   0.2      100.0%       +21.4%

add_8888_8888
      Before          After
      Mean   StdDev   Mean   StdDev   Confidence   Change
HT    51.0   0.6      51.9   0.5      100.0%       +1.7%
VT    44.0   0.4      46.8   0.5      100.0%       +6.3%
R     39.6   0.5      41.0   0.4      100.0%       +3.5%
RT    15.2   0.2      18.0   0.2      100.0%       +18.5%

Signed-off-by: Ben Avison <bavi...@riscosopen.org>
---
 pixman/pixman-arm-neon-asm.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.h b/pixman/pixman-arm-neon-asm.h
index 03257cc..a116e47 100644
--- a/pixman/pixman-arm-neon-asm.h
+++ b/pixman/pixman-arm-neon-asm.h
@@ -881,6 +881,15 @@ local skip1
  * nor prefetch are used.
  */
 8:
+.if src_bpp_shift >= 0
+    PF pld,     [SRC, SRC_STRIDE, lsl #src_bpp_shift]
+.endif
+.if dst_r_bpp != 0
+    PF pld,     [DST_R, DST_STRIDE, lsl #dst_bpp_shift]
+.endif
+.if mask_bpp_shift >= 0
+    PF pld,     [MASK, MASK_STRIDE, lsl #mask_bpp_shift]
+.endif
 /* Process exactly pixblock_size pixels if needed */
     tst         W, #pixblock_size
     beq         1f
--
1.7.5.4
[Pixman] [PATCH 1/9 repost] armv7: Coalesce scalar accesses where possible
Where the alignment of a block of elements is known to equal the size of
the block, but the block is smaller than 8 bytes, it is safe to use a
larger element size in a scalar VLD or VST without risking an alignment
exception. Typically the effect of this can be seen when accessing leading
or trailing halfwords or words in the destination buffer for long
scanlines. Sadly, the effect of this is too small to be measured, but it
seems like a good idea anyway.

Signed-off-by: Ben Avison <bavi...@riscosopen.org>
---
 pixman/pixman-arm-neon-asm.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.h b/pixman/pixman-arm-neon-asm.h
index bdcf6a9..76b3985 100644
--- a/pixman/pixman-arm-neon-asm.h
+++ b/pixman/pixman-arm-neon-asm.h
@@ -183,6 +183,10 @@
     pixldst30   vst3, 8, %(basereg+0), %(basereg+1), %(basereg+2), 3, mem_operand
 .elseif (bpp == 24) && (numpix == 1)
     pixldst30   vst3, 8, %(basereg+0), %(basereg+1), %(basereg+2), 1, mem_operand
+.elseif numpix * bpp == 32 && abits == 32
+    pixldst     4, vst1, 32, basereg, mem_operand, abits
+.elseif numpix * bpp == 16 && abits == 16
+    pixldst     2, vst1, 16, basereg, mem_operand, abits
 .else
     pixldst     %(numpix * bpp / 8), vst1, %(bpp), basereg, mem_operand, abits
 .endif
--
1.7.5.4
[Pixman] [PATCH 2/9 repost] armv7: Faster fill operations
This eliminates a number of branches over blocks of code that are either
empty or can be trivially combined with a separate code block at the start
and end of each scanline. This has a surprisingly big effect, at least on
Cortex-A7, for src_n_8:

      Before            After
      Mean     StdDev   Mean     StdDev   Confidence   Change
L1    1570.4   133.1    1639.6   110.7    100.0%       +4.4%
L2    1042.6   19.9     1086.6   23.4     100.0%       +4.2%
M     1030.8   7.2      1036.8   3.2      100.0%       +0.6%
HT    287.4    3.5      303.3    2.9      100.0%       +5.5%
VT    262.0    2.6      263.3    2.6      99.9%        +0.5%
R     206.5    2.4      209.9    2.4      100.0%       +1.7%
RT    56.5     1.0      59.2     0.5      100.0%       +4.7%

Signed-off-by: Ben Avison <bavi...@riscosopen.org>
---
 pixman/pixman-arm-neon-asm.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.h b/pixman/pixman-arm-neon-asm.h
index 76b3985..03257cc 100644
--- a/pixman/pixman-arm-neon-asm.h
+++ b/pixman/pixman-arm-neon-asm.h
@@ -468,6 +468,7 @@
     tst         DST_R, #0xF
     beq         2f

+.if src_bpp > 0 || mask_bpp > 0 || dst_r_bpp > 0
 .irp lowbit, 1, 2, 4, 8, 16
 local skip1
 .if (dst_w_bpp <= (lowbit * 8)) && ((lowbit * 8) < (pixblock_size * dst_w_bpp))
@@ -487,6 +488,7 @@ local skip1
 1:
 .endif
 .endr
+.endif
     pixdeinterleave src_bpp, src_basereg
     pixdeinterleave mask_bpp, mask_basereg
     pixdeinterleave dst_r_bpp, dst_r_basereg
@@ -503,6 +505,9 @@ local skip1
     tst         DST_W, #lowbit
     beq         1f
 .endif
+.if src_bpp == 0 && mask_bpp == 0 && dst_r_bpp == 0
+    sub         W, W, #(lowbit * 8 / dst_w_bpp)
+.endif
     pixst_a     (lowbit * 8 / dst_w_bpp), dst_w_bpp, dst_w_basereg, DST_W
 1:
 .endif
@@ -533,6 +538,7 @@ local skip1
     process_pixblock_tail_head
     tst         W, #(pixblock_size - 1)
     beq         2f
+.if src_bpp > 0 || mask_bpp > 0 || dst_r_bpp > 0
 .irp chunk_size, 16, 8, 4, 2, 1
 .if pixblock_size > chunk_size
     tst         W, #chunk_size
@@ -550,6 +556,7 @@ local skip1
 1:
 .endif
 .endr
+.endif
     pixdeinterleave src_bpp, src_basereg
     pixdeinterleave mask_bpp, mask_basereg
     pixdeinterleave dst_r_bpp, dst_r_basereg
--
1.7.5.4
[Pixman] [PATCH 7/9] armv7: Move common bilinear macro definitions to a new header file
This reduces code duplication.

Signed-off-by: Ben Avison <bavi...@riscosopen.org>
---
 pixman/Makefile.am                    |    3 +-
 pixman/pixman-arm-neon-asm-bilinear.S |  147 +-
 pixman/pixman-arm-neon-asm-bilinear.h |  165 +++
 pixman/pixman-arm-neon-asm.S          |  135 +--
 4 files changed, 169 insertions(+), 281 deletions(-)
 create mode 100644 pixman/pixman-arm-neon-asm-bilinear.h

diff --git a/pixman/Makefile.am b/pixman/Makefile.am
index 581b6f6..b0cffaf 100644
--- a/pixman/Makefile.am
+++ b/pixman/Makefile.am
@@ -88,7 +88,8 @@ libpixman_arm_neon_la_SOURCES = \
         pixman-arm-neon-asm.S \
         pixman-arm-neon-asm-bilinear.S \
         pixman-arm-asm.h \
-        pixman-arm-neon-asm.h
+        pixman-arm-neon-asm.h \
+        pixman-arm-neon-asm-bilinear.h
 libpixman_1_la_LIBADD += libpixman-arm-neon.la
 ASM_CFLAGS_arm_neon=

diff --git a/pixman/pixman-arm-neon-asm-bilinear.S b/pixman/pixman-arm-neon-asm-bilinear.S
index aba8d00..1194d2d 100644
--- a/pixman/pixman-arm-neon-asm-bilinear.S
+++ b/pixman/pixman-arm-neon-asm-bilinear.S
@@ -67,152 +67,7 @@
 #include "pixman-private.h"
 #include "pixman-arm-asm.h"
 #include "pixman-arm-neon-asm.h"
-
-/*
- * Bilinear macros from pixman-arm-neon-asm.S
- */
-
-/*
- * Bilinear scaling support code which tries to provide pixel fetching, color
- * format conversion, and interpolation as separate macros which can be used
- * as the basic building blocks for constructing bilinear scanline functions.
- */
-
-.macro bilinear_load_8888 reg1, reg2, tmp
-    mov       TMP1, X, asr #16
-    add       X, X, UX
-    add       TMP1, TOP, TMP1, asl #2
-    vld1.32   {reg1}, [TMP1], STRIDE
-    vld1.32   {reg2}, [TMP1]
-.endm
-
-.macro bilinear_load_0565 reg1, reg2, tmp
-    mov       TMP1, X, asr #16
-    add       X, X, UX
-    add       TMP1, TOP, TMP1, asl #1
-    vld1.32   {reg2[0]}, [TMP1], STRIDE
-    vld1.32   {reg2[1]}, [TMP1]
-    convert_four_0565_to_x888_packed reg2, reg1, reg2, tmp
-.endm
-
-.macro bilinear_load_and_vertical_interpolate_two_8888 \
-                    acc1, acc2, reg1, reg2, reg3, reg4, tmp1, tmp2
-
-    bilinear_load_8888 reg1, reg2, tmp1
-    vmull.u8  acc1, reg1, d28
-    vmlal.u8  acc1, reg2, d29
-    bilinear_load_8888 reg3, reg4, tmp2
-    vmull.u8  acc2, reg3, d28
-    vmlal.u8  acc2, reg4, d29
-.endm
-
-.macro bilinear_load_and_vertical_interpolate_four_8888 \
-                xacc1, xacc2, xreg1, xreg2, xreg3, xreg4, xacc2lo, xacc2hi \
-                yacc1, yacc2, yreg1, yreg2, yreg3, yreg4, yacc2lo, yacc2hi
-
-    bilinear_load_and_vertical_interpolate_two_8888 \
-                xacc1, xacc2, xreg1, xreg2, xreg3, xreg4, xacc2lo, xacc2hi
-    bilinear_load_and_vertical_interpolate_two_8888 \
-                yacc1, yacc2, yreg1, yreg2, yreg3, yreg4, yacc2lo, yacc2hi
-.endm
-
-.macro bilinear_load_and_vertical_interpolate_two_0565 \
-                acc1, acc2, reg1, reg2, reg3, reg4, acc2lo, acc2hi
-
-    mov       TMP1, X, asr #16
-    add       X, X, UX
-    add       TMP1, TOP, TMP1, asl #1
-    mov       TMP2, X, asr #16
-    add       X, X, UX
-    add       TMP2, TOP, TMP2, asl #1
-    vld1.32   {acc2lo[0]}, [TMP1], STRIDE
-    vld1.32   {acc2hi[0]}, [TMP2], STRIDE
-    vld1.32   {acc2lo[1]}, [TMP1]
-    vld1.32   {acc2hi[1]}, [TMP2]
-    convert_0565_to_x888 acc2, reg3, reg2, reg1
-    vzip.u8   reg1, reg3
-    vzip.u8   reg2, reg4
-    vzip.u8   reg3, reg4
-    vzip.u8   reg1, reg2
-    vmull.u8  acc1, reg1, d28
-    vmlal.u8  acc1, reg2, d29
-    vmull.u8  acc2, reg3, d28
-    vmlal.u8  acc2, reg4, d29
-.endm
-
-.macro bilinear_load_and_vertical_interpolate_four_0565 \
-                xacc1, xacc2, xreg1, xreg2, xreg3, xreg4, xacc2lo, xacc2hi \
-                yacc1, yacc2, yreg1, yreg2, yreg3, yreg4, yacc2lo, yacc2hi
-
-    mov       TMP1, X, asr #16
-    add       X, X, UX
-    add       TMP1, TOP, TMP1, asl #1
-    mov       TMP2, X, asr #16
-    add       X, X, UX
-    add       TMP2, TOP, TMP2, asl #1
-    vld1.32   {xacc2lo[0]}, [TMP1], STRIDE
-    vld1.32   {xacc2hi[0]}, [TMP2], STRIDE
-    vld1.32   {xacc2lo[1]}, [TMP1]
-    vld1.32   {xacc2hi[1]}, [TMP2]
-    convert_0565_to_x888 xacc2, xreg3, xreg2, xreg1
-    mov       TMP1, X, asr #16
-    add       X, X, UX
-    add       TMP1, TOP, TMP1, asl #1
-    mov       TMP2, X, asr #16
-    add       X, X, UX
-    add       TMP2, TOP, TMP2, asl #1
-    vld1.32   {yacc2lo[0]}, [TMP1], STRIDE
-    vzip.u8   xreg1, xreg3
-    vld1.32   {yacc2hi[0]}, [TMP2], STRIDE
-    vzip.u8   xreg2, xreg4
-    vld1.32   {yacc2lo[1]}, [TMP1]
-    vzip.u8   xreg3, xreg4
-    vld1.32   {yacc2hi[1]}, [TMP2]
-    vzip.u8   xreg1, xreg2
-    convert_0565_to_x888 yacc2, yreg3, yreg2, yreg1
-    vmull.u8  xacc1, xreg1, d28
-    vzip.u8   yreg1, yreg3
-    vmlal.u8  xacc1, xreg2, d29
-    vzip.u8   yreg2, yreg4
-    vmull.u8  xacc2, xr
Re: [Pixman] [PATCH] Add support for aarch64 neon optimization
On Sat, 02 Apr 2016 13:30:58 +0100, Mizuki Asakura wrote:

> This patch only contains STD_FAST_PATH codes, not scaling (nearest,
> bilinear) codes.

Hi Mizuki,

It looks like you have used an automated process to convert the AArch32
NEON code to AArch64. Will you be able to repeat that process for other
code, or at least assist others to repeat your steps?

The reason I ask is that I have a large number of outstanding patches to
the ARM NEON support. The process of getting them merged into the
FreeDesktop git repository has been very slow because there aren't many
people on this list with the time and ability to review them, however my
versions are in many cases up to twice the speed of the FreeDesktop
versions, and it would be a shame if AArch64 couldn't benefit from them. If
your AArch64 conversion is a one-time thing, it will make it extremely
difficult to merge my changes in.

> After completing optimization this patch, scaling related codes should
> be done.

One of my aims was to implement missing "iter" routines so as to accelerate
scaled plots for a much wider combination of pixel formats and Porter-Duff
combiner rules than the existing limited selection of fast paths could
cover. If you look towards the end of my patch series here:

https://github.com/bavison/pixman/commits/arm-neon-release1

you'll see that I discovered that I was actually outperforming Pixman's
existing bilinear plotters so consistently that I'm advocating removing
them entirely, with the additional advantage that it simplifies the code
base a lot. So you might want to consider whether it's worth bothering
converting those to AArch64 in the first place. I would maybe go so far as
to suggest that you try converting all the iters first and only add fast
paths if you find they do better than the iters.

One of the drawbacks of using iters is that the prefetch code can't be as
sophisticated - it can't easily be prefetching the start of the next row
while it is still working on the end of the current one. But since hardware
prefetchers are better now and conditional execution is hard in AArch64,
this will be less of a drawback with AArch64 CPUs.

I'll also repeat what has been said, that it's very neat the way the
existing prefetch code sneaks calculations into pipeline stalls, but it was
only ever really ideal for Cortex-A8. With Cortex-A7 (despite the number,
actually a much more recent 32-bit core) I noted that it was impossible to
schedule such complex prefetch code without adding to the cycle count, at
least when the images were already in the cache.

Ben
Re: [Pixman] Cleaning patchwork
On Tue, 22 Dec 2015 13:14:21 -0000, Oded Gabbay wrote:

> Hi Ben,
> I would like to clean pixman's patchwork and you have about 20
> outstanding patches.
[...]
> [4/4] pixman-fast-path: Make bilinear cover fetcher use COVER_CLIP_TIGHT flag
> [3/4] armv7/mips/sse2: Make bilinear cover fast paths use COVER_CLIP_TIGHT flag
> [2/4] Introduce FAST_PATH_SAMPLES_COVER_CLIP_TIGHT_BILINEAR flag

I still think these are a worthwhile improvement and have described some
conditions under which it should be measurable using affine-bench. I
believe Pekka wanted to prove it using real-world traces though, and since
there was no measurable change for better or worse using the normal Cairo
traces, he was attempting to capture some new ones that would exercise the
cases I described. Last I heard though, he had found that Cairo's tracer
was broken, so he was unable to make progress. Under the circumstances, do
you think affine-bench results alone would be acceptable?

> Resolve implementation-defined behaviour for division rounded to -infinity

This one got bogged down in an argument about whether C89 should still be
supported or not, which I'm not sure was ever resolved?

> [05/37,v2] composite flags: Early detection of fully opaque 1x1 repeating source images
> [10/37,v2] pixman-fast-path: Add in_8888_8 fast path
> [11/37,v2] armv6: Add in_8888_8 fast path
> [23/37,v2] armv6: Add optimised scanline fetchers and writeback for r5g6b5 and a8
> [24/37,v2] armv6: Add optimised scanline fetcher for a1r5g5b5
> [25/37,v2] armv6: Add src_1555_8888 fast path

v1 of these patches were reviewed (excluding the ARM assembly parts) by
Søren on 05 Oct 2014, and v2 addressed the issue he raised. There haven't
been any comments on the reposted versions.

> armv7: Re-use existing fast paths in more cases
> armv7: Re-use existing fast paths in more cases

These apply the same rules that Søren suggested for my ARMv6 paths in the
v2 patches above to the pre-existing ARMv7 paths as well. The only review
comment received so far was that the two patches needed different names.

> [2/5,v2] armv7: Add in_8888_8 fast path

The only difference for v2 here was again just a matter of defining the
widest possible set of pixel formats to match the fast path.

> [5/5] armv7: Add src_1555_8888 fast path
> [4/5] armv7: Add in_reverse_8888_8888 fast path
> [3/5] armv7: Add in_n_8888 fast path
> [1/5] armv7: Use prefetch for small-width images too
> [3/3] armv7: Use VLD-to-all-lanes
> [2/3] armv7: Faster fill operations
> [1/3] armv7: Coalesce scalar accesses where possible
> Require destination images to be bitmaps

None of these have received any comments to date. In many cases, I suspect
it's the fact that they involve ARM assembly, and we don't have many
(any?) active reviewers who know it. I'm not sure what we can do about
that.

> I also suggest that for the relevant ones (if there are any), you would
> rebase them on the latest master and re-send them as a single series in
> order to restart the review process.

I can certainly do that, but I had previously been asked to split my
postings into smaller series that were independent. I have a lot more
patches waiting to post that depend on the ones that are stuck in limbo -
the complete set can be seen at

https://github.com/bavison/pixman/commits/arm-neon-release1

if you're interested. Or I could just post/repost the whole lot (72 patches
now), like I did with my 37-patch series from 2014.

Ben
Re: [Pixman] [PATCH v3 1/2] Remove the 8e extra safety margin in COVER_CLIP analysis
On Wed, 23 Sep 2015 09:40:32 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:

> Ben, Siarhei, for the record, gentlemen, please give your reviewed-by's
> so I can just push these two out without any confusion. :-)

Sure...

Reviewed-by: Ben Avison <bavi...@riscosopen.org>

Ben
[Pixman] [PATCH v2] pixman-general: Tighten up calculation of temporary buffer sizes
Each of the aligns can only add a maximum of 15 bytes to the space
requirement. This permits some edge cases to use the stack buffer where
previously it would have deduced that a heap buffer was required.
---
This is an update of my previous patch (now posted over a year ago):
https://patchwork.freedesktop.org/patch/49898/
which conflicts with the patch just pushed:
http://patchwork.freedesktop.org/patch/60052/

Since this piece of code is fresh in people's minds, this might be a good
time to get this one pushed as well. Note that the scope of this change has
been reduced: while the old code added space to align the end address to a
cacheline boundary (which as I pointed out, was unnecessary), the new
version works using buffer lengths only.

As a discussion point, wouldn't it be better for the ALIGN macro to assume
32-byte cache lines? Both 16-byte and 32-byte cachelines are currently in
common use, but by aligning the buffers to 32-byte addresses we would
simultaneously achieve 16-byte alignment.

 pixman/pixman-general.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
index fa88463..6141cb0 100644
--- a/pixman/pixman-general.c
+++ b/pixman/pixman-general.c
@@ -158,9 +158,9 @@ general_composite_rect (pixman_implementation_t *imp,
     if (width <= 0 || _pixman_multiply_overflows_int (width, Bpp * 3))
         return;

-    if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 32 * 3)
+    if (width * Bpp * 3 > sizeof (stack_scanline_buffer) - 15 * 3)
     {
-        scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 32 * 3);
+        scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 15 * 3);

         if (!scanline_buffer)
             return;
--
1.7.5.4
Re: [Pixman] [PATCH 1/2] Remove the 8e extra safety margin in COVER_CLIP analysis
On Mon, 21 Sep 2015 06:32:48 +0100, Siarhei Siamashkawrote: Since we are trying to justify the 8e extra safety margin removal in the context of this patch, this is what I wanted to see explained in the commit message. But maybe I'm just bad at math and it was perfectly obvious to everyone else without any special proof. I think it's just that if you've come on board since the old rounding code was removed (as I have) it's hard to see why this would ever have been a problem. If you express your P vector as a linear combination of vectors | frac(x_dst) | | int(x_dst) | | frac(y_dst) | + | int(y_dst) | | 0x1 | | 0 | then it's clear that the first component is an invariant (0x8000, 0x8000, 0x1) irrespective of whether you reach P in a stepwise manner or not, and that the other one uses only integers. Any integer multiplied by a fixed-point number is a fixed-point number (of the same number of fractional digits) without any rounding errors, so in the absence of any intermediate rounding steps, the rounding error of the expression as a whole is the same as the rounding error of the first component, and that hasn't changed. Proof: All implementations must give the same numerical results as bits_image_fetch_pixel_nearest() / bits_image_fetch_pixel_bilinear(). The former does int x0 = pixman_fixed_to_int (x - pixman_fixed_e); which maps directly to the new test for the nearest flag, when you consider that x0 must fall in the interval [0,width). The latter does x1 = x - pixman_fixed_1 / 2; x1 = pixman_fixed_to_int (x1); x2 = x1 + 1; but then x2 is brought back in range by the repeat() function, so it can't stray beyond the end of the source line. The wording does not look very good here. It seems to imply that the repeat() function has some special use in the latter (BILINEAR) case. But the repeat handling is exactly the same for NEAREST and BILINEAR. 
> Also for NONE repeat and fetching pixels from the outside of the source image bounds, we are not bringing the coordinate back into range but interpreting the pixel value as zero.

I can't follow your argument there - I don't think I implied that repeat() acted differently for the bilinear case? On NONE repeat, yes, I neglected that detail, but I was generalising. How about:

    The latter does

        x1 = x - pixman_fixed_1 / 2;
        x1 = pixman_fixed_to_int (x1);
        x2 = x1 + 1;

    but any values of x2 that correspond to a pixel offset beyond the end
    of the source line are never used to dereference the pixel array. In
    the case of NONE repeat, a pixel value of zero is substituted, and
    otherwise the action of the repeat() function, when applied to x2, is
    to select a different pixel offset which *does* lie within the source
    line (the exact choice depends upon the repeat type selected).

Ben
Re: [Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Fri, 18 Sep 2015 19:35:02 +0100, Bill Spitzak <spit...@gmail.com> wrote:
> On Wed, Sep 16, 2015 at 7:30 AM, Ben Avison <bavi...@riscosopen.org> wrote:
>> Just by thinking things through, I realised that we would regularly fail to hit COVER paths if Pixman's caller set the scale factors such that the centre of the outermost destination pixels aligned with the centre of the outermost source pixels. There has been some argument about whether this is representative of how Pixman should be used. I happen to think this is a perfectly reasonable thing to expect Pixman to support, but there are other models you can follow, notably setting the scale factors such that the outer edges of the outermost destination pixels align to the outer edges of the outermost source pixels. If the Cairo traces are using this latter model, then it's understandable if you aren't hitting the edge case that I'm concerned about very often.
>
> There are currently very good reasons for clients to use the first model: it is a method of removing undesirable "fuzzy edges" from zoomed-in images. The alternative of scaling the outer edge, while not hitting this fast path, will not hit the other fast path either! (the other fast path is the one that allows a zero-weighted pixel at one end to be read from the image).

I'm generally in agreement with you. For the sake of completeness though, I should point out that when zooming *out*, both models will match the conditions for COVER paths. But then again, when you're zooming out, particularly by a large factor, it is more appropriate to use a multi-tap filter rather than bilinear scaling, so maybe that's not much of an argument.

> NOTE: the method of scaling the centers of the pixels is generally wrong. The math is wonky: does scaling an image by 2 produce a 2*w-1 image, or does it spread the pixels slightly more than 2 apart? Programs are going to disagree.
I'd suggest that if the highest layer software specifies a particular scale factor, then that's what should be used, but if it requests a particular destination size (e.g. "full screen") then that should be the primary consideration even if it means the corresponding scale factor is not the "obvious" one.

But this is all really a concern for the layers above Pixman, because the starting fractional position and per-pixel increment is part of the Pixman API. Pixman just has to do whatever has been asked of it. Just because some of the ways the layers above Pixman do their calculations are impossible for Pixman to optimise further shouldn't be a reason for Pixman not to optimise the cases that it can do, especially when that set of cases includes many that use a scale factor of 1 on one axis.

Ben
Re: [Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Wed, 16 Sep 2015 12:25:59 +0100, Pekka Paalanen wrote:
> I tried both "image" and "image16" on an x86_64 (Sandybridge), and got no performance differences in the trimmed-cairo-traces set in either baseline/cleanup or cleanup/tight. I also tried with PIXMAN_DISABLE=ssse3 and still got no difference. I verified I am really running what I think I am by editing Pixman and seeing the effect in the benchmark. Am I missing something?

Well, there has been a long and tangled history behind this change at this point. I'll admit my twin motivations were code cleanup and making it possible to measure the speed of bilinear operations using a benchmarker, but it's also ended up spawning the creation of affine-bench and cover-test, which isn't a bad thing.

Just by thinking things through, I realised that we would regularly fail to hit COVER paths if Pixman's caller set the scale factors such that the centre of the outermost destination pixels aligned with the centre of the outermost source pixels. There has been some argument about whether this is representative of how Pixman should be used. I happen to think this is a perfectly reasonable thing to expect Pixman to support, but there are other models you can follow, notably setting the scale factors such that the outer edges of the outermost destination pixels align to the outer edges of the outermost source pixels. If the Cairo traces are using this latter model, then it's understandable if you aren't hitting the edge case that I'm concerned about very often.

The useful thing about the scaled-in-one-axis-only example is that on the axis that isn't scaled, there is no dispute about how you set the scale factors, so it sidesteps this argument.

> Should I run the same on rpi2? Or is the best effect on the fast paths we haven't merged yet?
To get a sizeable speed difference, you need two things:

1) a plot geometry that aligns the centres of the high-coordinate pixels of the source and destination

2) a reasonable speed difference between COVER and repeat fast paths or iterators

At present, the speed difference will be most marked when you are using the bilinear iterators from pixman-fast-path.c or pixman-ssse3.c, which are both currently restricted to COVER plots of a8r8g8b8 source images only. If you test on a Raspberry Pi 2, then you also need to allow for the fact that the ARMv7 implementation has bilinear fast paths for src__, src__0565, over__ and add__, for most types of image repeat as well as for COVER, so these reduce the applicability of the iterator still further.

In my patch series from last year, I added ARMv6 bilinear iterators for COVER plots of a8r8g8b8, x8r8g8b8, r5g6b5 and a8 images. Of course, you won't be seeing the effect of these because they haven't been accepted yet - but when (I hope) they are, then having the new COVER_CLIP definition in place should help demonstrate their effectiveness. Basically, these are all just pieces of one big jigsaw puzzle.

> Or maybe our test set is not enough? I recall having some problems with that in the past.

Yes, I know I have a number of fast paths queued up which were not represented at all in the Cairo perf traces, but were being hammered by the Epiphany browser on the Raspberry Pi. I'm not sure how we justify their inclusion if the Cairo perf traces are the only criterion allowed. I hope there will be some flexibility in this respect.

If you want a more real-world example of when you might encounter bilinear scaling in one axis only, here's one that I think is plausible. Consider screen grabs taken of an NTSC SD video at ITU656 sample positions (720x480) - or equally the output of a codec for a video at that resolution.
Now try to display it with bilinear scaling at the correct aspect ratio on a display with square pixels: the destination rectangle will likely be chosen to be 640x480, with a vertical pixel increment of 1 and a horizontal pixel increment of 1.125. There's also a reasonable chance that you'll want to re-plot this 30 times per second, for obvious reasons. You'd hope that this could be achieved using a COVER fast path or iterator, but with the current flag definitions, they can't be used.

Ben
Re: [Pixman] [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Wed, 16 Sep 2015 03:09:50 +0100, Søren Sandmann wrote:
> If all existing fast paths and iterators can be changed/verified to not read outside their bounds, I have no objection to changing the meaning of the existing flag (or adding a new one if that's more convenient).

OK, it sounds like the way to go is to use two flags for at least a transitional period. Then, if and when all fast paths and iterators are changed, we can delete the old flag.

> I realized that there is another use for a new flag like this: It could likely also be used here:
> http://cgit.freedesktop.org/pixman/tree/pixman/pixman.c#n662
> so that images would be considered opaque in more cases, leading to more cases of reducing the operator from OVER to SRC.

I had come across that when searching the source code for references to the COVER_CLIP flags, but had mentally filed it away because I wasn't sure at which point it would be appropriate to switch it over to using the new bilinear flag. But having thought about it a bit more, I agree it could use the new definition straight away, even before any fast paths or iterators do.

Ben
Re: [Pixman] Explaining bilinear in rounding.txt
On Mon, 14 Sep 2015 12:53:15 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:
> On Sat, 12 Sep 2015 01:26:48 +0100 "Ben Avison" <bavi...@riscosopen.org> wrote:
>> -- BILINEAR filtering:
>>
>> The BILINEAR filter calculates the linear interpolation between (i.e. a weighted mean of) the two closest pixels to the given position - one found by rounding down, one by rounding up.
>>
>>     round_up(x)   = ceil(x - o) + o
>>     round_down(x) = floor(x - o) + o
>>
>> The weight factor applied to each of these is given by
>>
>>     1 - abs(round(x) - x)
>>
>> except in the case where the two rounding functions amount to the same pixel - which only occurs if the given position aligns precisely with one pixel. In that case, that one pixel value is used directly.
>
> I don't understand this. We have a definition for round(x) earlier and used with NEAREST, but I don't think that is what you meant here.

My interpretation was that the notation "round(x)" wasn't tied to a specific definition. It had already been used for two different purposes by that point in the document - for round to nearest with ties to plus and minus infinity. Here I was just using it as a catch-all for the two functions round_up(x) and round_down(x) whose unambiguous definitions I'd repeated in the preceding paragraph. Perhaps we should go back and rename the existing uses of round(x) to round_nearest_ties_up(x) and round_nearest_ties_down(x) since it seems this wasn't obvious?

> Are you saying the weights would be:
>
>     w1 = 1 - (round_up(x) - x)
>     w2 = 1 - (x - round_down(x))
>
> And then the weights do not sum to 1, when round_up(x) == x and round_down(x) == x, because it leads to w1 = w2 = 1?

Yes, that was what I was trying to say, but more concisely. How about the following:

[...]

To enforce w1 + w2 = 1, we can choose between two modifications to the above choices of x1 and x2:

    x1 = round_down(x)
    x2 = x1 + 1

and

    x1 = x2 - 1
    x2 = round_up(x)
While this second pair of equalities is mathematically true, they do need to be evaluated in the opposite order, which might be worth noting.

> *** Application to Pixman

I didn't quite follow the document structuring here. I took lines prefixed with "***" to be top-level headings and lines prefixed with "--" to be subheadings, but I think you've assumed the existing "Application to Pixman" section to be a subsection of "General notes on rounding" which stops before you get to "NEAREST filtering", and you're adding a new "Application to Pixman" subsection under "BILINEAR filtering"? Perhaps numbering the sections would make this clearer.

> As we have the choice of rounding x either up or down first, we consider the other pixel to be the one before or after the first pixel, respectively. This choice is directly connected to the definition of FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR in Pixman.
>
> Flag FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR signifies that bilinear sampling will not read beyond the defined image contents so that repeat modes can be ignored. As coordinates are transformed from destination space to source space and rounded only once, the logic determining the flag must assume whether the other pixel is before or after in each dimension, and whether it will be read even if its weight is 0. Pixman's choice is to assume that the other pixel is always after, and it can be read even if the weight is 0. The rounding mode is therefore round_down(x).
>
> 8< -
>
> The last paragraphs may seem out of place, but I don't see any other place atm. I'm fairly sure this stuff has not been documented anywhere before, and it would be really good to have somewhere. OTOH, rounding.txt does explain exactly which pixel gets loaded for NEAREST.
All the rest of the document is a functional specification - if you changed the rounding mode used for NEAREST filtering, even to round nearest-with-ties-up, then you'd be having a material effect on what Pixman does, the test harnesses would fail, and it would hopefully be considered a bug. This subsection is a design specification - the COVER_CLIP flags are a contract between one internal part of Pixman (the operation dispatcher) and another internal part of Pixman (the fast paths and iterators). It can be changed without affecting Pixman's correctness (although its speed may well be affected by any such changes).

I think I'd be tempted to put the description - either the last paragraph or the entire subsection - in a large source code comment instead - say, alongside the #define of FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR. Hiding it away in a text file increases the likelihood of it not being kept up-to-date.

Ben
Re: [Pixman] [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Fri, 11 Sep 2015 10:13:08 +0100, Pekka Paalanen wrote:
> If you actually want to document things, then I think pixman/rounding.txt would be the right place (and another patch). After all, commit messages are only used to justify the patch, they are not documentation people usually read.

Somehow, I don't remember ever having noticed that file! Perhaps it was because it was called rounding.txt that it never occurred to me that filtering might be documented there? It's an odd omission that it doesn't talk about BILINEAR filtering, though.

However, having briefly read through the text, even though some of it goes over my head a bit, I'd say it's describing things from a strictly mathematical point of view. Discussion of exactly which pixels get loaded from memory in order to reach this mathematical outcome feels outside the scope of that document to me.

Here's a draft section for BILINEAR filtering, comments welcome:

8< -

-- BILINEAR filtering:

The BILINEAR filter calculates the linear interpolation between (i.e. a weighted mean of) the two closest pixels to the given position - one found by rounding down, one by rounding up.

    round_up(x)   = ceil(x - o) + o
    round_down(x) = floor(x - o) + o

The weight factor applied to each of these is given by

    1 - abs(round(x) - x)

except in the case where the two rounding functions amount to the same pixel - which only occurs if the given position aligns precisely with one pixel. In that case, that one pixel value is used directly.

A common simplification, to avoid having to treat this case differently, is to define one (and only one) of the two round functions such that when the given position aligns with a pixel, abs(round(x) - x) = 1, and hence the corresponding weight factor is 0.
Either of the following pairs of definitions satisfies this requirement:

    round_down(x) = floor(x - o) + o
    round_up(x)   = round_down(x) + 1

    round_up(x)   = ceil(x - o) + o
    round_down(x) = round_up(x) - 1

8< -

Ben
Re: [Pixman] [PATCH 1/2] Remove the 8e extra safety margin in COVER_CLIP analysis
On Fri, 11 Sep 2015 12:30:23 +0100, Pekka Paalanen wrote:
> As samplers are allowed to fetch the pixel at x2 unconditionally, we require
>
>     x1 >= 0
>     x2 < width

I may be getting picky, but that's circular logic - the samplers are only allowed to fetch the pixel at x2 unconditionally because of the way BILINEAR_COVER_CLIP is defined in this piece of code, so you can't use it as its own justification. As I wrote the commit log originally, the premise is that some samplers expect to be able to fetch the pixel at x2 unconditionally (on at least one axis) and the conclusion is that we need to define BILINEAR_COVER_CLIP to allow that (for now). How about: "Because samplers may fetch the pixel at x2 unconditionally, we require..."

Other than that, the series looks good - good thinking about the impact of projective transforms (though maybe someday that could do with a more rigorous examination). Also good to see Bill's way of phrasing the ultimate aim - that it'll be defined to be safe to fetch all pixels that have a non-zero weight - made it into patch 2. I think it's a clear and concise description.

I had to look up what bikeshedding meant since you've used it a couple of times - very apt. I'll have to drop it into conversation sometime :)

Ben
Re: [Pixman] [PATCH 3/3] test: Add cover-test v4
On Wed, 09 Sep 2015 09:37:41 +0100, Pekka Paalanen wrote:
> I think we need some indication whether cover-test runs with or without fencing. So far I have thought that if fence-image-self-test is skipped, then cover-test can only run without fencing. If fence-image-self-test is not skipped, then cover-test uses fencing if it is not skipped. It's perhaps a bit too subtle.

Too subtle for me :)

> Maybe cover-test should have a single printf telling if it is fenced or not? That would show up on old autotools, but on new ones you have to go look in the logs anyway. Maybe it would be most obvious if cover-test either always used fencing or skipped. We'd lose the CRC check on too-large-page systems, but at least if we see it as a PASS, we can be sure it used fencing. How's that?

Since one test is compile-time (availability of fencing) and the other is runtime (reading the page size), I admit it's easier to settle on skipping the test in both cases - the alternative would need to be a layer of runtime abstraction between fenced and non-fenced images. Perhaps a compromise is to:

a) skip the test if the page size is too large (i.e. treat this as an error condition, until someone is motivated to either abstract the fence image code so it can be disabled at runtime, or to rework it to support larger page sizes)

b) printf a warning iff fencing isn't available (a bit like the way the PIXMAN_DISABLE parser doesn't feel the need to list the implementations that aren't being skipped)

?

Ben
Re: [Pixman] [PATCH v3] test: Add cover-test
On Mon, 07 Sep 2015 10:19:04 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:
> On Fri, 04 Sep 2015 14:29:22 +0100 "Ben Avison" <bavi...@riscosopen.org> wrote:
>> Obviously that would require work to the fence image code as well as to cover-test, so I thought I'd ask for opinions - is it worth the added complexity, or should we just bail out if the page size is > 32K as Pekka suggests?
>
> My opinion is that let's get this test landed now with skip on too large page size, and enhance it later if wanted. Ben, shall I add the skips and bikeshed with style and send the cover-test patch to the list once more?

That's OK with me if nobody else is going to object.

Ben
Re: [Pixman] [PATCH v3] test: Add cover-test
On Fri, 04 Sep 2015 11:18:03 +0100, Pekka Paalanen wrote:
> I think we may have a problem with sizes/coordinates. With 64 kB page size, the minimum fenced image width for r5g6b5 image is 32768 pixels. GDB tells me src_img->bits.width << 16 is negative. Similarly the minimum width of an a8 image becomes 64k pixels. These coordinates do not fit in pixman_fixed_t.

This is a good point. However, I have thought of a workaround. Only the image width is constrained to be < 65536 (or < 32768 if we want to avoid signed issues, since pixman_fixed_t is defined to be a signed quantity). The *stride* isn't.

In other words, if we accept that any one fenced image can only be guaranteed to detect accesses off the right of an image, or the left of an image, but not both, then we can construct a page layout of

* 1 protected page
* enough unprotected pages to hold at least the specified minimum number of pixels
* 1 protected page
* more unprotected pages
* 1 protected page

and so on. In the case where the page size is much larger than a row, this amounts to alternating protected and unprotected pages. Then we set the stride to the distance between protected pages, and set image->bits.bits such that either the left or right edge of each row lines up with the page boundary.

For example, if we wanted a fenced 8bpp image of width 16384 pixels on a system where the page size is 64K, then a left-fenced image would be laid out like this:

[flattened ASCII-art diagram: each row of pixels starts immediately after a protected page, with page boundaries marked at 0, 64K, 128K, 192K and 256K]

and a right-fenced one would look like:

[flattened ASCII-art diagram: each row of pixels ends immediately before the next protected page, with the same page boundaries marked]

Obviously that would require work to the fence image code as well as to cover-test, so I thought I'd ask for opinions - is it worth the added complexity, or should we just bail out if the page size is > 32K as Pekka suggests?

Ben
[Pixman] [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
As discussed in http://lists.freedesktop.org/archives/pixman/2015-August/003905.html the 8 * pixman_fixed_e adjustment which was applied to the transformed coordinates is a legacy of rounding errors which used to occur in old versions of Pixman, but which no longer apply. For any affine transform, you are now guaranteed to get the same result whether you transform the upper coordinate directly, or transform the lower coordinate and add (size-1) steps of the increment in source coordinate space. No projective transform routines use the COVER_CLIP flags, so they cannot be affected.

However, removing the 8 * pixman_fixed_e border exposes a bug in the calculation of the COVER_CLIP_NEAREST flag. Because the half-way cases round down, an adjustment by pixman_fixed_e is needed. This patch applies this adjustment.

Many bilinear fast paths currently assume that COVER_CLIP_BILINEAR is not set when the transformed upper coordinate corresponds exactly to the last row/pixel in source space. This is due to a detail of many current implementations - they assume they can always load the pixel after the one you get by dividing the coordinate by 2^16 with rounding to -infinity. To avoid causing these implementations to exceed array bounds in these cases, the COVER_CLIP_BILINEAR flag retains this feature in this patch. Subsequent patches deal with removing this assumption, to enable cover fast paths to be used in the maximum number of cases.

Proof: All implementations must give the same numerical results as bits_image_fetch_pixel_nearest() / bits_image_fetch_pixel_bilinear(). The former does

    int x0 = pixman_fixed_to_int (x - pixman_fixed_e);

which maps directly to the new test for the nearest flag, when you consider that x0 must fall in the interval [0,width). The latter does

    x1 = x - pixman_fixed_1 / 2;
    x1 = pixman_fixed_to_int (x1);
    x2 = x1 + 1;

but then x2 is brought back in range by the repeat() function, so it can't stray beyond the end of the source line.

When you write a cover path, you are effectively omitting the repeat() function. The weight applied to the pixel at offset x2 will be zero, so if you skip the load and leave the pixel value undefined, the numerical result is unaffected, but you have avoided a memory access that could potentially cause a page fault. If, however, we assume that the implementation loads from offset x2 unconditionally, then we need

    x1 >= 0
    x2 < width

so

    x - pixman_fixed_1 / 2 >= 0
    x - pixman_fixed_1 / 2 + pixman_fixed_1 < width * pixman_fixed_1

so

    pixman_fixed_to_int (x - pixman_fixed_1 / 2) >= 0
    pixman_fixed_to_int (x + pixman_fixed_1 / 2) < width

which matches the source code lines for the bilinear case, once you delete the lines that apply the 8 * pixman_fixed_e border.
---
 pixman/pixman.c     | 17 -
 test/affine-bench.c | 16 ++--
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/pixman/pixman.c b/pixman/pixman.c
index a07c577..f932eac 100644
--- a/pixman/pixman.c
+++ b/pixman/pixman.c
@@ -497,21 +497,12 @@ analyze_extent (pixman_image_t *image,
     if (!compute_transformed_extents (transform, extents, ))
         return FALSE;
 
-    /* Expand the source area by a tiny bit so account of different rounding that
-     * may happen during sampling. Note that (8 * pixman_fixed_e) is very far from
-     * 0.5 so this won't cause the area computed to be overly pessimistic.
-     */
-    transformed.x1 -= 8 * pixman_fixed_e;
-    transformed.y1 -= 8 * pixman_fixed_e;
-    transformed.x2 += 8 * pixman_fixed_e;
-    transformed.y2 += 8 * pixman_fixed_e;
-
     if (image->common.type == BITS)
     {
-        if (pixman_fixed_to_int (transformed.x1) >= 0 &&
-            pixman_fixed_to_int (transformed.y1) >= 0 &&
-            pixman_fixed_to_int (transformed.x2) < image->bits.width &&
-            pixman_fixed_to_int (transformed.y2) < image->bits.height)
+        if (pixman_fixed_to_int (transformed.x1 - pixman_fixed_e) >= 0 &&
+            pixman_fixed_to_int (transformed.y1 - pixman_fixed_e) >= 0 &&
+            pixman_fixed_to_int (transformed.x2 - pixman_fixed_e) < image->bits.width &&
+            pixman_fixed_to_int (transformed.y2 - pixman_fixed_e) < image->bits.height)
         {
             *flags |= FAST_PATH_SAMPLES_COVER_CLIP_NEAREST;
         }
diff --git a/test/affine-bench.c b/test/affine-bench.c
index 9e0121e..697684b 100644
--- a/test/affine-bench.c
+++ b/test/affine-bench.c
@@ -396,13 +396,17 @@ main (int argc, char *argv[])
     }
 
     compute_transformed_extents (, _box, );
-    /* The source area is expanded by a tiny bit (8/65536th pixel)
-     * to match the calculation of the COVER_CLIP flags in analyze_extent()
+    xmin = pixman_fixed_to_int (transformed.x1 - pixman_fixed_1 / 2);
+    ymin = pixman_fixed_to_int (transformed.y1 - pixman_fixed_1 / 2);
+    xmax = pixman_fixed_to_int (transformed.x2 +
Re: [Pixman] [PATCH] test: Add cover-test
On Thu, 03 Sep 2015 11:13:25 +0100, Pekka Paalanen wrote:
> Unless, you go and change the implementation meaning of Pixman's cover flags which, reading your other reply, is what you are doing. I'm finally starting to see it.

Glad the penny has dropped! The issue has been muddied somewhat by the 8*epsilon border. Perhaps I should have reposted the series as soon as it became clear what the history was in that respect.

(from your other post on this thread)
> This is *not* for protecting against out-of-bounds access, but this is allowing for the extreme cases where a cover path is still *theoretically* possible. (Ignoring what the COVER flags actually mean in Pixman implementation.)

Yes. Pixman as a whole should be able to handle any coordinates without bounds violations. The COVER_CLIP flags just determine the threshold coordinates at which it chooses to use a cover path rather than the appropriate repeat path (PAD, NONE, NORMAL or REFLECT); while you would get the correct mathematical result even if you always used the repeat path, the cover path is very likely to be faster because it will be implemented in a way that assumes it doesn't need to do bounds checking (because the COVER_CLIP calculation has effectively done the bounds check for it).

Previously, the flag definition was overly cautious, resulting in some cases which could have been handled by cover paths instead being sent to repeat paths. I first encountered this myself with lowlevel-blt-bench, which routinely used repeat paths for its bilinear tests but cover paths for its nearest tests. However, we have to be careful when adjusting the flag calculations, as the cases where bounds violations occur may only be 1*epsilon wide, and so would have a high chance of not being detected by the previous test suite in a reasonable length of time - hence the need for a new test program.
When EXACT is set, cover-test only performs operations which are theoretically achievable with a cover path (and in fact, they *are* all achieved with a cover path if you apply all my patches). When it isn't set, some operations will stray into territory that definitely needs a repeat path. The size of the fuzz factors that are applied to the coordinates was set large enough, using knowledge of the 8*epsilon factor in the old definition of the COVER_CLIP flags, that we should exercise both cover and repeat paths right up to the coordinate limits at which either old or new definition will allow them to be used.

Of course, I've now changed cover-test so it doesn't do randmemset straight into fenced images anyway (due to needing to handle systems with different memory page sizes) so this is no longer an issue.

> D'oh, of course, that's why Oded didn't see a segfault. But he did say the CRC does not match on PPC64, neither LE nor BE.

Just to be clear, is that still an issue after I fixed the DST_HEIGHT SRC_HEIGHT bug?

Ben
Re: [Pixman] [PATCH] test: Add cover-test
On Thu, 03 Sep 2015 17:14:05 +0100, Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote:
> On Thu, 03 Sep 2015 13:59:07 +0100 "Ben Avison" <bavi...@riscosopen.org> wrote:
>> Previously, the flag definition was overly cautious, resulting in some cases which could have been handled by cover paths instead being sent to repeat paths. I first encountered this myself with lowlevel-blt-bench, which routinely used repeat paths for its bilinear tests
>
> Oh, this was actually intended. The bilinear scaling benchmark in the lowlevel-blt-bench program tries to stress the use case where we are fetching pixels from the outside of the source image a little bit. The reason is that this is quite typical for bilinear scaling and that's how it is designed to work. If this whole ordeal with the new COVER flag interpretation was just a way to make it run fast in lowlevel-blt-bench, then this new interpretation may not make much real practical sense.

That was just where it started... I should maybe explain that I encountered it when I was writing bilinear fetchers for ARMv6, which (at the time (*)) only handled the COVER case. This meant that lowlevel-blt-bench was useless to me as it stood, because my COVER fetchers were never selected, and it was important to me because I rely on it to be able to compare different implementations (for example to select the best prefetch distance).

I was worried there might be objections to changing lowlevel-blt-bench, and I also knew that I followed different code paths for different scale factors (which are fixed in lowlevel-blt-bench), many of which were so extreme that the L1/L2/M/HT/VT/RT/R tests of lowlevel-blt-bench no longer made sense - so that's how I came to write affine-bench.
But while writing affine-bench, it became apparent that it's really complicated from a high-level caller's point of view to construct a transformation matrix that's guaranteed to hit a COVER path - and it requires knowledge of details of the internal implementation of the guts of Pixman, even after you deal with the obsolete 8*pixman_fixed_e adjustment. This seemed wrong to me; I can imagine applications might well want to use Pixman to scale between pixel centres rather than pixel edges, and to expect such an application to know whether to exclude the left pixel, or the right pixel, or both, or neither, and the same in the vertical direction (for which the rules were often different) in order to get the best performance is surely unreasonable.

Ben

(*) Actually, I've been revisiting this recently, and for various reasons I've added repeat support to these fetchers. But now I'm getting a long way ahead of myself...
[Pixman] [PATCH 3/4] armv7/mips/sse2: Make bilinear cover fast paths use COVER_CLIP_TIGHT flag
This patch affects all functions defined by the macro FAST_BILINEAR_MAINLOOP_INT, which can be identified by the fact that their names start with the string "fast_composite_scaled_bilinear". These are currently found in arm-neon, mips-dspr2 and sse2 implementations. The inner pixel loop for these functions is provided by a separate function (typically written in assembly) whose name is given in a macro parameter. These functions accept, amongst other things, a pair of line buffer pointers (one upper, one lower) and fixed-point initial and incremental offsets into those lines. Many of these assembly functions assume that they can calculate the value of a pixel by truncating the offset of each pixel with rounding to -infinity to locate one source pixel, and unconditionally loading the one to its right as well. Rather than rework all the assembly on all three platforms, this patch permits the assembly function to continue with these assumptions, even when the C function is called such that the rightmost coordinate coincides exactly with the last source pixel. In such cases, we plot the final pixel of each row via a temporary buffer (similar to the manner in which PAD repeat is accomplished). It is worth noting that by comparison, these operations would previously have been treated as PAD/NONE/NORMAL/REFLECT repeat operations, so they would have been slower to process than other COVER cases anyway. Tested using cover-test on armv7. 
---
 pixman/pixman-inlines.h |   69 ---
 1 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/pixman/pixman-inlines.h b/pixman/pixman-inlines.h
index 1c8441d..e53da08 100644
--- a/pixman/pixman-inlines.h
+++ b/pixman/pixman-inlines.h
@@ -924,6 +924,67 @@ fast_composite_scaled_bilinear ## scale_func_name (pixman_implementation_t *imp,
 	src_width_fixed = pixman_int_to_fixed (src_width);                   \
     }                                                                        \
                                                                              \
+    if (PIXMAN_REPEAT_ ## repeat_mode == PIXMAN_REPEAT_COVER)                \
+    {                                                                        \
+	/* Detect cases where a typical scanline function would read beyond array bounds. */ \
+	int32_t last_src_pixel = src_image->bits.width - 1;                  \
+	max_vx = v.vector[0] + (width - 1) * unit_x;                         \
+	if (pixman_fixed_to_int (max_vx) == last_src_pixel)                  \
+	{                                                                    \
+	    src_type_t buf1[2];                                              \
+	    src_type_t buf2[2];                                              \
+	    src_type_t *src_line_top;                                        \
+	    src_type_t *src_line_bottom;                                     \
+	    width--;                                                         \
+	    while (--height >= 0)                                            \
+	    {                                                                \
+		int weight1, weight2;                                        \
+		dst = dst_line;                                              \
+		dst_line += dst_stride;                                      \
+		vx = v.vector[0];                                            \
+		if (flags & FLAG_HAVE_NON_SOLID_MASK)                        \
+		{                                                            \
+		    mask = mask_line;                                        \
+		    mask_line += mask_stride;                                \
+		}                                                            \
+		                                                             \
+		y1 = pixman_fixed_to_int (vy);                               \
+		weight2 = pixman_fixed_to_bilinear_weight (vy);              \
+		if (weight2)                                                 \
+		{                                                            \
+		    /* both weight1 and weight2 are smaller than
[Pixman] [PATCH 4/4] pixman-fast-path: Make bilinear cover fetcher use COVER_CLIP_TIGHT flag
The bilinear cover fetcher was implemented with similar assumptions to FAST_BILINEAR_MAINLOOP_INT - that for all transformed destination coordinates, you could divide by 2^16, round down and add 1, and still be within the source image. This patch effectively reverses this - dividing by 2^16, rounding up and subtracting 1. The big advantage of this is that we need only test for this subtracted number being out of bounds, and skip the corresponding pixel load, once per row - at the start, before entering the loop - whereas with the original scheme, you would need to check every pixel. To make the rounding up a simple operation, all the X coordinates (including increments) are negated. There is an additional offset of (1 << (16 - BILINEAR_INTERPOLATION_BITS)) - 1, to allow for the fact that coordinates are truncated with rounding towards -infinity during generation of weighting factors. Because the weight is inverted along with the coordinates, you will also see that the weight is now considered as the fraction of the way from the right pixel to the left pixel, rather than vice versa - none of this increases the computational complexity per pixel.

Tested using cover-test on armv7 with PIXMAN_DISABLE="arm-neon arm-simd"

---
 pixman/pixman-fast-path.c |   37 +
 1 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
index 53d4a1f..a17ca26 100644
--- a/pixman/pixman-fast-path.c
+++ b/pixman/pixman-fast-path.c
@@ -2280,18 +2280,29 @@ fetch_horizontal (bits_image_t *image, line_t *line,
                   int y, pixman_fixed_t x, pixman_fixed_t ux, int n)
 {
     uint32_t *bits = image->bits + y * image->rowstride;
-    int i;
+    int i, x0, x1;
+    int32_t dist_x;
+    uint32_t left, right;
+
+    x = (1 << (16 - BILINEAR_INTERPOLATION_BITS)) - 1 - x;
+    ux = -ux;
+
+    x0 = pixman_fixed_to_int (x);
+    x1 = x0 + 1;
+    dist_x = pixman_fixed_to_bilinear_weight (x);
+    left = dist_x ? *(bits - x1) : 0;
 
     for (i = 0; i < n; ++i)
     {
-        int x0 = pixman_fixed_to_int (x);
-        int x1 = x0 + 1;
-        int32_t dist_x;
-
-        uint32_t left = *(bits + x0);
-        uint32_t right = *(bits + x1);
+        if (i > 0)
+        {
+            x0 = pixman_fixed_to_int (x);
+            x1 = x0 + 1;
+            dist_x = pixman_fixed_to_bilinear_weight (x);
+            left = *(bits - x1);
+        }
 
-        dist_x = pixman_fixed_to_bilinear_weight (x);
+        right = *(bits - x0);
 
         dist_x <<= (8 - BILINEAR_INTERPOLATION_BITS);
 #if SIZEOF_LONG <= 4
@@ -2301,11 +2312,11 @@ fetch_horizontal (bits_image_t *image, line_t *line,
         lag = (left & 0xff00ff00) >> 8;
         rag = (right & 0xff00ff00) >> 8;
-        ag = (lag << 8) + dist_x * (rag - lag);
+        ag = (rag << 8) + dist_x * (lag - rag);
 
         lrb = (left & 0x00ff00ff);
         rrb = (right & 0x00ff00ff);
-        rb = (lrb << 8) + dist_x * (rrb - lrb);
+        rb = (rrb << 8) + dist_x * (lrb - rrb);
 
         *((uint32_t *)(line->buffer + i)) = ag;
         *((uint32_t *)(line->buffer + i) + 1) = rb;
@@ -2323,7 +2334,7 @@ fetch_horizontal (bits_image_t *image, line_t *line,
         lagrb = (((uint64_t)lag) << 24) | lrb;
         ragrb = (((uint64_t)rag) << 24) | rrb;
 
-        line->buffer[i] = (lagrb << 8) + dist_x * (ragrb - lagrb);
+        line->buffer[i] = (ragrb << 8) + dist_x * (lagrb - ragrb);
     }
 #endif
@@ -2350,6 +2361,8 @@ fast_fetch_bilinear_cover (pixman_iter_t *iter, const uint32_t *mask)
     y0 = pixman_fixed_to_int (info->y);
     y1 = y0 + 1;
+    if (y1 == iter->image->bits.height)
+        y1 = y0;
     dist_y = pixman_fixed_to_bilinear_weight (info->y);
     dist_y <<= (8 - BILINEAR_INTERPOLATION_BITS);
@@ -3187,7 +3200,7 @@ static const pixman_iter_info_t fast_iters[] =
     (FAST_PATH_STANDARD_FLAGS          |
      FAST_PATH_SCALE_TRANSFORM         |
      FAST_PATH_BILINEAR_FILTER         |
-     FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR),
+     FAST_PATH_SAMPLES_COVER_CLIP_TIGHT_BILINEAR),
     ITER_NARROW | ITER_SRC,
     fast_bilinear_cover_iter_init, NULL, NULL
-- 
1.7.5.4
Re: [Pixman] [PATCH v3] test: add fence-image-self-test
On Wed, 02 Sep 2015 09:15:24 +0100, Pekka Paalanen wrote:
> Changes in v3:
> - Do not print progress messages unless VERBOSE environment variable is
>   set. Avoid spamming the terminal output of 'make check' on some
>   versions of autotools.

I can confirm that "make check" is now quiet for me.

Ben
Re: [Pixman] [PATCH] test: Add cover-test
On Wed, 02 Sep 2015 12:56:40 +0100, Pekka Paalanen wrote:
> Right. I looked at fast_bilinear_cover_iter_init() and
> fast_fetch_bilinear_cover() -> fetch_horizontal(), and yes, that really
> is how it is implemented. The leftmost pixel is chosen essentially by
> n = pixman_fixed_to_int(x - pixman_fixed_1 / 2). So,
> pixman_fixed_1 / 2 -> 0, not -1.
>
> The thing that confuses me is that with nearest, x=0 will sample pixel
> n=-1. However with bilinear, x=0.5 will sample pixels n=0 and n=1, not
> n=-1 and n=0.

When x=0.5, the numerical output of bilinear scaling only depends upon source pixel n=0. The stuff about whether you consider pixel n=+1 or n=-1 to be involved, but with a weighting factor of 0, is incidental to the numerical output. With nearest scaling, you don't have the option of sitting on the fence, so you have to round one way or the other.

If you look at the following ASCII art diagrams, the vertical axis is the contribution from source pixel 0 (+) and source pixel 1 (x). Where these coincide, I have used a *. Each column represents an increment of 0.125 (you'll have to read the labels on the horizontal axis vertically). The upper diagram is nearest, the lower diagram is bilinear.

[The two ASCII-art diagrams were flattened beyond recovery by the list archive: the horizontal axis runs from -1.0 to 3.0, labelled every 0.5, with + and x plotting the contributions of pixels 0 and 1, and * where they coincide.]

See how the bilinear diagram is already symmetrical; no rounding is needed of coordinates on the horizontal axis. Of course, coordinates actually have 13 more bits of granularity than this, but the same patterns hold.

>>> - Looks like it processes the whole stride, not width * bytespp,
>>>   which means this is going to explode on fenced images in the
>>>   padding, right?
>>
>> Yes, looks like it is. Good spot. That probably does belong in a
>> separate patch; are you volunteering?
>
> I should, shouldn't I? :-) I'll do it.

Of course, I've now changed cover-test so it doesn't do randmemset straight into fenced images anyway (due to needing to handle systems with different memory page sizes), so this is no longer an issue.
Ben
Re: [Pixman] [PATCH] test: Add cover-test
On Wed, 02 Sep 2015 14:03:01 +0100, Pekka Paalanen wrote:
> I am still a tiny bit struggling with why 31 and not 32, but it does
> add up.

Maybe the reasoning in the comments in random_scale_factor() makes more sense? There, it only talks about the numbers as integers, avoiding the additional confusion caused by MAX_INC being 0.32 fixed-point and the return value being 16.16 fixed-point. If you still want to think of it as real numbers, then:

* random_scale_factor() generates a scale factor between 2^(-LOG2_MAX_FACTOR-0.5) and 2^(LOG2_MAX_FACTOR+0.5)
* INV_SQRT_2_0POINT32_FIXED represents 2^-0.5
* to get from 2^-0.5 to 2^(LOG2_MAX_FACTOR+0.5) we must multiply by 2^(LOG2_MAX_FACTOR+1)
* but to convert from 0.32 to 16.16 fixed point, we multiply by 2^(-32+16)
* so the total factor is 2^(LOG2_MAX_FACTOR+1-32+16)
* but this is < 1, so it is actually a shift right by -LOG2_MAX_FACTOR-1+32-16, which can trivially be rearranged to match the macro definition.

> > +static pixman_image_t *
> > +create_src_image (pixman_format_code_t fmt)
> > +{
> [...]
> > +    prng_randmemset (tmp_img->bits.bits,
> > +                     tmp_img->bits.rowstride * DST_HEIGHT * sizeof (uint32_t), 0);
>
> DST_HEIGHT? Should that not be SRC_HEIGHT?

I think you're right, yes - looks like careless copy-and-paste from the code to initialise the destination image. That means part of the image was uninitialised, yet it never showed up as a difference on subsequent runs. The checksums will need updating too. I'll repost shortly.

> +static void
> +check_transform (pixman_image_t *dst_img,
> +                 pixman_image_t *src_img,
> +                 pixman_transform_t *transform,
> +                 pixman_bool_t bilinear)
> +{
[...]
> +    if (bilinear)
> +    {
> +        assert (v1.vector[0] >= pixman_fixed_1 / 2);
> +        assert (v1.vector[1] >= pixman_fixed_1 / 2);
> +        assert (v2.vector[0] <= pixman_int_to_fixed (src_img->bits.width) -
> +                pixman_fixed_1 / 2);
> +        assert (v2.vector[1] <= pixman_int_to_fixed (src_img->bits.height) -
> +                pixman_fixed_1 / 2);
>
> Since we had that discussion about bilinear filtering sampling pixels
> at n=0,1 instead of n=-1,0, should the last two asserts not have <
> instead of <=?

No. calc_translate() specifically ensures that (depending upon low_align) either the lower or upper coordinate exactly corresponds to the lowest or highest value that contains no non-zero-weighted contribution from any pixel beyond the source image (and the image sizes are chosen so that the opposite coordinate should always fall within the source image too, at the largest possible increment). These are in fact the very cases that are most likely to trigger a fast path or fetcher to perform an out-of-bounds access, so they were deliberately included.

> I mean, wouldn't sampling for x=width-0.5 cause reads on pixels
> n=width-1,width, where the latter would be out of bounds (even if
> weight zero)?
My recent 7-patch series deals with precisely this case:
http://lists.freedesktop.org/archives/pixman/2015-August/003878.html

Before this patch series, the conditions under which the FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR flag is set mean that this case never gets passed to a "cover" fast path, even though such fast paths can actually handle it by at least these three methods:

* check the weight and don't load the upper sample if it's zero (it turns out this can be achieved for free with ARM fetchers)
* invert the increments so that when x=width-0.5 it reads samples n=width-2,width-1 and applies a weight of 0 to the former (I used this for pixman-fast-path.c)
* detect the case in the C binding function and handle it separately (I used this for the existing armv7/mips/sse2 fast paths)

The fuzz factor added into the non-EXACT case means that even with the old rules for the cover clip flag, we should end up exercising both cover fast paths and repeat fast paths at least some of the time, right up to the limits of their applicability.

Ben
Re: [Pixman] [PATCH v2 0/3] Add fence_image_create_bits() test helper
On Fri, 28 Aug 2015 12:53:35 +0100, Pekka Paalanen ppaala...@gmail.com wrote:
> This is v2, addressing the review comments from the last time:
> http://lists.freedesktop.org/archives/pixman/2015-May/003643.html

The changes all look good to me. One minor thing: the self-test spews a lot of text to my console when I'm running make check, whereas none of the other tests do in normal functioning (save for an occasional message about the final CRC for some tests). I've a feeling this might be something to do with the version of autotools I'm running, but if so it might be nice to at least record this in a comment somewhere?

Ben
Re: [Pixman] Selectively disabling fast paths
On Fri, 28 Aug 2015 13:43:06 +0100, Pekka Paalanen ppaala...@gmail.com wrote:
> On Thu, 27 Aug 2015 17:20:26 +0100 Ben Avison bavi...@riscosopen.org wrote:
>> One thing it wouldn't be able to detect, though, would be where the
>> fetch/combine/writeback iterators are faster than fast paths for the
>> *same* implementation level - such as with the ARMv6 nearest-scaled
>> patches I was revisiting recently. In that specific case, it turned
>> out that my original solution of bespoke C wrappers for the fetchers
>> turned out to be even faster - but we don't have any way at present of
>> detecting if there are other cases where we would be better off
>> deleting the fast paths and letting the iterators do the work instead.
>
> Sorry, but I'm a bit hazy on the details here. Based on the
> discussions, I have developed the following mental model:
>
> 1. asm fast paths (whole operation)
> 2. C fast paths (whole operation)
> 3. _general_composite_rect (fetch/combine/writeback; iterators)
>    - asm implementation or
>    - C implementation for each

Yes, that's pretty much it, except that some platforms have multiple levels of asm fast paths, and some or all of those will be enabled depending upon the CPU features detected via a combination of compile-time and runtime tests. Basically, there is a chain of pixman_implementation_t structs, in decreasing priority order (that's why you'll see the name "implementation" used to refer to a set of routines tuned for a particular instruction set). Each implementation contains a table of pixman_fast_path_t structs (which we refer to as "fast paths"), a table of pixman_iter_info_t structs (the fetcher and writeback iterators), an array of combiner routines, and a few other bits and pieces.
For example, on an ARMv7 platform, you'll normally find the following implementations are enabled, in decreasing priority order:

* pixman-noop.c (can't be disabled)
* pixman-arm-neon.c (unless PIXMAN_DISABLE contains arm-neon)
* pixman-arm-simd.c (unless PIXMAN_DISABLE contains arm-simd)
* pixman-fast-path.c (unless PIXMAN_DISABLE contains fast)
* pixman-general.c (can't be disabled; also references last-resort functions in pixman-bits-image.c / pixman-*-gradient.c / pixman-combine32.c / pixman-combine-float.c)

When you call pixman_image_composite(), it scans through the fast paths from each implementation in order, looking for one which matches the criteria in the fast path tables. pixman-general.c contains a single fast path, which is universally applicable, and therefore handles anything that wasn't caught by higher implementations - and it uses the function general_composite_rect(). In turn, general_composite_rect() scans the implementations in order, looking for fetchers, combiners and writeback functions which will allow it to perform the requested operation line by line, stage by stage.

When you set PIXMAN_DISABLE, you knock out the whole of an implementation, both its fast paths and its iterators/combiners. The point I was trying to make (badly, it seems) is that iterators/combiners are relatively widely applicable, and are chosen at lower priority than all the fast paths, but because they were developed relatively recently, many of the fast paths have never had their performance compared against the iterators/combiners to see if their inclusion is perhaps no longer warranted since the iterators/combiners were added.

> Maybe we could fix that by introducing a PIXMAN_DISABLE=wholeop or
> similar, that would disable all whole operation fast paths, but leave
> the iterator paths untouched? Should I do that, would it be worth it?
It could probably be done in _pixman_implementation_create(), as long as _pixman_implementation_create_general() explicitly initialises imp->fast_paths so that at least general_composite_rect() always ends up on the chain of fast paths.

Ben
Re: [Pixman] [PATCH 5/7] armv7/mips/sse2: Fix bounds violations in bilinear cover scaled fast paths
On Thu, 27 Aug 2015 04:44:58 +0100, Siarhei Siamashka siarhei.siamas...@gmail.com wrote: On Mon, 24 Aug 2015 21:42:04 +0100 Ben Avison bavi...@riscosopen.org wrote: The current COVER flag assumes that every pixel is fetched from the source image as a 2x2 block (regardless of whether the fractional part of the offset is zero or not) and this is explicitly expected to be safe. The situation isn't quite that simple though. With the bilinear fast paths defined using FAST_BILINEAR_MAINLOOP_INT(), the height of the block is already set to 1 when the fractional part is 0 - albeit as a side effect of not being able to express a fixed-point weight of 1.0 in a 16-bit value: http://cgit.freedesktop.org/pixman/tree/pixman/pixman-inlines.h?id=pixman-0.33.2#n947 My ARMv6 bilinear fetchers (the ones still waiting to be committed) already contained an optimisation to avoid processing a scanline whose vertical weight was 0. Patch 6 of this series is a tweak to fast_fetch_bilinear_cover() so it skips such scanlines, and ssse3_fetch_bilinear_cover() would benefit from the same optimisation. In the horizontal direction, it was actually Søren's post (accidentally or otherwise) which first prompted me to see there's an issue: if you're going to say that you always fetch two source pixels even when the coordinate is exactly aligned to a single source pixel, then who's to say whether the zero-weighted pixel is the one to the right of the true pixel rather than the one to its left? In practice, FAST_BILINEAR_MAINLOOP_INT chooses the one on the right, and this happens to match your suggested change to the calculation of the FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR, where we simply remove the 8 * pixman_fixed_e border. 
As a reminder of what I wrote earlier, this test corresponds to:

transformed.x1 >= pixman_fixed_1 / 2
transformed.y1 >= pixman_fixed_1 / 2
transformed.x2 < image->bits.width * pixman_fixed_1 - pixman_fixed_1 / 2
transformed.y2 < image->bits.height * pixman_fixed_1 - pixman_fixed_1 / 2

If we assume that zero-weighted pixels to the left are always loaded, these would need to be the tests instead:

transformed.x1 > pixman_fixed_1 / 2
transformed.y1 > pixman_fixed_1 / 2
transformed.x2 <= image->bits.width * pixman_fixed_1 - pixman_fixed_1 / 2
transformed.y2 <= image->bits.height * pixman_fixed_1 - pixman_fixed_1 / 2

Søren talked about samples with index -1 being fetched - in other words, the zero-weighted pixel being the one to the left of the true one. This actually turns out to be more efficient, because to avoid out-of-bounds accesses when transformed.x1 == pixman_fixed_1 / 2, you only need to test the first pixel on each row, which can be moved outside the loop. For any later pixels on the same row which might fetch a zero-weighted pixel to the left, you at least know that the zero-weighted pixel is within the pixel row. If the zero-weighted pixel were to be to the right, you'd need to test every single pixel, which would normally be a significant speed hit. It turns out to be surprisingly simple to cause the zero-weighted pixel to be on the left - just negate the X increments, as demonstrated by patch 6.

Things work even more smoothly with my ARMv6 bilinear fetchers (though I haven't reposted them yet). I was able to take advantage of ARM conditional execution to avoid reading zero-weighted pixels with zero cycles overhead (and in fact some speed gain due to fewer cachelines being fetched from RAM). The only necessary change to the core code:

    adds    accumulator, accumulator, increment_fractional
    ldrcs   in0, [source_ptr, #increment_integer + 1]!
    ldrcc   in0, [source_ptr, #increment_integer]!
    ldr     in1, [source_ptr, #1]

is to change the last ldr to an ldrne.
This is actually an exception to what I wrote above, "If the zero-weighted pixel were to be to the right, you'd need to test every single pixel, which would normally be a significant speed hit." But since conditional execution of arbitrary instructions is pretty unique to ARM, it makes sense to use the negated-increment method for the generic C implementation.

> However with your new proposed definition of the COVER_CLIP_BILINEAR
> flag, the area is shrinked by pixman_fixed_e on the right side in
(enlarged by pixman_fixed_e, but I think that's what you meant)
> order to allow a special corner case to pass through. And this is
> where the bounds violations are likely coming from.
[...]
> Is there actually a real bug in the current pixman code? Because the
> commit summary looks kinda scary and may be misinterpreted by the
> general public.

Don't worry, the existing Pixman code doesn't have any bounds violations. In fact, there aren't any bounds violations at all, despite what the commit summaries say. To explain this: quite late on in development, I added the FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR_APPROX flag, as an insurance policy against me not finding a volunteer to adapt the SSSE3 bilinear fetcher. But then I realised that if I added the flag early on, I
Re: [Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)
On Thu, 27 Aug 2015 10:59:52 +0100, Pekka Paalanen ppaala...@gmail.com wrote:
> It would be *really* nice if we could somehow use a benchmark mode
> where we could run an operation with every possible implementation and
> compare them. I wonder, can we already do that with PIXMAN_DISABLE?
> That would certainly help detect some issues, where we get worse
> performance at supposedly more advanced implementation levels.

Obviously it wouldn't make any sense to bother with certain combinations, such as PIXMAN_DISABLE=arm-neon when testing on an x86.

One thing it wouldn't be able to detect, though, would be where the fetch/combine/writeback iterators are faster than fast paths for the *same* implementation level - such as with the ARMv6 nearest-scaled patches I was revisiting recently. In that specific case, it turned out that my original solution of bespoke C wrappers for the fetchers turned out to be even faster - but we don't have any way at present of detecting if there are other cases where we would be better off deleting the fast paths and letting the iterators do the work instead.

Ben
Re: [Pixman] [PATCH v2] scaling-test: list more details when verbose
On Thu, 27 Aug 2015 12:16:49 +0100, Pekka Paalanen ppaala...@gmail.com wrote:
> [Pekka: redo whitespace and print src,dst,mask x and y.]
> Signed-off-by: Pekka Paalanen pekka.paala...@collabora.co.uk

Reviewed-by: Ben Avison bavi...@riscosopen.org
[Pixman] [PATCH] test: Add cover-test
This test aims to verify both numerical correctness and the honouring of array bounds for scaled plots (both nearest-neighbour and bilinear) at or close to the boundary conditions for applicability of cover type fast paths and iter fetch routines.

It has a secondary purpose: by setting the env var EXACT (to any value) it will only test plots that are exactly on the boundary condition. This makes it possible to ensure that cover routines are being used to the maximum, although this requires the use of a debugger or code instrumentation to verify.

---
 test/Makefile.sources |    1 +
 test/cover-test.c     |  424 +
 2 files changed, 425 insertions(+), 0 deletions(-)
 create mode 100644 test/cover-test.c

diff --git a/test/Makefile.sources b/test/Makefile.sources
index 5b9ae84..aece143 100644
--- a/test/Makefile.sources
+++ b/test/Makefile.sources
@@ -27,6 +27,7 @@ TESTPROGRAMS = \
 	glyph-test	\
 	solid-test	\
 	stress-test	\
+	cover-test	\
 	blitters-test	\
 	affine-test	\
 	scaling-test	\
diff --git a/test/cover-test.c b/test/cover-test.c
new file mode 100644
index 000..dbdf0f8
--- /dev/null
+++ b/test/cover-test.c
@@ -0,0 +1,424 @@
+/*
+ * Copyright © 2015 RISC OS Open Ltd
+ *
+ * Permission to use, copy, modify, distribute, and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the name of the copyright holders not be used in
+ * advertising or publicity pertaining to distribution of the software without
+ * specific, written prior permission. The copyright holders make no
+ * representations about the suitability of this software for any purpose. It
+ * is provided "as is" without express or implied warranty.
+ *
+ * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
+ * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
+ * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
+ * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
+ * SOFTWARE.
+ *
+ * Author: Ben Avison (bavi...@riscosopen.org)
+ *
+ */
+
+/*
+ * This test aims to verify both numerical correctness and the honouring of
+ * array bounds for scaled plots (both nearest-neighbour and bilinear) at or
+ * close to the boundary conditions for applicability of cover type fast paths
+ * and iter fetch routines.
+ *
+ * It has a secondary purpose: by setting the env var EXACT (to any value) it
+ * will only test plots that are exactly on the boundary condition. This makes
+ * it possible to ensure that cover routines are being used to the maximum,
+ * although this requires the use of a debugger or code instrumentation to
+ * verify.
+ */
+
+#include "utils.h"
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Approximate limits for random scale factor generation - these ensure we can
+ * get at least 8x reduction and 8x enlargement.
+ */
+#define LOG2_MAX_FACTOR (3)
+
+/* 1/sqrt(2) (or sqrt(0.5), or 2^-0.5) as a 0.32 fixed-point number */
+#define INV_SQRT_2_0POINT32_FIXED (0xB504F334u)
+
+/* The largest increment that can be generated by random_scale_factor().
+ * This occurs when the mantissa part is 0xFFFFFFFF and the exponent
+ * part is -LOG2_MAX_FACTOR.
+ */
+#define MAX_INC ((pixman_fixed_t) (INV_SQRT_2_0POINT32_FIXED >> (31 - 16 - LOG2_MAX_FACTOR)))
+
+/* Minimum source width (in pixels) based on a typical page size of 4K and
+ * maximum colour depth of 32bpp.
+ */
+#define MIN_SRC_WIDTH (4096 / 4)
+
+/* Derive the destination width so that at max increment we fit within source */
+#define DST_WIDTH (MIN_SRC_WIDTH * pixman_fixed_1 / MAX_INC)
+
+/* Calculate heights the other way round - no limits due to page alignment here */
+#define DST_HEIGHT 3
+#define SRC_HEIGHT ((DST_HEIGHT * MAX_INC + pixman_fixed_1 - 1) / pixman_fixed_1)
+
+/* At the time of writing, all the scaled fast paths use SRC, OVER or ADD
+ * Porter-Duff operators. XOR is included in the list to ensure good
+ * representation of iter scanline fetch routines.
+ */
+static const pixman_op_t op_list[] = {
+    PIXMAN_OP_SRC,
+    PIXMAN_OP_OVER,
+    PIXMAN_OP_ADD,
+    PIXMAN_OP_XOR,
+};
+
+/* At the time of writing, all the scaled fast paths use a8r8g8b8, x8r8g8b8
+ * or r5g6b5, or red-blue swapped versions of the same. When a mask channel is
+ * used, it is always a8 (and so implicitly not component alpha). a1r5g5b5 is
+ * included because it is the only other
Re: [Pixman] [PATCH 1/7] Refactor calculation of cover flags
On Thu, 27 Aug 2015 03:55:07 +0100, Siarhei Siamashka siarhei.siamas...@gmail.com wrote:

> -/* Expand the source area by a tiny bit so account of different rounding that
> - * may happen during sampling. Note that (8 * pixman_fixed_e) is very far from
> - * 0.5 so this won't cause the area computed to be overly pessimistic.
> - */
> -transformed.x1 -= 8 * pixman_fixed_e;
> -transformed.y1 -= 8 * pixman_fixed_e;
> -transformed.x2 += 8 * pixman_fixed_e;
> -transformed.y2 += 8 * pixman_fixed_e;
>
> Thanks for spotting this! I think that this code was used to compensate
> matrix multiplication accuracy problems in older versions of pixman.
> But we have already fixed these accuracy problems some time ago:
[...]
> Now we can drop this 8 * pixman_fixed_e adjustment because there is no
> accuracy loss in the matrix multiplication for affine transformations.

Ah, thank you! I couldn't work out what rounding it was that the comment was referring to, and it seemed to have been quite deliberate. Søren could only recall the related issue with bilinear scalers reading pixel pairs where one source pixel might be read from outside the source array if its weight was going to be 0.

> And it is probably better to just do this simple fix with a single
> patch. There is no need to spread it across multiple patches. Just as
> you do it in
> http://lists.freedesktop.org/archives/pixman/2015-August/003877.html
> this part becomes:
>
>     if (pixman_fixed_to_int (transformed.x1 - pixman_fixed_e) >= 0                 &&
>         pixman_fixed_to_int (transformed.y1 - pixman_fixed_e) >= 0                 &&
>         pixman_fixed_to_int (transformed.x2 - pixman_fixed_e) < image->bits.width  &&
>         pixman_fixed_to_int (transformed.y2 - pixman_fixed_e) < image->bits.height)

Glad you agree about the missing pixman_fixed_e offset, which was disguised by the old 8 * pixman_fixed_e enlargement. I originally wrote this stuff as a single patch, but I got the impression that it was too much to digest in one go for most people, hence why I split it up.
Perhaps the compromise is to go for two patches, one to deal with the 8 * pixman_fixed_e issue, and one to deal with bilinear scaling with zero-weight pixels just beyond the edges of the image.

For those following the other thread where I'm describing the operation of the cover-test program, you may want to note that if you multiply both sides of the above inequalities by pixman_fixed_1, you get

transformed.x1 - pixman_fixed_e >= 0
transformed.y1 - pixman_fixed_e >= 0
transformed.x2 - pixman_fixed_e < image->bits.width * pixman_fixed_1
transformed.y2 - pixman_fixed_e < image->bits.height * pixman_fixed_1

which is equivalent to

transformed.x1 >= pixman_fixed_e
transformed.y1 >= pixman_fixed_e
transformed.x2 <= image->bits.width * pixman_fixed_1
transformed.y2 <= image->bits.height * pixman_fixed_1

which lines up nicely with my description of which coordinates correspond to which source pixels.

> And [the bilinear case] does not need any changes.

At least as far as dropping 8 * pixman_fixed_e is concerned. Reworking these in the same way, you get

transformed.x1 - pixman_fixed_1 / 2 >= 0
transformed.y1 - pixman_fixed_1 / 2 >= 0
transformed.x2 + pixman_fixed_1 / 2 < image->bits.width * pixman_fixed_1
transformed.y2 + pixman_fixed_1 / 2 < image->bits.height * pixman_fixed_1

transformed.x1 >= pixman_fixed_1 / 2
transformed.y1 >= pixman_fixed_1 / 2
transformed.x2 < image->bits.width * pixman_fixed_1 - pixman_fixed_1 / 2
transformed.y2 < image->bits.height * pixman_fixed_1 - pixman_fixed_1 / 2

and then my remaining changes basically boil down to arguing that these last two rows should be <= not <. That could sensibly be a separate patch.

> I believe that neither of FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and
> FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR flags matters for projective
> transformations, but this can be additionally checked just in case.
I did survey all the existing fast paths and fetchers and discovered that both the cover flags were only being used for affine transforms when I was trying to work out what the 8 * pixman_fixed_e rounding might have been referring to. By check, do you mean we should ensure that the flags are not set for projective transforms? If so, I see that compute_image_info() is called before analyze_extent() so it could be achieved by adding an extra test in analyze_extent() that FAST_PATH_AFFINE_TRANSFORM is set before setting the cover flags. Ben ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
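As an aside on the algebra earlier in the thread: the equivalence between the pixman_fixed_to_int form of the lower-bound check and its multiplied-out form can be verified mechanically. This is a standalone sketch; the pixman_fixed_t definitions are reproduced locally for illustration, not taken from pixman's headers:

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t pixman_fixed_t;
#define pixman_fixed_e ((pixman_fixed_t) 1)           /* smallest 16.16 step */
#define pixman_fixed_1 ((pixman_fixed_t) 1 << 16)

/* Arithmetic shift right by 16 converts 16.16 to integer rounding towards
 * minus infinity, matching pixman_fixed_to_int() for the values involved. */
static int fixed_to_int (pixman_fixed_t x) { return x >> 16; }

/* The lower-bound check as written in the patch... */
static int lower_bound_a (pixman_fixed_t x1)
{
    return fixed_to_int (x1 - pixman_fixed_e) >= 0;
}

/* ...and the multiplied-out form from the discussion. */
static int lower_bound_b (pixman_fixed_t x1)
{
    return x1 >= pixman_fixed_e;
}
```

Sweeping x1 over a range of fixed-point values shows the two forms agree everywhere, which is all the equivalence claims.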
Re: [Pixman] [PATCH] test: Add cover-test
On Thu, 04 Jun 2015 14:00:30 +0100, Pekka Paalanen ppaala...@gmail.com wrote: On Tue, 26 May 2015 23:58:30 +0100 Ben Avison bavi...@riscosopen.org wrote:

    +#define SQRT_0POINT5_0POINT32_FIXED (0xB504F334u)

Ok, I checked. Mathematically it is equivalent, but maybe a slightly more obvious name would be INV_SQRT_2_...?

Fair enough, renamed (will repost when I've resolved everything else below).

    +#define MAX_INC (SQRT_0POINT5_0POINT32_FIXED >> (31 - 16 - LOG2_MAX_FACTOR))

This evaluates to 11.314 decimal I think. It does match 2^3.5 with an error in the order of 3e-6, is what Octave tells me. log2(abs(error)) = -18.321, so if I'm thinking right, it means around 18 fractional bits being correct, which means pixman_fixed_t should be accurate to the last bit? I suppose it should be obvious from the integer math, but I just checked with Octave.

It's basically just the largest random number that can be generated by random_scale_factor(), in other words if you start out with the largest number and shift it right by the smallest number of places. I hope that makes it clearer where it comes from - I'll update the comment.

MAX_INC is a pixman_fixed_t value, right? Could use a comment for that, or making it a static const variable.

Yes. I've a feeling some compilers might force a load and runtime calculation of expressions if you use static const, so I'll stick a cast in the definition (being careful because pixman_fixed_t is signed but we need to use an unsigned shift to generate the number).

I think it would be valuable to explain these high level goals in a comment at the top, even when they are in the commit message.

I've duplicated the commit message in a comment.

    +static pixman_fixed_t
    +random_scale_factor(void)
    [...]

Out of curiosity and because it's hard to reason about these things, I did an experiment, and generated 10 million samples with this function.
The resulting distribution in log2 space is here: https://people.collabora.com/~pq/pixman/random_scale_factor.png I think it is close to uniform enough, right? And covers the expected range.

Nice diagram. For these purposes, I'd say it was close enough to uniform; no particular value is more than 2x more or less likely than any other. Without the shift step in the random number generator, what you'd have instead would be a ramp with one end being something like 128x more likely than the other. You'd also get effects such as the probability of a plot which was an enlargement in both X and Y being only 1 in 64, even though that's quite a likely outcome in real life; this way it's 1 in 4.

    +static pixman_fixed_t
    +calc_translate (int            dst_size,
    +                int            src_size,
    +                pixman_fixed_t scale,
    +                pixman_bool_t  low_align,
    +                pixman_bool_t  bilinear)
    +{
    +    pixman_fixed_t ref_src, ref_dst, scaled_dst;
    +
    +    if (low_align)
    +    {
    +        ref_src = bilinear ? pixman_fixed_1 / 2 : pixman_fixed_e;

Why the bilinear case is not pixman_fixed_1 / 2 + pixman_fixed_e?

At this point we're determining the translations required so that the first destination pixel, when transformed into source space, has exactly the lowest value it can have without straying into the repeat zones. With bilinear scaling, a coordinate of pixman_fixed_1 / 2 has pixel value determined only by source pixel 0. A coordinate of pixman_fixed_1 * 3 / 2 has pixel value determined only by source pixel 1. Any value in between has to be assumed to contain elements from both pixels 0 and 1. With nearest scaling, any coordinate between pixman_fixed_e and pixman_fixed_1 inclusive is determined by source pixel 0. Yes, this is slightly counter-intuitive, but it's the way Pixman has always done it.
See for example in bits_image_fetch_nearest_affine() in pixman-fast-path.c where we have the lines

    x0 = pixman_fixed_to_int (x - pixman_fixed_e);
    y0 = pixman_fixed_to_int (y - pixman_fixed_e);

There is no equivalent offset by pixman_fixed_e for bilinear scaling, for example in fast_bilinear_cover_iter_init() also in pixman-fast-path.c:

    info->x = v.vector[0] - pixman_fixed_1 / 2;
    info->y = v.vector[1] - pixman_fixed_1 / 2;

All the other implementations follow suit (they have to, or they wouldn't pass the make check suite.)

    +scaled_dst = ((uint64_t) ref_dst * scale + pixman_fixed_1 / 2) / pixman_fixed_1;

ref_dst is 16.16 fp, scale is 16.16 fp, so the product is 32.32 fp, but adding pixman_fixed_1/2 to that actually means +0.5/65536, not +0.5. And the final division then just scales it back to .16 fp. Basically it's adding 0.5 * pixman_fixed_e... for rounding to nearest representable value?

Yes, the transform coefficients, source and destination coordinates are all 16.16 fixed point, and round-to-nearest is how Pixman handles this in all sorts of places, for example in pixman_transform_point_31_16_3d(). It's worth noting that since we increment the destination coordinate
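Stepping back, the two addressing conventions Ben describes (nearest biased by pixman_fixed_e, bilinear centred on pixman_fixed_1 / 2) can be condensed into a pair of helper functions. These are local reimplementations for illustration, using the same arithmetic as the pixman code quoted above, not pixman's own API:

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t pixman_fixed_t;
#define pixman_fixed_e ((pixman_fixed_t) 1)
#define pixman_fixed_1 ((pixman_fixed_t) 1 << 16)

/* Nearest: coordinates in (0, pixman_fixed_1] all map to source pixel 0,
 * hence the pixman_fixed_e bias before the floor. */
static int
nearest_pixel (pixman_fixed_t x)
{
    return (x - pixman_fixed_e) >> 16;
}

/* Bilinear: subtract half a pixel; the integer part is the left (lower)
 * of the two contributing pixels.  At exactly pixman_fixed_1 / 2 the
 * fractional part is zero, so the right-hand neighbour has weight 0. */
static int
bilinear_left_pixel (pixman_fixed_t x)
{
    return (x - pixman_fixed_1 / 2) >> 16;
}
```

So a nearest coordinate of exactly pixman_fixed_1 still belongs to pixel 0, while the first coordinate belonging to pixel 1 is pixman_fixed_1 + pixman_fixed_e, which is the counter-intuitive convention being described.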
Re: [Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)
On Wed, 26 Aug 2015 09:46:49 +0100, Pekka Paalanen ppaala...@gmail.com wrote: It's clearly controversial to add C fast paths, because it affects all targets that don't have an asm fast path for the same, and we cannot tell by just review whether it is going to be for better or (much) worse.

Yes, it's always going to be a risk changing the cross-platform functions. Few developers are going to be able to test all the supported platforms, so we're always going to need help checking performance. Should that be a reason not to even try C fast paths though? It would be good to get some understanding of why VMX appears not to benefit from any sort of over_n_8888 fast path, considering that so many other platforms have added one. The information might be useful in designing any future cross-platform code.

Rpi should not be affected if we drop the C fast path patch, there is still the asm fast path patch. Right?

Yes, any ARM targets (ARMv6 or ARMv7) would use the ARMv6 implementation in preference to the C one.

Ben
Re: [Pixman] [PATCH 0/7] Cover scaling patches request for assistance
On Wed, 26 Aug 2015 10:36:06 +0100, Pekka Paalanen ppaala...@gmail.com wrote: On Mon, 24 Aug 2015 21:41:59 +0100 Ben Avison bavi...@riscosopen.org wrote: Towards this goal, the following patches were posted to the list - and they seem to have escaped Patchwork's notice: http://lists.freedesktop.org/archives/pixman/2015-May/003644.html http://lists.freedesktop.org/archives/pixman/2015-May/003645.html http://lists.freedesktop.org/archives/pixman/2015-May/003646.html these three are marked as Changes requested, so they are on my plate, waiting to be re-sent. OK, I realise now I was confused by the default filter on Patchwork, which hid all those patches. I see how to find them now. http://lists.freedesktop.org/archives/pixman/2015-May/003678.html This one is http://patchwork.freedesktop.org/patch/50516/ That has actions on me, I see. Hmm, I seem to have forgotten about that one too. I was sort of waiting for a reply to http://lists.freedesktop.org/archives/pixman/2015-June/003728.html and then forgot and wandered off to vacation. I think I was probably waiting for updates on the three related patches of yours... deadlock! Do I understand that right, that this series supersedes: http://patchwork.freedesktop.org/patch/49937/ Hmm, is that the only one? Yes, I've just updated its status in Patchwork, now I understand a bit better how it works. It turned out to be a bit of a can of worms, that patch - it was responsible for the cover-test program and all your fence malloc stuff as well. Ben
Re: [Pixman] [PATCH] Resolve implementation-defined behaviour for division rounded to -infinity
On Wed, 26 Aug 2015 10:22:22 +0100, Siarhei Siamashka siarhei.siamas...@gmail.com wrote: On Fri, 14 Aug 2015 15:06:07 +0100 Ben Avison bavi...@riscosopen.org wrote: The previous implementations of DIV and MOD relied upon the built-in / and % operators performing round-to-zero. This is true for C99, but rounding is implementation-defined for C89 when divisor and/or dividend is negative Do you have a practical example of an existing C89 compiler, which differs from C99 when handling '/' and '%' operators? No, but I'd have thought it was bad practice to assume C99 behaviour when compiling C89. The absence of any discussion in the accompanying comments made me wonder if the author was aware of the issue - it's certainly relevant when the whole point of those macros is to force a particular rounding direction when dealing with negative numbers. My understanding is that C compilers just used to emit a single division instruction as implemented by the target processor. This provides the best performance, but is not perfectly portable in theory. But in practice, after decades of evolution, all the remaining (major) CPU architectures happen to handle integer division rounding in the same way (round-to-zero). And C99 just promoted the de-facto standard to the official standard status (regardless of whatever was the official justification). Don't forget not all architectures even have divide instructions - ARM only just started regularly implementing them (at least for its A profile CPUs) at the Cortex-A15. Typically it's then the job of a runtime support function to do the division, and who's to say how that works? I'm not saying that this patch is wrong, but how can we be sure that it is doing the right thing? The new variant of this code still relies on / and % operators (which are implementation-defined) and uses them with negative numbers. A more in-depth explanation would be useful OK, a bit of context. 
I was working on some iterator code (not yet posted) which supported tiled repeat, which caused me to inspect the repeat() function in pixman-inlines.h, which in turn uses the MOD macro. This immediately set off alarm bells in my head, because I'd encountered the woolly definition in C89 several years ago.

I confess that my approach to writing a correct algorithm was to hit Google and adapt one of the first well-reasoned hits I encountered into macro form, to let somebody else do the hard work. The page I used was http://www.microhowto.info/howto/round_towards_minus_infinity_when_dividing_integers_in_c_or_c++.html which cites a couple of other references. As described on that page, the approach it uses will work irrespective of the rounding rules employed by the built-in / and % operators.

If we are really worried about rounding in general, should we review all the uses of / and % operators in the code too? And for the sake of consistency introduce new macro variants, which are rounding towards zero?

Arguably, maybe we should aspire to, yes. Whenever the dividend and divisor are both positive, there's no ambiguity, and this should rule out the vast majority of cases that don't already need round-to-minus-infinity, I'd expect. Even if this isn't done immediately, it wouldn't prevent us from getting the minus-infinity case right now. Of course, there is an easy fix, which is to say that Pixman has to be compiled in C99 mode, or at least to wrap different implementations in #if __STDC_VERSION__ >= 199901L.

There is also additional concern about the performance. Is the new code slower/faster than the old one?

I hadn't really investigated that, but having had a bit of a play with ARM GCC, I see that it fails to use the runtime function that returns both the quotient and remainder (__aeabi_idivmod) with the operations in macro form.
I get more luck writing them as functions:

    inline int
    DIV (int a, int b)
    {
        int q = a / b;
        int r = a - q * b;
        return q - (r < 0);
    }

    inline int
    MOD (int a, int b)
    {
        int r = a % b;
        if (r < 0)
            r += b;
        return r;
    }

with the caveat that these are based on the macros from my 2015-08-18 post, which rely on b being positive. (Set aside for the moment whether an inline function with an all-caps name is a good idea...)

Ben
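For completeness, here is a self-contained, compilable version of that approach. The names are mine, not pixman's; like the macros under discussion it assumes b > 0, but it produces floored results whichever way the built-in '/' happens to round:

```c
#include <assert.h>

/* Quotient rounded towards minus infinity, assuming b > 0.  Under C89 the
 * built-in division may round towards zero or towards minus infinity; the
 * correction derived from the remainder's sign handles both behaviours. */
static int
div_floor (int a, int b)
{
    int q = a / b;
    int r = a - q * b;
    return q - (r < 0);
}

/* Matching modulus: result is always in [0, b). */
static int
mod_floor (int a, int b)
{
    int r = a % b;
    if (r < 0)
        r += b;
    return r;
}
```

For example div_floor (-7, 3) is -3 and mod_floor (-7, 3) is 2, preserving the identity a == div_floor (a, b) * b + mod_floor (a, b) — which is exactly what a tiled-repeat iterator needs when coordinates go negative.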
Re: [Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)
On Tue, 25 Aug 2015 13:45:48 +0100, Oded Gabbay oded.gab...@gmail.com wrote: [exposing general_composite_rect]

I can't say that any cleaner solution has occurred to me since then.

I think the more immediate solution, as Søren suggested on IRC, is for me to implement the equivalent fast-path in VMX. I see that it is already implemented in mmx, sse2, mips-dspr2 and arm-neon. From looking at the C code, I'm guessing that it is fairly simple to implement.

Yes, it's definitely one of the simpler fast paths, with only two channels to worry about (source and destination) and with one of them being a constant. I wrote an arm-simd version as well, to add to your list - it's just that it's still waiting to be committed :) I probably ought to get round to exposing general_composite_rect sooner rather than later anyway - it's one of the few things from my mammoth patch series last year that Søren commented on and which I haven't got round to revising yet.

I just had a quick look at the VMX source file, and it has hardly any iters defined. My guess would be that what's being used is

    noop_init_solid_narrow() from pixman-noop.c
    _pixman_iter_get_scanline_noop() from pixman-utils.c
    combine_src_u() from pixman-combine32.c

I run perf on lowlevel-blt-bench over_n_8888 and what I got is:

    - 48.71%  48.68%  lowlevel-blt-be  lowlevel-blt-bench  [.] vmx_combine_over_u_no_mask
        - vmx_combine_over_u_no_mask

Sorry, my mistake - for some reason I must have thought we were dealing with src_n_8888 rather than over_n_8888. If you can beat the C version using a solid fetcher (which fills a temporary buffer the size of the row with a constant pixel value) and an optimised OVER combiner, then you should be able to do better still if you cut out the temporary buffer and keep the solid colour in registers.

Presumably for patch 3 of this series (over_n_0565) you wouldn't see the same effect, as that can't be achieved using memcpy().

Where is that patch? I didn't see it in the mailing list.
My bad again - in my mind, the patches for over_n_8888 and over_n_0565 in C and ARMv6 assembly were a group of four and I overlooked the fact that when Pekka split them in order to make the benchmarks more robust, he only reposted the over_n_8888 ones. My original over_n_0565 patches are here:

http://patchwork.freedesktop.org/patch/49902/
http://patchwork.freedesktop.org/patch/49903/

Ben
[Pixman] [PATCH 1/4] Expose general_composite_rect outside pixman-general.c
---
 pixman/pixman-general.c |    8 
 pixman/pixman-private.h |    9 +
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
index c8f272b..bb14b5e 100644
--- a/pixman/pixman-general.c
+++ b/pixman/pixman-general.c
@@ -123,9 +123,9 @@ operator_needs_division (pixman_op_t op)
     return needs_division[op];
 }
 
-static void
-general_composite_rect (pixman_implementation_t *imp,
-                        pixman_composite_info_t *info)
+void
+_pixman_general_composite_rect (pixman_implementation_t *imp,
+                                pixman_composite_info_t *info)
 {
     PIXMAN_COMPOSITE_ARGS (info);
     uint8_t stack_scanline_buffer[3 * SCANLINE_BUFFER_LENGTH];
@@ -240,7 +240,7 @@ general_composite_rect (pixman_implementation_t *imp,
 static const pixman_fast_path_t general_fast_path[] =
 {
-    { PIXMAN_OP_any, PIXMAN_any, 0, PIXMAN_any, 0, PIXMAN_any, 0, general_composite_rect },
+    { PIXMAN_OP_any, PIXMAN_any, 0, PIXMAN_any, 0, PIXMAN_any, 0, _pixman_general_composite_rect },
     { PIXMAN_OP_NONE }
 };

diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
index a8298ac..d2fbd53 100644
--- a/pixman/pixman-private.h
+++ b/pixman/pixman-private.h
@@ -573,6 +573,15 @@ _pixman_implementation_iter_init (pixman_implementation_t *imp,
                                   iter_flags_t flags,
                                   uint32_t image_flags);
 
+/* Provide a grubby back door to the fetcher / combiner system for
+ * use by implementations where we know the iterators are faster than
+ * pixman-fast-path.c (or more generally, any other lower-priority
+ * implementation in the chain) for a specific operation.
+ */
+void
+_pixman_general_composite_rect (pixman_implementation_t *imp,
+                                pixman_composite_info_t *info);
+
 /* Specific implementations */
 pixman_implementation_t *
 _pixman_implementation_create_general (void);
--
1.7.5.4
[Pixman] [PATCH 2/4] armv6: Add nearest-scaled-cover src_8888_8888 fast path
This patch implements a shortcut so that the nearest scaled cover fetcher iterator is used in preference to the fast path in pixman-fast-path.c, or even the fast path defined using PIXMAN_ARM_BIND_SCALED_NEAREST_SRC_DST in pixman-arm-simd.c. This is because the fetcher performs better than the fast paths.

This is presented as an alternative to the patch I posted last year which used macroised C wrappers around the fetcher, rather than reusing general_composite_rect. Judging by the following benchmarks:

lowlevel-blt-bench -n src_8888_8888

        Before          Old patch       New patch       Change
      Mean  StdDev    Mean  StdDev    Mean  StdDev     Old     New
L1   116.8   4.15     72.1   1.24     53.9   0.69    -38.3%  -53.8%
L2    35.9   2.26     43.8   1.46     38.0   1.12    +22.1%   +6.1%
M     34.9   0.11     62.1   0.29     47.7   0.25    +77.6%  +36.5%
HT    18.4   0.14     26.4   0.31     17.8   0.57    +43.7%   -3.2%
VT    17.7   0.13     24.7   0.23     17.2   0.50    +39.3%   -3.3%
R     16.9   0.07     23.4   0.23     16.5   0.46    +38.4%   -2.7%
RT     8.0   0.09      9.2   0.28      5.8   0.23    +14.6%  -28.4%

I find it hard to advocate this patch, even though it is somewhat simpler.
---
 pixman/pixman-arm-common.h |    9 +
 pixman/pixman-arm-simd.c   |    7 +++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-common.h b/pixman/pixman-arm-common.h
index 748e933..f970868 100644
--- a/pixman/pixman-arm-common.h
+++ b/pixman/pixman-arm-common.h
@@ -494,6 +494,15 @@ cputype##_combine_##name##_u (pixman_implementation_t *imp, \
                              FAST_PATH_X_UNIT_POSITIVE | \
                              FAST_PATH_Y_UNIT_ZERO)
 
+#define PIXMAN_ARM_NEAREST_SCALED_COVER_SRC_DST_FAST_PATH_VIA_ITER(op,s,d,func) \
+{   PIXMAN_OP_ ## op, \
+    PIXMAN_ ## s, \
+    PIXMAN_ARM_NEAREST_SCALED_COVER_FLAGS, \
+    PIXMAN_null, 0, \
+    PIXMAN_ ## d, FAST_PATH_STD_DEST_FLAGS, \
+    _pixman_general_composite_rect \
+}
+
 /*/
 /* Support for scaled bilinear fetchers */

diff --git a/pixman/pixman-arm-simd.c b/pixman/pixman-arm-simd.c
index c37cec8..a72f9da 100644
--- a/pixman/pixman-arm-simd.c
+++ b/pixman/pixman-arm-simd.c
@@ -399,6 +399,13 @@ static const pixman_fast_path_t arm_simd_fast_paths[] =
     PIXMAN_STD_FAST_PATH_CA (OVER, solid, a8b8g8r8, a8b8g8r8, armv6_composite_over_n_8888_8888_ca),
     PIXMAN_STD_FAST_PATH_CA (OVER, solid, a8b8g8r8, x8b8g8r8, armv6_composite_over_n_8888_8888_ca),
 
+    PIXMAN_ARM_NEAREST_SCALED_COVER_SRC_DST_FAST_PATH_VIA_ITER (SRC, a8r8g8b8, a8r8g8b8, src_8888_8888),
+    PIXMAN_ARM_NEAREST_SCALED_COVER_SRC_DST_FAST_PATH_VIA_ITER (SRC, a8r8g8b8, x8r8g8b8, src_8888_8888),
+    PIXMAN_ARM_NEAREST_SCALED_COVER_SRC_DST_FAST_PATH_VIA_ITER (SRC, x8r8g8b8, x8r8g8b8, src_8888_8888),
+    PIXMAN_ARM_NEAREST_SCALED_COVER_SRC_DST_FAST_PATH_VIA_ITER (SRC, a8b8g8r8, a8b8g8r8, src_8888_8888),
+    PIXMAN_ARM_NEAREST_SCALED_COVER_SRC_DST_FAST_PATH_VIA_ITER (SRC, a8b8g8r8, x8b8g8r8, src_8888_8888),
+    PIXMAN_ARM_NEAREST_SCALED_COVER_SRC_DST_FAST_PATH_VIA_ITER (SRC, x8b8g8r8, x8b8g8r8, src_8888_8888),
+
     SIMPLE_NEAREST_FAST_PATH (SRC, r5g6b5, r5g6b5, armv6_0565_0565),
     SIMPLE_NEAREST_FAST_PATH (SRC, b5g6r5, b5g6r5, armv6_0565_0565),
--
1.7.5.4
[Pixman] [PATCH 4/4] armv6: Add nearest-scaled-cover src_0565_0565 fast path
This is adapted from the nearest scaled cover scanline fetcher, modified to pack output data in 16-bit units. This fetcher out-performs both the fast path defined using PIXMAN_ARM_BIND_SCALED_NEAREST_SRC_DST in pixman-arm-simd.c and the fast path in pixman-fast-path.c. Since the two preceding patches no longer defined a macroised C wrapper we can use, and general_composite_rect is no use to us here (we don't want to do pixel format conversion twice) the C wrapper has been written out longhand.

Unsurprisingly, the results are similar to last year's version of the patch:

lowlevel-blt-bench -n src_0565_0565

        Before          Old patch       New patch       Change
      Mean  StdDev    Mean  StdDev    Mean  StdDev     Old     New
L1   118.6   3.12     71.0   1.32     73.5   1.18    -40.1%  -38.0%
L2    42.1   0.73     52.6   2.44     52.1   2.00    +25.1%  +23.7%
M     42.1   0.15     69.3   0.10     69.3   0.15    +64.9%  +64.8%
HT    24.4   0.35     29.2   0.33     29.5   0.24    +19.4%  +20.9%
VT    23.0   0.24     27.4   0.29     27.7   0.35    +19.3%  +20.6%
R     20.8   0.20     25.3   0.32     25.7   0.18    +21.4%  +23.2%
RT     9.1   0.25      9.3   0.24      9.7   0.24     +1.7%   +6.7%

---
 pixman/pixman-arm-common.h          |    9 +
 pixman/pixman-arm-simd-asm-scaled.S |    4 ++
 pixman/pixman-arm-simd-asm-scaled.h |   69 ++-
 pixman/pixman-arm-simd.c            |   35 ++
 4 files changed, 108 insertions(+), 9 deletions(-)

diff --git a/pixman/pixman-arm-common.h b/pixman/pixman-arm-common.h
index f970868..59190f0 100644
--- a/pixman/pixman-arm-common.h
+++ b/pixman/pixman-arm-common.h
@@ -494,6 +494,15 @@ cputype##_combine_##name##_u (pixman_implementation_t *imp, \
                              FAST_PATH_X_UNIT_POSITIVE | \
                              FAST_PATH_Y_UNIT_ZERO)
 
+#define PIXMAN_ARM_NEAREST_SCALED_COVER_SRC_DST_FAST_PATH(cputype,op,s,d,func) \
+{   PIXMAN_OP_ ## op, \
+    PIXMAN_ ## s, \
+    PIXMAN_ARM_NEAREST_SCALED_COVER_FLAGS, \
+    PIXMAN_null, 0, \
+    PIXMAN_ ## d, FAST_PATH_STD_DEST_FLAGS, \
+    cputype ## _composite_nearest_scaled_cover_ ## func \
+}
+
 #define PIXMAN_ARM_NEAREST_SCALED_COVER_SRC_DST_FAST_PATH_VIA_ITER(op,s,d,func) \
 { PIXMAN_OP_ ## op, \
 PIXMAN_ ## s, \

diff --git a/pixman/pixman-arm-simd-asm-scaled.S b/pixman/pixman-arm-simd-asm-scaled.S
index 0116889..24c1a27 100644
--- a/pixman/pixman-arm-simd-asm-scaled.S
+++ b/pixman/pixman-arm-simd-asm-scaled.S
@@ -170,6 +170,10 @@
 generate_nearest_scaled_cover_function \
     pixman_get_scanline_nearest_scaled_cover_x8r8g8b8_asm_armv6, 32, \
     2, 3 /* prefetch distances */, nop_macro, convert_x888_8888
 
+generate_nearest_scaled_cover_function \
+    pixman_get_scanline_r5g6b5_nearest_scaled_cover_r5g6b5_asm_armv6, 16, \
+    2, 0 /* prefetch distances */, nop_macro, nop_macro, 16
+
 .macro init_ge
     msr CPSR_s, #0x5
 .endm

diff --git a/pixman/pixman-arm-simd-asm-scaled.h b/pixman/pixman-arm-simd-asm-scaled.h
index 660797d..e642e7f 100644
--- a/pixman/pixman-arm-simd-asm-scaled.h
+++ b/pixman/pixman-arm-simd-asm-scaled.h
@@ -94,7 +94,12 @@
 .macro nearest_scaled_cover_enlarge_nomask_innerloop bpp, reg, convert, mask_hint, may_be_final, exit_label, store
        adds    ACCUM, ACCUM, UX
+ .if PIXEL_MERGE_OFFSET == 0
        mov     \reg, PIXEL
+ .else
+       orr     \reg, \reg, PIXEL, lsl #PIXEL_MERGE_OFFSET
+ .endif
+ .set PIXEL_MERGE_OFFSET, (PIXEL_MERGE_OFFSET + out_bpp) & 31
        \store
        branch  cc, \exit_label, 1203f
 .ifnc \may_be_final,
@@ -158,10 +163,20 @@
        mov     TMP, XHI
        adds    XLO, XLO, UX, lsl #16
        adc     XHI, XHI, UX, lsr #16
+ .if PIXEL_MERGE_OFFSET == 0
        ldrx    \bpp,, \reg, [PTR]
+ .else
+       ldrx    \bpp,, PIXEL2, [PTR]
+ .endif
        eor     TMP, TMP, XHI
        bics    TMP, TMP, #255/\bpp
+ .if PIXEL_MERGE_OFFSET == 0
        \convert \reg, TMP
+ .else
+       \convert PIXEL2, TMP
+       orr     \reg, \reg, PIXEL2, lsl #PIXEL_MERGE_OFFSET
+ .endif
+ .set PIXEL_MERGE_OFFSET, (PIXEL_MERGE_OFFSET + out_bpp) & 31
        \store
        branch  eq, \exit_label, 1403f
        subs    PLDS, PLDS, #32
@@ -185,7 +200,14 @@
        \inner_loop \bpp, WK0, \convert, mask_is_0, 1, 1503f, add DST, DST, #4
        b       1503f
 .endif
+ .set PIXEL_MERGE_OFFSET, 0
+ .if out_bpp == 32
 1502:  \inner_loop \bpp, WK0, \convert, mask_is_non_0, 1,, str WK0, [DST], #4
+ .elseif out_bpp == 16
+1502:  \inner_loop \bpp, WK0, \convert, mask_is_non_0, 1,, strh
Re: [Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)
On Mon, 24 Aug 2015 15:20:22 +0100, Oded Gabbay oded.gab...@gmail.com wrote: I tested the patch on POWER8, ppc64le. make check passes, but when I benchmarked the performance using lowlevel-blt-bench over_n_8888, I got far worse results than without this patch (see numbers below). Apparently, the new C fast-path takes precedence over some vmx combine fast-paths, thus making performance worse instead of better.

If that's true, it wouldn't be the first time that this issue has arisen. Pixman is designed to scan all fast paths (i.e. routines which take source, mask and destination images and perform the whole operation themselves) irrespective of how well tuned they are for a given platform, before it starts looking at iter routines (each of which reads or writes only one of source, mask or destination). Previously, in response to my attempt to work around a case where this was happening, Søren wrote:

[...] the longstanding problem that pixman sometimes won't pick the fastest code for some operations. In this case the general iter based code will be faster than the C code in pixman-fast-path.c because the iter code will use assembly fetchers. As a result you end up with a bunch of partial reimplementations of general_composite_rect() inside pixman-arm-simd.c. Maybe we need to admit failure and make general_composite_rect() available for other implementations to use. Essentially we would officially provide a way for implementations to say: My iterators are faster than pixman-fast-path.c for these specific operations, so just go directly to general_composite_rect(). It's definitely not a pleasant thing to do, but given that nobody is likely to have the time/skill combination required to do the necessary redesign of pixman's fast path system, it might still be preferable to do this instead of duplicating the code like these patches do. With a setup like that, we could also fix the same issue for the bilinear SSSE3 fetchers and possibly other cases.
(ref http://lists.freedesktop.org/archives/pixman/2014-October/003457.html)

I can't say that any cleaner solution has occurred to me since then. I just had a quick look at the VMX source file, and it has hardly any iters defined. My guess would be that what's being used is

    noop_init_solid_narrow() from pixman-noop.c
    _pixman_iter_get_scanline_noop() from pixman-utils.c
    combine_src_u() from pixman-combine32.c

this last one would be responsible for the bulk of the runtime - and it palms most of the work off on memcpy(). Presumably the PPC memcpy is super-optimised?

Without the patch:

    reference memcpy speed = 25711.7MB/s (6427.9MP/s for 32bpp fills)
    ---
    over_n_8888: PIXMAN_OP_OVER, src a8r8g8b8 solid, mask null, dst a8r8g8b8
    ---
    over_n_8888 = L1: 572.29  L2:1038.08  M:1104.10 ( 17.18%)  HT:447.45  VT:520.82  R:407.92  RT:148.90 (1100Kops/s)

There's something a bit odd about that - it's slower when working within the caches (especially the L1 cache) than when operating on main memory. I'd hazard a guess that memcpy() is using some hardware acceleration that lives between the L1 and L2 caches.

Presumably for patch 3 of this series (over_n_0565) you wouldn't see the same effect, as that can't be achieved using memcpy().

Ben
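For anyone wanting to see concretely what a scalar over_n_8888 (solid source OVER an a8r8g8b8 destination) has to compute per pixel, here is a hedged sketch. The helper names are mine rather than pixman's, and it assumes valid premultiplied-alpha pixels, as pixman uses throughout:

```c
#include <assert.h>
#include <stdint.h>

/* Multiply all four 8-bit channels of x by a, dividing by 255 with
 * rounding via the usual (t + (t >> 8) + 0x80) >> 8 trick, processing
 * two channels at a time. */
static uint32_t
mul_8888_8 (uint32_t x, uint8_t a)
{
    uint32_t rb = (x & 0x00ff00ff) * a + 0x00800080;
    uint32_t ag = ((x >> 8) & 0x00ff00ff) * a + 0x00800080;

    rb = (rb + ((rb >> 8) & 0x00ff00ff)) >> 8;
    ag = (ag + ((ag >> 8) & 0x00ff00ff)) >> 8;

    return ((ag & 0x00ff00ff) << 8) | (rb & 0x00ff00ff);
}

/* OVER with a solid source: dst = src + dst * (255 - src.alpha) / 255.
 * With valid premultiplied pixels no channel sum can exceed 255, so a
 * plain 32-bit add cannot carry between channels. */
static void
over_n_8888_row (uint32_t src, uint32_t *dst, int width)
{
    uint8_t ialpha = 255 - (src >> 24);
    int i;

    for (i = 0; i < width; i++)
        dst[i] = src + mul_8888_8 (dst[i], ialpha);
}
```

The point of a dedicated fast path is visible here: src and ialpha can live in registers for the whole row, instead of a solid fetcher first materialising a scanline-sized buffer of the constant colour for a generic combiner to consume.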
[Pixman] [PATCH 5/7] armv7/mips/sse2: Fix bounds violations in bilinear cover scaled fast paths
In functions named fast_composite_scaled_bilinear... as defined by FAST_BILINEAR_MAINLOOP_INT, there are multiple calls to scanline_func (a macro parameter) which is often implemented in assembly, and which takes as parameters two line buffer pointers (which are range checked) and fixed-point initial and incremental pixel offsets into those lines.

When handling edge cases, there are already implicit assumptions that scanline_func reads at most two input pixels horizontally for each output pixel. In practice, implementations typically read exactly two, even when the fractional part of the offset is zero; this leads to array bounds violations for the COVER case when the offset for the final pixel is integer (in other cases, such offsets move us into special handling for repeat zones and so are not affected).

This patch fixes this by special-casing operations which are affected, and plotting the final pixel of each row from a temporary buffer (similar to the manner in which PAD repeat is accomplished).

Tested using cover-test on armv7.
---
 pixman/pixman-inlines.h | 69 ---
 1 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/pixman/pixman-inlines.h b/pixman/pixman-inlines.h
index c3324cf..8a480bd 100644
--- a/pixman/pixman-inlines.h
+++ b/pixman/pixman-inlines.h
@@ -924,6 +924,67 @@ fast_composite_scaled_bilinear ## scale_func_name (pixman_implementation_t *imp,
             src_width_fixed = pixman_int_to_fixed (src_width);                 \
         }                                                                      \
                                                                                \
+        if (PIXMAN_REPEAT_ ## repeat_mode == PIXMAN_REPEAT_COVER)              \
+        {                                                                      \
+            /* Detect cases where a typical scanline function would read beyond array bounds. */ \
+            int32_t last_src_pixel = src_image->bits.width - 1;                \
+            max_vx = v.vector[0] + (width - 1) * unit_x;                       \
+            if (pixman_fixed_to_int (max_vx) == last_src_pixel)                \
+            {                                                                  \
+                src_type_t buf1[2];                                            \
+                src_type_t buf2[2];                                            \
+                src_type_t *src_line_top;                                      \
+                src_type_t *src_line_bottom;                                   \
+                width--;                                                       \
+                while (--height >= 0)                                          \
+                {                                                              \
+                    int weight1, weight2;                                      \
+                    dst = dst_line;                                            \
+                    dst_line += dst_stride;                                    \
+                    vx = v.vector[0];                                          \
+                    if (flags & FLAG_HAVE_NON_SOLID_MASK)                      \
+                    {                                                          \
+                        mask = mask_line;                                      \
+                        mask_line += mask_stride;                              \
+                    }                                                          \
+                                                                               \
+                    y1 = pixman_fixed_to_int (vy);                             \
+                    weight2 = pixman_fixed_to_bilinear_weight (vy);            \
+                    if (weight2)                                               \
+                    {                                                          \
+                        /* both weight1 and weight2 are smaller than BILINEAR_INTERPOLATION_RANGE */ \
+                        y2 = y1 + 1;                                           \
+                        weight1 = BILINEAR_INTERPOLATION_RANGE - weight2;      \
+                    }                                                          \
+                    else
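The shape of that fix can be demonstrated in miniature. The following sketch is not pixman's code: toy_scanline stands in for the assembly scanline_func (including its habit of always loading two adjacent source pixels, and ignoring the second line for brevity), and plot_row_cover mirrors the patch's strategy of peeling off the final pixel into a two-entry temporary buffer:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t src_type_t;

/* Toy horizontal bilinear scanline: like the real assembly, it reads
 * top[x0] and top[x0 + 1] for every output pixel, even when the
 * fractional part of vx is zero (the bottom line is ignored here). */
static void
toy_scanline (uint32_t *dst, const src_type_t *top, const src_type_t *bottom,
              int width, int32_t vx, int32_t unit_x)
{
    int i;

    (void) bottom;
    for (i = 0; i < width; i++, vx += unit_x)
    {
        int      x0   = vx >> 16;
        uint32_t frac = vx & 0xffff;

        dst[i] = (top[x0] * (0x10000 - frac) + top[x0 + 1] * frac) >> 16;
    }
}

/* COVER-case row plotter: if the final output pixel lands exactly on the
 * last source sample, render it separately from a two-entry buffer whose
 * second entry is a harmless duplicate (its weight is zero there), so the
 * scanline function never reads past the end of the source row. */
static void
plot_row_cover (uint32_t *dst, const src_type_t *top, const src_type_t *bottom,
                int src_width, int width, int32_t vx, int32_t unit_x)
{
    int32_t last_vx = vx + (width - 1) * unit_x;

    if ((last_vx >> 16) == src_width - 1)
    {
        src_type_t buf1[2] = { top[src_width - 1],    top[src_width - 1]    };
        src_type_t buf2[2] = { bottom[src_width - 1], bottom[src_width - 1] };

        if (width > 1)
            toy_scanline (dst, top, bottom, width - 1, vx, unit_x);
        toy_scanline (dst + width - 1, buf1, buf2, 1, last_vx & 0xffff, unit_x);
    }
    else
    {
        toy_scanline (dst, top, bottom, width, vx, unit_x);
    }
}
```

With a 1:1 scale starting at offset zero, the last output pixel's accumulator points exactly at the last source sample, so the unconditional two-pixel load in toy_scanline would otherwise read one element past the end of the row.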
[Pixman] [PATCH 0/7] Cover scaling patches request for assistance
Some back story... First there was this patch: http://patchwork.freedesktop.org/patch/49937/

Back last October, Søren had this to say about it:

A concern I have here is that code might access pixels outside the image that have weight 0. Ie., in the bilinear case, some code might attempt to read the pixel at bits[-1] and then multiply it with 0. But we can't assume that bits[-1] is readable. If I remember correctly, the +pixman_fixed_e * 8 stuff was intended to handle this case. I think it would be worthwhile to have a test that uses fence_malloc for the source buffer and the matrix mentioned in the commit.

In fact, the fence_malloc() testing could benefit from being extended in various ways:
- having fence pages both before and after the image
- having fence pages in the 'stride' part of the image

Towards this goal, the following patches were posted to the list - and they seem to have escaped Patchwork's notice:

http://lists.freedesktop.org/archives/pixman/2015-May/003644.html
http://lists.freedesktop.org/archives/pixman/2015-May/003645.html
http://lists.freedesktop.org/archives/pixman/2015-May/003646.html
http://lists.freedesktop.org/archives/pixman/2015-May/003678.html

(note that there were a few minor outstanding points on the first three). This series relies upon the test program implemented by those patches to prove its correctness, so it would be helpful if they could be finished off and committed.

If you look in detail at the patches in this series, you'll see that there's an outstanding change for a single routine - the one and only scanline fetch iter in pixman-ssse3.c, which handles bilinear scaled a8r8g8b8 source images. This could probably be fixed using either of the two methods I used in the other patches, but pixman-fast-path.c is the most elegant. My problem is that I don't know SSSE3 (or any other x86 for that matter) so it would represent a big learning curve for me for the sake of just this one function. Is anyone able to help out?
I've got a load of other scaling-related goodies lined up, but it doesn't
make much sense to post them while all these fundamentals are still
outstanding.

Ben Avison (7):
  Refactor calculation of cover flags
  More accurate FAST_PATH_SAMPLES_COVER_CLIP_NEAREST
  Split FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR flag
  More accurate FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR
  armv7/mips/sse2: Fix bounds violations in bilinear cover scaled fast paths
  pixman-fast-path: Fix bounds violations in bilinear cover fetcher
  test: Make image size calculation match COVER_CLIP definition again

 pixman/pixman-fast-path.c | 35 +
 pixman/pixman-inlines.h   | 63 -
 pixman/pixman-private.h   |  1 +
 pixman/pixman-ssse3.c     |  2 +-
 pixman/pixman.c           | 37 +-
 test/affine-bench.c       | 11 +++-
 6 files changed, 110 insertions(+), 39 deletions(-)

--
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH 6/7] pixman-fast-path: Fix bounds violations in bilinear cover fetcher
A common theme here: the code was unconditionally reading pixels to the
right of (and, in this case, below) the rounded-down accumulator position,
without ensuring that the accumulator didn't point at the rightmost sample
or bottommost line.

The fix for the horizontal case is to negate the accumulator so that the
bounds check is only needed once, at the start of each row. Speed is less
critical vertically, so for each row we test the row number for validity
and force it in range if not.
---
 pixman/pixman-fast-path.c | 37 +
 1 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
index 02bbc5a..3938081 100644
--- a/pixman/pixman-fast-path.c
+++ b/pixman/pixman-fast-path.c
@@ -2280,18 +2280,29 @@ fetch_horizontal (bits_image_t *image, line_t *line,
                   int y, pixman_fixed_t x, pixman_fixed_t ux, int n)
 {
     uint32_t *bits = image->bits + y * image->rowstride;
-    int i;
+    int i, x0, x1;
+    int32_t dist_x;
+    uint32_t left, right;
+
+    x = (1 << (16 - BILINEAR_INTERPOLATION_BITS)) - 1 - x;
+    ux = -ux;
+
+    x0 = pixman_fixed_to_int (x);
+    x1 = x0 + 1;
+    dist_x = pixman_fixed_to_bilinear_weight (x);
+    left = dist_x ? *(bits - x1) : 0;

     for (i = 0; i < n; ++i)
     {
-        int x0 = pixman_fixed_to_int (x);
-        int x1 = x0 + 1;
-        int32_t dist_x;
-
-        uint32_t left = *(bits + x0);
-        uint32_t right = *(bits + x1);
+        if (i > 0)
+        {
+            x0 = pixman_fixed_to_int (x);
+            x1 = x0 + 1;
+            dist_x = pixman_fixed_to_bilinear_weight (x);
+            left = *(bits - x1);
+        }

-        dist_x = pixman_fixed_to_bilinear_weight (x);
+        right = *(bits - x0);
         dist_x <<= (8 - BILINEAR_INTERPOLATION_BITS);

 #if SIZEOF_LONG <= 4
@@ -2301,11 +2312,11 @@ fetch_horizontal (bits_image_t *image, line_t *line,
         lag = (left & 0xff00ff00) >> 8;
         rag = (right & 0xff00ff00) >> 8;
-        ag = (lag << 8) + dist_x * (rag - lag);
+        ag = (rag << 8) + dist_x * (lag - rag);

         lrb = (left & 0x00ff00ff);
         rrb = (right & 0x00ff00ff);
-        rb = (lrb << 8) + dist_x * (rrb - lrb);
+        rb = (rrb << 8) + dist_x * (lrb - rrb);

         *((uint32_t *)(line->buffer + i)) = ag;
         *((uint32_t *)(line->buffer + i) + 1) = rb;
@@ -2323,7 +2334,7 @@ fetch_horizontal (bits_image_t *image, line_t *line,
         lagrb = (((uint64_t)lag) << 24) | lrb;
         ragrb = (((uint64_t)rag) << 24) | rrb;

-        line->buffer[i] = (lagrb << 8) + dist_x * (ragrb - lagrb);
+        line->buffer[i] = (ragrb << 8) + dist_x * (lagrb - ragrb);
     }
 #endif

@@ -2350,6 +2361,8 @@ fast_fetch_bilinear_cover (pixman_iter_t *iter, const uint32_t *mask)

     y0 = pixman_fixed_to_int (info->y);
     y1 = y0 + 1;
+    if (y1 == iter->image->bits.height)
+        y1 = y0;
     dist_y = pixman_fixed_to_bilinear_weight (info->y);
     dist_y <<= (8 - BILINEAR_INTERPOLATION_BITS);

@@ -3187,7 +3200,7 @@ static const pixman_iter_info_t fast_iters[] =
      (FAST_PATH_STANDARD_FLAGS   |
       FAST_PATH_SCALE_TRANSFORM  |
       FAST_PATH_BILINEAR_FILTER  |
-      FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR_APPROX),
+      FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR),
      ITER_NARROW | ITER_SRC,
      fast_bilinear_cover_iter_init,
      NULL, NULL
--
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH 7/7] test: Make image size calculation match COVER_CLIP definition again
Now that we don't need so many excess pixels to ensure both COVER_CLIP fast
path flags are set, remove the 8/65536ths of a pixel border used when
calculating how big to make the source image.
---
 test/affine-bench.c | 11 ---
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/test/affine-bench.c b/test/affine-bench.c
index 9e0121e..73e769b 100644
--- a/test/affine-bench.c
+++ b/test/affine-bench.c
@@ -396,13 +396,10 @@ main (int argc, char *argv[])
     }

     compute_transformed_extents (&binfo.transform, &dest_box, &transformed);
-    /* The source area is expanded by a tiny bit (8/65536th pixel)
-     * to match the calculation of the COVER_CLIP flags in analyze_extent()
-     */
-    xmin = pixman_fixed_to_int (transformed.x1 - 8 * pixman_fixed_e - pixman_fixed_1 / 2);
-    ymin = pixman_fixed_to_int (transformed.y1 - 8 * pixman_fixed_e - pixman_fixed_1 / 2);
-    xmax = pixman_fixed_to_int (transformed.x2 + 8 * pixman_fixed_e + pixman_fixed_1 / 2);
-    ymax = pixman_fixed_to_int (transformed.y2 + 8 * pixman_fixed_e + pixman_fixed_1 / 2);
+    xmin = pixman_fixed_to_int (transformed.x1 - pixman_fixed_1 / 2);
+    ymin = pixman_fixed_to_int (transformed.y1 - pixman_fixed_1 / 2);
+    xmax = pixman_fixed_to_int (transformed.x2 + pixman_fixed_1 / 2 - pixman_fixed_e);
+    ymax = pixman_fixed_to_int (transformed.y2 + pixman_fixed_1 / 2 - pixman_fixed_e);
     binfo.src_x = -xmin;
     binfo.src_y = -ymin;
--
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] test: List more details of the operation when running scaling-test verbosely
On Fri, 21 Aug 2015 11:34:35 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:

  are you forgetting src_x, src_y, dst_x, dst_y? These were printed in the
  old code.

Good spot - yes, not sure how they managed to escape. Something like this
should take care of it:

+    printf ("src_x=%d, src_y=%d, ",
+            src_x, src_y);
+    if (mask_fmt != PIXMAN_null)
+    {
+        printf ("mask_x=%d, mask_y=%d, ",
+                mask_x, mask_y);
+    }
+    printf ("dst_x=%d, dst_y=%d\n",
+            dst_x, dst_y);

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] test: List more details of the operation when running scaling-test verbosely
---
I've been doing a fair bit of work on scaled plots again recently, and I was
finding that scaling-test omitted to mention lots of useful information
(notably whether a mask was used, and what the mask parameters were).

 test/scaling-test.c | 63 +-
 1 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/test/scaling-test.c b/test/scaling-test.c
index e2f7fa9..02bd0d0 100644
--- a/test/scaling-test.c
+++ b/test/scaling-test.c
@@ -73,7 +73,7 @@ test_composite (int testnum,
     pixman_op_t          op;
     pixman_repeat_t      repeat = PIXMAN_REPEAT_NONE;
     pixman_repeat_t      mask_repeat = PIXMAN_REPEAT_NONE;
-    pixman_format_code_t src_fmt, dst_fmt;
+    pixman_format_code_t src_fmt, mask_fmt, dst_fmt;
     uint32_t *           srcbuf;
     uint32_t *           dstbuf;
     uint32_t *           maskbuf;
@@ -145,6 +145,7 @@ test_composite (int testnum,
     prng_randmemset (dstbuf, dst_stride * dst_height, 0);

     src_fmt = get_format (src_bpp);
+    mask_fmt = PIXMAN_a8;
     dst_fmt = get_format (dst_bpp);

     if (prng_rand_n (2))
@@ -169,7 +170,7 @@ test_composite (int testnum,
         src_fmt, src_width, src_height, srcbuf, src_stride);

     mask_img = pixman_image_create_bits (
-        PIXMAN_a8, mask_width, mask_height, maskbuf, mask_stride);
+        mask_fmt, mask_width, mask_height, maskbuf, mask_stride);

     dst_img = pixman_image_create_bits (
         dst_fmt, dst_width, dst_height, dstbuf, dst_stride);
@@ -255,21 +256,6 @@ test_composite (int testnum,
     else
         pixman_image_set_filter (mask_img, PIXMAN_FILTER_BILINEAR, NULL, 0);

-    if (verbose)
-    {
-        printf ("src_fmt=%s, dst_fmt=%s\n",
-                format_name (src_fmt), format_name (dst_fmt));
-        printf ("op=%s, scale_x=%d, scale_y=%d, repeat=%d\n",
-                operator_name (op), scale_x, scale_y, repeat);
-        printf ("translate_x=%d, translate_y=%d\n",
-                translate_x, translate_y);
-        printf ("src_width=%d, src_height=%d, dst_width=%d, dst_height=%d\n",
-                src_width, src_height, dst_width, dst_height);
-        printf ("src_x=%d, src_y=%d, dst_x=%d, dst_y=%d\n",
-                src_x, src_y, dst_x, dst_y);
-        printf ("w=%d, h=%d\n", w, h);
-    }
-
     if (prng_rand_n (8) == 0)
     {
         pixman_box16_t clip_boxes[2];
@@ -352,10 +338,42 @@ test_composite (int testnum,
     }

     if (prng_rand_n (2) == 0)
-        pixman_image_composite (op, src_img, NULL, dst_img,
-                                src_x, src_y, 0, 0, dst_x, dst_y, w, h);
-    else
-        pixman_image_composite (op, src_img, mask_img, dst_img,
+    {
+        mask_fmt = PIXMAN_null;
+        pixman_image_unref (mask_img);
+        mask_img = NULL;
+        mask_x = 0;
+        mask_y = 0;
+    }
+
+    if (verbose)
+    {
+        printf ("op=%s, src_fmt=%s, mask_fmt=%s, dst_fmt=%s\n",
+                operator_name (op), format_name (src_fmt),
+                format_name (mask_fmt), format_name (dst_fmt));
+        printf ("scale_x=%d, scale_y=%d, repeat=%d, filter=%d\n",
+                scale_x, scale_y, repeat, src_img->common.filter);
+        printf ("translate_x=%d, translate_y=%d\n",
+                translate_x, translate_y);
+        if (mask_fmt != PIXMAN_null)
+        {
+            printf ("mask_scale_x=%d, mask_scale_y=%d, mask_repeat=%d, mask_filter=%d\n",
+                    mask_scale_x, mask_scale_y, mask_repeat,
+                    mask_img->common.filter);
+            printf ("mask_translate_x=%d, mask_translate_y=%d\n",
+                    mask_translate_x, mask_translate_y);
+        }
+        printf ("src_width=%d, src_height=%d, ",
+                src_width, src_height);
+        if (mask_fmt != PIXMAN_null)
+        {
+            printf ("mask_width=%d, mask_height=%d, ",
+                    mask_width, mask_height);
+        }
+        printf ("dst_width=%d, dst_height=%d\n",
+                dst_width, dst_height);
+        printf ("w=%d, h=%d\n", w, h);
+    }
+
+    pixman_image_composite (op, src_img, mask_img, dst_img,
                             src_x, src_y, mask_x, mask_y, dst_x, dst_y, w, h);

     crc32 = compute_crc32_for_image (0, dst_img);
@@ -364,7 +382,8 @@ test_composite (int testnum,
         print_image (dst_img);

     pixman_image_unref (src_img);
-    pixman_image_unref (mask_img);
+    if (mask_img != NULL)
+        pixman_image_unref (mask_img);
     pixman_image_unref (dst_img);

     if (src_stride < 0)
--
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/4] New fast paths and Raspberry Pi 1 benchmarking
On Thu, 20 Aug 2015 19:34:37 +0100, Bill Spitzak <spit...@gmail.com> wrote:

  Could this be whether some bad instruction ends up next to or split by a
  cache line boundary? That would produce a random-looking plot, though it
  really is a plot of the location of the bad instructions in the measured
  function. If this really is a problem then the ideal fix is for the
  compiler to insert NOP instructions in order to move the bad instructions
  away from the locations that make them bad.

Yike. Thought of that, tried it, still baffled at the results. In other
words, merely ensuring instructions retained the same alignment to
cachelines wasn't enough to ensure reproducibility - it could only be
achieved by ensuring the same absolute address (which isn't an option with
shared libraries in the presence of ASLR).

My best theory at the moment is that the branch predictor in the ARM11 uses
a hash of both the source and destination addresses of a branch to choose
an index into the predictor cache. Because it's a direct-mapped cache, any
collisions due to the branch moving to a different address can have major
effects on very tight loops like src_8888_8888.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] Resolve implementation-defined behaviour for division rounded to -infinity
The previous implementations of DIV and MOD relied upon the built-in / and %
operators performing round-to-zero. This is true for C99, but rounding is
implementation-defined for C89 when the divisor and/or dividend is negative,
and I believe Pixman is still supposed to support C89.
---
 pixman/pixman-private.h | 8
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
index 73108a0..80506be 100644
--- a/pixman/pixman-private.h
+++ b/pixman/pixman-private.h
@@ -889,12 +889,12 @@ pixman_list_move_to_front (pixman_list_t *list, pixman_link_t *link)
 #endif

 /* Integer division that rounds towards -infinity */
-#define DIV(a, b)                                          \
-    ((((a) < 0) == ((b) < 0)) ? (a) / (b) :                \
-     ((a) - (b) + 1 - (((b) < 0) << 1)) / (b))
+#define DIV(a, b) \
+    ((a) / (b) - ((a) % (b) != 0 && ((a) % (b) < 0) != ((b) < 0) ? 1 : 0))

 /* Modulus that produces the remainder wrt. DIV */
-#define MOD(a, b) ((a) < 0 ? ((b) - ((-(a) - 1) % (b))) - 1 : (a) % (b))
+#define MOD(a, b) \
+    ((a) % (b) + ((a) % (b) != 0 && ((a) % (b) < 0) != ((b) < 0) ? (b) : 0))

 #define CLIP(v, low, high) ((v) < (low) ? (low) : ((v) > (high) ? (high) : (v)))
--
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/9] lowlevel-blt-bench improvements for automated testing
On Wed, 10 Jun 2015 14:32:49 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:

  most of the patches are trivial cleanups. The meat are the last two: CSV
  output mode and skipping the memory speed benchmark. Both new features
  are designed for an external benchmarking harness, that runs several
  different versions of Pixman with lowlevel-blt-bench in an alternating
  fashion. Alternating iterations are needed to get reliable results on
  platforms like the Raspberry Pi.

These look like sensible improvements to me. A few minor points:

Patch 2 commit message: Move explanation printing to a new file. This will
help with function

Patch 8: if the aim is machine readability, I'd suggest no space after the
comma. Otherwise if comma is used as a field separator, the first field has
no leading space but all the others do, which may make things a little more
fiddly when post-processing the results with some tools.

Not really your fault, but I noticed it when trying out the new versions:
it doesn't fault an attempt to list more than one pattern on the command
line. It just acts on the last one. It should really either fault it, or
(more usefully) benchmark all of the patterns specified. This would allow a
subset of the 'all' tests to be performed, sharing the same memcpy
measurement overhead, or a group of operations including those not in the
'all' list.

In other respects, happy to give my Reviewed-by:.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] Fwd: pixman clang issue
Forwarding a message received off-list. (Speculation: you need to be
subscribed to the list before messages will be accepted, and Shane isn't?)

--- Forwarded message ---
From: Shane Burrell <sh...@shaneburrell.com>
To: bavi...@riscosopen.org
Subject: pixman clang issue
Date: Thu, 25 Jun 2015 02:57:11 +0100

I'm using the homebrew built version of pixman and it seems that clang
doesn't like the inline assembly at line 100 in pixman-mmx.c. It produces
the following error.

pixman-mmx.c20: error: constraint 'K' expects an integer constant expression
    : "y" (__A), "K" (__N)
          ^~~

The error is referenced by others here.
https://code.google.com/p/nativeclient/issues/detail?id=4201

I attempted to submit a bug report via the list but received a bounce back.
Just trying to reach someone to see if it is something that can be fixed
upstream from homebrew or a patch or disabling-mmx is needed.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] RFC: Pixman benchmark CPU time measurement
On Wed, 03 Jun 2015 08:51:25 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:

  If we fixed gettime() for clock() wraparounds rather than ignoring them,
  I suppose there would be no reason to have gettimei(). Ben?

Well, that would help, but I still don't like the idea of using floating
point in the middle of benchmarking loops when an integer version works
just as well and floating point doesn't really gain you anything. Even the
division by 1000000 is nearly always undone by the time we reach the
printf, because megapixels per second or megabytes per second are practical
units - and those are the same things as pixels or bytes per microsecond.
Nobody is currently doing more to the times than adding or subtracting
them.

I know they're increasingly rare these days, but a machine with no hardware
FPU might take an appreciable time to do the integer-to-floating-point
conversion and floating-point maths. Even if you have an FPU, it might be
powered down on each context switch and only have its state restored lazily
on the first floating-point instruction encountered, resulting in a random
timing element. In both cases this can be avoided by sticking to integers.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] RFC: Pixman benchmark CPU time measurement
On Wed, 03 Jun 2015 08:40:37 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:

  On Tue, 02 Jun 2015 17:03:30 +0100 Ben Avison <bavi...@riscosopen.org> wrote:

    If I were to make one change to gettimei() now, looking back, it would
    be to make it return int32_t instead. This is because most often you'd
    be subtracting two sample outputs of the function, and it's more often
    useful to consider time intervals as signed (say if you're comparing
    the current time against a timeout time which may be in the past or
    the future). If gettimei() returns a signed integer, then C's type
    promotion rules make the result of the subtraction signed
    automatically.

  Wasn't overflow well-defined only for unsigned integers?

You're right, I'd forgotten that aspect of C. To be fully portable, a
modulo-2^32-arithmetic comparison needs to be written

    uint32_t time1, time2;
    if (((time1 - time2) & (1u << 31)) != 0)

rather than

    int32_t time1, time2;
    if ((time1 - time2) < 0)

or

    uint32_t time1, time2;
    if ((int32_t) (time1 - time2) < 0)

even though the latter two are more readable and all three are actually
equivalent if you're using two's complement integers, which (nearly?)
everybody does nowadays. Sometimes I wish C had an inbuilt modulo integer
type. At the assembly level for a lot of CPUs it's a third type of
comparison with similar ease of use as signed and unsigned comparisons, but
it's a pain to express in C.

Anyway, this doesn't change the fact that struct timeval currently uses
signed long ints, and will enter undefined behaviour territory in 2038. I
think it's reasonable to assume that in practice, tv_sec will actually
contain an unsigned long int value (albeit one that has been cast to
signed) after that, as that would break the least software.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH v2] test: Fix solid-test for big-endian targets
Tested-by: Fernando Seiti Furusato <ferse...@linux.vnet.ibm.com> (ppc64le, ppc64, powerpc)
Tested-by: Ben Avison <bavi...@riscosopen.org> (armv6l, armv7l, i686)
---
 test/solid-test.c | 9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/test/solid-test.c b/test/solid-test.c
index 7be5466..c6ea397 100644
--- a/test/solid-test.c
+++ b/test/solid-test.c
@@ -237,7 +237,7 @@ create_solid_image (const pixman_format_code_t *allowed_formats,
         pixman_image_unref (dummy_img);

         /* Now set the bitmap contents to a random value */
-        *buffer = prng_rand ();
+        prng_randmemset (buffer, 4, 0);
         image_endian_swap (img);

         if (used_fmt)
@@ -251,7 +251,10 @@ create_solid_image (const pixman_format_code_t *allowed_formats,
         pixman_color_t color;
         pixman_image_t *img;

-        prng_randmemset (&color, sizeof color, 0);
+        color.alpha = prng_rand_n (UINT16_MAX + 1);
+        color.red   = prng_rand_n (UINT16_MAX + 1);
+        color.green = prng_rand_n (UINT16_MAX + 1);
+        color.blue  = prng_rand_n (UINT16_MAX + 1);

         img = pixman_image_create_solid_fill (&color);

         if (used_fmt)
@@ -345,6 +348,6 @@ main (int argc, const char *argv[])
     }

     return fuzzer_test_main ("solid", 50,
-                             0x1B6DFF8D,
+                             0xC30FD380,
                             test_solid, argc, argv);
 }
--
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [RFC PATCH] solid-test: Allow for big-endian targets
On Fri, 29 May 2015 11:08:49 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:

  I think pixman_color needs to be pre-multiplied just like all other alpha
  formats, which means that the above way to randomize values into a
  pixman_color can produce luminous (or invalid?) colors. Would that raise
  undefined behaviour, or should we expect Pixman to handle also those
  cases always the same?

Yes, when you're dealing with pre-multiplied alpha, you're allowed to have
the RGB components exceed the alpha one, and this equates to a luminous
effect. Pixman uses saturated arithmetic throughout, so it should be able
to handle these (and so they need to be tested).

  Maybe the iteration count (50 in fuzzer_test_main call) should also be
  increased? [versus 2 million in blitters-test]

Well, I kind of settled on that on the basis of the runtime of the test
being similar to most of the other fuzz testers. Blitters-test is one of
the absolute slowest tests, and although it does include roughly the same
number of solid tests as this, they're very skewed towards solid masks
rather than solid sources and only test single-pixel bitmap images, so I
think the test as written is still usefully increasing the test coverage.

Updated patch to follow, as requested.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH v2] arm: Retire PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH
This macro does exactly the same thing as the platform-neutral macro
SIMPLE_NEAREST_FAST_PATH.
---
Pekka is correct in deducing that the only difference here is a few lines
of context in the fast path tables.

 pixman/pixman-arm-common.h |  7 ---
 pixman/pixman-arm-neon.c   | 24
 pixman/pixman-arm-simd.c   | 18 +-
 3 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/pixman/pixman-arm-common.h b/pixman/pixman-arm-common.h
index 3a7cb2b..c368516 100644
--- a/pixman/pixman-arm-common.h
+++ b/pixman/pixman-arm-common.h
@@ -266,13 +266,6 @@
     FAST_NEAREST_MAINLOOP (cputype##_##name##_normal_##op, \
                            scaled_nearest_scanline_##cputype##_##name##_##op, \
                            src_type, dst_type, NORMAL)

-/* Provide entries for the fast path table */
-#define PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH(op,s,d,func) \
-    SIMPLE_NEAREST_FAST_PATH_COVER (op,s,d,func), \
-    SIMPLE_NEAREST_FAST_PATH_NONE (op,s,d,func), \
-    SIMPLE_NEAREST_FAST_PATH_PAD (op,s,d,func), \
-    SIMPLE_NEAREST_FAST_PATH_NORMAL (op,s,d,func)
-
 #define PIXMAN_ARM_BIND_SCALED_NEAREST_SRC_A8_DST(flags, cputype, name, op, \
                                                   src_type, dst_type) \
 void \
diff --git a/pixman/pixman-arm-neon.c b/pixman/pixman-arm-neon.c
index 60e9c78..be761c9 100644
--- a/pixman/pixman-arm-neon.c
+++ b/pixman/pixman-arm-neon.c
@@ -362,21 +362,21 @@ static const pixman_fast_path_t arm_neon_fast_paths[] =
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, a8r8g8b8, neon_composite_out_reverse_8_8888),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, a8b8g8r8, neon_composite_out_reverse_8_8888),

-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, neon_8888_8888),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, neon_8888_8888),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, neon_8888_8888),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, neon_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, neon_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, neon_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, neon_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, neon_8888_8888),

-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, r5g6b5, neon_8888_0565),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, b5g6r5, neon_8888_0565),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, r5g6b5, neon_8888_0565),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, b5g6r5, neon_8888_0565),

-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, a8r8g8b8, r5g6b5, neon_8888_0565),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, x8r8g8b8, r5g6b5, neon_8888_0565),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, a8b8g8r8, b5g6r5, neon_8888_0565),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, x8b8g8r8, b5g6r5, neon_8888_0565),
+    SIMPLE_NEAREST_FAST_PATH (SRC, a8r8g8b8, r5g6b5, neon_8888_0565),
+    SIMPLE_NEAREST_FAST_PATH (SRC, x8r8g8b8, r5g6b5, neon_8888_0565),
+    SIMPLE_NEAREST_FAST_PATH (SRC, a8b8g8r8, b5g6r5, neon_8888_0565),
+    SIMPLE_NEAREST_FAST_PATH (SRC, x8b8g8r8, b5g6r5, neon_8888_0565),

-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, b5g6r5, x8b8g8r8, neon_0565_8888),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, r5g6b5, x8r8g8b8, neon_0565_8888),
+    SIMPLE_NEAREST_FAST_PATH (SRC, b5g6r5, x8b8g8r8, neon_0565_8888),
+    SIMPLE_NEAREST_FAST_PATH (SRC, r5g6b5, x8r8g8b8, neon_0565_8888),
     /* Note: NONE repeat is not supported yet */
     SIMPLE_NEAREST_FAST_PATH_COVER (SRC, r5g6b5, a8r8g8b8, neon_0565_8888),
     SIMPLE_NEAREST_FAST_PATH_COVER (SRC, b5g6r5, a8b8g8r8, neon_0565_8888),
diff --git a/pixman/pixman-arm-simd.c b/pixman/pixman-arm-simd.c
index fa1ab5c..f40ff36 100644
--- a/pixman/pixman-arm-simd.c
+++ b/pixman/pixman-arm-simd.c
@@ -260,15 +260,15 @@ static const pixman_fast_path_t arm_simd_fast_paths[] =
     PIXMAN_STD_FAST_PATH_CA (OVER, solid, a8b8g8r8, a8b8g8r8, armv6_composite_over_n_8888_8888_ca),
     PIXMAN_STD_FAST_PATH_CA (OVER, solid, a8b8g8r8, x8b8g8r8, armv6_composite_over_n_8888_8888_ca),

-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, r5g6b5, r5g6b5, armv6_0565_0565),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, b5g6r5, b5g6r5, armv6_0565_0565),
-
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, a8r8g8b8, a8r8g8b8, armv6_8888_8888),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, a8r8g8b8, x8r8g8b8, armv6_8888_8888),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, x8r8g8b8, x8r8g8b8, armv6_8888_8888),
-    PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (SRC, a8b8g8r8, a8b8g8r8, armv6_8888_8888),
-
Re: [Pixman] [PATCH] mmx/sse2: Use SIMPLE_NEAREST_SOLID_MASK_FAST_PATH macro
On Wed, 27 May 2015 12:24:58 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:

  with summary fixed, this patch seems ok to me (it does not change the
  content), but it does reorder the rows. I would assume the relative
  ordering change in this case does not matter, but can someone confirm
  that?

Yes, there is some reordering, but the only significant thing to ensure
that the same routine is chosen is that a COVER fast path for a given
combination of operator and source/destination pixel formats must precede
all the variants of repeated fast paths for the same combination. This
patch (and the other mmx/sse2 one) still follows that rule.

I believe that in every other case, the sets of operations that match any
pair of fast paths that are reordered in these patches are mutually
exclusive. While there will be a very subtle timing difference due to the
distance through the table we have to search to find a match (sometimes
faster, sometimes slower), there is no evidence that the tables have been
carefully ordered by frequency of occurrence - just for ease of
copy-and-pasting.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [RFC PATCH] solid-test: Allow for big-endian targets
On Tue, 26 May 2015 19:16:32 +0100, I wrote:

  +color.alpha = prng_rand_n (UINT16_MAX);
  +color.red = prng_rand_n (UINT16_MAX);
  +color.green = prng_rand_n (UINT16_MAX);
  +color.blue = prng_rand_n (UINT16_MAX);

Oops, those should have been UINT16_MAX + 1; the argument is used as a
modulus, not as a bitmask. The corresponding checksum is then 0xC30FD380
for me on ARM and x86 (both little-endian of course).

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] test: Add cover-test
This test aims to verify both numerical correctness and the honouring of
array bounds for scaled plots (both nearest-neighbour and bilinear) at or
close to the boundary conditions for applicability of cover type fast paths
and iter fetch routines.

It has a secondary purpose: by setting the env var EXACT (to any value) it
will only test plots that are exactly on the boundary condition. This makes
it possible to ensure that cover routines are being used to the maximum,
although this requires the use of a debugger or code instrumentation to
verify.
---
Note that this must be pushed after Pekka's fence-image patches.

 test/Makefile.sources |   1 +
 test/cover-test.c     | 376 +
 2 files changed, 377 insertions(+), 0 deletions(-)
 create mode 100644 test/cover-test.c

diff --git a/test/Makefile.sources b/test/Makefile.sources
index 14a3710..5b901db 100644
--- a/test/Makefile.sources
+++ b/test/Makefile.sources
@@ -26,6 +26,7 @@ TESTPROGRAMS = \
 	glyph-test	\
 	solid-test	\
 	stress-test	\
+	cover-test	\
 	blitters-test	\
 	affine-test	\
 	scaling-test	\
diff --git a/test/cover-test.c b/test/cover-test.c
new file mode 100644
index 000..fb173b2
--- /dev/null
+++ b/test/cover-test.c
@@ -0,0 +1,376 @@
+/*
+ * Copyright © 2015 RISC OS Open Ltd
+ *
+ * Permission to use, copy, modify, distribute, and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the name of the copyright holders not be used in
+ * advertising or publicity pertaining to distribution of the software without
+ * specific, written prior permission. The copyright holders make no
+ * representations about the suitability of this software for any purpose. It
+ * is provided "as is" without express or implied warranty.
+ *
+ * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
+ * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
+ * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
+ * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
+ * SOFTWARE.
+ *
+ * Author: Ben Avison (bavi...@riscosopen.org)
+ *
+ */
+
+#include "utils.h"
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Approximate limits for random scale factor generation - these ensure we can
+ * get at least 8x reduction and 8x enlargement.
+ */
+#define LOG2_MAX_FACTOR (3)
+
+#define SQRT_0POINT5_0POINT32_FIXED (0xB504F334u)
+
+#define MAX_INC (SQRT_0POINT5_0POINT32_FIXED >> (31 - 16 - LOG2_MAX_FACTOR))
+
+/* Minimum source width (in pixels) based on a typical page size of 4K and
+ * maximum colour depth of 32bpp.
+ */
+#define MIN_SRC_WIDTH (4096 / 4)
+
+/* Derive the destination width so that at max increment we fit within source */
+#define DST_WIDTH (MIN_SRC_WIDTH * pixman_fixed_1 / MAX_INC)
+
+/* Calculate heights the other way round - no limits due to page alignment here */
+#define DST_HEIGHT 3
+#define SRC_HEIGHT ((DST_HEIGHT * MAX_INC + pixman_fixed_1 - 1) / pixman_fixed_1)
+
+/* At the time of writing, all the scaled fast paths use SRC, OVER or ADD
+ * Porter-Duff operators. XOR is included in the list to ensure good
+ * representation of iter scanline fetch routines.
+ */
+static const pixman_op_t op_list[] = {
+    PIXMAN_OP_SRC,
+    PIXMAN_OP_OVER,
+    PIXMAN_OP_ADD,
+    PIXMAN_OP_XOR,
+};
+
+/* At the time of writing, all the scaled fast paths use a8r8g8b8, x8r8g8b8
+ * or r5g6b5, or red-blue swapped versions of the same. When a mask channel is
+ * used, it is always a8 (and so implicitly not component alpha).
+ * a1r5g5b5 is included because it is the only other format to feature in
+ * any iters.
+ */
+static const pixman_format_code_t img_fmt_list[] = {
+    PIXMAN_a8r8g8b8,
+    PIXMAN_x8r8g8b8,
+    PIXMAN_r5g6b5,
+    PIXMAN_a1r5g5b5
+};
+
+/* This is a flag reflecting the environment variable EXACT. It can be used
+ * to ensure that source coordinates corresponding exactly to the cover limits
+ * are used, rather than any near misses. This can, for example, be used in
+ * conjunction with a debugger to ensure that only COVER fast paths are used.
+ */
+static int exact;
+
+static pixman_fixed_t
+random_scale_factor (void)
+{
+    /* Get a random number with top bit set. */
+    uint32_t f = prng_rand () | 0x80000000u;
+
+    /* In log(2) space, this is still approximately evenly spread between 31
+     * and 32. Divide by sqrt(2) to centre
[Pixman] [RFC PATCH] solid-test: Allow for big-endian targets
---
Hi Fernando,

solid-test is a very new test, and you're quite possibly the first person to
try it on any big-endian machine. Can you try this variation and see if it
helps please?

Thanks,
Ben

 test/solid-test.c |    9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/test/solid-test.c b/test/solid-test.c
index 7be5466..fd95e1f 100644
--- a/test/solid-test.c
+++ b/test/solid-test.c
@@ -237,7 +237,7 @@ create_solid_image (const pixman_format_code_t *allowed_formats,
     pixman_image_unref (dummy_img);

     /* Now set the bitmap contents to a random value */
-    *buffer = prng_rand ();
+    prng_randmemset (buffer, 4, 0);
     image_endian_swap (img);

     if (used_fmt)
@@ -251,7 +251,10 @@ create_solid_image (const pixman_format_code_t *allowed_formats,
     pixman_color_t color;
     pixman_image_t *img;

-    prng_randmemset (&color, sizeof color, 0);
+    color.alpha = prng_rand_n (UINT16_MAX);
+    color.red = prng_rand_n (UINT16_MAX);
+    color.green = prng_rand_n (UINT16_MAX);
+    color.blue = prng_rand_n (UINT16_MAX);

     img = pixman_image_create_solid_fill (&color);

     if (used_fmt)
@@ -345,6 +348,6 @@ main (int argc, const char *argv[])
     }

     return fuzzer_test_main ("solid", 50,
-                             0x1B6DFF8D,
+                             0xFD523DB8,
                              test_solid, argc, argv);
 }
-- 
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] mips: Retire PIXMAN_MIPS_SIMPLE_NEAREST_A8_MASK_FAST_PATH
This macro does exactly the same thing as the platform-neutral macro
SIMPLE_NEAREST_A8_MASK_FAST_PATH.
---
Question for anyone who can test this code: can working NORMAL repeat versions
of these operations (over_8888_8_0565 and over_0565_8_0565) be added
reasonably easily? This would enable the same SIMPLE_NEAREST_A8_MASK_FAST_PATH
macros to be used for all platforms.

 pixman/pixman-mips-dspr2.c |    8
 pixman/pixman-mips-dspr2.h |    6 --
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/pixman/pixman-mips-dspr2.c b/pixman/pixman-mips-dspr2.c
index e10c9df..122d9dc 100644
--- a/pixman/pixman-mips-dspr2.c
+++ b/pixman/pixman-mips-dspr2.c
@@ -388,11 +388,11 @@ static const pixman_fast_path_t mips_dspr2_fast_paths[] =
     SIMPLE_NEAREST_FAST_PATH_PAD (SRC, r5g6b5, a8r8g8b8, mips_0565_8888),
     SIMPLE_NEAREST_FAST_PATH_PAD (SRC, b5g6r5, a8b8g8r8, mips_0565_8888),
-    PIXMAN_MIPS_SIMPLE_NEAREST_A8_MASK_FAST_PATH (OVER, a8r8g8b8, r5g6b5, mips_8888_8_0565),
-    PIXMAN_MIPS_SIMPLE_NEAREST_A8_MASK_FAST_PATH (OVER, a8b8g8r8, b5g6r5, mips_8888_8_0565),
+    SIMPLE_NEAREST_A8_MASK_FAST_PATH (OVER, a8r8g8b8, r5g6b5, mips_8888_8_0565),
+    SIMPLE_NEAREST_A8_MASK_FAST_PATH (OVER, a8b8g8r8, b5g6r5, mips_8888_8_0565),
-    PIXMAN_MIPS_SIMPLE_NEAREST_A8_MASK_FAST_PATH (OVER, r5g6b5, r5g6b5, mips_0565_8_0565),
-    PIXMAN_MIPS_SIMPLE_NEAREST_A8_MASK_FAST_PATH (OVER, b5g6r5, b5g6r5, mips_0565_8_0565),
+    SIMPLE_NEAREST_A8_MASK_FAST_PATH (OVER, r5g6b5, r5g6b5, mips_0565_8_0565),
+    SIMPLE_NEAREST_A8_MASK_FAST_PATH (OVER, b5g6r5, b5g6r5, mips_0565_8_0565),
     SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, a8r8g8b8, mips_8888_8888),
     SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, x8r8g8b8, mips_8888_8888),
diff --git a/pixman/pixman-mips-dspr2.h b/pixman/pixman-mips-dspr2.h
index 955ed70..b9e0684 100644
--- a/pixman/pixman-mips-dspr2.h
+++ b/pixman/pixman-mips-dspr2.h
@@ -328,12 +328,6 @@ FAST_NEAREST_MAINLOOP_COMMON (mips_##name##_pad_##op, \
     scaled_nearest_scanline_mips_##name##_##op, \
     src_type, uint8_t, dst_type, PAD, TRUE, FALSE)
-/* Provide entries for the fast path table */
-#define PIXMAN_MIPS_SIMPLE_NEAREST_A8_MASK_FAST_PATH(op,s,d,func) \
-    SIMPLE_NEAREST_A8_MASK_FAST_PATH_COVER (op,s,d,func), \
-    SIMPLE_NEAREST_A8_MASK_FAST_PATH_NONE (op,s,d,func), \
-    SIMPLE_NEAREST_A8_MASK_FAST_PATH_PAD (op,s,d,func)
-
 // #define PIXMAN_MIPS_BIND_SCALED_BILINEAR_SRC_DST(flags, name, op,\
-- 
1.7.5.4
Re: [Pixman] [PATCH] mmx/sse2: Use SIMPLE_NEAREST_SOLID_MASK_FAST_PATH macro
Sorry folks, typo in subject line, should read

mmx/sse2: Use SIMPLE_NEAREST_FAST_PATH macro

Ben
[Pixman] [PATCH] mmx/sse2: Use SIMPLE_NEAREST_SOLID_MASK_FAST_PATH macro
---
 pixman/pixman-mmx.c  |   20
 pixman/pixman-sse2.c |   20
 2 files changed, 8 insertions(+), 32 deletions(-)

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 42826d9..877b6e9 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -4094,22 +4094,10 @@ static const pixman_fast_path_t mmx_fast_paths[] =
     PIXMAN_STD_FAST_PATH (IN, a8, null, a8, mmx_composite_in_8_8),
     PIXMAN_STD_FAST_PATH (IN, solid, a8, a8, mmx_composite_in_n_8_8),

-    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_8888),

     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_n_8888),
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_n_8888),

diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
index a6e7808..1a8c430 100644
--- a/pixman/pixman-sse2.c
+++ b/pixman/pixman-sse2.c
@@ -6274,22 +6274,10 @@ static const pixman_fast_path_t sse2_fast_paths[] =
     PIXMAN_STD_FAST_PATH (IN, solid, a8, a8, sse2_composite_in_n_8_8),
     PIXMAN_STD_FAST_PATH (IN, solid, null, a8, sse2_composite_in_n_8),

-    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
-    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
+    SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),

     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_n_8888),
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_n_8888),
-- 
1.7.5.4
[Pixman] [PATCH] mmx/sse2: Use SIMPLE_NEAREST_SOLID_MASK_FAST_PATH for NORMAL repeat
These two architectures were the only place where
SIMPLE_NEAREST_SOLID_MASK_FAST_PATH was used, and in both cases the equivalent
SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL macro was used immediately
afterwards, so including the NORMAL case in the main macro simplifies the fast
path table.
---
 pixman/pixman-inlines.h |    3 ++-
 pixman/pixman-mmx.c     |    4
 pixman/pixman-sse2.c    |    4
 3 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/pixman/pixman-inlines.h b/pixman/pixman-inlines.h
index dd1c2f1..76338f3 100644
--- a/pixman/pixman-inlines.h
+++ b/pixman/pixman-inlines.h
@@ -747,7 +747,8 @@ fast_composite_scaled_nearest ## scale_func_name (pixman_implementation_t *imp,
 #define SIMPLE_NEAREST_SOLID_MASK_FAST_PATH(op,s,d,func) \
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_COVER (op,s,d,func), \
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NONE (op,s,d,func), \
-    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_PAD (op,s,d,func)
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_PAD (op,s,d,func), \
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (op,s,d,func)

 /*****************************************************************************/

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 877b6e9..05c48a4 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -4103,10 +4103,6 @@ static const pixman_fast_path_t mmx_fast_paths[] =
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_n_8888),
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_n_8888),
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_n_8888),
-    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_n_8888),
-    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_n_8888),
-    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_n_8888),
-    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_n_8888),

     SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, a8r8g8b8, mmx_8888_8888),
     SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, x8r8g8b8, mmx_8888_8888),

diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
index 1a8c430..8955103 100644
--- a/pixman/pixman-sse2.c
+++ b/pixman/pixman-sse2.c
@@ -6283,10 +6283,6 @@ static const pixman_fast_path_t sse2_fast_paths[] =
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_n_8888),
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_n_8888),
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_n_8888),
-    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_n_8888),
-    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_n_8888),
-    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_n_8888),
-    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_n_8888),

     SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
     SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
-- 
1.7.5.4
[Pixman] [PATCH] arm: Simplify PIXMAN_ARM_SIMPLE_NEAREST_A8_MASK_FAST_PATH
This macro is a superset of the platform-neutral macro
SIMPLE_NEAREST_A8_MASK_FAST_PATH. In other words, in addition to the _COVER,
_NONE and _PAD suffixes, its expansion includes the _NORMAL suffix.
---
 pixman/pixman-arm-common.h |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/pixman/pixman-arm-common.h b/pixman/pixman-arm-common.h
index 8a552fd..c039626 100644
--- a/pixman/pixman-arm-common.h
+++ b/pixman/pixman-arm-common.h
@@ -311,9 +311,7 @@ FAST_NEAREST_MAINLOOP_COMMON (cputype##_##name##_normal_##op, \
 /* Provide entries for the fast path table */
 #define PIXMAN_ARM_SIMPLE_NEAREST_A8_MASK_FAST_PATH(op,s,d,func) \
-    SIMPLE_NEAREST_A8_MASK_FAST_PATH_COVER (op,s,d,func), \
-    SIMPLE_NEAREST_A8_MASK_FAST_PATH_NONE (op,s,d,func), \
-    SIMPLE_NEAREST_A8_MASK_FAST_PATH_PAD (op,s,d,func), \
+    SIMPLE_NEAREST_A8_MASK_FAST_PATH (op,s,d,func), \
     SIMPLE_NEAREST_A8_MASK_FAST_PATH_NORMAL (op,s,d,func)

 /*****************************************************************************/
-- 
1.7.5.4
Re: [Pixman] [RFC PATCH 3/3] test: add fence-image-self-test
On Fri, 08 May 2015 13:45:37 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:
> +ok = test_image_faults (PIXMAN_a8r8g8b8, 7, 5);
> +ok = test_image_faults (PIXMAN_r8g8b8, 7, 5);
> +ok = test_image_faults (PIXMAN_r5g6b5, 7, 5);
> +ok = test_image_faults (PIXMAN_a8, 7, 5);
> +ok = test_image_faults (PIXMAN_a4, 7, 5);
> +ok = test_image_faults (PIXMAN_a1, 7, 5);

Maybe there's a case for checking that the addresses just the other side of
the boundaries *don't* cause segfaults, in addition to checking that these
ones *do*?

Ben
Re: [Pixman] [RFC PATCH 0/3] Add fence_image_create_bits() test helper
On Fri, 08 May 2015 13:45:34 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:
> The self-test uses the following new things that have not been used in
> Pixman code base before:
> - fork()
> - waitpid()
> - <sys/wait.h>
> Should I be adding tests in configure.ac for these?

I don't know the answer to that one, sorry.

> Ben, I think you know better what the actual tests for "More accurate
> COVER_CLIP_NEAREST|BILINEAR" should look like. I hope this implements the
> fence part properly. Would you like to write the out-of-bounds tests or
> explain me what they should do?

Without going to a lot more trouble to trap bad accesses at finer than page
granularity, I think those fences are the best you can do.

The main aim of the test should be to check that when a scaled plot requires
source pixels going right up to the edge of the source image coordinate
space, it doesn't read beyond the limits. At present, this is guaranteed
because we fall back from a COVER fetcher to a PAD/REPEAT/NONE/REFLECT
fetcher instead (or the equivalent categories of fast paths) before we get
all the way there - although this will only be apparent from benchmarking or
by interrupting the test in a debugger. To prove the validity of the patch in
question, we need to be able to demonstrate that we can safely continue to
use a COVER routine all the way to the edge of the source. Or at least, if we
have to compromise and set a tighter threshold before switching to a more
general purpose routine, that it is still safe to do so.

What we need to test:

* Both nearest-neighbour and bilinear scaling.

* A range of scale factors, both reduction and enlargement. I remember that I
  used a variety of routines for different scales of reduction (1..2x, 2..3x,
  3..4x and so on) to avoid having to hold the integer part of the increment
  in a register, so we need to test up to at least 8x reduction.

* We need to test alignment with both the extreme minimum and extreme maximum
  source coordinates permitted.
Given that the scale factors can't be assumed to be nice numbers, I think we
need to test them independently, using the translation offsets in the
transformation matrix to ensure they align as desired.

* Possibly test near miss cases as well, though these would probably want to
  be weighted so that positions at or closer to the limit are checked most.

* Test both horizontal and vertical axes.

* We should survey which Porter-Duff operations have scaled fast paths
  (across all platforms) and make sure we test each of those, plus at least
  one other which will guarantee that we're exercising the fetchers - XOR is
  probably safe for this.

For nearest-neighbour scaling, the acceptable criteria should be:

* Low end: centre of first output pixel coincides with source coordinate
  pixman_fixed_e
* High end: centre of the last output pixel coincides with source coordinate
  pixman_fixed_1*width

For bilinear scaling:

* Low end: centre of first output pixel coincides with source coordinate
  pixman_fixed_1/2 (i.e. centre of first source pixel)
* High end: I'm slightly torn on this one. You could say
  pixman_fixed_1*width - pixman_fixed_1/2. But it's also true that
  BILINEAR_INTERPOLATION_BITS comes into play, meaning that some slightly
  higher coordinates should also be achievable without requiring any input
  from the out-of-bounds pixel. For BILINEAR_INTERPOLATION_BITS=7, anything
  up to and including 0x0.01FF higher than this should be indistinguishable.
  And yes, that does mean that there is a very slight bias because
  coordinates are truncated by 9 bits, rounding towards zero.

Having had to carefully think through writing the specification above, I have
some idea how I'd go about writing it - but if you can follow that
description, feel free to have a go yourself!

Ben
[Pixman] [PATCH v2] test: Add new fuzz tester targeting solid images
This places a heavier emphasis on solid images than the other fuzz testers, and tests both single-pixel repeating bitmap images as well as those created using pixman_image_create_solid_fill(). In the former case, it also exercises the case where the bitmap contents are written to after the image's first use, which is not a use-case that any other test has previously covered. --- test/Makefile.sources |1 + test/solid-test.c | 348 + 2 files changed, 349 insertions(+), 0 deletions(-) create mode 100644 test/solid-test.c diff --git a/test/Makefile.sources b/test/Makefile.sources index 8b0e855..f09c3e4 100644 --- a/test/Makefile.sources +++ b/test/Makefile.sources @@ -23,6 +23,7 @@ TESTPROGRAMS = \ composite-traps-test \ region-contains-test \ glyph-test\ + solid-test\ stress-test \ blitters-test \ affine-test \ diff --git a/test/solid-test.c b/test/solid-test.c new file mode 100644 index 000..4745b1f --- /dev/null +++ b/test/solid-test.c @@ -0,0 +1,348 @@ +/* + * Copyright © 2015 RISC OS Open Ltd + * + * Permission to use, copy, modify, distribute, and sell this software and its + * documentation for any purpose is hereby granted without fee, provided that + * the above copyright notice appear in all copies and that both that + * copyright notice and this permission notice appear in supporting + * documentation, and that the name of the copyright holders not be used in + * advertising or publicity pertaining to distribution of the software without + * specific, written prior permission. The copyright holders make no + * representations about the suitability of this software for any purpose. It + * is provided as is without express or implied warranty. 
+ *
+ * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
+ * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
+ * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
+ * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
+ * SOFTWARE.
+ *
+ * Author: Ben Avison (bavi...@riscosopen.org)
+ *
+ */
+
+#include "utils.h"
+#include <stdlib.h>
+#include <stdio.h>
+
+#define WIDTH 32
+#define HEIGHT 32
+
+static const pixman_op_t op_list[] = {
+    PIXMAN_OP_SRC,
+    PIXMAN_OP_OVER,
+    PIXMAN_OP_ADD,
+    PIXMAN_OP_CLEAR,
+    PIXMAN_OP_SRC,
+    PIXMAN_OP_DST,
+    PIXMAN_OP_OVER,
+    PIXMAN_OP_OVER_REVERSE,
+    PIXMAN_OP_IN,
+    PIXMAN_OP_IN_REVERSE,
+    PIXMAN_OP_OUT,
+    PIXMAN_OP_OUT_REVERSE,
+    PIXMAN_OP_ATOP,
+    PIXMAN_OP_ATOP_REVERSE,
+    PIXMAN_OP_XOR,
+    PIXMAN_OP_ADD,
+    PIXMAN_OP_MULTIPLY,
+    PIXMAN_OP_SCREEN,
+    PIXMAN_OP_OVERLAY,
+    PIXMAN_OP_DARKEN,
+    PIXMAN_OP_LIGHTEN,
+    PIXMAN_OP_HARD_LIGHT,
+    PIXMAN_OP_DIFFERENCE,
+    PIXMAN_OP_EXCLUSION,
+#if 0 /* these use floating point math and are not always bitexact on different platforms */
+    PIXMAN_OP_SATURATE,
+    PIXMAN_OP_DISJOINT_CLEAR,
+    PIXMAN_OP_DISJOINT_SRC,
+    PIXMAN_OP_DISJOINT_DST,
+    PIXMAN_OP_DISJOINT_OVER,
+    PIXMAN_OP_DISJOINT_OVER_REVERSE,
+    PIXMAN_OP_DISJOINT_IN,
+    PIXMAN_OP_DISJOINT_IN_REVERSE,
+    PIXMAN_OP_DISJOINT_OUT,
+    PIXMAN_OP_DISJOINT_OUT_REVERSE,
+    PIXMAN_OP_DISJOINT_ATOP,
+    PIXMAN_OP_DISJOINT_ATOP_REVERSE,
+    PIXMAN_OP_DISJOINT_XOR,
+    PIXMAN_OP_CONJOINT_CLEAR,
+    PIXMAN_OP_CONJOINT_SRC,
+    PIXMAN_OP_CONJOINT_DST,
+    PIXMAN_OP_CONJOINT_OVER,
+    PIXMAN_OP_CONJOINT_OVER_REVERSE,
+    PIXMAN_OP_CONJOINT_IN,
+    PIXMAN_OP_CONJOINT_IN_REVERSE,
+    PIXMAN_OP_CONJOINT_OUT,
+    PIXMAN_OP_CONJOINT_OUT_REVERSE,
+    PIXMAN_OP_CONJOINT_ATOP,
+    PIXMAN_OP_CONJOINT_ATOP_REVERSE,
+    PIXMAN_OP_CONJOINT_XOR,
+    PIXMAN_OP_COLOR_DODGE,
+    PIXMAN_OP_COLOR_BURN,
+PIXMAN_OP_SOFT_LIGHT, +PIXMAN_OP_HSL_HUE, +PIXMAN_OP_HSL_SATURATION, +PIXMAN_OP_HSL_COLOR, +PIXMAN_OP_HSL_LUMINOSITY, +#endif +}; + +/* The first eight format in the list are by far the most widely + * used formats, so we test those more than the others + */ +#define N_MOST_LIKELY_FORMATS 8 + +static const pixman_format_code_t img_fmt_list[] = { +PIXMAN_a8r8g8b8, +PIXMAN_a8b8g8r8, +PIXMAN_x8r8g8b8, +PIXMAN_x8b8g8r8, +PIXMAN_r5g6b5, +PIXMAN_b5g6r5, +PIXMAN_a8, +PIXMAN_a1, +PIXMAN_r3g3b2, +PIXMAN_b8g8r8a8, +PIXMAN_b8g8r8x8, +PIXMAN_r8g8b8a8, +PIXMAN_r8g8b8x8, +PIXMAN_x14r6g6b6, +PIXMAN_r8g8b8, +PIXMAN_b8g8r8, +#if 0 /* These are going to use floating point in the near future */ +PIXMAN_x2r10g10b10, +PIXMAN_a2r10g10b10, +PIXMAN_x2b10g10r10
Re: [Pixman] [PATCH] test: Add new fuzz tester targeting solid images
On Wed, 06 May 2015 13:25:43 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:
> this is written based on blitters-test.c, right? That would have been
> interesting to mention, so new reviewers like me can compare with existing
> code you mirrored.

Well, it's a new addition to the family of fuzz testers - the others are:

affine-test
blitters-test
composite-traps-test
glyph-test
matrix-test
region-contains-test
rotate-test
scaling-test

I didn't consciously set out to borrow from blitters-test in particular, but
out of that set it tests the widest range of Porter-Duff operators, and
(jointly with glyph-test) the widest range of pixel formats, so looking back,
that's obviously the one I lifted the operator and format arrays from.

Aside from the immediate bug that prompted the creation of solid-test, I
wanted to increase the proportion of solid source and mask images over those
tested elsewhere, because they represent a sizeable proportion of fast paths
and therefore a larger attack surface for bugs than was represented to date.
Although blitters-test already does a fair amount of testing of this, the
proportions are low (1/8 for source images) and it doesn't exercise
pixman_image_create_solid_fill images at all. The only two fuzz testers that
do use pixman_image_create_solid_fill are composite-traps-test and
glyph-test: they don't use the full range of Porter-Duff operators, and they
don't use pixman_image_create_solid_fill for mask images at all. I think a
number of your comments arise from the expectation that the test was only to
test what happens when you change the contents of a 1x1 bits image -
hopefully my aim to improve testing of both types of solid images in general
helps explain things.

> This test hits the case of fully transparent/opaque -> arbitrary color
> change, but very rarely - fully transparent/opaque, never starting with an
> arbitrary color. I understand the two latter cases are not really testable:
> setting the fast path flags for an arbitrary color when you could use fully
> transparent/opaque is not detectable via correctness tests. Are there any
> cases where more symmetric test case selection would be useful? If not,
> nevermind.

My thinking was that as long as we ensured that information about a solid
colour wasn't being cached, then we'd guard against my bug or any similar
ones being reintroduced. The two cases which are most likely to be treated
differently from the others, depending upon the operator, are
fully-transparent and fully-opaque, so as long as each of those featured
reasonably frequently on at least one side of the before/after divide, I
think that should detect any caching.

> Here's how I understand the code.
>
> Source image:
> - 50% chance for multi-pixel bits image
> - 25% chance for 1x1 bits image
> - 25% chance for solid image
>
> Mask image:
> - 50% chance to not be used
> - 12.5% chance for multi-pixel bits image without CA
> - 6.25% chance for 1x1 bits image without CA
> - 6.25% chance for solid image without CA
> - 12.5% chance for multi-pixel bits image with CA
> - 6.25% chance for 1x1 bits image with CA
> - 6.25% chance for solid image with CA
>
> Destination image:
> - always multi-pixel
>
> Both source and mask, when they are 1x1 bits images, have:
> - 50% chance to start fully opaque white
> - 50% chance to start fully transparent

Subtle distinction here: because the 1x1 image may use any pixel format, the
choices are between 0 and 2^bpp-1. Depending upon the format, there may not
be any pixel value which corresponds to transparent, and the maximum value
may not be opaque white (because the image may use a palette).

> and then switch to arbitrary random color for the actual test.
>
> The problems this test attempts to reveal happen only with 1x1 bits images,
> so shouldn't the chances for 1x1 bits images be higher?

Well, 34% of the tests feature at least one 1x1 bits image. If you include
the solid images - to measure the number of tests which would select fast
paths which feature at least one solid component, that figure rises to 63%.
That includes 13% where both the source and mask are solid, which is silly
(it'd make more sense for the caller to multiply the constant colours and use
an operation with a solid source and no mask image). So by increasing the
ratios, while you'd cut the number of tests with multi-pixel source and mask
(which are redundant with blitters-test) you'd raise the number of tests of
unrealistic combinations.

There's a danger that a more sophisticated way to choose combinations of
images and operators to better reflect the share of different fast paths
would just obfuscate the code and could just as easily be achieved by
increasing the number of tests, so I didn't feel it was worth spending too
much brainpower on it. Perhaps you'd be happier with a three-way split
between op_n_bits, op_n_bits_bits and op_bits_n_bits as in the version I'm
just about to post? That guarantees 100% of tests involve a solid image, and
67% of tests involve at least one 1x1 bits image.
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Thu, 23 Apr 2015 17:12:59 +0100, I wrote:
> I imagine most of the time, you'll have a source image of fixed size, and
> you'll either have a target destination size (in which case your task is to
> calculate the transform matrix coefficients) or you'll have a target
> scaling factor (in which case your task is to work out the destination size
> that can be achieved without running off the sides of the source image).

Just to add to this, an example has come to mind that demonstrates why higher
level software should really know about the pixman_fixed_e offset for
nearest-neighbour scaling in order to make best use of Pixman.

Assume we have a source image that's 101 pixels wide, and we know it has to
be plotted at a scale factor of 50%. How many destination pixels do we tell
Pixman to plot?

In reality, given a transform matrix whose diagonal is (0.5, 0.5, 1), Pixman
is capable of picking out source pixels with index 0, 2, 4, 6 ... 100 -
that's 51 pixels - whilst still using a COVER_CLIP fast path. If it weren't
for the pixman_fixed_e offset, it would pick out source pixels 1, 3, 5,
7 ... 99 instead - that's 50 pixels.

Now, if the caller asks for 50 pixels and Pixman uses the offset, then we get
to use the COVER_CLIP fast path, but we've trimmed two pixels off the right
and none off the left. But if the caller asks for 51 pixels and Pixman
doesn't use the offset, then Pixman realises it needs a value for
out-of-bounds pixel index 101 so won't use the COVER_CLIP fast path.

So, for the best combination of image quality and speed, the details of
Pixman's rounding algorithm *can* be significant.

Ben
[Pixman] [PATCH] test: Add new fuzz tester targeting solid images
This places a heavier emphasis on solid images than the other fuzz testers, and tests both single-pixel repeating bitmap images as well as those created using pixman_image_create_solid_fill(). In the former case, it also exercises the case where the bitmap contents are written to after the image's first use, which is not a use-case that any other test has previously covered. --- .gitignore|1 + test/Makefile.sources |1 + test/solid-test.c | 332 + 3 files changed, 334 insertions(+), 0 deletions(-) create mode 100644 test/solid-test.c diff --git a/.gitignore b/.gitignore index 0f11496..8bfb48c 100644 --- a/.gitignore +++ b/.gitignore @@ -77,6 +77,7 @@ test/scaling-crash-test test/scaling-helpers-test test/scaling-test test/screen-test +test/solid-test test/stress-test test/trap-crasher test/trap-test diff --git a/test/Makefile.sources b/test/Makefile.sources index 8b0e855..f09c3e4 100644 --- a/test/Makefile.sources +++ b/test/Makefile.sources @@ -23,6 +23,7 @@ TESTPROGRAMS = \ composite-traps-test \ region-contains-test \ glyph-test\ + solid-test\ stress-test \ blitters-test \ affine-test \ diff --git a/test/solid-test.c b/test/solid-test.c new file mode 100644 index 000..070652a --- /dev/null +++ b/test/solid-test.c @@ -0,0 +1,332 @@ +/* + * Copyright © 2015 RISC OS Open Ltd + * + * Permission to use, copy, modify, distribute, and sell this software and its + * documentation for any purpose is hereby granted without fee, provided that + * the above copyright notice appear in all copies and that both that + * copyright notice and this permission notice appear in supporting + * documentation, and that the name of the copyright holders not be used in + * advertising or publicity pertaining to distribution of the software without + * specific, written prior permission. The copyright holders make no + * representations about the suitability of this software for any purpose. It + * is provided as is without express or implied warranty. 
+ *
+ * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
+ * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
+ * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
+ * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
+ * SOFTWARE.
+ *
+ * Author: Ben Avison (bavi...@riscosopen.org)
+ *
+ */
+
+#include "utils.h"
+#include <stdlib.h>
+#include <stdio.h>
+
+#define WIDTH 32
+#define HEIGHT 32
+
+static const pixman_op_t op_list[] = {
+    PIXMAN_OP_SRC,
+    PIXMAN_OP_OVER,
+    PIXMAN_OP_ADD,
+    PIXMAN_OP_CLEAR,
+    PIXMAN_OP_SRC,
+    PIXMAN_OP_DST,
+    PIXMAN_OP_OVER,
+    PIXMAN_OP_OVER_REVERSE,
+    PIXMAN_OP_IN,
+    PIXMAN_OP_IN_REVERSE,
+    PIXMAN_OP_OUT,
+    PIXMAN_OP_OUT_REVERSE,
+    PIXMAN_OP_ATOP,
+    PIXMAN_OP_ATOP_REVERSE,
+    PIXMAN_OP_XOR,
+    PIXMAN_OP_ADD,
+    PIXMAN_OP_MULTIPLY,
+    PIXMAN_OP_SCREEN,
+    PIXMAN_OP_OVERLAY,
+    PIXMAN_OP_DARKEN,
+    PIXMAN_OP_LIGHTEN,
+    PIXMAN_OP_HARD_LIGHT,
+    PIXMAN_OP_DIFFERENCE,
+    PIXMAN_OP_EXCLUSION,
+#if 0 /* these use floating point math and are not always bitexact on different platforms */
+    PIXMAN_OP_SATURATE,
+    PIXMAN_OP_DISJOINT_CLEAR,
+    PIXMAN_OP_DISJOINT_SRC,
+    PIXMAN_OP_DISJOINT_DST,
+    PIXMAN_OP_DISJOINT_OVER,
+    PIXMAN_OP_DISJOINT_OVER_REVERSE,
+    PIXMAN_OP_DISJOINT_IN,
+    PIXMAN_OP_DISJOINT_IN_REVERSE,
+    PIXMAN_OP_DISJOINT_OUT,
+    PIXMAN_OP_DISJOINT_OUT_REVERSE,
+    PIXMAN_OP_DISJOINT_ATOP,
+    PIXMAN_OP_DISJOINT_ATOP_REVERSE,
+    PIXMAN_OP_DISJOINT_XOR,
+    PIXMAN_OP_CONJOINT_CLEAR,
+    PIXMAN_OP_CONJOINT_SRC,
+    PIXMAN_OP_CONJOINT_DST,
+    PIXMAN_OP_CONJOINT_OVER,
+    PIXMAN_OP_CONJOINT_OVER_REVERSE,
+    PIXMAN_OP_CONJOINT_IN,
+    PIXMAN_OP_CONJOINT_IN_REVERSE,
+    PIXMAN_OP_CONJOINT_OUT,
+    PIXMAN_OP_CONJOINT_OUT_REVERSE,
+    PIXMAN_OP_CONJOINT_ATOP,
+    PIXMAN_OP_CONJOINT_ATOP_REVERSE,
+    PIXMAN_OP_CONJOINT_XOR,
+    PIXMAN_OP_COLOR_DODGE,
+    PIXMAN_OP_COLOR_BURN,
+    PIXMAN_OP_SOFT_LIGHT,
+    PIXMAN_OP_HSL_HUE,
+    PIXMAN_OP_HSL_SATURATION,
+    PIXMAN_OP_HSL_COLOR,
+    PIXMAN_OP_HSL_LUMINOSITY,
+#endif
+};
+
+/* The first eight formats in the list are by far the most widely
+ * used formats, so we test those more than the others
+ */
+#define N_MOST_LIKELY_FORMATS 8
+
+static const pixman_format_code_t img_fmt_list[] = {
+    PIXMAN_a8r8g8b8,
+    PIXMAN_a8b8g8r8,
+    PIXMAN_x8r8g8b8,
+    PIXMAN_x8b8g8r8,
+    PIXMAN_r5g6b5,
+    PIXMAN_b5g6r5,
+    PIXMAN_a8,
+    PIXMAN_a1,
+    PIXMAN_r3g3b2
Re: [Pixman] [PATCH 10 & 11/37] in_8888_8 fast path
On Sun, 05 Oct 2014 20:03:42 +0100, Søren Sandmann <soren.sandm...@gmail.com> wrote:

> These can presumably also be used for the a8r8g8b8_sRGB format since
> all we care about is the alpha channel.

Looks like it, yes. Updated versions of the three in_8888_8 patches coming up.

> Also, is this actually showing up in the wild? Which applications?

Checking my records, it was the Epiphany browser (default browser on Raspbian) that was responsible.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 05/32] pixman-image: Early detection of solid 1x1 repeating source images
On Thu, 07 Aug 2014 17:50:01 +0100, I wrote:

> Doesn't handle every pixel format, but for those that it does, enables
> early conversion of OVER to SRC, for example.
> ---
>  pixman/pixman-image.c |    6 ++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/pixman/pixman-image.c b/pixman/pixman-image.c
> index 1ff1a49..ad6e82e 100644
> --- a/pixman/pixman-image.c
> +++ b/pixman/pixman-image.c
> [...]

On Sun, 05 Oct 2014 20:03:42 +0100, Søren Sandmann <soren.sandm...@gmail.com> wrote:

> I don't think this works as written. What if someone writes a
> non-opaque pixel to the 1x1 image after it has been used once? Then
> the flags would be wrong.

This appears to be a valid concern, and one which isn't exercised by any of the existing tests. I have written a new fuzz tester which does detect the behavioural change of the original version of this patch, and an updated patch which doesn't write the updated flags back to the pixman_image_t struct but re-evaluates them on every usage. I will post these in a moment, though you may wish to note that I've slightly changed the subject (the new fix is no longer in pixman-image.c, and in Pixman terminology, "solid" means an image has the same colour and alpha components at all spatial coordinates, rather than being a synonym for "opaque").

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH 05/37 v2] composite flags: Early detection of fully opaque 1x1 repeating source images
Doesn't handle every pixel format, but for those that it does, enables early conversion of OVER to SRC, for example.
---
 pixman/pixman.c |   28 +---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/pixman/pixman.c b/pixman/pixman.c
index 9555cea..08293c8 100644
--- a/pixman/pixman.c
+++ b/pixman/pixman.c
@@ -325,6 +325,27 @@ _pixman_compute_composite_region32 (pixman_region32_t * region,
     return TRUE;
 }
 
+static uint32_t
+check_for_single_pixel_opaque_image (pixman_image_t *image)
+{
+    if (image->type == BITS &&
+        image->common.extended_format_code == PIXMAN_solid &&
+        (image->common.flags & FAST_PATH_NO_ACCESSORS) != 0 &&
+        (((image->bits.format == PIXMAN_a8) &&
+          *(uint8_t *)image->bits.bits == 0xFF) ||
+         ((image->bits.format == PIXMAN_b8g8r8a8 || image->bits.format == PIXMAN_r8g8b8a8) &&
+          (0xff & ~image->bits.bits[0]) == 0) ||
+         ((image->bits.format == PIXMAN_a8r8g8b8 || image->bits.format == PIXMAN_a8b8g8r8 || image->bits.format == PIXMAN_a8r8g8b8_sRGB) &&
+          (0xff000000 & ~image->bits.bits[0]) == 0)) &&
+        !image->common.alpha_map &&
+        image->common.filter != PIXMAN_FILTER_CONVOLUTION &&
+        image->common.filter != PIXMAN_FILTER_SEPARABLE_CONVOLUTION &&
+        !image->common.component_alpha)
+        return FAST_PATH_SAMPLES_OPAQUE | FAST_PATH_IS_OPAQUE;
+    else
+        return 0;
+}
+
 typedef struct
 {
     pixman_fixed_48_16_t x1;
@@ -595,12 +616,13 @@ pixman_image_composite32 (pixman_op_t op,
     _pixman_image_validate (dest);
 
     src_format = src->common.extended_format_code;
-    info.src_flags = src->common.flags;
+    info.src_flags = src->common.flags | check_for_single_pixel_opaque_image (src);
 
-    if (mask && !(mask->common.flags & FAST_PATH_IS_OPAQUE))
+    if (mask)
+        info.mask_flags = mask->common.flags | check_for_single_pixel_opaque_image (mask);
+
+    if (mask && !(info.mask_flags & FAST_PATH_IS_OPAQUE))
     {
         mask_format = mask->common.extended_format_code;
-        info.mask_flags = mask->common.flags;
     }
     else
     {
-- 
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
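[Editorial note: the opacity test in the patch above can be modelled in plain C. This is an illustrative sketch only, not Pixman API code: the function name is hypothetical and just the a8r8g8b8-style layout (alpha in bits 31:24) is shown.]

```c
#include <stdint.h>

/* Hypothetical scalar model of the patch's alpha test for an
 * a8r8g8b8-family pixel: the pixel is fully opaque exactly when no
 * bit of the 0xff000000 alpha mask is clear. */
static int
pixel_is_opaque_argb (uint32_t pixel)
{
    return (0xff000000u & ~pixel) == 0;
}
```

For b8g8r8a8-style layouts the same idea applies with the mask 0xff, which is why the patch tests each format family against a different constant.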
[Pixman] [PATCH 11/37 v2] armv6: Add in_8888_8 fast path
This is used instead of the equivalent C fast path.

lowlevel-blt-bench results, compared to no fast path at all:

        Before          After
        Mean   StdDev   Mean   StdDev   Confidence  Change
L1      12.4   0.1      117.5  2.3      100.0%      +851.2%
L2      9.5    0.1      46.9   2.4      100.0%      +393.8%
M       9.6    0.0      61.9   0.9      100.0%      +544.0%
HT      7.9    0.0      26.6   0.5      100.0%      +238.6%
VT      7.7    0.0      24.2   0.4      100.0%      +212.5%
R       7.4    0.0      22.4   0.4      100.0%      +204.5%
RT      4.1    0.0      8.7    0.2      100.0%      +109.4%

---
 pixman/pixman-arm-simd-asm.S |  111 ++
 pixman/pixman-arm-simd.c     |    5 ++
 2 files changed, 116 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-simd-asm.S b/pixman/pixman-arm-simd-asm.S
index 08d6709..6c77fd3 100644
--- a/pixman/pixman-arm-simd-asm.S
+++ b/pixman/pixman-arm-simd-asm.S
@@ -1291,3 +1291,114 @@ generate_composite_function \
     over_n_0565_process_tail
 
 /******************************************************************************/
+
+.macro in_8888_8_init
+        SRC0    .req    Y
+        SRC1    .req    STRIDE_D
+        SRC2    .req    STRIDE_S
+        SRC3    .req    MASK
+        HALF    .req    STRIDE_M
+        TMP     .req    ORIG_W
+        line_saved_regs  Y, STRIDE_D, STRIDE_S, ORIG_W
+        ldr     SCRATCH, =0x00800080
+        mov     HALF, #0x80
+        /* Set GE[3:0] to 0101 so SEL instructions do what we want */
+        uadd8   SCRATCH, SCRATCH, SCRATCH
+        /* Offset the source pointer: we only need the alpha bytes */
+        add     SRC, SRC, #3
+.endm
+
+.macro in_8888_8_cleanup
+        .unreq  SRC0
+        .unreq  SRC1
+        .unreq  SRC2
+        .unreq  SRC3
+        .unreq  HALF
+        .unreq  TMP
+.endm
+
+.macro in_8888_8_4pixels_head  dst
+        ldr     TMP, [DST], #4
+        ldrb    SRC0, [SRC], #12
+        ldrb    SRC3, [SRC], #-4
+        ldrb    SRC2, [SRC], #-4
+        uxtb16  WK&dst, TMP
+        uxtb16  TMP, TMP, ror #8
+        ldrb    SRC1, [SRC], #12
+        smlabb  SRC0, SRC0, WK&dst, HALF
+        smlabt  SRC3, SRC3, TMP, HALF
+        smlabt  SRC2, SRC2, WK&dst, HALF
+        smlabb  SRC1, SRC1, TMP, HALF
+        orr     WK&dst, SRC0, SRC2, lsl #16
+        /* There'd be a stall here if immediately followed by orr, so
+         * fill it with something like a preload if possible */
+.endm
+
+.macro in_8888_8_4pixels_tail  dst
+        orr     TMP, SRC1, SRC3, lsl #16
+        uxtab16 WK&dst, WK&dst, WK&dst, ror #8
+        uxtab16 TMP, TMP, TMP, ror #8
+        mov     WK&dst, WK&dst, ror #8
+        sel     WK&dst, WK&dst, TMP
+.endm
+
+.macro in_8888_8_process_head  cond, numbytes, firstreg, unaligned_src, unaligned_mask, preload
+ .if numbytes == 1
+        ldrb    WK3, [DST], #1
+        ldrb    SRC0, [SRC], #4
+ .elseif numbytes == 2
+        ldrb    WK3, [DST], #1
+        ldrb    SRC0, [SRC], #4
+        ldrb    TMP, [DST], #1
+        ldrb    SRC1, [SRC], #4
+ .else
+  .if numbytes >= 8
+   .if numbytes == 16
+        in_8888_8_4pixels_head 0
+        in_8888_8_4pixels_tail 0
+        in_8888_8_4pixels_head 1
+    .if preload
+        PF  bic,    SCRATCH, SRC, #31
+        PF  pld,    [SCRATCH, #32*prefetch_distance]
+    .endif
+        in_8888_8_4pixels_tail 1
+   .endif
+        in_8888_8_4pixels_head 2
+        in_8888_8_4pixels_tail 2
+  .endif
+        in_8888_8_4pixels_head 3
+ .endif
+.endm
+
+.macro in_8888_8_process_tail  cond, numbytes, firstreg
+ .if numbytes == 1
+        smlabb  WK3, SRC0, WK3, HALF
+        add     WK3, WK3, WK3, lsr #8
+        mov     WK3, WK3, lsr #8
+        strb    WK3, [DST, #-1]
+ .elseif numbytes == 2
+        smlabb  WK3, SRC0, WK3, HALF
+        smlabb  TMP, SRC1, TMP, HALF
+        add     WK3, WK3, WK3, lsr #8
+        add     TMP, TMP, TMP, lsr #8
+        mov     WK3, WK3, lsr #8
+        mov     TMP, TMP, lsr #8
+        strb    WK3, [DST, #-2]
+        strb    TMP, [DST, #-1]
+ .else
+        in_8888_8_4pixels_tail 3
+        pixst   , numbytes, (4-numbytes/4), DST
+ .endif
+.endm
+
+generate_composite_function \
+    pixman_composite_in_8888_8_asm_armv6, 32, 0, 8, \
+    FLAG_DST_READWRITE | FLAG_BRANCH_OVER | FLAG_PROCESS_DOES_STORE | FLAG_SPILL_LINE_VARS, \
+    3, /* prefetch distance */ \
+    in_8888_8_init, \
+    nop_macro, /* newline */ \
+    in_8888_8_cleanup, \
+    in_8888_8_process_head, \
+    in_8888_8_process_tail
+
+/******************************************************************************/
diff --git a/pixman/pixman-arm-simd.c b/pixman/pixman-arm-simd.c
index 31f960d..55f16ea 100644
--- a/pixman/pixman-arm-simd.c
+++ b/pixman/pixman-arm-simd.c
@@ -48,6 +48,8 @@
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (armv6, add_8_8,
                                    uint8_t, 1, uint8_t, 1)
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (armv6, over_8888_8888,
                                    uint32_t, 1, uint32_t, 1)
+PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (armv6,
[Pixman] [PATCH 10/37 v2] pixman-fast-path: Add in_8888_8 fast path
This is a C fast path, useful for reference or for platforms that don't have their own fast path for this operation.

lowlevel-blt-bench results on ARMv6:

        Before          After
        Mean   StdDev   Mean   StdDev   Confidence  Change
L1      12.4   0.1      24.4   0.3      100.0%      +97.8%
L2      9.5    0.1      14.1   0.2      100.0%      +48.1%
M       9.6    0.0      14.7   0.0      100.0%      +53.1%
HT      7.9    0.0      12.0   0.1      100.0%      +52.3%
VT      7.7    0.0      11.6   0.1      100.0%      +49.8%
R       7.4    0.0      10.8   0.1      100.0%      +47.2%
RT      4.1    0.0      6.1    0.1      100.0%      +48.2%

---
 pixman/pixman-fast-path.c |   40 ++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
index c7ed0f0..e465642 100644
--- a/pixman/pixman-fast-path.c
+++ b/pixman/pixman-fast-path.c
@@ -262,6 +262,43 @@ fast_composite_in_8_8 (pixman_implementation_t *imp,
 }
 
 static void
+fast_composite_in_8888_8 (pixman_implementation_t *imp,
+                          pixman_composite_info_t *info)
+{
+    PIXMAN_COMPOSITE_ARGS (info);
+    uint8_t  *dst_line, *dst;
+    uint32_t *src_line, *src;
+    int dst_stride, src_stride;
+    int32_t w;
+    uint8_t s;
+    uint16_t t;
+
+    PIXMAN_IMAGE_GET_LINE (src_image, src_x, src_y, uint32_t, src_stride, src_line, 1);
+    PIXMAN_IMAGE_GET_LINE (dest_image, dest_x, dest_y, uint8_t, dst_stride, dst_line, 1);
+
+    while (height--)
+    {
+        dst = dst_line;
+        dst_line += dst_stride;
+        src = src_line;
+        src_line += src_stride;
+        w = width;
+
+        while (w--)
+        {
+            s = *src++ >> 24;
+
+            if (s == 0)
+                *dst = 0;
+            else if (s != 0xff)
+                *dst = MUL_UN8 (s, *dst, t);
+
+            dst++;
+        }
+    }
+}
+
+static void
 fast_composite_over_n_8_8888 (pixman_implementation_t *imp,
                               pixman_composite_info_t *info)
 {
@@ -1948,6 +1985,9 @@ static const pixman_fast_path_t c_fast_paths[] =
     PIXMAN_STD_FAST_PATH (SRC, a1r5g5b5, null, x1r5g5b5, fast_composite_src_memcpy),
     PIXMAN_STD_FAST_PATH (SRC, a8, null, a8, fast_composite_src_memcpy),
     PIXMAN_STD_FAST_PATH (IN, a8, null, a8, fast_composite_in_8_8),
+    PIXMAN_STD_FAST_PATH (IN, a8r8g8b8, null, a8, fast_composite_in_8888_8),
+    PIXMAN_STD_FAST_PATH (IN, a8b8g8r8, null, a8, fast_composite_in_8888_8),
+    PIXMAN_STD_FAST_PATH (IN, a8r8g8b8_sRGB, null, a8, fast_composite_in_8888_8),
     PIXMAN_STD_FAST_PATH (IN, solid, a8, a8, fast_composite_in_n_8_8),
 
     SIMPLE_NEAREST_FAST_PATH (SRC, x8r8g8b8, x8r8g8b8, 8888_8888),
-- 
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
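[Editorial note: for readers unfamiliar with the MUL_UN8 macro used in the C fast path above, the rounded 8-bit multiply it performs can be sketched as a stand-alone function. The Pixman macro takes a caller-supplied temporary; this hypothetical version hides it.]

```c
#include <stdint.h>

/* Sketch of the rounded multiply behind MUL_UN8: adding the 0x80
 * bias and then the high byte of the biased product before the final
 * shift yields round(a * b / 255.0) exactly for all 8-bit inputs. */
static uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint16_t t = (uint16_t)a * b + 0x80;   /* bias for rounding      */
    return (uint8_t)((t + (t >> 8)) >> 8); /* exact division by 255  */
}
```

This is why the fast path only needs the shortcuts for s == 0 and s == 0xff; every other alpha value goes through one multiply, two adds and two shifts per pixel.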
[Pixman] [PATCH 23/37 v2] armv6: Add optimised scanline fetchers and writeback for r5g6b5 and a8
This supports r5g6b5 source and destination images, and a8 source images.

lowlevel-blt-bench results for example operations which use these because they lack a dedicated fast path at the time of writing:

in_reverse_8_8888
        Before          After
        Mean   StdDev   Mean   StdDev   Confidence  Change
L1      30.0   0.3      37.0   0.3      100.0%      +23.2%
L2      23.3   0.3      29.4   0.4      100.0%      +26.1%
M       24.0   0.0      31.3   0.1      100.0%      +30.5%
HT      12.8   0.1      16.1   0.1      100.0%      +25.8%
VT      11.9   0.1      14.8   0.1      100.0%      +24.6%
R       11.7   0.1      14.6   0.1      100.0%      +24.5%
RT      5.1    0.1      6.2    0.1      100.0%      +20.2%

in_0565_8888
        Before          After
        Mean   StdDev   Mean   StdDev   Confidence  Change
L1      22.0   0.1      28.3   0.2      100.0%      +28.4%
L2      16.6   0.2      23.6   0.3      100.0%      +42.2%
M       16.5   0.0      24.7   0.1      100.0%      +49.5%
HT      11.0   0.1      13.7   0.1      100.0%      +24.4%
VT      10.7   0.0      13.1   0.1      100.0%      +22.0%
R       10.3   0.0      12.6   0.1      100.0%      +22.5%
RT      5.3    0.1      5.7    0.1      100.0%      +9.0%

in_reverse_8888_0565
        Before          After
        Mean   StdDev   Mean   StdDev   Confidence  Change
L1      16.6   0.1      20.9   0.1      100.0%      +25.5%
L2      13.1   0.1      17.7   0.3      100.0%      +35.3%
M       13.2   0.0      19.2   0.0      100.0%      +45.3%
HT      9.6    0.0      11.7   0.1      100.0%      +21.8%
VT      9.3    0.0      11.4   0.1      100.0%      +22.4%
R       9.0    0.0      10.9   0.1      100.0%      +21.1%
RT      4.7    0.1      5.2    0.1      100.0%      +8.7%

---
 pixman/pixman-arm-common.h   |   31 ++
 pixman/pixman-arm-simd-asm.S |   94 ++
 pixman/pixman-arm-simd.c     |   55 
 3 files changed, 180 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-common.h b/pixman/pixman-arm-common.h
index 3558c15..f4632b2 100644
--- a/pixman/pixman-arm-common.h
+++ b/pixman/pixman-arm-common.h
@@ -453,4 +453,35 @@
 cputype##_combine_##name##_u (pixman_implementation_t *imp, \
 ...
     pixman_composite_scanline_##name##_asm_##cputype (width, dest, src); \
 }
 
+/*****************************************************************************/
+
+#define PIXMAN_ARM_BIND_GET_SCANLINE(cputype, name) \
+void \
+pixman_get_scanline_##name##_asm_##cputype (int32_t        w, \
+                                            uint32_t       *dst, \
+                                            const uint32_t *src); \
+ \
+uint32_t * \
+cputype##_get_scanline_##name (pixman_iter_t *iter, const uint32_t *mask) \
+{ \
+    pixman_get_scanline_##name##_asm_##cputype (iter->width, iter->buffer, \
+                                                (uint32_t *) iter->bits); \
+    iter->bits += iter->stride; \
+    return iter->buffer; \
+}
+
+#define PIXMAN_ARM_BIND_WRITE_BACK(cputype, name) \
+void \
+pixman_write_back_##name##_asm_##cputype (int32_t        w, \
+                                          uint32_t       *dst, \
+                                          const uint32_t *src); \
+ \
+void \
+cputype##_write_back_##name (pixman_iter_t *iter) \
+{ \
+    pixman_write_back_##name##_asm_##cputype (iter->width, \
+                                              (uint32_t *)(iter->bits - iter->stride), \
+                                              iter->buffer); \
+}
+
 #endif
diff --git a/pixman/pixman-arm-simd-asm.S b/pixman/pixman-arm-simd-asm.S
index f61b715..b251187 100644
--- a/pixman/pixman-arm-simd-asm.S
+++ b/pixman/pixman-arm-simd-asm.S
@@ -388,6 +388,16 @@ generate_composite_function \
     src_0565_8888_process_head, \
     src_0565_8888_process_tail
 
+generate_composite_function_single_scanline \
+    pixman_get_scanline_r5g6b5_asm_armv6, 16, 0, 32, \
+    FLAG_DST_WRITEONLY | FLAG_BRANCH_OVER, \
+    3, /* prefetch distance */ \
+    src_0565_8888_init, \
+    nop_macro, /* newline */ \
+
Re: [Pixman] [PATCH 04/32] pixman-general: Tighten up calculation of temporary buffer sizes
On Thu, 07 Aug 2014 17:50:00 +0100, I wrote:

> There's no need to align after the end of the temporary destination
> buffer, and each of the remaining aligns can only add a maximum of 15
> bytes to the space requirement. This permits some edge cases to use
> the stack buffer where previously it would have deduced that a heap
> buffer was required.
> ---
>  pixman/pixman-general.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
> index 7cdea29..c8f272b 100644
> --- a/pixman/pixman-general.c
> +++ b/pixman/pixman-general.c
> @@ -159,10 +159,10 @@ general_composite_rect (pixman_implementation_t *imp,
>      mask_buffer = ALIGN (src_buffer + width * Bpp);
>      dest_buffer = ALIGN (mask_buffer + width * Bpp);
>  
> -    if (ALIGN (dest_buffer + width * Bpp) >
> +    if (dest_buffer + width * Bpp >
>          scanline_buffer + sizeof (stack_scanline_buffer))
>      {
> -        scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 32 * 3);
> +        scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 15 * 3);
>  
>          if (!scanline_buffer)
>              return;

I must say I never expected this patch to be controversial. I was merely tinkering with some nearby code and saw this, which to me was an obvious bug. I'm going to stand behind this patch for now, as explained below...

On Sun, 05 Oct 2014 20:03:42 +0100, Søren Sandmann <soren.sandm...@gmail.com> wrote:

> The idea behind aligning the final buffer is that it allows fetchers
> to assume that they can do a full 16-byte SIMD write to the buffer.
> We may possibly not do that, though.

I assume you mean combiner rather than fetcher here; the fetchers write to the source and mask line buffers, and the combiner writes to the destination line buffer.
I've been thinking about this, and while perhaps what you describe was the intention at one point, I reckon we can't permit a combiner to write beyond its buffer limits because of the dest iters in pixman-noop.c which force the combiner to write directly into the destination bitmap when it's a8r8g8b8 or x8r8g8b8 format. This is already a common use case, and will only become more so. If the combiner is beholden to write exactly the right number of bytes, then there's no benefit to over-allocating the line buffer.

> The change to use 15 instead of 32 is alright I guess. Is this
> something that actually shows up in practice?

Here's a worked example. For any size of stack buffer that you choose, there will be a few row lengths that do an unnecessary heap allocate and free that can be avoided by using this patch.

Assume stack allocations are aligned to 8-byte boundaries. Assume narrow pixel format (4 bytes per pixel). Let width = 2045 pixels. The size of stack_scanline_buffer is determined by a compile-time define, currently 8 KiB * 3 channels.

Suppose stack_scanline_buffer = [0x100008,0x106008), then:

src_buffer  = [0x100010,0x102004)
mask_buffer = [0x102010,0x104004)
dest_buffer = [0x104010,0x106004)

In this case it is clear that the three small buffers fit within stack_scanline_buffer, but if the test is performed on the end address of dest_buffer rounded up to a 16-byte boundary - which would be 0x106010 - then you would incorrectly deduce that they don't.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
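[Editorial note: the alignment arithmetic in the worked example above can be checked mechanically. This is an illustrative sketch; the real ALIGN macro in pixman-general.c operates on pointers, while this hypothetical helper models it on integer addresses.]

```c
#include <stdint.h>

/* Round an address up to the next 16-byte boundary, as the scanline
 * buffer carving in general_composite_rect() does. Rounding up can
 * add at most 15 bytes per buffer, which is why 15 * 3 is enough
 * slack for the three aligned sub-buffers in the patch. */
static uintptr_t
align16 (uintptr_t addr)
{
    return (addr + 15) & ~(uintptr_t)15;
}
```

Plugging in the addresses from the example: align16(0x100008) gives 0x100010 for src_buffer, while align16 of the dest_buffer end address 0x106004 gives 0x106010, which overshoots the 0x106008 limit even though the unaligned end fits.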
[Pixman] [PATCH] armv7: Re-use existing fast paths in more cases
There is a group of combiner types - SRC, OVER_REVERSE, IN, OUT and ADD - where the source alpha affects only the destination alpha component. This means that any fast path with a8r8g8b8 source and destination can also be applied to an equivalent operation with x8r8g8b8 source and destination just by updating the fast path table, and likewise with a8b8g8r8 and x8b8g8r8.

The following operations are affected:

add_x888_8_x888 (and bilinear scaled version of same)
add_x888_8888_x888
add_x888_n_x888
add_x888_x888 (and bilinear scaled version of same)

---
 pixman/pixman-arm-neon.c |    9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon.c b/pixman/pixman-arm-neon.c
index 92519e3..2f669cb 100644
--- a/pixman/pixman-arm-neon.c
+++ b/pixman/pixman-arm-neon.c
@@ -349,18 +349,25 @@ static const pixman_fast_path_t arm_neon_fast_paths[] =
     PIXMAN_STD_FAST_PATH (ADD, a8, a8, a8, neon_composite_add_8_8_8),
     PIXMAN_STD_FAST_PATH (ADD, r5g6b5, a8, r5g6b5, neon_composite_add_0565_8_0565),
     PIXMAN_STD_FAST_PATH (ADD, b5g6r5, a8, b5g6r5, neon_composite_add_0565_8_0565),
+    PIXMAN_STD_FAST_PATH (ADD, x8r8g8b8, a8, x8r8g8b8, neon_composite_add_8888_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8, x8r8g8b8, neon_composite_add_8888_8_8888),
+    PIXMAN_STD_FAST_PATH (ADD, x8b8g8r8, a8, x8b8g8r8, neon_composite_add_8888_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, a8, x8b8g8r8, neon_composite_add_8888_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8, a8r8g8b8, neon_composite_add_8888_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, a8, a8b8g8r8, neon_composite_add_8888_8_8888),
+    PIXMAN_STD_FAST_PATH (ADD, x8r8g8b8, a8r8g8b8, x8r8g8b8, neon_composite_add_8888_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8r8g8b8, x8r8g8b8, neon_composite_add_8888_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8r8g8b8, a8r8g8b8, neon_composite_add_8888_8888_8888),
+    PIXMAN_STD_FAST_PATH (ADD, x8r8g8b8, solid, x8r8g8b8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, solid, x8r8g8b8, neon_composite_add_8888_n_8888),
+    PIXMAN_STD_FAST_PATH (ADD, x8b8g8r8, solid, x8b8g8r8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, solid, x8b8g8r8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, solid, a8r8g8b8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, solid, a8b8g8r8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8, null, a8, neon_composite_add_8_8),
+    PIXMAN_STD_FAST_PATH (ADD, x8r8g8b8, null, x8r8g8b8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, null, x8r8g8b8, neon_composite_add_8888_8888),
+    PIXMAN_STD_FAST_PATH (ADD, x8b8g8r8, null, x8b8g8r8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, x8b8g8r8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, null, a8r8g8b8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, a8b8g8r8, neon_composite_add_8888_8888),
@@ -416,6 +423,7 @@
     SIMPLE_BILINEAR_FAST_PATH (ADD, a8r8g8b8, a8r8g8b8, neon_8888_8888),
     SIMPLE_BILINEAR_FAST_PATH (ADD, a8r8g8b8, x8r8g8b8, neon_8888_8888),
+    SIMPLE_BILINEAR_FAST_PATH (ADD, x8r8g8b8, x8r8g8b8, neon_8888_8888),
 
     SIMPLE_BILINEAR_A8_MASK_FAST_PATH (SRC, a8r8g8b8, a8r8g8b8, neon_8888_8_8888),
     SIMPLE_BILINEAR_A8_MASK_FAST_PATH (SRC, a8r8g8b8, x8r8g8b8, neon_8888_8_8888),
@@ -432,6 +440,7 @@
     SIMPLE_BILINEAR_A8_MASK_FAST_PATH (ADD, a8r8g8b8, a8r8g8b8, neon_8888_8_8888),
     SIMPLE_BILINEAR_A8_MASK_FAST_PATH (ADD, a8r8g8b8, x8r8g8b8, neon_8888_8_8888),
+    SIMPLE_BILINEAR_A8_MASK_FAST_PATH (ADD, x8r8g8b8, x8r8g8b8, neon_8888_8_8888),
 
     { PIXMAN_OP_NONE },
 };
-- 
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
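[Editorial note: a one-channel scalar model may help see why reusing the a8r8g8b8 fast paths for x8r8g8b8 is safe for ADD. Each byte of the result depends only on the corresponding bytes of source and destination (saturating add), so garbage in the x-channel of an x8r8g8b8 source can only pollute the destination's x-channel, which is ignored by definition. The function name is illustrative, not from the Pixman sources.]

```c
#include <stdint.h>

/* Saturating add of one 8-bit channel, as ADD performs: the result
 * byte is a function of this channel pair alone, independent of the
 * alpha bytes, which is what makes the fast path table extension
 * in this patch valid. */
static uint8_t
add_un8 (uint8_t s, uint8_t d)
{
    uint16_t t = (uint16_t)s + d;
    return t > 0xff ? 0xff : (uint8_t)t;
}
```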
[Pixman] [PATCH 2/5] armv7: Add in_8888_8 fast path
This is tuned for the Cortex-A7 (Raspberry Pi 2).

lowlevel-blt-bench results, compared to the ARMv6 fast path:

        Before          After
        Mean   StdDev   Mean   StdDev   Confidence  Change
L1      146.0  0.7      231.4  1.2      100.0%      +58.5%
L2      143.1  0.9      222.1  1.7      100.0%      +55.3%
M       110.9  0.0      129.0  0.5      100.0%      +16.3%
HT      57.3   0.6      73.0   0.3      100.0%      +27.4%
VT      46.6   0.5      61.6   0.4      100.0%      +32.3%
R       42.3   0.2      51.7   0.2      100.0%      +22.2%
RT      19.1   0.1      21.0   0.1      100.0%      +9.9%

---
 pixman/pixman-arm-neon-asm.S |   35 +++
 pixman/pixman-arm-neon.c     |    4 
 2 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S
index 7e949a3..2fecb5b 100644
--- a/pixman/pixman-arm-neon-asm.S
+++ b/pixman/pixman-arm-neon-asm.S
@@ -2757,6 +2757,41 @@ generate_composite_function \
 
 /******************************************************************************/
 
+.macro pixman_composite_in_8888_8_process_pixblock_head
+    /* src is in d0-d3 (deinterleaved) */
+    /* destination pixel data is in d4 */
+    vmull.u8    q8, d3, d4
+.endm
+
+.macro pixman_composite_in_8888_8_process_pixblock_tail
+    vrshr.u16   q9, q8, #8
+    vraddhn.u16 d28, q8, q9
+    /* result is in d28 */
+.endm
+
+.macro pixman_composite_in_8888_8_process_pixblock_tail_head
+    vld4.8      {d0-d3}, [SRC]!
+    vrshr.u16   q9, q8, #8
+    vld1.8      {d4}, [DST_R, :64]!
+    cache_preload 8, 8
+    vraddhn.u16 d28, q8, q9
+    vmull.u8    q8, d3, d4
+    vst1.8      {d28}, [DST_W, :64]!
+.endm
+
+generate_composite_function \
+    pixman_composite_in_8888_8_asm_neon, 32, 0, 8, \
+    FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
+    8, /* number of pixels, processed in a single block */ \
+    4, /* prefetch distance */ \
+    default_init, \
+    default_cleanup, \
+    pixman_composite_in_8888_8_process_pixblock_head, \
+    pixman_composite_in_8888_8_process_pixblock_tail, \
+    pixman_composite_in_8888_8_process_pixblock_tail_head
+
+/******************************************************************************/
+
 generate_composite_function_nearest_scanline \
     pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon, 32, 0, 32, \
     FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
diff --git a/pixman/pixman-arm-neon.c b/pixman/pixman-arm-neon.c
index 2f669cb..52ee9a4 100644
--- a/pixman/pixman-arm-neon.c
+++ b/pixman/pixman-arm-neon.c
@@ -66,6 +66,8 @@
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, out_reverse_8_0565,
                                    uint8_t, 1, uint16_t, 1)
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, out_reverse_8_8888,
                                    uint8_t, 1, uint32_t, 1)
+PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, in_8888_8,
+                                   uint32_t, 1, uint8_t, 1)
 
 PIXMAN_ARM_BIND_FAST_PATH_N_DST (SKIP_ZERO_SRC, neon, over_n_0565,
                                  uint16_t, 1)
@@ -372,6 +374,8 @@
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, null, a8r8g8b8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, a8b8g8r8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (IN, solid, null, a8, neon_composite_in_n_8),
+    PIXMAN_STD_FAST_PATH (IN, a8r8g8b8, null, a8, neon_composite_in_8888_8),
+    PIXMAN_STD_FAST_PATH (IN, a8b8g8r8, null, a8, neon_composite_in_8888_8),
     PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8r8g8b8, neon_composite_over_reverse_n_8888),
     PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8b8g8r8, neon_composite_over_reverse_n_8888),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, r5g6b5, neon_composite_out_reverse_8_0565),
-- 
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
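[Editorial note: the vrshr.u16/vraddhn.u16 pair that recurs in these NEON pixel blocks implements division of a 16-bit product by 255 with correct rounding. A scalar model, with a hypothetical function name, may make the trick easier to follow: vrshr computes a rounding right shift, and vraddhn adds its own rounding bias while narrowing to the high byte.]

```c
#include <stdint.h>

/* Scalar model of the NEON idiom: for a 16-bit product x,
 * vrshr.u16 #8 yields (x + 0x80) >> 8, and vraddhn.u16 adds the two
 * operands plus a 0x80 rounding bias and keeps the high byte, giving
 * ((x + 0x80) + ((x + 0x80) >> 8)) >> 8 == round(x / 255.0). */
static uint8_t
div255_neon_style (uint16_t x)
{
    uint16_t r = (uint16_t)((x + 0x80) >> 8);  /* vrshr.u16  q9, q8, #8  */
    return (uint8_t)((x + r + 0x80) >> 8);     /* vraddhn.u16 d28, q8, q9 */
}
```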
[Pixman] [PATCH 4/5] armv7: Add in_reverse_8888_8888 fast path
This is tuned for Cortex-A7 (Raspberry Pi 2).

lowlevel-blt-bench results, compared to the ARMv6 fast path:

        Before          After
        Mean   StdDev   Mean   StdDev   Confidence  Change
L1      48.9   0.2      100.2  0.5      100.0%      +104.9%
L2      47.4   0.1      99.4   0.4      100.0%      +109.6%
M       30.4   0.1      59.2   0.3      100.0%      +94.6%
HT      27.5   0.1      43.4   0.2      100.0%      +57.7%
VT      23.7   0.1      41.6   0.2      100.0%      +75.3%
R       22.1   0.1      35.5   0.0      100.0%      +60.6%
RT      12.3   0.1      16.1   0.0      100.0%      +30.9%

---
 pixman/pixman-arm-neon-asm.S |   57 ++
 pixman/pixman-arm-neon.c     |    6 
 2 files changed, 63 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S
index 8554e0c..93d680b 100644
--- a/pixman/pixman-arm-neon-asm.S
+++ b/pixman/pixman-arm-neon-asm.S
@@ -2856,6 +2856,63 @@ generate_composite_function \
 
 /******************************************************************************/
 
+.macro pixman_composite_in_reverse_8888_8888_process_pixblock_head
+    /* src is in d0-d3 (deinterleaved, though we only need alpha from d3) */
+    /* destination pixel data is in d4-d7 (deinterleaved) */
+    vmull.u8    q10, d6, d3
+    vmull.u8    q11, d7, d3
+    vmull.u8    q9, d5, d3
+    vmull.u8    q8, d4, d3
+    vrshr.u16   q15, q11, #8
+    vrshr.u16   q14, q10, #8
+    vrshr.u16   q13, q9, #8
+    vrshr.u16   q12, q8, #8
+    vraddhn.u16 d31, q11, q15
+    vraddhn.u16 d30, q10, q14
+    vraddhn.u16 d28, q8, q12
+    vraddhn.u16 d29, q9, q13
+.endm
+
+.macro pixman_composite_in_reverse_8888_8888_process_pixblock_tail
+    /* result is in d28-d31 */
+.endm
+
+.macro pixman_composite_in_reverse_8888_8888_process_pixblock_tail_head
+    vld4.8      {d0-d3}, [SRC]!
+    vzip.8      d28, d30
+    vld4.8      {d4-d7}, [DST_R, :128]!
+    cache_preload 8, 8
+    vzip.8      d29, d31
+    vmull.u8    q10, d6, d3
+    vmull.u8    q11, d7, d3
+    vmull.u8    q9, d5, d3
+    vmull.u8    q8, d4, d3
+    vzip.8      d28, d29
+    vzip.8      d30, d31
+    vst1.8      {d28-d31}, [DST_W, :128]!
+    vrshr.u16   q15, q11, #8
+    vrshr.u16   q14, q10, #8
+    vrshr.u16   q13, q9, #8
+    vrshr.u16   q12, q8, #8
+    vraddhn.u16 d31, q11, q15
+    vraddhn.u16 d30, q10, q14
+    vraddhn.u16 d28, q8, q12
+    vraddhn.u16 d29, q9, q13
+.endm
+
+generate_composite_function \
+    pixman_composite_in_reverse_8888_8888_asm_neon, 32, 0, 32, \
+    FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
+    8, /* number of pixels, processed in a single block */ \
+    3, /* prefetch distance */ \
+    default_init, \
+    default_cleanup, \
+    pixman_composite_in_reverse_8888_8888_process_pixblock_head, \
+    pixman_composite_in_reverse_8888_8888_process_pixblock_tail, \
+    pixman_composite_in_reverse_8888_8888_process_pixblock_tail_head
+
+/******************************************************************************/
+
 generate_composite_function_nearest_scanline \
     pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon, 32, 0, 32, \
     FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
diff --git a/pixman/pixman-arm-neon.c b/pixman/pixman-arm-neon.c
index ab8a58c..532e903 100644
--- a/pixman/pixman-arm-neon.c
+++ b/pixman/pixman-arm-neon.c
@@ -68,6 +68,8 @@
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, out_reverse_8_8888,
                                    uint8_t, 1, uint32_t, 1)
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, in_8888_8,
                                    uint32_t, 1, uint8_t, 1)
+PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, in_reverse_8888_8888,
+                                   uint32_t, 1, uint32_t, 1)
 
 PIXMAN_ARM_BIND_FAST_PATH_N_DST (SKIP_ZERO_SRC, neon, over_n_0565,
                                  uint16_t, 1)
@@ -382,6 +384,10 @@
     PIXMAN_STD_FAST_PATH (IN, solid, null, a8b8g8r8, neon_composite_in_n_8888),
     PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8r8g8b8, neon_composite_over_reverse_n_8888),
     PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8b8g8r8, neon_composite_over_reverse_n_8888),
+    PIXMAN_STD_FAST_PATH (IN_REVERSE, a8r8g8b8, null, x8r8g8b8, neon_composite_in_reverse_8888_8888),
+    PIXMAN_STD_FAST_PATH (IN_REVERSE, a8b8g8r8, null, x8b8g8r8, neon_composite_in_reverse_8888_8888),
+    PIXMAN_STD_FAST_PATH (IN_REVERSE, a8r8g8b8, null, a8r8g8b8, neon_composite_in_reverse_8888_8888),
+    PIXMAN_STD_FAST_PATH (IN_REVERSE, a8b8g8r8, null, a8b8g8r8, neon_composite_in_reverse_8888_8888),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, r5g6b5, neon_composite_out_reverse_8_0565),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, b5g6r5, neon_composite_out_reverse_8_0565),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, x8r8g8b8, neon_composite_out_reverse_8_8888),
-- 
1.7.5.4
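[Editorial note: a scalar reference for one pixel of the IN_REVERSE operation above may help when reading the NEON code: every destination channel, alpha included, is scaled by the source alpha, and the source colour channels are never used. The function name is hypothetical; the rounded division by 255 is the usual exact bias-and-shift form.]

```c
#include <stdint.h>

/* One pixel of in_reverse on 8888 formats: dst' = dst * src_alpha,
 * applied per channel with exact rounded division by 255. */
static uint32_t
in_reverse_8888_8888_pixel (uint32_t src, uint32_t dst)
{
    uint32_t sa = src >> 24;   /* only the source alpha matters */
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t d = (dst >> shift) & 0xff;
        uint32_t t = d * sa + 0x80;             /* bias for rounding */
        result |= ((t + (t >> 8)) >> 8) << shift;
    }
    return result;
}
```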
[Pixman] [PATCH 5/5] armv7: Add src_1555_8888 fast path
This is tuned for Cortex-A7 (Raspberry Pi 2).

lowlevel-blt-bench results, compared to the ARMv6 fast path:

        Before          After
        Mean   StdDev   Mean   StdDev   Confidence  Change
L1      88.6   0.2      221.3  0.5      100.0%      +149.7%
L2      88.1   0.4      219.2  0.8      100.0%      +148.9%
M       87.9   0.1      178.2  0.1      100.0%      +102.6%
HT      59.7   0.4      72.0   0.2      100.0%      +20.7%
VT      53.2   0.4      69.8   0.2      100.0%      +31.3%
R       48.5   0.3      53.6   0.1      100.0%      +10.6%
RT      21.2   0.1      23.0   0.1      100.0%      +8.5%

---
 pixman/pixman-arm-neon-asm.S |   51 ++
 pixman/pixman-arm-neon.c     |    8 ++
 2 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S
index 93d680b..13c75c5 100644
--- a/pixman/pixman-arm-neon-asm.S
+++ b/pixman/pixman-arm-neon-asm.S
@@ -2913,6 +2913,57 @@ generate_composite_function \
 
 /******************************************************************************/
 
+.macro pixman_composite_src_1555_8888_process_pixblock_head
+    /* src is in d0-d1 */
+    vshrn.i16   d31, q0, #8
+    vshrn.u16   d29, q0, #2
+    vshrn.i16   d30, q0, #7
+    vsli.u16    q0, q0, #5
+    vshr.s8     d31, d31, #7
+    vsri.8      d29, d29, #5
+    vsri.8      d30, d30, #5
+    vshrn.u16   d28, q0, #2
+.endm
+
+.macro pixman_composite_src_1555_8888_process_pixblock_tail
+    vzip.8      d29, d31
+    vzip.8      d28, d30
+    vzip.8      d28, d29
+    vzip.8      d30, d31
+    /* result is in d28-d31 */
+.endm
+
+.macro pixman_composite_src_1555_8888_process_pixblock_tail_head
+    vzip.8      d29, d31
+    vzip.8      d28, d30
+    vld1.16     {d0-d1}, [SRC]!
+    cache_preload 8, 8
+    vzip.8      d28, d29
+    vzip.8      d30, d31
+    vst1.8      {d28-d31}, [DST_W, :128]!
+    vshrn.i16   d31, q0, #8
+    vshrn.u16   d29, q0, #2
+    vshrn.i16   d30, q0, #7
+    vsli.u16    q0, q0, #5
+    vshr.s8     d31, d31, #7
+    vsri.8      d29, d29, #5
+    vsri.8      d30, d30, #5
+    vshrn.u16   d28, q0, #2
+.endm
+
+generate_composite_function \
+    pixman_composite_src_1555_8888_asm_neon, 16, 0, 32, \
+    FLAG_DST_WRITEONLY, \
+    8, /* number of pixels, processed in a single block */ \
+    6, /* prefetch distance */ \
+    default_init, \
+    default_cleanup, \
+    pixman_composite_src_1555_8888_process_pixblock_head, \
+    pixman_composite_src_1555_8888_process_pixblock_tail, \
+    pixman_composite_src_1555_8888_process_pixblock_tail_head
+
+/******************************************************************************/
+
 generate_composite_function_nearest_scanline \
     pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon, 32, 0, 32, \
     FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
diff --git a/pixman/pixman-arm-neon.c b/pixman/pixman-arm-neon.c
index 532e903..9ab7259 100644
--- a/pixman/pixman-arm-neon.c
+++ b/pixman/pixman-arm-neon.c
@@ -46,6 +46,8 @@
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, src_8888_0565,
                                    uint32_t, 1, uint16_t, 1)
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, src_0565_8888,
                                    uint16_t, 1, uint32_t, 1)
+PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, src_1555_8888,
+                                   uint16_t, 1, uint32_t, 1)
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, src_0888_8888_rev,
                                    uint8_t, 3, uint32_t, 1)
 PIXMAN_ARM_BIND_FAST_PATH_SRC_DST (neon, src_0888_0565_rev,
@@ -286,6 +288,12 @@
     PIXMAN_STD_FAST_PATH (SRC, r5g6b5, null, x8r8g8b8, neon_composite_src_0565_8888),
     PIXMAN_STD_FAST_PATH (SRC, b5g6r5, null, a8b8g8r8, neon_composite_src_0565_8888),
     PIXMAN_STD_FAST_PATH (SRC, b5g6r5, null, x8b8g8r8, neon_composite_src_0565_8888),
+    PIXMAN_STD_FAST_PATH (SRC, x1r5g5b5, null, x8r8g8b8, neon_composite_src_1555_8888),
+    PIXMAN_STD_FAST_PATH (SRC, a1r5g5b5, null, x8r8g8b8, neon_composite_src_1555_8888),
+    PIXMAN_STD_FAST_PATH (SRC, x1b5g5r5, null, x8b8g8r8, neon_composite_src_1555_8888),
+    PIXMAN_STD_FAST_PATH (SRC, a1b5g5r5, null, x8b8g8r8, neon_composite_src_1555_8888),
+    PIXMAN_STD_FAST_PATH (SRC, a1r5g5b5, null, a8r8g8b8, neon_composite_src_1555_8888),
+    PIXMAN_STD_FAST_PATH (SRC, a1b5g5r5, null, a8b8g8r8, neon_composite_src_1555_8888),
     PIXMAN_STD_FAST_PATH (SRC, a8r8g8b8, null, x8r8g8b8, neon_composite_src_8888_8888),
     PIXMAN_STD_FAST_PATH (SRC, x8r8g8b8, null, x8r8g8b8, neon_composite_src_8888_8888),
     PIXMAN_STD_FAST_PATH (SRC, a8b8g8r8, null, x8b8g8r8, neon_composite_src_8888_8888),
-- 
1.7.5.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
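[Editorial note: a scalar reference for the 1555-to-8888 conversion above may help when reading the vsri/vshr.s8 tricks: each 5-bit channel is widened to 8 bits by replicating its top bits, and the 1-bit alpha is broadcast to 0x00 or 0xff. The function name is illustrative, not from the Pixman sources.]

```c
#include <stdint.h>

/* One pixel of a1r5g5b5 -> a8r8g8b8: widen each 5-bit channel with
 * (c << 3) | (c >> 2) so that 0x1f maps to 0xff and 0 to 0, and
 * broadcast the alpha bit across the whole alpha byte. */
static uint32_t
convert_1555_to_8888 (uint16_t p)
{
    uint32_t a = (p & 0x8000) ? 0xff : 0x00;
    uint32_t r = (p >> 10) & 0x1f;
    uint32_t g = (p >> 5) & 0x1f;
    uint32_t b = p & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 3) | (g >> 2);
    b = (b << 3) | (b >> 2);

    return (a << 24) | (r << 16) | (g << 8) | b;
}
```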
[Pixman] [PATCH 3/5] armv7: Add in_n_8888 fast path
This is tuned for Cortex-A7 (Raspberry Pi 2). lowlevel-blt-bench results, compared to the ARMv6 fast path:

        Before          After
        Mean   StdDev   Mean   StdDev  Confidence  Change
L1      104.6  0.5      119.4  0.1     100.0%      +14.1%
L2      106.8  0.6      121.4  0.1     100.0%      +13.6%
M       100.3  1.3      116.4  0.0     100.0%      +16.0%
HT      64.5   1.0      70.8   0.1     100.0%      +9.8%
VT      56.0   0.8      62.2   0.1     100.0%      +11.1%
R       54.1   0.9      55.2   0.0     100.0%      +1.9%
RT      24.6   0.5      26.6   0.0     100.0%      +8.3%
---
 pixman/pixman-arm-neon-asm.S | 64 ++++++++++++++++++++++++++++++++++++++++
 pixman/pixman-arm-neon.c     |  4 ++++
 2 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S
index 2fecb5b..8554e0c 100644
--- a/pixman/pixman-arm-neon-asm.S
+++ b/pixman/pixman-arm-neon-asm.S
@@ -2792,6 +2792,70 @@ generate_composite_function \
 /**/
 
+.macro pixman_composite_in_n_8888_init
+    add         DUMMY, sp, #ARGS_STACK_OFFSET
+    vld1.8      {d0[]}, [DUMMY]!
+    vld1.8      {d1[]}, [DUMMY]!
+    vld1.8      {d2[]}, [DUMMY]!
+    vld1.8      {d3[]}, [DUMMY]!
+.endm
+
+.macro pixman_composite_in_n_8888_process_pixblock_head
+    /* src is in d0-d3 (deinterleaved) */
+    /* destination pixel data is in d4-d7 (deinterleaved, though we only need alpha from d7) */
+    vmull.u8    q11, d3, d7
+    vmull.u8    q10, d2, d7
+    vmull.u8    q9, d1, d7
+    vmull.u8    q8, d0, d7
+    vrshr.u16   q15, q11, #8
+    vrshr.u16   q14, q10, #8
+    vrshr.u16   q13, q9, #8
+    vrshr.u16   q12, q8, #8
+    vraddhn.u16 d31, q11, q15
+    vraddhn.u16 d30, q10, q14
+    vraddhn.u16 d28, q8, q12
+    vraddhn.u16 d29, q9, q13
+.endm
+
+.macro pixman_composite_in_n_8888_process_pixblock_tail
+    /* result is in d28-d31 */
+.endm
+
+.macro pixman_composite_in_n_8888_process_pixblock_tail_head
+    vld4.8      {d4-d7}, [DST_R, :128]!
+    cache_preload 8, 8
+    vzip.8      d28, d30
+    vmull.u8    q11, d3, d7
+    vzip.8      d29, d31
+    vmull.u8    q10, d2, d7
+    vmull.u8    q9, d1, d7
+    vmull.u8    q8, d0, d7
+    vzip.8      d28, d29
+    vzip.8      d30, d31
+    vst1.8      {d28-d31}, [DST_W, :128]!
+    vrshr.u16   q15, q11, #8
+    vrshr.u16   q14, q10, #8
+    vrshr.u16   q13, q9, #8
+    vrshr.u16   q12, q8, #8
+    vraddhn.u16 d31, q11, q15
+    vraddhn.u16 d30, q10, q14
+    vraddhn.u16 d28, q8, q12
+    vraddhn.u16 d29, q9, q13
+.endm
+
+generate_composite_function \
+    pixman_composite_in_n_8888_asm_neon, 0, 0, 32, \
+    FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \
+    8, /* number of pixels, processed in a single block */ \
+    6, /* prefetch distance */ \
+    pixman_composite_in_n_8888_init, \
+    default_cleanup, \
+    pixman_composite_in_n_8888_process_pixblock_head, \
+    pixman_composite_in_n_8888_process_pixblock_tail, \
+    pixman_composite_in_n_8888_process_pixblock_tail_head
+
+/**/
+
 generate_composite_function_nearest_scanline \
     pixman_scaled_nearest_scanline_8888_8888_OVER_asm_neon, 32, 0, 32, \
     FLAG_DST_READWRITE | FLAG_DEINTERLEAVE_32BPP, \

diff --git a/pixman/pixman-arm-neon.c b/pixman/pixman-arm-neon.c
index 52ee9a4..ab8a58c 100644
--- a/pixman/pixman-arm-neon.c
+++ b/pixman/pixman-arm-neon.c
@@ -77,6 +77,8 @@ PIXMAN_ARM_BIND_FAST_PATH_N_DST (SKIP_ZERO_SRC, neon, over_reverse_n_8888,
                                  uint32_t, 1)
 PIXMAN_ARM_BIND_FAST_PATH_N_DST (0, neon, in_n_8, uint8_t, 1)
+PIXMAN_ARM_BIND_FAST_PATH_N_DST (0, neon, in_n_8888,
+                                 uint32_t, 1)
 
 PIXMAN_ARM_BIND_FAST_PATH_N_MASK_DST (SKIP_ZERO_SRC, neon, over_n_8_0565,
                                       uint8_t, 1, uint16_t, 1)
@@ -376,6 +378,8 @@ static const pixman_fast_path_t arm_neon_fast_paths[] =
     PIXMAN_STD_FAST_PATH (IN, solid, null, a8, neon_composite_in_n_8),
     PIXMAN_STD_FAST_PATH (IN, a8r8g8b8, null, a8, neon_composite_in_8888_8),
     PIXMAN_STD_FAST_PATH (IN, a8b8g8r8, null, a8, neon_composite_in_8888_8),
+    PIXMAN_STD_FAST_PATH (IN, solid, null, a8r8g8b8, neon_composite_in_n_8888),
+    PIXMAN_STD_FAST_PATH (IN, solid, null, a8b8g8r8, neon_composite_in_n_8888),
     PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8r8g8b8, neon_composite_over_reverse_n_8888),
     PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8b8g8r8, neon_composite_over_reverse_n_8888),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, r5g6b5, neon_composite_out_reverse_8_0565),
--
1.7.5.4
[Pixman] [PATCH] armv7: Re-use existing fast paths in more cases
There are a group of combiner types - SRC, OVER, IN_REVERSE, OUT_REVERSE and ADD - where the destination alpha component is only used (if at all) to determine the destination alpha component. This means that any such fast paths with an a8r8g8b8 destination can also be applied to an x8r8g8b8 destination just by updating the fast path table, and likewise with a8b8g8r8 and x8b8g8r8. The following operations are affected:

over_8888_8888_x888
add_n_8_x888
add_8888_8_x888
add_8888_8888_x888
add_8888_n_x888
add_8888_x888
out_reverse_8_x888
---
 pixman/pixman-arm-neon.c | 12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon.c b/pixman/pixman-arm-neon.c
index e5262b1..92519e3 100644
--- a/pixman/pixman-arm-neon.c
+++ b/pixman/pixman-arm-neon.c
@@ -331,6 +331,7 @@ static const pixman_fast_path_t arm_neon_fast_paths[] =
     PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, a8, b5g6r5, neon_composite_over_8888_8_0565),
     PIXMAN_STD_FAST_PATH (OVER, r5g6b5, a8, r5g6b5, neon_composite_over_0565_8_0565),
     PIXMAN_STD_FAST_PATH (OVER, b5g6r5, a8, b5g6r5, neon_composite_over_0565_8_0565),
+    PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, x8r8g8b8, neon_composite_over_8888_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, a8r8g8b8, neon_composite_over_8888_8888_8888),
     PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, r5g6b5, neon_composite_over_8888_0565),
     PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, b5g6r5, neon_composite_over_8888_0565),
@@ -341,17 +342,26 @@ static const pixman_fast_path_t arm_neon_fast_paths[] =
     PIXMAN_STD_FAST_PATH (OVER, x8r8g8b8, null, a8r8g8b8, neon_composite_src_x888_8888),
     PIXMAN_STD_FAST_PATH (OVER, x8b8g8r8, null, a8b8g8r8, neon_composite_src_x888_8888),
     PIXMAN_STD_FAST_PATH (ADD, solid, a8, a8, neon_composite_add_n_8_8),
+    PIXMAN_STD_FAST_PATH (ADD, solid, a8, x8r8g8b8, neon_composite_add_n_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, solid, a8, a8r8g8b8, neon_composite_add_n_8_8888),
+    PIXMAN_STD_FAST_PATH (ADD, solid, a8, x8b8g8r8, neon_composite_add_n_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, solid, a8, a8b8g8r8, neon_composite_add_n_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8, a8, a8, neon_composite_add_8_8_8),
     PIXMAN_STD_FAST_PATH (ADD, r5g6b5, a8, r5g6b5, neon_composite_add_0565_8_0565),
     PIXMAN_STD_FAST_PATH (ADD, b5g6r5, a8, b5g6r5, neon_composite_add_0565_8_0565),
+    PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8, x8r8g8b8, neon_composite_add_8888_8_8888),
+    PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, a8, x8b8g8r8, neon_composite_add_8888_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8, a8r8g8b8, neon_composite_add_8888_8_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, a8, a8b8g8r8, neon_composite_add_8888_8_8888),
+    PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8r8g8b8, x8r8g8b8, neon_composite_add_8888_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, a8r8g8b8, a8r8g8b8, neon_composite_add_8888_8888_8888),
+    PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, solid, x8r8g8b8, neon_composite_add_8888_n_8888),
+    PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, solid, x8b8g8r8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, solid, a8r8g8b8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, solid, a8b8g8r8, neon_composite_add_8888_n_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8, null, a8, neon_composite_add_8_8),
+    PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, null, x8r8g8b8, neon_composite_add_8888_8888),
+    PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, x8b8g8r8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, null, a8r8g8b8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, a8b8g8r8, neon_composite_add_8888_8888),
     PIXMAN_STD_FAST_PATH (IN, solid, null, a8, neon_composite_in_n_8),
@@ -359,7 +369,9 @@ static const pixman_fast_path_t arm_neon_fast_paths[] =
     PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8b8g8r8, neon_composite_over_reverse_n_8888),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, r5g6b5, neon_composite_out_reverse_8_0565),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, b5g6r5, neon_composite_out_reverse_8_0565),
+    PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, x8r8g8b8, neon_composite_out_reverse_8_8888),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, a8r8g8b8, neon_composite_out_reverse_8_8888),
+    PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, x8b8g8r8, neon_composite_out_reverse_8_8888),
     PIXMAN_STD_FAST_PATH (OUT_REVERSE, a8, null, a8b8g8r8, neon_composite_out_reverse_8_8888),
 
     PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, neon_8888_8888),
--
[Pixman] [PATCH 1/5] armv7: Use prefetch for small-width images too
After discovering that the ARMv6 optimised fast paths often out-performed the ARMv7 ones on a Cortex-A7, particularly on the RT benchmark, I found that the problem was due to the fact that the ARMv7 macros didn't attempt any sort of prefetch for small images (fewer than pixblock_size * 2 pixels across). Since a pixblock is chosen to be no larger than a cacheline, and is in many cases smaller, it seemed a reasonable compromise to avoid adding a lot of complexity by simply doing one prefetch for the start of a pixel row when starting to process the preceding one, and that is what this patch does.

I compared the effect of using LDRB (which is what is currently used at the end of each long pixel row) against PLD for each of the source and destination buffers for a selection of common operations: src_8888_8888, over_8888_8888 and add_8888_8888, and in each case PLD of both buffers was the most beneficial. PLDW didn't make any measurable difference. The overall effect of this patch on the three operations is as follows (L1, L2 and M tests can be ignored because they're known not to involve the use of short rows):

src_8888_8888
        Before          After
        Mean   StdDev   Mean   StdDev  Confidence  Change
HT      60.8   0.1      61.1   0.1     100.0%      +0.6%
VT      61.0   0.3      62.6   0.2     100.0%      +2.6%
R       45.5   0.2      46.2   0.2     100.0%      +1.5%
RT      19.8   0.0      21.4   0.0     100.0%      +7.8%

over_8888_8888
        Before          After
        Mean   StdDev   Mean   StdDev  Confidence  Change
HT      40.2   0.1      40.7   0.4     100.0%      +1.0%
VT      35.5   0.2      37.9   0.3     100.0%      +6.7%
R       32.8   0.0      33.8   0.3     100.0%      +3.0%
RT      12.9   0.0      15.6   0.2     100.0%      +21.4%

add_8888_8888
        Before          After
        Mean   StdDev   Mean   StdDev  Confidence  Change
HT      51.0   0.6      51.9   0.5     100.0%      +1.7%
VT      44.0   0.4      46.8   0.5     100.0%      +6.3%
R       39.6   0.5      41.0   0.4     100.0%      +3.5%
RT      15.2   0.2      18.0   0.2     100.0%      +18.5%
---
 pixman/pixman-arm-neon-asm.h | 9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.h b/pixman/pixman-arm-neon-asm.h
index bdcf6a9..ce06416 100644
--- a/pixman/pixman-arm-neon-asm.h
+++ b/pixman/pixman-arm-neon-asm.h
@@ -870,6 +870,15 @@ local skip1
  * nor prefetch are used.
  */
 8:
+.if src_bpp_shift >= 0
+    PF pld, [SRC, SRC_STRIDE, lsl #src_bpp_shift]
+.endif
+.if dst_r_bpp != 0
+    PF pld, [DST_R, DST_STRIDE, lsl #dst_bpp_shift]
+.endif
+.if mask_bpp_shift >= 0
+    PF pld, [MASK, MASK_STRIDE, lsl #mask_bpp_shift]
+.endif
 /* Process exactly pixblock_size pixels if needed */
     tst         W, #pixblock_size
     beq         1f
--
1.7.5.4
Re: [Pixman] [PATCH 25/32] armv6: Add src_1555_8888 fast path
I hope this doesn't confuse threading too much... I've been working on ARMv7 with the aim of bringing it up to par with the ARMv6 features that I've been writing over the last couple of years, and I'm trying to drip-feed them to the mailing list to avoid a big bang at the end this time. I've just added src_1555_8888 for ARMv7, and it makes sense to address Søren's review comment for the ARMv6 version while I am at it, even though many earlier ARMv6 patches in the series haven't been accepted yet.

On Thu, 07 Aug 2014 17:50:21 +0100, I wrote:
[...]
> diff --git a/pixman/pixman-arm-simd.c b/pixman/pixman-arm-simd.c
> index e6c5d81..f028794 100644
> --- a/pixman/pixman-arm-simd.c
> +++ b/pixman/pixman-arm-simd.c
[...]
> @@ -277,6 +279,9 @@ static const pixman_fast_path_t arm_simd_fast_paths[] =
>      PIXMAN_STD_FAST_PATH (SRC, a8b8g8r8, null, b5g6r5, armv6_composite_src_x888_0565),
>      PIXMAN_STD_FAST_PATH (SRC, x8b8g8r8, null, b5g6r5, armv6_composite_src_x888_0565),
> +    PIXMAN_STD_FAST_PATH (SRC, a1r5g5b5, null, a8r8g8b8, armv6_composite_src_1555_8888),
> +    PIXMAN_STD_FAST_PATH (SRC, a1b5g5r5, null, a8b8g8r8, armv6_composite_src_1555_8888),
> +
>      PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, armv6_composite_over_8888_8888),
>      PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, armv6_composite_over_8888_8888),
>      PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, armv6_composite_over_8888_8888),

On Sun, 05 Oct 2014 20:03:42 +0100, Søren Sandmann (soren.sandm...@gmail.com) wrote:
> Can these be used for x8r8g8b8/x8b8g8r8 targets as well?

This is true. In fact, it can be used for x1r5g5b5/x1b5g5r5 sources as well if the destination is x8r8g8b8/x8b8g8r8. The ARMv6 fast path table is otherwise quite rigorous in including x888-type pixel formats whenever possible. The same cannot be said for the ARMv7 table; logically, the same rules should apply to both. So I will shortly be following the updated version of this patch with a couple more patches to improve the ARMv7 table in the same way.

Following that will be a short patch series adding some of the unscaled fast paths which had been encountered on the Raspberry Pi 1 (ARMv6) but for which an ARMv7 implementation was missing. These are presented as a dependent series because without the first patch in that series to improve the ARMv7 prefetch logic for small images, the RT benchmark scores for these fast paths showed a sizeable regression compared to the ARMv6 version.

Ben
Re: [Pixman] [PATCH] lowlevel-blt-bench: use a8r8g8b8 for CA solid masks
> When doing component alpha, use a mask format that has all the color
> channels instead of just a8.

In case you're waiting for me... That first sentence doesn't qualify the fact that this is specific to solid masks. Other than that, looks good to me. Not sure whether that qualifies as a Reviewed-by or Acked-by, take your pick!

Ben
Re: [Pixman] [PATCH 0/7] lowlevel-blt-bench test pattern parser
On Tue, 14 Apr 2015 09:36:38 +0100, Pekka Paalanen (ppaala...@gmail.com) wrote:
> On Mon, 13 Apr 2015 18:42:45 +0100 Ben Avison (bavi...@riscosopen.org) wrote:
>> On Mon, 13 Apr 2015 12:31:35 +0100, Pekka Paalanen (ppaala...@gmail.com) wrote:
>>> None of the existing tests has a solid mask for CA tests. I'm not sure
>>> how much it makes sense.
>>
>> When you're in component alpha mode, mathematically the source and mask
>> images are interchangeable. So you could implement over_8888_n_8888_ca
>> by calling over_n_8888_8888_ca, and we already have a number of fast
>> paths for operations of the latter form. (I'm not aware that Pixman is -
>> yet - aware that it can make such substitutions, but I'm sure it could
>> be added.)
>
> Hmm, yes, interesting, at least in the cases where the operator indeed
> allows it... I don't know the operators by heart so I don't know if
> there are any that wouldn't work like that. An idea to remember.

Having thought about it some more, and checked the source code (it's clearer in pixman-combine-float.c because the maths isn't obfuscated by the need to work in fixed-point) I'm going to retract the claim. Or at least restrict its applicability. (This demonstrates the danger of there not being anyone around to contradict me!)

The problem is that although alpha is considered independently for each colour component of the mask - so you just multiply the source red by the mask red and so on - I'd neglected the alpha component in the source image. Yes, because Pixman premultiplies the alpha into the RGB components, it doesn't directly affect the RGB output of the combined source+mask image, but there's a secondary output, a per-channel alpha, which affects the Porter-Duff operation. For example, with an OVER operation, the inverse of the per-channel alpha is multiplied into the existing value from the destination buffer before you add-saturate the combined source+mask into it.

My statement about being able to exchange the source and mask images is true, but only if the alpha component is 1 in both images. For example, over_0565_n_8888_ca or over_x888_n_8888_ca where (in both cases) n.a == 1 would both be suitable candidates for exchanging the source and mask.

For the avoidance of doubt, the Porter-Duff and PDF combiners are all just binary operators, defined in terms of one source and one destination image. Whenever Pixman talks about unified or component alpha operations, it's just concerned with how the source and mask images are combined into the effective source image for the fundamental combiner. It just happens to be more efficient in most cases to do both the source/mask and source/destination calculations at the same time, rather than writing out the intermediate image into a buffer somewhere.

> So, can I take it that you gave your Reviewed-by for the whole series?

Yes.

Ben
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Tue, 14 Apr 2015 11:28:42 +0100, Pekka Paalanen (ppaala...@gmail.com) wrote:
> So the matrices are indexed as matrix[row][column] in Pixman?

That's my understanding, yes.

> A short'ish comment somewhere to say why you are doing this offsetting
> would be nice, and that the offsetting is the reason to allocate a
> margin.

Since you've been reworking the patches anyway, do you have enough information to add the comments yourself or do you want me to do them?

>> One simple way round this is to apply a 0.5 pixel translation in the
>> transformation matrix, so that the pattern becomes:
>>
>> d[0] = 1.00 * s[0]
>> d[1] = 0.50 * s[0] + 0.50 * s[1]
>> d[2] = 1.00 * s[1]
>> d[3] = 0.50 * s[1] + 0.50 * s[2]
>>
>> but this is thwarted by the 8/65536ths of a pixel fudge factor. I can't
>> see why the fudge factor is needed at all, at least not in the affine
>> case; it's described as being to compensate for rounding errors, but
>> there is no rounding involved in fixed-point affine transforms.
>
> Maybe the rounding refers to the pixman_fixed_to_int() calls? I could
> imagine it is to guarantee that an xmin=0.5 gets really rounded to 0,
> and xmax=0.5 gets rounded to 1.

There's a slightly better worked example in the later patch I was thinking of:
http://lists.freedesktop.org/archives/pixman/2014-September/003380.html
As I say, we're getting a bit ahead of ourselves here, as there were about 30 patches between the ones we're reworking at the moment and that one. Søren did give some comments on it at the end of his review here:
http://lists.freedesktop.org/archives/pixman/2014-October/003457.html
which says the 8*pixman_fixed_e was about ensuring we didn't stray beyond the bitmap bounds if a fast path happened to over-read pixels and then multiply them by a 0 filter coefficient. I think we can probably cater for that better by checking whether a bitmap starts or ends at a page boundary, and whether we're reading the first and last pixels of the image if it does.

> To be honest, both cases you describe sound strange to me. Surely if I
> use an affine transform matrix { 0.5 0 0; 0 1 0 }, I'd expect
> d[0] = 0.5 * s[0] + 0.5 * s[1]
> when assuming the box filter (if I got my terminology right)...

You're forgetting it's pixel-centres that are fed through the transform.

To be honest, I think it's already quite a headache for an application to work out how to set up the transform in order to hit a cover scaled fast path. Assume the reasonable case that you want to plot the whole of an image of size x,y at a size m,n. You need to set the diagonals of the transform to

floor(pixman_fixed_1*(x-1)/(m-1))
floor(pixman_fixed_1*(y-1)/(n-1))
1

and then solve for

    / 0.5 \   / 0.5 \
T . | 0.5 | = | 0.5 |
    \ 1   /   \ 1   /

to find the translation coefficients that will ensure the top-left source pixel aligns exactly to the top-left destination pixel. To then require that the caller also knows that you need to nudge things along by 8*pixman_fixed_e as well feels like it's requiring too much knowledge of Pixman's internals to me. I didn't really like putting it in affine-bench to begin with for this reason, but it doesn't work without it. So I made do with removing it as quickly as I could - the original intention was for it to be in the same patch series, but obviously the patch series has grown too large and unwieldy for that to be the case any more!

When I've seen timing breakdowns of which Pixman operations are being hit by applications, I've seen far more scaled plots with repeat type PAD than I'd really expect. My suspicions are that the bulk of these are down to how difficult it is for applications to set up the transform for maximum efficiency.

>> I think my hope was that since we're processing images of size 1920 *
>> 1080 pixels, you wouldn't have any particular pixels persisting in the
>> cache from the previous iteration (unless perhaps if you're testing a
>> high-factor enlargement on a platform that has caches that don't
>> allocate on write).
>
> Right, so is there any need to flush_cache() at all?

Possibly not.
I was taking my cue from lowlevel-blt-bench again, where it flushes the cache between each type of test. And looking at the implementation of flush_cache() again now, I note that due to the action of allocate-on-write and physically mapped caches, it won't completely flush most caches, only one page-size worth! Perhaps just delete it then.

>> +    bench (op, t, src_image, mask_image, dest_image, src_x, src_y, UINT32_MAX, 500, n, t1, pixman_image_composite_wrapper);
>> +    bench (op, t, src_image, mask_image, dest_image, src_x, src_y, n, UINT32_MAX, NULL, t2, pixman_image_composite_empty);
[...]
> I think that part I understood. My comment was about having a {} block
> without any flow control statements for it. You are just using it to
> scope some variables, which suggests that the block should actually be a
> separate function.

Oh right. Well, I think
Re: [Pixman] [PATCH 0/7] lowlevel-blt-bench test pattern parser
On Mon, 13 Apr 2015 12:31:35 +0100, Pekka Paalanen (ppaala...@gmail.com) wrote:
> Apart from restructuring, there is one significant(?) difference to
> Ben's patches: for a solid mask, Ben used a8r8g8b8, but my code uses a8.

I had to remind myself how things hang together to check if this is significant or not...

When lowlevel-blt-bench tests a solid image (source or mask), it achieves it not by calling pixman_image_create_solid_fill() but by creating a 1x1 pixel bitmap using pixman_image_create_bits(). However, the effect is effectively the same, because compute_image_info() detects single-pixel bitmaps and records it in image->common.extended_format_code before we start doing fast path lookup. However, image->type remains as BITS rather than SOLID because the non-common parts of the pixman_image union retain their original meaning - for a native solid fill image, it's just the colour in three different formats, but for bitmap images it's a pointer to and sizes of the bitmap and colour LUT (if applicable), various helper function pointers and so on.

A typical ARM assembly fast path is written to assume that any argument corresponding to a solid colour is already in a format where the red-blue ordering is the same as the destination image. The magic that ensures this is hidden in the wrapper C function generated by PIXMAN_ARM_BIND_FAST_PATH_N_DST and similar, where it calls _pixman_image_get_solid(). You'll see similar calls more explicitly in pixman-fast-path.c (architecture-neutral fast paths) and the equivalent source files for other architectures.

There are a few circumstances where you might not want to use _pixman_image_get_solid() - perhaps most obviously the operations which use a solid image in pixel format a2r10g10b10, because you'd lose 2 bits of precision per colour component. There are a few examples of those amongst those you picked out in special_patterns[].

It's worth noting that _pixman_image_get_solid() has optimised code to extract the colour from image->type == BITS images if the source/mask image is of format a8r8g8b8, x8r8g8b8 or a8. Other formats will still work, but more laboriously. Not that any of this should matter at all for the purposes of lowlevel-blt-bench: _pixman_image_get_solid() is only called once per call of pixman_image_composite(), so it is part of the overhead that should be being accounted for by the code wrapped in #if EXCLUDE_OVERHEAD.

> should we also add a condition, that if a test has CA flag and a solid
> mask, then that mask needs to be a8r8g8b8?

That might be desirable, if only because lowlevel-blt-bench initialises all its images using memset(0xCC), so an a8 solid image would be converted by _pixman_image_get_solid() to 0xcc000000 whereas an a8r8g8b8 one would be 0xcccccccc. When you're not in component alpha mode, only the alpha byte matters for the mask image, but in the case of component alpha operations, a fast path might decide that it can save itself a lot of multiplications if it spots that 3 constant mask components are already 0.

> None of the existing tests has a solid mask for CA tests. I'm not sure
> how much it makes sense.

When you're in component alpha mode, mathematically the source and mask images are interchangeable. So you could implement over_8888_n_8888_ca by calling over_n_8888_8888_ca, and we already have a number of fast paths for operations of the latter form. (I'm not aware that Pixman is - yet - aware that it can make such substitutions, but I'm sure it could be added.)

> Ben, when we eventually get to the actual optimization patches, I expect
> I will reproduce the benchmarks myself while reviewing. Is that ok?

That's definitely fine by me, thanks for offering. Fingers crossed it doesn't substantially change the results :-)

> What do you think of this series?

In general, it looks like an improvement to me. I don't think the test programs have had this much attention in a while!

One minor point: in patch 6's log message, you copied my typo: operation_src[_mask]dst[_ca] should be operation_src[_mask]_dst[_ca]

Ben
[Pixman] [PATCH] test: Add a new benchmarker targeting affine operations
---
 .gitignore            |   1 +
 test/Makefile.sources |   1 +
 test/affine-bench.c   | 394 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 396 insertions(+), 0 deletions(-)
 create mode 100644 test/affine-bench.c

diff --git a/.gitignore b/.gitignore
index 0f11496..7da6b6f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -46,6 +46,7 @@ demos/tri-test
 pixman/pixman-srgb.c
 pixman/pixman-version.h
 test/a1-trap-test
+test/affine-bench
 test/affine-test
 test/alpha-loop
 test/alphamap

diff --git a/test/Makefile.sources b/test/Makefile.sources
index c20c34b..8b0e855 100644
--- a/test/Makefile.sources
+++ b/test/Makefile.sources
@@ -37,6 +37,7 @@ OTHERPROGRAMS = \
 	radial-perf-test	\
 	check-formats		\
 	scaling-bench		\
+	affine-bench		\
 	$(NULL)

 # Utility functions

diff --git a/test/affine-bench.c b/test/affine-bench.c
new file mode 100644
index 0000000..720d066
--- /dev/null
+++ b/test/affine-bench.c
@@ -0,0 +1,394 @@
+/*
+ * Copyright © 2014 RISC OS Open Ltd
+ *
+ * Permission to use, copy, modify, distribute, and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the name of the copyright holders not be used in
+ * advertising or publicity pertaining to distribution of the software without
+ * specific, written prior permission.  The copyright holders make no
+ * representations about the suitability of this software for any purpose.  It
+ * is provided "as is" without express or implied warranty.
+ *
+ * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
+ * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
+ * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
+ * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
+ * SOFTWARE.
+ *
+ * Author: Ben Avison (bavi...@riscosopen.org)
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <ctype.h>
+#include <stdint.h>
+#include "utils.h"
+
+#ifdef HAVE_GETTIMEOFDAY
+#include <sys/time.h>
+#else
+#include <time.h>
+#endif
+
+#define WIDTH 1920
+#define HEIGHT 1080
+/* How much data to read to flush all cached data to RAM */
+#define MAX_L2CACHE_SIZE (8 * 1024 * 1024)
+#define PAGE_SIZE (4 * 1024)
+
+typedef struct box_48_16 box_48_16_t;
+
+struct box_48_16
+{
+    pixman_fixed_48_16_t    x1;
+    pixman_fixed_48_16_t    y1;
+    pixman_fixed_48_16_t    x2;
+    pixman_fixed_48_16_t    y2;
+};
+
+static pixman_bool_t
+compute_transformed_extents (pixman_transform_t   *transform,
+                             const pixman_box32_t *extents,
+                             box_48_16_t          *transformed)
+{
+    pixman_fixed_48_16_t tx1, ty1, tx2, ty2;
+    pixman_fixed_t x1, y1, x2, y2;
+    int i;
+
+    x1 = pixman_int_to_fixed (extents->x1) + pixman_fixed_1 / 2;
+    y1 = pixman_int_to_fixed (extents->y1) + pixman_fixed_1 / 2;
+    x2 = pixman_int_to_fixed (extents->x2) - pixman_fixed_1 / 2;
+    y2 = pixman_int_to_fixed (extents->y2) - pixman_fixed_1 / 2;
+
+    if (!transform)
+    {
+        transformed->x1 = x1;
+        transformed->y1 = y1;
+        transformed->x2 = x2;
+        transformed->y2 = y2;
+
+        return TRUE;
+    }
+
+    tx1 = ty1 = INT64_MAX;
+    tx2 = ty2 = INT64_MIN;
+
+    for (i = 0; i < 4; ++i)
+    {
+        pixman_fixed_48_16_t tx, ty;
+        pixman_vector_t v;
+
+        v.vector[0] = (i & 0x01) ? x1 : x2;
+        v.vector[1] = (i & 0x02) ? y1 : y2;
+        v.vector[2] = pixman_fixed_1;
+
+        if (!pixman_transform_point (transform, &v))
+            return FALSE;
+
+        tx = (pixman_fixed_48_16_t) v.vector[0];
+        ty = (pixman_fixed_48_16_t) v.vector[1];
+
+        if (tx < tx1)
+            tx1 = tx;
+        if (ty < ty1)
+            ty1 = ty;
+        if (tx > tx2)
+            tx2 = tx;
+        if (ty > ty2)
+            ty2 = ty;
+    }
+
+    transformed->x1 = tx1;
+    transformed->y1 = ty1;
+    transformed->x2 = tx2;
+    transformed->y2 = ty2;
+
+    return TRUE;
+}
+
+static void
+create_image (uint32_t                  width,
+              uint32_t                  height,
+              pixman_format_code_t      format,
+              pixman_filter_t           filter,
+              const pixman_transform_t *t,
+              uint32_t                **bits,
+              pixman_image_t          **image)
+{
+    uint32_t stride = (width * PIXMAN_FORMAT_BPP (format) + 31) / 32 * 4;
+
+    *bits = aligned_malloc (PAGE_SIZE, stride * height);
+    memset (*bits, 0xCC, stride * height);
+    *image
Re: [Pixman] [PATCH] lowlevel-blt-bench: Parse test name strings in general case
On Wed, 08 Apr 2015 12:21:03 +0100, Pekka Paalanen (ppaala...@gmail.com) wrote:
> But if I'll rework the lookup tables, I can rework this too. Would be my
> pleasure, even, getting acquainted with Pixman style. :-)

I made some revisions to affine-bench.c (and a couple of tweaks to lowlevel-blt-bench.c and pixman.c too) while responding to your comments yesterday, but stopped short of posting it because I hoped someone might express a preference about the addition of OS-specific code for reading cache sizes etc. If you're planning on having a play, I'll repost them as they are.

Ben
[Pixman] [PATCH] pixman.c: Coding style
A few violations of coding style were identified in code copied from here into affine-bench.
---
 pixman/pixman.c | 18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/pixman/pixman.c b/pixman/pixman.c
index 9555cea..a07c577 100644
--- a/pixman/pixman.c
+++ b/pixman/pixman.c
@@ -325,18 +325,20 @@ _pixman_compute_composite_region32 (pixman_region32_t * region,
     return TRUE;
 }
 
-typedef struct
+typedef struct box_48_16 box_48_16_t;
+
+struct box_48_16
 {
-    pixman_fixed_48_16_t x1;
-    pixman_fixed_48_16_t y1;
-    pixman_fixed_48_16_t x2;
-    pixman_fixed_48_16_t y2;
-} box_48_16_t;
+    pixman_fixed_48_16_t    x1;
+    pixman_fixed_48_16_t    y1;
+    pixman_fixed_48_16_t    x2;
+    pixman_fixed_48_16_t    y2;
+};
 
 static pixman_bool_t
-compute_transformed_extents (pixman_transform_t *transform,
+compute_transformed_extents (pixman_transform_t   *transform,
                              const pixman_box32_t *extents,
-                             box_48_16_t *transformed)
+                             box_48_16_t          *transformed)
 {
     pixman_fixed_48_16_t tx1, ty1, tx2, ty2;
     pixman_fixed_t x1, y1, x2, y2;
--
1.7.5.4
Re: [Pixman] [PATCH 4/5] lowlevel-blt-bench: Parse test name strings in general case
Hi,

Sorry for the delay in following up...

On Mon, 16 Mar 2015 14:33:47 -0000, Pekka Paalanen (ppaala...@gmail.com) wrote:
> I understood these arrays should have been replaced by
> format_from_string/operator_from_string functions from patch 2 and
> similar new functions doing lookups in the same arrays.
>
> Hmm, but I see there are several alternative spellings for operators,
> and the formats follow a whole new convention.

Some background to this might help. Pixman supports a lot of pixel formats; of particular significance is the fact that in many cases you can have the same bits-per-pixel, divided up into the same bitpattern of colour components, but with the red and blue components exchanged. Each of Pixman's operations acts upon between 1 and 3 bitmaps, each of which may have any supported pixel format. But in practice, any given application tends to stick to the same colour component ordering for most or all of its images, so the actual number of operations you're likely to encounter in practice is 1/2 (for 2-image operations) or 1/4 (for 3-image operations) of what you might otherwise expect. Furthermore, mathematically, an operation on (for example) two a8r8g8b8 images is identical to one on two a8b8g8r8 images, so typically a fast path written for one will be listed in the fast path table under both image formats.

Because of this, a naming convention has evolved in the source code where fast path names include the string 8888 as an indication that they can be applied to either a8r8g8b8 or a8b8g8r8, with the implicit assumption that the other image has the same colour component ordering. lowlevel-blt-bench is most useful when you're testing the effectiveness of a particular fast path, so it makes sense that its test pattern names reflect the names of the fast path function that you're interested in. However, there are a few tests, like src_0888_8888_rev or rpixbuf, where the limitations of this approach start to show.

I suspected I would get objections if I changed the command line of lowlevel-blt-bench, but in introducing a new tool, affine-bench, I saw an opportunity to allow the pixel formats to be specified more directly, and deliberately made its syntax different. It is a matter for debate as to whether the two tools should use the same syntax or not, and if so, which one they should standardise on. I'd be a bit uncomfortable with saying that 8888 is an alias for a8r8g8b8, or 0565 for r5g6b5, because that's not true in both directions, as I have described. I'd probably choose the more verbose form if the consensus was that the two tools should match. Is there anyone who cares either way, though?

> Personally I'm not really much of a fan of this much nesting, especially
> where you have a number of parsing steps and each step nests a bit more.
> I'd refactor this code into a function and use early (error) returns
> instead of nesting.

Quick-and-dirty code there - it was only a test harness after all. But easily fixed.

> Looks like there is also no complaint, if the given string is not a
> valid test name.

It never did generate any sort of error if you specified a bad test string - you just wouldn't have seen any results. Another easy fix - updated patch following shortly.

Ben