On Fri, 21 Dec 2012 18:51:03 -0000 "Ben Avison" <bavi...@riscosopen.org> wrote:
> There is very little in common with the previous revision of this source
> file, but I present it as a patch nevertheless.

Can we have more descriptive commit messages for this and the other
patches? Preferably, benchmark results for the newly added or improved
code should be included there as well.

> diff --git a/pixman/pixman-arm-simd-asm.S b/pixman/pixman-arm-simd-asm.S
> index b438001..8700da9 100644
> --- a/pixman/pixman-arm-simd-asm.S
> +++ b/pixman/pixman-arm-simd-asm.S

[...]

> +.macro over_8888_8888_1pixel src, dst, offset, next
> +        /* src = destination component multiplier */
> +        rsb     WK&src, WK&src, #255
> +        /* Split even/odd bytes of dst into SCRATCH/dst */
> +        uxtb16  SCRATCH, WK&dst
> +        uxtb16  WK&dst, WK&dst, ror #8
> +        /* Multiply through, adding 0.5 to the upper byte of result for rounding */
> +        mla     SCRATCH, SCRATCH, WK&src, MASK
> +        mla     WK&dst, WK&dst, WK&src, MASK
> +        /* Where we would have had a stall between the result of the first MLA and the shifter input,
> +         * reload the complete source pixel */
> +        ldr     WK&src, [SRC, #offset]
> +        /* Multiply by 257/256 to approximate 256/255 */
> +        uxtab16 SCRATCH, SCRATCH, SCRATCH, ror #8
> +        /* In this stall, start processing the next pixel */
> + .if offset < -4
> +        mov     WK&next, WK&next, lsr #24
> + .endif
> +        uxtab16 WK&dst, WK&dst, WK&dst, ror #8
> +        /* Recombine even/odd bytes of multiplied destination */
> +        mov     SCRATCH, SCRATCH, ror #8
> +        sel     WK&dst, SCRATCH, WK&dst
> +        /* Saturated add of source to multiplied destination */
> +        uqadd8  WK&dst, WK&dst, WK&src
> +.endm

It looks like this over_8888_8888_1pixel macro uses one more instruction
than the current code. Is that intended?

-- 
Best regards,
Siarhei Siamashka
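
For readers following the quoted macro, here is a rough C model of the arithmetic it appears to perform on one premultiplied a8r8g8b8 pixel. It is only a sketch under assumptions that the quoted hunk does not show: that MASK holds 0x00800080, that the GE flags have been set up so that SEL takes bytes 0 and 2 from SCRATCH, and that WK&src holds the source alpha (0..255) on entry. The names div255_pair and over_8888_8888_model are made up for this illustration and do not exist in pixman.

```c
#include <stdint.h>

/* Scale two 8-bit components, packed as 0x00XX00YY, by mult/255 with rounding.
 * Models the MLA + UXTAB16 pair: t = c * mult + 0x80, then (t + (t >> 8)) >> 8,
 * i.e. the "multiply by 257/256 to approximate 256/255" step.
 * Assumption: MASK == 0x00800080 (not visible in the quoted hunk). */
static uint32_t div255_pair(uint32_t pair, uint32_t mult)
{
    uint32_t t = pair * mult + 0x00800080u;  /* each halfword stays below 0x10000 */
    t += (t >> 8) & 0x00ff00ffu;             /* add the upper byte of each halfword */
    return (t >> 8) & 0x00ff00ffu;           /* keep the upper byte of each halfword */
}

/* Hypothetical scalar model: result = src + dst * (255 - src_alpha) / 255,
 * with a per-byte saturating add standing in for UQADD8. */
static uint32_t over_8888_8888_model(uint32_t src, uint32_t dst)
{
    uint32_t inv_alpha = 255u - (src >> 24);                       /* rsb ..., #255 */
    uint32_t even = div255_pair(dst & 0x00ff00ffu, inv_alpha);     /* bytes 0 and 2 */
    uint32_t odd = div255_pair((dst >> 8) & 0x00ff00ffu, inv_alpha); /* bytes 1 and 3 */
    uint32_t scaled = even | (odd << 8);                           /* recombine (SEL) */
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((scaled >> shift) & 0xffu) + ((src >> shift) & 0xffu);
        if (sum > 255u)
            sum = 255u;                                            /* saturate */
        out |= sum << shift;
    }
    return out;
}
```

With these assumptions, an opaque source replaces the destination, a fully transparent source leaves it unchanged, and the model says nothing about instruction count or scheduling, which is what the question above is about.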