On Mon, Sep 7, 2015 at 2:03 PM, Pekka Paalanen <ppaala...@gmail.com> wrote:
> On Sun, 6 Sep 2015 18:27:08 +0300
> Oded Gabbay <oded.gab...@gmail.com> wrote:
>
>> This patch optimizes scaled_nearest_scanline_vmx_8888_8888_OVER and all
>> the functions it calls (combine1, combine4 and
>> core_combine_over_u_pixel_vmx).
>>
>> The optimization is done by removing use of expand_alpha_1x128 and
>> expand_alpha_2x128 in favor of splat_alpha and MUL/ADD macros from
>> pixman_combine32.h.
>>
>> Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
>> 3.4GHz, RHEL 7.2 ppc64le gave the following results:
>>
>> reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)
>>
>>           Before    After     Change
>>         --------------------------------------------
>> L1        182.05    210.22    +15.47%
>> L2        180.6     208.92    +15.68%
>> M         180.52    208.22    +15.34%
>> HT        130.17    178.97    +37.49%
>> VT        145.82    184.22    +26.33%
>> R         104.51    129.38    +23.80%
>> RT        48.3      61.54     +27.41%
>> Kops/s    430       504       +17.21%
>>
>> Signed-off-by: Oded Gabbay <oded.gab...@gmail.com>
>> ---
>>  pixman/pixman-vmx.c | 80 ++++++++++++-----------------------------------------
>>  1 file changed, 18 insertions(+), 62 deletions(-)
>>
>> diff --git a/pixman/pixman-vmx.c b/pixman/pixman-vmx.c
>> index a9bd024..d9fc5d6 100644
>> --- a/pixman/pixman-vmx.c
>> +++ b/pixman/pixman-vmx.c
>
>> @@ -646,19 +643,10 @@ static force_inline uint32_t
>>  combine1 (const uint32_t *ps, const uint32_t *pm)
>>  {
>>      uint32_t s = *ps;
>> +    uint32_t a = ALPHA_8(*pm);
>
> pm is dereferenced before checked for NULL.
>
>>
>>      if (pm)
>> -    {
>> -        vector unsigned int ms, mm;
>> -
>> -        mm = unpack_32_1x128 (*pm);
>> -        mm = expand_alpha_1x128 (mm);
>> -
>> -        ms = unpack_32_1x128 (s);
>> -        ms = pix_multiply (ms, mm);
>> -
>> -        s = pack_1x128_32 (ms);
>> -    }
>> +        UN8x4_MUL_UN8(s, a);
>>
>>      return s;
>>  }
>
> Thanks,
> pq
Thanks for catching that!

	Oded
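
For illustration, one way the issue Pekka points out could be addressed is to read the mask only inside the NULL check. This is a minimal standalone sketch, not the code from any later revision of the patch. The helpers alpha_8 and mul_un8x4_un8 and the name combine1_fixed are hypothetical stand-ins, written so the example compiles without the pixman headers; they are intended to mirror what ALPHA_8 and UN8x4_MUL_UN8 do, but are not copied from pixman.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ALPHA_8: the top byte of an a8r8g8b8 pixel. */
static uint32_t
alpha_8 (uint32_t pixel)
{
    return pixel >> 24;
}

/* Hypothetical stand-in for UN8x4_MUL_UN8: scale each 8-bit channel of x
 * by a / 255, rounding to nearest. Written per channel for clarity,
 * not for speed. */
static uint32_t
mul_un8x4_un8 (uint32_t x, uint32_t a)
{
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t t = ((x >> shift) & 0xff) * a + 0x80;

        result |= (((t + (t >> 8)) >> 8) & 0xff) << shift;
    }

    return result;
}

/* Same shape as combine1 in the quoted diff, but *pm is read only
 * after pm has been checked for NULL. */
static uint32_t
combine1_fixed (const uint32_t *ps, const uint32_t *pm)
{
    uint32_t s;

    assert (ps != NULL); /* assumption: the source pointer is always valid */
    s = *ps;

    if (pm)
        s = mul_un8x4_un8 (s, alpha_8 (*pm));

    return s;
}
```

With this ordering, calling combine1_fixed (&s, NULL) returns the source pixel unchanged instead of dereferencing a NULL mask pointer, and a mask whose alpha is 0xff also leaves the source unchanged.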