On Tue, Sep 22, 2015 at 3:59 PM, Pekka Paalanen <ppaala...@gmail.com> wrote: > On Mon, 21 Sep 2015 14:22:53 +0300 > Oded Gabbay <oded.gab...@gmail.com> wrote: > >> On Thu, Sep 10, 2015 at 7:16 PM, Siarhei Siamashka >> <siarhei.siamas...@gmail.com> wrote: >> > >> > On Thu, 10 Sep 2015 12:27:18 +0300 >> > Oded Gabbay <oded.gab...@gmail.com> wrote: >> > >> > > On Sat, Sep 5, 2015 at 10:03 PM, Oded Gabbay <oded.gab...@gmail.com> >> > > wrote: >> > > > >> > > > On Fri, Sep 4, 2015 at 3:39 PM, Siarhei Siamashka >> > > > <siarhei.siamas...@gmail.com> wrote: >> > > > > Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz, >> > > > > Gentoo ppc (32-bit userland) gave the following results: >> > > > > >> > > > > before: over_n_8888 = L1: 147.47 L2: 205.86 M:121.07 >> > > > > after: over_n_8888 = L1: 287.27 L2: 261.09 M:133.48 >> > > > > >> > > > > Signed-off-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> >> > > > > --- >> > > > > pixman/pixman-vmx.c | 54 >> > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++ >> > > > > 1 files changed, 54 insertions(+), 0 deletions(-) >> > > > > >> > > > > diff --git a/pixman/pixman-vmx.c b/pixman/pixman-vmx.c >> > > > > index a9bd024..9e551b3 100644 >> > > > > --- a/pixman/pixman-vmx.c >> > > > > +++ b/pixman/pixman-vmx.c >> > > > > @@ -2745,6 +2745,58 @@ vmx_composite_src_x888_8888 >> > > > > (pixman_implementation_t *imp, >> > > > > } >> > > > > >> > > > > static void >> > > > > +vmx_composite_over_n_8888 (pixman_implementation_t *imp, >> > > > > + pixman_composite_info_t *info) >> > > > > +{ >> > > > > + PIXMAN_COMPOSITE_ARGS (info); >> > > > > + uint32_t *dst_line, *dst; >> > > > > + uint32_t src, ia; >> > > > > + int i, w, dst_stride; >> > > > > + vector unsigned int vdst, vsrc, via; >> > > > > + >> > > > > + src = _pixman_image_get_solid (imp, src_image, >> > > > > dest_image->bits.format); >> > > > > + >> > > > > + if (src == 0) >> > > > > + return; >> > > > > + >> > > > > + PIXMAN_IMAGE_GET_LINE ( >> > > > > + dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, >> > > > > 1); >> > > > > + >> > > > > + vsrc = (vector unsigned int){src, src, src, src}; >> > > > > + via = negate (splat_alpha (vsrc)); >> > > > If we will use the over function (see my next comment), we need to >> > > > remove the negate() from the above statement, as it is done in the >> > > > over function. >> > > > >> > > > > + ia = ALPHA_8 (~src); >> > > > > + >> > > > > + while (height--) >> > > > > + { >> > > > > + dst = dst_line; >> > > > > + dst_line += dst_stride; >> > > > > + w = width; >> > > > > + >> > > > > + while (w && ((uintptr_t)dst & 15)) >> > > > > + { >> > > > > + uint32_t d = *dst; >> > > > > + UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src); >> > > > > + *dst++ = d; >> > > > > + w--; >> > > > > + } >> > > > > + >> > > > > + for (i = w / 4; i > 0; i--) >> > > > > + { >> > > > > + vdst = pix_multiply (load_128_aligned (dst), via); >> > > > > + save_128_aligned (dst, pix_add (vsrc, vdst)); >> > > > >> > > > Instead of the above two lines, I would simply use the over function >> > > > in vmx, which does exactly that. So: >> > > > vdst = over(vsrc, via, load_128_aligned(dst)) >> > > > save_128_aligned (dst, vdst); >> > > > >> > > > I prefer this as it reuses an existing function which helps >> > > > maintainability, and using it has no impact on performance. >> > > > >> > > > > + dst += 4; >> > > > > + } >> > > > > + >> > > > > + for (i = w % 4; --i >= 0;) >> > > > > + { >> > > > > + uint32_t d = dst[i]; >> > > > > + UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src); >> > > > > + dst[i] = d; >> > > > > + } >> > > > > + } >> > > > > +} >> > > > > + >> > > > > +static void >> > > > > vmx_composite_over_8888_8888 (pixman_implementation_t *imp, >> > > > > pixman_composite_info_t *info) >> > > > > { >> > > > > @@ -3079,6 +3131,8 @@ FAST_NEAREST_MAINLOOP >> > > > > (vmx_8888_8888_normal_OVER, >> > > > > >> > > > > static const pixman_fast_path_t vmx_fast_paths[] = >> > > > > { >> > > > > + PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, >> > > > > vmx_composite_over_n_8888), >> > > > > + PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, >> > > > > vmx_composite_over_n_8888), >> > > > > PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, >> > > > > vmx_composite_over_8888_8888), >> > > > > PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, >> > > > > vmx_composite_over_8888_8888), >> > > > > PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, >> > > > > vmx_composite_over_8888_8888), >> > > > > -- >> > > > > 1.7.8.6 >> > > > > > >> >> I took your advice and run benchmarks with the *non* trimmed traces. >> It takes 144 minutes to run a benchmark on POWER8, 3.4GHz 8 Cores (a >> raw machine, not VM). >> >> I compared with and without this patch and I got: >> >> image ocitysmap 659.69 -> 611.71 : 1.08x speedup >> image xfce4-terminal-a1 2725.22 -> 2547.47 : 1.07x speedup >> >> So I guess we can merge this patch, because I prefer the non-trimmed >> results over the trimmed ones. >> The low-level-blt giving significant improvement is also a good sign. >> >> Therefore: >> Reviewed-by: Oded Gabbay <oded.gab...@gmail.com> > > Hi, > > this is only touching pixman-vmx.c and I don't see anything to complain > about here, so as long as you're happy, > Acked-by Pekka Paalanen <pekka.paala...@collabora.co.uk> > > Did you overrule your own comments? :-)
Oh, right. Now I see I had made some comments above about using existing helper function. Siarhei, Do you object that I will replace the lines above with call to the existing helper function, before pushing this patch ? Oded > > > Thanks, > pq _______________________________________________ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman