From: Siarhei Siamashka <siarhei.siamas...@nokia.com>

This patch series introduces support for creating specialized bilinear fast path functions which perform processing in a single pass, without intermediate temporary buffers, and can also make efficient use of SIMD optimizations. The performance critical code is implemented as scanline processing functions, with the main loop logic reused via a common macro template. Such scanline processing functions are simple enough to implement, and at the same time large enough not to constrain optimization opportunities such as loop unrolling to process multiple pixels per iteration.
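
To illustrate the split between a scanline function and a main loop template, here is a minimal C sketch. It is not the code from the patches: every name in it (bilinear_pixel, scanline_src_8888_8888, DEFINE_BILINEAR_MAIN_LOOP, scaled_src_8888_8888) and every signature is hypothetical. It assumes a8r8g8b8 pixels, non-negative 16.16 fixed point coordinates, 8-bit interpolation weights, and simple clamping at the right and bottom edges instead of real repeat mode handling.

```c
#include <assert.h>
#include <stdint.h>

/* Blend four a8r8g8b8 pixels channel by channel.
 * distx and disty are 8-bit fractions in the range 0..255. */
static uint32_t
bilinear_pixel (uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
                int distx, int disty)
{
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t top = ((tl >> shift) & 0xff) * (256 - distx) +
                       ((tr >> shift) & 0xff) * distx;
        uint32_t bot = ((bl >> shift) & 0xff) * (256 - distx) +
                       ((br >> shift) & 0xff) * distx;

        result |= ((top * (256 - disty) + bot * disty) >> 16) << shift;
    }
    return result;
}

/* Hypothetical scanline function: one destination row, single pass,
 * no temporary buffer. A SIMD variant would replace only this part. */
static void
scanline_src_8888_8888 (uint32_t *dst, const uint32_t *top,
                        const uint32_t *bottom, int src_w, int dst_w,
                        int disty, int32_t vx, int32_t unit_x)
{
    int i;

    assert (src_w > 0);
    for (i = 0; i < dst_w; i++, vx += unit_x)
    {
        int x1 = vx >> 16;                        /* integer part */
        int x2 = (x1 + 1 < src_w) ? x1 + 1 : x1;  /* clamp at right edge */
        int distx = (vx >> 8) & 0xff;             /* top 8 fraction bits */

        dst[i] = bilinear_pixel (top[x1], top[x2],
                                 bottom[x1], bottom[x2], distx, disty);
    }
}

/* Hypothetical main loop template: walks destination rows, picks the two
 * source rows and the vertical weight, then calls the scanline function. */
#define DEFINE_BILINEAR_MAIN_LOOP(name, scanline_func)                     \
static void                                                                \
name (uint32_t *dst, int dst_stride, int dst_w, int dst_h,                 \
      const uint32_t *src, int src_stride, int src_w, int src_h,           \
      int32_t vx, int32_t unit_x, int32_t vy, int32_t unit_y)              \
{                                                                          \
    int y;                                                                 \
                                                                           \
    for (y = 0; y < dst_h; y++, vy += unit_y)                              \
    {                                                                      \
        int y1 = vy >> 16;                                                 \
        int y2 = (y1 + 1 < src_h) ? y1 + 1 : y1;                           \
                                                                           \
        scanline_func (dst + y * dst_stride, src + y1 * src_stride,        \
                       src + y2 * src_stride, src_w, dst_w,                \
                       (vy >> 8) & 0xff, vx, unit_x);                      \
    }                                                                      \
}

/* Each operation instantiates the template with its own scanline code. */
DEFINE_BILINEAR_MAIN_LOOP (scaled_src_8888_8888, scanline_src_8888_8888)
```

In this sketch, adding a new operation or a SIMD variant only means writing another scanline function and instantiating the template again. The row selection and coordinate stepping are not duplicated.
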

As a result, the bilinear scaled 'src_8888_8888' operation (a simple scaled copy of the image) becomes more than 2 times faster with SSE2 and more than 6 times faster with ARM NEON when compared to the general pixman compositing path. Single pass processing alone provides a modest but measurable speedup even without SIMD.

I'm almost exclusively interested in ARM NEON and did not spend any extra time tuning the SSE2 code, so the SSE2 scaler may actually not be good enough. Nevertheless, it is still faster than C.

The disadvantage of this method is its high specialization: each particular type of compositing operation needs its own fast path code. But it does not prevent us from also adding universal SIMD optimized fetchers later. Anyway, adding specialized fast paths is the way to go when targeting the best performance for some of the most common operations.

I'll try to add more SIMD optimized bilinear fast path functions shortly, based on analyzing cairo-traces and profiling real use cases.

The same patches are also available in the following branch:
http://cgit.freedesktop.org/~siamashka/pixman/log/?h=sent/bilinear-scaling-simd-20110222

Siarhei Siamashka (7):
  Main loop template for fast single pass bilinear scaling
  test: check correctness of 'bilinear_pad_repeat_get_scanline_bounds'
  C variant of bilinear scaled 'src_8888_8888' fast path
  C variant of bilinear scaled 'src_8888_8_8888' fast path
  C variant of bilinear scaled 'src_8888_n_8888' fast path
  SSE2 optimization for bilinear scaled 'src_8888_8888'
  ARM: NEON optimization for bilinear scaled 'src_8888_8888'

 pixman/pixman-arm-neon-asm.S |  197 +++++++++++++++++++
 pixman/pixman-arm-neon.c     |   45 +++++
 pixman/pixman-fast-path.c    |  304 +++++++++++++++++++++++++++++
 pixman/pixman-fast-path.h    |  432 ++++++++++++++++++++++++++++++++++++++++++
 pixman/pixman-sse2.c         |  112 +++++++++++
 test/Makefile.am             |    2 +
 test/scaling-helpers-test.c  |   93 +++++++++
 7 files changed, 1185 insertions(+), 0 deletions(-)
 create mode 100644 test/scaling-helpers-test.c

-- 
1.7.3.4