This patch series tries to improve the performance of C code working with 16bpp color depth by implementing optimized iterators for fetching and writing back r5g6b5 pixel data. It may be useful for less common CPU architectures that do not have CPU-specific optimizations yet. It may also be useful for evaluating the quality of existing and future CPU-specific optimizations (the bar is set higher when a faster C implementation is used as the reference).
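To show what "fetching and writing back r5g6b5 pixel data" involves, here is a minimal C sketch of the per-pixel conversions and the scanline loops around them. It is not the code from these patches. All function names below are hypothetical, and the sketch assumes that fetch expands each 16bpp pixel to opaque a8r8g8b8 by bit replication, and that write-back packs it again by keeping the top 5/6/5 bits of each channel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand one r5g6b5 pixel to opaque a8r8g8b8.
 * The top bits of each channel are replicated into the low bits, so
 * full intensity (0x1f / 0x3f) maps to 0xff and zero stays zero. */
static inline uint32_t
expand_0565_to_8888 (uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f;
    uint32_t g = (p >> 5) & 0x3f;
    uint32_t b = p & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);

    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* Hypothetical helper: pack a8r8g8b8 into r5g6b5 by keeping the top
 * 5/6/5 bits of red/green/blue; alpha and the low bits are dropped. */
static inline uint16_t
pack_8888_to_0565 (uint32_t p)
{
    return (uint16_t) (((p >> 8) & 0xf800) |
                       ((p >> 5) & 0x07e0) |
                       ((p >> 3) & 0x001f));
}

/* Sketch of a fetch step: convert one scanline of 16bpp pixels into
 * the 32bpp intermediate buffer that the combiners operate on. */
static void
sketch_fetch_r5g6b5 (const uint16_t *src, uint32_t *buffer, size_t width)
{
    for (size_t i = 0; i < width; i++)
        buffer[i] = expand_0565_to_8888 (src[i]);
}

/* Sketch of a write-back step: store the 32bpp results back into the
 * 16bpp destination scanline. */
static void
sketch_write_back_r5g6b5 (const uint32_t *buffer, uint16_t *dst, size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = pack_8888_to_0565 (buffer[i]);
}

/* Bit replication keeps the original bits at the top of each channel,
 * so packing an expanded pixel returns the original 16-bit value. */
static void
check_round_trip (void)
{
    for (uint32_t v = 0; v <= 0xffff; v++)
        assert (pack_8888_to_0565 (expand_0565_to_8888 ((uint16_t) v)) == v);
}
```

The round-trip check captures the design property that makes this pair of conversions convenient: for every 16-bit value, fetch followed by write-back without any blending in between leaves the destination pixel unchanged.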
The cairo-perf-trace benchmark run for the image16 backend on an Intel Core i7 processor, with the PIXMAN_DISABLE environment variable set to "mmx sse2":

Speedups
========
image16  firefox-asteroids  (7197.13 0.06%) -> (6811.95 0.02%) : 1.06x speedup
image16  midori-zoomed      (3397.77 0.67%) -> (3226.48 0.73%) : 1.05x speedup
image16  firefox-talos-svg  (37677.55 0.05%) -> (36367.96 0.04%) : 1.04x speedup

Profiling logs for the run over all the benchmark traces show that the overall improvement is rather small. The time spent in the r5g6b5 iterators changes from

     0.88%    6094  libpixman-1.so.0.29.1  [.] fetch_scanline_r5g6b5
     0.77%    5294  libpixman-1.so.0.29.1  [.] store_scanline_r5g6b5

to

     0.59%    4018  libpixman-1.so.0.29.1  [.] fast_fetch_r5g6b5
     0.52%    3550  libpixman-1.so.0.29.1  [.] fast_write_back_r5g6b5

Everything is dominated by bilinear scaling, which is likely somewhat overrepresented in the cairo benchmark traces. The complete profiling logs, for reference:

=== before ===
    27.16%  187079  libpixman-1.so.0.29.1  [.] bits_image_fetch_bilinear_no_repeat_8888
    14.55%  100256  libpixman-1.so.0.29.1  [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8
    11.90%   81931  libpixman-1.so.0.29.1  [.] combine_over_u
     6.31%   43415  libpixman-1.so.0.29.1  [.] bits_image_fetch_bilinear_affine_none_r5g6b5
     3.76%   25838  libpixman-1.so.0.29.1  [.] radial_compute_color
     3.14%   21605  libpixman-1.so.0.29.1  [.] fetch_scanline_a8
     3.01%   20710  libpixman-1.so.0.29.1  [.] fast_composite_over_8888_0565
     2.88%   19829  libpixman-1.so.0.29.1  [.] _pixman_gradient_walker_pixel
     2.53%   17370  libcairo.so.2.11200.0  [.] _cairo_tor_scan_converter_generate
     2.30%   15806  libpixman-1.so.0.29.1  [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8
     1.98%   13604  libpixman-1.so.0.29.1  [.] fast_path_fill
     1.68%   11559  libpixman-1.so.0.29.1  [.] fast_composite_over_n_8_8888
     1.52%   10496  libpixman-1.so.0.29.1  [.] fast_composite_over_n_8888_0565_ca
     1.28%    8846  libpixman-1.so.0.29.1  [.] combine_in_reverse_u
     0.91%    6270  libpixman-1.so.0.29.1  [.] bits_image_fetch_general
     0.90%    6203  libc-2.15.so           [.] __memcpy_ssse3_back
     0.88%    6094  libpixman-1.so.0.29.1  [.] fetch_scanline_r5g6b5
     0.77%    5294  libpixman-1.so.0.29.1  [.] store_scanline_r5g6b5
     0.51%    3483  libpixman-1.so.0.29.1  [.] radial_get_scanline_narrow
     0.46%    3163  libpixman-1.so.0.29.1  [.] fast_composite_over_n_8_0565
     0.42%    2924  libcairo.so.2.11200.0  [.] cell_list_render_edge
     0.42%    2889  libpixman-1.so.0.29.1  [.] pixman_transform_point

=== after ===
    27.31%  186552  libpixman-1.so.0.29.1  [.] bits_image_fetch_bilinear_no_repeat_8888
    14.66%  100225  libpixman-1.so.0.29.1  [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8
    11.69%   79916  libpixman-1.so.0.29.1  [.] combine_over_u
     6.35%   43366  libpixman-1.so.0.29.1  [.] bits_image_fetch_bilinear_affine_none_r5g6b5
     3.78%   25829  libpixman-1.so.0.29.1  [.] radial_compute_color
     3.20%   21848  libpixman-1.so.0.29.1  [.] fetch_scanline_a8
     3.04%   20767  libpixman-1.so.0.29.1  [.] fast_composite_over_8888_0565
     2.90%   19812  libpixman-1.so.0.29.1  [.] _pixman_gradient_walker_pixel
     2.56%   17489  libcairo.so.2.11200.0  [.] _cairo_tor_scan_converter_generate
     2.30%   15701  libpixman-1.so.0.29.1  [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8
     1.98%   13499  libpixman-1.so.0.29.1  [.] fast_path_fill
     1.69%   11537  libpixman-1.so.0.29.1  [.] fast_composite_over_n_8_8888
     1.62%   11089  libpixman-1.so.0.29.1  [.] fast_composite_over_n_8888_0565_ca
     1.31%    8909  libpixman-1.so.0.29.1  [.] combine_in_reverse_u
     0.92%    6303  libpixman-1.so.0.29.1  [.] bits_image_fetch_general
     0.91%    6216  libc-2.15.so           [.] __memcpy_ssse3_back
     0.59%    4018  libpixman-1.so.0.29.1  [.] fast_fetch_r5g6b5
     0.52%    3550  libpixman-1.so.0.29.1  [.] fast_write_back_r5g6b5
     0.51%    3464  libpixman-1.so.0.29.1  [.] radial_get_scanline_narrow
     0.48%    3266  libpixman-1.so.0.29.1  [.] fast_composite_over_n_8_0565
     0.42%    2898  libpixman-1.so.0.29.1  [.] pixman_transform_point

The same patches are also available here:
http://cgit.freedesktop.org/~siamashka/pixman-g2d/log/?h=iterators-r5g6b5

Siarhei Siamashka (6):
  test: add "src_0565_8888" to lowlevel-blt-bench
  Change CONVERT_XXXX_TO_YYYY macros into inline functions
  Faster conversion from a8r8g8b8 to r5g6b5 in C code
  Added C variants of r5g6b5 fetch/write-back iterators
  Faster write-back for the C variant of r5g6b5 dest iterator
  Faster fetch for the C variant of r5g6b5 src/dest iterator

 pixman/pixman-bits-image.c |   2 +-
 pixman/pixman-fast-path.c  | 258 +++++++++++++++++++++++++++++++++++---------
 pixman/pixman-inlines.h    |  30 +++---
 pixman/pixman-mmx.c        |  20 ++--
 pixman/pixman-private.h    |  53 +++++++--
 pixman/pixman-sse2.c       |   8 +-
 pixman/pixman.c            |   2 +-
 test/lowlevel-blt-bench.c  |   1 +
 8 files changed, 279 insertions(+), 95 deletions(-)

-- 
1.7.8.6
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman