Hi, > From: Taekyun Kim <tkq....@samsung.com> > > Too short scanlines can cause repeat handling overhead and optimized > pixman composite functions usually process a bunch of pixels in a > single loop iteration it might be beneficial to pre-extend source > scanlines. The temporary buffers will usually reside in cache, so > accessing them should be quite efficient.
Generally, this looks good, and it's certainly a big performance improvement. I do have some comments though. The main thing I dislike is that it allocates an image on the stack and then only fills out some of the fields. The problem is if a fast path some day starts making use of say the flags in the image, then this code will start to siliently fail. A solution might be to refactor the image allocation code such that we can support stack allocated images internally. This could be done by changing adding _pixman_image_init () and _pixman_bits_image_init() functions, and then changing _pixman_image_allocate() and _pixman_bits_image_create() to call those functions. We would need the corresponding _fini() functions also. Another possibility might be to call the combiner functions instead of calling the fast path. Ie., calling _pixman_implementation_combine_32 (pixman_implementation_t *imp, pixman_op_t op, uint32_t * dest, const uint32_t * src, const uint32_t * mask, int width); instead of the fast path. If we are going to composite scanline by scanline anyway, then it doesn't seem like there should be any advantage to the fast paths over the combiners. The important architectures have SIMD variations of the combiners already. The potential issue, as Siarhei has pointed out several times, is that there is currently a bit of overhead from going through all the implementation->implementation->implementation indirections. Maybe its time to fix that. A simple fix would be to change the _pixman_implementation_combine_* functions into _pixman_implementation_lookup_combine_* functions that would return the desired combiner instead of calling it. This would speed up the general path as well. Also, a few formatting nit-picks: > static void > fast_composite_tiled_repeat (pixman_implementation_t *imp, > pixman_composite_info_t *info) > @@ -1223,27 +1225,114 @@ fast_composite_tiled_repeat (pixman_implementation_t > *imp, > int32_t sx, sy; > int32_t width_remain; > int32_t num_pixels; > + int32_t src_width; > + int32_t i, j; > + pixman_image_t extended_src_image; > + uint32_t extended_src[REPEAT_MIN_WIDTH*2]; We generally put spaces around * when it is used as a binary operator. > + if (need_src_extension) > + { > + if (src_bpp == 32) > + { > + PIXMAN_IMAGE_GET_LINE (src_image, 0, sy, uint32_t, > src_stride, src_line, 1); > + > + for (i=0; i<src_width; ) > + { > + for (j=0; j<src_image->bits.width; j++, i++) There are some missing spaces here as well (i=0, j=0, i<src_width etc.) > + { > + extended_src[i] = src_line[j]; > + } Generally braces should only be used when the body is more than one line, or if the other branch of an if statement has them. They should not be used if the body is only one line. Thanks, Soren _______________________________________________ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman