On Friday 20 August 2010 19:36:07 Xu, Samuel wrote:
> We measured performance, and compared with original SSE2 intrinsic enabled
> version(0.19.4), on ATOM, and get following findings using 480P flash
> H.264 video playing workload:
> 1) sse2_composite_src_x888_8888()'s cycle reduced 67%. This function's total
> cycle ratio over whole system reduced from 5.6% to 1.9%

This is not directly related to your pixman patch, but it looks like the right
place to fix the performance problem in your flash use case is the YUV->RGB
conversion. Setting the alpha channel to 0xFF there would be the most
efficient, and the src_x888_8888 operation could be eliminated entirely.

That said, improving pixman performance in general is still welcome and is a
nice thing to have (for the other common use cases at least).

I see that you dropped the ssse3 x8r8g8b8 fetcher optimization in your last
patch and added an ssse3 src_x888_8888 fast path instead. It's a bit sad,
because other operations, for example over_8888_x888 with bilinear scaling,
could also benefit from the optimized fetcher. On the other hand, the lack of
clarity regarding how to add SIMD optimized fetchers is avoided this way ;)

> 2) whole system's C0 percentage reduced from 68.0% to 62.6%
> Maybe it is not " dramatically", while we are glad to see those gain on
> both perf and power.

A performance gain in the 4-5% ballpark looks like a major improvement to me.

-- 
Best regards,
Siarhei Siamashka
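
P.S. To illustrate the YUV->RGB point above, here is a rough scalar sketch. It
is not pixman or flash player code: the function names are hypothetical, and
the BT.601 video-range integer coefficients are an assumption about the
conversion, since the actual one is not known here. The first function shows
what a separate src_x888_8888 pass boils down to; the second shows a converter
that writes alpha = 0xFF itself, so the extra pass has nothing left to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of one colour channel. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Scalar equivalent of a src_x888_8888 pass (hypothetical name):
 * copy each x8r8g8b8 pixel and force its undefined top byte to 0xFF. */
static void src_x888_8888_scalar(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] | 0xFF000000u;
}

/* Hypothetical YUV->RGB conversion for one pixel, assuming BT.601
 * video-range input and the usual 8-bit fixed-point approximation.
 * The result is already a valid a8r8g8b8 pixel because alpha is set
 * here, at no extra cost beyond one OR with a constant. */
static uint32_t yuv_to_a8r8g8b8(uint8_t y, uint8_t u, uint8_t v)
{
    int c = (int)y - 16;
    int d = (int)u - 128;
    int e = (int)v - 128;

    int r = (298 * c + 409 * e + 128) / 256;
    int g = (298 * c - 100 * d - 208 * e + 128) / 256;
    int b = (298 * c + 516 * d + 128) / 256;

    return 0xFF000000u
         | ((uint32_t)clamp_u8(r) << 16)
         | ((uint32_t)clamp_u8(g) << 8)
         |  (uint32_t)clamp_u8(b);
}
```

With the converter producing a8r8g8b8 directly, the later composite can be a
plain copy (or be skipped), instead of a second pass over every frame.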
_______________________________________________ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman