On Wednesday 27 October 2010 02:52:26 Siarhei Siamashka wrote:
> The slow path reporting code discovers some interesting things, for example
> 'over_n_8_8' fast path seems to be needed for the Firefox browser when
> opening http://pandaboard.org/ page:
>
> Oct 27 02:38:15 i7 firefox: pixman slow path: op=3 s=00010000|002E2A7F
> m=08018000|002F0A7F d=08018000|002E0A7F - 99/45254 (30.818 MPix)
> OVER
> solid                  a8                     a8
> -- src --              -- mask --             -- dest --
> NARROW_FORMAT          NARROW_FORMAT          NARROW_FORMAT
> NO_ACCESSORS           NO_ACCESSORS           NO_ACCESSORS
> NO_ALPHA_MAP           NO_ALPHA_MAP           NO_ALPHA_MAP
> UNIFIED_ALPHA          UNIFIED_ALPHA          UNIFIED_ALPHA
> NO_NORMAL_REPEAT       NO_NORMAL_REPEAT       NO_NORMAL_REPEAT
> NO_PAD_REPEAT          NO_PAD_REPEAT          NO_PAD_REPEAT
> NO_REFLECT_REPEAT      NO_REFLECT_REPEAT      NO_REFLECT_REPEAT
> NEAREST_FILTER         NEAREST_FILTER         NEAREST_FILTER
> NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER
> AFFINE_TRANSFORM       AFFINE_TRANSFORM       AFFINE_TRANSFORM
> ID_TRANSFORM           ID_TRANSFORM           ID_TRANSFORM
> X_UNIT_POSITIVE        X_UNIT_POSITIVE        X_UNIT_POSITIVE
> Y_UNIT_ZERO            Y_UNIT_ZERO            Y_UNIT_ZERO
> IS_OPAQUE              SAMPLES_COVER_CLIP
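
As an aside, the header fields of such a report can be decoded mechanically. The Python sketch below is only an illustration and is not part of the slow path reporting patch. It assumes the PIXMAN_FORMAT(bpp, type, a, r, g, b) bit layout from pixman.h for the format codes, and it takes the operator names and the "solid"/"null" names from the reports in this message. The names decode_format and parse_header are made up for the example, and the two counters before the MPix figure are kept as plain integers because their meaning is not spelled out here.

```python
import re

# Operator numbers and names exactly as they appear in the reports in this mail.
OPS = {1: "SRC", 3: "OVER", 12: "ADD"}

# Names the reports print for codes that describe no real pixel layout.
SPECIAL = {0x00000000: "null", 0x00010000: "solid"}

HEADER = re.compile(
    r"op=(\d+)\s+"
    r"s=([0-9A-Fa-f]{8})\|\s*([0-9A-Fa-f]{8})\s+"
    r"m=([0-9A-Fa-f]{8})\|\s*([0-9A-Fa-f]{8})\s+"
    r"d=([0-9A-Fa-f]{8})\|\s*([0-9A-Fa-f]{8})\s+"
    r"-\s+(\d+)/(\d+)\s+\(([\d.]+) MPix\)"
)


def decode_format(code):
    """Turn a format code into a name, assuming the PIXMAN_FORMAT bit layout."""
    if code in SPECIAL:
        return SPECIAL[code]
    bpp = code >> 24
    kind = (code >> 16) & 0xFF  # assumed: 1 = alpha only, 2 = ARGB
    a, r, g, b = ((code >> shift) & 0xF for shift in (12, 8, 4, 0))
    if kind == 1:
        return f"a{a}"
    if kind == 2:
        # Bits not claimed by any channel are unused padding ("x").
        pad = bpp - r - g - b
        lead = f"a{a}" if a else (f"x{pad}" if pad else "")
        return f"{lead}r{r}g{g}b{b}"
    return f"0x{code:08X}"  # anything else is left undecoded


def parse_header(line):
    """Extract operator, formats, flag words and counters from one header."""
    m = HEADER.search(line)
    if m is None:
        return None
    op = int(m.group(1))
    src, mask, dest = (int(m.group(i), 16) for i in (2, 4, 6))
    return {
        "op": OPS.get(op, str(op)),
        "src": decode_format(src),
        "mask": decode_format(mask),
        "dest": decode_format(dest),
        "flags": tuple(int(m.group(i), 16) for i in (3, 5, 7)),
        "counters": (int(m.group(8)), int(m.group(9))),
        "mpix": float(m.group(10)),
    }


report = ("pixman slow path: op=3 s=00010000|002E2A7F "
          "m=08018000|002F0A7F d=08018000|002E0A7F - 99/45254 (30.818 MPix)")
print(parse_header(report))
```

Applied to the header quoted above, this gives OVER with solid, a8 and a8, which agrees with the names printed under that header. Sorting the parsed reports by the "mpix" value would be one way to rank the operations by how heavily they are used.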

At least this one (over_n_8_8) has been optimized for ARM NEON in pixman git
master recently, along with some others.

> Surely there are some other not yet optimized pixman usage cases which can
> be encountered in the wild. And revisiting cairo traces may make sense too
> in order to make sure that we have all the optimizations which could be
> easily done.
>
> As there are no more comments/opinions, I'm going to prepare some more or
> less final patches based on what we have now. They will be posted to the
> mailing list shortly.

The final variant of this code may need to wait because I don't quite like
how it looks. I would also like to test it more by using it in practice to
hunt for some pixman slow paths and see whether it is effective.

Anyway, I ran the cairo-perf-trace benchmark with all the fast paths
disabled, just to see which operations are used and how heavily. This may
help when introducing optimizations for a new platform or when looking for
opportunities to improve the performance of the existing optimizations. A
short snippet of the most heavily used operations is listed at the end, and
a full log is attached.

Basically, all the operations fall into several groups, ordered by
complexity:
1. nonscaled operations - easy to implement, except maybe for some cases
   involving the a1 format
2. nearest scaling without mask - also easy to implement because the main
   loop template code is now available in 'pixman-fast-path.h', with the
   possibility to override single scanline processing
3. two variants of nearest scaling with mask: a8 mask with the
   SAMPLES_COVER_CLIP flag (the most heavily used cases) and just a solid
   mask (the rest of the cases). Support for both of these is reasonably
   easy to add to the existing main loop template.
4. bilinear scaling, which eventually has to be SIMD optimized

REFLECT repeat does not seem to be used anywhere (neither in the
cairo-perf-trace logs, nor in real applications in my typical linux desktop
use).
So is it even worth adding any optimizations for it to pixman, considering
that it is more complex than the other types of repeat? Yes, that's somewhat
similar to rotation, which almost nobody uses, but at least it is easier to
imagine some valid use cases for rotation.

The other things not covered in this log are gradients. Gradients contribute
a lot to the performance of some cairo traces, but they are another story.

Dec 1 02:18:31 i7 cairo-perf-trace: pixman slow path: op=1 s=20020888|000F6AFF m=00000000|00000000 d=20020888|000E4AFF - 52/18175 (1897.906 MPix)
SRC
x8r8g8b8               null                   x8r8g8b8
-- src --              -- mask --             -- dest --
NARROW_FORMAT                                 NARROW_FORMAT
NO_ACCESSORS                                  NO_ACCESSORS
NO_ALPHA_MAP                                  NO_ALPHA_MAP
UNIFIED_ALPHA                                 UNIFIED_ALPHA
NO_NORMAL_REPEAT                              NO_NORMAL_REPEAT
NO_PAD_REPEAT                                 NO_PAD_REPEAT
NO_REFLECT_REPEAT                             NO_REFLECT_REPEAT
NEAREST_FILTER                                NEAREST_FILTER
NO_CONVOLUTION_FILTER                         NO_CONVOLUTION_FILTER
AFFINE_TRANSFORM                              AFFINE_TRANSFORM
ID_TRANSFORM                                  ID_TRANSFORM
X_UNIT_POSITIVE                               X_UNIT_POSITIVE
Y_UNIT_ZERO                                   Y_UNIT_ZERO
IS_OPAQUE                                     SAMPLES_OPAQUE
SAMPLES_COVER_CLIP
SAMPLES_OPAQUE

Dec 1 02:18:31 i7 cairo-perf-trace: pixman slow path: op=3 s=00010000|000E6A7F m=08018000|000F4A7F d=20020888|000E4AFF - 8/1168 (1279.996 MPix)
OVER
solid                  a8                     x8r8g8b8
-- src --              -- mask --             -- dest --
NARROW_FORMAT          NARROW_FORMAT          NARROW_FORMAT
NO_ACCESSORS           NO_ACCESSORS           NO_ACCESSORS
NO_ALPHA_MAP           NO_ALPHA_MAP           NO_ALPHA_MAP
UNIFIED_ALPHA          UNIFIED_ALPHA          UNIFIED_ALPHA
NO_NORMAL_REPEAT       NO_NORMAL_REPEAT       NO_NORMAL_REPEAT
NO_PAD_REPEAT          NO_PAD_REPEAT          NO_PAD_REPEAT
NO_REFLECT_REPEAT      NO_REFLECT_REPEAT      NO_REFLECT_REPEAT
NEAREST_FILTER         NEAREST_FILTER         NEAREST_FILTER
NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER
AFFINE_TRANSFORM       AFFINE_TRANSFORM       AFFINE_TRANSFORM
ID_TRANSFORM           ID_TRANSFORM           ID_TRANSFORM
X_UNIT_POSITIVE        X_UNIT_POSITIVE        X_UNIT_POSITIVE
Y_UNIT_ZERO            Y_UNIT_ZERO            Y_UNIT_ZERO
IS_OPAQUE              SAMPLES_COVER_CLIP     SAMPLES_OPAQUE

Dec 1 02:18:31 i7 cairo-perf-trace: pixman slow path: op=3 s=00010000|000E6A7F m=08018000|000F4A7F d=20028888|000E4A7F - 3/638 (702.111 MPix)
OVER
solid                  a8                     a8r8g8b8
-- src --              -- mask --             -- dest --
NARROW_FORMAT          NARROW_FORMAT          NARROW_FORMAT
NO_ACCESSORS           NO_ACCESSORS           NO_ACCESSORS
NO_ALPHA_MAP           NO_ALPHA_MAP           NO_ALPHA_MAP
UNIFIED_ALPHA          UNIFIED_ALPHA          UNIFIED_ALPHA
NO_NORMAL_REPEAT       NO_NORMAL_REPEAT       NO_NORMAL_REPEAT
NO_PAD_REPEAT          NO_PAD_REPEAT          NO_PAD_REPEAT
NO_REFLECT_REPEAT      NO_REFLECT_REPEAT      NO_REFLECT_REPEAT
NEAREST_FILTER         NEAREST_FILTER         NEAREST_FILTER
NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER
AFFINE_TRANSFORM       AFFINE_TRANSFORM       AFFINE_TRANSFORM
ID_TRANSFORM           ID_TRANSFORM           ID_TRANSFORM
X_UNIT_POSITIVE        X_UNIT_POSITIVE        X_UNIT_POSITIVE
Y_UNIT_ZERO            Y_UNIT_ZERO            Y_UNIT_ZERO
IS_OPAQUE              SAMPLES_COVER_CLIP

Dec 1 02:18:31 i7 cairo-perf-trace: pixman slow path: op=3 s=00010000|000E6A7F m=20028888|000F497F d=20020888|000E4AFF - 7/203 (677.611 MPix)
OVER
solid                  a8r8g8b8               x8r8g8b8
-- src --              -- mask --             -- dest --
NARROW_FORMAT          NARROW_FORMAT          NARROW_FORMAT
NO_ACCESSORS           NO_ACCESSORS           NO_ACCESSORS
NO_ALPHA_MAP           COMPONENT_ALPHA        NO_ALPHA_MAP
UNIFIED_ALPHA          NO_ALPHA_MAP           UNIFIED_ALPHA
NO_NORMAL_REPEAT       NO_NORMAL_REPEAT       NO_NORMAL_REPEAT
NO_PAD_REPEAT          NO_PAD_REPEAT          NO_PAD_REPEAT
NO_REFLECT_REPEAT      NO_REFLECT_REPEAT      NO_REFLECT_REPEAT
NEAREST_FILTER         NEAREST_FILTER         NEAREST_FILTER
NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER
AFFINE_TRANSFORM       AFFINE_TRANSFORM       AFFINE_TRANSFORM
ID_TRANSFORM           ID_TRANSFORM           ID_TRANSFORM
X_UNIT_POSITIVE        X_UNIT_POSITIVE        X_UNIT_POSITIVE
Y_UNIT_ZERO            Y_UNIT_ZERO            Y_UNIT_ZERO
IS_OPAQUE              SAMPLES_COVER_CLIP     SAMPLES_OPAQUE

Dec 1 02:18:31 i7 cairo-perf-trace: pixman slow path: op=3 s=00010000|000E6A7F m=01011000|000F4A7F d=20020888|000E4AFF - 21/1657 (610.733 MPix)
OVER
solid                  a1                     x8r8g8b8
-- src --              -- mask --             -- dest --
NARROW_FORMAT          NARROW_FORMAT          NARROW_FORMAT
NO_ACCESSORS           NO_ACCESSORS           NO_ACCESSORS
NO_ALPHA_MAP           NO_ALPHA_MAP           NO_ALPHA_MAP
UNIFIED_ALPHA          UNIFIED_ALPHA          UNIFIED_ALPHA
NO_NORMAL_REPEAT       NO_NORMAL_REPEAT       NO_NORMAL_REPEAT
NO_PAD_REPEAT          NO_PAD_REPEAT          NO_PAD_REPEAT
NO_REFLECT_REPEAT      NO_REFLECT_REPEAT      NO_REFLECT_REPEAT
NEAREST_FILTER         NEAREST_FILTER         NEAREST_FILTER
NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER
AFFINE_TRANSFORM       AFFINE_TRANSFORM       AFFINE_TRANSFORM
ID_TRANSFORM           ID_TRANSFORM           ID_TRANSFORM
X_UNIT_POSITIVE        X_UNIT_POSITIVE        X_UNIT_POSITIVE
Y_UNIT_ZERO            Y_UNIT_ZERO            Y_UNIT_ZERO
IS_OPAQUE              SAMPLES_COVER_CLIP     SAMPLES_OPAQUE

Dec 1 02:18:31 i7 cairo-perf-trace: pixman slow path: op=1 s=20020888|000F6AFF m=00000000|00000000 d=20028888|000E4A7F - 198/202467 (564.478 MPix)
SRC
x8r8g8b8               null                   a8r8g8b8
-- src --              -- mask --             -- dest --
NARROW_FORMAT                                 NARROW_FORMAT
NO_ACCESSORS                                  NO_ACCESSORS
NO_ALPHA_MAP                                  NO_ALPHA_MAP
UNIFIED_ALPHA                                 UNIFIED_ALPHA
NO_NORMAL_REPEAT                              NO_NORMAL_REPEAT
NO_PAD_REPEAT                                 NO_PAD_REPEAT
NO_REFLECT_REPEAT                             NO_REFLECT_REPEAT
NEAREST_FILTER                                NEAREST_FILTER
NO_CONVOLUTION_FILTER                         NO_CONVOLUTION_FILTER
AFFINE_TRANSFORM                              AFFINE_TRANSFORM
ID_TRANSFORM                                  ID_TRANSFORM
X_UNIT_POSITIVE                               X_UNIT_POSITIVE
Y_UNIT_ZERO                                   Y_UNIT_ZERO
IS_OPAQUE
SAMPLES_COVER_CLIP
SAMPLES_OPAQUE

Dec 1 02:18:31 i7 cairo-perf-trace: pixman slow path: op=3 s=00010000|000E4A7F m=08018000|000F4A7F d=20028888|000E4A7F - 411/582133 (541.384 MPix)
OVER
solid                  a8                     a8r8g8b8
-- src --              -- mask --             -- dest --
NARROW_FORMAT          NARROW_FORMAT          NARROW_FORMAT
NO_ACCESSORS           NO_ACCESSORS           NO_ACCESSORS
NO_ALPHA_MAP           NO_ALPHA_MAP           NO_ALPHA_MAP
UNIFIED_ALPHA          UNIFIED_ALPHA          UNIFIED_ALPHA
NO_NORMAL_REPEAT       NO_NORMAL_REPEAT       NO_NORMAL_REPEAT
NO_PAD_REPEAT          NO_PAD_REPEAT          NO_PAD_REPEAT
NO_REFLECT_REPEAT      NO_REFLECT_REPEAT      NO_REFLECT_REPEAT
NEAREST_FILTER         NEAREST_FILTER         NEAREST_FILTER
NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER  NO_CONVOLUTION_FILTER
AFFINE_TRANSFORM       AFFINE_TRANSFORM       AFFINE_TRANSFORM
ID_TRANSFORM           ID_TRANSFORM           ID_TRANSFORM
X_UNIT_POSITIVE        X_UNIT_POSITIVE        X_UNIT_POSITIVE
Y_UNIT_ZERO            Y_UNIT_ZERO            Y_UNIT_ZERO
                       SAMPLES_COVER_CLIP

Dec 1 02:18:31 i7 cairo-perf-trace: pixman slow path: op=12 s=20028888|000F497F m=00000000|00000000 d=20028888|000E497F - 4/24 (511.706 MPix)
ADD
a8r8g8b8               null                   a8r8g8b8
-- src --              -- mask --             -- dest --
NARROW_FORMAT                                 NARROW_FORMAT
NO_ACCESSORS                                  NO_ACCESSORS
COMPONENT_ALPHA                               COMPONENT_ALPHA
NO_ALPHA_MAP                                  NO_ALPHA_MAP
NO_NORMAL_REPEAT                              NO_NORMAL_REPEAT
NO_PAD_REPEAT                                 NO_PAD_REPEAT
NO_REFLECT_REPEAT                             NO_REFLECT_REPEAT
NEAREST_FILTER                                NEAREST_FILTER
NO_CONVOLUTION_FILTER                         NO_CONVOLUTION_FILTER
AFFINE_TRANSFORM                              AFFINE_TRANSFORM
ID_TRANSFORM                                  ID_TRANSFORM
X_UNIT_POSITIVE                               X_UNIT_POSITIVE
Y_UNIT_ZERO                                   Y_UNIT_ZERO
SAMPLES_COVER_CLIP

Dec 1 02:18:31 i7 cairo-perf-trace: pixman slow path: op=3 s=20028888|000E9E7E m=00000000|00000000 d=20020888|000E4AFF - 243/247950 (485.485 MPix)
OVER
a8r8g8b8               null                   x8r8g8b8
-- src --              -- mask --             -- dest --
NARROW_FORMAT                                 NARROW_FORMAT
NO_ACCESSORS                                  NO_ACCESSORS
NO_ALPHA_MAP                                  NO_ALPHA_MAP
UNIFIED_ALPHA                                 UNIFIED_ALPHA
NO_NONE_REPEAT                                NO_NORMAL_REPEAT
NO_PAD_REPEAT                                 NO_PAD_REPEAT
NO_REFLECT_REPEAT                             NO_REFLECT_REPEAT
NEAREST_FILTER                                NEAREST_FILTER
NO_CONVOLUTION_FILTER                         NO_CONVOLUTION_FILTER
AFFINE_TRANSFORM                              AFFINE_TRANSFORM
HAS_TRANSFORM                                 ID_TRANSFORM
SCALE_TRANSFORM                               X_UNIT_POSITIVE
X_UNIT_POSITIVE                               Y_UNIT_ZERO
Y_UNIT_ZERO                                   SAMPLES_OPAQUE

-- 
Best regards,
Siarhei Siamashka

cairo-perf-trace-all-fast-path.txt.gz
Description: GNU Zip compressed data