Re: EXA performance
Bernardo Innocenti wrote:
> Aleph, could you post an oprofile of Sugar switching between zoom levels
> in both 16bpp and 24bpp? Doing it manually by pressing F1 to F4 would be
> good enough for me: I just want to get an idea of where we spend the time.
> Of course, X, amd_drv, pixman and cairo need to be built with -g.

Here are the oprofile results while switching between the 4 Sugar zoom
levels for sixty seconds, with the journal displaying past activities at
the 4th level. Unfortunately the oprofile callgraph feature doesn't work
on the Geode, but I hope these data are of some use anyway.

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
          TIMER:0|
  samples|      %|
------------------
     2936 47.6082 python
                  TIMER:0|
          samples|      %|
        ------------------
             1543 52.5545 libpython2.5.so.1.0
              594 20.2316 libcairo.so.2.11.5
              276  9.4005 libc-2.6.so
              116  3.9510 libgobject-2.0.so.0.1200.13
               79  2.6907 libglib-2.0.so.0.1200.13
               68  2.3161 libpangoft2-1.0.so.0.1600.4
               62  2.1117 libpthread-2.6.so
               41  1.3965 libpango-1.0.so.0.1600.4
               31  1.0559 libgtk-x11-2.0.so.0.1000.14
               24  0.8174 libgdk-x11-2.0.so.0.1000.14
               15  0.5109 libfreetype.so.6.3.15
               13  0.4428 libhippocanvas-1.so.0.0.0
               12  0.4087 libX11.so.6.2.0
               10  0.3406 _gobject.so
                9  0.3065 librsvg-2.so.2.16.1
                8  0.2725 libgthread-2.0.so.0.1200.13
                6  0.2044 libpangocairo-1.0.so.0.1600.4
                5  0.1703 anon (tgid:2962 range:0xb7fc7000-0xb7fc8000)
                5  0.1703 libm-2.6.so
                5  0.1703 libXrender.so.1.3.0
                5  0.1703 hippo.so
                3  0.1022 libdbus-1.so.3.2.0
                3  0.1022 _cairo.so
                1  0.0341 libwnck-1.so.18.2.10
                1  0.0341 libxml2.so.2.6.29
                1  0.0341 _gtk.so
     2645 42.8896 no-vmlinux
      454  7.3618 Xorg
                  TIMER:0|
          samples|      %|
        ------------------
              202 44.4934 libpixman-1.so.0.9.5
               84 18.5022 amd_drv.so
               82 18.0617 Xorg
               42  9.2511 libexa.so
               29  6.3877 libc-2.6.so
               12  2.6432 libfb.so
                2  0.4405 libextmod.so
                1  0.2203 anon (tgid:2942 range:0xb7f83000-0xb7f84000)
...

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name                   app name   symbol name
2645     42.8896  no-vmlinux                   no-vmlinux (no symbols)
1543     25.0203  libpython2.5.so.1.0          python     (no symbols)
116      1.8810   libgobject-2.0.so.0.1200.13  python     (no symbols)
114      1.8485   libpixman-1.so.0.9.5         Xorg       pixman_rasterize_edges
79       1.2810   libglib-2.0.so.0.1200.13     python     (no symbols)
68       1.1026   libpangoft2-1.0.so.0.1600.4  python     (no symbols)
56       0.9081   amd_drv.so                   Xorg       gp_color_bitmap_to_screen_blt
53       0.8594   libpixman-1.so.0.9.5         Xorg       pixman_fill
47       0.7621   libcairo.so.2.11.5           python     fbRasterizeEdges
46       0.7459   libc-2.6.so                  python     _int_malloc
46       0.7459   libc-2.6.so                  python     memcpy
41       0.6648   libpango-1.0.so.0.1600.4     python     (no symbols)
39       0.6324   libcairo.so.2.11.5           python     _PointDistanceSquaredToSegment
35       0.5675   libc-2.6.so                  python     __ctype_b_loc
34       0.5513   libc-2.6.so                  bash       __gconv_transform_utf8_internal
31       0.5027   libgtk-x11-2.0.so.0.1000.14  python     (no symbols)
26       0.4216   bash                         bash       (no symbols)
25       0.4054   libcairo.so.2.11.5           python     _cairo_bentley_ottmann_tessellate_polygon
24       0.3892   libgdk-x11-2.0.so.0.1000.14  python     (no symbols)
23       0.3730   libc-2.6.so                  bash       mbrtowc
23       0.3730   libcairo.so.2.11.5           python     _cairo_bo_event_queue_insert_if_intersect_below_current_y
20       0.3243   libc-2.6.so                  python     _int_free
20       0.3243   libc-2.6.so                  python     msort_with_tmp
20       0.3243   libpthread-2.6.so            python     pthread_mutex_lock
19       0.3081   libcairo.so.2.11.5           python     __divdi3
18       0.2919   libc-2.6.so                  python     free
18       0.2919   libcairo.so.2.11.5           python     _cairo_pixman_composite_solid_mask_nx8xmmx
18       0.2919   libcairo.so.2.11.5           python     _cairo_pixman_render_sample_floor_y
18       0.2919   libcairo.so.2.11.5           python     _cairo_uint64x64_128_mul
17
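A note on reading the two reports together: the per-application breakdown gives each library's share of that application's samples, while the flat listing gives its share of all samples. The short Python sketch below shows how the two figures relate, using numbers copied from the reports. The total sample count is not printed anywhere in the output; it is inferred here from the python line (2936 samples reported as 47.6082%), so treat it as an assumption.

```python
# Figures copied from the opreport output in this message.
PYTHON_SAMPLES = 2936    # samples attributed to the python process
PYTHON_PERCENT = 47.6082  # its reported share of all samples


def inferred_total(samples, percent):
    """Back out the total sample count from one (samples, percent) pair."""
    return round(samples * 100.0 / percent)


def share(samples, total):
    """Percentage of `total` represented by `samples`."""
    return 100.0 * samples / total


# Assumption: the total is derived, not read from the report.
TOTAL = inferred_total(PYTHON_SAMPLES, PYTHON_PERCENT)

# libpython2.5: share within the python process vs. share of the whole profile.
libpython_within_python = share(1543, PYTHON_SAMPLES)
libpython_overall = share(1543, TOTAL)

# pixman is 44.49% of Xorg's samples, but a much smaller slice of the whole run.
pixman_overall = share(202, TOTAL)

print("total samples (inferred):", TOTAL)
print("libpython within python: %.4f%%" % libpython_within_python)
print("libpython overall:       %.4f%%" % libpython_overall)
print("pixman overall:          %.4f%%" % pixman_overall)
```

Seen this way, pixman accounts for nearly half of the X server's samples, yet the X server as a whole is under 8% of the run; most of the time is in the python process and in the kernel (no-vmlinux).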
Re: EXA performance
Stefano Fedrigo wrote:
> Here are results of two runs of cairo's performance test suite, at 16 and
> 24 bpp. Test hardware is a B4, xserver-1.4-branch, cairo 1.4.10. This time
> 24 bpp is overall better.

That's invaluable data, thank you!

> Only cairo_paint() is slower at 24, but not cairo_paint_with_alpha(), which is
> actually a lot faster.

Someone told me that some paths in pixman actually convert 16bpp scanlines
to a temporary 24bpp buffer, then do their work, then convert back to 16bpp.
If true, this would explain the observed slowness at 16bpp.

Aleph, could you post an oprofile of Sugar switching between zoom levels
in both 16bpp and 24bpp? Doing it manually by pressing F1 to F4 would be
good enough for me: I just want to get an idea of where we spend the time.
Of course, X, amd_drv, pixman and cairo need to be built with -g.

-- 
   // Bernardo Innocenti - One Laptop Per Child
 \X/  http://www.codewiz.org/
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
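To make the suspected round trip concrete, here is a rough Python sketch of what "expand to a wider temporary, do the work, pack back" costs per pixel. It assumes an r5g6b5 framebuffer and an 8-bit-per-channel temporary; it is not pixman's code, the function names are invented for this illustration, and whether pixman really takes this path is exactly what the profile is meant to show.

```python
def rgb565_to_rgb888(pixel):
    """Expand one r5g6b5 pixel to an (r, g, b) tuple of 8-bit channels."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Replicate the top bits into the low bits so full scale maps to 255.
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit channels back into one r5g6b5 pixel (truncating)."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def blend_scanline_565(dst, src, alpha):
    """Blend src over dst with a constant alpha (0..255).

    Every pixel is unpacked, blended at 8 bits per channel, and repacked:
    two conversions per pixel on top of the blend itself.
    """
    def mix(s, d):
        return (s * alpha + d * (255 - alpha) + 127) // 255

    out = []
    for d, s in zip(dst, src):
        dr, dg, db = rgb565_to_rgb888(d)
        sr, sg, sb = rgb565_to_rgb888(s)
        out.append(rgb888_to_rgb565(mix(sr, dr), mix(sg, dg), mix(sb, db)))
    return out
```

At 24bpp the destination is already in 8-bit channels, so the unpack and repack steps disappear and only the blend remains.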
Re: EXA performance
Jim Gettys wrote:
> I'm worried about the fact that Cairo likes to do things in 32 bits
> and they then have to be converted and composited at 16 bits;

Here are results of two runs of cairo's performance test suite, at 16 and
24 bpp. Test hardware is a B4, xserver-1.4-branch, cairo 1.4.10. This time
24 bpp is overall better. Only cairo_paint() is slower at 24, but not
cairo_paint_with_alpha(), which is actually a lot faster.

I oprofiled the whole runs, but haven't found significant differences
between 16 and 24 bpp. Profiling the individual tests would probably be
more useful.

Output of cairo-perf-diff (nicer html format attached):

old: cairo-perf-16
new: cairo-perf-24
Speedups
xlib-rgb paint-with-alpha_similar_rgb_over-512 192.24 0.06% -> 47.68 0.16%: 4.04x speedup ███
xlib-rgb paint-with-alpha_similar_rgb_over-256 48.51 0.14% -> 12.39 0.56%: 3.95x speedup ███
xlib-rgb paint-with-alpha_image_rgb_over-256 63.25 0.26% -> 29.27 0.51%: 2.17x speedup █▏
xlib-rgb paint-with-alpha_image_rgb_over-512 250.25 1.01% -> 122.63 2.40%: 2.09x speedup █▏
xlib-rgb paint-with-alpha_similar_rgba_over-512 180.26 0.08% -> 86.75 0.10%: 2.08x speedup █▏
xlib-rgb paint-with-alpha_similar_rgba_over-256 45.66 0.13% -> 22.17 0.34%: 2.07x speedup █▏
xlib-rgb fill_similar_rgb_over-256 16.43 0.52% -> 9.66 0.95%: 1.71x speedup ▊
xlib-rgb paint-with-alpha_image_rgba_over-256 62.87 0.34% -> 39.25 0.48%: 1.61x speedup ▋
xlib-rgb paint-with-alpha_image_rgba_over-512 248.95 0.32% -> 159.74 0.45%: 1.56x speedup ▋
xlib-rgb stroke_similar_rgb_over-256 37.00 0.26% -> 25.80 0.40%: 1.44x speedup ▌
xlib-rgb fill_similar_rgb_over-128 5.71 1.08% -> 4.05 1.88%: 1.42x speedup ▍
xlib-rgb fill_similar_rgba_over-256 17.66 0.52% -> 12.71 0.61%: 1.40x speedup ▍
xlib-rgb paint-with-alpha_linear_rgb_over-256 88.94 0.21% -> 65.39 0.27%: 1.36x speedup ▍
xlib-rgb paint-with-alpha_linear_rgba_over-256 88.53 0.24% -> 64.95 0.26%: 1.36x speedup ▍
xlib-rgb paint-with-alpha_linear_rgb_over-512 354.73 0.19% -> 265.89 0.47%: 1.34x speedup ▍
xlib-rgb paint-with-alpha_linear_rgba_over-512 352.53 0.24% -> 264.01 0.46%: 1.34x speedup ▍
xlib-rgb fill_image_rgb_over-256 24.06 0.52% -> 18.41 0.62%: 1.31x speedup ▍
xlib-rgb stroke_similar_rgba_over-256 37.82 0.26% -> 29.30 0.33%: 1.30x speedup ▎
xlib-rgb fill_similar_rgba_over-128 6.06 0.50% -> 4.82 1.58%: 1.29x speedup ▎
xlib-rgb fill_image_rgba_over-256 26.38 0.46% -> 21.45 0.44%: 1.23x speedup ▎
xlib-rgb stroke_similar_rgb_over-128 15.97 0.47% -> 13.26 0.59%: 1.21x speedup ▎
xlib-rgb stroke_image_rgb_over-256 49.07 0.24% -> 41.01 0.32%: 1.20x speedup ▎
xlib-rgb fill_linear_rgb_over-256 35.22 0.31% -> 29.54 0.37%: 1.19x speedup ▎
xlib-rgb paint-with-alpha_radial_rgb_over-256 156.75 0.35% -> 133.54 0.14%: 1.17x speedup ▏
xlib-rgb paint-with-alpha_radial_rgb_over-512 627.70 0.15% -> 539.14 0.23%: 1.17x speedup ▏
xlib-rgb fill_similar_rgb_over-64 3.26 2.03% -> 2.80 2.55%: 1.16x speedup ▏
xlib-rgb paint-with-alpha_radial_rgba_over-256 152.81 0.36% -> 132.55 0.78%: 1.16x speedup ▏
xlib-rgb stroke_image_rgba_over-256 51.60 0.22% -> 44.51 0.35%: 1.16x speedup ▏
xlib-rgb stroke_similar_rgba_over-128 16.21 0.49% -> 14.10 0.46%: 1.15x speedup ▏
xlib-rgb fill_image_rgb_over-128 8.01 0.79% -> 6.92 0.62%: 1.15x speedup ▏
xlib-rgb stroke_linear_rgb_over-256 67.40 0.22% -> 58.75 0.28%: 1.15x speedup ▏
xlib-rgb paint-with-alpha_radial_rgba_over-512 611.50 0.18% -> 536.93 0.32%: 1.14x speedup ▏
xlib-rgb fill_similar_rgb_source-256 31.79 0.26% -> 27.98 0.22%: 1.14x speedup ▏
xlib-rgb paint-with-alpha_similar_rgb_source-512 306.70 0.07% -> 270.02 0.08%: 1.14x speedup ▏
xlib-rgb paint-with-alpha_similar_rgb_source-256 77.35 0.12% -> 68.40 0.19%: 1.13x speedup ▏
xlib-rgb fill_linear_rgb_over-128 10.86 0.73% -> 9.69 0.84%: 1.12x speedup ▏
xlib-rgb stroke_similar_rgb_source-256 62.25 0.14% -> 55.66 0.20%: 1.12x speedup ▏
xlib-rgb fill_image_rgba_over-128 8.54 0.66% -> 7.66 0.62%: 1.12x speedup ▏
xlib-rgb fill_linear_rgba_over-256 40.03 0.27% -> 35.93 0.31%: 1.12x speedup ▏
xlib-rgb stroke_image_rgb_over-128 19.33 0.44% -> 17.38 0.53%: 1.11x speedup ▏
xlib-rgb fill_similar_rgba_over-64 3.32 2.15% -> 2.99 2.47%: 1.11x speedup ▏
xlib-rgb paint-with-alpha_solid_rgba_source-512 157.75 0.14% -> 142.45 0.13%: 1.11x speedup ▏
xlib-rgb paint-with-alpha_solid_rgb_source-512 157.67 0.13% -> 142.39 0.13%: 1.11x speedup ▏
xlib-rgb paint-with-alpha_solid_rgba_source-256 39.69 0.24% -> 35.98 0.29%: 1.10x speedup ▏
xlib-rgb fill_similar_rgba_source-256 30.51 0.28% -> 27.70 0.25%: 1.10x speedup ▏
xlib-rgb fill_similar_rgb_source-1289
Re: EXA performance
It's not clear to me that micro-benchmarks are going to tell us much at
all: I'm worried about the fact that Cairo likes to do things in 32 bits
and they then have to be converted and composited at 16 bits; the
micro-benchmarks won't tell us much about how common these conversion
costs are.

I think it is time for a tinderbox run with oprofile.
                                   - Jim

On Thu, 2007-08-30 at 11:03 -0600, Jordan Crouse wrote:
> On 30/08/07 12:48 +0200, Stefano Fedrigo wrote:
> > Bernardo Innocenti wrote:
> > > Performance of 16bpp VS 24bpp has been a hot topic recently.
> > > We know 24bpp to be much faster for some operations (my
> > > bench.py notably) and much slower for others (image puts),
> > > but it's not clear which one is the overall winner
> > > for our typical workload.
> > >
> > > Jim would like to see some numbers in order to make a
> > > decision. Aleph, could you publish the results of x11perf
> > > at least? The Cairo performance suite would be even more
> > > interesting to see.
> >
> > Made two runs with x11perf, at 16 and 24 bpp. Xserver from
> > server-1.4-branch git tree, on a B4 laptop.
> > x11perfcomp output attached, sorted by relative performance.
> >
> > Some operations are faster at 24bpp, other ones better at 16bpp.
> > Looking at these data 16bpp seems a better choice: a greater number of
> > ops fare better than at 24bpp, but one has to consider what are the
> > operations Sugar does more.
> > I'm going to do some Cairo testing...
>
> Looking at these numbers, I get the impression that we might be CPU bound
> in more places than we thought - we're moving twice the bytes in 24bpp mode,
> and the lowest performing tests (which all move a bunch of bytes),
> seem to generally reflect half the performance.
> Solid fills, which require
> no byte moving at all are coming in at 1:1 and that's expected, since the
> hardware will move just as fast for 16bpp and 24bpp, and the alpha fills are
> coming at twice the performance because we don't have the additional
> conversion hit in the driver composite code.
>
> If that's true, then yes, 16bpp will be better, because Sugar is all about
> pixmaps (few solid fills). But massive profiling is needed to verify all
> this.
>
> Jordan
Re: EXA performance
On 30/08/07 12:48 +0200, Stefano Fedrigo wrote:
> Bernardo Innocenti wrote:
> > Performance of 16bpp VS 24bpp has been a hot topic recently.
> > We know 24bpp to be much faster for some operations (my
> > bench.py notably) and much slower for others (image puts),
> > but it's not clear which one is the overall winner
> > for our typical workload.
> >
> > Jim would like to see some numbers in order to make a
> > decision. Aleph, could you publish the results of x11perf
> > at least? The Cairo performance suite would be even more
> > interesting to see.
>
> Made two runs with x11perf, at 16 and 24 bpp. Xserver from
> server-1.4-branch git tree, on a B4 laptop.
> x11perfcomp output attached, sorted by relative performance.
>
> Some operations are faster at 24bpp, other ones better at 16bpp.
> Looking at these data 16bpp seems a better choice: a greater number of
> ops fare better than at 24bpp, but one has to consider what are the
> operations Sugar does more.
> I'm going to do some Cairo testing...

Looking at these numbers, I get the impression that we might be CPU bound
in more places than we thought - we're moving twice the bytes in 24bpp
mode, and the lowest performing tests (which all move a bunch of bytes)
seem to generally reflect half the performance. Solid fills, which require
no byte moving at all, are coming in at 1:1 and that's expected, since the
hardware will move just as fast for 16bpp and 24bpp, and the alpha fills
are coming at twice the performance because we don't have the additional
conversion hit in the driver composite code.

If that's true, then yes, 16bpp will be better, because Sugar is all about
pixmaps (few solid fills). But massive profiling is needed to verify all
this.
Jordan
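The "twice the bytes" argument in the message above can be checked with a few lines of Python. This is only a back-of-the-envelope model: it assumes depth 24 is stored as 32 bits per pixel (which is what makes it exactly twice the 16bpp size) and that the operation is limited purely by the number of bytes moved. The rates are copied from the x11perfcomp output posted in this thread.

```python
# Assumption: depth-24 pixels are padded to 32 bits in memory.
BYTES_PER_PIXEL = {16: 2, 24: 4}


def bytes_moved(width, height, depth):
    """Bytes touched when copying a width x height area once."""
    return width * height * BYTES_PER_PIXEL[depth]


def predicted_ratio(width, height):
    """Expected 24bpp/16bpp rate if the copy is purely bandwidth bound."""
    return bytes_moved(width, height, 16) / bytes_moved(width, height, 24)


# (rate at 16bpp, rate at 24bpp, width, height), from the posted results.
observed = {
    "Copy 500x500 from pixmap to pixmap": (799.0, 360.0, 500, 500),
    "Copy 100x100 from pixmap to pixmap": (17600.0, 8800.0, 100, 100),
    "Scroll 500x500 pixels": (492.0, 202.0, 500, 500),
}

for name, (rate16, rate24, w, h) in observed.items():
    print("%-36s observed %.2f  predicted %.2f"
          % (name, rate24 / rate16, predicted_ratio(w, h)))
```

The copy and scroll rows land close to the predicted 0.5, which is consistent with the bandwidth-bound reading; the PutImage XY rows at roughly 0.34 are slower than this simple model predicts, so something other than raw byte count is involved there.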
Re: EXA performance
> > I'm going to do some Cairo testing...
>
I look forward to that. Maybe some hacks would help me improve the
response time of the Measure activity (which uses Cairo for all its
drawing) too?

http://pastebin.com/m4db7bc4e is where all the drawing is done.

regards,
Arjun

-- 
Arjun Sarwal ( [EMAIL PROTECTED] )
Re: EXA performance
Bernardo Innocenti wrote:
> Performance of 16bpp VS 24bpp has been a hot topic recently.
> We know 24bpp to be much faster for some operations (my
> bench.py notably) and much slower for others (image puts),
> but it's not clear which one is the overall winner
> for our typical workload.
>
> Jim would like to see some numbers in order to make a
> decision. Aleph, could you publish the results of x11perf
> at least? The Cairo performance suite would be even more
> interesting to see.

Made two runs with x11perf, at 16 and 24 bpp. Xserver from
server-1.4-branch git tree, on a B4 laptop.
x11perfcomp output attached, sorted by relative performance.

Some operations are faster at 24bpp, others are better at 16bpp.
Looking at these data, 16bpp seems a better choice: a greater number of
ops fare better than at 24bpp, but one has to consider which operations
Sugar does most.
I'm going to do some Cairo testing...

1: x11perf-16.log
2: x11perf-24.log

       1         2          Operation
--------  --------
    53.7      18.2 ( 0.34) PutImage XY 100x100 square
     5.0       1.7 ( 0.34) ShmPutImage XY 500x500 square
     2.3       0.8 ( 0.35) PutImage XY 500x500 square
   133.0      49.8 ( 0.37) ShmPutImage XY 100x100 square
  2470.0    1000.0 ( 0.40) PutImage 100x100 square
   492.0     202.0 ( 0.41) Scroll 500x500 pixels
  3200.0    1420.0 ( 0.44) PutImage XY 10x10 square
   799.0     360.0 ( 0.45) Copy 500x500 from pixmap to pixmap
    78.0      35.8 ( 0.46) PutImage 500x500 square
  1840.0     886.0 ( 0.48) 500x500 rectangle
   805.0     390.0 ( 0.48) 500x500 tiled rectangle (161x145 tile)
   858.0     414.0 ( 0.48) 500x500 tiled rectangle (216x208 tile)
   773.0     369.0 ( 0.48) Copy 500x500 from pixmap to window
   774.0     369.0 ( 0.48) Copy 500x500 from window to pixmap
   827.0     400.0 ( 0.48) GetImage 100x100 square
   193.0      91.8 ( 0.48) ShmPutImage 500x500 square
  5120.0    2490.0 ( 0.49) ShmPutImage 100x100 square
 17600.0    8800.0 ( 0.50) Copy 100x100 from pixmap to pixmap
   746.0     372.0 ( 0.50) Copy 500x500 from window to window
     1.6       0.8 ( 0.50) GetImage XY 500x500 square
 58300.0   29700.0 ( 0.51) 100x100 rectangle
  4080.0    2070.0 ( 0.51) 500x500 wide rectangle outline
 17200.0    8760.0 ( 0.51) Copy 100x100 from pixmap to window
 17100.0    8760.0 ( 0.51) Copy 100x100 from window to pixmap
 17300.0    8750.0 ( 0.51) Copy 100x100 from window to window
    39.4      20.2 ( 0.51) GetImage XY 100x100 square
 18500.0    9470.0 ( 0.51) Scroll 100x100 pixels
 27100.0   14000.0 ( 0.52) 100x100 tiled rectangle (216x208 tile)
 25400.0   13400.0 ( 0.53) 100x100 tiled rectangle (161x145 tile)
    32.1      17.4 ( 0.54) GetImage 500x500 square
   130.0      71.7 ( 0.55) Copy 500x500 1-bit deep plane
   793.0     435.0 ( 0.55) Fill 300x300 tiled trapezoid (216x208 tile)
  5360.0    2930.0 ( 0.55) ShmPutImage XY 10x10 square
  1600.0     893.0 ( 0.56) 500-pixel solid circle
  2720.0    1610.0 ( 0.59) Copy 100x100 1-bit deep plane
 93100.0   57000.0 ( 0.61) 100-pixel line
 19600.0   11900.0 ( 0.61) 500-pixel line
 16300.0   10300.0 ( 0.63) 500-pixel line segment
  8970.0    5780.0 ( 0.64) 500-pixel circle
   108.0      68.6 ( 0.64) 500x500 stippled rectangle (161x145 stipple)
  9000.0    5800.0 ( 0.64) Char in 30-char aa core line (Charter 24)
   242.0     154.0 ( 0.64) Fill 300x300 opaque stippled trapezoid (161x145 stipple)
        77200.05.0 ( 0.65) 100-pixel line segment
   107.0      69.1 ( 0.65) 500x500 opaque stippled rectangle (161x145 stipple)
755000.0  501000.0 ( 0.66) 10-pixel line
   418.0     277.0 ( 0.66) 500x500 tiled rectangle (17x15 tile)
 74400.0   49500.0 ( 0.67) 100-pixel line segment (1 kid)
  2450.0    1630.0 ( 0.67) 100x100 opaque stippled rectangle (161x145 stipple)
  5180.0    3480.0 ( 0.67) 500x500 rectangle outline
  1680.0    1120.0 ( 0.67) GetImage XY 10x10 square
  2230.0    1520.0 ( 0.68) 100x100 stippled rectangle (161x145 stipple)
 13700.0    9340.0 ( 0.68) 100x100 tiled rectangle (17x15 tile)
 10800.0    7330.0 ( 0.68) 500-pixel ellipse
  7230.0    4920.0 ( 0.68) 500x50 wide vertical line segment
 71400.0   49100.0 ( 0.69) 100-pixel line segment (2 kids)
  2360.0    1620.0 ( 0.69) 500-pixel filled ellipse
  2650.0    1830.0 ( 0.69) Fill 300x300 trapezoid
 61300.0   42800.0 ( 0.70) 100-pixel double-dashed line
  1490.0    1050.0 ( 0.70) Fill 100x100 opaque stippled trapezoid (161x145 stipple)
   831.0     578.0 ( 0.70) Fill 300x300 tiled trapezoid (161x145 tile)
 57300.0   40800.0 ( 0.71) 100-pixel double-dashed segment
 69000.0   48900.0 ( 0.71) 100-pixel line segment (3 kids)
 41400.0   29700.0 ( 0.72)
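Counting how many operations win at each depth treats every row equally, which is why the mix of operations Sugar actually issues matters. The Python sketch below shows one way to fold that in: parse x11perfcomp rows and take a weighted geometric mean of the 24bpp/16bpp ratios. The weights are hypothetical placeholders chosen for illustration, not measurements of Sugar, and the three sample rows are copied from the table above.

```python
import math
import re

# One x11perfcomp row: rate1, rate2, "( ratio)", operation name.
ROW = re.compile(r"^\s*([\d.]+)\s+([\d.]+)\s+\(\s*([\d.]+)\)\s+(.+?)\s*$")


def parse_rows(text):
    """Return a list of (operation, rate_16bpp, rate_24bpp) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rate16, rate24, _ratio, operation = match.groups()
            rows.append((operation, float(rate16), float(rate24)))
    return rows


def weighted_geomean_ratio(rows, weights):
    """Weighted geometric mean of rate_24bpp / rate_16bpp.

    Operations missing from `weights` are ignored (weight 0).
    A result below 1.0 means 24bpp is slower for that workload mix.
    """
    total = sum(weights.get(op, 0.0) for op, _, _ in rows)
    if total <= 0:
        raise ValueError("no weighted operations found")
    log_sum = sum(weights.get(op, 0.0) * math.log(r24 / r16)
                  for op, r16, r24 in rows)
    return math.exp(log_sum / total)


sample = """\
    53.7      18.2 ( 0.34) PutImage XY 100x100 square
   799.0     360.0 ( 0.45) Copy 500x500 from pixmap to pixmap
  1840.0     886.0 ( 0.48) 500x500 rectangle
"""

rows = parse_rows(sample)

# Hypothetical workload mix: three pixmap copies for every rectangle fill.
weights = {
    "Copy 500x500 from pixmap to pixmap": 3.0,
    "500x500 rectangle": 1.0,
}

print("weighted 24bpp/16bpp ratio: %.3f" % weighted_geomean_ratio(rows, weights))
```

A geometric mean is used because the inputs are ratios; with real weights taken from a profile of Sugar, the same function would give a single figure for the 16bpp versus 24bpp decision.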
EXA performance
Aleph has recently been doing some performance measurements on the latest
Xorg codebase, mostly concentrating on bench.py and other micro-benchmarks.

Performance of 16bpp VS 24bpp has been a hot topic recently.
We know 24bpp to be much faster for some operations (my
bench.py notably) and much slower for others (image puts),
but it's not clear which one is the overall winner
for our typical workload.

Jim would like to see some numbers in order to make a
decision. Aleph, could you publish the results of x11perf
at least? The Cairo performance suite would be even more
interesting to see.

Chris also wanted to see timing of real-world Sugar rendering,
like switching between the 4 zoom levels. That may require some
(perhaps trivial) Sugar surgery, so I'd leave it to one of the
Python developers.

Oprofile and/or sysprof output while executing all the above
benchmarks would also provide invaluable information to find out
*why* things are slow. Aleph already has an environment in place
with all the components rebuilt with debug symbols, so we can see
the actual function calls in the output.

Lastly, this thread may be generally interesting to whoever is
seeking to improve EXA performance on the XO:

  http://thread.gmane.org/gmane.comp.freedesktop.xorg/20111/focus=20116

-- 
   // Bernardo Innocenti
 \X/  http://www.codewiz.org/