Re: EXA performance

2007-09-05 Thread Stefano Fedrigo
Bernardo Innocenti wrote:
> Aleph, could you post an oprofile of Sugar switching between zoom levels
> in both 16bpp and 24bpp?  Doing it manually by pressing F1 to F4 would be
> good enough for me: I just want to get an idea of where we spend the time.
> Of course, X, amd_drv, pixman and cairo need to be built with -g.

Here are the oprofile results collected while switching between the 4 Sugar
zoom levels for sixty seconds, with the journal displaying past activities at
the 4th level.  Unfortunately the oprofile callgraph feature doesn't work on
the Geode, but I hope these data are of some use anyway.


CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
  TIMER:0|
  samples|  %|
--
 2936 47.6082 python
	  TIMER:0|
	  samples|  %|
	--
	 1543 52.5545 libpython2.5.so.1.0
	  594 20.2316 libcairo.so.2.11.5
	  276  9.4005 libc-2.6.so
	  116  3.9510 libgobject-2.0.so.0.1200.13
	   79  2.6907 libglib-2.0.so.0.1200.13
	   68  2.3161 libpangoft2-1.0.so.0.1600.4
	   62  2.1117 libpthread-2.6.so
	   41  1.3965 libpango-1.0.so.0.1600.4
	   31  1.0559 libgtk-x11-2.0.so.0.1000.14
	   24  0.8174 libgdk-x11-2.0.so.0.1000.14
	   15  0.5109 libfreetype.so.6.3.15
	   13  0.4428 libhippocanvas-1.so.0.0.0
	   12  0.4087 libX11.so.6.2.0
	   10  0.3406 _gobject.so
	    9  0.3065 librsvg-2.so.2.16.1
	    8  0.2725 libgthread-2.0.so.0.1200.13
	    6  0.2044 libpangocairo-1.0.so.0.1600.4
	    5  0.1703 anon (tgid:2962 range:0xb7fc7000-0xb7fc8000)
	    5  0.1703 libm-2.6.so
	    5  0.1703 libXrender.so.1.3.0
	    5  0.1703 hippo.so
	    3  0.1022 libdbus-1.so.3.2.0
	    3  0.1022 _cairo.so
	    1  0.0341 libwnck-1.so.18.2.10
	    1  0.0341 libxml2.so.2.6.29
	    1  0.0341 _gtk.so
 2645 42.8896 no-vmlinux
  454  7.3618 Xorg
	  TIMER:0|
	  samples|  %|
	--
	  202 44.4934 libpixman-1.so.0.9.5
	   84 18.5022 amd_drv.so
	   82 18.0617 Xorg
	   42  9.2511 libexa.so
	   29  6.3877 libc-2.6.so
	   12  2.6432 libfb.so
	    2  0.4405 libextmod.so
	    1  0.2203 anon (tgid:2942 range:0xb7f83000-0xb7f84000)
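
The percentages in the nested summary above are relative to each process, not to the whole run. As a rough aid for reading it, here is a minimal Python sketch that rescales a few of the per-library counts to the global total. The counts are copied from the report; the grand total is not printed by opreport here, so the sketch recovers it from the python row (an assumption that relies on the reported percentage being exact enough).

```python
# Sample counts copied from the opreport summary in this message.
TOP_LEVEL = {"python": 2936, "no-vmlinux": 2645, "Xorg": 454}
PYTHON_PERCENT = 47.6082  # share of all samples reported for the python process

# A few per-library counts, keyed by (process, library).
PER_LIBRARY = {
    ("python", "libpython2.5.so.1.0"): 1543,
    ("python", "libcairo.so.2.11.5"): 594,
    ("Xorg", "libpixman-1.so.0.9.5"): 202,
    ("Xorg", "amd_drv.so"): 84,
    ("Xorg", "libexa.so"): 42,
}


def total_samples():
    """Recover the grand total from one count and its reported percentage."""
    return round(TOP_LEVEL["python"] * 100.0 / PYTHON_PERCENT)


def global_share(count):
    """Percentage of all samples in the run represented by `count`."""
    return 100.0 * count / total_samples()


if __name__ == "__main__":
    print("total samples:", total_samples())
    ranked = sorted(PER_LIBRARY.items(), key=lambda item: -item[1])
    for (app, lib), count in ranked:
        print("%-6s %-22s %5d  %6.2f%% of all samples"
              % (app, lib, count, global_share(count)))
```

Read this way, pixman inside the X server accounts for only a few percent of the whole run, while the Python interpreter itself and the kernel (no-vmlinux) dominate.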

...

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name                    app name    symbol name
2645     42.8896  no-vmlinux                    no-vmlinux  (no symbols)
1543     25.0203  libpython2.5.so.1.0           python      (no symbols)
116       1.8810  libgobject-2.0.so.0.1200.13   python      (no symbols)
114       1.8485  libpixman-1.so.0.9.5          Xorg        pixman_rasterize_edges
79        1.2810  libglib-2.0.so.0.1200.13      python      (no symbols)
68        1.1026  libpangoft2-1.0.so.0.1600.4   python      (no symbols)
56        0.9081  amd_drv.so                    Xorg        gp_color_bitmap_to_screen_blt
53        0.8594  libpixman-1.so.0.9.5          Xorg        pixman_fill
47        0.7621  libcairo.so.2.11.5            python      fbRasterizeEdges
46        0.7459  libc-2.6.so                   python      _int_malloc
46        0.7459  libc-2.6.so                   python      memcpy
41        0.6648  libpango-1.0.so.0.1600.4      python      (no symbols)
39        0.6324  libcairo.so.2.11.5            python      _PointDistanceSquaredToSegment
35        0.5675  libc-2.6.so                   python      __ctype_b_loc
34        0.5513  libc-2.6.so                   bash        __gconv_transform_utf8_internal
31        0.5027  libgtk-x11-2.0.so.0.1000.14   python      (no symbols)
26        0.4216  bash                          bash        (no symbols)
25        0.4054  libcairo.so.2.11.5            python      _cairo_bentley_ottmann_tessellate_polygon
24        0.3892  libgdk-x11-2.0.so.0.1000.14   python      (no symbols)
23        0.3730  libc-2.6.so                   bash        mbrtowc
23        0.3730  libcairo.so.2.11.5            python      _cairo_bo_event_queue_insert_if_intersect_below_current_y
20        0.3243  libc-2.6.so                   python      _int_free
20        0.3243  libc-2.6.so                   python      msort_with_tmp
20        0.3243  libpthread-2.6.so             python      pthread_mutex_lock
19        0.3081  libcairo.so.2.11.5            python      __divdi3
18        0.2919  libc-2.6.so                   python      free
18        0.2919  libcairo.so.2.11.5            python      _cairo_pixman_composite_solid_mask_nx8xmmx
18        0.2919  libcairo.so.2.11.5            python      _cairo_pixman_render_sample_floor_y
18        0.2919  libcairo.so.2.11.5            python      _cairo_uint64x64_128_mul
17

Re: EXA performance

2007-08-31 Thread Bernardo Innocenti
Stefano Fedrigo wrote:

> Here are results of two runs of the cairo's performance test suite, at 16 and
> 24 bpp.  Test hardware is a B4, xserver-1.4-branch, cairo 1.4.10.  This time
> 24 bpp is overall better.

That's invaluable data, thank you!

> Only cairo_paint() is slower at 24, but not cairo_paint_with_alpha(), which is
> actually a lot faster.

Someone told me that some paths in pixman actually convert 16bpp scanlines
to a temporary 24bpp buffer, then do their work, then convert back to 16bpp.
If true, this would explain the observed slowness at 16bpp.
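
To make the suspected round trip concrete, here is a small Python sketch of what "widen, work, narrow" means for a 16bpp (RGB565) scanline. This is only an illustration of the idea: the function names are made up for the example, and it is not taken from pixman's source.

```python
def rgb565_to_rgb888(pixel):
    """Expand one 16-bit RGB565 pixel to 8-bit channels (bit replication)."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    return ((r5 << 3) | (r5 >> 2),
            (g6 << 2) | (g6 >> 4),
            (b5 << 3) | (b5 >> 2))


def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit channels back into one 16-bit RGB565 pixel."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def blend_scanline_565(dst, src_rgb, alpha):
    """Blend a solid colour over a 565 scanline via a temporary wide buffer.

    dst     -- list of 16-bit pixels
    src_rgb -- (r, g, b) source colour, 8 bits per channel
    alpha   -- source opacity, 0..255
    """
    # 1. Widen: convert the whole scanline to 8 bits per channel.
    wide = [rgb565_to_rgb888(p) for p in dst]
    # 2. Work: do the actual blend at full precision.
    blended = [
        tuple((s * alpha + d * (255 - alpha)) // 255
              for s, d in zip(src_rgb, pixel))
        for pixel in wide
    ]
    # 3. Narrow: convert the result back to 16 bits per pixel.
    return [rgb888_to_rgb565(*p) for p in blended]
```

Steps 1 and 3 touch every pixel even when the blend itself is trivial, which is the extra cost a 24bpp destination would not pay.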

Aleph, could you post an oprofile of Sugar switching between zoom levels
in both 16bpp and 24bpp?  Doing it manually by pressing F1 to F4 would be
good enough for me: I just want to get an idea of where we spend the time.
Of course, X, amd_drv, pixman and cairo need to be built with -g.

-- 
   // Bernardo Innocenti - One Laptop Per Child
 \X/  http://www.codewiz.org/
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: EXA performance

2007-08-31 Thread Stefano Fedrigo
Jim Gettys wrote:
> I'm worried about the fact that Cairo likes to do things 32 bits
> and they then have to be converted and composited at 16 bits;

Here are the results of two runs of cairo's performance test suite, at 16 and
24 bpp.  Test hardware is a B4, xserver-1.4-branch, cairo 1.4.10.  This time
24 bpp is better overall.
Only cairo_paint() is slower at 24 bpp; cairo_paint_with_alpha() is
actually a lot faster.

I oprofiled the whole runs, but haven't found significant differences between
16 and 24 bpp.  Profiling the individual tests would probably be more useful.
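
For anyone who wants to slice the cairo-perf-diff text output by operation type rather than read it top to bottom, here is a minimal Python sketch. It assumes each result sits on a single line (wrapped lines must be re-joined first) and that regressions are labelled "slowdown"; the regular expression is inferred from the lines in this message, not from cairo's documentation.

```python
import re

# One result line: backend, test name, old value +/- %, new value +/- %, factor.
LINE_RE = re.compile(
    r"^\s*(?P<backend>\S+)\s+(?P<test>\S+)\s+"
    r"(?P<old>[\d.]+)\s+[\d.]+%\s+->\s+(?P<new>[\d.]+)\s+[\d.]+%:\s+"
    r"(?P<factor>[\d.]+)x\s+(?P<kind>speedup|slowdown)\s*$"
)

# Three lines copied from the report in this message.
SAMPLE = """\
 xlib-rgb  paint-with-alpha_similar_rgb_over-512  192.24 0.06% ->  47.68 0.16%:  4.04x speedup
 xlib-rgb  fill_similar_rgb_over-256   16.43 0.52% ->   9.66 0.95%:  1.71x speedup
 xlib-rgb  stroke_similar_rgb_over-256   37.00 0.26% ->  25.80 0.40%:  1.44x speedup
"""


def parse(text):
    """Return one dict per recognised result line; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        test = match.group("test")
        rows.append({
            "test": test,
            # The operation is the part of the test name before the first "_".
            "operation": test.split("_", 1)[0],
            "old": float(match.group("old")),
            "new": float(match.group("new")),
            "factor": float(match.group("factor")),
            "kind": match.group("kind"),
        })
    return rows


def best_per_operation(rows):
    """Pick the largest reported speedup for each operation."""
    best = {}
    for row in rows:
        if row["kind"] != "speedup":
            continue
        op = row["operation"]
        if op not in best or row["factor"] > best[op]["factor"]:
            best[op] = row
    return best


if __name__ == "__main__":
    for op, row in sorted(best_per_operation(parse(SAMPLE)).items()):
        print("%-18s %.2fx  (%s)" % (op, row["factor"], row["test"]))
```

The sketch uses the factor printed by the tool instead of dividing the two printed values, since the two do not always agree to the last digit.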

Output of cairo-perf-diff (nicer html format attached):

old: cairo-perf-16
new: cairo-perf-24
Speedups

 xlib-rgb  paint-with-alpha_similar_rgb_over-512  192.24 0.06% ->  47.68 0.16%:  4.04x speedup
███
 xlib-rgb  paint-with-alpha_similar_rgb_over-256   48.51 0.14% ->  12.39 0.56%:  3.95x speedup
███
 xlib-rgb  paint-with-alpha_image_rgb_over-256   63.25 0.26% ->  29.27 0.51%:  2.17x speedup
█▏
 xlib-rgb  paint-with-alpha_image_rgb_over-512  250.25 1.01% -> 122.63 2.40%:  2.09x speedup
█▏
 xlib-rgb  paint-with-alpha_similar_rgba_over-512  180.26 0.08% ->  86.75 0.10%:  2.08x speedup
█▏
 xlib-rgb  paint-with-alpha_similar_rgba_over-256   45.66 0.13% ->  22.17 0.34%:  2.07x speedup
█▏
 xlib-rgb  fill_similar_rgb_over-256   16.43 0.52% ->   9.66 0.95%:  1.71x speedup
▊
 xlib-rgb  paint-with-alpha_image_rgba_over-256   62.87 0.34% ->  39.25 0.48%:  1.61x speedup
▋
 xlib-rgb  paint-with-alpha_image_rgba_over-512  248.95 0.32% -> 159.74 0.45%:  1.56x speedup
▋
 xlib-rgb  stroke_similar_rgb_over-256   37.00 0.26% ->  25.80 0.40%:  1.44x speedup
▌
 xlib-rgb  fill_similar_rgb_over-128    5.71 1.08% ->   4.05 1.88%:  1.42x speedup
▍
 xlib-rgb  fill_similar_rgba_over-256   17.66 0.52% ->  12.71 0.61%:  1.40x speedup
▍
 xlib-rgb  paint-with-alpha_linear_rgb_over-256   88.94 0.21% ->  65.39 0.27%:  1.36x speedup
▍
 xlib-rgb  paint-with-alpha_linear_rgba_over-256   88.53 0.24% ->  64.95 0.26%:  1.36x speedup
▍
 xlib-rgb  paint-with-alpha_linear_rgb_over-512  354.73 0.19% -> 265.89 0.47%:  1.34x speedup
▍
 xlib-rgb  paint-with-alpha_linear_rgba_over-512  352.53 0.24% -> 264.01 0.46%:  1.34x speedup
▍
 xlib-rgb  fill_image_rgb_over-256   24.06 0.52% ->  18.41 0.62%:  1.31x speedup
▍
 xlib-rgb  stroke_similar_rgba_over-256   37.82 0.26% ->  29.30 0.33%:  1.30x speedup
▎
 xlib-rgb  fill_similar_rgba_over-128    6.06 0.50% ->   4.82 1.58%:  1.29x speedup
▎
 xlib-rgb  fill_image_rgba_over-256   26.38 0.46% ->  21.45 0.44%:  1.23x speedup
▎
 xlib-rgb  stroke_similar_rgb_over-128   15.97 0.47% ->  13.26 0.59%:  1.21x speedup
▎
 xlib-rgb  stroke_image_rgb_over-256   49.07 0.24% ->  41.01 0.32%:  1.20x speedup
▎
 xlib-rgb  fill_linear_rgb_over-256   35.22 0.31% ->  29.54 0.37%:  1.19x speedup
▎
 xlib-rgb  paint-with-alpha_radial_rgb_over-256  156.75 0.35% -> 133.54 0.14%:  1.17x speedup
▏
 xlib-rgb  paint-with-alpha_radial_rgb_over-512  627.70 0.15% -> 539.14 0.23%:  1.17x speedup
▏
 xlib-rgb  fill_similar_rgb_over-64    3.26 2.03% ->   2.80 2.55%:  1.16x speedup
▏
 xlib-rgb  paint-with-alpha_radial_rgba_over-256  152.81 0.36% -> 132.55 0.78%:  1.16x speedup
▏
 xlib-rgb  stroke_image_rgba_over-256   51.60 0.22% ->  44.51 0.35%:  1.16x speedup
▏
 xlib-rgb  stroke_similar_rgba_over-128   16.21 0.49% ->  14.10 0.46%:  1.15x speedup
▏
 xlib-rgb  fill_image_rgb_over-128    8.01 0.79% ->   6.92 0.62%:  1.15x speedup
▏
 xlib-rgb  stroke_linear_rgb_over-256   67.40 0.22% ->  58.75 0.28%:  1.15x speedup
▏
 xlib-rgb  paint-with-alpha_radial_rgba_over-512  611.50 0.18% -> 536.93 0.32%:  1.14x speedup
▏
 xlib-rgb  fill_similar_rgb_source-256   31.79 0.26% ->  27.98 0.22%:  1.14x speedup
▏
 xlib-rgb  paint-with-alpha_similar_rgb_source-512  306.70 0.07% -> 270.02 0.08%:  1.14x speedup
▏
 xlib-rgb  paint-with-alpha_similar_rgb_source-256   77.35 0.12% ->  68.40 0.19%:  1.13x speedup
▏
 xlib-rgb  fill_linear_rgb_over-128   10.86 0.73% ->   9.69 0.84%:  1.12x speedup
▏
 xlib-rgb  stroke_similar_rgb_source-256   62.25 0.14% ->  55.66 0.20%:  1.12x speedup
▏
 xlib-rgb  fill_image_rgba_over-128    8.54 0.66% ->   7.66 0.62%:  1.12x speedup
▏
 xlib-rgb  fill_linear_rgba_over-256   40.03 0.27% ->  35.93 0.31%:  1.12x speedup
▏
 xlib-rgb  stroke_image_rgb_over-128   19.33 0.44% ->  17.38 0.53%:  1.11x speedup
▏
 xlib-rgb  fill_similar_rgba_over-64    3.32 2.15% ->   2.99 2.47%:  1.11x speedup
▏
 xlib-rgb  paint-with-alpha_solid_rgba_source-512  157.75 0.14% -> 142.45 0.13%:  1.11x speedup
▏
 xlib-rgb  paint-with-alpha_solid_rgb_source-512  157.67 0.13% -> 142.39 0.13%:  1.11x speedup
▏
 xlib-rgb  paint-with-alpha_solid_rgba_source-256   39.69 0.24% ->  35.98 0.29%:  1.10x speedup
▏
 xlib-rgb  fill_similar_rgba_source-256   30.51 0.28% ->  27.70 0.25%:  1.10x speedup
▏
 xlib-rgb fill_similar_rgb_source-1289

Re: EXA performance

2007-08-30 Thread Jim Gettys
It's not clear to me that micro-benchmarks are going to tell us much at
all: I'm worried about the fact that Cairo likes to do things in 32 bits,
and the results then have to be converted and composited at 16 bits; the
microbenchmarks won't tell us much about how common these conversion
costs are.

I think it is time for a tinderbox run with oprofile.
- Jim


On Thu, 2007-08-30 at 11:03 -0600, Jordan Crouse wrote:
> On 30/08/07 12:48 +0200, Stefano Fedrigo wrote:
> > Bernardo Innocenti wrote:
> > > Performance of 16bpp VS 24bpp has been a hot topic recently.
> > > We know 24bpp to be much faster for some operations (my
> > > bench.py notably) and much slower for others (image puts),
> > > but it's not clear which one is the overall winner
> > > for our typical workload.
> > > 
> > > Jim would like to see some numbers in order to make a
> > > decision.  Aleph, could you publish the results of x11perf
> > > at least?  The Cairo performance suite would be even more
> > > interesting to see.
> > 
> > Made two runs with x11perf, at 16 and 24 bpp.  Xserver from
> > server-1.4-branch git tree, on a B4 laptop.
> > x11perfcomp output attached, sorted by relative performance.
> > 
> > Some operations are faster at 24bpp, other ones better at 16bpp.
> > Looking at these data 16bpp seems a better choice: a greater number of
> > ops fare better than at 24bpp, but one has to consider what are the
> > operations Sugar does more.
> > I'm going to do some Cairo testing...
> 
> Looking at these numbers, I get the impression that we might be CPU bound
> in more places than we thought - we're moving twice the bytes in 24bpp mode,
> and the lowest performing tests (which all move a bunch of bytes),
> seem to generally reflect half the performance.  Solid fills, which require
> no byte moving at all, are coming in at 1:1 and that's expected, since the
> hardware will move just as fast for 16bpp and 24bpp, and the alpha fills are
> coming at twice the performance because we don't have the additional
> conversion hit in the driver composite code.
>
> If that's true, then yes, 16bpp will be better, because Sugar is all about
> pixmaps (few solid fills).  But massive profiling is needed to verify all
> this.
> 
> Jordan
> 

Re: EXA performance

2007-08-30 Thread Jordan Crouse
On 30/08/07 12:48 +0200, Stefano Fedrigo wrote:
> Bernardo Innocenti wrote:
> > Performance of 16bpp VS 24bpp has been a hot topic recently.
> > We know 24bpp to be much faster for some operations (my
> > bench.py notably) and much slower for others (image puts),
> > but it's not clear which one is the overall winner
> > for our typical workload.
> > 
> > Jim would like to see some numbers in order to make a
> > decision.  Aleph, could you publish the results of x11perf
> > at least?  The Cairo performance suite would be even more
> > interesting to see.
> 
> Made two runs with x11perf, at 16 and 24 bpp.  Xserver from
> server-1.4-branch git tree, on a B4 laptop.
> x11perfcomp output attached, sorted by relative performance.
> 
> Some operations are faster at 24bpp, other ones better at 16bpp.
> Looking at these data 16bpp seems a better choice: a greater number of
> ops fare better than at 24bpp, but one has to consider what are the
> operations Sugar does more.
> I'm going to do some Cairo testing...

Looking at these numbers, I get the impression that we might be CPU bound
in more places than we thought - we're moving twice the bytes in 24bpp mode,
and the lowest performing tests (which all move a bunch of bytes),
seem to generally reflect half the performance.  Solid fills, which require
no byte moving at all, are coming in at 1:1 and that's expected, since the
hardware will move just as fast for 16bpp and 24bpp, and the alpha fills are
coming at twice the performance because we don't have the additional
conversion hit in the driver composite code.

If that's true, then yes, 16bpp will be better, because Sugar is all about
pixmaps (few solid fills).  But massive profiling is needed to verify all
this.
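
As a quick sanity check of the "twice the bytes, half the rate" reading, here is a small Python sketch that turns a few x11perfcomp rates from this thread into destination bytes per second. It assumes the rates are operations per second, that 16bpp pixels take 2 bytes, and that 24bpp pixels are stored in 4 bytes; none of those is stated in the table itself.

```python
# (square side in pixels, rate at 16bpp, rate at 24bpp), copied from the
# x11perfcomp table posted in this thread.
RATES = {
    "Copy 500x500 from pixmap to pixmap": (500, 799.0, 360.0),
    "Copy 100x100 from pixmap to pixmap": (100, 17600.0, 8800.0),
    "Scroll 500x500 pixels": (500, 492.0, 202.0),
}

# ASSUMPTION: storage size per pixel for each colour depth.
BYTES_PER_PIXEL = {16: 2, 24: 4}


def throughput_mb_s(side, rate, depth):
    """Destination megabytes written per second for a side x side operation."""
    return side * side * BYTES_PER_PIXEL[depth] * rate / 1e6


if __name__ == "__main__":
    for name, (side, rate16, rate24) in RATES.items():
        print("%-36s 16bpp: %6.1f MB/s   24bpp: %6.1f MB/s"
              % (name,
                 throughput_mb_s(side, rate16, 16),
                 throughput_mb_s(side, rate24, 24)))
```

If the two columns come out close to each other, the operation is moving bytes at about the same speed in both modes, which fits the idea that byte moving rather than the depth itself is the limit.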

Jordan


Re: EXA performance

2007-08-30 Thread Arjun Sarwal
>
> I'm going to do some Cairo testing...
>

I look forward to that. Maybe some hacks would also help me improve the
response time of the Measure activity, which uses Cairo for all its drawing?

http://pastebin.com/m4db7bc4e
is where all drawing is done.

regards,
Arjun


-- 
Arjun Sarwal ( [EMAIL PROTECTED] )


Re: EXA performance

2007-08-30 Thread Stefano Fedrigo
Bernardo Innocenti wrote:
> Performance of 16bpp VS 24bpp has been a hot topic recently.
> We know 24bpp to be much faster for some operations (my
> bench.py notably) and much slower for others (image puts),
> but it's not clear which one is the overall winner
> for our typical workload.
> 
> Jim would like to see some numbers in order to make a
> decision.  Aleph, could you publish the results of x11perf
> at least?  The Cairo performance suite would be even more
> interesting to see.

Made two runs with x11perf, at 16 and 24 bpp.  Xserver from
server-1.4-branch git tree, on a B4 laptop.
x11perfcomp output attached, sorted by relative performance.

Some operations are faster at 24bpp, others at 16bpp.
Judging by these data, 16bpp seems the better choice, since more
operations fare better there than at 24bpp, but one has to consider which
operations Sugar performs most often.
I'm going to do some Cairo testing...

1: x11perf-16.log
2: x11perf-24.log
        1         2   (ratio)   Operation
---------  --------  --------   ---------
     53.7      18.2 (  0.34)   PutImage XY 100x100 square
      5.0       1.7 (  0.34)   ShmPutImage XY 500x500 square
      2.3       0.8 (  0.35)   PutImage XY 500x500 square
    133.0      49.8 (  0.37)   ShmPutImage XY 100x100 square
   2470.0    1000.0 (  0.40)   PutImage 100x100 square
    492.0     202.0 (  0.41)   Scroll 500x500 pixels
   3200.0    1420.0 (  0.44)   PutImage XY 10x10 square
    799.0     360.0 (  0.45)   Copy 500x500 from pixmap to pixmap
     78.0      35.8 (  0.46)   PutImage 500x500 square
   1840.0     886.0 (  0.48)   500x500 rectangle
    805.0     390.0 (  0.48)   500x500 tiled rectangle (161x145 tile)
    858.0     414.0 (  0.48)   500x500 tiled rectangle (216x208 tile)
    773.0     369.0 (  0.48)   Copy 500x500 from pixmap to window
    774.0     369.0 (  0.48)   Copy 500x500 from window to pixmap
    827.0     400.0 (  0.48)   GetImage 100x100 square
    193.0      91.8 (  0.48)   ShmPutImage 500x500 square
   5120.0    2490.0 (  0.49)   ShmPutImage 100x100 square
  17600.0    8800.0 (  0.50)   Copy 100x100 from pixmap to pixmap
    746.0     372.0 (  0.50)   Copy 500x500 from window to window
      1.6       0.8 (  0.50)   GetImage XY 500x500 square
  58300.0   29700.0 (  0.51)   100x100 rectangle
   4080.0    2070.0 (  0.51)   500x500 wide rectangle outline
  17200.0    8760.0 (  0.51)   Copy 100x100 from pixmap to window
  17100.0    8760.0 (  0.51)   Copy 100x100 from window to pixmap
  17300.0    8750.0 (  0.51)   Copy 100x100 from window to window
     39.4      20.2 (  0.51)   GetImage XY 100x100 square
  18500.0    9470.0 (  0.51)   Scroll 100x100 pixels
  27100.0   14000.0 (  0.52)   100x100 tiled rectangle (216x208 tile)
  25400.0   13400.0 (  0.53)   100x100 tiled rectangle (161x145 tile)
     32.1      17.4 (  0.54)   GetImage 500x500 square
    130.0      71.7 (  0.55)   Copy 500x500 1-bit deep plane
    793.0     435.0 (  0.55)   Fill 300x300 tiled trapezoid (216x208 tile)
   5360.0    2930.0 (  0.55)   ShmPutImage XY 10x10 square
   1600.0     893.0 (  0.56)   500-pixel solid circle
   2720.0    1610.0 (  0.59)   Copy 100x100 1-bit deep plane
  93100.0   57000.0 (  0.61)   100-pixel line
  19600.0   11900.0 (  0.61)   500-pixel line
  16300.0   10300.0 (  0.63)   500-pixel line segment
   8970.0    5780.0 (  0.64)   500-pixel circle
    108.0      68.6 (  0.64)   500x500 stippled rectangle (161x145 stipple)
   9000.0    5800.0 (  0.64)   Char in 30-char aa core line (Charter 24)
    242.0     154.0 (  0.64)   Fill 300x300 opaque stippled trapezoid (161x145 stipple)
 77200.05.0         (  0.65)   100-pixel line segment
    107.0      69.1 (  0.65)   500x500 opaque stippled rectangle (161x145 stipple)
 755000.0  501000.0 (  0.66)   10-pixel line
    418.0     277.0 (  0.66)   500x500 tiled rectangle (17x15 tile)
  74400.0   49500.0 (  0.67)   100-pixel line segment (1 kid)
   2450.0    1630.0 (  0.67)   100x100 opaque stippled rectangle (161x145 stipple)
   5180.0    3480.0 (  0.67)   500x500 rectangle outline
   1680.0    1120.0 (  0.67)   GetImage XY 10x10 square
   2230.0    1520.0 (  0.68)   100x100 stippled rectangle (161x145 stipple)
  13700.0    9340.0 (  0.68)   100x100 tiled rectangle (17x15 tile)
  10800.0    7330.0 (  0.68)   500-pixel ellipse
   7230.0    4920.0 (  0.68)   500x50 wide vertical line segment
  71400.0   49100.0 (  0.69)   100-pixel line segment (2 kids)
   2360.0    1620.0 (  0.69)   500-pixel filled ellipse
   2650.0    1830.0 (  0.69)   Fill 300x300 trapezoid
  61300.0   42800.0 (  0.70)   100-pixel double-dashed line
   1490.0    1050.0 (  0.70)   Fill 100x100 opaque stippled trapezoid (161x145 stipple)
    831.0     578.0 (  0.70)   Fill 300x300 tiled trapezoid (161x145 tile)
  57300.0   40800.0 (  0.71)   100-pixel double-dashed segment
  69000.0   48900.0 (  0.71)   100-pixel line segment (3 kids)
  41400.0   29700.0 (  0.72)

EXA performance

2007-08-28 Thread Bernardo Innocenti
Aleph has recently been doing some performance measurements
on the latest Xorg codebase, mostly concentrating on
bench.py and other micro benchmarks.

Performance of 16bpp VS 24bpp has been a hot topic recently.
We know 24bpp to be much faster for some operations (my
bench.py notably) and much slower for others (image puts),
but it's not clear which one is the overall winner
for our typical workload.

Jim would like to see some numbers in order to make a
decision.  Aleph, could you publish the results of x11perf
at least?  The Cairo performance suite would be even more
interesting to see.

Chris also wanted to see timing of a real-world Sugar
rendering... like switching between the 4 zoom levels.
That may require some (perhaps trivial) Sugar surgery, so
I'd leave it to one of the Python developers.
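
As a starting point for that kind of measurement, here is a minimal Python timing sketch. The `fake_zoom_switch` function is a hypothetical stand-in: the real hook that changes the zoom level and waits for the redraw to finish would have to come from Sugar itself and is not shown here.

```python
import time


def time_calls(func, repeat=10):
    """Call func() `repeat` times; return per-call wall-clock times in seconds."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of timings to min / median / max."""
    ordered = sorted(durations)
    return {
        "min": ordered[0],
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
    }


def fake_zoom_switch():
    # HYPOTHETICAL placeholder for "switch zoom level and wait for redraw".
    time.sleep(0.001)


if __name__ == "__main__":
    stats = summarize(time_calls(fake_zoom_switch, repeat=20))
    print("min %.4fs  median %.4fs  max %.4fs"
          % (stats["min"], stats["median"], stats["max"]))
```

Reporting the minimum and median rather than a single run helps separate rendering cost from one-off noise such as cache warm-up.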

Oprofile and/or sysprof output while executing all the above
benchmarks would also provide invaluable information to find
out *why* things are slow.  Aleph already has an
environment in place with all the components rebuilt
with debug symbols, so we can see the actual function calls
in the output.

Lastly, this thread may be generally interesting to
whoever is seeking to improve EXA performance on the XO:

 http://thread.gmane.org/gmane.comp.freedesktop.xorg/20111/focus=20116


-- 
  // Bernardo Innocenti
\X/  http://www.codewiz.org/
