William Cohen wrote:

Looked at where the processor spends its time when browsing the web.

Hardware configuration:

 OLPC Beta 2 machine
 Linksys USB200M USB 10/100 for ethernet connection
 4GB memorex Mini Travel Drive for storage of image


Software configuration:

 /tmp/olpc-redhat-stream-development-build-299-20070308_1417-devel_ext3.img
 kernel-2.6.21-20070309.olpc1p.dc5079fafb767e4
 oprofile-0.9.2-3.fc6



Reran the experiment on build 301 and installed xorg-x11-server-debuginfo-1.1.99.3-0.10.2.olpc1.i386.rpm on the OLPC machine, so I could take a look at where time is being spent in libfb.so.

I don't know what version of gcc and which options were used to compile the packages. If somebody points me to where I can look this up, I could be more sure. It looks to me as though the packages were compiled without tuning gcc for Geode. The div and mod instructions are expensive on Geode. Whether gcc emits a div or a sequence of shifts is chosen in gcc's expmed.c, and that choice is directed by the costs selected by -mtune or -march.
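To illustrate the point about div versus shifts, here is a minimal sketch in C. The function names are hypothetical and are not taken from libfb or gcc. It only shows the distinction the cost tables act on: with a constant divisor the compiler is free to pick a cheaper sequence, while a divisor that is only known at run time generally needs a real divide instruction.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not from libfb or gcc. */

/* Divisor is a compile-time constant: the compiler may replace the
 * modulo with a mask or shift sequence if its cost model prefers that. */
static unsigned mod_const(unsigned x)
{
    return x % 8u;
}

/* The same result written explicitly as a mask. */
static unsigned mod_mask(unsigned x)
{
    return x & 7u;
}

/* Divisor is only known at run time (like a drawable's width):
 * the compiler generally has to emit a divide instruction here. */
static unsigned mod_runtime(unsigned x, unsigned w)
{
    assert(w != 0u);    /* modulo by zero is undefined */
    return x % w;
}
```

For any unsigned x, mod_const(x), mod_mask(x) and mod_runtime(x, 8u) return the same value. Comparing the assembly that gcc -S produces with different -mtune settings would show which sequence was chosen for mod_const.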

I have already done the gcc tuning for Geode (pipeline description, code costs, i386 port parameter values) and submitted it to gcc mainline. As far as I know, Jakub Jelinek was going to backport this code to the Red Hat gcc. So my guess is that if the right compiler and options are used, the code will be faster (and several % smaller, because -mtune=geode generates smaller code than any other tuning).

If somebody needs help speeding up some (critical) code for OLPC by choosing the right options (such as use of MMX instructions, vectorization, and numerous other possibilities), I could help too. Please let me know. If I had an OLPC machine, I could do it.


# opreport -t 1 -l /usr/bin/Xorg
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name               symbol name
6514     68.1096  libfb.so                 fbFetchTransformed
613       6.4095  libfb.so                 fbFetchPixel_x8r8g8b8
446       4.6633  libfb.so                 fbCompositeSolidMask_nx8x0565mmx
252       2.6349  libfb.so                 fbStore_r5g6b5
169       1.7670  libfb.so                 fbRasterizeEdges
137       1.4325  libfb.so                 fbCompositeSrc_8888x0565mmx
113       1.1815  libfb.so                 fbCopyAreammx
99        1.0351  libfb.so                 mmxCombineOverU

The attached file is a portion of the output from opannotate. There is a group of MOD operations that are taking a significant portion of the time. The first column is the number of samples and the second column is the percentage.

   398  6.1099 :    x1 = MOD (x1, pict->pDrawable->width);
   383  5.8796 :    x2 = MOD (x2, pict->pDrawable->width);
   336  5.1581 :    y1 = MOD (y1, pict->pDrawable->height);
   355  5.4498 :    y2 = MOD (y2, pict->pDrawable->height);

Following this there are also some other expensive operations to compute r and put it into buffer[i].
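As a rough sketch of how the divide in a wrap-around like the MOD lines might be avoided, the C below uses hypothetical helpers. It is not the MOD macro from libfb and has not been measured on Geode. It assumes the intent is to wrap a coordinate into the range [0, size), and it skips the divide when the value is already in range or when size is a power of two.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not the libfb MOD macro. */

/* Reference version: wrap a into [0, b) using the % operator.
 * In C99 the result of % takes the sign of a, so negative
 * remainders are shifted up by b. */
static int wrap_mod(int a, int b)
{
    assert(b > 0);
    int r = a % b;
    return r < 0 ? r + b : r;
}

/* Variant with cheap paths tried before falling back to a divide. */
static int wrap_fast(int a, int b)
{
    assert(b > 0);
    if (a >= 0 && a < b)        /* already in range: no divide */
        return a;
    if ((b & (b - 1)) == 0)     /* b is a power of two: mask instead */
        return a & (b - 1);     /* assumes two's complement ints */
    return wrap_mod(a, b);      /* general case still divides */
}
```

Both functions return the same value for every b > 0. Whether the extra branches pay off depends on how often the coordinates are already in range, which would need to be checked with another profiling run.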

-Will



_______________________________________________
Devel mailing list
[email protected]
http://mailman.laptop.org/mailman/listinfo/devel
