William Cohen wrote:
Looked at where the processor spends its time when browsing the web.
Hardware configuration:
OLPC Beta 2 machine
Linksys USB200M USB 10/100 for ethernet connection
4GB memorex Mini Travel Drive for storage of image
Software configuration:
/tmp/olpc-redhat-stream-development-build-299-20070308_1417-devel_ext3.img
kernel-2.6.21-20070309.olpc1p.dc5079fafb767e4
oprofile-0.9.2-3.fc6
Reran the experiment on build 301 and installed
xorg-x11-server-debuginfo-1.1.99.3-0.10.2.olpc1.i386.rpm on the OLPC
machine, so I could take a look at where time is being spent in libfb.so.
I don't know what version of gcc and which options were used to compile
the packages. If somebody points me to where I can look this up, I could
be more sure. It looks to me as if the packages were compiled without
tuning gcc for Geode. The div and mod insns are expensive on Geode.
Whether gcc uses div or shifts is chosen in gcc's expmed.c, and that
choice is directed by the costs selected with -mtune or -march.
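To illustrate the point about div versus shifts, here is a minimal
sketch. The function names are hypothetical and are not taken from the X
server sources. With a constant divisor the compiler is free to pick a
shift or mask sequence according to its cost tables; with a divisor known
only at run time it generally has to emit a real div. The last helper
assumes the caller knows the width is a power of two, which is not
guaranteed for an arbitrary drawable.

```c
#include <stdint.h>

/* Divisor known at compile time: gcc may replace the division with a
 * shift or mask, depending on the costs of the selected tuning. */
static uint32_t div_by_8(uint32_t x) { return x / 8; }
static uint32_t mod_by_8(uint32_t x) { return x % 8; }

/* Divisor known only at run time: a div instruction is normally needed. */
static uint32_t mod_runtime(uint32_t x, uint32_t w) { return x % w; }

/* Hypothetical helper: valid ONLY when w is a nonzero power of two. */
static uint32_t mod_pow2(uint32_t x, uint32_t w) { return x & (w - 1); }
```

Compiling such a file with something like "gcc -O2 -mtune=geode -S" and
reading the assembly shows which sequence was chosen, assuming the gcc in
use already contains the Geode tuning.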
I have already done the gcc tuning for Geode (pipeline description, code
costs, i386 port parameter values) and submitted it to gcc mainline. As
far as I know, Jakub Jelinek was going to backport this code to the Red
Hat gcc. So my guess is that if the right compiler and options are used,
the code will get faster (and several % smaller, because -mtune=geode
generates smaller code than any other tuning).
If somebody needs help speeding up some (critical) code for OLPC by
choosing the right options (such as use of MMX insns, vectorization, and
numerous other possibilities), I could help too. Please let me know.
I can do it if I have an OLPC machine.
# opreport -t 1 -l /usr/bin/Xorg
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples % image name symbol name
6514 68.1096 libfb.so fbFetchTransformed
613 6.4095 libfb.so fbFetchPixel_x8r8g8b8
446 4.6633 libfb.so fbCompositeSolidMask_nx8x0565mmx
252 2.6349 libfb.so fbStore_r5g6b5
169 1.7670 libfb.so fbRasterizeEdges
137 1.4325 libfb.so fbCompositeSrc_8888x0565mmx
113 1.1815 libfb.so fbCopyAreammx
99 1.0351 libfb.so mmxCombineOverU
The attached file is a portion of the output from opannotate. There is
a group of MOD operations that are taking a significant portion of the
time. The first column is the number of samples and the second column
is the percentage.
398 6.1099 : x1 = MOD (x1, pict->pDrawable->width);
383 5.8796 : x2 = MOD (x2, pict->pDrawable->width);
336 5.1581 : y1 = MOD (y1, pict->pDrawable->height);
355 5.4498 : y2 = MOD (y2, pict->pDrawable->height);
Following this there are also some other expensive operations that
compute r and store it in buffer[i].
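As a rough illustration of why those four lines are costly, here is a
minimal sketch. It assumes MOD is meant to return a result in [0, b) for
positive b; the function names are hypothetical and this is not the
libfb code. Every call to the plain version pays for one division, while
the second version skips the division when the coordinate is already
inside the drawable.

```c
/* Floor-style modulo: result is always in [0, b) for b > 0.
 * Assumed to be what the MOD macro in the profile computes. */
static int mod_div(int a, int b)
{
    int r = a % b;              /* one hardware division per call */
    return r < 0 ? r + b : r;
}

/* Hypothetical fast path: no division when 0 <= a < b. The unsigned
 * compare rejects negative a, because it wraps to a huge value. */
static int mod_fast(int a, int b)
{
    if ((unsigned)a < (unsigned)b)
        return a;
    return mod_div(a, b);
}
```

Whether the fast path helps depends on how often the transformed
coordinates fall outside the drawable, which the profile above does not
show.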
-Will
_______________________________________________
Devel mailing list
http://mailman.laptop.org/mailman/listinfo/devel