Hi everybody,
thanks for all the help. I think the problem I am seeing is caused by
the lack of page coloring. I will try Joseph Martin's kernel patch ASAP.
We are very interested in making efficient use of the L2 cache, as it is
so big (4 MB on some of our machines).
In particular, page coloring should be a very good idea for cluster
nodes, where we do not care about the actual performance of the kernel
page allocator (we just run one process for a long time in a fixed page
setup), but where the penalties for cache misses are very high. We easily
see a factor of three in MFlops between data in L1 cache and data in
main memory.
BTW, we use the Compaq compiler, which gives about 20% more MFlops than
the GNU compiler for data in L1 cache.
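To make the page coloring idea concrete, here is a rough sketch of how
a page's "color" follows from its physical address. Only the 4 MB cache
size comes from our machines; the direct-mapped, physically indexed
cache and the 8 KB page size are assumptions for illustration, and the
function names are made up. This is not taken from the kernel patch.

```python
# Sketch only: page colors for a physically indexed cache.
# Assumed (hypothetical) parameters: direct-mapped L2, 8 KB pages.

def num_colors(cache_bytes, associativity, page_bytes):
    """Number of page colors = size of one cache way / page size."""
    way_bytes = cache_bytes // associativity
    return max(1, way_bytes // page_bytes)

def page_color(phys_addr, cache_bytes, associativity, page_bytes):
    """Color of the physical page that contains phys_addr."""
    page_frame = phys_addr // page_bytes
    return page_frame % num_colors(cache_bytes, associativity, page_bytes)

CACHE_BYTES = 4 * 1024 * 1024   # 4 MB L2, as on some of our machines
PAGE_BYTES = 8 * 1024           # assumption
WAYS = 1                        # assumption: direct-mapped

colors = num_colors(CACHE_BYTES, WAYS, PAGE_BYTES)
# Two physical pages exactly one cache size apart share a color, so
# they compete for the same cache lines.
same = (page_color(0, CACHE_BYTES, WAYS, PAGE_BYTES)
        == page_color(CACHE_BYTES, CACHE_BYTES, WAYS, PAGE_BYTES))
```

Under these assumptions, an allocator that hands a process physical
pages of consecutive colors keeps a contiguous virtual working set of
up to the cache size from evicting itself.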
Thanks again
-Chris
--
Christoph Best [EMAIL PROTECTED]
John von Neumann Institute for Computing/DESY http://www.oche.de/~cbest