Re: [Haskell] Programming language shootout (completing the Haskell entry)

Jan-Willem Maessen - Sun Labs East Tue, 30 Mar 2004 06:49:27 -0800

As another data point, when I was writing the run-time system both for pH (on an 8-way Sun) and later for Eager Haskell (on x86) I experimented with simple prefetching of various sorts. Two things seemed to be moderately effective:

* The heap was organized into "chunks" which were parceled out among running threads. By writing into the pages of a chunk when it is allocated, the necessary TLB entries can be pulled in early. This seemed to be a universal win on both architectures. On a system with full software TLB miss handling it would be a wash, I expect.

* Writing a line or two ahead in the heap. The idea here is to get the appropriate lines into the cache (and dirty) before they actually saw heavy use. Alas, this can cause fenceposting problems when you get near the end of a chunk (it's not safe to write past a chunk boundary). A processor which buffers writes doesn't benefit enormously, either; we can buffer the last cache line in the heap until it has been completely written, and never fetch from memory at all. There were no big wins here in my experience.

One has to be careful about the semantics of prefetch instructions, by the way. At least on SPARC they probably don't do what you want; if my memory serves me correctly, there is a separate prefetch cache that's connected to the floating-point load/store unit, and what you actually want for most Haskell-ish code would be a non-blocking load instruction instead. On x86 the prefetch instructions are part of the SIMD floating-point instruction set, and might (or might not) have similar caveats.

The prefetch builtins for gcc are a pretty recent innovation, by the way. I would have loved to have them back when I was experimenting. Of course, you'd want to make sure you got an appropriate instruction out of the compiler.

Andrew Cheadle wrote:

Personally I think looking at 'cache-concious' layout of specific data
structures within the allocator and depth first vs breadth first copying
within the collector provide more realistic optimisation opportunities. I
believe there's been a fair amount of work on this, although specific papers
elude me at the mo.

I would agree with this---though the conclusion from the GC literature seems to be that no particular copying strategy is best for all applications.

-Jan-Willem Maessen


_______________________________________________
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell

Re: [Haskell] Programming language shootout (completing the Haskell entry)

Reply via email to