On May 13, 2009, at 6:58 PM, wren ng thornton wrote:

Jan-Willem Maessen wrote:
I wanted to clear up one misconception here...
wren ng thornton wrote:
> In heavily GCed languages like Haskell allocation and collection is
> cheap, so we don't mind too much; but in Java and the like, both
> allocation and collection are expensive so the idea of cheap throwaway
> objects is foreign.
Not true!

I was speaking of Java, not Clojure. I believe the costs in Java are well documented, though I don't know enough about the JVM to know where the blame belongs. (All I know of Clojure is that it's a Lisp-like on the JVM :)

I think you're missing the point here: the code I refer to below *is in Java* and is running on a standard JVM; the "costs" you refer to simply don't exist! As Vladimir Ivanov points out, and as Rich Hickey is happy to observe in his talks on Clojure, the JVM handles allocation-intensive, garbage-intensive programs very well.

If you look at the internals of Clojure, you'll discover they're using trees with *very* wide fanout (e.g. 32-way branching in its persistent vectors). Why? Because it's so cheap to allocate and GC these structures! By using shallow-but-wide trees we reduce the cost of indexing and accessing list elements. I suspect you'd still be hard-pressed to support this kind of allocation behavior in any of the present Haskell implementations, and Haskell implementations of the same kinds of structures have limited fanout to 2-4 elements or so.

I was under the impression that the reason data structures in Haskell tend to be limited to 4-fanout had more to do with the cleanliness of the implementations: pattern matching on 64-wide cells is quite ugly, as is dealing with the proliferation of corner cases for complex structures like finger trees, patricia trees, etc. The use of view patterns could clean this up significantly. On the other hand, we do have things like lazy ByteStrings and UVector which do have wide fanouts.
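To make the view-patterns point concrete, here is a small sketch (my own illustration, not code from any library) of how a wide node can be stored as a single array field and then pattern-matched through a view, instead of declaring a constructor with dozens of slots:

```haskell
{-# LANGUAGE ViewPatterns #-}
import Data.Array (Array, listArray, bounds, (!))

-- A single wide node: elements live in one boxed array,
-- not in 64 separate constructor fields.
data Wide a = Wide (Array Int a)

-- A view exposing a wide node as head-plus-rest, so callers can
-- pattern match without naming every slot.
unconsW :: Wide a -> Maybe (a, [a])
unconsW (Wide arr)
  | lo > hi   = Nothing
  | otherwise = Just (arr ! lo, [arr ! i | i <- [lo + 1 .. hi]])
  where (lo, hi) = bounds arr

describe :: Wide Int -> String
describe (unconsW -> Just (x, _)) = "starts with " ++ show x
describe _                        = "empty"

main :: IO ()
main = putStrLn (describe (Wide (listArray (0, 7) [10 .. 17])))
-- prints: starts with 10
```

The constructor count stays at one regardless of the fanout; the view function, not the data declaration, decides how matches look.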

Hmm, I think neither of the data structures you name actually supports both O(lg n) indexing and O(lg n) cons or append. That said, your point is well taken, so let's instead state it as a challenge:

Can you, oh Haskellers, implement a fast, wide-fanout (say >= 8) tree-based sequence implementation in Haskell, which supports at-least-log-time indexing and at-least-log-time cons with a large base for the logarithm? Can you do it without turning off array bounds checking (either by using unsafe operations or low-level peeking and poking) and without using an algebraic data type with O(f) constructors for fanout of f? You can turn off bounds checks if your program encodes static guarantees that indices cannot be out of bounds (there are a couple of libraries to do this).

The spirit here is "Work in Haskell with safe operations and no FFI except through safe libraries, but otherwise use any extensions you like."

I actually think this *is* doable, but it touches a few areas where Haskell doesn't presently do well (the bounds checking in particular is a challenge). I threw in the bounds checking when I realized that in fact the equivalent Java code is always bounds checked, and these bounds checks are then optimized away where possible. Actually, I'd *love* to see an *in*efficient solution to eliminating as many bounds checks as possible!
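In that spirit, here is a minimal, untuned sketch (my own invention, every name hypothetical) of a base-8 generalisation of a binary random-access list: the sequence is a list of "digits", where the i-th digit holds fewer than 8 complete 8-ary trees of size 8^i. Cons is O(1) amortised (O(log n) worst case, when carries cascade), indexing is O(log n) with 8 as the base of the logarithm for the tree depth, the node type has a single constructor regardless of fanout, and the only array accesses go through the bounds-checked (!) from Data.Array:

```haskell
import Data.Array (Array, listArray, (!))

fanout :: Int
fanout = 8

-- A complete fanout-ary tree with a cached size at each node.
data Tree a = Leaf a
            | Node !Int (Array Int (Tree a))

size :: Tree a -> Int
size (Leaf _)   = 1
size (Node s _) = s

-- Digits are least-significant first; the i-th digit holds
-- fewer than `fanout` trees, each of size fanout^i.
newtype Seq a = Seq [[Tree a]]

empty :: Seq a
empty = Seq []

-- Add a leaf to the lowest digit, carrying when a digit fills up.
cons :: a -> Seq a -> Seq a
cons x (Seq ds) = Seq (carry (Leaf x) ds)
  where
    carry t [] = [[t]]
    carry t (d : rest)
      | length d < fanout - 1 = (t : d) : rest
      | otherwise             = [] : carry (node (t : d)) rest
    node ts = Node (sum (map size ts)) (listArray (0, fanout - 1) ts)

-- Indexing: skip whole digits, then whole trees, then descend one
-- tree, always via the bounds-checked (!).
index :: Seq a -> Int -> a
index (Seq ds0) i0 = digits ds0 i0
  where
    digits []       _ = error "index: out of bounds"
    digits (d : ds) i = scan d i
      where
        scan []       j = digits ds j
        scan (t : ts) j
          | j < size t = tree t j
          | otherwise  = scan ts (j - size t)
    tree (Leaf a)     0 = a
    tree (Leaf _)     _ = error "index: out of bounds"
    tree (Node _ arr) j = child 0 j
      where
        child c j'
          | j' < size (arr ! c) = tree (arr ! c) j'
          | otherwise           = child (c + 1) (j' - size (arr ! c))

main :: IO ()
main = do
  let s = foldr cons empty [0 .. 99 :: Int]
  print (map (index s) [0, 1, 50, 99])
-- prints: [0,1,50,99]
```

This is only a sketch of the shape of a solution, not a rebuttal of the performance question: whether GHC's allocator and GC keep up with the resulting churn of 8-wide arrays is exactly the part that would need measuring.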

-Jan



--
Live well,
~wren
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
