Hash Consing
Dear all, I am coding a (very) small hash consing library for clojure. For those we don't happen to know what hash consing is, it is a way of allowing equal data structures to be shared in memory. This leverages the purity (as in immutability) of data structures to reduce memory footprint (no duplication of data for just a constant overhead per allocation) and makes hashing and equality tests faster. (You can use java non overloaded hashCode, and == to test equality, because of the sharing. So both operations are O(1), with a very small constant, whatever is the complexity of the hash consed data structures.) This can then be used, in combination with referential transparency to make memoized function faster. I have a java file and a clojure interface that seem to work, at least on my example. I plan to put something somewhere someday, but before spending too much time in making this releasable, i Have a few questions: - does something already exists in contrib that I have missed? - is someone else working on that? - are there any features that I need and that you must implement that you think of? My current plans are: - using a java concurrent hash map to soft referenced objects for the hash consing table. - only two fields in an hash consed object: the value it represents, and a generic field called cached_data used to store anything you want to memoize about this data structure. (Keeping this cached data a bit longer is the reason I plan to use soft references and not weak references) I am not a clojure expert and I am not a java coder at all, so don't hesitate to tell me if my plans are somehow wrong. Bets regards, Nicolas. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Hash Consing
On Jun 29, 3:09 pm, Nicolas Oury nicolas.o...@gmail.com wrote: Dear all, I am coding a (very) small hash consing library for clojure. For those we don't happen to know what hash consing is, it is a way of allowing equal data structures to be shared in memory. This leverages the purity (as in immutability) of data structures to reduce memory footprint (no duplication of data for just a constant overhead per allocation) and makes hashing and equality tests faster. (You can use java non overloaded hashCode, and == to test equality, because of the sharing. So both operations are O(1), with a very small constant, whatever is the complexity of the hash consed data structures.) I don't know anything about hash consing. Based on my limited understanding of the description I am just wondering if this is different from structural sharing that Clojure collections have. Quoting the docs[1]: In particular, the Clojure collections support efficient creation of 'modified' versions, by utilizing structural sharing, and make all of their performance bound guarantees for persistent use. Do you have something different in mind with hash consing? Regards, Parth [1] http://clojure.org/data_structures#toc12 This can then be used, in combination with referential transparency to make memoized function faster. I have a java file and a clojure interface that seem to work, at least on my example. I plan to put something somewhere someday, but before spending too much time in making this releasable, i Have a few questions: - does something already exists in contrib that I have missed? - is someone else working on that? - are there any features that I need and that you must implement that you think of? My current plans are: - using a java concurrent hash map to soft referenced objects for the hash consing table. - only two fields in an hash consed object: the value it represents, and a generic field called cached_data used to store anything you want to memoize about this data structure. (Keeping this cached data a bit longer is the reason I plan to use soft references and not weak references) I am not a clojure expert and I am not a java coder at all, so don't hesitate to tell me if my plans are somehow wrong. Bets regards, Nicolas. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Hash Consing
I'm pretty sure Clojure already does this, because of the built in equality by value. I'm pretty sure the hash keys work of off the value, too. Do you have any code you could post to github or something? That would help us determine if such a thing already exisits. I know this doesn't save you the pain of writing the library, but it could save you the trouble of maintaining it in the future. However, writing this library could be a very cool educational exercise. That, and I could have missed your intent entirely. Good luck Sean On Jun 29, 6:09 am, Nicolas Oury nicolas.o...@gmail.com wrote: Dear all, I am coding a (very) small hash consing library for clojure. For those we don't happen to know what hash consing is, it is a way of allowing equal data structures to be shared in memory. This leverages the purity (as in immutability) of data structures to reduce memory footprint (no duplication of data for just a constant overhead per allocation) and makes hashing and equality tests faster. (You can use java non overloaded hashCode, and == to test equality, because of the sharing. So both operations are O(1), with a very small constant, whatever is the complexity of the hash consed data structures.) This can then be used, in combination with referential transparency to make memoized function faster. I have a java file and a clojure interface that seem to work, at least on my example. I plan to put something somewhere someday, but before spending too much time in making this releasable, i Have a few questions: - does something already exists in contrib that I have missed? - is someone else working on that? - are there any features that I need and that you must implement that you think of? My current plans are: - using a java concurrent hash map to soft referenced objects for the hash consing table. - only two fields in an hash consed object: the value it represents, and a generic field called cached_data used to store anything you want to memoize about this data structure. (Keeping this cached data a bit longer is the reason I plan to use soft references and not weak references) I am not a clojure expert and I am not a java coder at all, so don't hesitate to tell me if my plans are somehow wrong. Bets regards, Nicolas. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Hash Consing
Hi, I'm interested in knowing how you solve garbage collection issues ? 2009/6/29 Nicolas Oury nicolas.o...@gmail.com: Dear all, I am coding a (very) small hash consing library for clojure. For those we don't happen to know what hash consing is, it is a way of allowing equal data structures to be shared in memory. This leverages the purity (as in immutability) of data structures to reduce memory footprint (no duplication of data for just a constant overhead per allocation) and makes hashing and equality tests faster. (You can use java non overloaded hashCode, and == to test equality, because of the sharing. So both operations are O(1), with a very small constant, whatever is the complexity of the hash consed data structures.) This can then be used, in combination with referential transparency to make memoized function faster. I have a java file and a clojure interface that seem to work, at least on my example. I plan to put something somewhere someday, but before spending too much time in making this releasable, i Have a few questions: - does something already exists in contrib that I have missed? - is someone else working on that? - are there any features that I need and that you must implement that you think of? My current plans are: - using a java concurrent hash map to soft referenced objects for the hash consing table. - only two fields in an hash consed object: the value it represents, and a generic field called cached_data used to store anything you want to memoize about this data structure. (Keeping this cached data a bit longer is the reason I plan to use soft references and not weak references) I am not a clojure expert and I am not a java coder at all, so don't hesitate to tell me if my plans are somehow wrong. Bets regards, Nicolas. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Hash Consing
Hi! I think it’s basically a generalized version of this: user (def hcons (memoize cons)) #'user/hcons user (identical? (cons 1 nil) (cons 1 nil)) false user (identical? (hcons 1 nil) (hcons 1 nil)) true In the case of cons, every two equal (value-wise) data structes have the same construction history, thus memoizing cons would make them object-identical. I wonder how this can be leveraged for the remaining data structures though. Vectors, sets and maps don't have construction histories unique to their values ... Kind regards, Achim Am 29.06.2009 um 13:11 schrieb Parth: On Jun 29, 3:09 pm, Nicolas Oury nicolas.o...@gmail.com wrote: Dear all, I am coding a (very) small hash consing library for clojure. For those we don't happen to know what hash consing is, it is a way of allowing equal data structures to be shared in memory. This leverages the purity (as in immutability) of data structures to reduce memory footprint (no duplication of data for just a constant overhead per allocation) and makes hashing and equality tests faster. (You can use java non overloaded hashCode, and == to test equality, because of the sharing. So both operations are O(1), with a very small constant, whatever is the complexity of the hash consed data structures.) I don't know anything about hash consing. Based on my limited understanding of the description I am just wondering if this is different from structural sharing that Clojure collections have. Quoting the docs[1]: In particular, the Clojure collections support efficient creation of 'modified' versions, by utilizing structural sharing, and make all of their performance bound guarantees for persistent use. Do you have something different in mind with hash consing? Regards, Parth [1] http://clojure.org/data_structures#toc12 This can then be used, in combination with referential transparency to make memoized function faster. I have a java file and a clojure interface that seem to work, at least on my example. I plan to put something somewhere someday, but before spending too much time in making this releasable, i Have a few questions: - does something already exists in contrib that I have missed? - is someone else working on that? - are there any features that I need and that you must implement that you think of? My current plans are: - using a java concurrent hash map to soft referenced objects for the hash consing table. - only two fields in an hash consed object: the value it represents, and a generic field called cached_data used to store anything you want to memoize about this data structure. (Keeping this cached data a bit longer is the reason I plan to use soft references and not weak references) I am not a clojure expert and I am not a java coder at all, so don't hesitate to tell me if my plans are somehow wrong. Bets regards, Nicolas. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Hash Consing
I don't know anything about hash consing. Based on my limited understanding of the description I am just wondering if this is different from structural sharing that Clojure collections have. The sharing only concerns data structure having the same creation point. As Achim explained, better than me: user (identical? (cons 1 nil) (cons 1 nil)) false In some sense, it corresponds to: user (def hcons (memoize cons)) #'user/hcons but is not limited to consing. In this sense, it acts more as: user (def hidentity (memoize identity)) Which is a bit more generic but less good for GC (it builds a structure and release it just after the call to hidentity). Creating specialized functions like hcons might be interesting too. I'm interested in knowing how you solve garbage collection issues ? The main difference with that is how it hanldes GC. The table is a concurrent hash table whose keys are references to the objects and values are Soft/Weak references to a HashConsed object. This HashConsed has a reference to the initial object and is the thing you want to hold too if you want sharing to work. When it is finalized, it removes its entries from the hash table, after some checks (to see it still owns this entry). I wonder how this can be leveraged for the remaining data structures though. Vectors, sets and maps don't have construction histories unique to their values ... I was thinking of doing something more coarse-grained, for list or trees. Where these complexed structures (vectors, set, map) are thought atomic and not hashed incrementally. (I would be happy to do otherwise, but it seems quite complicated.) Example of use to create, for example, a tree: (hash-cons {:head :a :children {set of hash consed nodes}}) This hashing is done in O(num of children), and then you get back a HashConsed structure, that is shared for all hash consed instances of the same tree. It can be used to test equality O(1), with identical?, and not O(size of the tree). Hashing is done in O(1) too. Of course, incremental hash consing of sets/map/vectors could give us adding/removing a children in a better time than O(num of children) but I don't have much clue on how to do that. Any ideas? Best, Nicolas. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Hash Consing
This hashing is done in O(num of children), and then you get back a HashConsed structure, that is shared for all hash consed instances of the same tree. It can be used to test equality O(1), with identical?, and not O(size of the tree). I've seen this referred to as interning -- indeed, that's what Java calls it for String: http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#intern() I used this technique to save heap in a CL app I wrote once -- there could be 2 million entries containing URLs submitted by a client, and most of those URLs were the same. Interning the strings before insertion (just as you do, by jamming them into a hash-table) was a big benefit, and made subsequent comparisons cheaper, too. -R --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---