Hash Consing

2009-06-29 Thread Nicolas Oury

Dear all,

I am coding a (very) small hash consing library for clojure.

For those we don't happen to know what hash consing is, it is a way of
allowing equal data structures to be shared in memory.
This leverages the purity (as in immutability) of data structures to
reduce memory footprint (no duplication of data for just a constant
overhead per allocation) and makes hashing and equality tests faster.
(You can use java non overloaded hashCode, and == to test equality,
because of the sharing. So both operations are O(1), with a very small
constant, whatever is the complexity of the hash consed data
structures.)
This can then be used, in combination with referential transparency to 
make memoized function faster.

I have a java file and a clojure interface that seem to work, at least
on my example. I plan to put something somewhere someday, but before
spending too much time in making this releasable, i Have a few
questions:

- does something already exists in contrib that I have missed?
- is someone else working on that?
- are there any features that I need and that you must implement that
you think of?

My current plans are:
- using a java concurrent hash map to soft referenced objects for the
hash consing table.
- only two fields in an hash consed object: the value it represents, 
  and a generic field called cached_data used to store anything you
want to memoize about this data structure. (Keeping this cached data a
bit longer is the reason I plan to use soft references and not weak
references)

I am not a clojure expert and I am not a java coder at all, so don't
hesitate to tell me if my plans are somehow wrong. 

Bets regards,

Nicolas.


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Hash Consing

2009-06-29 Thread Parth



On Jun 29, 3:09 pm, Nicolas Oury nicolas.o...@gmail.com wrote:
 Dear all,

 I am coding a (very) small hash consing library for clojure.

 For those we don't happen to know what hash consing is, it is a way of
 allowing equal data structures to be shared in memory.
 This leverages the purity (as in immutability) of data structures to
 reduce memory footprint (no duplication of data for just a constant
 overhead per allocation) and makes hashing and equality tests faster.
 (You can use java non overloaded hashCode, and == to test equality,
 because of the sharing. So both operations are O(1), with a very small
 constant, whatever is the complexity of the hash consed data
 structures.)

I don't know anything about hash consing. Based on my
limited understanding of the description I am just wondering
if this is different from structural sharing that Clojure collections
have.

Quoting the docs[1]:

In particular, the Clojure collections support efficient creation of
'modified'
versions, by utilizing structural sharing, and make all of their
performance
bound guarantees for persistent use.


Do you have something different in mind with hash consing?

Regards,
Parth

[1] http://clojure.org/data_structures#toc12

 This can then be used, in combination with referential transparency to
 make memoized function faster.

 I have a java file and a clojure interface that seem to work, at least
 on my example. I plan to put something somewhere someday, but before
 spending too much time in making this releasable, i Have a few
 questions:

 - does something already exists in contrib that I have missed?
 - is someone else working on that?
 - are there any features that I need and that you must implement that
 you think of?

 My current plans are:
 - using a java concurrent hash map to soft referenced objects for the
 hash consing table.
 - only two fields in an hash consed object: the value it represents,
   and a generic field called cached_data used to store anything you
 want to memoize about this data structure. (Keeping this cached data a
 bit longer is the reason I plan to use soft references and not weak
 references)

 I am not a clojure expert and I am not a java coder at all, so don't
 hesitate to tell me if my plans are somehow wrong.

 Bets regards,

 Nicolas.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Hash Consing

2009-06-29 Thread Sean Devlin

I'm pretty sure Clojure already does this, because of the built in
equality by value.  I'm pretty sure the hash keys work of off the
value, too.

Do you have any code you could post to github or something?  That
would help us determine if such a thing already exisits.  I know this
doesn't save you the pain of writing the library, but it could save
you the trouble of maintaining it in the future.

However, writing this library could be a very cool educational
exercise.  That, and I could have missed your intent entirely.

Good luck
Sean

On Jun 29, 6:09 am, Nicolas Oury nicolas.o...@gmail.com wrote:
 Dear all,

 I am coding a (very) small hash consing library for clojure.

 For those we don't happen to know what hash consing is, it is a way of
 allowing equal data structures to be shared in memory.
 This leverages the purity (as in immutability) of data structures to
 reduce memory footprint (no duplication of data for just a constant
 overhead per allocation) and makes hashing and equality tests faster.
 (You can use java non overloaded hashCode, and == to test equality,
 because of the sharing. So both operations are O(1), with a very small
 constant, whatever is the complexity of the hash consed data
 structures.)
 This can then be used, in combination with referential transparency to
 make memoized function faster.

 I have a java file and a clojure interface that seem to work, at least
 on my example. I plan to put something somewhere someday, but before
 spending too much time in making this releasable, i Have a few
 questions:

 - does something already exists in contrib that I have missed?
 - is someone else working on that?
 - are there any features that I need and that you must implement that
 you think of?

 My current plans are:
 - using a java concurrent hash map to soft referenced objects for the
 hash consing table.
 - only two fields in an hash consed object: the value it represents,
   and a generic field called cached_data used to store anything you
 want to memoize about this data structure. (Keeping this cached data a
 bit longer is the reason I plan to use soft references and not weak
 references)

 I am not a clojure expert and I am not a java coder at all, so don't
 hesitate to tell me if my plans are somehow wrong.

 Bets regards,

 Nicolas.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Hash Consing

2009-06-29 Thread Laurent PETIT

Hi,

I'm interested in knowing how you solve garbage collection issues ?


2009/6/29 Nicolas Oury nicolas.o...@gmail.com:

 Dear all,

 I am coding a (very) small hash consing library for clojure.

 For those we don't happen to know what hash consing is, it is a way of
 allowing equal data structures to be shared in memory.
 This leverages the purity (as in immutability) of data structures to
 reduce memory footprint (no duplication of data for just a constant
 overhead per allocation) and makes hashing and equality tests faster.
 (You can use java non overloaded hashCode, and == to test equality,
 because of the sharing. So both operations are O(1), with a very small
 constant, whatever is the complexity of the hash consed data
 structures.)
 This can then be used, in combination with referential transparency to
 make memoized function faster.

 I have a java file and a clojure interface that seem to work, at least
 on my example. I plan to put something somewhere someday, but before
 spending too much time in making this releasable, i Have a few
 questions:

 - does something already exists in contrib that I have missed?
 - is someone else working on that?
 - are there any features that I need and that you must implement that
 you think of?

 My current plans are:
 - using a java concurrent hash map to soft referenced objects for the
 hash consing table.
 - only two fields in an hash consed object: the value it represents,
  and a generic field called cached_data used to store anything you
 want to memoize about this data structure. (Keeping this cached data a
 bit longer is the reason I plan to use soft references and not weak
 references)

 I am not a clojure expert and I am not a java coder at all, so don't
 hesitate to tell me if my plans are somehow wrong.

 Bets regards,

 Nicolas.


 


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Hash Consing

2009-06-29 Thread Achim Passen
Hi!

I think it’s basically a generalized version of this:

user (def hcons (memoize cons))
#'user/hcons
user (identical? (cons 1 nil) (cons 1 nil))
false
user (identical? (hcons 1 nil) (hcons 1 nil))
true

In the case of cons, every two equal (value-wise) data structes have  
the same construction history, thus memoizing cons would make them  
object-identical.

I wonder how this can be leveraged for the remaining data structures  
though. Vectors, sets and maps don't have construction histories  
unique to their values ...

Kind regards,
Achim

Am 29.06.2009 um 13:11 schrieb Parth:

 On Jun 29, 3:09 pm, Nicolas Oury nicolas.o...@gmail.com wrote:
 Dear all,

 I am coding a (very) small hash consing library for clojure.

 For those we don't happen to know what hash consing is, it is a way  
 of
 allowing equal data structures to be shared in memory.
 This leverages the purity (as in immutability) of data structures  
 to
 reduce memory footprint (no duplication of data for just a constant
 overhead per allocation) and makes hashing and equality tests faster.
 (You can use java non overloaded hashCode, and == to test equality,
 because of the sharing. So both operations are O(1), with a very  
 small
 constant, whatever is the complexity of the hash consed data
 structures.)

 I don't know anything about hash consing. Based on my
 limited understanding of the description I am just wondering
 if this is different from structural sharing that Clojure collections
 have.

 Quoting the docs[1]:
 
 In particular, the Clojure collections support efficient creation of
 'modified'
 versions, by utilizing structural sharing, and make all of their
 performance
 bound guarantees for persistent use.
 

 Do you have something different in mind with hash consing?

 Regards,
 Parth

 [1] http://clojure.org/data_structures#toc12

 This can then be used, in combination with referential transparency  
 to
 make memoized function faster.

 I have a java file and a clojure interface that seem to work, at  
 least
 on my example. I plan to put something somewhere someday, but before
 spending too much time in making this releasable, i Have a few
 questions:

 - does something already exists in contrib that I have missed?
 - is someone else working on that?
 - are there any features that I need and that you must implement  
 that
 you think of?

 My current plans are:
 - using a java concurrent hash map to soft referenced objects for the
 hash consing table.
 - only two fields in an hash consed object: the value it represents,
   and a generic field called cached_data used to store anything you
 want to memoize about this data structure. (Keeping this cached  
 data a
 bit longer is the reason I plan to use soft references and not weak
 references)

 I am not a clojure expert and I am not a java coder at all, so don't
 hesitate to tell me if my plans are somehow wrong.

 Bets regards,

 Nicolas.
 


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Hash Consing

2009-06-29 Thread Nicolas Oury


 
 I don't know anything about hash consing. Based on my
 limited understanding of the description I am just wondering
 if this is different from structural sharing that Clojure collections
 have.

The sharing only concerns data structure having the same creation
point. 
As Achim explained, better than me:

user (identical? (cons 1 nil) (cons 1 nil)) false

In some sense, it corresponds to: 

user (def hcons (memoize cons))
 #'user/hcons

but is not limited to consing.

In this sense, it acts more as:

user (def hidentity (memoize identity))

Which is a bit more generic but less good for GC (it builds a structure
and release it just after the call to hidentity).

Creating specialized functions like hcons might be interesting too.

 I'm interested in knowing how you solve garbage collection issues ?

The main difference with that is how it hanldes GC. The table is a
concurrent hash table whose keys are references to the objects and
values are Soft/Weak references to a HashConsed object.

This HashConsed has a reference to the initial object and is the thing
you want to hold too if you want sharing to work.

When it is finalized, it removes its entries from the hash table, after
some checks (to see it still owns this entry).

 I wonder how this can be leveraged for the remaining data structures
 though. Vectors, sets and maps don't have construction histories
 unique to their values ... 

I was thinking of doing something more coarse-grained, for list or
trees. Where these complexed structures (vectors, set, map) are thought
atomic and not hashed incrementally.
(I would be happy to do otherwise, but it seems quite complicated.)

Example of use to create, for example, a tree:

 (hash-cons {:head :a :children {set of hash consed nodes}})

This hashing is done in O(num of children), and then you get back a
HashConsed structure, that is shared for all hash consed instances
of the same tree. It can be used to test equality O(1), with identical?,
and not O(size of the tree).
Hashing is done in O(1) too.

Of course, incremental hash consing of sets/map/vectors could give us
adding/removing a children in a better time than O(num of children) but
I don't have much clue on how to do that. Any ideas?

Best,

Nicolas.






--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Hash Consing

2009-06-29 Thread Richard Newman

 This hashing is done in O(num of children), and then you get back a
 HashConsed structure, that is shared for all hash consed instances
 of the same tree. It can be used to test equality O(1), with  
 identical?,
 and not O(size of the tree).

I've seen this referred to as interning -- indeed, that's what Java  
calls it for String:

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#intern()

I used this technique to save heap in a CL app I wrote once -- there  
could be 2 million entries containing URLs submitted by a client, and  
most of those URLs were the same. Interning the strings before  
insertion (just as you do, by jamming them into a hash-table) was a  
big benefit, and made subsequent comparisons cheaper, too.

-R

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---