Re: hash-map based on identity
On Sun, Mar 8, 2009 at 9:28 PM, Rich Hickey richhic...@gmail.com wrote: On Mar 8, 6:17 am, David Powell djpow...@djpowell.net wrote: Identity is tested first in equality, if identical, equal, full stop. That's what I'd assumed (it's what the JDK collections do), but looking at the code, to say, APersistentVectory, I can't see where the identity test is done? Am I looking in the wrong place? Clojure code routes through clojure.lang.Util.equiv()/equals() which do the check, but each collection should as well, for use outside of Clojure. Issue/patch welcome. I've created an issue, just so we don't lose track of this, but I'm not writing a patch at the moment. This is probably some nice low-hanging fruit for someone who has sent in their CA and is interested in starting to get patches into Clojure itself: http://code.google.com/p/clojure/issues/detail?id=92 --Chouser --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re[2]: hash-map based on identity
Identity is tested first in equality, if identical, equal, full stop. That's what I'd assumed (it's what the JDK collections do), but looking at the code, to say, APersistentVectory, I can't see where the identity test is done? Am I looking in the wrong place? -- Dave --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: hash-map based on identity
On Mar 8, 6:17 am, David Powell djpow...@djpowell.net wrote: Identity is tested first in equality, if identical, equal, full stop. That's what I'd assumed (it's what the JDK collections do), but looking at the code, to say, APersistentVectory, I can't see where the identity test is done? Am I looking in the wrong place? Clojure code routes through clojure.lang.Util.equiv()/equals() which do the check, but each collection should as well, for use outside of Clojure. Issue/patch welcome. Rich --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: hash-map based on identity
java.util.IdentityHashMap On Sat, Mar 7, 2009 at 1:49 AM, Mark Engelberg mark.engelb...@gmail.comwrote: Is there a variation of hash-map which supports comparison of keys using identical? rather than = ? Ditto with sets. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: hash-map based on identity
Here's why it would be useful to have the option of hash maps that use identity... Let's say my keys are rather long lists. Equality comparison will be slow. But if I know that the keys are always the exact same long lists (perhaps because I traversed the key sequence to find a key with a certain property, and then am updating with that precise key as one example), then I can save lots of time by using an identity comparison. FWIW, the Scheme I usually use (PLT Scheme) supports both eq? and equal? based hash tables, and I have found both to be useful (although I prefer equal? based hash tables as the default). On Sat, Mar 7, 2009 at 7:57 AM, David Powell djpow...@djpowell.net wrote: What objects are you trying to store? The ideal in the Clojure world, is that most objects are immutable. Immutable objects will implement value equality in .equals() ('=' in Clojure), and you shouldn't care about whether two objects are identical? because it won't affect anything. It's an efficiency issue. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: hash-map based on identity
Depending on what you are doing perhaps you could use (hash x). This might not be very helpful if you need to update keys frequently. However if you're mostly doing lookups, then this would help quite a bit. On Sat, Mar 7, 2009 at 3:44 PM, Mark Engelberg mark.engelb...@gmail.comwrote: Here's why it would be useful to have the option of hash maps that use identity... Let's say my keys are rather long lists. Equality comparison will be slow. But if I know that the keys are always the exact same long lists (perhaps because I traversed the key sequence to find a key with a certain property, and then am updating with that precise key as one example), then I can save lots of time by using an identity comparison. FWIW, the Scheme I usually use (PLT Scheme) supports both eq? and equal? based hash tables, and I have found both to be useful (although I prefer equal? based hash tables as the default). On Sat, Mar 7, 2009 at 7:57 AM, David Powell djpow...@djpowell.net wrote: What objects are you trying to store? The ideal in the Clojure world, is that most objects are immutable. Immutable objects will implement value equality in .equals() ('=' in Clojure), and you shouldn't care about whether two objects are identical? because it won't affect anything. It's an efficiency issue. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: hash-map based on identity
On Mar 7, 3:44 pm, Mark Engelberg mark.engelb...@gmail.com wrote: Here's why it would be useful to have the option of hash maps that use identity... Let's say my keys are rather long lists. Equality comparison will be slow. But if I know that the keys are always the exact same long lists (perhaps because I traversed the key sequence to find a key with a certain property, and then am updating with that precise key as one example), then I can save lots of time by using an identity comparison. Identity is tested first in equality, if identical, equal, full stop. So if you are using identical and unique collections as keys you'll find them without a value-by-value comparison. If they are not present, it's unlikely you'll get a matching hashCode and a matching count and a matching long prefix from a mismatch. It's an efficiency issue. I don't think so. Identity keys are most often needed e.g. when you are trying to save/restore object graphs and need to know that you've already seen a particular object perhaps equal to another non- identical object in the same graph, or otherwise tracking object identity for synchronization with an external system. Using IdentityHashMap as an implementation detail in these situations is perfectly fine. I don't see a compelling argument for clojure to offer identity-based collections. Rich --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: hash-map based on identity
On Sat, Mar 7, 2009 at 2:57 PM, Rich Hickey richhic...@gmail.com wrote: Identity is tested first in equality, if identical, equal, full stop. So if you are using identical and unique collections as keys you'll find them without a value-by-value comparison. If they are not present, it's unlikely you'll get a matching hashCode and a matching count and a matching long prefix from a mismatch. But if the key is a long list, wouldn't it have to traverse the list just to come up with the hash code for that list? And doesn't it need that hash code just to know which bin to look in for matching keys? If so, then the performance hit occurs before you even get to comparing against other keys. My understanding is that eq?-based associative collections use a completely different hashing scheme, hashing on the address or some other unique/stable value associated with each object, and then use eq? (like identical?) to do the actual comparison when they get to the right bin. Another use case: Consider sorting a collection of lengthy lists, and you want to sort based on the length of the lists. You want to cache the length of those lists. Using an identity-based hash-map would be the right kind of cache for such a thing, because you're looking up the exact same lists and want to be efficient about it (assuming my comments above about hash performance is correct). --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: hash-map based on identity
On Mar 7, 2009, at 8:30 PM, Mark Engelberg wrote: On Sat, Mar 7, 2009 at 2:57 PM, Rich Hickey richhic...@gmail.com wrote: Identity is tested first in equality, if identical, equal, full stop. So if you are using identical and unique collections as keys you'll find them without a value-by-value comparison. If they are not present, it's unlikely you'll get a matching hashCode and a matching count and a matching long prefix from a mismatch. But if the key is a long list, wouldn't it have to traverse the list just to come up with the hash code for that list? And doesn't it need that hash code just to know which bin to look in for matching keys? If so, then the performance hit occurs before you even get to comparing against other keys. The core collection classes cache their hash codes. There is an issue to have them create them incrementally, amortizing the cost over all insertions, which will work for all but lists, which have a front-to- back hash algorithm but are built back-to-front. http://code.google.com/p/clojure/issues/detail?id=11 My understanding is that eq?-based associative collections use a completely different hashing scheme, hashing on the address or some other unique/stable value associated with each object, and then use eq? (like identical?) to do the actual comparison when they get to the right bin. Yes, so? It's still not an argument for another collection family, which will only engender confusion as people make similar incorrect assumptions about performance and equality. Another use case: Consider sorting a collection of lengthy lists, and you want to sort based on the length of the lists. You want to cache the length of those lists. Using an identity-based hash-map would be the right kind of cache for such a thing, because you're looking up the exact same lists and want to be efficient about it (assuming my comments above about hash performance is correct). This argument only applies if you never want to calculate the hash code by a walk, even once. I think that's a false economy, and again, not persuasive. With incremental hashcode calculation, even less so. Clojure has other data structures that address some of the deficiencies of lists, but even PersistentList is Counted, i.e. provides its count in constant time, as do maps, sets and vectors, array, vector and string seqs etc. Rich --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: hash-map based on identity
On Mar 7, 2009, at 8:30 PM, Mark Engelberg wrote: But if the key is a long list, wouldn't it have to traverse the list just to come up with the hash code for that list? Yes, but only once. Since Clojure lists are immutable, their hash code is calculated once and cached and the cache will never become invalid. --Steve smime.p7s Description: S/MIME cryptographic signature
hash-map based on identity
Is there a variation of hash-map which supports comparison of keys using identical? rather than = ? Ditto with sets. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---