Hi Mariano,

On Tue, Dec 13, 2011 at 12:37 AM, Mariano Martinez Peck <marianop...@gmail.com> wrote:
>
> On Tue, Dec 13, 2011 at 8:43 AM, Michael Roberts <m...@mjr104.co.uk> wrote:
>
>> Hi Mariano, when I read this thread I was a bit confused that you wanted an IdentitySet that used #hash. From that statement it sounded like you just wanted a Set. This would allow any object to define its own hash and, importantly, what equality means with #=. So if you want to delegate that to the object, use a Set.
>>
>
> No, I cannot use a Set because I cannot have repetitions. What I am asking about is what we do in the Fuel serializer while traversing the object graph. Each analyzed object is put as a key in an IdentityDictionary (but the question is the same for IdentitySet). To avoid cycles, I need to put each object in only once. Since graphs can be very big, such a dictionary can hold lots of objects.
>
>> However, as the thread has gone on, perhaps you want the identity relationship but just a bigger identity hash space?
>
> Yes, exactly. I was wondering if there could be a way (maybe... that's why I am asking) to improve its performance, considering that I could have many more objects than 2^13. In other words, I wanted to see if I could avoid collisions.
>

You can assume that certain properties of objects will not change during serialization, for example the class of an object and its basic size. So you can construct a valid extended identity hash from these properties. For example:

    fuelSerializationHash
        ^self identityHash
            + (self class identityHash bitShift: 12)
            + (self basicSize bitShift: 24)

In general this idea may not work because of meta-primitives like changeClassTo:, which would change the hash. But in Fuel's case I think it's safe to assume that objects won't change class or size during serialization.

HTH

>
>>
>> IdentitySets are fast because they bypass any delegation. Once you have seen the object in your traversal (a common usage pattern), that's it. You grab its identity, which is a pretty low-level thing to do.
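A rough way to see what Eliot's extended hash buys is to model it outside the image. The Python sketch below is not Pharo code: it assumes a 12-bit identityHash (the figure quoted in this thread) and replaces real objects with a hypothetical `FakeObject` record holding the object's own identity hash, its class's identity hash and its basic size. It counts how many distinct hash values a randomly generated population produces under the plain and the extended hash.

```python
import random
from dataclasses import dataclass

HASH_BITS = 12  # assumption: a 12-bit identityHash, as described in this thread


@dataclass(frozen=True)
class FakeObject:
    """Hypothetical stand-in for a Smalltalk object in the graph."""
    identity_hash: int        # the object's own 12-bit identityHash
    class_identity_hash: int  # the 12-bit identityHash of its class
    basic_size: int           # number of indexable slots


def plain_hash(obj: FakeObject) -> int:
    return obj.identity_hash


def extended_hash(obj: FakeObject) -> int:
    # Mirrors: self identityHash + (self class identityHash bitShift: 12)
    #          + (self basicSize bitShift: 24)
    return (obj.identity_hash
            + (obj.class_identity_hash << 12)
            + (obj.basic_size << 24))


def make_graph(n: int, n_classes: int, seed: int = 0) -> list[FakeObject]:
    """Generate n fake objects spread over n_classes random classes."""
    rng = random.Random(seed)
    class_hashes = [rng.randrange(1 << HASH_BITS) for _ in range(n_classes)]
    return [
        FakeObject(
            identity_hash=rng.randrange(1 << HASH_BITS),
            class_identity_hash=rng.choice(class_hashes),
            basic_size=rng.randrange(8),
        )
        for _ in range(n)
    ]


def distinct_values(objs: list[FakeObject], hash_fn) -> int:
    """Number of different hash values produced by hash_fn over objs."""
    return len({hash_fn(o) for o in objs})


if __name__ == "__main__":
    graph = make_graph(100_000, n_classes=50)
    print("plain:   ", distinct_values(graph, plain_hash))
    print("extended:", distinct_values(graph, extended_hash))
```

With a few dozen classes and small sizes, the extended hash spreads 100,000 objects over far more than 4096 values, so far fewer objects share each value. As Eliot notes, this relies on class and size staying fixed during serialization, which meta-primitives such as changeClassTo: could violate.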
>>
>
> OK... it is a tradeoff. If I use #identityHash it is fast because there is no delegation and it is almost an immediate primitive. But I guess it will be slow if there are lots of collisions. Using something other than #identityHash might reduce the number of collisions, but the extra delegation might make it slower.
>
> I will try Levente's idea of what is done in SystemTracer: use as the hash the identityHash of the object mixed with the identityHash of its class. Maybe that decreases the collisions and at the same time I don't pay for delegation (#class is a special bytecode, so it costs nothing, and OK... there are two sends of #identityHash, but I don't think that is much).
>
> Anyway, thanks for the interesting post, I always learn :)
>
>>
>> As for what happens with collections, who knows. It depends. Relying on identity set semantics for a collection is easy. Set semantics is not. Remember that both hash and equality are important to know whether the set already contains the element. Depending on the collection implementation, both of these could be composite in terms of the parts. Who knows where you end up or how long it takes, i.e. whether it is a function of the size of the collection, and whether its elements are in turn composite collections...
>>
>
> yes...
>
>> Cheers
>> Mike
>>
>> On Tuesday, December 13, 2011, Carlo <snoob...@yahoo.ie> wrote:
>> > Hi Mariano
>> > I'm no expert either ;)
>> > Without having access to the exact code, it looks like either you have a collection that references itself (which would break all collection implementations) or the tests have just slowed down to the point where you think it has 'crashed'.
>> > Do you have any more info, or perhaps which methods you changed on IdentitySet?
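Mike's distinction between identity and equality semantics can be illustrated with a small Python model. Here Python's `set` (driven by `__hash__` and `__eq__`) stands in for a Smalltalk Set using #hash and #=, and a dict keyed by `id()` stands in for an IdentitySet; the `Point` class is hypothetical. For a serializer the difference matters: two distinct but equal objects must both survive, while a shared reference should be recorded only once.

```python
class Point:
    """Hypothetical value-like object whose equality is defined by its parts."""

    def __init__(self, x: int, y: int):
        self.x, self.y = x, y

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        # Composite hash: computed from the parts, like a Set relying on #hash.
        return hash((self.x, self.y))


def count_equality(objs: list) -> int:
    """Set semantics: equal objects collapse into one entry."""
    return len(set(objs))


def count_identity(objs: list) -> int:
    """IdentitySet semantics: only the very same object collapses.
    Storing the object as the value keeps it alive, so its id() is not reused."""
    return len({id(o): o for o in objs})


if __name__ == "__main__":
    a, b = Point(1, 2), Point(1, 2)
    print(count_equality([a, b]), count_identity([a, b]), count_identity([a, a]))
```

An equality-based set would make the serializer write `a` and silently drop `b`, whereas the identity-based version keeps both and still deduplicates the repeated reference to `a`.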
>> > Cheers
>> > Carlo
>> > On 13 Dec 2011, at 1:57 AM, Mariano Martinez Peck wrote:
>> >
>> > On Tue, Dec 13, 2011 at 12:32 AM, Mariano Martinez Peck <marianop...@gmail.com> wrote:
>> >>
>> >> On Mon, Dec 12, 2011 at 1:56 PM, Carlo <snoob...@yahoo.ie> wrote:
>> >>>
>> >>> Hi
>> >>> Wouldn't the fact that you use hash cause potential loops now? e.g. a collection refers to another object that refers to the first collection --> aCollection>>hash references an item which causes this current collection's hash to be called again?
>> >>
>> >> Hi Carlo. I am still a newbie with Collections, but I think I am having exactly that problem. During my tests, it loops in Collection>>#hash when sending #hash to its elements.
>> >> Sorry, but I couldn't understand the cause of the problem. Why doesn't it work, while it does with #identityHash? Could you elaborate?
>> >
>> > Well, now I understand, and I also understand why it doesn't happen with #identityHash. But what happens then with regular Dictionaries using #hash? Why doesn't it happen there?
>> >>
>> >> thanks
>> >>
>> >>>
>> >>> identityHash is deterministic in this case.
>> >>> Does this help?
>> >>> Cheers
>> >>> Carlo
>> >>> On 12 Dec 2011, at 10:58 AM, Mariano Martinez Peck wrote:
>> >>> Hi guys. I hope this is not a very stupid question. Background: in Fuel we have an IdentityDictionary where we put each of the objects we find while traversing the graph. We need to use IdentitySet because we cannot have repetitions (and to avoid loops), so we NEED to use #==. In such a dictionary we put ALL objects of the graph, so it can be very big. Since IdentitySet uses #identityHash, it will use ONLY those 12 bits in the object header. That means we have 2^12 = 4096 different values.
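To put the 2^12 figure in perspective, here is a back-of-the-envelope calculation in Python. It assumes hashes spread evenly over the 4096 values and that a lookup scans every object sharing a value one by one, as in a chained hash table; an open-addressing IdentitySet may well behave worse, because entries cluster around the same few starting slots.

```python
IDENTITY_HASH_VALUES = 2 ** 12  # 4096 distinct values, assuming a 12-bit identityHash


def objects_per_hash_value(n_objects: int,
                           distinct_values: int = IDENTITY_HASH_VALUES) -> float:
    """Average number of objects sharing one hash value, assuming an even spread."""
    return n_objects / distinct_values


def comparisons_per_hit(n_objects: int,
                        distinct_values: int = IDENTITY_HASH_VALUES) -> float:
    """Expected #== comparisons to find an object that is present, if every
    object sharing its hash value is scanned linearly: (k + 1) / 2 for k objects."""
    k = objects_per_hash_value(n_objects, distinct_values)
    return (k + 1) / 2


if __name__ == "__main__":
    for n in (4_096, 100_000, 1_000_000):
        print(n, objects_per_hash_value(n), comparisons_per_hit(n))
```

With one million objects, each hash value would be shared by about 244 objects, so finding an object already in the set would take roughly 122 #== comparisons in this simplified model, against about one comparison when every object has its own hash value.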
>> >>>
>> >>> Question: having explained the above, I wanted to be able to use #hash rather than #identityHash, since several classes implement #hash, and I thought that using #hash I could have fewer collisions and therefore better performance. I tried to make a subclass of IdentitySet that uses #hash rather than #identityHash, but my image freezes. I also tried something like:
>> >>>
>> >>> set := PluggableSet new.
>> >>> set hashBlock: [ :elem | elem hash ].
>> >>> set equalBlock: [ :a :b | a == b ].
>> >>>
>> >>> But it doesn't work either. It works with simple tests in a workspace, but when I run the full Fuel test suite, it enters a loop in Collection>>#hash.
>> >>>
>> >>> Anyway, my question is: should that work? If not, what is the exact reason?
>> >>>
>> >>> Thanks in advance,
>> >>>
>> >>> --
>> >>> Mariano
>> >>> http://marianopeck.wordpress.com

--
best,
Eliot
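Carlo's explanation of the freeze can be reproduced in miniature. The Python sketch below is a hypothetical analogue of the PluggableSet experiment: the hash function is pluggable, equality is always identity (`is`, playing the role of #==), and `structural_hash` models a collection hash that folds in its elements' hashes, the behaviour Carlo describes. On a graph containing a cycle that hash recurses forever (Python raises `RecursionError` where the image would simply hang), whereas a hash derived from identity, here `id()` truncated to 12 bits as an assumption, terminates.

```python
from typing import Any, Callable


class PluggableIdentitySet:
    """Hypothetical analogue of PluggableSet with equalBlock [:a :b | a == b]."""

    def __init__(self, hash_block: Callable[[Any], int]):
        self._hash_block = hash_block
        self._buckets: dict[int, list[Any]] = {}

    def add(self, obj: Any) -> bool:
        """Add obj; answer True if it was not already present (identity check)."""
        bucket = self._buckets.setdefault(self._hash_block(obj), [])
        if any(existing is obj for existing in bucket):
            return False
        bucket.append(obj)  # also keeps obj alive, so its id() stays unique
        return True


def structural_hash(obj: Any) -> int:
    """Models a collection hash that combines its elements' hashes."""
    if isinstance(obj, list):
        h = len(obj)
        for elem in obj:
            h ^= structural_hash(elem)  # recurses into elements: loops on cycles
        return h
    return hash(obj)


def identity_hash(obj: Any) -> int:
    """Identity-based hash truncated to 12 bits, mimicking identityHash."""
    return id(obj) & 0xFFF


def traverse(root: Any, visited: PluggableIdentitySet) -> int:
    """Visit every object reachable from root once; answer how many were seen."""
    count = 0
    stack = [root]
    while stack:
        obj = stack.pop()
        if not visited.add(obj):
            continue  # already seen: this is what breaks cycles
        count += 1
        if isinstance(obj, list):
            stack.extend(obj)
    return count
```

For example, with `a = [1, 2]`, `b = [a]` and then `a.append(b)`, `traverse(a, PluggableIdentitySet(identity_hash))` visits `a`, `b`, `1` and `2` once each, while `traverse(a, PluggableIdentitySet(structural_hash))` never gets past hashing `a`. Collisions from the truncated identity hash only lengthen buckets; correctness comes from the `is` check.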