Hi Mariano,

On Tue, Dec 13, 2011 at 12:37 AM, Mariano Martinez Peck <
marianop...@gmail.com> wrote:

>
>
> On Tue, Dec 13, 2011 at 8:43 AM, Michael Roberts <m...@mjr104.co.uk>wrote:
>
>> Hi Mariano, when I read this thread I was a bit confused that you wanted
>> an IdentitySet that used #hash. From that statement it sounded like you
>> just wanted a Set. This would allow any object to define its own hash and
>> importantly what equality means with #=. So if you want to delegate that to
>> the object use a Set.
>>
>>
> No, I cannot use a Set because I cannot have repetitions. This that I am
> asking is what we do in Fuel serializer while traversing the object graph.
> Each analyzed object is put in an IdentityDictionary (but the question is
> the same for IdentitySet) as key. To avoid cycles, I need to put each
> object only once. Since graphs can be very big, such dict could have lots
> of objects.
>
>
>> However as the thread has gone on perhaps you want the identity
>> relationship but you just wanted a bigger identity hash space?
>>
>
> Yes, exactly. I were thinking if there could be a way (maybe...that's why
> I am asking) of improve its performance considering that I could have much
> more objects than 2^13. In other words, I wanted to see if I could avoid
> colisions.
>

You can assume that certain properties of objects will not change during
serialization, for example the class of objects, the basic size of objects.
 So you can construct a valid extended identity hash from these properties.
 For example

fuelSerializationHash
    ^self identityHash + (self class identityHash bitShift: 12) + (self
basicSize bitShift: 24)

In general this idea may not work because of meta-primitives like
changeClassTo:, which would change the hash.  But in Fuel's case I think
it's safe to assume that objects won't change class or size during
serialization.

HTH



>
>
>>
>> IdentitySets are fast because they bypass any delegation. Once you have
>> seen the object in your traversal (common usage pattern) that's it. You
>> grab it's identity which is a pretty low level thing to do.
>>
>
> Ok...it is a tradeoff. If I use #identityHash it is fast because there is
> no delegation and it is almost an immediate primitive. But I gues it will
> be slow if there are lots of colisions. Not using #identityHash but
> something else maybe could decrease maybe the amount of colisions, but
> maybe with the delegation it will gets slower.
>
> I will try with Levente idea of what it is done in SystemTracer: use as a
> hash the identityHash of the object mixed with the identityHash of its
> class. Maybe that decreases the colisions and at the same time I don't pay
> delegation (#class is special bytecode, so nothing, and ok..there are 2
> sends to #identityhash but I don't thinnk it is that much).
>
> Anyway, thanks for the interesting post, I always learn :)
>
>
>>
>> As for what happens with collections who knows. Depends. Relying on
>> identity set semantics for a collection is easy. Set semantics is not so.
>> Remember that both hash and equals are important to know if the set already
>> contains the element. Depending on the collection implementation both of
>> these could be composite in terms of the parts. Who knows where you end or
>> how long it takes. I.e. if it is a function of the size of the collection
>> and further collections are composite....
>>
>>
> yes...
>
>
>> Cheers
>> Mike
>>
>>
>>
>> On Tuesday, December 13, 2011, Carlo <snoob...@yahoo.ie> wrote:
>> > Hi Mariano
>> > I'm no expert either ;)
>> > Without having access to the exact code it would look like either you
>> have a collection that references itself (which would break all collection
>> implementations) or maybe the tests have just slowed down to the point
>> where you think it's 'crashed'.
>> > Do you have anymore info or perhaps which methods you changed on
>> IdentitySet?
>> > Cheers
>> > Carlo
>> > On 13 Dec 2011, at 1:57 AM, Mariano Martinez Peck wrote:
>> >
>> >
>> > On Tue, Dec 13, 2011 at 12:32 AM, Mariano Martinez Peck <
>> marianop...@gmail.com> wrote:
>> >>
>> >>
>> >> On Mon, Dec 12, 2011 at 1:56 PM, Carlo <snoob...@yahoo.ie> wrote:
>> >>>
>> >>> Hi
>> >>> Wouldn't the fact that you use hash cause potential loops now? e.g.
>> collection refers to another object that refers to first collection. -->
>> aCollection>>hash references an item which causes this current collection's
>> hash to be called again?
>> >>
>> >> Hi Carlo. I am still newbie with Collections but I think I am having
>> exactly that problem. During my tests, it loops in Collection >> #hash
>> when sending #hash to its elements.
>> >> Sorry, but I couldn't undertand what is the cause of the problem?  why
>> it doesn't work while it does using #identityHash?  could you elaborate?
>> >>
>> >
>> > Well, now I understood, and I understand also why it doesn't happen
>> with #identityHash. But what happens then with regular Dictionaries using
>> #hash? why it doesn't happen there?
>> >
>> >>
>> >> thanks
>> >>
>> >>>
>> >>> identityHash is deterministic in this case.
>> >>> Does this help?
>> >>> Cheers
>> >>> Carlo
>> >>> On 12 Dec 2011, at 10:58 AM, Mariano Martinez Peck wrote:
>> >>> Hi guys. I hope this is not a very stupid question. Background: in
>> Fuel we have a IdentityDictionary where we put each of the objects we find
>> while traversing the graph. We need to use IdentitySet because we cannot
>> have repetitions (and to avoid loops) so we NEED to use #==. In such
>> dictionary we put ALL objects of the graph, so it can be very big. Since
>> IdentitySet uses #identityHash, it means it will be using those ONLY 12
>> bits in the object header. It means that we have 2^12 = 4096  different
>> values.
>> >>>
>> >>> Question:  having explained the previous, I wanted to be able to use
>> #hash rather than #identityHash since several classes implement #hash and
>> hence I thought that using #hash I could have less colisions and hence a
>> better performance. I tried to make a subclass of IdentitySet that uses
>> #hash rather than #identityHash but my image freezes. I also tried
>> something like:
>> >>>
>> >>> set := PluggableSet new.
>> >>>     set hashBlock: [ :elem | elem hash ].
>> >>>     set equalBlock: [ :a :b | a == b ].
>> >>>
>> >>> But it doesn't work either. I works with simple tests in a workspace
>> but when I run the full tests of Fuel, it enters in a loop in the method
>> #hash of Collection..
>> >>>
>> >>> Anyway, my question is, should that work? if not, what is the exact
>> reason?
>> >>>
>> >>> Thanks in advance,
>> >>>
>> >>> --
>> >>> Mariano
>> >>> http://marianopeck.wordpress.com
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Mariano
>> >> http://marianopeck.wordpress.com
>> >>
>> >
>> >
>> >
>> > --
>> > Mariano
>> > http://marianopeck.wordpress.com
>> >
>> >
>> >
>>
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>
>


-- 
best,
Eliot

Reply via email to