Re: Default hash code generation strategy for new binary objects

Alexander Paschenko Wed, 28 Sep 2016 09:54:19 -0700

Denis,

That's not what I was asking about.
The current DML implementation allows dynamic instantiation of keys.
In other words, the user does not have to provide a value for the
object-typed _key column; instead, they may supply just the field values
from which the _key will be dynamically instantiated (binary built). And
that's the whole point of this discussion as I see it: what do we do when
we have built a classless binary key ourselves inside the SQL engine and
don't know how to compute a hash code for it?
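
To make the question concrete, here is a minimal sketch in plain Java (no
Ignite APIs). It assumes the classless key is represented as a map of field
names to values and that an ordered list of hash fields is configured
somewhere. FieldListHash and hashOf are hypothetical names used only for
illustration; the 31-multiplier scheme is the one IDEs typically generate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper, not part of Ignite: hashes a classless key by a fixed field list. */
final class FieldListHash {
    private FieldListHash() {}

    /**
     * Combines the hash codes of the named fields, in the order given by
     * hashFields, using the 31-multiplier scheme. Fields that are missing
     * or null contribute 0; fields not named in hashFields are ignored.
     */
    static int hashOf(Map<String, Object> fields, List<String> hashFields) {
        int result = 1;
        for (String name : hashFields)
            result = 31 * result + Objects.hashCode(fields.get(name));
        return result;
    }

    public static void main(String[] args) {
        // Field values as they might arrive from an INSERT statement (assumed shape).
        Map<String, Object> key = new HashMap<>();
        key.put("orgId", 7);
        key.put("name", "acme");
        key.put("cachedLabel", "ignored"); // not in the hash field list

        System.out.println(hashOf(key, Arrays.asList("orgId", "name")));
    }
}
```

Because the loop follows the configured field list rather than the map's
iteration order, two keys built from the same field values get the same hash
code regardless of the order in which the fields were supplied.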


- Alex

2016-09-28 19:48 GMT+03:00 Denis Magda <dma...@gridgain.com>:
> Alexander,
>
> As I understand it, if we have a key without a class, it will be constructed
> using a BinaryBuilder instance, and it’s the user's responsibility to set the
> hash code at the end with the BinaryBuilder.hashCode method. Of course, all
> these cases must be well documented in both the Javadoc API and the Apache
> Ignite documentation.
>
> —
> Denis
>
>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko 
>> <alexander.a.pasche...@gmail.com> wrote:
>>
>> Dmitry, Denis,
>>
>> OK, but I think it's necessary to address also the cases when there's
>> no actual class for the key, and its fields are simply declared in
>> XML. In this case, there are no fields to be marked transient. What do
>> we do then? List transient fields in XML separately?
>>
>> - Alex
>>
>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>>> Agree with Denis.
>>>
>>>   - by default, all non-transient key fields should participate in the
>>>   hashcode generation
>>>   - when working on DDL, then the primary key fields should participate in
>>>   the hashcode
>>>   - we should add a resolver to override the default behavior (please
>>>   propose the interface in Jira)
>>>   - we should print out a warning, only once per type, that the hashcode
>>>   has been automatically generated, stating which fields and which formula
>>>   were used
>>>
>>> D.
>>>
>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dma...@gridgain.com> wrote:
>>>
>>>> Hi Alexander,
>>>>
>>>> Vladimir’s proposal sounds reasonable to me. However, we must keep one
>>>> important thing in mind. Binary objects were designed to address the
>>>> following disadvantages of a regular serializer, such as the optimized
>>>> marshaller:
>>>> 1) The necessity to deserialize an object on the server side every time
>>>> it’s needed.
>>>> 2) The necessity to hold an object in both serialized and deserialized
>>>> forms on the server node.
>>>> 3) The necessity to restart the whole cluster each time an object version
>>>> is changed (a new field is added or an old one is removed).
>>>> If step 3 (a cluster restart) is needed for the default implementation of
>>>> the binary resolver just because the resolver has to consider new fields
>>>> or ignore old ones, then such an implementation sucks. Overall, the
>>>> default implementation should use reflection to go over all the fields a
>>>> key has, ignoring the ones that are marked with the “transient” keyword.
>>>> If a user wants to control the default resolver's logic, then he can
>>>> label all the fields that mustn’t be part of the final hash code value
>>>> with “transient”. This has to be well documented for sure.
>>>>
>>>> Makes sense?
>>>>
>>>> —
>>>> Denis
>>>>
>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <
>>>> alexander.a.pasche...@gmail.com> wrote:
>>>>>
>>>>> Hello Igniters,
>>>>>
>>>>> As DML support is near, it's critical that we agree on how we generate
>>>>> hash codes for new keys in the presence of the binary marshaller.
>>>>> Actually, this discussion isn't new - please see its beginning here:
>>>>>
>>>>> http://apache-ignite-developers.2346864.n4.nabble.com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-at-the-same-partition-by-default-td8042.html
>>>>>
>>>>> Still, I'm creating this new thread to make getting to the final
>>>>> solution as simple and fast as possible.
>>>>>
>>>>> Let me remind everyone that the approach that drew the least criticism
>>>>> was the one proposed by Vladimir Ozerov:
>>>>>
>>>>> <quote>
>>>>> I think we can do the following:
>>>>> 1) Add a "has hash code" flag as Denis suggested.
>>>>> 2) If an object without a hash code is put into the cache, throw an
>>>>> exception.
>>>>> 3) Add the *BinaryEqualsHashCodeResolver* interface.
>>>>> 4) Add a default implementation which will auto-generate the hash code.
>>>>> *Print a warning when auto-generation occurs*, so that the user is aware
>>>>> that he is likely to have problems with normal GETs/PUTs.
>>>>> 5) Add another implementation which will use an encoded string to
>>>>> calculate the hash code. E.g.
>>>>> *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>>>>> Originally proposed by Yakov some time ago.
>>>>> </quote>
>>>>>
>>>>> After that, Sergi suggested that instead of a "formula" we keep just a
>>>>> list of the "fields" that participate in hash code evaluation, and
>>>>> with that list, we simply calculate the hash code just like an IDE
>>>>> does - with all its bit shifts and additions.
>>>>>
>>>>> I'm planning on settling on this combined Vlad-Sergi approach.
>>>>> Any objections?
>>>>>
>>>>> And an extra question I have: Vlad, you suggest both that we throw an
>>>>> exception when the hash code is absent and that we might generate it as
>>>>> a last resort. Do I understand correctly that you suggest generating a
>>>>> random code only in the context of SQL, but throwing an exception for
>>>>> keys without codes on an ordinary put?
>>>>>
>>>>> And yes, built-in hash codes for JDK types are supported, as are
>>>>> items 1-2 from Vlad's list (the issue IGNITE-3633 for the flag and its
>>>>> presence check is already fixed).
>>>>>
>>>>> - Alex
>>>>
>>>>
>
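
The defaults discussed in the quoted messages (all key fields participate, a
resolver can override the behavior, a warning is printed once per type) can be
sketched in plain Java as follows. This is only an illustration under stated
assumptions: KeyHashResolver, DefaultKeyHashResolver, resolveHash and
warnedTypeCount are hypothetical names, not the interface to be proposed in
Jira, and sorting fields by name to get a stable order is an assumption of
this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical resolver contract; users could plug in their own implementation. */
interface KeyHashResolver {
    int resolveHash(String typeName, Map<String, Object> fields);
}

/** Hypothetical default: hashes all fields and warns once per type name. */
final class DefaultKeyHashResolver implements KeyHashResolver {
    // Type names for which the warning has already been printed.
    private final Set<String> warnedTypes = ConcurrentHashMap.newKeySet();

    @Override
    public int resolveHash(String typeName, Map<String, Object> fields) {
        // Sort by field name so the result does not depend on map iteration order.
        Map<String, Object> sorted = new TreeMap<>(fields);

        // Set.add returns true only the first time a type name is seen.
        if (warnedTypes.add(typeName))
            System.err.println("Hash code for type '" + typeName
                + "' is auto-generated from fields " + sorted.keySet()
                + " using h = 31 * h + hash(field)");

        int h = 1;
        for (Object value : sorted.values())
            h = 31 * h + Objects.hashCode(value);
        return h;
    }

    /** Number of distinct types that have triggered the warning so far. */
    int warnedTypeCount() {
        return warnedTypes.size();
    }

    public static void main(String[] args) {
        DefaultKeyHashResolver resolver = new DefaultKeyHashResolver();
        Map<String, Object> key = new HashMap<>();
        key.put("a", 1);
        key.put("b", 2);
        // The warning is printed on the first call only.
        System.out.println(resolver.resolveHash("PersonKey", key));
        System.out.println(resolver.resolveHash("PersonKey", key));
    }
}
```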
