Re: Default hash code generation strategy for new binary objects

Alexander Paschenko Wed, 28 Sep 2016 15:48:22 -0700

But what if the user works from some kind of console and types the
queries in full as text, without binding params via JDBC or anything
similar? What if there's no binary object? I don't see why we should
keep the user from ordinary cache gets in this case. I really like
the idea of supplying the values of distinct fields, thus freeing the
user from the need to mess with objects and builders, AND then just
calculating the hash code as suggested before - say, by explicitly
listing the participating fields in XML, or by marking excluded fields
with the transient keyword or some annotation.
Actually, I believe that's the only case when we need to generate any
hash codes - when the class is present, we can just take the hash code
from its own hashCode() implementation. When there's no class, we
generate. And all that is solely for SQL. For the rest - just throw an
exception when no hash code has been set manually for a binary object.
I don't see why we should try to generate anything when the user is
already using Ignite in full, not just via the limited SQL interface.
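
To make the "list the participating fields" idea concrete, here is a
minimal sketch in plain Java. It is only an illustration under stated
assumptions: FieldListHashCode and its hash method are hypothetical
names, not part of the Ignite API, and the 31 * h + hash(field)
combination is the usual IDE-style formula rather than anything agreed
in this thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper for illustration only; not part of the Ignite API.
final class FieldListHashCode {
    private FieldListHashCode() {}

    /**
     * Combines the values of the listed fields, in list order, the way an
     * IDE-generated hashCode() usually does: h = 31 * h + hash(value).
     * Fields that are not listed do not influence the result.
     */
    static int hash(Map<String, ?> fieldValues, List<String> hashFields) {
        int result = 1;
        for (String name : hashFields) {
            if (!fieldValues.containsKey(name))
                throw new IllegalArgumentException("Missing key field: " + name);
            result = 31 * result + Objects.hashCode(fieldValues.get(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Field values as they might arrive from an INSERT typed in a console.
        Map<String, Object> key = new LinkedHashMap<>();
        key.put("id", 42);
        key.put("region", "EU");
        key.put("comment", "not part of the hash");

        // Only the fields declared as participating are hashed.
        System.out.println(hash(key, Arrays.asList("id", "region")));
    }
}
```

With such a helper, two keys that differ only in a non-participating
field get the same hash code, which is what the "transient" marking is
meant to achieve when a class is present.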


2016-09-29 0:31 GMT+03:00 Denis Magda <[email protected]>:
> Hmm, this is a good question.
>
> If a user doesn’t provide a _key when an INSERT is executed, for me it
> means that he is not going to use the key later for cache.get/put,
> DELETE, UPDATE and other possible operations, simply because he doesn’t
> know how to reconstruct the key back in his code. If he wants to use the
> primary key in the rest of the operations, then he must provide it at
> INSERT time.
>
> Do we need this key only for the case when an object is being inserted
> into a cache? If so, I would auto-generate a key using ‘long’ as the key
> type. I do remember that we provided auto-generation for the Spark
> module in some way that may be useful here.
>
> —
> Denis
>
>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko 
>> <[email protected]> wrote:
>>
>> Denis,
>>
>> That's not what I was asking about.
>> Currently the DML implementation allows for dynamic instantiation of
>> keys. In other words, the user does not have to provide a value for the
>> object-typed _key column - instead, he may supply just the field values
>> from which _key will be dynamically instantiated/binary built. And
>> that's the whole point of this discussion as I see it: what to do when
>> we have a binary-built classless key that we construct ourselves inside
>> the SQL engine and don't know how to compute a hash code for it?
>>
>> - Alex
>>
>> 2016-09-28 19:48 GMT+03:00 Denis Magda <[email protected]>:
>>> Alexander,
>>>
>>> I guess that if we have a key without a class, then it will be
>>> constructed using a BinaryBuilder instance, and it’s the user's
>>> responsibility to set the hash code at the end with the
>>> BinaryBuilder.hashCode method. Sure, all these cases must be
>>> well-documented in both the Javadoc API and the Apache Ignite
>>> documentation.
>>>
>>> —
>>> Denis
>>>
>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko 
>>>> <[email protected]> wrote:
>>>>
>>>> Dmitry, Denis,
>>>>
>>>> OK, but I think it's necessary to address also the cases when there's
>>>> no actual class for the key, and its fields are simply declared in
>>>> XML. In this case, there are no fields to be marked transient. What do
>>>> we do then? List transient fields in XML separately?
>>>>
>>>> - Alex
>>>>
>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
>>>>> Agree with Denis.
>>>>>
>>>>>  - by default, all non-transient key fields should participate in the
>>>>>  hashcode generation
>>>>>  - when working on DDL, then the primary key fields should participate in
>>>>>  the hashcode
>>>>>  - we should add a resolver to override the default behavior (please
>>>>>  propose the interface in Jira)
>>>>>  - we should print out a warning, only once per type, that the hashcode
>>>>>  has been automatically generated, stating which fields and which
>>>>>  formula were used
>>>>>
>>>>> D.
>>>>>
>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <[email protected]> wrote:
>>>>>
>>>>>> Hi Alexander,
>>>>>>
>>>>>> Vladimir’s proposal sounds reasonable to me. However, we must keep in
>>>>>> mind one important thing. Binary objects were designed to address the
>>>>>> following disadvantages that a regular serializer, like the optimized
>>>>>> marshaller, has:
>>>>>> 1. necessity to deserialize an object on the server side every time
>>>>>>    it’s needed.
>>>>>> 2. necessity to hold an object in both serialized and deserialized
>>>>>>    forms on the server node.
>>>>>> 3. necessity to restart the whole cluster each time an object version
>>>>>>    is changed (a new field is added or an old one is removed).
>>>>>> If step 3 has to be performed for a default implementation of the
>>>>>> binary resolver just because the resolver has to consider new fields
>>>>>> or ignore old ones, then such an implementation sucks. Overall, the
>>>>>> default implementation should use reflection to go over all the
>>>>>> fields a key has, ignoring the ones that are marked with the
>>>>>> “transient” keyword. If a user wants to control the default
>>>>>> resolver's logic, then he can label all the fields that mustn’t be
>>>>>> part of the final hash code value with “transient”. This has to be
>>>>>> well-documented for sure.
>>>>>>
>>>>>> Makes sense?
>>>>>>
>>>>>> —
>>>>>> Denis
>>>>>>
>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko <[email protected]> wrote:
>>>>>>>
>>>>>>> Hello Igniters,
>>>>>>>
>>>>>>> As DML support is near, it's critical that we agree on how we generate
>>>>>>> hash codes for new keys in presence of binary marshaller. Actually,
>>>>>>> this discussion isn't new - please see its beginning here:
>>>>>>>
>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-at-the-same-partition-by-default-td8042.html
>>>>>>>
>>>>>>> Still, I'm creating this new thread to make getting to the final
>>>>>>> solution as simple and fast as possible.
>>>>>>>
>>>>>>> I remind everyone that the approach that drew the least criticism was
>>>>>>> the one proposed by Vladimir Ozerov:
>>>>>>>
>>>>>>> <quote>
>>>>>>> I think we can do the following:
>>>>>>> 1) Add "has hash code" flag as Denis suggested.
>>>>>>> 2) If object without a hash code is put to cache, throw an exception.
>>>>>>> 3) Add *BinaryEqualsHashCodeResolver* interface.
>>>>>>> 4) Add default implementation which will auto-generate hash code.
>>>>>>> *Print a warning when auto-generation occurs*, so that user is aware
>>>>>>> that he is likely to have problems with normal GETs/PUTs.
>>>>>>> 5) Add another implementation which will use encoded string to
>>>>>>> calculate a hash code. E.g.
>>>>>>> *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>>>>>>> Originally proposed by Yakov some time ago.
>>>>>>> </quote>
>>>>>>>
>>>>>>> After that, Sergi suggested that instead of a "formula" we keep just a
>>>>>>> list of the "fields" that participate in hash code evaluation, and
>>>>>>> with that list, we simply calculate hash code just like IDE does -
>>>>>>> with all its bit shifts and additions.
>>>>>>>
>>>>>>> I'm planning on settling down with this combined Vlad-Sergi approach.
>>>>>>> Any objections?
>>>>>>>
>>>>>>> And an extra question I have: Vlad, you suggest that we both throw an
>>>>>>> exception on hash code absence and that we might generate it as the
>>>>>>> last resort. Do I understand you correctly that you suggest generating
>>>>>>> a random code only in the context of SQL, but throwing an exception for
>>>>>>> keys without codes on an ordinary put?
>>>>>>>
>>>>>>> And yes, built-in hash codes for JDK types are supported, as are
>>>>>>> items 1-2 from Vlad's list (the already fixed issue IGNITE-3633
>>>>>>> covers the flag and its presence check).
>>>>>>>
>>>>>>> - Alex
>>>>>>
>>>>>>
>>>
>
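
For reference, the resolver with a default auto-generating
implementation discussed in the quoted messages could be sketched
roughly as follows. This is an illustration only: KeyHashCodeResolver,
DefaultKeyHashCodeResolver and their methods are hypothetical names, not
the interface that was eventually proposed in Jira. Sorting the fields
by name and writing the warning to System.err are assumptions made so
that the example is deterministic and self-contained.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical resolver shape for illustration; not the Ignite API.
interface KeyHashCodeResolver {
    int computeHashCode(String typeName, Map<String, ?> fields);
}

// Default behavior from the thread: hash every field that is not excluded
// (the classless analogue of "transient") and warn once per type.
final class DefaultKeyHashCodeResolver implements KeyHashCodeResolver {
    private final Set<String> excluded;
    private final Set<String> warnedTypes = ConcurrentHashMap.newKeySet();

    DefaultKeyHashCodeResolver(Set<String> excludedFields) {
        this.excluded = new HashSet<>(excludedFields);
    }

    @Override
    public int computeHashCode(String typeName, Map<String, ?> fields) {
        // Assumption: sort by field name so the order in which the values
        // were supplied does not affect the result.
        Map<String, Object> sorted = new TreeMap<>(fields);
        sorted.keySet().removeAll(excluded);

        // Set.add returns true only the first time, so each type warns once.
        if (warnedTypes.add(typeName))
            System.err.println("Hash code for type '" + typeName
                + "' is auto-generated from fields " + sorted.keySet()
                + " using h = 31 * h + hash(field)");

        int h = 1;
        for (Object value : sorted.values())
            h = 31 * h + Objects.hashCode(value);
        return h;
    }

    int warnedTypeCount() {
        return warnedTypes.size();
    }

    public static void main(String[] args) {
        Set<String> excludedFields = new HashSet<>();
        excludedFields.add("comment");
        DefaultKeyHashCodeResolver resolver = new DefaultKeyHashCodeResolver(excludedFields);

        Map<String, Object> key = new LinkedHashMap<>();
        key.put("region", "EU");
        key.put("id", 42);
        key.put("comment", "ignored");

        // The second call reuses the same type, so no second warning is printed.
        System.out.println(resolver.computeHashCode("PersonKey", key));
        System.out.println(resolver.computeHashCode("PersonKey", key));
    }
}
```

A formula-based or field-list-based resolver, as in item 5 of the quoted
list, would be another implementation of the same interface.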
