You need a hash code only for the INSERT operation, right?

--
Denis

> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko
> <alexander.a.pasche...@gmail.com> wrote:
>
> But what if the user works from some kind of console and just types
> the queries as text in full and does not bind params via JDBC or
> something alike? What if there's no binary object? I don't see why we
> should keep the user from usual cache gets in this case. I really like
> the idea of supplying the values of distinct fields, thus freeing the
> user of the need to mess with objects and builders, AND then just
> calculating the hash code as suggested before - say, via explicitly
> listing participating fields in XML or by marking them with the
> transient keyword or some annotation.
> Actually, I believe that's the only case when we need to generate any
> hash codes - when the class is present, we can just get the hash code
> from its own hashCode implementation. When there's no class, we generate.
> And all that is solely for SQL. For the rest - just throw an exception
> when there's no hash code manually set for the binary object. I don't
> see why we should try to generate anything when the user is already
> using Ignite in full, not just via the limited interface of SQL.
>
> 2016-09-29 0:31 GMT+03:00 Denis Magda <dma...@gridgain.com>:
>> Hmm, this is a good question.
>>
>> If a user doesn’t provide a _key when an INSERT is executed, to me it means
>> that he is not going to use the key later for cache.get/put, DELETE, UPDATE
>> and other possible operations, simply because he doesn’t know how to
>> reconstruct the key back in his code. If he wants to use the primary key in
>> the rest of the operations then he must provide it at INSERT time.
>>
>> Do we need this key only for the case when an object is being inserted into
>> a cache? If so, I would auto-generate a key using ‘long’ as the key type. I
>> do remember that we provided auto-generation for the Spark module in some
>> way that may be useful here.
>>
>> --
>> Denis
>>
>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko
>>> <alexander.a.pasche...@gmail.com> wrote:
>>>
>>> Denis,
>>>
>>> That's not what I was asking about.
>>> Currently the DML implementation allows for dynamic instantiation of keys.
>>> In other words, the user does not have to provide a value for the
>>> object-typed _key column - instead, he may supply just field values based
>>> on which _key will be dynamically instantiated/binary built. And that's the
>>> whole point of this discussion as I see it: what to do when we have a
>>> binary built classless key that we build ourselves inside the SQL engine
>>> and don't know how to compute a hash code for it?
>>>
>>> - Alex
>>>
>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dma...@gridgain.com>:
>>>> Alexander,
>>>>
>>>> As I guess, if we have a key without a class then it will be constructed
>>>> using a BinaryBuilder instance, and it’s the user's responsibility to set
>>>> the hash code at the end with the BinaryBuilder.hashCode method. Sure, all
>>>> these cases must be well-documented in both the Java Doc API and the
>>>> Apache Ignite documentation.
>>>>
>>>> --
>>>> Denis
>>>>
>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko
>>>>> <alexander.a.pasche...@gmail.com> wrote:
>>>>>
>>>>> Dmitry, Denis,
>>>>>
>>>>> OK, but I think it's also necessary to address the cases when there's
>>>>> no actual class for the key, and its fields are simply declared in
>>>>> XML. In this case, there are no fields to be marked transient. What do
>>>>> we do then? List transient fields in XML separately?
>>>>>
>>>>> - Alex
>>>>>
>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>>>>>> Agree with Denis.
>>>>>>
>>>>>> - by default, all non-transient key fields should participate in the
>>>>>> hashcode generation
>>>>>> - when working on DDL, the primary key fields should participate in
>>>>>> the hashcode
>>>>>> - we should add a resolver to override the default behavior (please
>>>>>> propose the interface in Jira)
>>>>>> - we should print out a warning, only once per type, that the hashcode
>>>>>> has been automatically generated, stating which fields and which
>>>>>> formula were used
>>>>>>
>>>>>> D.
>>>>>>
>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda <dma...@gridgain.com> wrote:
>>>>>>
>>>>>>> Hi Alexander,
>>>>>>>
>>>>>>> Vladimir’s proposal sounds reasonable to me. However, we must keep in
>>>>>>> mind one important thing. Binary objects were designed to address the
>>>>>>> following disadvantages that a regular serializer, like the optimized
>>>>>>> marshaller, has:
>>>>>>> 1. necessity to deserialize an object on the server side every time
>>>>>>> it’s needed.
>>>>>>> 2. necessity to hold an object in both serialized and deserialized
>>>>>>> forms on the server node.
>>>>>>> 3. necessity to restart the whole cluster each time an object version
>>>>>>> is changed (a new field is added or an old one is removed).
>>>>>>> If it will be needed to perform step 3 for a default implementation of
>>>>>>> the binary resolver just because the resolver has to consider new
>>>>>>> fields or ignore old ones then such an implementation sucks. Overall,
>>>>>>> the default implementation should use reflection, going over all the
>>>>>>> fields a key has and ignoring the ones that are marked with the
>>>>>>> “transient” keyword. If a user wants to control the default resolver's
>>>>>>> logic then he can label all the fields that mustn’t be part of the
>>>>>>> final hash code value with “transient”. This has to be well-documented
>>>>>>> for sure.
>>>>>>>
>>>>>>> Makes sense?
>>>>>>>
>>>>>>> --
>>>>>>> Denis
>>>>>>>
>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko
>>>>>>>> <alexander.a.pasche...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hello Igniters,
>>>>>>>>
>>>>>>>> As DML support is near, it's critical that we agree on how we generate
>>>>>>>> hash codes for new keys in the presence of the binary marshaller.
>>>>>>>> Actually, this discussion isn't new - please see its beginning here:
>>>>>>>>
>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored-at-the-same-partition-by-default-td8042.html
>>>>>>>>
>>>>>>>> Still, I'm creating this new thread to make getting to the final
>>>>>>>> solution as simple and fast as possible.
>>>>>>>>
>>>>>>>> I remind everyone that the approach that has got the least criticism
>>>>>>>> was the one proposed by Vladimir Ozerov:
>>>>>>>>
>>>>>>>> <quote>
>>>>>>>> I think we can do the following:
>>>>>>>> 1) Add "has hash code" flag as Denis suggested.
>>>>>>>> 2) If object without a hash code is put to cache, throw an exception.
>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver* interface.
>>>>>>>> 4) Add default implementation which will auto-generate hash code.
>>>>>>>> *Print a warning when auto-generation occurs*, so that user is aware
>>>>>>>> that he is likely to have problems with normal GETs/PUTs.
>>>>>>>> 5) Add another implementation which will use encoded string to
>>>>>>>> calculate a hash code. E.g.
>>>>>>>> *new BinaryEqualsHashCodeResolver("{a} * 31 + {b}")*.
>>>>>>>> Originally proposed by Yakov some time ago.
>>>>>>>> </quote>
>>>>>>>>
>>>>>>>> After that, Sergi suggested that instead of a "formula" we keep just a
>>>>>>>> list of the "fields" that participate in hash code evaluation, and
>>>>>>>> with that list, we simply calculate the hash code just like an IDE
>>>>>>>> does - with all its bit shifts and additions.
>>>>>>>>
>>>>>>>> I'm planning on settling on this combined Vlad-Sergi approach.
>>>>>>>> Any objections?
>>>>>>>>
>>>>>>>> And an extra question I have: Vlad, you suggest that we both throw an
>>>>>>>> exception on hash code absence and that we might generate it as the
>>>>>>>> last resort. Do I understand you correctly that you suggest generating
>>>>>>>> a random code only in the context of SQL, but throwing an exception
>>>>>>>> for keys without codes on an ordinary put?
>>>>>>>>
>>>>>>>> And yes, built-in hash codes for JDK types are supported, as well as
>>>>>>>> items 1-2 from Vlad's list (the already fixed issue IGNITE-3633 covers
>>>>>>>> the flag and its presence check).
>>>>>>>>
>>>>>>>> - Alex
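
For illustration only, here is a rough sketch of the field-list idea from the thread: keep a list of the field names that participate, and combine their hash codes the way IDE-generated hashCode() methods commonly do (multiply by 31 and add). The class FieldListHash and the method hashOf are hypothetical names, not part of the Ignite API, and the key's field values are modeled as a plain Map rather than a binary object. It is a sketch under those assumptions, not the implementation that was agreed on.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper for illustration; not part of the Ignite API. */
final class FieldListHash {
    private FieldListHash() {
    }

    /**
     * Combines the hash codes of the listed fields, in list order, using
     * result = 31 * result + hash(field). Fields that are not listed do not
     * affect the result. A null value contributes 0. Array values would need
     * Arrays.hashCode instead; this sketch does not handle them specially.
     */
    static int hashOf(Map<String, Object> fieldValues, List<String> hashFields) {
        int result = 1;
        for (String name : hashFields) {
            if (!fieldValues.containsKey(name)) {
                throw new IllegalArgumentException("Missing hash field: " + name);
            }
            result = 31 * result + Objects.hashCode(fieldValues.get(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // A classless key described only by its field values (assumed shape).
        Map<String, Object> key = new LinkedHashMap<>();
        key.put("a", 1);
        key.put("b", "x");
        key.put("c", 99); // not listed below, so it is ignored

        int hash = hashOf(key, Arrays.asList("a", "b"));
        System.out.println("hash = " + hash);
    }
}
```

With this shape, two keys that agree on the listed fields get the same hash code regardless of the other fields, which is the property the transient/field-list discussion is after.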