Alexey, in your opinion, what will be faster, the binary array comparison or field-by-field comparison?
On Thu, Sep 29, 2016 at 8:39 AM, Alexey Goncharuk < alexey.goncha...@gmail.com> wrote: > Folks, let me point out a few obvious (or not) things > > A set of fields participating in hashCode and equals is impossible to > change without cluster restart. Imagine a new client adding a field F to > key structure (A, B) so that new key is (A, B, F). In this case key (1, 2, > 0) will be treated as a different key w/respect to (1, 2, 3), while the new > client expects them to be the same. I think the set of fields defining the > key _must_ be used in both hash code and equals calculation and cannot > change over time. Having said that, we can use binary array comparison as > object equality test. Presence of new 'garbage' fields in key looks useless > to me. > > So, we should support: > 1) All the things Dmitriy pointed out, in other words - multi-field key > with affinity key fields > 2) Ability to instantiate the key using a text-only environment > 3) Ability to define the structure of a key using XML or DDL > > I think all of these can be done via combination of HashCodeResolver > interface + default resolver which can read type configuration. > > 2016-09-29 3:54 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>: > > > Guys, > > > > We need to look at 3 cases: > > > > a) key is just one field > > b) key is multiple fields > > c) key is one or multiple fields, with possibility of an alternate > affinity > > key > > > > For (a) and (b), whenever a type is defined in XML, and further in DML, a > > user will specify which fields are part of the key. In that case, we > should > > just grab those fields and calculate the hashcode automatically. > > > > For (c), a user should specify in XML, and further in DML, which fields > > should be used for affinity, in addition to the key fields. In that case, > > we again should grab those fields and calculate the hashcodes for the > > primary key and the affinity key. > > > > I really am not sure if there are other ways of doing it. Am I missing > > something? > > > > D. > > > > > > On Wed, Sep 28, 2016 at 5:14 PM, Denis Magda <dma...@gridgain.com> > wrote: > > > > > Let me show the picture I have in my mind: > > > > > > Primary key is a must for all INSERT and MERGE operations. If it’s not > > set > > > then an INSERT/MERGE fails. > > > If a primary key is a boxed/unboxed primitive (int, Integer, String, > > Date, > > > etc.) then the key value is used for hashCode calculation. At the same > > time > > > the key will be an affinity key. > > > If a primary key is a custom object then it’s value can be passed as a > > > param directly from Java, .Net, C++ in a way like “INSERT (_key, > field1, > > > field2) VALUES (?, val1, val2)”. In this scenario we will call hashCode > > > directly on the key's value. In addition, we will be able to get an > > > affinity key since we know key’s type class descriptor > > > (BinaryClassDescriptor). > > > If a primary key is still a custom key and we want to insert its value > > > from an SQL console, PHP, Tableu, etc. then we can’t pass the key’s > value > > > as is. Here we’re trying to apply a workaround by listing key's fields > in > > > INSERT/MERGE and the task is to properly re-construct the key on our > side > > > using only specific fields. > > > > > > Is my understanding correct for all the bullets above? > > > > > > If so then, yes, I would agree that we need to list these fields in a > > > configuration and the default hash code resolver will use them as well. > > > Moreover, we have to pin point an affinity field. So, the question is > > what > > > the configuration we should use. > > > > > > Community, any other thoughts/ideas? > > > > > > — > > > Denis > > > > > > > On Sep 28, 2016, at 4:16 PM, Alexander Paschenko < > > > alexander.a.pasche...@gmail.com> wrote: > > > > > > > > Also MERGE. > > > > > > > > 2016-09-29 2:10 GMT+03:00 Denis Magda <dma...@gridgain.com>: > > > >> You need a hash code only for INSERT operation, right? > > > >> > > > >> — > > > >> Denis > > > >> > > > >>> On Sep 28, 2016, at 3:47 PM, Alexander Paschenko < > > > alexander.a.pasche...@gmail.com> wrote: > > > >>> > > > >>> But what if the user works from some kind of console and just types > > > >>> the queries as text in full and does not bind params via JDBC or > > > >>> something alike? What if there's no binary object? I don't see why > we > > > >>> should keep the user from usual cache gets in this case. I really > > like > > > >>> the idea of supplying the values of distinct fields, thus freeing > the > > > >>> user of the need to mess with objects and builders, AND then just > > > >>> calculating hash code as suggested before - say, via explicitly > > > >>> listing participating fields in XML or by marking them with > transient > > > >>> keyword or some annotation. > > > >>> Actually, I believe that's the only case when we need to generate > any > > > >>> hash codes - when the class is present, we can just get hash code > > from > > > >>> its implementation of its method. When there's no class, we > generate. > > > >>> And all that is solely for SQL. For the rest - just throw an > > exception > > > >>> when there's no hash code manually set for binary object. I don't > see > > > >>> why we should try to generate anything when the user already is > using > > > >>> Ignite in full, not just via limited interface of SQL. > > > >>> > > > >>> 2016-09-29 0:31 GMT+03:00 Denis Magda <dma...@gridgain.com>: > > > >>>> Hmm, this is a good question. > > > >>>> > > > >>>> If a user doesn’t provide a _key when an INSERT is executed for me > > it > > > means that he is not going to use the key later for cache.get/put, > > DELETE, > > > UPDATE and other possible operation simply because he doesn’t know how > to > > > reconstruct the key back in his code. If he wants to use the primary > key > > in > > > the rest of operations then he must provide it at INSERT time. > > > >>>> > > > >>>> Do we need this key only for a case when an object is being > inserted > > > into a cache? If it’s so I would auto-generate a key using ‘long’ as a > > key > > > type. I do remember that we provided the auto-generation for Spark > module > > > in a some way that may be useful here. > > > >>>> > > > >>>> — > > > >>>> Denis > > > >>>> > > > >>>>> On Sep 28, 2016, at 9:53 AM, Alexander Paschenko < > > > alexander.a.pasche...@gmail.com> wrote: > > > >>>>> > > > >>>>> Denis, > > > >>>>> > > > >>>>> That's not what I was asking about. > > > >>>>> Currently DML implementation allows for dymanic instantiation of > > > keys, > > > >>>>> in other words, user does not have to provide value for > > object-typed > > > >>>>> _key column - instead, he may supply just field values based on > > which > > > >>>>> _key will be dynamically instantiated/binary built. And that's > the > > > >>>>> whole point of this discussion as I see it: what to do when we've > > > >>>>> binary built classless key that we build ourselves inside SQL > > engine > > > >>>>> and don't know how to compute hash code for it? > > > >>>>> > > > >>>>> - Alex > > > >>>>> > > > >>>>> 2016-09-28 19:48 GMT+03:00 Denis Magda <dma...@gridgain.com>: > > > >>>>>> Alexander, > > > >>>>>> > > > >>>>>> As I guess if we have a key without a class then it will be > > > constructed using a BinaryBuilder instance and it’s user responsibility > > to > > > set the has code at the end with BinaryBuilder.hasCode method. Sure, > all > > > this cases must be well-documented in both Java Doc API and Apache > Ignite > > > documentation. > > > >>>>>> > > > >>>>>> — > > > >>>>>> Denis > > > >>>>>> > > > >>>>>>> On Sep 28, 2016, at 9:33 AM, Alexander Paschenko < > > > alexander.a.pasche...@gmail.com> wrote: > > > >>>>>>> > > > >>>>>>> Dmitry, Denis, > > > >>>>>>> > > > >>>>>>> OK, but I think it's necessary to address also the cases when > > > there's > > > >>>>>>> no actual class for the key, and its fields are simply declared > > in > > > >>>>>>> XML. In this case, there are no fields to be marked transient. > > > What do > > > >>>>>>> we do then? List transient fields in XML separately? > > > >>>>>>> > > > >>>>>>> - Alex > > > >>>>>>> > > > >>>>>>> 2016-09-28 4:15 GMT+03:00 Dmitriy Setrakyan < > > dsetrak...@apache.org > > > >: > > > >>>>>>>> Agree with Denis. > > > >>>>>>>> > > > >>>>>>>> - by default, all non-transient key fields should participate > in > > > the > > > >>>>>>>> hashcode generation > > > >>>>>>>> - when working on DDL, then the primary key fields should > > > participate in > > > >>>>>>>> the hashcode > > > >>>>>>>> - we should add a resolver to override the default behavior > > > (please > > > >>>>>>>> propose the interface in Jira) > > > >>>>>>>> - we should print out a warning, only once per type, the the > > > hashcode > > > >>>>>>>> has been automatically generated based on which fields and > which > > > formula > > > >>>>>>>> > > > >>>>>>>> D. > > > >>>>>>>> > > > >>>>>>>> On Tue, Sep 27, 2016 at 5:42 PM, Denis Magda < > > dma...@gridgain.com> > > > wrote: > > > >>>>>>>> > > > >>>>>>>>> Hi Alexander, > > > >>>>>>>>> > > > >>>>>>>>> Vladimir’s proposal sounds reasonable to me. However we must > > > keep in mind > > > >>>>>>>>> one important thing. Binary objects were designed to address > > the > > > following > > > >>>>>>>>> disadvantages a regular serializer, like optimized > marshaller, > > > has: > > > >>>>>>>>> necessity to deserialize an object on a server side every > time > > > it’s needed. > > > >>>>>>>>> necessity to hold an object in both serialized and > deserialized > > > forms on > > > >>>>>>>>> the server node. > > > >>>>>>>>> necessity to restart the whole cluster each time an object > > > version is > > > >>>>>>>>> changed (new field is added or an old one is removed). > > > >>>>>>>>> If it will be needed to perform step 3 for a default > > > implementation of the > > > >>>>>>>>> binary resolver just because the resolver has to consider new > > > fields or > > > >>>>>>>>> ignore old ones then such an implementation sucks. Overall, > the > > > default > > > >>>>>>>>> implementation should use the reflection coming over all the > > > fields a key > > > >>>>>>>>> has ignoring the ones that are marked with “transient” > keyword. > > > If a user > > > >>>>>>>>> wants to control the default resolver's logic then he can > label > > > all the > > > >>>>>>>>> fields that mustn’t be of a final has code value with > > > “transient”. This has > > > >>>>>>>>> to be well-documented for sure. > > > >>>>>>>>> > > > >>>>>>>>> Makes sense? > > > >>>>>>>>> > > > >>>>>>>>> — > > > >>>>>>>>> Denis > > > >>>>>>>>> > > > >>>>>>>>>> On Sep 26, 2016, at 12:40 PM, Alexander Paschenko < > > > >>>>>>>>> alexander.a.pasche...@gmail.com> wrote: > > > >>>>>>>>>> > > > >>>>>>>>>> Hello Igniters, > > > >>>>>>>>>> > > > >>>>>>>>>> As DML support is near, it's critical that we agree on how > we > > > generate > > > >>>>>>>>>> hash codes for new keys in presence of binary marshaller. > > > Actually, > > > >>>>>>>>>> this discussion isn't new - please see its beginning here: > > > >>>>>>>>>> > > > >>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble. > > > >>>>>>>>> com/All-BinaryObjects-created-by-BinaryObjectBuilder-stored- > > > >>>>>>>>> at-the-same-partition-by-default-td8042.html > > > >>>>>>>>>> > > > >>>>>>>>>> Still, I'm creating this new thread to make getting to the > > final > > > >>>>>>>>>> solution as simple and fast as possible. > > > >>>>>>>>>> > > > >>>>>>>>>> I remind everyone that the approach that has got the least > > > critics was > > > >>>>>>>>>> the one proposed by Vladimir Ozerov: > > > >>>>>>>>>> > > > >>>>>>>>>> <quote> > > > >>>>>>>>>> I think we can do the following: > > > >>>>>>>>>> 1) Add "has hash code" flag as Denis suggested. > > > >>>>>>>>>> 2) If object without a hash code is put to cache, throw an > > > exception. > > > >>>>>>>>>> 3) Add *BinaryEqualsHashCodeResolver *interface. > > > >>>>>>>>>> 4) Add default implementation which will auto-generate hash > > > code. *Print > > > >>>>>>>>> a > > > >>>>>>>>>> warning when auto-generation occurs*, so that user is aware > > > that he is > > > >>>>>>>>>> likely to have problems with normal GETs/PUTs. > > > >>>>>>>>>> 5) Add another implementation which will use encoded string > to > > > calculate > > > >>>>>>>>> a > > > >>>>>>>>>> hash code. E.g. *new BinaryEqualsHashCodeResolver("{a} * > 31 + > > > {b}")*. > > > >>>>>>>>>> Originally proposed by Yakov some time ago. > > > >>>>>>>>>> </quote> > > > >>>>>>>>>> > > > >>>>>>>>>> After that, Sergi suggested that instead of a "formula" we > > keep > > > just a > > > >>>>>>>>>> list of the "fields" that participate in hash code > evaluation, > > > and > > > >>>>>>>>>> with that list, we simply calculate hash code just like IDE > > > does - > > > >>>>>>>>>> with all its bit shifts and additions. > > > >>>>>>>>>> > > > >>>>>>>>>> I'm planning on settling down with this combined Vlad-Sergi > > > approach. > > > >>>>>>>>>> Any objections? > > > >>>>>>>>>> > > > >>>>>>>>>> And an extra question I have: Vlad, you suggest that we both > > > throw an > > > >>>>>>>>>> exception on cache code absence and that we might generate > it > > > as the > > > >>>>>>>>>> last resort. Do I understand you correctly that you suggest > > > generating > > > >>>>>>>>>> random code only in context of SQL, but throw exception for > > keys > > > >>>>>>>>>> without codes on ordinary put? > > > >>>>>>>>>> > > > >>>>>>>>>> And yes, built-in hash codes for JDK types are supported as > > > well as > > > >>>>>>>>>> items 1-2 from Vlad's list (there's already fixed issue of > > > IGNITE-3633 > > > >>>>>>>>>> for the flag and its presence check). > > > >>>>>>>>>> > > > >>>>>>>>>> - Alex > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>> > > > >>>> > > > >> > > > > > > > > >