I was thinking about protecting users from doing stupid things, but ok, we can do a broadcast as well.
Sergi 2015-08-08 3:46 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>: > Sergi, > > I personally don't like that for certain types of queries we will be > throwing an exception. > > After analyzing the approaches you suggested, I can think of cases where A > performs better than B, as well as when B performs better than A. > > However, if you prefer B, I don't mind us taking that approach. As you have > mentioned yourself, in case of non-collocated non-affinity-ID queries, you > would require a broadcast which is a performance hit. I still vote that we > take this performance hit and do the broadcast (optimized with batching, of > course), and execute the query instead of throwing an exception. > > D. > > On Fri, Aug 7, 2015 at 3:40 AM, Sergi Vladykin <sergi.vlady...@gmail.com> > wrote: > > > Alexey, > > > > 1. Yes, in my plan it should work exactly like that: if both keys in join > > are affinity keys, then we are fully collocated, if only one then we can > > run join remotely as described, if none of them we will fail to run the > > query. > > > > 2. I mean we don't have values for these affinity keys in our local query > > result to map requests to remote nodes. > > Example: > > Lest say we have 4 partitioned tables: > > - Organization(id) with affinity key `id`. > > - Person(id, orgId, name) with affinity key `orgId` (it means that it > will > > be collocated with `Organization`) > > - Manufacturer(id) with affinity key `id`. > > - Purchase(id, personId, manufId) with affinity key `manufId` (collocated > > with `Manufacturer`) > > > > As you can see `Purchase` has a reference to a `Person` and we may want > to > > join them by this reference in a query like this: > > > > SELECT pe.name FROM Person pe JOIN Purchase pu ON pe.id = pu.personId > > WHERE > > pu.id = ? > > > > as you can see neither `pe.id` nor `pu.personId` is an affinity key > here. > > But if the `Person` has affinity key `id` and thus is not collocated with > > `Organization` > > we can run query on `Purchase`, take value of `personId` and find the > > affinity node to get the needed `Person`. > > > > Of course it is a restriction but there are multiple ways to workaround > it, > > so I don't think it is really a problem: > > > > 1. Use primary key as affinity key if table is used in such joins. This > way > > `Person` still can be joined to `Organization` > > (less effective though) and `Pusrchase` can be joined to `Person` as > well. > > 2. Use denormalization: instread of having `Purchase.personId` store > > `Person` object itself there. > > 3. Introduce another entity which can duplicate data from `Person` but > have > > collocation needed for this failing query: > > For our example it can be an entity PersonForPurchace(id, manufId, name) > > with the same affinity key `manufId` as `Purchase`. > > This way our query can be rewritten in fully collocated style: > > > > SELECT pe.name FROM PersonForPurchace pe JOIN Purchase pu ON pe.id = > > pu.personId AND pu.manufId = pe.manufId WHERE pu.id = ? > > > > Sergi > > > > > > > > > > > > 2015-08-07 11:42 GMT+03:00 Alexey Kuznetsov <akuznet...@gridgain.com>: > > > > > Sergi, > > > > > > Questions about plan "B" :) > > > 1) It is possible to throw exception on query prepare state (fail fast) > > > when > > > we don't know remote affinity key? > > > 2) Could you provide an example when we don't know remote affinity > key? I > > > think we always have some default affinity (no?)? > > > > > > -- > > > Alexey Kuznetsov > > > GridGain Systems > > > www.gridgain.com > > > > > >