Re: Distributive SQL Joins

Sergi Vladykin Mon, 10 Aug 2015 23:28:02 -0700

I was thinking about protecting users from doing stupid things,
but ok, we can do a broadcast as well.


Sergi

2015-08-08 3:46 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:

> Sergi,
>
> I personally don't like that for certain types of queries we will be
> throwing an exception.
>
> After analyzing the approaches you suggested, I can think of cases where A
> performs better than B, as well as when B performs better than A.
>
> However, if you prefer B, I don't mind us taking that approach. As you have
> mentioned yourself, non-collocated queries that join on non-affinity IDs
> would require a broadcast, which is a performance hit. I still vote that we
> take this performance hit and do the broadcast (optimized with batching, of
> course), and execute the query instead of throwing an exception.
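To make the broadcast-with-batching idea concrete, here is a minimal sketch. It assumes a toy in-memory grid where each node holds a dict partition of the remote table keyed by primary key. The names `broadcast_join`, `cluster` and `batch_size` are hypothetical and are not Ignite APIs. Because the owner of each join key is unknown, every batch of lookup keys goes to every node. Batching only reduces the number of requests; it does not avoid contacting all nodes.

```python
def broadcast_join(local_rows, join_col, cluster, batch_size=100):
    """Join local_rows to a remote table whose owning node is unknown for join_col.

    cluster: hypothetical toy grid, node name -> {primary key: row}.
    Returns the joined (local_row, remote_row) pairs and the number of
    (batch, node) requests that a real system would have sent.
    """
    # Distinct lookup keys, in first-seen order.
    keys = list(dict.fromkeys(row[join_col] for row in local_rows))
    found = {}
    requests = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # Broadcast: the same batch is sent to every node.
        for partition in cluster.values():
            requests += 1
            for key in batch:
                if key in partition:
                    found[key] = partition[key]
    joined = [(row, found[row[join_col]]) for row in local_rows if row[join_col] in found]
    return joined, requests
```

With 3 distinct keys, 2 nodes and `batch_size=2`, this sends 2 batches x 2 nodes = 4 requests, compared with 6 when each key is sent on its own.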
>
> D.
>
> On Fri, Aug 7, 2015 at 3:40 AM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
>
> > Alexey,
> >
> > 1. Yes, in my plan it should work exactly like that: if both join keys are
> > affinity keys, the join is fully collocated; if only one is, we can run the
> > join remotely as described; if neither is, we will fail to run the query.
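The three cases in item 1 reduce to a simple decision function. This is a hypothetical sketch of plan "B" as described in the thread, not Ignite's actual query planner. It looks only at whether each join key is its table's affinity key and ignores everything else a real planner would check, such as whether both tables share the same affinity function.

```python
from enum import Enum


class JoinPlan(Enum):
    COLLOCATED = "fully collocated"
    REMOTE_LOOKUP = "run join remotely via the known affinity key"
    UNSUPPORTED = "fail to run the query"


def plan_join(left_key_is_affinity: bool, right_key_is_affinity: bool) -> JoinPlan:
    """Hypothetical plan-"B" rule: classify a join by its two join keys."""
    if left_key_is_affinity and right_key_is_affinity:
        return JoinPlan.COLLOCATED
    if left_key_is_affinity or right_key_is_affinity:
        return JoinPlan.REMOTE_LOOKUP
    # Neither side can be mapped to a node. Under plan "B" this is an error;
    # under the broadcast proposal it would fall back to a broadcast.
    return JoinPlan.UNSUPPORTED
```

Because the rule depends only on schema metadata, a prepare step could reject `UNSUPPORTED` joins before any data is touched, which gives the fail-fast behavior asked about in Alexey's question further down.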
> >
> > 2. I mean that our local query result does not contain values of these
> > affinity keys, so we cannot use them to map requests to remote nodes.
> > Example:
> > Let's say we have 4 partitioned tables:
> > - Organization(id) with affinity key `id`.
> > - Person(id, orgId, name) with affinity key `orgId` (which means that it
> >   will be collocated with `Organization`).
> > - Manufacturer(id) with affinity key `id`.
> > - Purchase(id, personId, manufId) with affinity key `manufId` (collocated
> >   with `Manufacturer`).
> >
> > As you can see, `Purchase` has a reference to a `Person`, and we may want
> > to join them by this reference in a query like this:
> >
> > SELECT pe.name FROM Person pe JOIN Purchase pu ON pe.id = pu.personId
> > WHERE pu.id = ?
> >
> > As you can see, neither `pe.id` nor `pu.personId` is an affinity key here.
> > But if `Person` had affinity key `id` (and thus were not collocated with
> > `Organization`), we could run the query on `Purchase`, take the value of
> > `personId`, and find the affinity node to get the needed `Person`.
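The remote-lookup case can be sketched as follows. The `personId` values from the local `Purchase` result are grouped by their owning node, and only those nodes are contacted. This is a minimal illustration assuming `Person` has affinity key `id`. The three-node topology and the modulo `affinity_node` function are hypothetical stand-ins; a real grid uses its own partition-to-node mapping.

```python
from collections import defaultdict

# Hypothetical three-node grid; not a real cluster topology.
NODES = ["node-0", "node-1", "node-2"]


def affinity_node(key):
    """Hypothetical affinity function: modulo over the node list (illustration only)."""
    return NODES[key % len(NODES)]


def remote_lookup_join(purchase_rows, person_store):
    """Join a local Purchase result to Person, assuming Person's affinity key is `id`.

    person_store maps node name -> {person id: Person row}.
    Returns the joined names (in purchase order) and the nodes that were contacted.
    """
    # Group the needed personId values by the node that owns them.
    by_node = defaultdict(set)
    for pu in purchase_rows:
        by_node[affinity_node(pu["personId"])].add(pu["personId"])
    # One request per owning node instead of a broadcast to all nodes.
    fetched = {}
    for node, ids in by_node.items():
        for pid in ids:
            person = person_store[node].get(pid)
            if person is not None:
                fetched[pid] = person
    names = [fetched[pu["personId"]]["name"] for pu in purchase_rows if pu["personId"] in fetched]
    return names, sorted(by_node)
```

Nodes that own none of the requested `personId` values are never contacted. That is the advantage over the broadcast case, and it works only because the join column's value maps directly to an owning node.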
> >
> > Of course it is a restriction, but there are multiple ways to work around
> > it, so I don't think it is really a problem:
> >
> > 1. Use the primary key as the affinity key if the table is used in such
> > joins. This way `Person` can still be joined to `Organization` (less
> > efficiently, though), and `Purchase` can be joined to `Person` as well.
> > 2. Use denormalization: instead of storing `Purchase.personId`, store the
> > `Person` object itself there.
> > 3. Introduce another entity that duplicates data from `Person` but has the
> > collocation needed for this failing query. For our example, it can be an
> > entity PersonForPurchace(id, manufId, name) with the same affinity key
> > `manufId` as `Purchase`.
> > This way our query can be rewritten in a fully collocated style:
> >
> > SELECT pe.name FROM PersonForPurchace pe JOIN Purchase pu
> > ON pe.id = pu.personId AND pu.manufId = pe.manufId WHERE pu.id = ?
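Workaround 3 avoids cross-node lookups because both tables are partitioned by `manufId`. Every matching pair of rows therefore lives on the same node, and each node can join its own partitions and return partial results. This is a minimal in-memory sketch under that assumption. The two-node topology and the modulo `owner` function are hypothetical and are not how Ignite assigns partitions.

```python
# Hypothetical two-node grid; not a real cluster topology.
NODES = ["node-0", "node-1"]


def owner(manuf_id):
    """Hypothetical affinity function over manufId (illustration only)."""
    return NODES[manuf_id % len(NODES)]


def place(rows):
    """Partition rows across nodes by their manufId affinity key."""
    parts = {n: [] for n in NODES}
    for r in rows:
        parts[owner(r["manufId"])].append(r)
    return parts


def collocated_query(purchases, persons_for_purchase, purchase_id):
    """SELECT pe.name ... ON pe.id = pu.personId AND pu.manufId = pe.manufId WHERE pu.id = ?"""
    pu_parts = place(purchases)
    pe_parts = place(persons_for_purchase)
    result = []
    for node in NODES:
        # Each node joins only its local partitions; no remote lookups are
        # needed because matching rows share manufId and hence the same node.
        for pu in pu_parts[node]:
            if pu["id"] != purchase_id:
                continue
            for pe in pe_parts[node]:
                if pe["id"] == pu["personId"] and pe["manufId"] == pu["manufId"]:
                    result.append(pe["name"])
    return result
```

The extra `pu.manufId = pe.manufId` condition is what guarantees that both sides of a match were placed on the same node.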
> >
> > Sergi
> >
> >
> >
> >
> >
> > 2015-08-07 11:42 GMT+03:00 Alexey Kuznetsov <akuznet...@gridgain.com>:
> >
> > > Sergi,
> > >
> > > Questions about plan "B" :)
> > > 1) Is it possible to throw an exception at the query prepare stage (fail
> > > fast) when we don't know the remote affinity key?
> > > 2) Could you provide an example when we don't know the remote affinity
> > > key? I think we always have some default affinity (no?).
> > >
> > > --
> > > Alexey Kuznetsov
> > > GridGain Systems
> > > www.gridgain.com
> > >
> >
>
