Re: Calcite based SQL query engine. Local queries

Alexey Goncharuk Thu, 07 Nov 2019 00:32:06 -0800

Denis, Stephen,

Running a local query in a broadcast closure won't work on a changing
topology. We specifically added an affinityCall method to the compute API
in order to pin a partition and prevent it from moving or being evicted
during the task execution. Therefore, the query inside an affinityCall is
always executed against a known set of partitions (otherwise the query may
give incorrect results when the topology changes).


I support Igor's question and think that the 'local' flag for the query
should be deprecated and eventually removed. A 'local' query can always be
expressed as a query against a set of partitions. If those partitions are
located on the same node - good, we get fast and correct results. If not -
we may either raise an exception and ask the user to remap the query, or
fall back to a distributed query execution.
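
To illustrate the mapping rule above, here is a minimal sketch in plain
Java. It is not Ignite code: the class PartitionQuerySketch, the Plan and
Mode types, the owners map (partition id -> owning node id) and the
allowFallback switch are all hypothetical names made up for this example.
It only shows the decision itself: one owning node means single-node
execution, several owners mean either an error or a distributed plan.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of mapping a partition-restricted query; not Ignite API. */
final class PartitionQuerySketch {
    /** How the query would be executed. */
    enum Mode { SINGLE_NODE, DISTRIBUTED }

    /** Result of the mapping: execution mode plus the nodes involved. */
    static final class Plan {
        final Mode mode;
        final Set<String> nodes;

        Plan(Mode mode, Set<String> nodes) {
            this.mode = mode;
            this.nodes = nodes;
        }
    }

    /**
     * @param partitions    partitions the query is restricted to
     * @param owners        assumed snapshot: partition id -> owning node id
     * @param allowFallback if false, a multi-node mapping is rejected
     */
    static Plan map(Set<Integer> partitions, Map<Integer, String> owners, boolean allowFallback) {
        if (partitions.isEmpty())
            throw new IllegalArgumentException("Partition set must not be empty");

        // Collect the distinct nodes that own the requested partitions.
        Set<String> nodes = new TreeSet<>();
        for (int p : partitions) {
            String node = owners.get(p);
            if (node == null)
                throw new IllegalArgumentException("No owner for partition " + p);
            nodes.add(node);
        }

        // All partitions on one node: the "local" case, fast and correct.
        if (nodes.size() == 1)
            return new Plan(Mode.SINGLE_NODE, nodes);

        // Partitions span several nodes: reject, or fall back to a distributed plan.
        if (!allowFallback)
            throw new IllegalStateException(
                "Partitions " + partitions + " span nodes " + nodes + "; remap the query");

        return new Plan(Mode.DISTRIBUTED, nodes);
    }

    public static void main(String[] args) {
        Map<Integer, String> owners = Map.of(0, "nodeA", 1, "nodeA", 2, "nodeB");
        System.out.println(map(Set.of(0, 1), owners, false).mode);
        System.out.println(map(Set.of(1, 2), owners, true).mode);
    }
}
```

In a real engine the owners snapshot would have to stay valid for the
whole query, which is what pinning partitions is meant to provide; the
sketch simply takes it as given.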

Given that the Calcite prototype is in its early stages, it's likely its
first version will be available in 3.x, and it's a good chance to get rid
of wrong API pieces.

--AG

Mon, 4 Nov 2019 at 14:02, Stephen Darlington <
[email protected]>:

> A common use case is where you want to work on many rows of data across
> the grid. You’d broadcast a closure, running the same code on every node
> with just the local data. SQL doesn’t work in isolation — it’s often used
> as a filter for future computations.
>
> Regards,
> Stephen
>
> > On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[email protected]> wrote:
> >
> > Denis,
> >
> > I am mostly concerned about gathering use cases. It would be great to
> > critically assess such cases to identify why they cannot be solved by
> > using distributed SQL. Also, it sounds similar to some kind of "hints",
> > but very limited and with all the drawbacks of hints (impossibility to
> > use the full strength of CBO). We can provide better "hints" support
> > with the new engine as well.
> >
> > Fri, 1 Nov 2019 at 20:14, Denis Magda <[email protected]>:
> >>
> >> Ivan,
> >>
> >> I was involved in a couple of such use cases personally, so, that's not my
> >> imagination ;) Even more, as far as I remember, the primary reason why we
> >> improved our affinityRuns ensuring no partition is purged from a node until
> >> a task is completed is because many users were running local SQL from
> >> compute tasks and needed a guarantee that SQL will always return a correct
> >> result set.
> >>
> >> -
> >> Denis
> >>
> >>
> >> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <[email protected]> wrote:
> >>
> >>> Denis,
> >>>
> >>> It would be nice to see real use cases of the affinity call + local SQL
> >>> combination. Generally, the new engine will be able to infer collocation,
> >>> resulting in the same collocated execution automatically.
> >>>
> >>> Fri, 1 Nov 2019 at 19:11, Denis Magda <[email protected]>:
> >>>>
> >>>> Hi Igor,
> >>>>
> >>>> Local queries feature is broadly used together with affinity-based compute
> >>>> tasks:
> >>>>
> >>>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> >>>>
> >>>> The use case is as follows. The user knows that all required data needed
> >>>> for computation is collocated, and SQL is used as an advanced API for data
> >>>> retrieval from the computation code. The affinity task ensures that
> >>>> partitions won't be discarded from the node(s) if the topology changes
> >>>> during the task execution and, thus, it's safe to run SQL locally skipping
> >>>> distributed phases.
> >>>>
> >>>> The combination of affinity compute tasks with local SQL is a real and
> >>>> valuable use case, and this is what we need to support with Calcite. Do you
> >>>> see any challenges?
> >>>>
> >>>> -
> >>>> Denis
> >>>>
> >>>>
> >>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <[email protected]> wrote:
> >>>>
> >>>>> Hi Igor!
> >>>>>
> >>>>> IMO we need to maintain the backward compatibility between old and new
> >>>>> query engines as much as possible. And therefore we shouldn't change the
> >>>>> behavior of local queries.
> >>>>>
> >>>>> So, for local queries Calcite's planner shouldn't consider the
> >>>>> distribution trait at all.
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Kind Regards
> >>>>> Roman Kondakov
> >>>>>
> >>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> >>>>>> Hi Igniters,
> >>>>>>
> >>>>>> Working on the new generation of Ignite SQL, I faced a question: «Do we
> >>>>>> need local queries at all and, if so, what semantics should they have?».
> >>>>>>
> >>>>>> The current planning flow consists of the following steps:
> >>>>>>
> >>>>>> 1) Parsing SQL to AST
> >>>>>> 2) Validating AST (against Schema)
> >>>>>> 3) Optimizing (Building execution graph)
> >>>>>> 4) Splitting (into query fragments which execute on target nodes)
> >>>>>> 5) Mapping (query fragments to nodes/partitions)
> >>>>>>
> >>>>>> At the last step we check that all Fragment sources (a table or result)
> >>>>>> have the same distribution (in other words, all sources have to be
> >>>>>> co-located).
> >>>>>>
> >>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> >>>>>> co-located; an Exchange is produced otherwise. But if we force local
> >>>>>> execution we cannot produce Exchanges, which means we may face two
> >>>>>> non-co-located caches inside a single query fragment (the result of local
> >>>>>> query planning is a single query fragment). So, we cannot pass the check.
> >>>>>>
> >>>>>> Should we throw an exception, omit the check for local query planning,
> >>>>>> or prohibit local queries altogether?
> >>>>>>
> >>>>>> Your thoughts?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Igor
> >>>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Ivan Pavlukhin
> >>>
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
