Re: Re: Quicksql

Haisheng Yuan Tue, 10 Dec 2019 18:54:24 -0800

As far as I know, users still need to register tables from other data sources
before querying them. QuickSQL uses Calcite for parsing queries and optimizing
logical expressions with several transformation rules. The subqueries on the
different data sources are then registered as temp Spark tables (with filters or
joins pushed in), and the whole query is rewritten as SQL text over these temp
tables and submitted to Spark.
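
To make that flow concrete, here is a rough sketch of the rewrite step only. It
is not QuickSQL's actual code: SourceScan, plan_temp_tables, rewrite_query and
the tmp_* naming are all hypothetical, the pushed-down filter is given by hand
instead of coming from Calcite rules, and nothing is submitted to Spark.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SourceScan:
    """One table reference in the federated query (all names hypothetical)."""
    source: str                   # e.g. "mysql" or "es"
    table: str                    # table name inside that source
    pushed_filter: Optional[str]  # predicate the optimizer pushed to the source


def plan_temp_tables(scans: List[SourceScan]) -> List[Tuple[str, str]]:
    """Return (temp_table_name, per-source query) pairs.

    Each per-source query carries its pushed-down filter, so only the
    matching rows would be fetched into the temp table.
    """
    plans = []
    for i, scan in enumerate(scans):
        query = f"SELECT * FROM {scan.table}"
        if scan.pushed_filter:
            query += f" WHERE {scan.pushed_filter}"
        plans.append((f"tmp_{scan.source}_{i}", query))
    return plans


def rewrite_query(sql: str, scans: List[SourceScan]) -> str:
    """Rewrite the query text so it refers only to the temp tables.

    Assumption: plain string replacement stands in for a real SQL rewrite.
    """
    for scan, (temp_name, _) in zip(scans, plan_temp_tables(scans)):
        sql = sql.replace(f"{scan.source}.{scan.table}", temp_name)
    return sql


scans = [
    SourceScan("mysql", "orders", "amount > 100"),
    SourceScan("es", "customers", None),
]
plans = plan_temp_tables(scans)
final_sql = rewrite_query(
    "SELECT c.name, o.amount FROM mysql.orders o "
    "JOIN es.customers c ON o.cust_id = c.id",
    scans,
)
```

In a real deployment each entry of plans would be loaded from its source and
registered as a temp table, and final_sql would be the text handed to Spark.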


- Haisheng

------------------------------------------------------------------
From: Rui Wang <amaliu...@apache.org>
Date: 2019-12-11 06:24:45
To: <dev@calcite.apache.org>
Subject: Re: Quicksql

The co-routine model sounds like a good fit for streaming cases.

I was thinking about how the Enumerable interface should work with streaming
cases, but now I should also check the Interpreter.
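
For reference, the two evaluation models contrasted in the quoted message below
can be sketched roughly as follows. This is only an illustration, not Calcite's
Enumerable or Interpreter code: the operator names, the DONE marker and the
round-robin scheduler are my own assumptions, and the sketch runs on one thread,
so it shows the queue-and-yield structure rather than any multi-core benefit.

```python
from collections import deque


# Pull ("Volcano") style: every row is produced by a chain of next() calls
# that starts at the root operator.
def scan(rows):
    for row in rows:
        yield row


def filter_op(child, predicate):
    for row in child:
        if predicate(row):
            yield row


# Co-routine style: each operator reads from an input queue, writes to an
# output queue, and yields control when it has nothing to do.
DONE = object()  # end-of-stream marker (hypothetical convention)


def source_task(rows, out):
    for row in rows:
        out.append(row)
        yield  # hand control back to the scheduler after each row
    out.append(DONE)


def filter_task(inp, out, predicate):
    while True:
        if not inp:
            yield  # no input available yet: yield instead of blocking
            continue
        row = inp.popleft()
        if row is DONE:
            out.append(DONE)
            return
        if predicate(row):
            out.append(row)


def run(tasks):
    """Round-robin scheduler: resume each task until all have finished."""
    pending = deque(tasks)
    while pending:
        task = pending.popleft()
        try:
            next(task)
            pending.append(task)
        except StopIteration:
            pass  # this operator is finished; drop it


rows = [1, 2, 3, 4, 5, 6]


def is_even(r):
    return r % 2 == 0


# Pull model: the root drives everything.
pull_result = list(filter_op(scan(rows), is_even))

# Co-routine model: operators are independent tasks linked by queues.
q1, q2 = deque(), deque()
run([source_task(rows, q1), filter_task(q1, q2, is_even)])
push_result = [r for r in q2 if r is not DONE]
```

Because the operators only share queues, a source that never ends (a stream)
fits naturally: the filter keeps yielding whenever its input queue is empty.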


-Rui

On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde <jh...@apache.org> wrote:

> The goal (or rather my goal) for the interpreter is to replace
> Enumerable as the quick, easy default convention.
>
> Enumerable is efficient but not that efficient (compared to engines
> that work on off-heap data representing batches of records). And
> because it generates Java bytecode, there is a certain latency to
> getting a query prepared and ready to run.
>
> It basically implements the old Volcano query evaluation model. It is
> single-threaded (because all work happens as a result of a call to
> 'next()' on the root node) and cannot handle branching data-flow
> graphs (DAGs).
>
> The Interpreter uses a co-routine model (reading from queues,
> writing to queues, and yielding when there is no work to be done) and
> therefore could be more efficient than Enumerable in a single-node
> multi-core system. Also, there is little start-up time, which is
> important for small queries.
>
> I would love to add another built-in convention that uses Arrow as
> data format and generates co-routines for each operator. Those
> co-routines could be deployed in a parallel and/or distributed data
> engine.
>
> Julian
>
> On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
> <zolyfar...@yahoo.com.invalid> wrote:
> >
> > What is the ultimate goal of the Calcite Interpreter?
> >
> > To provide some context, I have been playing around with Calcite + REST
> > (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest
> > for details of my experiments).
> >
> >
> > —Z
> >
> > > On Dec 9, 2019, at 9:05 PM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > > Yes, virtualization is one of Calcite’s goals. In fact, when I created
> > > Calcite I was thinking about virtualization + in-memory materialized views.
> > > Not only the Spark convention but any of the “engine” conventions (Drill,
> > > Flink, Beam, Enumerable) could be used to create a virtual query engine.
> > >
> > > See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite):
> > > https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> > >
> > > Julian
> > >
> > >
> > >
> > >> On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana <mgelb...@apache.org> wrote:
> > >>
> > >> I recently contacted one of the active contributors asking about the
> > >> purpose of the project and here's his reply:
> > >>
> > >>> From my understanding, Quicksql is a data virtualization platform. It can
> > >>> query multiple data sources together and in a distributed way. Say you
> > >>> write a SQL query that joins a MySQL table with an Elasticsearch table.
> > >>> Quicksql can recognize that and then generate Spark code, in which it will
> > >>> fetch the MySQL/ES data as separate temporary tables and then join them
> > >>> in Spark. The execution is in Spark, so it is totally distributed. The user
> > >>> doesn't need to be aware of where the table is from.
> > >>>
> > >>
> > >> I understand that the Spark convention Calcite has attempts to achieve the
> > >> same goal, but it isn't fully implemented yet.
> > >>
> > >>
> > >> On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde <jh...@apache.org> wrote:
> > >>
> > >>> Anyone know anything about Quicksql? It seems to be quite a popular
> > >>> project, and they have an internal fork of Calcite.
> > >>>
> > >>> https://github.com/Qihoo360/
> > >>>
> > >>> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
> > >>>
> > >>> Julian
> > >>>
> > >>>
> > >
> >
>
