Re: Quicksql

Haisheng Yuan Wed, 11 Dec 2019 19:05:54 -0800

Nope, it doesn't use any adapters. It just submits partial SQL queries to 
different engines.


If the query contains tables from a single source, e.g.
select count(*) from hive_table1, hive_table2 where a=b;
then the whole query will be submitted to Hive.

Otherwise, if the query spans multiple sources, e.g.
select distinct a,b from hive_table union select distinct a,b from mysql_table;

then the following query will be submitted to Spark and executed by Spark:
select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;

where the temp tables are registered from the per-source queries:
spark_tmp_table1: select distinct a,b from hive_table
spark_tmp_table2: select distinct a,b from mysql_table
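
To make the routing concrete, here is a minimal Python sketch of the decision
described above. It is only an illustration, not Quicksql's actual code: the
names Plan and route, the list of (source, SQL) fragments, and the rewritten
template are all hypothetical, and the sketch assumes the query has already
been split into single-source fragments.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    engine: str                 # engine that receives the final SQL text
    sql: str                    # SQL text submitted to that engine
    temp_tables: dict = field(default_factory=dict)  # name -> (source, sql)


def route(query, fragments, rewritten):
    """Decide where a query runs (hypothetical helper, for illustration).

    query:     the original SQL text
    fragments: list of (source, sql) pairs, one per single-source sub-query
    rewritten: template of the query over temp tables, with {0}, {1}, ...
               standing for the generated temp table names
    """
    sources = {source for source, _ in fragments}
    if len(sources) == 1:
        # Single source: submit the whole query to that engine unchanged.
        return Plan(engine=sources.pop(), sql=query)

    # Multiple sources: each fragment becomes a temp table, and the
    # rewritten query over those temp tables is submitted to Spark.
    names = [f"spark_tmp_table{i}" for i in range(1, len(fragments) + 1)]
    temp_tables = dict(zip(names, fragments))
    return Plan(engine="spark", sql=rewritten.format(*names),
                temp_tables=temp_tables)


single = route(
    "select count(*) from hive_table1, hive_table2 where a=b",
    [("hive", "select count(*) from hive_table1, hive_table2 where a=b")],
    "",
)
mixed = route(
    "select distinct a,b from hive_table union "
    "select distinct a,b from mysql_table",
    [("hive", "select distinct a,b from hive_table"),
     ("mysql", "select distinct a,b from mysql_table")],
    "select a,b from {0} union select a,b from {1}",
)
print(single.engine, "<-", single.sql)
print(mixed.engine, "<-", mixed.sql)
```

How the fragments are actually registered as temp tables on the Spark side is
outside the scope of this sketch.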

On 2019/12/11 04:27:07, "Juan Pan" <panj...@apache.org> wrote: 
> Hi Haisheng,
> 
> 
> > The queries on different data sources will then be registered as temp Spark 
> > tables (with filters or joins pushed in), and the whole query is rewritten as 
> > SQL text over these temp tables and submitted to Spark.
> 
> 
> Does it mean QuickSQL also needs adapters to execute queries on different 
> data sources? 
> 
> 
> > Yes, virtualization is one of Calcite’s goals. In fact, when I created 
> > Calcite I was thinking about virtualization + in-memory materialized views. 
> > Not only the Spark convention but any of the “engine” conventions (Drill, 
> > Flink, Beam, Enumerable) could be used to create a virtual query engine.
> 
> 
> Basically, I like and agree with Julian’s statement. It is a great idea that I 
> personally hope Calcite moves towards.
> 
> 
> Give my best wishes to the Calcite community. 
> 
> 
> Thanks,
> Trista
> 
> 
>  Juan Pan
> 
> 
> panj...@apache.org
> Juan Pan(Trista), Apache ShardingSphere
> 
> 
> On 12/11/2019 10:53, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
> As far as I know, users still need to register tables from other data sources 
> before querying them. QuickSQL uses Calcite for parsing queries and optimizing 
> logical expressions with several transformation rules. The queries on different 
> data sources will then be registered as temp Spark tables (with filters or joins 
> pushed in), and the whole query is rewritten as SQL text over these temp tables 
> and submitted to Spark.
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> From: Rui Wang <amaliu...@apache.org>
> Date: 2019-12-11 06:24:45
> To: <dev@calcite.apache.org>
> Subject: Re: Quicksql
> 
> The co-routine model sounds like a good fit for streaming cases.
> 
> I was thinking about how the Enumerable interface should work with streaming
> cases, but now I should also check the Interpreter.
> 
> 
> -Rui
> 
> On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde <jh...@apache.org> wrote:
> 
> The goal (or rather my goal) for the interpreter is to replace
> Enumerable as the quick, easy default convention.
> 
> Enumerable is efficient but not that efficient (compared to engines
> that work on off-heap data representing batches of records). And
> because it generates Java bytecode there is a certain latency to
> getting a query prepared and ready to run.
> 
> It basically implements the old Volcano query evaluation model. It is
> single-threaded (because all work happens as a result of a call to
> 'next()' on the root node) and cannot handle branching data-flow
> graphs (DAGs).
> 
> The Interpreter uses a co-routine model (reading from queues,
> writing to queues, and yielding when there is no work to be done) and
> therefore could be more efficient than Enumerable in a single-node
> multi-core system. Also, there is little start-up time, which is
> important for small queries.
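
For illustration, the co-routine model described above could be sketched in
Python as follows. This is not the Calcite Interpreter's code (which is Java):
scan, filter_op, run, evens and the END marker are hypothetical names, and a
simple round-robin loop stands in for a real scheduler. Each operator reads
from a queue, writes to a queue, and yields when it has nothing to do.

```python
from collections import deque

END = object()  # end-of-stream marker passed through the queues


def scan(rows, out):
    """Source operator: writes each row to its output queue."""
    for row in rows:
        out.append(row)
        yield                  # give other operators a turn
    out.append(END)


def filter_op(pred, inp, out):
    """Filter operator: reads from inp, writes matching rows to out."""
    while True:
        if not inp:
            yield              # no work to be done: yield and retry later
            continue
        row = inp.popleft()
        if row is END:
            out.append(END)
            return
        if pred(row):
            out.append(row)
        yield


def run(operators):
    """Round-robin scheduler: resume each operator until all have finished."""
    ready = deque(operators)
    while ready:
        op = ready.popleft()
        try:
            next(op)
            ready.append(op)   # still has work, schedule it again
        except StopIteration:
            pass               # operator is done


def evens(rows):
    q1, q2 = deque(), deque()
    run([scan(rows, q1), filter_op(lambda r: r % 2 == 0, q1, q2)])
    return [r for r in q2 if r is not END]


print(evens([1, 2, 3, 4]))
```

Unlike a pull-based chain of next() calls from the root, nothing here prevents
one queue from feeding several consumers, which is one way a branching
data-flow graph could be handled.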
> 
> I would love to add another built-in convention that uses Arrow as
> data format and generates co-routines for each operator. Those
> co-routines could be deployed in a parallel and/or distributed data
> engine.
> 
> Julian
> 
> On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
> <zolyfar...@yahoo.com.invalid> wrote:
> 
> What is the ultimate goal of the Calcite Interpreter?
> 
> To provide some context, I have been playing around with Calcite + REST
> (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest for
> details of my experiments).
> 
> 
> —Z
> 
> On Dec 9, 2019, at 9:05 PM, Julian Hyde <jh...@apache.org> wrote:
> 
> Yes, virtualization is one of Calcite’s goals. In fact, when I created
> Calcite I was thinking about virtualization + in-memory materialized views.
> Not only the Spark convention but any of the “engine” conventions (Drill,
> Flink, Beam, Enumerable) could be used to create a virtual query engine.
> 
> See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite):
> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> 
> Julian
> 
> 
> 
> On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana <mgelb...@apache.org>
> wrote:
> 
> I recently contacted one of the active contributors asking about the
> purpose of the project and here's his reply:
> 
> From my understanding, Quicksql is a data virtualization platform. It can
> query multiple data sources together and in a distributed way. Say you
> write a SQL query that joins a MySQL table with an Elasticsearch table.
> Quicksql can recognize that and generate Spark code, in which it will
> fetch the MySQL/ES data as temporary tables separately and then join them
> in Spark. The execution is in Spark, so it is totally distributed. The user
> doesn't need to be aware of where the table is from.
> 
> 
> I understand that the Spark convention Calcite has attempts to achieve the
> same goal, but it isn't fully implemented yet.
> 
> 
> On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde <jh...@apache.org> wrote:
> 
> Anyone know anything about Quicksql? It seems to be quite a popular
> project, and they have an internal fork of Calcite.
> 
> https://github.com/Qihoo360/
> 
> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
> 
> 
> Julian
