Re: Quicksql

Francis Du, Mon, 02 Mar 2020 01:26:20 -0800

Hi everyone:

Allow me to introduce my good friend Siyuan Liu, who is the leader of
the Quicksql project.


I have CC'd him and asked him to introduce the project to us. Here is the
documentation link for Quicksql [1].

[1].  https://quicksql.readthedocs.io/en/latest/

Regards,
Francis

On Mon, Dec 23, 2019 at 11:44 AM, Juan Pan <panj...@apache.org> wrote:

> Thanks Gelbana,
>
>
> I really appreciate your explanation, which sheds some light for me on
> exploring Calcite. :)
>
>
> Best wishes,
> Trista
>
>
>  Juan Pan (Trista)
>
> Senior DBA & PPMC of Apache ShardingSphere(Incubating)
> E-mail: panj...@apache.org
>
>
>
>
> On 12/22/2019 05:58, Muhammad Gelbana <m.gelb...@gmail.com> wrote:
> I am curious how to join the tables from different datasources.
> Under Calcite's concept of conventions, the Join operator and its input
> operators should all have the same convention. If an input has a
> different convention, that convention will have to register a converter
> rule. This rule should produce an operator that only converts from that
> convention to the Join operator's convention.
>
> This way the Join operator will be able to handle the data obtained from
> its input operators because it understands the data structure.
>
> Thanks,
> Gelbana
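A minimal sketch of the idea Gelbana describes, written in plain Java rather than against Calcite's real planner API: each plan node carries a convention name, and when a join is built, any input whose convention differs from the join's is wrapped in a converter node. The classes `Node` and `ConventionDemo` and the convention names `JDBC`/`SPARK` are hypothetical stand-ins for Calcite's `RelNode`, `Convention`, and `ConverterRule`; no cost-based search is modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plan node tagged with a convention (stand-in for a Calcite RelNode and its trait).
class Node {
    final String name;
    final String convention;
    final List<Node> inputs;

    Node(String name, String convention, List<Node> inputs) {
        this.name = name;
        this.convention = convention;
        this.inputs = inputs;
    }

    @Override
    public String toString() {
        String self = name + "[" + convention + "]";
        if (inputs.isEmpty()) {
            return self;
        }
        List<String> parts = new ArrayList<>();
        for (Node input : inputs) {
            parts.add(input.toString());
        }
        return self + "(" + String.join(", ", parts) + ")";
    }
}

class ConventionDemo {
    // Stand-in for a converter rule: only converts from the input's convention to the target.
    static Node convert(Node input, String target) {
        if (input.convention.equals(target)) {
            return input; // same convention, nothing to convert
        }
        return new Node("Convert", target, List.of(input));
    }

    // Every join input is brought into the join's own convention before the join is built.
    static Node planJoin(String joinConvention, Node left, Node right) {
        return new Node("Join", joinConvention,
                List.of(convert(left, joinConvention), convert(right, joinConvention)));
    }

    public static void main(String[] args) {
        Node mysql = new Node("Scan(mysql_table)", "JDBC", List.of());
        Node hive = new Node("Scan(hive_table)", "SPARK", List.of());
        // Prints: Join[SPARK](Convert[SPARK](Scan(mysql_table)[JDBC]), Scan(hive_table)[SPARK])
        System.out.println(planJoin("SPARK", mysql, hive));
    }
}
```

In a real planner the converter would be one alternative among many and the optimizer would pick the cheapest plan; here it is inserted unconditionally to keep the example small.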
>
>
> On Wed, Dec 18, 2019 at 5:08 AM Juan Pan <panj...@apache.org> wrote:
>
> Some updates.
>
>
> Recently I took a look at their documentation and source code, and found
> that the project uses Calcite's SQL parser and relational algebra to build
> a query plan, then translates it to Spark SQL when joining different data
> sources, or to the corresponding native query for a single data source.
>
>
> Although it copies many classes from Calcite, the idea behind QuickSQL
> seems interesting, and the code is succinct.
>
>
> Best,
> Trista
>
>
> Juan Pan (Trista)
>
> Senior DBA & PPMC of Apache ShardingSphere(Incubating)
> E-mail: panj...@apache.org
>
>
>
>
> On 12/13/2019 17:16, Juan Pan <panj...@apache.org> wrote:
> Yes, indeed.
>
>
> Juan Pan (Trista)
>
> Senior DBA & PPMC of Apache ShardingSphere(Incubating)
> E-mail: panj...@apache.org
>
>
>
>
> On 12/12/2019 18:00, Alessandro Solimando <alessandro.solima...@gmail.com>
> wrote:
> Adapters must be needed for data sources that don't support SQL; I think
> this is what Juan Pan was asking about.
>
> On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan <hy...@apache.org> wrote:
>
> Nope, it doesn't use any adapters. It just submits partial SQL queries to
> different engines.
>
> If a query contains tables from a single source, e.g.
> select count(*) from hive_table1, hive_table2 where a=b;
> then the whole query will be submitted to hive.
>
> Otherwise, e.g.
> select distinct a,b from hive_table union select distinct a,b from
> mysql_table;
>
> then the following query will be submitted to and executed by Spark:
> select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;
>
> spark_tmp_table1: select distinct a,b from hive_table
> spark_tmp_table2: select distinct a,b from mysql_table
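The routing decision Haisheng describes can be sketched in Java as follows. This is a hypothetical simplification, not QuickSQL's code: it only handles a UNION of single-source fragments, and the names `Fragment`, `RoutedQuery`, and `Router.routeUnion` are made up. The temp-table names mirror the `spark_tmp_table1`/`spark_tmp_table2` example, but the rewritten query uses `select *` rather than the original column list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical: a SELECT that reads from exactly one data source.
class Fragment {
    final String sql;
    final String source; // e.g. "hive" or "mysql"

    Fragment(String sql, String source) {
        this.sql = sql;
        this.source = source;
    }
}

// Hypothetical routing result: target engine, temp tables it must register, final SQL text.
class RoutedQuery {
    final String engine;
    final Map<String, Fragment> tempTables;
    final String sql;

    RoutedQuery(String engine, Map<String, Fragment> tempTables, String sql) {
        this.engine = engine;
        this.tempTables = tempTables;
        this.sql = sql;
    }
}

class Router {
    // Simplified: assumes at least one fragment and a query of the form
    // "fragment UNION fragment UNION ...".
    static RoutedQuery routeUnion(List<Fragment> fragments) {
        Set<String> sources = new LinkedHashSet<>();
        List<String> parts = new ArrayList<>();
        for (Fragment f : fragments) {
            sources.add(f.source);
            parts.add(f.sql);
        }
        if (sources.size() == 1) {
            // Single source: submit the whole query to that source unchanged.
            return new RoutedQuery(sources.iterator().next(), Map.of(),
                    String.join(" union ", parts));
        }
        // Mixed sources: each fragment runs on its own source and is registered
        // as a Spark temp table; Spark then runs the rewritten outer query.
        Map<String, Fragment> temp = new LinkedHashMap<>();
        List<String> selects = new ArrayList<>();
        for (int i = 0; i < fragments.size(); i++) {
            String name = "spark_tmp_table" + (i + 1);
            temp.put(name, fragments.get(i));
            selects.add("select * from " + name);
        }
        return new RoutedQuery("spark", temp, String.join(" union ", selects));
    }
}
```

For the hive/mysql fragments above, `routeUnion` returns engine `spark` with two temp tables; for two Hive-only fragments it returns engine `hive` and the original union text.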
>
> On 2019/12/11 04:27:07, "Juan Pan" <panj...@apache.org> wrote:
> Hi Haisheng,
>
>
> The queries on the different data sources will then be registered as temp
> Spark tables (with filters or joins pushed in), and the whole query is
> rewritten as SQL text over these temp tables and submitted to Spark.
>
>
> Does this mean QuickSQL also needs adapters to execute queries on
> different data sources?
>
>
> Yes, virtualization is one of Calcite’s goals. In fact, when I created
> Calcite I was thinking about virtualization + in-memory materialized views.
> Not only the Spark convention but any of the “engine” conventions (Drill,
> Flink, Beam, Enumerable) could be used to create a virtual query engine.
>
>
> Basically, I like and agree with Julian's statement. It is a great idea,
> and I personally hope Calcite moves in that direction.
>
>
> My best wishes to the Calcite community.
>
>
> Thanks,
> Trista
>
>
> Juan Pan
>
>
> panj...@apache.org
> Juan Pan(Trista), Apache ShardingSphere
>
>
> On 12/11/2019 10:53, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
> As far as I know, users still need to register tables from other data
> sources before querying them. QuickSQL uses Calcite for parsing queries and
> optimizing logical expressions with several transformation rules. The queries
> on the different data sources are then registered as temp Spark tables (with
> filters or joins pushed in), and the whole query is rewritten as SQL text over
> these temp tables and submitted to Spark.
>
> - Haisheng
>
> ------------------------------------------------------------------
> From: Rui Wang <amaliu...@apache.org>
> Date: 2019-12-11 06:24:45
> To: <dev@calcite.apache.org>
> Subject: Re: Quicksql
>
> The co-routine model sounds like a good fit for streaming cases.
>
> I was thinking about how the Enumerable interface should work with
> streaming cases, but now I should also look at the Interpreter.
>
>
> -Rui
>
> On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde <jh...@apache.org> wrote:
>
> The goal (or rather my goal) for the interpreter is to replace
> Enumerable as the quick, easy default convention.
>
> Enumerable is efficient but not that efficient (compared to engines
> that work on off-heap data representing batches of records). And
> because it generates Java bytecode there is a certain latency to
> getting a query prepared and ready to run.
>
> It basically implements the old Volcano query evaluation model. It is
> single-threaded (because all work happens as a result of a call to
> 'next()' on the root node) and cannot handle branching data-flow
> graphs (DAGs).
>
> The Interpreter uses a co-routine model (reading from queues,
> writing to queues, and yielding when there is no work to be done) and
> therefore could be more efficient than enumerable in a single-node
> multi-core system. Also, there is little start-up time, which is
> important for small queries.
>
> I would love to add another built-in convention that uses Arrow as
> data format and generates co-routines for each operator. Those
> co-routines could be deployed in a parallel and/or distributed data
> engine.
>
> Julian
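To make the contrast concrete, here is a hypothetical, single-threaded Java sketch of the two evaluation shapes Julian describes; it does not use Calcite's Enumerable or Interpreter classes. In the pull version the root's `hasNext()`/`next()` drives every row through the pipeline, while in the queue version each stage drains its input queue and reports no progress (yields) when idle, and a trivial scheduler loops until no stage has work. The class names `PullOp`, `QueueOp`, and `ExecutionModels` are made up; both pipelines keep even rows and multiply them by 10.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.function.Function;

// Pull (Volcano-style) operator: does work only when its consumer calls hasNext()/next().
class PullOp implements Iterator<Integer> {
    private final Iterator<Integer> input;
    private final Function<Integer, Integer> fn; // returns null to drop a row
    private Integer pending;

    PullOp(Iterator<Integer> input, Function<Integer, Integer> fn) {
        this.input = input;
        this.fn = fn;
    }

    @Override
    public boolean hasNext() {
        while (pending == null && input.hasNext()) {
            pending = fn.apply(input.next());
        }
        return pending != null;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer row = pending;
        pending = null;
        return row;
    }
}

// Queue-based operator: drains its input queue into its output queue when scheduled.
class QueueOp {
    final Queue<Integer> in = new ArrayDeque<>();
    private final Queue<Integer> out;
    private final Function<Integer, Integer> fn; // returns null to drop a row

    QueueOp(Queue<Integer> out, Function<Integer, Integer> fn) {
        this.out = out;
        this.fn = fn;
    }

    // Returns false ("yield") when there was nothing queued to process.
    boolean step() {
        boolean didWork = false;
        Integer row;
        while ((row = in.poll()) != null) {
            Integer result = fn.apply(row);
            if (result != null) {
                out.add(result);
            }
            didWork = true;
        }
        return didWork;
    }
}

class ExecutionModels {
    static final Function<Integer, Integer> KEEP_EVEN = r -> r % 2 == 0 ? r : null;
    static final Function<Integer, Integer> TIMES_TEN = r -> r * 10;

    static List<Integer> pull(List<Integer> rows) {
        Iterator<Integer> root = new PullOp(new PullOp(rows.iterator(), KEEP_EVEN), TIMES_TEN);
        List<Integer> result = new ArrayList<>();
        while (root.hasNext()) {
            result.add(root.next()); // each call on the root pulls rows up the whole chain
        }
        return result;
    }

    static List<Integer> push(List<Integer> rows) {
        Queue<Integer> sink = new ArrayDeque<>();
        QueueOp project = new QueueOp(sink, TIMES_TEN);
        QueueOp filter = new QueueOp(project.in, KEEP_EVEN);
        filter.in.addAll(rows); // stands in for a scan stage feeding the first queue
        List<QueueOp> stages = List.of(project, filter);
        boolean progress = true;
        while (progress) { // trivial scheduler: run stages until none has work
            progress = false;
            for (QueueOp stage : stages) {
                progress |= stage.step();
            }
        }
        return new ArrayList<>(sink);
    }
}
```

A real co-routine engine would run stages on several threads and move batches rather than single rows; the point here is only that the queue-based stages no longer depend on a single `next()` call chain starting at the root.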
>
> On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
> <zolyfar...@yahoo.com.invalid> wrote:
>
> What is the ultimate goal of the Calcite Interpreter?
>
> To provide some context, I have been playing around with Calcite + REST
> (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest
> for details of my experiments).
>
>
> —Z
>
> On Dec 9, 2019, at 9:05 PM, Julian Hyde <jh...@apache.org> wrote:
>
> Yes, virtualization is one of Calcite’s goals. In fact, when I created
> Calcite I was thinking about virtualization + in-memory materialized
> views.
> Not only the Spark convention but any of the “engine” conventions (Drill,
> Flink, Beam, Enumerable) could be used to create a virtual query engine.
>
> See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite):
>
> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
>
> Julian
>
>
>
> On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana <mgelb...@apache.org>
> wrote:
>
> I recently contacted one of the active contributors asking about the
> purpose of the project and here's his reply:
>
> From my understanding, Quicksql is a data virtualization platform. It can
> query multiple data sources together, in a distributed way. Say you write
> a SQL query that joins a MySQL table with an Elasticsearch table: Quicksql
> recognizes that and generates Spark code, in which it fetches the MySQL and
> ES data separately as temporary tables and then joins them in Spark.
> Execution happens in Spark, so it is fully distributed. The user doesn't
> need to be aware of where each table comes from.
>
>
> I understand that Calcite's Spark convention attempts to achieve the
> same goal, but it isn't fully implemented yet.
>
>
> On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde <jh...@apache.org> wrote:
>
> Anyone know anything about Quicksql? It seems to be quite a popular
> project, and they have an internal fork of Calcite.
>
> https://github.com/Qihoo360/
>
> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
>
> Julian
>
