Re: Quicksql

Juan Pan Fri, 13 Dec 2019 01:24:45 -0800

Thanks for your clarification, Haisheng.

I am curious about how tables from different data sources are joined.

Suppose tb1 is in datasource1, tb2 is in datasource2, and the SQL is
`select tb1.col1, tb2.col2 from tb1, tb2 where tb1.id = tb2.id`. How are the
two tables joined together to produce the final result?

Juan Pan (Trista)

Senior DBA & PPMC of Apache ShardingSphere (Incubating)
E-mail: panj...@apache.org

On 12/12/2019 11:05, Haisheng Yuan <hy...@apache.org> wrote:
Nope, it doesn't use any adapters. It just submits partial SQL queries to
different engines.

If the query contains tables from a single source, e.g.
select count(*) from hive_table1, hive_table2 where a=b;
then the whole query will be submitted to Hive.

Otherwise, e.g.
select distinct a,b from hive_table union select distinct a,b from mysql_table;

The following query will be submitted to Spark and executed by Spark:
select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;

spark_tmp_table1: select distinct a,b from hive_table
spark_tmp_table2: select distinct a,b from mysql_table
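
The split described above can be sketched in a few lines of Python. This is
only an illustration, not QuickSQL's code: in-memory sqlite3 databases stand in
for Hive, MySQL and Spark, and the sample rows and the `make_source` helper are
hypothetical. Each partial query runs on its own source, the results are
registered as temp tables in the engine, and the rewritten query runs over
those temp tables.

```python
import sqlite3


def make_source(table, rows):
    """Create an in-memory database standing in for one data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (a INTEGER, b TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn


# Stand-ins for the two sources (hypothetical sample data).
hive = make_source("hive_table", [(1, "x"), (1, "x"), (2, "y")])
mysql = make_source("mysql_table", [(2, "y"), (3, "z")])

# Partial queries, each submitted to the source that owns the table.
partial_queries = {
    "spark_tmp_table1": (hive, "SELECT DISTINCT a, b FROM hive_table"),
    "spark_tmp_table2": (mysql, "SELECT DISTINCT a, b FROM mysql_table"),
}

# Stand-in for the execution engine (Spark in the thread).
engine = sqlite3.connect(":memory:")
for tmp_name, (source, sql) in partial_queries.items():
    rows = source.execute(sql).fetchall()
    engine.execute(f"CREATE TEMP TABLE {tmp_name} (a INTEGER, b TEXT)")
    engine.executemany(f"INSERT INTO {tmp_name} VALUES (?, ?)", rows)

# The rewritten query only refers to the temp tables.
rewritten = (
    "SELECT a, b FROM spark_tmp_table1 "
    "UNION SELECT a, b FROM spark_tmp_table2"
)
result = sorted(engine.execute(rewritten).fetchall())
print(result)
```

In this sketch only the rewritten query text would change for a different
operation over the same temp tables; the fetch-and-register step stays the same.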

On 2019/12/11 04:27:07, "Juan Pan" <panj...@apache.org> wrote:
Hi Haisheng,

The queries on different data sources will then be registered as temp Spark
tables (with filters or joins pushed in), and the whole query is rewritten as
SQL text over these temp tables and submitted to Spark.

Does this mean QuickSQL also needs adapters to execute queries on different
data sources?

Yes, virtualization is one of Calcite’s goals. In fact, when I created Calcite
I was thinking about virtualization + in-memory materialized views. Not only
the Spark convention but any of the “engine” conventions (Drill, Flink, Beam,
Enumerable) could be used to create a virtual query engine.

Basically, I like and agree with Julian’s statement. It is a great idea, and I
personally hope Calcite moves in that direction.

Give my best wishes to the Calcite community.

Thanks,
Trista

Juan Pan

panj...@apache.org
Juan Pan(Trista), Apache ShardingSphere

On 12/11/2019 10:53, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
As far as I know, users still need to register tables from other data sources
before querying them. QuickSQL uses Calcite for parsing queries and optimizing
logical expressions with several transformation rules. The queries on different
data sources will then be registered as temp Spark tables (with filters or joins
pushed in), and the whole query is rewritten as SQL text over these temp tables
and submitted to Spark.

- Haisheng

------------------------------------------------------------------
From: Rui Wang <amaliu...@apache.org>
Date: December 11, 2019 06:24:45
To: <dev@calcite.apache.org>
Subject: Re: Quicksql

The co-routine model sounds like a good fit for streaming cases.

I was thinking about how the Enumerable interface should work with streaming
cases, but now I should also check the Interpreter.

-Rui

On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde <jh...@apache.org> wrote:

The goal (or rather my goal) for the interpreter is to replace
Enumerable as the quick, easy default convention.

Enumerable is efficient but not that efficient (compared to engines
that work on off-heap data representing batches of records). And
because it generates Java bytecode there is a certain latency to
getting a query prepared and ready to run.

Enumerable basically implements the old Volcano query evaluation model. It is
single-threaded (because all work happens as a result of a call to
'next()' on the root node) and cannot handle branching data-flow
graphs (DAGs).
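
As a rough illustration of the pull-based Volcano model mentioned above, here
is a minimal Python sketch. It is not Calcite's Enumerable code; the operator
classes `Scan`, `Filter` and `Project` and the `run` helper are hypothetical
names. The point is that every row is produced by a chain of `next()` calls
that starts at the root, so all work happens on the caller's thread.

```python
class Scan:
    """Leaf operator: returns rows from an in-memory list, one per call."""

    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        return next(self._it, None)  # None signals end of data


class Filter:
    """Pulls from its child until a row passes the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row


class Project:
    """Applies a function to each row pulled from its child."""

    def __init__(self, child, fn):
        self.child = child
        self.fn = fn

    def next(self):
        row = self.child.next()
        return None if row is None else self.fn(row)


def run(root):
    """Drive the whole plan by repeatedly calling next() on the root."""
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


plan = Project(Filter(Scan([1, 2, 3, 4, 5, 6]), lambda r: r % 2 == 0),
               lambda r: r * 10)
result = run(plan)
print(result)
```

Each operator has exactly one consumer here, which is why a plan of this shape
is a tree rather than a branching graph.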

The Interpreter uses a co-routine model (reading from queues,
writing to queues, and yielding when there is no work to be done) and
therefore could be more efficient than enumerable in a single-node
multi-core system. Also, there is little start-up time, which is
important for small queries.
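
The queue-based co-routine idea can be sketched with Python generators. This
is an assumption-laden toy, not the Calcite Interpreter: the operators `scan`,
`filter_op` and `sink`, the `DONE` marker and the round-robin `run` scheduler
are all hypothetical. Each operator reads from an input queue, writes to an
output queue, and yields when it has nothing to do. The scheduler below is
single-threaded, so it only shows the control flow, not any multi-core gain.

```python
from collections import deque

DONE = object()  # end-of-stream marker placed on a queue


def scan(rows, out):
    """Source operator: writes one row to its output queue per turn."""
    for row in rows:
        out.append(row)
        yield  # give other operators a turn
    out.append(DONE)


def filter_op(inp, out, predicate):
    """Reads from inp, writes matching rows to out, yields when idle."""
    while True:
        if not inp:
            yield  # no work available right now
            continue
        row = inp.popleft()
        if row is DONE:
            out.append(DONE)
            return
        if predicate(row):
            out.append(row)


def sink(inp, results):
    """Collects rows from its input queue into a result list."""
    while True:
        if not inp:
            yield
            continue
        row = inp.popleft()
        if row is DONE:
            return
        results.append(row)


def run(tasks):
    """Round-robin scheduler: resume each operator until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)  # still has work to do later
        except StopIteration:
            pass  # operator finished


q1, q2 = deque(), deque()
results = []
run([
    scan(range(1, 7), q1),
    filter_op(q1, q2, lambda r: r % 2 == 0),
    sink(q2, results),
])
print(results)
```

Because operators communicate only through queues, no single root call drives
the plan; any operator with pending input can make progress when scheduled.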

I would love to add another built-in convention that uses Arrow as
data format and generates co-routines for each operator. Those
co-routines could be deployed in a parallel and/or distributed data
engine.

Julian

On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
<zolyfar...@yahoo.com.invalid> wrote:

What is the ultimate goal of the Calcite Interpreter?

To provide some context, I have been playing around with Calcite + REST
(see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest for
details of my experiments).

—Z

On Dec 9, 2019, at 9:05 PM, Julian Hyde <jh...@apache.org> wrote:

Yes, virtualization is one of Calcite’s goals. In fact, when I created
Calcite I was thinking about virtualization + in-memory materialized views.
Not only the Spark convention but any of the “engine” conventions (Drill,
Flink, Beam, Enumerable) could be used to create a virtual query engine.

See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite):
https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework

Julian

On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana <mgelb...@apache.org>
wrote:

I recently contacted one of the active contributors asking about the
purpose of the project and here's his reply:

From my understanding, Quicksql is a data virtualization platform. It can
query multiple data sources together and in a distributed way. Say you write
a SQL query that joins a MySQL table with an Elasticsearch table. Quicksql can
recognize that and then generate Spark code, in which it fetches the MySQL/ES
data as separate temporary tables and then joins them in Spark. The execution
is in Spark, so it is totally distributed. The user doesn't need to be aware
of where the table comes from.

I understand that the Spark convention Calcite has attempts to achieve the
same goal, but it isn't fully implemented yet.

On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde <jh...@apache.org> wrote:

Anyone know anything about Quicksql? It seems to be quite a popular
project, and they have an internal fork of Calcite.

https://github.com/Qihoo360/

https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite

Julian
