Re: Quicksql
Thank you for your introduction, Siyuan. Quicksql is an interesting project and full of potential. I'd like to learn more about it. Best, Chunwei On Tue, Mar 3, 2020 at 3:26 AM Siyuan Liu wrote: > Hi, everyone: > > Glad to see a lot of old friends here. Quicksql is a project born in early > 2019. It was designed to solve the problem of long and complex work flow in > the big data field with many data sources, many compute engines, and many > types of syntax. The core idea is `Connect All Data Sources with One Extra > Parsing Cost`. > > Because it involves standard SQL parsing, we finally chose Calcite as the > parsing engine that has the best SQL compatibility. Thanks to the excellent > architecture and toolkits provided by Calcite, Quicksql has made some > extensions on this basis and made more logical plans Rich definitions > enable single data source and multi-source queries to be described. For > single data sources, an end-to-end connection query is directly > established, and for multiple data sources, logical plans are divided and > pushed down, final interpreted as the code of the compute engine (such as > Spark, Flink) with distributed computing capabilities for data merge. > > Based on this design, Quicksql makes extensive use of the ability of > Calcite Adapter \ Dialect \ UDF to provide syntax adaptation compatibility > for various data sources and compute engines, and also uses Avatica as a > JDBC protocol. We are very grateful for the excellent artwork provided by > the Calcite community. > > At the beginning of the project, Quicksql was confused about the > application areas. After one year of polishing, Quicksql has successfully > applied two areas: > 1. Interactive Query Engine: Provides big data interactive query and BI > analysis with standard SQL syntax, and response time is in seconds to > minutes. > 2. 
ETL Compute Engine: SQL-based ETL for multi-data source, which can use > optimization capabilities of SQL for data cleaning \ transformation \ join, > etc. > In the future, we will also focus on dynamic engine selection, so that > engines such as Hive, Spark, and Presto can run more suitable SQL. > > Looking forward to working with the Calcite community to do some > interesting things and explore the unlimited possibilities of SQL > > Siyuan Liu > > On Mon, Mar 2, 2020 at 3:45 PM Francis Du wrote: > > > Hi everyone: > > > > Allow me to introduce my good friend Siyuan Liu, who is the leader of > > Quicksql project. > > > > I CC to him and ask him to introduce the project to us.Here is the > > documentation link for > > > > Quicksql [1]. > > > > [1]. https://quicksql.readthedocs.io/en/latest/ > > > > Regards, > > Francis > > > > Juan Pan 于2019年12月23日周一 上午11:44写道: > > > >> Thanks Gelbana, > >> > >> > >> Very appreciated your explanation, which sheds me some light on > exploring > >> Calcite. :) > >> > >> > >> Best wishes, > >> Trista > >> > >> > >> Juan Pan (Trista) > >> > >> Senior DBA & PPMC of Apache ShardingSphere(Incubating) > >> E-mail: panj...@apache.org > >> > >> > >> > >> > >> On 12/22/2019 05:58,Muhammad Gelbana wrote: > >> I am curious how to join the tables from different datasources. > >> Based on Calcite's conventions concept, the Join operator and its input > >> operators should all have the same convention. If they don't, the > >> convention different from the Join operator's convention will have to > >> register a converter rule. This rule should produce an operator that > only > >> converts from that convention to the Join operator's convention. > >> > >> This way the Join operator will be able to handle the data obtained from > >> its input operators because it understands the data structure. > >> > >> Thanks, > >> Gelbana > >> > >> > >> On Wed, Dec 18, 2019 at 5:08 AM Juan Pan wrote: > >> > >> Some updates. 
> >> > >> > >> Recently i took a look at their doc and source code, and found this > >> project uses SQL parsing and Relational algebra of Calcite to get query > >> plan, and also translates to spark SQL for joining different > datasources, > >> or corresponding query for single datasource. > >> > >> > >> Although it copies many classes from Calcite, the idea of QuickSQL seems > >> some of interests, and code is succinct. > >> > >> > >> Best, > >> Trista >
Re: Quicksql
Hi, everyone: Glad to see a lot of old friends here. Quicksql is a project born in early 2019. It was designed to solve the problem of long and complex workflows in the big data field, with many data sources, many compute engines, and many types of syntax. The core idea is `Connect All Data Sources with One Extra Parsing Cost`. Because it involves standard SQL parsing, we finally chose Calcite, the parsing engine with the best SQL compatibility. Thanks to the excellent architecture and toolkits provided by Calcite, Quicksql has built extensions on this basis and enriched the logical-plan definitions, so that both single-source and multi-source queries can be described. For a single data source, an end-to-end connection query is established directly; for multiple data sources, logical plans are divided and pushed down, and finally interpreted as code for a compute engine with distributed computing capabilities (such as Spark or Flink) that merges the data. Based on this design, Quicksql makes extensive use of Calcite's Adapter / Dialect / UDF facilities to provide syntax adaptation and compatibility for various data sources and compute engines, and also uses Avatica as its JDBC protocol layer. We are very grateful for the excellent work provided by the Calcite community. At the beginning of the project, we were unsure of Quicksql's application areas. After one year of polishing, Quicksql has been successfully applied in two areas: 1. Interactive Query Engine: provides big data interactive query and BI analysis with standard SQL syntax, with response times in seconds to minutes. 2. ETL Compute Engine: SQL-based ETL across multiple data sources, which can use the optimization capabilities of SQL for data cleaning, transformation, joins, etc. In the future, we will also focus on dynamic engine selection, so that engines such as Hive, Spark, and Presto can each run the SQL best suited to them. 
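[Editor's note] The single-source versus multi-source dispatch described above can be sketched roughly as follows. This is a minimal, hypothetical illustration of the idea, not Quicksql's actual API; all names and structures here are invented.

```python
# Illustrative sketch of the dispatch described above: push a whole
# query down to one engine when all tables share a source, otherwise
# split per source and merge on a distributed engine such as Spark.
# Hypothetical names; not Quicksql's actual code.
def route_query(tables, sql):
    """tables: mapping of table name -> data source name."""
    sources = set(tables.values())
    if len(sources) == 1:
        # Single data source: establish an end-to-end connection query.
        return {"engine": sources.pop(), "sql": sql}
    # Multiple data sources: divide the logical plan per source; the
    # partial results are merged by a distributed compute engine.
    subplans = {src: [t for t, s in tables.items() if s == src]
                for src in sources}
    return {"engine": "spark", "subplans": subplans, "sql": sql}

plan = route_query({"hive_table": "hive", "mysql_table": "mysql"},
                   "select a, b from hive_table, mysql_table where ...")
print(plan["engine"])  # prints "spark" for this multi-source query
```

The point of the sketch is only the branching: one extra parsing cost decides whether the query runs end-to-end on its native engine or is split and merged.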
Looking forward to working with the Calcite community to do some interesting things and explore the unlimited possibilities of SQL Siyuan Liu On Mon, Mar 2, 2020 at 3:45 PM Francis Du wrote: > Hi everyone: > > Allow me to introduce my good friend Siyuan Liu, who is the leader of > Quicksql project. > > I CC to him and ask him to introduce the project to us.Here is the > documentation link for > > Quicksql [1]. > > [1]. https://quicksql.readthedocs.io/en/latest/ > > Regards, > Francis > > Juan Pan 于2019年12月23日周一 上午11:44写道: > >> Thanks Gelbana, >> >> >> Very appreciated your explanation, which sheds me some light on exploring >> Calcite. :) >> >> >> Best wishes, >> Trista >> >> >> Juan Pan (Trista) >> >> Senior DBA & PPMC of Apache ShardingSphere(Incubating) >> E-mail: panj...@apache.org >> >> >> >> >> On 12/22/2019 05:58,Muhammad Gelbana wrote: >> I am curious how to join the tables from different datasources. >> Based on Calcite's conventions concept, the Join operator and its input >> operators should all have the same convention. If they don't, the >> convention different from the Join operator's convention will have to >> register a converter rule. This rule should produce an operator that only >> converts from that convention to the Join operator's convention. >> >> This way the Join operator will be able to handle the data obtained from >> its input operators because it understands the data structure. >> >> Thanks, >> Gelbana >> >> >> On Wed, Dec 18, 2019 at 5:08 AM Juan Pan wrote: >> >> Some updates. >> >> >> Recently i took a look at their doc and source code, and found this >> project uses SQL parsing and Relational algebra of Calcite to get query >> plan, and also translates to spark SQL for joining different datasources, >> or corresponding query for single datasource. >> >> >> Although it copies many classes from Calcite, the idea of QuickSQL seems >> some of interests, and code is succinct. 
>> >> >> Best, >> Trista >> >> >> Juan Pan (Trista) >> >> Senior DBA & PPMC of Apache ShardingSphere(Incubating) >> E-mail: panj...@apache.org >> >> >> >> >> On 12/13/2019 17:16,Juan Pan wrote: >> Yes, indeed. >> >> >> Juan Pan (Trista) >> >> Senior DBA & PPMC of Apache ShardingSphere(Incubating) >> E-mail: panj...@apache.org >> >> >> >> >> On 12/12/2019 18:00,Alessandro Solimando >> wrote: >> Adapters must be needed by data sources not supporting SQL, I think this >> is >> what Juan Pan was asking for. >> >> On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan wrote: >> >> Nope, it doesn't use any
Re: Quicksql
Hi everyone: Allow me to introduce my good friend Siyuan Liu, who is the leader of the Quicksql project. I have CC'd him and asked him to introduce the project to us. Here is the documentation link for Quicksql [1]. [1]. https://quicksql.readthedocs.io/en/latest/ Regards, Francis Juan Pan wrote on Mon, Dec 23, 2019 at 11:44 AM: > Thanks Gelbana, > > > Very appreciated your explanation, which sheds me some light on exploring > Calcite. :) > > > Best wishes, > Trista > > > Juan Pan (Trista) > > Senior DBA & PPMC of Apache ShardingSphere(Incubating) > E-mail: panj...@apache.org > > > > > On 12/22/2019 05:58,Muhammad Gelbana wrote: > I am curious how to join the tables from different datasources. > Based on Calcite's conventions concept, the Join operator and its input > operators should all have the same convention. If they don't, the > convention different from the Join operator's convention will have to > register a converter rule. This rule should produce an operator that only > converts from that convention to the Join operator's convention. > > This way the Join operator will be able to handle the data obtained from > its input operators because it understands the data structure. > > Thanks, > Gelbana > > > On Wed, Dec 18, 2019 at 5:08 AM Juan Pan wrote: > > Some updates. > > > Recently i took a look at their doc and source code, and found this > project uses SQL parsing and Relational algebra of Calcite to get query > plan, and also translates to spark SQL for joining different datasources, > or corresponding query for single datasource. > > > Although it copies many classes from Calcite, the idea of QuickSQL seems > some of interests, and code is succinct. > > > Best, > Trista > > > Juan Pan (Trista) > > Senior DBA & PPMC of Apache ShardingSphere(Incubating) > E-mail: panj...@apache.org > > > > > On 12/13/2019 17:16,Juan Pan wrote: > Yes, indeed. 
> > > Juan Pan (Trista) > > Senior DBA & PPMC of Apache ShardingSphere(Incubating) > E-mail: panj...@apache.org > > > > > On 12/12/2019 18:00,Alessandro Solimando > wrote: > Adapters must be needed by data sources not supporting SQL, I think this is > what Juan Pan was asking for. > > On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan wrote: > > Nope, it doesn't use any adapters. It just submits partial SQL query to > different engines. > > If query contains table from single source, e.g. > select count(*) from hive_table1, hive_table2 where a=b; > then the whole query will be submitted to hive. > > Otherwise, e.g. > select distinct a,b from hive_table union select distinct a,b from > mysql_table; > > The following query will be submitted to Spark and executed by Spark: > select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2; > > spark_tmp_table1: select distinct a,b from hive_table > spark_tmp_table2: select distinct a,b from mysql_table > > On 2019/12/11 04:27:07, "Juan Pan" wrote: > Hi Haisheng, > > > The query on different data source will then be registered as temp > spark tables (with filter or join pushed in), the whole query is rewritten > as SQL text over these temp tables and submitted to Spark. > > > Does it mean QuickSQL also need adaptors to make query executed on > different data source? > > > Yes, virtualization is one of Calcite’s goals. In fact, when I created > Calcite I was thinking about virtualization + in-memory materialized views. > Not only the Spark convention but any of the “engine” conventions (Drill, > Flink, Beam, Enumerable) could be used to create a virtual query engine. > > > Basically, i like and agree with Julian’s statement. It is a great idea > which personally hope Calcite move towards. > > > Give my best wishes to Calcite community. 
> > > Thanks, > Trista > > > Juan Pan > > > panj...@apache.org > Juan Pan(Trista), Apache ShardingSphere > > > On 12/11/2019 10:53,Haisheng Yuan wrote: > As far as I know, users still need to register tables from other data > sources before querying it. QuickSQL uses Calcite for parsing queries and > optimizing logical expressions with several transformation rules. The query > on different data source will then be registered as temp spark tables (with > filter or join pushed in), the whole query is rewritten as SQL text over > these temp tables and submitted to Spark. > > - Haisheng > > -- > 发件人:Rui Wang > 日 期:2019年12月11日 06:24:45 > 收件人: > 主 题:Re: Quicksql > > The co-routine model sounds fitting into Streaming cases well. > > I was thinking how shoul
Re: Quicksql
Thanks Gelbana, I very much appreciated your explanation, which sheds some light on exploring Calcite for me. :) Best wishes, Trista Juan Pan (Trista) Senior DBA & PPMC of Apache ShardingSphere(Incubating) E-mail: panj...@apache.org On 12/22/2019 05:58,Muhammad Gelbana wrote: I am curious how to join the tables from different datasources. Based on Calcite's conventions concept, the Join operator and its input operators should all have the same convention. If they don't, the convention that differs from the Join operator's convention will have to register a converter rule. This rule should produce an operator that only converts from that convention to the Join operator's convention. This way the Join operator will be able to handle the data obtained from its input operators because it understands the data structure. Thanks, Gelbana On Wed, Dec 18, 2019 at 5:08 AM Juan Pan wrote: Some updates. Recently I took a look at their doc and source code, and found this project uses the SQL parsing and relational algebra of Calcite to get a query plan, and also translates to Spark SQL for joining different datasources, or a corresponding query for a single datasource. Although it copies many classes from Calcite, the idea of QuickSQL seems interesting, and the code is succinct. Best, Trista Juan Pan (Trista) Senior DBA & PPMC of Apache ShardingSphere(Incubating) E-mail: panj...@apache.org On 12/13/2019 17:16,Juan Pan wrote: Yes, indeed. Juan Pan (Trista) Senior DBA & PPMC of Apache ShardingSphere(Incubating) E-mail: panj...@apache.org On 12/12/2019 18:00,Alessandro Solimando wrote: Adapters would be needed for data sources that do not support SQL, which I think is what Juan Pan was asking about. On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan wrote: Nope, it doesn't use any adapters. It just submits partial SQL queries to different engines. If the query contains tables from a single source, e.g. select count(*) from hive_table1, hive_table2 where a=b; then the whole query will be submitted to Hive. Otherwise, e.g. 
select distinct a,b from hive_table union select distinct a,b from mysql_table; The following query will be submitted to Spark and executed by Spark: select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2; spark_tmp_table1: select distinct a,b from hive_table spark_tmp_table2: select distinct a,b from mysql_table On 2019/12/11 04:27:07, "Juan Pan" wrote: Hi Haisheng, The query on different data sources will then be registered as temp Spark tables (with filter or join pushed in), and the whole query is rewritten as SQL text over these temp tables and submitted to Spark. Does it mean QuickSQL also needs adapters to execute queries on different data sources? Yes, virtualization is one of Calcite’s goals. In fact, when I created Calcite I was thinking about virtualization + in-memory materialized views. Not only the Spark convention but any of the “engine” conventions (Drill, Flink, Beam, Enumerable) could be used to create a virtual query engine. Basically, I like and agree with Julian’s statement. It is a great idea which I personally hope Calcite moves towards. I give my best wishes to the Calcite community. Thanks, Trista Juan Pan panj...@apache.org Juan Pan(Trista), Apache ShardingSphere On 12/11/2019 10:53,Haisheng Yuan wrote: As far as I know, users still need to register tables from other data sources before querying them. QuickSQL uses Calcite for parsing queries and optimizing logical expressions with several transformation rules. The query on different data sources will then be registered as temp Spark tables (with filter or join pushed in), and the whole query is rewritten as SQL text over these temp tables and submitted to Spark. - Haisheng -- From: Rui Wang Date: Dec 11, 2019 06:24:45 To: Subject: Re: Quicksql The co-routine model sounds like it fits streaming cases well. I was thinking about how the Enumerable interface should work with streaming cases, but now I should also check the Interpreter. 
-Rui On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde wrote: The goal (or rather my goal) for the interpreter is to replace Enumerable as the quick, easy default convention. Enumerable is efficient but not that efficient (compared to engines that work on off-heap data representing batches of records). And because it generates Java byte code there is a certain latency to getting a query prepared and ready to run. It basically implements the old Volcano query evaluation model. It is single-threaded (because all work happens as a result of a call to 'next()' on the root node) and cannot handle branching data-flow graphs (DAGs). The Interpreter uses a co-routine model (reading from queues, writing to queues, and yielding when there is no work to be done) and therefore could be more efficient than Enumerable in a single-node multi-core system. Also, there is little start-up time, which
Re: Quicksql
> I am curious how to join the tables from different datasources. Based on Calcite's conventions concept, the Join operator and its input operators should all have the same convention. If they don't, the convention different from the Join operator's convention will have to register a converter rule. This rule should produce an operator that only converts from that convention to the Join operator's convention. This way the Join operator will be able to handle the data obtained from its input operators because it understands the data structure. Thanks, Gelbana On Wed, Dec 18, 2019 at 5:08 AM Juan Pan wrote: > Some updates. > > > Recently i took a look at their doc and source code, and found this > project uses SQL parsing and Relational algebra of Calcite to get query > plan, and also translates to spark SQL for joining different datasources, > or corresponding query for single datasource. > > > Although it copies many classes from Calcite, the idea of QuickSQL seems > some of interests, and code is succinct. > > > Best, > Trista > > > Juan Pan (Trista) > > Senior DBA & PPMC of Apache ShardingSphere(Incubating) > E-mail: panj...@apache.org > > > > > On 12/13/2019 17:16,Juan Pan wrote: > Yes, indeed. > > > Juan Pan (Trista) > > Senior DBA & PPMC of Apache ShardingSphere(Incubating) > E-mail: panj...@apache.org > > > > > On 12/12/2019 18:00,Alessandro Solimando > wrote: > Adapters must be needed by data sources not supporting SQL, I think this is > what Juan Pan was asking for. > > On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan wrote: > > Nope, it doesn't use any adapters. It just submits partial SQL query to > different engines. > > If query contains table from single source, e.g. > select count(*) from hive_table1, hive_table2 where a=b; > then the whole query will be submitted to hive. > > Otherwise, e.g. 
> select distinct a,b from hive_table union select distinct a,b from > mysql_table; > > The following query will be submitted to Spark and executed by Spark: > select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2; > > spark_tmp_table1: select distinct a,b from hive_table > spark_tmp_table2: select distinct a,b from mysql_table > > On 2019/12/11 04:27:07, "Juan Pan" wrote: > Hi Haisheng, > > > The query on different data source will then be registered as temp > spark tables (with filter or join pushed in), the whole query is rewritten > as SQL text over these temp tables and submitted to Spark. > > > Does it mean QuickSQL also need adaptors to make query executed on > different data source? > > > Yes, virtualization is one of Calcite’s goals. In fact, when I created > Calcite I was thinking about virtualization + in-memory materialized views. > Not only the Spark convention but any of the “engine” conventions (Drill, > Flink, Beam, Enumerable) could be used to create a virtual query engine. > > > Basically, i like and agree with Julian’s statement. It is a great idea > which personally hope Calcite move towards. > > > Give my best wishes to Calcite community. > > > Thanks, > Trista > > > Juan Pan > > > panj...@apache.org > Juan Pan(Trista), Apache ShardingSphere > > > On 12/11/2019 10:53,Haisheng Yuan wrote: > As far as I know, users still need to register tables from other data > sources before querying it. QuickSQL uses Calcite for parsing queries and > optimizing logical expressions with several transformation rules. The query > on different data source will then be registered as temp spark tables (with > filter or join pushed in), the whole query is rewritten as SQL text over > these temp tables and submitted to Spark. > > - Haisheng > > -- > 发件人:Rui Wang > 日 期:2019年12月11日 06:24:45 > 收件人: > 主 题:Re: Quicksql > > The co-routine model sounds fitting into Streaming cases well. 
> > I was thinking how should Enumerable interface work with streaming cases > but now I should also check Interpreter. > > > -Rui > > On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde wrote: > > The goal (or rather my goal) for the interpreter is to replace > Enumerable as the quick, easy default convention. > > Enumerable is efficient but not that efficient (compared to engines > that work on off-heap data representing batches of records). And > because it generates java byte code there is a certain latency to > getting a query prepared and ready to run. > > It basically implements the old Volcano query evaluation model. It is > single-threaded (because all work happens as a result of a call to > 'next()' on the root node) and
Re: Quicksql
Some updates. Recently I took a look at their doc and source code, and found this project uses the SQL parsing and relational algebra of Calcite to get a query plan, and also translates to Spark SQL for joining different datasources, or a corresponding query for a single datasource. Although it copies many classes from Calcite, the idea of QuickSQL seems interesting, and the code is succinct. Best, Trista Juan Pan (Trista) Senior DBA & PPMC of Apache ShardingSphere(Incubating) E-mail: panj...@apache.org On 12/13/2019 17:16,Juan Pan wrote: Yes, indeed. Juan Pan (Trista) Senior DBA & PPMC of Apache ShardingSphere(Incubating) E-mail: panj...@apache.org On 12/12/2019 18:00,Alessandro Solimando wrote: Adapters would be needed for data sources that do not support SQL, which I think is what Juan Pan was asking about. On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan wrote: Nope, it doesn't use any adapters. It just submits partial SQL queries to different engines. If the query contains tables from a single source, e.g. select count(*) from hive_table1, hive_table2 where a=b; then the whole query will be submitted to Hive. Otherwise, e.g. select distinct a,b from hive_table union select distinct a,b from mysql_table; The following query will be submitted to Spark and executed by Spark: select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2; spark_tmp_table1: select distinct a,b from hive_table spark_tmp_table2: select distinct a,b from mysql_table On 2019/12/11 04:27:07, "Juan Pan" wrote: Hi Haisheng, The query on different data sources will then be registered as temp Spark tables (with filter or join pushed in), and the whole query is rewritten as SQL text over these temp tables and submitted to Spark. Does it mean QuickSQL also needs adapters to execute queries on different data sources? Yes, virtualization is one of Calcite’s goals. In fact, when I created Calcite I was thinking about virtualization + in-memory materialized views. 
Not only the Spark convention but any of the “engine” conventions (Drill, Flink, Beam, Enumerable) could be used to create a virtual query engine. Basically, I like and agree with Julian’s statement. It is a great idea which I personally hope Calcite moves towards. I give my best wishes to the Calcite community. Thanks, Trista Juan Pan panj...@apache.org Juan Pan(Trista), Apache ShardingSphere On 12/11/2019 10:53,Haisheng Yuan wrote: As far as I know, users still need to register tables from other data sources before querying them. QuickSQL uses Calcite for parsing queries and optimizing logical expressions with several transformation rules. The query on different data sources will then be registered as temp Spark tables (with filter or join pushed in), and the whole query is rewritten as SQL text over these temp tables and submitted to Spark. - Haisheng -- From: Rui Wang Date: Dec 11, 2019 06:24:45 To: Subject: Re: Quicksql The co-routine model sounds like it fits streaming cases well. I was thinking about how the Enumerable interface should work with streaming cases, but now I should also check the Interpreter. -Rui On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde wrote: The goal (or rather my goal) for the interpreter is to replace Enumerable as the quick, easy default convention. Enumerable is efficient but not that efficient (compared to engines that work on off-heap data representing batches of records). And because it generates Java byte code there is a certain latency to getting a query prepared and ready to run. It basically implements the old Volcano query evaluation model. It is single-threaded (because all work happens as a result of a call to 'next()' on the root node) and cannot handle branching data-flow graphs (DAGs). The Interpreter uses a co-routine model (reading from queues, writing to queues, and yielding when there is no work to be done) and therefore could be more efficient than Enumerable in a single-node multi-core system. 
Also, there is little start-up time, which is important for small queries. I would love to add another built-in convention that uses Arrow as data format and generates co-routines for each operator. Those co-routines could be deployed in a parallel and/or distributed data engine. Julian On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas wrote: What is the ultimate goal of the Calcite Interpreter? To provide some context, I have been playing around with calcite + REST (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest < https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest> for detail of my experiments) —Z On Dec 9, 2019, at 9:05 PM, Julian Hyde wrote: Yes, virtualization is one of Calcite’s goals. In fact, when I created Calcite I was thinking about virtualization + in-memory materialized views. Not only the Spark convention but any of the “engine” conventions (Drill, Flink,
Re: Quicksql
Thanks for your clarification, Haisheng. I am curious how to join the tables from different datasources. Supposing there is tb1 in datasource1 and tb2 in datasource2 and the SQL is `select tb1.col1, tb2.col2 from tb1, tb2 where tb1.id = tb2.id`, how are the two tables joined together to get the final result? Juan Pan (Trista) Senior DBA & PPMC of Apache ShardingSphere(Incubating) E-mail: panj...@apache.org On 12/12/2019 11:05,Haisheng Yuan wrote: Nope, it doesn't use any adapters. It just submits partial SQL queries to different engines. If the query contains tables from a single source, e.g. select count(*) from hive_table1, hive_table2 where a=b; then the whole query will be submitted to Hive. Otherwise, e.g. select distinct a,b from hive_table union select distinct a,b from mysql_table; The following query will be submitted to Spark and executed by Spark: select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2; spark_tmp_table1: select distinct a,b from hive_table spark_tmp_table2: select distinct a,b from mysql_table On 2019/12/11 04:27:07, "Juan Pan" wrote: Hi Haisheng, The query on different data sources will then be registered as temp Spark tables (with filter or join pushed in), and the whole query is rewritten as SQL text over these temp tables and submitted to Spark. Does it mean QuickSQL also needs adapters to execute queries on different data sources? Yes, virtualization is one of Calcite’s goals. In fact, when I created Calcite I was thinking about virtualization + in-memory materialized views. Not only the Spark convention but any of the “engine” conventions (Drill, Flink, Beam, Enumerable) could be used to create a virtual query engine. Basically, I like and agree with Julian’s statement. It is a great idea which I personally hope Calcite moves towards. I give my best wishes to the Calcite community. 
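[Editor's note] The temp-table rewrite Haisheng describes above can be sketched as a small string-rewriting step: each per-source subquery becomes a temporary Spark table, and the outer query is rewritten over those temp tables. This is an illustrative sketch only; `rewrite_for_spark` and its structure are invented, not Quicksql's actual code, though the table names and SQL come from Haisheng's example.

```python
# Hypothetical sketch of the rewrite step described above: register
# each per-source subquery as a temp Spark table, then rewrite the
# outer query over those temp tables. Not Quicksql's actual code.
def rewrite_for_spark(subqueries):
    """subqueries: mapping of temp table name -> per-source SQL text.

    Returns (registrations, outer_query) to be submitted to Spark.
    """
    registrations = [
        f"CREATE TEMPORARY VIEW {name} AS {sql}"  # one per data source
        for name, sql in subqueries.items()
    ]
    outer = " union ".join(f"select a, b from {name}" for name in subqueries)
    return registrations, outer

regs, outer = rewrite_for_spark({
    "spark_tmp_table1": "select distinct a,b from hive_table",
    "spark_tmp_table2": "select distinct a,b from mysql_table",
})
print(outer)
# select a, b from spark_tmp_table1 union select a, b from spark_tmp_table2
```

Filters and joins would be pushed into each per-source subquery before this rewrite, so each source does as much of its own work as possible.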
Thanks, Trista Juan Pan panj...@apache.org Juan Pan(Trista), Apache ShardingSphere On 12/11/2019 10:53,Haisheng Yuan wrote: As far as I know, users still need to register tables from other data sources before querying them. QuickSQL uses Calcite for parsing queries and optimizing logical expressions with several transformation rules. The query on different data sources will then be registered as temp Spark tables (with filter or join pushed in), and the whole query is rewritten as SQL text over these temp tables and submitted to Spark. - Haisheng -- From: Rui Wang Date: Dec 11, 2019 06:24:45 To: Subject: Re: Quicksql The co-routine model sounds like it fits streaming cases well. I was thinking about how the Enumerable interface should work with streaming cases, but now I should also check the Interpreter. -Rui On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde wrote: The goal (or rather my goal) for the interpreter is to replace Enumerable as the quick, easy default convention. Enumerable is efficient but not that efficient (compared to engines that work on off-heap data representing batches of records). And because it generates Java byte code there is a certain latency to getting a query prepared and ready to run. It basically implements the old Volcano query evaluation model. It is single-threaded (because all work happens as a result of a call to 'next()' on the root node) and cannot handle branching data-flow graphs (DAGs). The Interpreter uses a co-routine model (reading from queues, writing to queues, and yielding when there is no work to be done) and therefore could be more efficient than Enumerable in a single-node multi-core system. Also, there is little start-up time, which is important for small queries. I would love to add another built-in convention that uses Arrow as data format and generates co-routines for each operator. Those co-routines could be deployed in a parallel and/or distributed data engine. 
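[Editor's note] Julian's co-routine model above can be illustrated with a toy generator pipeline: each operator reads from an input queue, writes to an output queue, and yields control when it has done a unit of work. This is a minimal illustration of the evaluation style, not Calcite's actual Interpreter code.

```python
# Toy sketch of the co-routine model Julian describes: operators as
# generators reading from and writing to queues, driven round-robin by
# a single-threaded scheduler. Not Calcite's actual Interpreter code.
from collections import deque

def scan(rows, out):
    for row in rows:
        out.append(row)
        yield  # yield control after producing each row

def filter_op(pred, inp, out):
    while True:
        if inp:
            row = inp.popleft()
            if pred(row):
                out.append(row)
        yield  # yield whether or not there was work to do

# Queues between operators; a branching DAG would just add more queues.
q1, q2 = deque(), deque()
ops = [scan([1, 2, 3, 4], q1), filter_op(lambda r: r % 2 == 0, q1, q2)]
for _ in range(20):  # enough scheduler steps to drain this tiny pipeline
    for op in list(ops):
        try:
            next(op)
        except StopIteration:
            ops.remove(op)  # operator is exhausted

print(list(q2))  # [2, 4]
```

Because no operator blocks on a `next()` call into its child, the same scheduling loop could run operators on multiple cores or handle DAG-shaped plans, which is the contrast Julian draws with the Volcano-style Enumerable convention.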
Julian On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas wrote: What is the ultimate goal of the Calcite Interpreter? To provide some context, I have been playing around with calcite + REST (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest < https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest> for detail of my experiments) —Z On Dec 9, 2019, at 9:05 PM, Julian Hyde wrote: Yes, virtualization is one of Calcite’s goals. In fact, when I created Calcite I was thinking about virtualization + in-memory materialized views. Not only the Spark convention but any of the “engine” conventions (Drill, Flink, Beam, Enumerable) could be used to create a virtual query engine. See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite) https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework < https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework . Julian On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana wrote: I recently contacted one of the active co
Re: Quicksql
Yes, indeed.

Juan Pan (Trista)
Senior DBA & PPMC of Apache ShardingSphere (Incubating)
E-mail: panj...@apache.org

On 12/12/2019 18:00, Alessandro Solimando wrote:
> Adapters should only be needed for data sources that do not support SQL; I think that is what Juan Pan was asking about.
Re: Quicksql
Adapters should only be needed for data sources that do not support SQL; I think that is what Juan Pan was asking about.

On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan wrote:
> Nope, it doesn't use any adapters. It just submits partial SQL queries to different engines.
Re: Quicksql
Nope, it doesn't use any adapters. It just submits partial SQL queries to different engines.

If a query only involves tables from a single source, e.g.

    select count(*) from hive_table1, hive_table2 where a = b;

then the whole query is submitted to Hive. Otherwise, e.g.

    select distinct a, b from hive_table union select distinct a, b from mysql_table;

the following query is submitted to Spark and executed by Spark:

    select a, b from spark_tmp_table1 union select a, b from spark_tmp_table2;

where the temp tables are defined as:

    spark_tmp_table1: select distinct a, b from hive_table
    spark_tmp_table2: select distinct a, b from mysql_table

On 2019/12/11 04:27:07, "Juan Pan" wrote:
> Hi Haisheng,
>
> Does it mean QuickSQL also needs adapters to make queries execute on different data sources?
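Haisheng's routing rule (a single-source query is pushed down whole; a multi-source query is split into per-source temp tables that Spark merges) can be sketched in a few lines of Python. This is a hypothetical illustration, not Quicksql's actual code: the catalog mapping, function names, and the assumption that every fragment projects columns a and b are all invented for the example.

```python
# Illustrative sketch of Quicksql-style query routing (hypothetical, not
# the real implementation). Tables map to sources; a query touching one
# source is pushed down whole, otherwise each source's fragment becomes a
# temp Spark table and the rewritten query is merged in Spark.

CATALOG = {"hive_table": "hive", "mysql_table": "mysql"}  # assumed mapping

def route(subqueries):
    """subqueries: list of (sql, table) fragments making up the full query."""
    sources = {CATALOG[t] for _, t in subqueries}
    if len(sources) == 1:
        # Single source: submit the whole query to that engine directly.
        engine = sources.pop()
        full_sql = " union ".join(sql for sql, _ in subqueries)
        return engine, full_sql
    # Multiple sources: register each fragment as a temp Spark table, then
    # rewrite the query as SQL text over the temp tables and run it in Spark.
    temp_defs = {}
    parts = []
    for i, (sql, _) in enumerate(subqueries, 1):
        name = f"spark_tmp_table{i}"
        temp_defs[name] = sql
        parts.append(f"select a, b from {name}")
    return "spark", (" union ".join(parts), temp_defs)

engine, plan = route([("select distinct a, b from hive_table", "hive_table"),
                      ("select distinct a, b from mysql_table", "mysql_table")])
```

With both fragments above, `route` picks Spark and produces exactly the rewritten union over `spark_tmp_table1` and `spark_tmp_table2` from Haisheng's example; with a single Hive fragment it returns the query untouched for Hive.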
Re: Quicksql
Hi Haisheng,

> The query on each data source will then be registered as a temp Spark table (with filters or joins pushed in); the whole query is rewritten as SQL text over these temp tables and submitted to Spark.

Does it mean QuickSQL also needs adapters to make queries execute on different data sources?

> Yes, virtualization is one of Calcite’s goals. In fact, when I created Calcite I was thinking about virtualization + in-memory materialized views. Not only the Spark convention but any of the “engine” conventions (Drill, Flink, Beam, Enumerable) could be used to create a virtual query engine.

Basically, I like and agree with Julian’s statement. It is a great direction, one I personally hope Calcite moves towards.

Give my best wishes to the Calcite community.

Thanks,
Trista

Juan Pan
panj...@apache.org
Juan Pan (Trista), Apache ShardingSphere
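The end-to-end flow Quicksql is described as using in this thread — fetch each source's rows into a temporary table, then join them in a single compute engine — can be mimicked with plain Python lists standing in for Spark. A hypothetical illustration only: the two "sources", their schemas, and the data are invented.

```python
# Minimal stand-in for "fetch per source, merge in one engine".
# Two fake sources (a "MySQL" table and an "Elasticsearch" index) are pulled
# into local temp tables and joined locally, the way Spark would merge them.

mysql_rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]  # invented data
es_rows = [{"id": 1, "city": "berlin"}, {"id": 3, "city": "oslo"}]   # invented data

def fetch_as_temp_table(rows):
    # In Quicksql this step would be "register as a temp Spark table";
    # here it is simply a local copy of the source's rows.
    return list(rows)

def hash_join(left, right, key):
    # The merge step that the single compute engine performs: build a hash
    # index on the right side, probe it with each left row.
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

t1 = fetch_as_temp_table(mysql_rows)
t2 = fetch_as_temp_table(es_rows)
joined = hash_join(t1, t2, "id")
# joined -> [{"id": 1, "name": "alice", "city": "berlin"}]
```

The user writes one query over both tables and never sees which rows came from which system; only the federation layer knows that `t1` and `t2` were fetched from different sources.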
Re: Quicksql
As far as I know, users still need to register tables from other data sources before querying them. QuickSQL uses Calcite for parsing queries and optimizing logical expressions with several transformation rules. The query on each data source will then be registered as a temp Spark table (with filters or joins pushed in); the whole query is rewritten as SQL text over these temp tables and submitted to Spark.

- Haisheng

------------------------------------------------------------------
From: Rui Wang
Date: 2019-12-11 06:24:45
To:
Subject: Re: Quicksql

The co-routine model sounds like a good fit for streaming cases. I was thinking about how the Enumerable interface should work with streaming cases, but now I should also check the Interpreter.

-Rui
Re: Quicksql
The co-routine model sounds like a good fit for streaming cases.

I was thinking about how the Enumerable interface should work with streaming cases, but now I should also check the Interpreter.

-Rui

On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde wrote:
> The goal (or rather my goal) for the interpreter is to replace Enumerable as the quick, easy default convention.
Re: Quicksql
The goal (or rather my goal) for the interpreter is to replace Enumerable as the quick, easy default convention.

Enumerable is efficient but not that efficient (compared to engines that work on off-heap data representing batches of records). And because it generates Java byte code there is a certain latency to getting a query prepared and ready to run.

It basically implements the old Volcano query evaluation model. It is single-threaded (because all work happens as a result of a call to 'next()' on the root node) and cannot handle branching data-flow graphs (DAGs).

The Interpreter uses a co-routine model (reading from queues, writing to queues, and yielding when there is no work to be done) and therefore could be more efficient than Enumerable in a single-node multi-core system. Also, there is little start-up time, which is important for small queries.

I would love to add another built-in convention that uses Arrow as the data format and generates co-routines for each operator. Those co-routines could be deployed in a parallel and/or distributed data engine.

Julian

On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas wrote:
> What is the ultimate goal of the Calcite Interpreter?
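Julian's contrast between the Volcano-style pull model (a single thread driving 'next()' calls from the root) and a queue-based co-routine model can be sketched as a toy in Python. This is an illustrative sketch only, not Calcite's Enumerable or Interpreter code; the operators and data are invented.

```python
from collections import deque

# Volcano-style pull model: each operator exposes next(); all work happens on
# the single thread that calls next() on the root operator.
class Scan:
    def __init__(self, rows):
        self.it = iter(rows)
    def next(self):
        return next(self.it, None)  # None signals end of input

class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def next(self):
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None

root = Filter(Scan([1, 2, 3, 4, 5]), lambda r: r % 2 == 1)
pulled = []
while (r := root.next()) is not None:
    pulled.append(r)

# Queue-based co-routine model: each operator reads from an input queue and
# writes to an output queue, yielding when there is no work. Operators could
# then run on separate cores or nodes, and DAG-shaped plans become possible.
def scan_op(rows, out):
    for r in rows:
        out.append(r)
    out.append(None)  # end-of-stream marker

def filter_op(pred, inp, out):
    while True:
        if not inp:
            yield  # no input available: yield control (unused here, since
            continue  # the input queue is pre-filled before we run)
        r = inp.popleft()
        if r is None:
            out.append(None)
            return
        if pred(r):
            out.append(r)

q1, q2 = deque(), deque()
scan_op([1, 2, 3, 4, 5], q1)
co = filter_op(lambda r: r % 2 == 1, q1, q2)
for _ in co:  # drive the co-routine until it finishes
    pass
queued = [r for r in q2 if r is not None]
```

Both models produce the same rows; the difference is who holds control. In the pull model the root's caller drives everything on one thread, while in the queue model a scheduler drives each operator independently, which is what makes multi-core execution and branching data-flow graphs feasible.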
Re: Quicksql
What is the ultimate goal of the Calcite Interpreter? To provide some context, I have been playing around with calcite + REST (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest <https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest> for detail of my experiments) —Z > On Dec 9, 2019, at 9:05 PM, Julian Hyde wrote: > > Yes, virtualization is one of Calcite’s goals. In fact, when I created > Calcite I was thinking about virtualization + in-memory materialized views. > Not only the Spark convention but any of the “engine” conventions (Drill, > Flink, Beam, Enumerable) could be used to create a virtual query engine. > > See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite) > https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework > > <https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework>. > > Julian > > > >> On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana wrote: >> >> I recently contacted one of the active contributors asking about the >> purpose of the project and here's his reply: >> >> From my understanding, Quicksql is a data virtualization platform. It can >>> query multiple data sources altogether and in a distributed way; Say, you >>> can write a SQL with a MySql table join with an Elasticsearch table. >>> Quicksql can recognize that, and then generate Spark code, in which it will >>> fetch the MySQL/ES data as a temporary table separately, and then join them >>> in Spark. The execution is in Spark so it is totally distributed. The user >>> doesn't need to aware of where the table is from. >>> >> >> I understand that the Spark convention Calcite has attempts to achieve the >> same goal, but it isn't fully implemented yet. >> >> >> On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde wrote: >> >>> Anyone know anything about Quicksql? It seems to be quite a popular >>> project, and they have an internal fork of Calcite. 
Re: Quicksql
The guys at Contiamo [1] are doing similar stuff. I added the nice talk [2] that Chris gave at ApacheCon to our website. [1] https://www.contiamo.com/ [2] https://youtu.be/4JAOkLKrcYE On Tue, Dec 10, 2019 at 3:05 AM Julian Hyde wrote: > Yes, virtualization is one of Calcite’s goals. [...]
Re: Quicksql
Yes, virtualization is one of Calcite’s goals. In fact, when I created Calcite I was thinking about virtualization + in-memory materialized views. Not only the Spark convention but any of the “engine” conventions (Drill, Flink, Beam, Enumerable) could be used to create a virtual query engine. See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite): https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework Julian > On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana wrote: > > I recently contacted one of the active contributors asking about the purpose of the project and here's his reply: [...]
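For readers new to the thread: the multi-backend wiring Julian alludes to is usually expressed as a Calcite model file passed to the JDBC driver. A hypothetical model registering a JDBC (MySQL) schema alongside an Elasticsearch schema might look like the sketch below. The host names, credentials, database, and index are invented, and the Elasticsearch operand keys vary between Calcite versions, so treat this as an illustration rather than a copy-paste config:

```json
{
  "version": "1.0",
  "defaultSchema": "mysql",
  "schemas": [
    {
      "name": "mysql",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
      "operand": {
        "jdbcDriver": "com.mysql.cj.jdbc.Driver",
        "jdbcUrl": "jdbc:mysql://db.example.com:3306/sales",
        "jdbcUser": "reader",
        "jdbcPassword": "secret"
      }
    },
    {
      "name": "es",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.elasticsearch.ElasticsearchSchemaFactory",
      "operand": {
        "hosts": "[\"http://es.example.com:9200\"]",
        "index": "orders"
      }
    }
  ]
}
```

With a model like this, a query that joins a `mysql` table to an `es` table can be submitted through a connection string such as `jdbc:calcite:model=/path/to/model.json`, and the planner decides what to push down to each backend.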
Re: Quicksql
I recently contacted one of the active contributors asking about the purpose of the project, and here's his reply: From my understanding, Quicksql is a data virtualization platform. It can > query multiple data sources altogether and in a distributed way; say, you > can write a SQL query that joins a MySQL table with an Elasticsearch table. > Quicksql can recognize that and then generate Spark code, in which it will > fetch the MySQL/ES data as temporary tables separately, and then join them > in Spark. The execution is in Spark, so it is totally distributed. The user > doesn't need to be aware of where the table is from. > I understand that Calcite's Spark convention attempts to achieve the same goal, but it isn't fully implemented yet. On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde wrote: > Anyone know anything about Quicksql? It seems to be quite a popular > project, and they have an internal fork of Calcite. [...]
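The mechanism described in that reply — fetch each source's rows into the compute engine as temporary tables, then join there — can be sketched in miniature. The snippet below is plain Python standing in for Spark, with in-memory lists standing in for the MySQL and Elasticsearch fetches; the table names and rows are invented, and this shows only the shape of the idea, not Quicksql's actual generated code:

```python
# Minimal sketch of federated-join execution: each "source" is pulled
# into the engine as a local (temporary) table, then joined centrally.
# In Quicksql the engine is Spark and the fetches are real connectors;
# here everything is in-memory and invented for illustration.

def fetch_mysql_table():
    # stands in for: SELECT id, name FROM a hypothetical MySQL users table
    return [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

def fetch_es_table():
    # stands in for: a scan of a hypothetical Elasticsearch orders index
    return [{"user_id": 1, "amount": 30}, {"user_id": 1, "amount": 12},
            {"user_id": 2, "amount": 5}]

def hash_join(left, right, left_key, right_key):
    # the engine-side join over the two temporary tables
    index = {}
    for row in left:
        index.setdefault(row[left_key], []).append(row)
    joined = []
    for row in right:
        for match in index.get(row[right_key], []):
            joined.append({**match, **row})
    return joined

result = hash_join(fetch_mysql_table(), fetch_es_table(), "id", "user_id")
```

The point of the design is that the user writes one SQL statement; the planner decides which fragments each source evaluates and the engine performs only the cross-source merge.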
Quicksql
Anyone know anything about Quicksql? It seems to be quite a popular project, and they have an internal fork of Calcite. https://github.com/Qihoo360/ https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite Julian