Re: Monthly online Calcite meetups

2019-12-21 Thread Muhammad Gelbana
I love the idea. I added my availability to the Doodle poll. I'll do my
best to attend the meeting even if it falls outside the ranges I
specified.


On Sat, Dec 21, 2019 at 9:30 PM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Stamatis>To begin with we could try to hold a single meetup per month and
> see later
> Stamatis>on how it goes
>
> It might be nice to try, however, it did not survive long the last time :(
>
> Stamatis>The ranges should be rather large so that it is easier to find
> Stamatis>some overlapping among us
>
> An alternative option is to mark checkboxes here:
> https://doodle.com/poll/4xymswz842i8xat8
> Note: even though it says "22..28 Dec" I suggest to treat it as "sunday ..
> monday"
>
> Vladimir
>


Re: Quicksql

2019-12-21 Thread Muhammad Gelbana
> I am curious how to join the tables from different datasources.
Based on Calcite's concept of conventions, the Join operator and its input
operators should all have the same convention. If they don't, any input
whose convention differs from the Join operator's must have a converter
rule registered for it. That rule should produce an operator that only
converts from that input's convention to the Join operator's convention.

This way the Join operator will be able to handle the data obtained from
its input operators because it understands the data structure.
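
A toy sketch of this idea in Python follows. It does not use Calcite's API:
`Node`, `CONVERTERS`, `register_converter`, and `align_inputs` are
hypothetical names invented for illustration, and the convention names are
only labels.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """A relational operator tagged with the convention it runs in."""
    name: str
    convention: str
    inputs: List["Node"] = field(default_factory=list)


# Hypothetical registry: (from_convention, to_convention) -> converter factory.
CONVERTERS: Dict[Tuple[str, str], Callable[[Node], Node]] = {}


def register_converter(src: str, dst: str) -> None:
    """Register a rule that wraps a `src` node in a node that emits `dst` rows."""
    def convert(node: Node) -> Node:
        return Node(f"{src}To{dst}Converter", dst, [node])
    CONVERTERS[(src, dst)] = convert


def align_inputs(join: Node) -> Node:
    """Return a copy of `join` whose inputs all share the join's convention."""
    aligned = []
    for child in join.inputs:
        if child.convention == join.convention:
            aligned.append(child)
        else:
            key = (child.convention, join.convention)
            if key not in CONVERTERS:
                raise LookupError(f"no converter registered for {key}")
            aligned.append(CONVERTERS[key](child))
    return Node(join.name, join.convention, aligned)


register_converter("JDBC", "ENUMERABLE")
join = Node("Join", "ENUMERABLE", [
    Node("Scan(hive_table)", "ENUMERABLE"),
    Node("Scan(mysql_table)", "JDBC"),
])
plan = align_inputs(join)
print([(c.name, c.convention) for c in plan.inputs])
```

After `align_inputs`, the JDBC scan sits under a converter node, so every
input the join sees is in the join's own convention.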

Thanks,
Gelbana


On Wed, Dec 18, 2019 at 5:08 AM Juan Pan  wrote:

> Some updates.
>
>
> Recently I took a look at their docs and source code, and found that this
> project uses Calcite's SQL parsing and relational algebra to build the
> query plan. It then translates the plan to Spark SQL when joining different
> datasources, or to the corresponding query for a single datasource.
>
>
> Although it copies many classes from Calcite, the idea of QuickSQL seems
> interesting, and the code is succinct.
>
>
> Best,
> Trista
>
>
>  Juan Pan (Trista)
>
> Senior DBA & PPMC of Apache ShardingSphere(Incubating)
> E-mail: panj...@apache.org
>
>
>
>
> On 12/13/2019 17:16,Juan Pan wrote:
> Yes, indeed.
>
>
> Juan Pan (Trista)
>
> Senior DBA & PPMC of Apache ShardingSphere(Incubating)
> E-mail: panj...@apache.org
>
>
>
>
> On 12/12/2019 18:00,Alessandro Solimando
> wrote:
> Adapters would be needed for data sources that do not support SQL; I think
> this is what Juan Pan was asking about.
>
> On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan  wrote:
>
> Nope, it doesn't use any adapters. It just submits partial SQL queries to
> different engines.
>
> If the query contains tables from a single source, e.g.
> select count(*) from hive_table1, hive_table2 where a=b;
> then the whole query will be submitted to hive.
>
> Otherwise, e.g.
> select distinct a,b from hive_table union select distinct a,b from
> mysql_table;
>
> The following query will be submitted to Spark and executed by Spark:
> select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;
>
> spark_tmp_table1: select distinct a,b from hive_table
> spark_tmp_table2: select distinct a,b from mysql_table
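
The routing described above can be sketched roughly as follows. This is a
toy model, not QuickSQL's code: `Subquery` and `route` are hypothetical
names, and a real implementation would work on the relational plan rather
than on SQL strings. The `a,b` column list is hard-coded to match the
example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Subquery:
    source: str  # engine that owns the tables, e.g. "hive" or "mysql"
    sql: str     # SQL text that this engine can run on its own


def route(parts: List[Subquery], combine: str) -> Dict[str, object]:
    """Decide where a query runs.

    `combine` is a template with one `{}` slot per part, describing how
    the parts are put together (a union, a join, ...).
    """
    sources = {p.source for p in parts}
    if len(sources) == 1:
        # Single data source: push the whole query down to that engine.
        return {"engine": sources.pop(),
                "sql": combine.format(*[p.sql for p in parts]),
                "temp_tables": {}}
    # Several data sources: register each part as a temp table and let
    # Spark run the combining query over those temp tables.
    temp = {f"spark_tmp_table{i}": p for i, p in enumerate(parts, start=1)}
    rewritten = combine.format(*[f"select a,b from {name}" for name in temp])
    return {"engine": "spark", "sql": rewritten, "temp_tables": temp}


plan = route(
    [Subquery("hive", "select distinct a,b from hive_table"),
     Subquery("mysql", "select distinct a,b from mysql_table")],
    combine="{} union {}",
)
print(plan["engine"])
print(plan["sql"])
```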
>
> On 2019/12/11 04:27:07, "Juan Pan"  wrote:
> Hi Haisheng,
>
>
> The query for each data source will then be registered as a temp
> Spark table (with filter or join pushed in), and the whole query is
> rewritten as SQL text over these temp tables and submitted to Spark.
>
>
> Does it mean QuickSQL also needs adapters to execute queries on
> different data sources?
>
>
> Yes, virtualization is one of Calcite’s goals. In fact, when I created
> Calcite I was thinking about virtualization + in-memory materialized views.
> Not only the Spark convention but any of the “engine” conventions (Drill,
> Flink, Beam, Enumerable) could be used to create a virtual query engine.
>
>
> Basically, I like and agree with Julian’s statement. It is a great
> direction that I personally hope Calcite moves towards.
>
>
> Give my best wishes to Calcite community.
>
>
> Thanks,
> Trista
>
>
> Juan Pan
>
>
> panj...@apache.org
> Juan Pan(Trista), Apache ShardingSphere
>
>
> On 12/11/2019 10:53,Haisheng Yuan wrote:
> As far as I know, users still need to register tables from other data
> sources before querying them. QuickSQL uses Calcite for parsing queries and
> optimizing logical expressions with several transformation rules. The query
> for each data source will then be registered as a temp Spark table (with
> filter or join pushed in), and the whole query is rewritten as SQL text over
> these temp tables and submitted to Spark.
>
> - Haisheng
>
> --
> From: Rui Wang
> Date: 2019-12-11 06:24:45
> To:
> Subject: Re: Quicksql
>
> The co-routine model sounds like a good fit for streaming cases.
>
> I was thinking about how the Enumerable interface should work with
> streaming cases, but now I should also check the Interpreter.
>
>
> -Rui
>
> On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde  wrote:
>
> The goal (or rather my goal) for the interpreter is to replace
> Enumerable as the quick, easy default convention.
>
> Enumerable is efficient but not that efficient (compared to engines
> that work on off-heap data representing batches of records). And
> because it generates java byte code there is a certain latency to
> getting a query prepared and ready to run.
>
> It basically implements the old Volcano query evaluation model. It is
> single-threaded (because all work happens as a result of a call to
> 'next()' on the root node) and cannot handle branching data-flow
> graphs (DAGs).
>
> The Interpreter uses a co-routine model (reading from queues,
> writing to queues, and yielding when there is no work to be done) and
> therefore could be more efficient than enumerable in a single-node
> multi-core system. Also, there is little start-up time, which is
> important for small queries.
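
To illustrate the difference between the two models, here is a toy Python
sketch; it is not Calcite code. The first half pulls rows through nested
generators the way a `next()`-driven tree does. The second half uses
hypothetical `source` and `filter_node` co-routines that read from and
write to queues and yield when there is no work to be done. The
round-robin scheduler `run` is an assumption made up for the example.

```python
from collections import deque


# --- Pull model: each operator asks its child for the next row. ---
def scan(rows):
    for row in rows:
        yield row


def keep_even(child):
    for row in child:
        if row % 2 == 0:
            yield row


pulled = list(keep_even(scan(range(6))))

# --- Queue model: operators are co-routines connected by queues. ---
DONE = object()  # end-of-stream marker


def source(rows, out):
    for row in rows:
        out.append(row)
        yield          # hand control back to the scheduler
    out.append(DONE)


def filter_node(inp, out):
    while True:
        if not inp:
            yield      # nothing to read yet
            continue
        row = inp.popleft()
        if row is DONE:
            out.append(DONE)
            return
        if row % 2 == 0:
            out.append(row)


def run(tasks):
    """Round-robin scheduler: step each task until all have finished."""
    tasks = deque(tasks)
    while tasks:
        task = tasks.popleft()
        try:
            next(task)
            tasks.append(task)
        except StopIteration:
            pass


q1, q2 = deque(), deque()
run([source(range(6), q1), filter_node(q1, q2)])
pushed = [row for row in q2 if row is not DONE]
print(pulled, pushed)
```

Because the queue-based nodes only share queues, a producer could in
principle write to more than one output queue, which is how such a design
could accommodate branching data-flow graphs.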
>
> I would love to add another built-in convention that uses 

Re: Monthly online Calcite meetups

2019-12-21 Thread Vladimir Sitnikov
Stamatis>To begin with we could try to hold a single meetup per month and
see later
Stamatis>on how it goes

It might be nice to try, however, it did not survive long the last time :(

Stamatis>The ranges should be rather large so that it is easier to find
Stamatis>some overlapping among us

An alternative option is to mark checkboxes here:
https://doodle.com/poll/4xymswz842i8xat8
Note: even though it says "22..28 Dec" I suggest to treat it as "sunday ..
monday"

Vladimir


Monthly online Calcite meetups

2019-12-21 Thread Stamatis Zampetakis
Hi all,

Quite often there are subjects in the dev list, Jira issues, and GitHub
PRs, that trigger long discussions and exchanges among many people. In many
cases, I think that we could reach consensus and better understand each
other if we were holding live discussions.

Online meetups would also be a nice opportunity to meet each other in a
different way than by email, which could help improve our
relationships.

Although we could decide to hold meetups on a per-case basis, it might be
easier to schedule in advance some slots a few times per month where anybody
is invited to join.

We could use these slots for advancing ongoing discussions, brainstorming
new ideas, reviewing PRs, signing keys, and anything else related to our
community.

To begin with we could try to hold a single meetup per month and see later
on how it goes. It doesn't have to be long and we can decide the duration
based on the availability of the participants at the beginning of each
meetup.

I would like to invite people (committers or not) who would be willing to
join these online meetups to reply to this thread and indicate their
preferences (in UTC): a few intervals that they would be potentially
available. The ranges should be rather large so that it is easier to find
some overlapping among us. For instance:

Stamatis
===
Option 1: Mon to Fri 10:00 - 16:00
Option 2: Sat, Sun: 7:00 - 21:00

It is definitely not a commitment that the person will join the meetup but
rather a best effort to find slots that would satisfy the majority. So for
me the above means that if the meetup is inside the above intervals there
is a high chance that I will join. If it is outside, I might still join but
it will be more difficult.

A few topics that may be suitable for the first meetup would be the
following:

1. Line endings for source files on Windows
2. Remove Kotlin
3. Move CalciteAssert and similar "test framework"
4. On-demand traitset request
5. Volcano's problem with trait propagation: current state and future
Other??

Needless to say, these meetups do not replace the dev list or Jira but
just add another means to improve our collaboration.

Let me know how this looks!
Stamatis