Re: [DISCUSS] Restart the work of introducing TIME/TIMESTAMP WITH TIME ZONE types to Calcite

2019-12-22 Thread Zhenghua Gao
Hi Rui,

Thanks for your reply.

I also prefer [long + TimeZone] as the internal representation of TIMESTAMP
WITH TIME ZONE, and [int + TimeZone] as the representation of TIME WITH TIME
ZONE, for the following reasons:
1) The current Calcite codebase uses java.util.TimeZone to describe time zone
information (as you mentioned above)
2) [long/int + TimeZone] is simple and clear for the semantics of
TIME/TIMESTAMP WITH TIME ZONE
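A minimal Java sketch of the [long + TimeZone] / [int + TimeZone] idea, for illustration only. The class names TimestampWithZone, TimeWithZone and TimestampWithZoneDemo are hypothetical (they are not existing Calcite classes), and storing TIME WITH TIME ZONE as milliseconds since midnight UTC is an assumption. The sketch shows the two properties discussed in this thread: comparison looks only at the UTC epoch millis, while a zone-dependent function such as extracting the day is evaluated in the stored zone.

```java
import java.util.Calendar;
import java.util.TimeZone;

/** Hypothetical TIMESTAMP WITH TIME ZONE value: [long + TimeZone]. */
final class TimestampWithZone implements Comparable<TimestampWithZone> {
  final long epochMillis; // the instant: milliseconds since 1970-01-01T00:00:00Z
  final TimeZone zone;    // the original zone of the value

  TimestampWithZone(long epochMillis, TimeZone zone) {
    this.epochMillis = epochMillis;
    this.zone = zone;
  }

  /** Instant semantics: ordering looks only at the UTC epoch millis. */
  @Override
  public int compareTo(TimestampWithZone other) {
    return Long.compare(epochMillis, other.epochMillis);
  }

  /** Zone-dependent, like EXTRACT(DAY ...): evaluated in the stored zone. */
  int extractDayOfMonth() {
    Calendar calendar = Calendar.getInstance(zone);
    calendar.setTimeInMillis(epochMillis);
    return calendar.get(Calendar.DAY_OF_MONTH);
  }
}

/** Hypothetical TIME WITH TIME ZONE value: [int + TimeZone]. */
final class TimeWithZone {
  static final int MILLIS_PER_DAY = 86_400_000;

  final int millisOfDay; // assumption: milliseconds since midnight UTC
  final TimeZone zone;

  TimeWithZone(int millisOfDay, TimeZone zone) {
    if (millisOfDay < 0 || millisOfDay >= MILLIS_PER_DAY) {
      throw new IllegalArgumentException("millisOfDay out of range: " + millisOfDay);
    }
    this.millisOfDay = millisOfDay;
    this.zone = zone;
  }
}

class TimestampWithZoneDemo {
  public static void main(String[] args) {
    long instant = 1576958400000L; // 2019-12-21T20:00:00Z
    TimestampWithZone shanghai =
        new TimestampWithZone(instant, TimeZone.getTimeZone("Asia/Shanghai"));
    TimestampWithZone losAngeles =
        new TimestampWithZone(instant, TimeZone.getTimeZone("America/Los_Angeles"));

    System.out.println(shanghai.compareTo(losAngeles)); // 0: same instant
    System.out.println(shanghai.extractDayOfMonth());   // 22 (04:00 local time)
    System.out.println(losAngeles.extractDayOfMonth()); // 21 (12:00 local time)
  }
}
```

One caveat of java.util.TimeZone: TimeZone.getTimeZone returns GMT for an ID it cannot understand, so a real implementation would probably want to validate zone IDs up front.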


*Best Regards,*
*Zhenghua Gao*


On Sat, Dec 21, 2019 at 3:30 AM Rui Wang  wrote:

> In general, I lean toward storing a timestamp with time zone as
> [long + time zone]. The long is the epoch millis in UTC, and the time zone
> is the original zone of the value. Using UTC epoch millis will simplify
> comparison and make functions based on instant semantics reusable. The
> zone information will be useful for zone-dependent functions, e.g.
> Extract(), which extracts a field such as "DAY", or format(), which prints
> a string that should include the correct time zone.
>
> The above is more conceptual: many existing Java classes store values that
> way, and implementing a class in Calcite can achieve the same goal. People
> with more experience in the Calcite runtime code could probably better
> answer which solution is better for the function implementation, the
> enumerable implementation, etc.
>
>
> Regarding Zhenghua's investigation of different types of time zones, I
> found [1] in the codebase. Does it mean that the Calcite community
> previously decided to adopt the time zone format defined in [2]?
>
>
> [1]: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/util/TimestampWithTimeZoneString.java#L38
> [2]: https://docs.oracle.com/javase/7/docs/api/java/util/TimeZone.html
>
> -Rui
>
>
>
> On Fri, Dec 20, 2019 at 2:42 AM Zhenghua Gao  wrote:
>
> > Hi
> >
> > I did some research on the storage types for TIME/TIMESTAMP WITH TIME
> > ZONE and want to share some ideas for discussion:
> >
> >- There are three types of time zone information for both
> >java.util.TimeZone and java.time.ZoneId:
> >   - Region-based time zones, such as:
> >   Asia/Shanghai, America/Los_Angeles, etc.
> >   - Offset-based time zones, such
> >   as: Etc/GMT, Etc/GMT+1, GMT+08:30/+08:30, etc.
> >   - Some abbreviations whose use is deprecated in java.util.TimeZone,
> >   such as: EST, HST, ACT
> >
> > Since the abbreviations always map to one of the first two types, we
> > can ignore them in the following discussion.
> >
> >- java.time.OffsetTime/java.time.OffsetDateTime can't be the storage types
> >because they only cover the second type of time zone information
> >- So we should introduce our own internal classes to represent these
> >types, and there are three choices:
> >   - Use (int + TimeZone) to represent TIME WITH TIME ZONE, (long
> >   + TimeZone) to represent TIMESTAMP WITH TIME ZONE
> >   - Use (int + ZoneId) to represent TIME WITH TIME ZONE, (long +
> >   ZoneId) to represent TIMESTAMP WITH TIME ZONE
> >   - Presto style
> >
> >   Presto provides a *zone-index.properties* file that assigns a fixed
> >   numeric key (a short value) to every supported time zone ID (a string),
> >   and uses a single long value to store both the milliseconds and the
> >   time zone key
> >
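The zone classification above can be checked with java.time. This is a small sketch under the assumption that "offset-based" means a zone whose rules are a single fixed offset (so ZoneId.normalized() returns a ZoneOffset); ZoneKinds is a hypothetical helper name. It also shows that converting to OffsetDateTime keeps only the offset, which is why OffsetTime/OffsetDateTime cannot carry region-based zones.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

class ZoneKinds {
  /** Classifies a zone ID as "offset" (fixed offset) or "region" (rules vary over time). */
  static String kind(String id) {
    // SHORT_IDS maps deprecated abbreviations such as EST or ACT to one of the other kinds.
    ZoneId zone = ZoneId.of(id, ZoneId.SHORT_IDS);
    // normalized() returns a ZoneOffset when the zone's rules are a fixed offset.
    return zone.normalized() instanceof ZoneOffset ? "offset" : "region";
  }

  public static void main(String[] args) {
    System.out.println(kind("Asia/Shanghai"));              // region
    System.out.println(kind("GMT+08:30"));                  // offset
    System.out.println(kind("+08:30"));                     // offset
    System.out.println(ZoneId.of("EST", ZoneId.SHORT_IDS)); // -05:00
    System.out.println(ZoneId.of("ACT", ZoneId.SHORT_IDS)); // Australia/Darwin

    // OffsetDateTime keeps only the offset; the region ID is lost.
    OffsetDateTime offsetDateTime = LocalDateTime.of(2019, 12, 20, 10, 0)
        .atZone(ZoneId.of("America/Los_Angeles"))
        .toOffsetDateTime();
    System.out.println(offsetDateTime.getOffset());         // -08:00
  }
}
```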
> > Which one do you think should be our storage solution?
> >
> > *Best Regards,*
> > *Zhenghua Gao*
> >
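The Presto-style option can be sketched as below. The bit layout (UTC epoch millis in the high 52 bits, a 12-bit zone key in the low bits) is an assumption for illustration, Presto's own encoding may differ in detail, and the three-entry key table is made up; it is not the content of Presto's zone-index.properties. PackedTimestampWithZone is a hypothetical name.

```java
class PackedTimestampWithZone {
  // Assumed layout: high 52 bits = UTC epoch millis, low 12 bits = zone key.
  static final int ZONE_KEY_BITS = 12;
  static final long ZONE_KEY_MASK = (1L << ZONE_KEY_BITS) - 1; // 0xFFF

  // Made-up key table standing in for a file like zone-index.properties.
  static final String[] ZONE_IDS = {"UTC", "Asia/Shanghai", "America/Los_Angeles"};

  static long pack(long epochMillis, short zoneKey) {
    if (zoneKey < 0 || zoneKey >= ZONE_IDS.length) {
      throw new IllegalArgumentException("unknown zone key: " + zoneKey);
    }
    // Millis must fit in 52 signed bits (roughly +/- 71,000 years).
    if (epochMillis != ((epochMillis << ZONE_KEY_BITS) >> ZONE_KEY_BITS)) {
      throw new ArithmeticException("millis out of range: " + epochMillis);
    }
    return (epochMillis << ZONE_KEY_BITS) | zoneKey;
  }

  static long unpackMillis(long packed) {
    return packed >> ZONE_KEY_BITS; // arithmetic shift keeps pre-1970 values negative
  }

  static short unpackZoneKey(long packed) {
    return (short) (packed & ZONE_KEY_MASK);
  }

  static String unpackZoneId(long packed) {
    return ZONE_IDS[unpackZoneKey(packed)];
  }

  public static void main(String[] args) {
    long packed = pack(1576958400000L, (short) 1); // 2019-12-21T20:00:00Z, Asia/Shanghai
    System.out.println(unpackMillis(packed)); // 1576958400000
    System.out.println(unpackZoneId(packed)); // Asia/Shanghai
  }
}
```

Two packed values for the same instant in different zones differ in their low bits, so equality and ordering with instant semantics should compare unpackMillis(...) rather than the raw longs.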
> >
> > On Fri, Dec 20, 2019 at 2:51 PM Rui Wang  wrote:
> >
> > > Thanks, Zhenghua, for sharing [1], which really explains the three
> > > different semantics of TIMESTAMP and cleared up some of my long-term
> > > confusion about TIMESTAMP.
> > >
> > >
> > > Julian> We need all 3, regardless of what they are called
> > > Can I confirm that Calcite already supports the following two
> > > semantics:
> > >
> > > 1. a timestamp type with (number) content and “zoneless” semantics (I
> > > believe it is TIMESTAMP; alternatively it might be named
> > > TIMESTAMP_WITHOUT_TIME_ZONE)
> > > 2. a timestamp type with (number) content and “instant” semantics (which
> > > I believe is TIMESTAMP_WITH_LOCAL_TIME_ZONE)
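The two semantics above can be contrasted with java.time, taking LocalDateTime as an analogue of zoneless TIMESTAMP and Instant as an analogue of TIMESTAMP WITH LOCAL TIME ZONE. This is an analogy only, not Calcite's runtime representation; TimestampSemantics, interpretIn and displayIn are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class TimestampSemantics {
  /** Zoneless: the same wall-clock value means a different instant in each zone. */
  static Instant interpretIn(LocalDateTime wallClock, String zoneId) {
    return wallClock.atZone(ZoneId.of(zoneId)).toInstant();
  }

  /** Instant: one point on the time line, displayed in the session's zone. */
  static LocalDateTime displayIn(Instant instant, String zoneId) {
    return instant.atZone(ZoneId.of(zoneId)).toLocalDateTime();
  }

  public static void main(String[] args) {
    LocalDateTime wallClock = LocalDateTime.of(2019, 12, 20, 10, 0);
    System.out.println(interpretIn(wallClock, "Asia/Shanghai"));       // 2019-12-20T02:00:00Z
    System.out.println(interpretIn(wallClock, "America/Los_Angeles")); // 2019-12-20T18:00:00Z

    Instant instant = Instant.parse("2019-12-20T02:00:00Z");
    System.out.println(displayIn(instant, "Asia/Shanghai"));           // 2019-12-20T10:00
    System.out.println(displayIn(instant, "America/Los_Angeles"));     // 2019-12-19T18:00
  }
}
```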
> > >
> > >
> > > What I am interested in is how well the first semantic is currently
> > > supported. If the support is not good, I would like to work on
> > > improving it. (In the past I couldn't really find the first semantic in
> > > Calcite; maybe I have missed something.)
> > >
> > > [1]: https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#
> > >
> > > -Rui
> > >
> > >
> > > On Thu, Dec 19, 2019 at 6:27 PM Zhenghua Gao  wrote:
> > >
> > > > Thanks for your comments!
> > > > I have opened an umbrella issue [1] to track this.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/CALCITE-3611
> > > >
> > > > *Best Regards,*
> > > > *Zhenghua Gao*
> > > >
> > > >
> > > > On Fri, Dec 20, 2019 at

Re: Quicksql

2019-12-22 Thread Juan Pan
Thanks, Gelbana,


I really appreciate your explanation, which sheds some light for me on
exploring Calcite. :)


Best wishes,
Trista


 Juan Pan (Trista) 
 
Senior DBA & PPMC of Apache ShardingSphere(Incubating)
E-mail: panj...@apache.org




On 12/22/2019 05:58,Muhammad Gelbana wrote:
I am curious how to join the tables from different datasources.
Based on Calcite's conventions concept, the Join operator and its input
operators should all have the same convention. If an input has a
convention different from the Join operator's, a converter rule must be
registered for that input's convention. This rule should produce an
operator that only converts from that convention to the Join operator's
convention.

This way the Join operator will be able to handle the data obtained from
its input operators, because it understands their data structure.
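To make the convention rule above concrete, here is a toy Java model. It is not Calcite's API: Calcite's real types are RelNode, Convention and ConverterRule, while Node, the Convention enum, ConventionDemo and the "FROM->TO" rule keys below are hypothetical simplifications. A join input with a different convention gets a converter inserted above it if a converter rule for that pair is registered; otherwise planning fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Toy model of conventions and converters; not Calcite's API. */
class ConventionDemo {
  enum Convention { ENUMERABLE, JDBC, SPARK }

  static final class Node {
    final String name;
    final Convention convention;
    final List<Node> inputs;

    Node(String name, Convention convention, List<Node> inputs) {
      this.name = name;
      this.convention = convention;
      this.inputs = inputs;
    }

    @Override
    public String toString() {
      String self = name + ":" + convention;
      return inputs.isEmpty() ? self : self + inputs;
    }
  }

  static Node leaf(String name, Convention convention) {
    return new Node(name, convention, Collections.<Node>emptyList());
  }

  /** Gives every input of the join the join's convention, using registered converters. */
  static Node alignInputs(Node join, Set<String> converterRules) {
    List<Node> aligned = new ArrayList<>();
    for (Node input : join.inputs) {
      if (input.convention == join.convention) {
        aligned.add(input);
      } else if (converterRules.contains(input.convention + "->" + join.convention)) {
        // A converter does nothing but change the convention of its single input.
        aligned.add(new Node("Converter", join.convention, Collections.singletonList(input)));
      } else {
        throw new IllegalStateException(
            "no converter rule from " + input.convention + " to " + join.convention);
      }
    }
    return new Node(join.name, join.convention, aligned);
  }

  public static void main(String[] args) {
    Node join = new Node("Join", Convention.ENUMERABLE, Arrays.asList(
        leaf("ScanA", Convention.ENUMERABLE), leaf("ScanB", Convention.JDBC)));
    Set<String> rules = Collections.singleton("JDBC->ENUMERABLE");
    // Join:ENUMERABLE[ScanA:ENUMERABLE, Converter:ENUMERABLE[ScanB:JDBC]]
    System.out.println(alignInputs(join, rules));
  }
}
```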

Thanks,
Gelbana


On Wed, Dec 18, 2019 at 5:08 AM Juan Pan  wrote:

Some updates.


Recently I took a look at their docs and source code, and found that this
project uses Calcite's SQL parsing and relational algebra to get a query
plan, and then translates it to Spark SQL for joining different data
sources, or to the corresponding query for a single data source.


Although it copies many classes from Calcite, the idea of QuickSQL seems
interesting, and the code is succinct.


Best,
Trista






On 12/13/2019 17:16,Juan Pan wrote:
Yes, indeed.






On 12/12/2019 18:00,Alessandro Solimando
wrote:
Adapters must be needed for data sources that don't support SQL; I think
this is what Juan Pan was asking about.

On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan  wrote:

Nope, it doesn't use any adapters. It just submits partial SQL queries to
different engines.

If the query contains tables from a single source, e.g.
select count(*) from hive_table1, hive_table2 where a=b;
then the whole query will be submitted to Hive.

Otherwise, e.g.
select distinct a,b from hive_table union select distinct a,b from mysql_table;

the following query will be submitted to and executed by Spark:
select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;

spark_tmp_table1: select distinct a,b from hive_table
spark_tmp_table2: select distinct a,b from mysql_table
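The routing described above can be sketched as follows. This is a hypothetical illustration, not QuickSQL's code: QueryRouter, chooseEngine and tempTables are made-up names, and the table-to-source map stands in for whatever registration step QuickSQL actually uses.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of single-source vs. multi-source routing; not QuickSQL's code. */
class QueryRouter {
  /** The single source that owns every table, or "spark" when sources are mixed. */
  static String chooseEngine(Map<String, String> tableToSource, Collection<String> tables) {
    Set<String> sources = new TreeSet<>();
    for (String table : tables) {
      String source = tableToSource.get(table);
      if (source == null) {
        // Tables must be registered before they can be queried.
        throw new IllegalArgumentException("unregistered table: " + table);
      }
      sources.add(source);
    }
    return sources.size() == 1 ? sources.iterator().next() : "spark";
  }

  /** Names each pushed-down sub-query as a Spark temp table, in order. */
  static Map<String, String> tempTables(List<String> subQueries) {
    Map<String, String> tables = new LinkedHashMap<>();
    for (int i = 0; i < subQueries.size(); i++) {
      tables.put("spark_tmp_table" + (i + 1), subQueries.get(i));
    }
    return tables;
  }

  public static void main(String[] args) {
    Map<String, String> tableToSource = new HashMap<>();
    tableToSource.put("hive_table1", "hive");
    tableToSource.put("hive_table2", "hive");
    tableToSource.put("hive_table", "hive");
    tableToSource.put("mysql_table", "mysql");

    System.out.println(chooseEngine(tableToSource, Arrays.asList("hive_table1", "hive_table2"))); // hive
    System.out.println(chooseEngine(tableToSource, Arrays.asList("hive_table", "mysql_table")));   // spark
    System.out.println(tempTables(Arrays.asList(
        "select distinct a,b from hive_table",
        "select distinct a,b from mysql_table")).keySet()); // [spark_tmp_table1, spark_tmp_table2]
  }
}
```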

On 2019/12/11 04:27:07, "Juan Pan"  wrote:
Hi Haisheng,


The query on different data source will then be registered as temp
spark tables (with filter or join pushed in), the whole query is rewritten
as SQL text over these temp tables and submitted to Spark.


Does it mean QuickSQL also needs adapters to execute queries on
different data sources?


Yes, virtualization is one of Calcite’s goals. In fact, when I created
Calcite I was thinking about virtualization + in-memory materialized views.
Not only the Spark convention but any of the “engine” conventions (Drill,
Flink, Beam, Enumerable) could be used to create a virtual query engine.


Basically, I like and agree with Julian’s statement. It is a great idea,
and personally I hope Calcite moves towards it.


Best wishes to the Calcite community.


Thanks,
Trista


Juan Pan


panj...@apache.org
Juan Pan(Trista), Apache ShardingSphere


On 12/11/2019 10:53,Haisheng Yuan wrote:
As far as I know, users still need to register tables from other data
sources before querying them. QuickSQL uses Calcite for parsing queries and
optimizing logical expressions with several transformation rules. The query
on different data source will then be registered as temp spark tables (with
filter or join pushed in), the whole query is rewritten as SQL text over
these temp tables and submitted to Spark.

- Haisheng

--
From: Rui Wang
Date: 2019-12-11 06:24:45
To:
Subject: Re: Quicksql

The co-routine model sounds like a good fit for streaming cases.

I was thinking about how the Enumerable interface should work with streaming
cases, but now I should also look at the Interpreter.


-Rui

On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde  wrote:

The goal (or rather my goal) for the interpreter is to replace
Enumerable as the quick, easy default convention.

Enumerable is efficient but not that efficient (compared to engines
that work on off-heap data representing batches of records). And
because it generates Java bytecode, there is a certain latency in
getting a query prepared and ready to run.

It basically implements the old Volcano query evaluation model. It is
single-threaded (because all work happens as a result of a call to
'next()' on the root node) and cannot handle branching data-flow
graphs (DAGs).
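The Volcano model described above can be illustrated with a minimal pull-based pipeline. This is a sketch, not Calcite's Enumerable/Enumerator API; Operator, ScanOperator, FilterOperator and VolcanoDemo are hypothetical names. The consumer drives everything by calling next() on the root, so execution stays on one thread, and each operator has exactly one consumer, which is why a plan whose output feeds two parents (a DAG) does not fit this shape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Pull-based (Volcano-style) operator; illustrative, not Calcite's Enumerable API. */
interface Operator<T> {
  /** Returns the next row, or null when the input is exhausted (rows are never null). */
  T next();
}

final class ScanOperator<T> implements Operator<T> {
  private final Iterator<T> rows;

  ScanOperator(Iterable<T> rows) {
    this.rows = rows.iterator();
  }

  @Override
  public T next() {
    return rows.hasNext() ? rows.next() : null;
  }
}

final class FilterOperator<T> implements Operator<T> {
  private final Operator<T> input; // exactly one consumer per operator: a tree, not a DAG
  private final Predicate<T> predicate;

  FilterOperator(Operator<T> input, Predicate<T> predicate) {
    this.input = input;
    this.predicate = predicate;
  }

  @Override
  public T next() {
    // Pulls from the input until a row qualifies; runs on the caller's thread.
    for (T row = input.next(); row != null; row = input.next()) {
      if (predicate.test(row)) {
        return row;
      }
    }
    return null;
  }
}

class VolcanoDemo {
  /** All work happens inside calls to root.next(), on this single thread. */
  static <T> List<T> drain(Operator<T> root) {
    List<T> result = new ArrayList<>();
    for (T row = root.next(); row != null; row = root.next()) {
      result.add(row);
    }
    return result;
  }

  public static void main(String[] args) {
    Operator<Integer> scan = new ScanOperator<>(Arrays.asList(1, 2, 3, 4, 5, 6));
    Operator<Integer> plan = new FilterOperator<>(scan, x -> x % 2 == 0);
    System.out.println(drain(plan)); // [2, 4, 6]
  }
}
```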

The Interpreter uses a co-routine model (reading from queues,
writing to queues, and yielding when there is no work to be done) and
therefore could be more efficient than Enumerable in a single-node
multi-core system. Also, there is little start-up time, which is
important for small q

[jira] [Created] (CALCITE-3623) Replace Spotless with Autostyle

2019-12-22 Thread Vladimir Sitnikov (Jira)
Vladimir Sitnikov created CALCITE-3623:
--

 Summary: Replace Spotless with Autostyle
 Key: CALCITE-3623
 URL: https://issues.apache.org/jira/browse/CALCITE-3623
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.21.0
Reporter: Vladimir Sitnikov


Spotless has certain drawbacks:
1) It is not able to verify license headers for non-Java files. For instance, 
it skips `package-info.java`, it skips `*.kts` and so on :(
2) Its error messages are too verbose. Sometimes it prints the full stacktrace 
when just one line would be enough: "file X line Y column Z has error: ..."
3) It uses unsafe Gradle APIs, so it will be incompatible with Gradle 7.0

I suggest replacing it with https://github.com/autostyle/autostyle


Note: I tried to contact the Spotless authors, but their way of managing code
makes it very hard to cooperate :((



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3622) Update geode tests upgrade from junit4 to junit5

2019-12-22 Thread Forward Xu (Jira)
Forward Xu created CALCITE-3622:
---

 Summary: Update geode tests upgrade from junit4 to junit5
 Key: CALCITE-3622
 URL: https://issues.apache.org/jira/browse/CALCITE-3622
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu


Upgrade the `geode` tests from JUnit 4 to JUnit 5.





Re: Monthly online Calcite meetups

2019-12-22 Thread Muhammad Gelbana
Here are my available times in case we don't use Doodle. Do you think
it's useful to record and save these meetings?

Mohamed
==
Option 1: Sun-Thu 15:00-19:00
Option 2: Fri 12:00-19:00
Option 3: Sat 10:00 AM - 12:00

On Sun, Dec 22, 2019 at 12:12 AM Muhammad Gelbana 
wrote:

> I love the idea. I added my available times to Doodle. I'll do my best
> to attend the meeting even if it falls outside the ranges I specified.
>
>
> On Sat, Dec 21, 2019 at 9:30 PM Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
>
>> Stamatis>To begin with we could try to hold a single meetup per month and
>> see later
>> Stamatis>on how it goes
>>
>> It might be nice to try; however, it did not last long the last time :(
>>
>> Stamatis>The ranges should be rather large so that it is easier to find
>> Stamatis>some overlapping among us
>>
>> An alternative option is to mark checkboxes here:
>> https://doodle.com/poll/4xymswz842i8xat8
>> Note: even though it says "22..28 Dec", I suggest treating it as "sunday ..
>> monday"
>>
>> Vladimir
>>
>