Re: [DISCUSS] Restart the work of introducing TIME/TIMESTAMP WITH TIME ZONE types to Calcite
Hi Rui,

Thanks for your reply. I also prefer [long + TimeZone] as the internal representation of TIMESTAMP WITH TIME ZONE, and [int + TimeZone] as the representation of TIME WITH TIME ZONE, for the following reasons:
1) The current Calcite codebase uses java.util.TimeZone to describe time zone information (as you mentioned above)
2) [long/int + TimeZone] is simple and clear for the semantics of TIME/TIMESTAMP WITH TIME ZONE

*Best Regards,*
*Zhenghua Gao*

On Sat, Dec 21, 2019 at 3:30 AM Rui Wang wrote:
> In general, I am inclined toward a solution that saves timestamp with time zone as
> [long + a form of time zone]. The long is the epoch millis in UTC and the
> time zone is the original zone of the value. Using UTC epoch millis
> simplifies comparison and makes instant-semantics-based functions reusable.
> Zone information will be useful for zone-dependent functions, e.g.
> Extract(), which extracts which "DAY", or format(), which prints a string that
> should have the right time zone included.
>
> The above is more conceptual, as many existing Java classes save it
> that way, and implementing a class in Calcite can achieve the same goal.
> I think people who have more experience in Calcite runtime code could
> better answer which solution is better for function implementation,
> enumerable implementation, etc.
>
> Regarding Zhenghua's investigation of different types of time zones, in the
> codebase I found [1]. Does it mean that in the past the Calcite community
> decided to adopt the time zone format defined in [2]?
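[Editor's note: the [long + TimeZone] representation discussed above can be sketched as a small value class. The class and method names below are hypothetical illustrations, not part of Calcite's API.]

```java
import java.util.TimeZone;

/**
 * Sketch of the [long + TimeZone] representation: UTC epoch millis for
 * cheap comparison, plus the original zone for zone-dependent functions.
 * All names here are invented for illustration.
 */
public final class TimestampWithTimeZoneValue
    implements Comparable<TimestampWithTimeZoneValue> {
  /** Epoch milliseconds in UTC; gives zone-independent comparison. */
  private final long utcMillis;
  /** The original zone of the value, kept for EXTRACT/FORMAT-style use. */
  private final TimeZone timeZone;

  public TimestampWithTimeZoneValue(long utcMillis, TimeZone timeZone) {
    this.utcMillis = utcMillis;
    this.timeZone = timeZone;
  }

  /** Instant-based comparison ignores the zone, as Rui suggests. */
  @Override public int compareTo(TimestampWithTimeZoneValue o) {
    return Long.compare(utcMillis, o.utcMillis);
  }

  /** Zone-dependent functions would read the zone, e.g. to shift to local time. */
  public long localMillis() {
    return utcMillis + timeZone.getOffset(utcMillis);
  }

  public static void main(String[] args) {
    TimestampWithTimeZoneValue a =
        new TimestampWithTimeZoneValue(0L, TimeZone.getTimeZone("Asia/Shanghai"));
    TimestampWithTimeZoneValue b =
        new TimestampWithTimeZoneValue(0L, TimeZone.getTimeZone("UTC"));
    // The same instant compares equal regardless of zone:
    System.out.println(a.compareTo(b)); // 0
    // But the local rendering differs: Asia/Shanghai is UTC+8.
    System.out.println(a.localMillis() - b.localMillis()); // 28800000
  }
}
```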
>
> [1]: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/util/TimestampWithTimeZoneString.java#L38
> [2]: https://docs.oracle.com/javase/7/docs/api/java/util/TimeZone.html
>
> -Rui
>
> On Fri, Dec 20, 2019 at 2:42 AM Zhenghua Gao wrote:
> > Hi,
> >
> > I did some research on the storage type for TIME/TIMESTAMP WITH TIME ZONE
> > types, and want to share some ideas for discussion:
> >
> > - There are three kinds of time zone information for both
> >   java.util.TimeZone and java.time.ZoneId:
> >   - Region-based time zones, such as Asia/Shanghai, America/Los_Angeles, etc.
> >   - Offset-based time zones, such as Etc/GMT, Etc/GMT+1, GMT+08:30/+08:30, etc.
> >   - Some abbreviations that have been deprecated in java.util.TimeZone,
> >     such as EST, HST, ACT.
> >     The abbreviations always map to the first two kinds, so we
> >     can ignore them in the following discussion.
> > - java.time.OffsetTime/java.time.OffsetDateTime can't be the storage types
> >   because they only cover the second kind of time zone information.
> > - So we should introduce our own internal classes to represent these
> >   types, and there are three choices:
> >   - Use (int + TimeZone) to represent TIME WITH TIME ZONE, and
> >     (long + TimeZone) to represent TIMESTAMP WITH TIME ZONE
> >   - Use (int + ZoneId) to represent TIME WITH TIME ZONE, and
> >     (long + ZoneId) to represent TIMESTAMP WITH TIME ZONE
> >   - Presto style: Presto provides a *zone-index.properties* file which
> >     contains a fixed numeric key (a short value) for every supported time
> >     zone id (a string), and uses a single long value to store both the
> >     millisecond value and the time zone key
> >
> > Which one do you think should be our storage solution?
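[Editor's note: the Presto-style option packs the UTC millis and a short time-zone key into a single long. A simplified sketch follows; the 12-bit key width and method names are illustrative assumptions, not Presto's actual layout (see Presto's DateTimeEncoding for the real one).]

```java
/**
 * Sketch of Presto-style packing: one long holds both the UTC millis
 * and a short time-zone key. Bit widths here are illustrative only.
 */
public final class PackedTimestampWithTimeZone {
  private static final int ZONE_KEY_BITS = 12;
  private static final long ZONE_KEY_MASK = (1L << ZONE_KEY_BITS) - 1;

  private PackedTimestampWithTimeZone() {}

  public static long pack(long utcMillis, short zoneKey) {
    // Shift the millis left and store the zone key in the low bits.
    return (utcMillis << ZONE_KEY_BITS) | (zoneKey & ZONE_KEY_MASK);
  }

  public static long unpackMillis(long packed) {
    // Arithmetic shift restores the (possibly negative) millis value.
    return packed >> ZONE_KEY_BITS;
  }

  public static short unpackZoneKey(long packed) {
    return (short) (packed & ZONE_KEY_MASK);
  }

  public static void main(String[] args) {
    long packed = pack(1_576_800_000_000L, (short) 42);
    System.out.println(unpackMillis(packed));  // 1576800000000
    System.out.println(unpackZoneKey(packed)); // 42
  }
}
```

The trade-off versus [long + TimeZone] is storage density (a single primitive long, no object header) against a smaller millis range and the need for a fixed zone-key table.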
> >
> > *Best Regards,*
> > *Zhenghua Gao*
> >
> > On Fri, Dec 20, 2019 at 2:51 PM Rui Wang wrote:
> > > Thanks Zhenghua for sharing [1], which really explains three different
> > > semantics of TIMESTAMP and clarified some of my long-standing confusion
> > > about TIMESTAMP.
> > >
> > > Julian> We need all 3, regardless of what they are called
> > > Can I confirm that Calcite already has the following two semantics
> > > supported:
> > >
> > > 1. a timestamp type with (number) content and "zoneless" semantics (I
> > > believe it is TIMESTAMP; alternatively it might be named
> > > TIMESTAMP_WITHOUT_TIME_ZONE)
> > > 2. a timestamp type with (number) content and "instant" semantics
> > > (which I believe is TIMESTAMP_WITH_LOCAL_TIME_ZONE)
> > >
> > > What I am interested in is the current support for the first semantic.
> > > If it is not well supported, I would like to work on it to make it
> > > better. (In the past I didn't really find the first semantic in
> > > Calcite; maybe I have missed something.)
> > >
> > > [1]: https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#
> > >
> > > -Rui
> > >
> > > On Thu, Dec 19, 2019 at 6:27 PM Zhenghua Gao wrote:
> > > > Thanks for your comments!
> > > > I have opened an umbrella issue [1] to track this.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/CALCITE-3611
> > > >
> > > > *Best Regards,*
> > > > *Zhenghua Gao*
> > > >
> > > > On Fri, Dec 20, 2019 at
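[Editor's note: the three TIMESTAMP semantics discussed in this thread map naturally onto existing java.time classes. A small illustration follows; the SQL type names in the comments reflect this thread's terminology, not a fixed Calcite mapping.]

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public final class TimestampSemanticsDemo {
  public static void main(String[] args) {
    // 1. "Zoneless" semantics (TIMESTAMP / TIMESTAMP WITHOUT TIME ZONE):
    //    a wall-clock reading with no zone attached.
    LocalDateTime zoneless = LocalDateTime.of(2019, 12, 20, 9, 30);

    // 2. "Instant" semantics (TIMESTAMP WITH LOCAL TIME ZONE):
    //    a point on the global timeline, rendered in the session zone.
    Instant instant = Instant.ofEpochMilli(1_576_834_200_000L);

    // 3. "Zoned" semantics (TIMESTAMP WITH TIME ZONE):
    //    an instant plus the original zone of the value.
    ZonedDateTime zoned = instant.atZone(ZoneId.of("America/Los_Angeles"));

    System.out.println(zoneless);        // 2019-12-20T09:30
    System.out.println(zoned.getZone()); // America/Los_Angeles
  }
}
```

Note that java.time.ZoneId covers both region-based and offset-based zones, which is why OffsetDateTime alone (offset-only) was ruled out earlier in the thread.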
Re: Quicksql
Thanks Gelbana, I very much appreciate your explanation, which sheds some light for me on exploring Calcite. :)

Best wishes,
Trista

Juan Pan (Trista)
Senior DBA & PPMC of Apache ShardingSphere(Incubating)
E-mail: panj...@apache.org

On 12/22/2019 05:58, Muhammad Gelbana wrote:

> I am curious how to join the tables from different datasources.

Based on Calcite's conventions concept, the Join operator and its input operators should all have the same convention. If they don't, the convention that differs from the Join operator's convention will have to register a converter rule. This rule should produce an operator that only converts from that convention to the Join operator's convention. This way the Join operator will be able to handle the data obtained from its input operators, because it understands the data structure.

Thanks,
Gelbana

On Wed, Dec 18, 2019 at 5:08 AM Juan Pan wrote:

Some updates. Recently I took a look at their docs and source code, and found that this project uses the SQL parsing and relational algebra of Calcite to get a query plan, and also translates to Spark SQL for joining different datasources, or to the corresponding query for a single datasource. Although it copies many classes from Calcite, the idea of QuickSQL seems somewhat interesting, and the code is succinct.

Best,
Trista

Juan Pan (Trista)
Senior DBA & PPMC of Apache ShardingSphere(Incubating)
E-mail: panj...@apache.org

On 12/13/2019 17:16, Juan Pan wrote:

Yes, indeed.

Juan Pan (Trista)
Senior DBA & PPMC of Apache ShardingSphere(Incubating)
E-mail: panj...@apache.org

On 12/12/2019 18:00, Alessandro Solimando wrote:

Adapters must be needed by data sources not supporting SQL; I think this is what Juan Pan was asking about.

On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan wrote:

Nope, it doesn't use any adapters. It just submits partial SQL queries to different engines. If the query contains tables from a single source, e.g.

select count(*) from hive_table1, hive_table2 where a=b;

then the whole query will be submitted to Hive. Otherwise, e.g.
select distinct a,b from hive_table union select distinct a,b from mysql_table;

the following query will be submitted to Spark and executed by Spark:

select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;

spark_tmp_table1: select distinct a,b from hive_table
spark_tmp_table2: select distinct a,b from mysql_table

On 2019/12/11 04:27:07, "Juan Pan" wrote:

Hi Haisheng,

> The query on different data sources will then be registered as temp Spark tables (with filters or joins pushed in), the whole query is rewritten as SQL text over these temp tables and submitted to Spark.

Does it mean QuickSQL also needs adapters to make queries execute on different data sources?

> Yes, virtualization is one of Calcite's goals. In fact, when I created Calcite I was thinking about virtualization + in-memory materialized views. Not only the Spark convention but any of the "engine" conventions (Drill, Flink, Beam, Enumerable) could be used to create a virtual query engine.

Basically, I like and agree with Julian's statement. It is a great idea which I personally hope Calcite moves towards. Give my best wishes to the Calcite community.

Thanks,
Trista

Juan Pan
panj...@apache.org
Juan Pan (Trista), Apache ShardingSphere

On 12/11/2019 10:53, Haisheng Yuan wrote:

As far as I know, users still need to register tables from other data sources before querying them. QuickSQL uses Calcite for parsing queries and optimizing logical expressions with several transformation rules. The query on different data sources will then be registered as temp Spark tables (with filters or joins pushed in); the whole query is rewritten as SQL text over these temp tables and submitted to Spark.

- Haisheng

--
From: Rui Wang
Date: 2019-12-11 06:24:45
To:
Subject: Re: Quicksql

The co-routine model sounds like it fits streaming cases well. I was thinking about how the Enumerable interface should work with streaming cases, but now I should also check the Interpreter.
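[Editor's note: Gelbana's convention/converter explanation earlier in this thread can be illustrated with a toy sketch. This is plain Java with invented names, not Calcite's actual RelNode/ConverterRule API: a Join-side consumer only understands one data shape ("convention"), so a converter adapts a foreign input to it.]

```java
import java.util.Arrays;
import java.util.Iterator;

/**
 * Toy illustration of conventions and converters (invented names):
 * the join side only consumes Object[] rows, so a CSV-shaped input
 * must be wrapped in a converter before the join can read it.
 */
public class ConventionDemo {
  /** The join's convention: rows as Object arrays, pulled via Iterator. */
  interface EnumerableInput { Iterator<Object[]> rows(); }

  /** A foreign convention, e.g. rows arriving as CSV lines. */
  interface CsvInput { Iterator<String> lines(); }

  /** Converter: adapts the foreign convention to the join's convention. */
  static EnumerableInput convert(CsvInput csv) {
    return () -> new Iterator<Object[]>() {
      final Iterator<String> it = csv.lines();
      public boolean hasNext() { return it.hasNext(); }
      public Object[] next() { return it.next().split(","); }
    };
  }

  public static void main(String[] args) {
    CsvInput csv = () -> Arrays.asList("1,a", "2,b").iterator();
    EnumerableInput converted = convert(csv);
    // The join-side code now sees Object[] rows regardless of the source.
    converted.rows().forEachRemaining(r -> System.out.println(Arrays.toString(r)));
  }
}
```

In real Calcite the same role is played by converter rules registered with the planner, which produce physical operators bridging two calling conventions.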
-Rui

On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde wrote:

The goal (or rather my goal) for the interpreter is to replace Enumerable as the quick, easy default convention. Enumerable is efficient, but not that efficient (compared to engines that work on off-heap data representing batches of records). And because it generates Java byte code, there is a certain latency to getting a query prepared and ready to run.

It basically implements the old Volcano query evaluation model. It is single-threaded (because all work happens as a result of a call to next() on the root node) and cannot handle branching data-flow graphs (DAGs).

The Interpreter uses a co-routine model (reading from queues, writing to queues, and yielding when there is no work to be done) and therefore could be more efficient than Enumerable in a single-node multi-core system. Also, there is little start-up time, which is important for small q
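[Editor's note: the queue-based model Julian describes can be sketched as a toy scheduler. All names are invented; this is not the real Calcite Interpreter. Each operator drains its input queue, writes to its output queue, and "yields" by returning, so a scheduler can interleave operators and support DAG-shaped plans, unlike Volcano's single next() call chain.]

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy sketch of a queue-based (co-routine style) interpreter. */
public class QueueInterpreterSketch {
  interface Node {
    /** Do a bounded chunk of work; return true if progress was made. */
    boolean run();
  }

  public static void main(String[] args) {
    Queue<Integer> source = new ArrayDeque<>();
    Queue<Integer> filtered = new ArrayDeque<>();
    for (int i = 0; i < 10; i++) source.add(i);

    // A filter node: pull from 'source', push even values to 'filtered'.
    Node filter = () -> {
      if (source.isEmpty()) return false;
      int v = source.poll();
      if (v % 2 == 0) filtered.add(v);
      return true;
    };

    // A sink node: drain 'filtered' and print.
    Node sink = () -> {
      if (filtered.isEmpty()) return false;
      System.out.println(filtered.poll());
      return true;
    };

    // The scheduler interleaves nodes until nobody makes progress.
    // Note the non-short-circuiting '|': both nodes get a turn each round.
    boolean progress = true;
    while (progress) {
      progress = filter.run() | sink.run();
    }
  }
}
```

Because no node blocks inside another node's call stack, multiple nodes (or branches of a DAG) can be scheduled on different cores by replacing the loop with a thread pool.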
[jira] [Created] (CALCITE-3623) Replace Spotless with Autostyle
Vladimir Sitnikov created CALCITE-3623:
--
Summary: Replace Spotless with Autostyle
Key: CALCITE-3623
URL: https://issues.apache.org/jira/browse/CALCITE-3623
Project: Calcite
Issue Type: Improvement
Components: core
Affects Versions: 1.21.0
Reporter: Vladimir Sitnikov

Spotless has certain drawbacks:
1) It is not able to verify license headers in some files. For instance, it skips `package-info.java`, it skips `*.kts`, and so on :(
2) Its error messages are too verbose. Sometimes it prints a full stacktrace when one line would have been enough: "file X line Y column Z has error: ..."
3) It uses unsafe Gradle APIs, so it will be incompatible with Gradle 7.0

I suggest replacing it with https://github.com/autostyle/autostyle

Note: I tried to contact the Spotless authors, and the way they manage the code makes it very hard to cooperate :((

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3622) Update geode tests upgrade from junit4 to junit5
Forward Xu created CALCITE-3622:
---
Summary: Update geode tests upgrade from junit4 to junit5
Key: CALCITE-3622
URL: https://issues.apache.org/jira/browse/CALCITE-3622
Project: Calcite
Issue Type: Improvement
Reporter: Forward Xu

Upgrade the `geode` tests from JUnit 4 to JUnit 5.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Monthly online Calcite meetups
Here are my availability times in case we don't use Doodle. Do you think it would be useful to record and save those meetings?

Mohamed
==
Option 1: Sun-Thu 15:00-19:00
Option 2: Fri 12:00-19:00
Option 3: Sat 10:00-12:00

On Sun, Dec 22, 2019 at 12:12 AM Muhammad Gelbana wrote:
> I love the idea. I added my availability times to the Doodle. I'll try
> my best to attend the meeting even if it's outside the ranges I specified
> anyway.
>
> On Sat, Dec 21, 2019 at 9:30 PM Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
>> Stamatis> To begin with we could try to hold a single meetup per month
>> Stamatis> and see later on how it goes
>>
>> It might be nice to try; however, it did not survive long the last time :(
>>
>> Stamatis> The ranges should be rather large so that it is easier to find
>> Stamatis> some overlap among us
>>
>> An alternative option is to mark checkboxes here:
>> https://doodle.com/poll/4xymswz842i8xat8
>> Note: even though it says "22..28 Dec", I suggest treating it as
>> "Sunday .. Monday"
>>
>> Vladimir