Re: [ANNOUNCE] Haisheng Yuan joins Calcite PMC

2019-11-11 Thread Chunhui Shi
Congratulations!

On Mon, Nov 11, 2019 at 10:09 AM Jinfeng Ni  wrote:

> Congratulations!
>
>
> On Tue, Nov 12, 2019 at 1:23 AM Rui Wang  wrote:
> >
> > Congrats HaiSheng!
> >
> >
> > -Rui
> >
> > On Mon, Nov 11, 2019 at 8:05 AM Stamatis Zampetakis 
> > wrote:
> >
> > > Congrats Haisheng!
> > >
> > > Reviews, code contributions, design discussions, helping users, and many
> > > more things for improving the project.
> > >
> > > Personally, I also learn a lot from our interactions.
> > >
> > > All these are much appreciated; keep it up!!
> > >
> > > Best,
> > > Stamatis
> > >
> > > On Mon, Nov 11, 2019, 4:17 PM Michael Mior  wrote:
> > >
> > > > Welcome and congratulations HaiSheng!
> > > > --
> > > > Michael Mior
> > > > mm...@apache.org
> > > >
> > > > On Sun, Nov 10, 2019 at 22:45, Francis Chuang wrote:
> > > > >
> > > > > I'm pleased to announce that Haisheng has accepted an invitation to
> > > > > join the Calcite PMC. Haisheng has been a consistent and helpful
> > > > > figure in the Calcite community for which we are very grateful. We
> > > > > look forward to the continued contributions and support.
> > > > >
> > > > > Please join me in congratulating Haisheng!
> > > > >
> > > > > - Francis (on behalf of the Calcite PMC)
> > > >
> > >
>


Re: Another Calcite-related paper accepted for SIGMOD -- "One SQL to Rule Them All"

2019-02-12 Thread Chunhui Shi
Congratulations! I cannot wait to read this paper!

On Tue, Feb 12, 2019 at 5:34 AM Michael Mior  wrote:

> Excellent! Really glad to see all the work that has been happening on
> streaming SQL in the Apache community get recognized.
> --
> Michael Mior
> mm...@apache.org
>
> On Tue, Feb 12, 2019 at 08:11, Edmon Begoli wrote:
> >
> > Dear Calcite community,
> >
> > I want to let you know that another significant paper featuring Calcite
> > (alongside Apache Flink and Beam) has been accepted for SIGMOD 2019.
> >
> > The full title of the paper is:
> > One SQL to Rule Them All – an Efficient and Syntactically Idiomatic
> > Approach to Management of Streams and Tables. Edmon Begoli, Tyler Akidau,
> > Fabian Hueske, Julian Hyde, Kathryn Knight, and Kenneth Knowles. To appear
> > in Proceedings of ACM SIGMOD conference (SIGMOD ’19). ACM, New York, NY,
> > USA
> >
> > I want to thank Julian Hyde for his contributions, and for introducing us
> > to the co-authors, with special thanks to Fabian Hueske from Flink, and
> > Tyler Akidau and Kenn Knowles from Beam for their outstanding contributions.
> >
> > Thank you,
> > Edmon
>


Re: Publish Drill Calcite project artifacts to Apache maven repository

2018-09-12 Thread Chunhui Shi
For CALCITE-1178 and other changes that loosen type checking, if the concern is
that they are not compliant with the SQL standard, should this be a SQL flavor
defined as one of the compliance levels? Then Calcite users (e.g. Drill) could
choose a customized compliance level to enable implicit type conversion.
--
Sender:Julian Hyde 
Sent at:2018 Sep 12 (Wed) 17:04
To:dev 
Cc:dev 
Subject:Re: Publish Drill Calcite project artifacts to Apache maven repository

Probably down to me. Although, in my defense, it is hard to be gatekeeper for 
big messy changes that are of obvious benefit to the contributor but not such 
obvious benefit to the rest of the project.

Is there a PR for CALCITE-1178?

Julian


> On Sep 12, 2018, at 10:32 AM, Vova Vysotskyi  wrote:
> 
> Thanks for your responses and clarifications!
> 
> Regarding the reasons for using the fork:
> We would love to move to Apache Calcite instead of using the fork!
>
> And we tried very hard to do it, especially during the rebase from 1.4 to
> 1.15 (DRILL-3993).
> Unfortunately, three Jiras remain that have not yet been accepted by the
> Calcite community:
> CALCITE-2087,
> CALCITE-2018 and
> CALCITE-1178.
> 
> Kind regards,
> Volodymyr Vysotskyi
> 
> 
> On Wed, Sep 12, 2018 at 7:39 PM Julian Hyde  wrote:
> 
>> I can confirm what Josh says about OSSRH. You need to fill out a form with
>> Sonatype that convinces them that you own the groupId (basically a domain
>> name). Then they give you authorization to publish artifacts under that
>> groupId. For example, I publish artifacts under the sqlline and
>> net.hydromatic groupIds.
>> 
>>> On Sep 12, 2018, at 9:28 AM, Josh Elser  wrote:
>>> 
>>> Maven central is made up of a number of "Trusted" Maven repositories.
>>> This includes the ASF and OSSRH Maven repositories. Many other
>>> organizations run "mirrors" of central.
>>>
>>> The ASF Maven repo is published to by ASF projects who have gone through
>>> the ASF release process. OSSRH allows any release which meets the criteria
>>> described here[1]. As an individual, you are within your rights to publish
>>> your fork of Calcite to OSSRH as long as there are no legal or trademark
>>> concerns. It would be imperative to not cause confusion with official
>>> Apache Calcite releases -- clear branding and separate Maven
>>> groupId/artifactId "coordinates" should be sufficient.
>>>
>>> However, since you are (presumably) acting as a member of Apache Drill,
>>> it would be very odd (and potentially against ASF policy) to make a release
>>> of software that *isn't* using the ASF Maven resources. This gives me some
>>> pause -- do you have an ASF member on your PMC you can run this by?
>>>
>>> Finally, as a Calcite PMC member, I feel obligated to ask why Drill
>>> needs to maintain this fork, and see if there is something that can be done
>>> from the Calcite side to get you "back on upstream"? Why the need to make
>>> long-term plans to isolate Apache Drill from Apache Calcite?
>>> 
>>> [1] https://central.sonatype.org/pages/ossrh-guide.html
>>> 
>>> On 9/12/18 11:33 AM, Vova Vysotskyi wrote:
 Hi all,
 As you know, Drill uses its own fork of Apache Calcite.
 In DRILL-6711, it was proposed to deploy the Drill Calcite project
 artifacts to the Apache Maven repository, or at least to the central
 Maven repository.
 I have looked for similar cases of forked versions and didn't find
 anything similar in the central repo.
 I have also looked at the Sonatype OSSRH Jiras for similar cases
 of deploying forked versions, but those projects used custom groupIds.
 Could someone please advise on the acceptable way of publishing the
 custom Drill Calcite artifacts to the central repo, and whether it is
 possible to publish them without changing the groupId?
 Kind regards,
 Volodymyr Vysotskyi
>> 
>> 

[jira] [Created] (CALCITE-2356) Allow user defined policy to dynamically define all or some specific rules' execution order or even skip some rules

2018-06-07 Thread Chunhui Shi (JIRA)
Chunhui Shi created CALCITE-2356:


 Summary: Allow user defined policy to dynamically define all or 
some specific rules' execution order or even skip some rules
 Key: CALCITE-2356
 URL: https://issues.apache.org/jira/browse/CALCITE-2356
 Project: Calcite
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Julian Hyde


We have seen that the order of rule execution does affect VolcanoPlanner's
behavior. For example, in CALCITE-2223, if we reverse the name-based order
decided in RuleMatchImportanceComparator, we get a different result for the
CALCITE-2223 case.

In some of our deployments, we have also seen rules on overlapping patterns
trigger unnecessary churn and a much bigger exploration space, which makes
planning time much longer.

So the proposal of this Jira is that, while each rule focuses on its local
pattern, Calcite allows a pluggable coordinator of rule execution to use the
knowledge we have about the rules and the current state. The output of this
coordinator is the sequence of rules to execute on matching patterns. The input
is still the matching rules and patterns discovered by Calcite. When new nodes
are added and new rules need to be triggered, they should go through the
coordinator, which decides whether and how they will be executed.

This proposed feature should not affect any current Calcite users, since they
do not define their own policies.
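
As a rough illustration of the proposal above, here is a minimal Java sketch
of what such a coordinator hook might look like. Every name in it
(PendingMatch, RuleExecutionCoordinator, PassThroughCoordinator,
SkipAndReverseCoordinator) is hypothetical and invented for this example; none
of it is existing Calcite API, and a real design would work on the planner's
own rule-match objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a rule match discovered by the planner. */
final class PendingMatch {
  final String ruleName;
  final int nodeId;

  PendingMatch(String ruleName, int nodeId) {
    this.ruleName = ruleName;
    this.nodeId = nodeId;
  }
}

/** Hypothetical policy hook: orders the discovered matches, or drops some. */
interface RuleExecutionCoordinator {
  List<PendingMatch> schedule(List<PendingMatch> discovered);
}

/** Default policy: keep every match, in discovery order. */
final class PassThroughCoordinator implements RuleExecutionCoordinator {
  @Override public List<PendingMatch> schedule(List<PendingMatch> discovered) {
    return new ArrayList<>(discovered);
  }
}

/** Example user policy: skip one rule, run the rest in reverse name order. */
final class SkipAndReverseCoordinator implements RuleExecutionCoordinator {
  private final String skippedRule;

  SkipAndReverseCoordinator(String skippedRule) {
    this.skippedRule = skippedRule;
  }

  @Override public List<PendingMatch> schedule(List<PendingMatch> discovered) {
    List<PendingMatch> result = new ArrayList<>();
    for (PendingMatch m : discovered) {
      if (!m.ruleName.equals(skippedRule)) {
        result.add(m);
      }
    }
    // Reverse alphabetical order by rule name.
    result.sort(Comparator.comparing((PendingMatch m) -> m.ruleName).reversed());
    return result;
  }
}
```

Users who install no policy would get the pass-through behavior, which is one
way to keep the feature invisible to existing users.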



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CALCITE-2291) Add rule to push Project past Correlate

2018-04-30 Thread Chunhui Shi (JIRA)
Chunhui Shi created CALCITE-2291:


 Summary: Add rule to push Project past Correlate
 Key: CALCITE-2291
 URL: https://issues.apache.org/jira/browse/CALCITE-2291
 Project: Calcite
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Julian Hyde


Correlate is not derived from Join, so we need a rule similar to
ProjectJoinTransposeRule to push Project below Correlate.





Re: SQL:2016

2018-04-19 Thread Chunhui Shi
Glad to see that we are discussing JSON features in the latest SQL standard and
that Michael has already worked on one JSON function.


I think the Drill community is interested in the polymorphic table function
feature, and I just filed CALCITE-2270 for that.


From: Julian Hyde 
Sent: Thursday, April 19, 2018 1:36:20 PM
To: dev@calcite.apache.org
Subject: Re: SQL:2016

Yes. Apache has representation in the W3C and Open Geospatial Consortium. Much 
database innovation these days is coming out of open source, and these open 
source projects tend to join Apache, so it makes a lot of sense that we have a 
voice.

> On Apr 19, 2018, at 1:30 PM, Edmon Begoli  wrote:
>
> I will certainly ask.
>
> I noticed that most members are big vendors, but that might have to do with
> a $2200 membership fee.
>
> It would probably be good to have ASF represented more broadly by the
> projects that use SQL as an API.
>
> On Thursday, April 19, 2018, Julian Hyde  wrote:
>
>> Do you think you could help bring a little more openness to the process?
>> I’d love to know what areas are being considered, and what is the target
>> date for the next standard.
>>
>> Even if the information flow is only one way, it would help counter some
>> perceptions that the process is dominated by the large vendors.
>>
>> Julian
>>
>>
>>> On Apr 19, 2018, at 1:21 PM, Edmon Begoli  wrote:
>>>
>>> I’ve actually joined the standard to be, in addition to representing my
>>> lab, an advocate for Calcite and ASF, so I could represent these needs, and
>>> bring anything else up.
>>>
>>> Just let me know.
>>>
>>> Thank you,
>>> Edmon
>>>
>>> On Thursday, April 19, 2018, Julian Hyde  wrote:
>>>
 I’d love to know whether/when you guys intend to standardize streaming SQL.

 I have come to the conclusion that extensions to SQL’s existing temporal
 support (i.e. being able to join each row to a different temporal snapshot
 of a table) would be extremely useful to support streaming.

 Also, extensions to handle weakly typed relations (think of javascript and
 ruby’s type systems as opposed to java’s) would be welcome.

 And support for approximations, e.g.

 select count(distinct x) approximate (within 1 percent) from t;

 Julian



> On Apr 19, 2018, at 11:40 AM, Edmon Begoli  wrote:
>
> I am on the SQL standards committee and I will ask.
>
> Are there any other things anyone would like to know?
>
> On Thu, Apr 19, 2018 at 5:54 PM Michael Mior wrote:
>
>> Thanks for the review. I have most of these changes sorted out. Is there
>> any good resource for the SQL standard aside from purchasing a copy of the
>> standard itself? If not, do you think that this is something the ASF would
>> be willing to do? Assuming it could be shared between projects, I think
>> there are many who would benefit from this.
>>
>> --
>> Michael Mior
>> mm...@uwaterloo.ca
>>
>> 2018-04-18 21:31 GMT-04:00 Julian Hyde :
>>
>>> A couple of minor things. Your isJson function should return boolean not
>>> Boolean, because the ISJSON function is strict - i.e. returns unknown if
>>> and only if its input is null. If the input is null the code generator will
>>> not call it.
>>>
>>> I think SqlIsJsonFunction is probably not necessary. I think everything
>>> about the function can be deduced by reflection. (That’s how the Geo
>>> functions work, also.)
>>>
>>> I’d add tests for JSON functions to SqlOperatorBaseTest rather than
>>> creating CalciteJsonOperatorTest and JsonOperatorBaseTest. JSON functions
>>> are not that different from the built-in function set. (The Geo functions
>>> are not in the SQL standard; that’s why I separated them a bit.)
>>>
>>> Julian
>>>
>>>
 On Apr 18, 2018, at 5:59 PM, Michael Mior wrote:

 Thanks Julian! I opened CALCITE-2266 to track implementing some of the new
 JSON functions. I took a stab at implementing ISJSON in the following
 commit:

 https://github.com/michaelmior/calcite/commit/d6930fcd04ed83d37f56a7795ee7941b521fb99c

 These are touching parts of the code base I'm unfamiliar with so I mostly
 don't know what I'm doing :) I added a new operator table which I'm
 guessing we probably don't want to do but it made it easier for me when
 testing to isolate the new code.

 --
 Michael Mior
 mm...@uwaterloo.ca

 2018-04-18 17:00 GMT-04:00 Julian Hyde :

> Somehow I miss

[jira] [Created] (CALCITE-2270) [SQL:2016] Polymorphic table functions

2018-04-19 Thread Chunhui Shi (JIRA)
Chunhui Shi created CALCITE-2270:


 Summary: [SQL:2016] Polymorphic table functions
 Key: CALCITE-2270
 URL: https://issues.apache.org/jira/browse/CALCITE-2270
 Project: Calcite
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Julian Hyde


Polymorphic table functions: table functions without predefined return type





Re: Can we safely assuming that the rules match nodes which are closer to root would execute earlier than those rules matches nodes which are closer to tableScan?

2018-03-05 Thread Chunhui Shi
I think the answer is no.

At least that is my impression of the Volcano planner in the latest Calcite.
Matched rules are categorized by the class of the root node they match, and
their execution, that is, the onMatch() call, happens in the order in which
those categories are iterated. So the order is related neither to the level
of the RelNode in the tree nor to the order of the rules in the rule list.

On Mar 5, 2018 7:02 PM, "张静"  wrote:

Hi, guys.
Can we safely assume that rules matching nodes closer to the root are
executed earlier than rules matching nodes closer to the TableScan?
For the following node tree, can we assume that rules matching Aggregate are
always executed earlier than rules matching Filter?
Aggregate
|_Project
 |_Filter
   |_TableScan


Re: Infinite loop with JoinPushTransitivePredicatesRule

2018-03-02 Thread Chunhui Shi
Could you file corresponding JIRAs in both Calcite and Drill with detailed
repro steps? This sounds like a bug to me. We are also seeing planning get
stuck when running Drill on the latest Calcite, though I am not sure whether
it is related. We could exchange details directly.


From: Vitalii Diravka 
Sent: Thursday, March 1, 2018 4:10:31 PM
To: dev@calcite.apache.org
Subject: Infinite loop with JoinPushTransitivePredicatesRule

Hi all!

I hit an infinite loop while using FilterIntoJoinRule +
JoinPushTransitivePredicatesRule in the HEP planner for some correlated
queries, for instance:
*select d.deptno from sales.emp d where d.deptno IN (select e.deptno from
sales.emp e where e.deptno = d.deptno or e.deptno = 4)*

I have a reproduction in Calcite. We always get a HepRelVertex for one side
of the LogicalJoin or LogicalFilter relNode, so the rule fires every time,
and as a result fires infinitely.

I've noticed that the rule works fine after JoinToCorrelateRule. But I
don't think that is a real solution for using
JoinPushTransitivePredicatesRule, even from the performance perspective.

How can I resolve it, or have I missed something?

Kind regards
Vitalii


Re: Google Summer of Code 2018

2018-02-26 Thread Chunhui Shi
What about "SQL support for JavaScript Object Notation (JSON)"? 
www.iso.org/standard/67367.html?

There are two categories of JSON functions (8 functions in total), for
querying and for constructing JSON, and some clauses like PLAN, etc. are
introduced in this standard.

I did not see mainstream databases (Oracle and SQL Server) supporting all of
them, but both implement IS JSON, JSON_QUERY and JSON_VALUE.

SQL Server:
https://docs.microsoft.com/en-us/sql/relational-databases/json/validate-query-and-change-json-data-with-built-in-functions-sql-server

Oracle:
https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6374

MySQL and Postgres have a lot of functions to handle JSON but, apart from
JSON_OBJECT and one or two other functions, none of them seem to be
standardized.
https://dev.mysql.com/doc/refman/5.7/en/json-function-reference.html

Postgres
https://www.postgresql.org/docs/current/static/functions-json.html


I think the scope could be self-contained, but if it is not suitable for this
activity, do we want to put it on the Calcite roadmap?


Best,

Chunhui


From: Julian Hyde 
Sent: Monday, February 26, 2018 4:00:21 PM
To: dev
Subject: Re: Google Summer of Code 2018

Yes.


> On Feb 26, 2018, at 2:32 PM, Michael Mior  wrote:
>
> Thanks Julian! Would these be some good JIRAs to tag?
>
> https://issues.apache.org/jira/browse/CALCITE-1737
> https://issues.apache.org/jira/browse/CALCITE-1861
> https://issues.apache.org/jira/browse/CALCITE-2031
>
> --
> Michael Mior
> mm...@apache.org
>
> 2018-02-26 14:46 GMT-05:00 Julian Hyde :
>
>> Here are two areas that are self-contained and rewarding:
>> * Spatial functions
>> * Spark adapter
>>
>> Julian
>>
>>
>>
>>> On Feb 24, 2018, at 4:24 PM, Michael Mior  wrote:
>>>
>>> You have probably seen that Apache was accepted as an organization for this
>>> year's GSoC. I thought I would see if anyone in the Calcite community can
>>> think of any issues that would be a good fit. It's no guarantee we would
>>> get someone to work on it, but it could be a good push to move some
>>> isolated bits of functionality forward that may not get much attention
>>> otherwise.
>>>
>>> --
>>> Michael Mior
>>> mm...@apache.org
>>
>>



[jira] [Created] (CALCITE-2159) UNNEST to support 'ANY' type

2018-01-30 Thread Chunhui Shi (JIRA)
Chunhui Shi created CALCITE-2159:


 Summary: UNNEST to support 'ANY' type
 Key: CALCITE-2159
 URL: https://issues.apache.org/jira/browse/CALCITE-2159
 Project: Calcite
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Julian Hyde


Not all data sources have type information about the input of UNNEST during
the parsing stage. In Drill, if we want to support UNNEST(table.column) syntax
for a document with nested structure, for now, these two things will happen:

SqlUnnestOperator.inferReturnType will use the unknown operand's type 'ANY', so
isStruct will be false, and thus the following code will hit a null reference.

 

Another issue: should UnnestNamespace.getTable return a table, so that when
other parts of the query refer to columns coming out of UNNEST, we know the
query is asking for a column from that table and the parser can add the
column to the RowType of UNNEST? An example query looks like this:

SELECT AVG(o.o_amount) AS avg_orders FROM  UNNEST(c.orders) AS o

 

 





[jira] [Created] (CALCITE-1748) Make CalciteCatalogReader.getSchema extendable to support dynamically load schema tree - getSchema need to be set to protected to allow overriding

2017-04-11 Thread Chunhui Shi (JIRA)
Chunhui Shi created CALCITE-1748:


 Summary: Make CalciteCatalogReader.getSchema extendable to support 
dynamically load schema tree - getSchema need to be set to protected to allow 
overriding
 Key: CALCITE-1748
 URL: https://issues.apache.org/jira/browse/CALCITE-1748
 Project: Calcite
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Julian Hyde


In a system like Drill, there is a need to load a partial schema (e.g. for only
one storage plugin) only when needed. Drill has no way to get the full
available schema tree beforehand, nor can it cache the available schema for a
storage plugin (e.g. Hive, MongoDB), since the storage plugin may not have a
notification mechanism to update the schema tree in a timely manner.

The proposed fix is to load the schema dynamically, as shown in
https://issues.apache.org/jira/browse/DRILL-5089

To achieve this, we need to make CalciteCatalogReader.getSchema protected
so that it can be overridden by a derived class, while the derived class can
still reuse the other functionality in the CalciteCatalogReader class. The
current declaration is:
private CalciteSchema getSchema(Iterable<String> schemaNames,
  SqlNameMatcher nameMatcher)
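
To make the request above concrete, here is a small self-contained Java sketch
of the pattern being asked for. It does not use Calcite classes:
CatalogReaderSketch and OnDemandCatalogReaderSketch are hypothetical stand-ins,
and a plain String plays the role of CalciteSchema. The only point it
illustrates is that a protected lookup method lets a subclass replace the
lookup while inheriting everything else.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, heavily simplified stand-in for CalciteCatalogReader. */
class CatalogReaderSketch {
  /** Schemas known up front, keyed by dotted path. */
  protected final Map<String, String> preloadedSchemas = new HashMap<>();

  /** Protected rather than private, so subclasses can change the lookup. */
  protected String getSchema(List<String> schemaNames) {
    return preloadedSchemas.get(String.join(".", schemaNames));
  }

  /** Other functionality that a subclass reuses unchanged. */
  public boolean schemaExists(List<String> schemaNames) {
    return getSchema(schemaNames) != null;
  }
}

/** Resolves a schema at request time instead of relying on a preloaded tree. */
class OnDemandCatalogReaderSketch extends CatalogReaderSketch {
  int loadCount = 0;

  @Override protected String getSchema(List<String> schemaNames) {
    // Stands in for asking a storage plugin for this one schema path.
    loadCount++;
    return "schema:" + String.join(".", schemaNames);
  }
}
```

With the base class alone, schemaExists reports false for any path that was
not preloaded; with the subclass, the same inherited method triggers a load
for exactly the path that was requested.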



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CALCITE-1550) ConventionTraitDef.plannerConversionMap is not thread safe

2016-12-21 Thread Chunhui Shi (JIRA)
Chunhui Shi created CALCITE-1550:


 Summary: ConventionTraitDef.plannerConversionMap is not thread safe
 Key: CALCITE-1550
 URL: https://issues.apache.org/jira/browse/CALCITE-1550
 Project: Calcite
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Julian Hyde
Priority: Critical


We use the static instance ConventionTraitDef.INSTANCE globally, and
plannerConversionMap (a WeakHashMap) defined in the ConventionTraitDef class is
not thread-safe. The data in the map can become corrupted and cause an
infinite loop or other data errors.

  private final WeakHashMap<RelOptPlanner, ConversionData>
  plannerConversionMap =
  new WeakHashMap<RelOptPlanner, ConversionData>();
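
One possible mitigation, shown here only as a sketch and not as the fix that
Calcite adopted, is to route every access to the map through
Collections.synchronizedMap. The class ConversionRegistrySketch and its
methods are hypothetical; Object and String stand in for the planner and
conversion-data types.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical holder that mirrors the shape of the field shown above. */
final class ConversionRegistrySketch {
  // All reads and writes go through the synchronized wrapper.
  private final Map<Object, String> plannerConversionMap =
      Collections.synchronizedMap(new WeakHashMap<Object, String>());

  void register(Object planner, String data) {
    plannerConversionMap.put(planner, data);
  }

  String lookup(Object planner) {
    return plannerConversionMap.get(planner);
  }

  int size() {
    return plannerConversionMap.size();
  }

  /** Registers the keys from several threads and returns the entry count. */
  static int registerConcurrently(Object[] planners, int threads) {
    ConversionRegistrySketch registry = new ConversionRegistrySketch();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final int offset = t;
      workers[t] = new Thread(() -> {
        // Each worker handles a disjoint slice of the keys.
        for (int i = offset; i < planners.length; i += threads) {
          registry.register(planners[i], "data-" + i);
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return registry.size();
  }
}
```

The wrapper serializes individual calls only; compound operations such as
check-then-put, or iteration over the map, would still need an explicit
synchronized block on the wrapper object.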




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)