[jira] [Created] (CALCITE-3872) Simplify expressions with unary minus

2020-03-24 Thread Liya Fan (Jira)
Liya Fan created CALCITE-3872:
-

 Summary: Simplify expressions with unary minus
 Key: CALCITE-3872
 URL: https://issues.apache.org/jira/browse/CALCITE-3872
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Liya Fan


Support simplifying the expression -(-(x)) to x.
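Below is a minimal sketch of the requested rewrite. It uses a toy expression tree, not Calcite's `RexNode` and `RexSimplify`, and the classes `Expr`, `Ref` and `Neg` are hypothetical. Nested pairs of unary minus cancel from the bottom up, so an odd number of minuses leaves exactly one:

```java
import java.util.Objects;

/** Hypothetical sketch, not Calcite's RexSimplify: rewrites -(-(x)) to x. */
public class UnaryMinusSimplifier {

  /** Toy expression node (stands in for a RexNode; has no type information). */
  abstract static class Expr {}

  /** A named input, e.g. a column reference such as x. */
  static final class Ref extends Expr {
    final String name;
    Ref(String name) { this.name = Objects.requireNonNull(name); }
    @Override public boolean equals(Object o) {
      return o instanceof Ref && ((Ref) o).name.equals(name);
    }
    @Override public int hashCode() { return name.hashCode(); }
    @Override public String toString() { return name; }
  }

  /** Unary minus applied to one operand. */
  static final class Neg extends Expr {
    final Expr operand;
    Neg(Expr operand) { this.operand = Objects.requireNonNull(operand); }
    @Override public boolean equals(Object o) {
      return o instanceof Neg && ((Neg) o).operand.equals(operand);
    }
    @Override public int hashCode() { return 31 * operand.hashCode() + 1; }
    @Override public String toString() { return "-(" + operand + ")"; }
  }

  /** Bottom-up rewrite that cancels each pair of nested unary minus operators. */
  static Expr simplify(Expr e) {
    if (e instanceof Neg) {
      Expr inner = simplify(((Neg) e).operand);
      if (inner instanceof Neg) {
        // -(-(y)) => y; y was already simplified by the recursive call
        return ((Neg) inner).operand;
      }
      return new Neg(inner);
    }
    return e;
  }

  public static void main(String[] args) {
    Expr x = new Ref("x");
    System.out.println(simplify(new Neg(new Neg(x))));          // x
    System.out.println(simplify(new Neg(new Neg(new Neg(x))))); // -(x)
  }
}
```

The toy tree has no types. A real rule would also need to check that the result keeps the original expression's type and nullability.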



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3871) Remove dependency of org.apiguardian:apiguardian-api

2020-03-24 Thread Danny Chen (Jira)
Danny Chen created CALCITE-3871:
---

 Summary: Remove dependency of org.apiguardian:apiguardian-api
 Key: CALCITE-3871
 URL: https://issues.apache.org/jira/browse/CALCITE-3871
 Project: Calcite
  Issue Type: Improvement
  Components: core, linq4j
Affects Versions: 1.22.0
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 1.23.0


The org.apiguardian:apiguardian-api dependency was introduced in CALCITE-3652 in
order to mark the status of newly introduced APIs.

Remove the dependency and copy the class into Calcite, because the
org.apiguardian:apiguardian-api jar contains only a single class, API.java, and
it is not worth adding a dependency for that (all downstream projects that
depend on calcite-core would see this jar, which is annoying).





Re: STREAM keyword

2020-03-24 Thread Danny Chan
In Apache Flink, we have a syntax:

… A JOIN B FOR SYSTEM_TIME AS OF A.PROC_TIME

It describes joining stream A with temporal table B, where each record of A is
joined only with the version of B as of the current machine (processing) time.

Is that the case Viliam described?


Best,
Danny Chan
On Mar 25, 2020, 12:46 AM +0800, Julian Hyde wrote:
> You’re right that this is a problem.
>
> We’d need some way to say that you don’t care which version of the product
> table you are joining against. One implication would be that if you replay
> the query, and the product table has changed in the meantime, you are happy
> to get different results.
>
> We could devise some syntax to add to the SQL. And/or we could add some 
> annotation to the product TVR. What do you think?
>
> Julian
>
>
> > On Mar 24, 2020, at 12:11 AM, Viliam Durina  wrote:
> >
> > So how would you do a simple stream enrichment query? That is one that for
> > each new record in an append-only relation will join a matching record from
> > a mutable relation that's valid at the processing time? This use case is
> > common, for example in credit card fraud detection, for each transaction
> > you look up the cardholder statistics, merchant statistics, product
> > statistics, transaction history etc. that you have at hand at the moment
> > the transaction is processed and the enriched record is then fed to a
> > rule-based engine or to an ML inference model. You're not interested in
> > later updates in those enrichment tables. In my understanding it is not
> > possible with the proposed semantics.
> >
> > For example, can you refer to the `undo`, `ptime` and `ver` columns in the
> > query itself? We could filter out rows where `ver > 0`:
> >
> > SELECT *
> > FROM (
> >   SELECT *
> >   FROM order_item o
> >   JOIN product p USING (product_id)
> >   EMIT STREAM
> > ) WHERE ver = 0;
> >
> > > You can optimize for the common events, and not use very much memory. For
> > > the rarer events, you can pay the cost of a disk I/O.
> > >
> >
> > With the particular query I don't think you can do this. Let's say the
> > `order_item` is backed by a Kafka topic - you might not have the full
> > history. And even if you do, the receiver of the query results is not
> > interested in retractions and new versions of all the zillions of orders
> > with updated product name. The desired output should be specified by the
> > query itself. And, for example, cardholder statistics could be updated with
> > each transaction in a feedback loop.
> >
> > Viliam
> >
>
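The processing-time enrichment discussed in this thread can be sketched in plain Java. Each arriving order is joined with whatever product row exists at that moment, and later updates to the product table never revise earlier output. The class and method names are hypothetical. The sketch models only the semantics, not Flink's `FOR SYSTEM_TIME AS OF` implementation or any proposed Calcite syntax:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a processing-time lookup join (semantics only). */
public class ProcTimeLookupJoin {

  // Mutable "dimension table": product_id -> product name (current version only).
  private final Map<Integer, String> products = new HashMap<>();

  // Append-only output of the enrichment query; rows are never retracted.
  private final List<String> output = new ArrayList<>();

  /** An update to the enrichment table; it only affects orders that arrive later. */
  void upsertProduct(int productId, String name) {
    products.put(productId, name);
  }

  /** Processes one order from the append-only stream (inner-join semantics). */
  void onOrder(long orderId, int productId) {
    String name = products.get(productId); // version "as of" processing time
    if (name != null) {
      output.add(orderId + ":" + name);
    }
  }

  List<String> output() {
    return output;
  }

  public static void main(String[] args) {
    ProcTimeLookupJoin join = new ProcTimeLookupJoin();
    join.upsertProduct(1, "Widget");
    join.onOrder(100, 1);
    join.upsertProduct(1, "Widget v2"); // does not retract or update order 100
    join.onOrder(101, 1);
    join.onOrder(102, 2);               // no matching product yet: dropped
    System.out.println(join.output());  // [100:Widget, 101:Widget v2]
  }
}
```

If the same orders were replayed after the product table changed, they would get different names. That is the non-determinism the thread says a user would have to accept.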


Re: set SqlToRelConverter.ConfigBuilder#expand default to false

2020-03-24 Thread JiaTao Tao
Thanks, Julian

I've opened a JIRA, CALCITE-3870, to track this, and a PR follows.


Regards!

Aron Tao


On Wed, Mar 25, 2020 at 12:55 AM, Julian Hyde wrote:

I agree that expand should default to false in ConfigBuilder.

Rationale: expand = false is the preferred modern behavior. We want
SqlToRelConverter to leave sub-queries as sub-queries (wrapping in
RexSubQuery) and then deal with them later with SubQueryRemoveRule. It is
difficult for SqlToRelConverter to expand sub-queries and also to remove
correlating variables, and there are bugs that are difficult to fix. I
believe that the modern path has fewer bugs. We are not putting much effort
into maintaining the old path.

Can you please log a bug? (Your images did not come through in the email.
If they add significantly to your argument, you could attach them to the
bug.)

This will be a breaking change. However I think it will be acceptable with
a release note.

Julian


> On Mar 23, 2020, at 1:16 AM, JiaTao Tao  wrote:
>
> Hi
>
> Currently, the default value of "expand" in SqlToRelConverter.ConfigBuilder
> is true, but in Calcite's main code path it is actually false
> (`withExpand(THREAD_EXPAND.get())`).
>
> This means we have to explicitly set `withExpand` to false whenever we use
> SqlToRelConverter.
>
> So I think we should change the default to "false" in
> SqlToRelConverter.ConfigBuilder.
>
> If you agree, I would like to make this minor change.
>
> Regards!
> Aron Tao


[jira] [Created] (CALCITE-3870) set "SqlToRelConverter.ConfigBuilder#expand" default to false

2020-03-24 Thread Jiatao Tao (Jira)
Jiatao Tao created CALCITE-3870:
---

 Summary: set "SqlToRelConverter.ConfigBuilder#expand" default to 
false
 Key: CALCITE-3870
 URL: https://issues.apache.org/jira/browse/CALCITE-3870
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Jiatao Tao


Currently, the default value of "expand" in SqlToRelConverter.ConfigBuilder is
true, but in Calcite's main code path it is actually false
(`withExpand(THREAD_EXPAND.get())`).

This means we have to explicitly set `withExpand` to false whenever we use
SqlToRelConverter.

So I think we should change the default to "false" in
SqlToRelConverter.ConfigBuilder.
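Below is a self-contained model of the proposed default, for illustration only. The `Config` and `ConfigBuilder` classes are hypothetical and only borrow the `withExpand` name from `SqlToRelConverter.ConfigBuilder`. They are not Calcite code:

```java
/** Hypothetical model of a converter config whose "expand" defaults to false. */
public class ExpandDefaultSketch {

  /** Immutable configuration produced by the builder. */
  static final class Config {
    final boolean expand;

    Config(boolean expand) {
      this.expand = expand;
    }
  }

  /** Builder with the proposed default (not Calcite's actual ConfigBuilder). */
  static final class ConfigBuilder {
    // false: keep sub-queries as sub-queries and remove them in a later
    // planning phase, instead of expanding them during conversion
    private boolean expand = false;

    ConfigBuilder withExpand(boolean expand) {
      this.expand = expand;
      return this;
    }

    Config build() {
      return new Config(expand);
    }
  }

  static ConfigBuilder configBuilder() {
    return new ConfigBuilder();
  }

  public static void main(String[] args) {
    // Callers no longer need an explicit withExpand(false)...
    System.out.println(configBuilder().build().expand);                  // false
    // ...while the legacy behavior is still available by opting in.
    System.out.println(configBuilder().withExpand(true).build().expand); // true
  }
}
```

With this default, the common path needs no extra configuration. Callers that still rely on expansion must now opt in explicitly.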
 





Re: Join the community

2020-03-24 Thread Julian Hyde
Sorry, I need to be pedantic. 

In Apache, there is no such thing as a “contributor”. There are committers and 
PMC members, and there is the community - which consists of anyone in the world 
who takes an interest. 

Unfortunately Jira requires people to be given a particular permission in order 
to have Calcite cases assigned to them. But let’s remember that that is just a 
Jira thing, not an Apache thing. 

Folks, please remember that you are in the community just by taking an 
interest. The PMC will offer you committer status based on earned merit - such 
as code contributions, blogging, speaking at conferences, writing documentation 
or just being helpful to others. There is no such thing as a “Calcite 
contributor”. 

Julian

> On Mar 24, 2020, at 2:13 PM, Francis Chuang  wrote:
> 
> Hey Roman,
> 
> I've added you as a contributor to the project.
> 
> Francis
> 
>> On 25/03/2020 6:47 am, Roman Kondakov wrote:
>> Hi everybody!
>> My name is Roman Kondakov. I use Calcite to build a SQL layer on
>> top of Apache Ignite.
>> I would like to join the Calcite community. I will start with minor
>> tasks (I have a couple on my mind) to understand your processes better.
>> I would also appreciate any help.
>> My Jira id is rkondakov.
>> Thank you in advance!


Re: Join the community

2020-03-24 Thread Francis Chuang

Hey Roman,

I've added you as a contributor to the project.

Francis

On 25/03/2020 6:47 am, Roman Kondakov wrote:

Hi everybody!

My name is Roman Kondakov. I use Calcite to build a SQL layer on
top of Apache Ignite.

I would like to join the Calcite community. I will start with minor
tasks (I have a couple on my mind) to understand your processes better.
I would also appreciate any help.

My Jira id is rkondakov.

Thank you in advance!



Re: Promotion factoids about Calcite

2020-03-24 Thread Rui Wang
Maybe add that Apache Calcite powers cloud computing: AWS Kinesis, Google
Cloud Dataflow, etc. provide SQL with the help of Calcite. (Let me know if I
have misunderstood those products and they are not using Calcite.)


-Rui



On Tue, Mar 24, 2020 at 11:29 AM Stamatis Zampetakis 
wrote:

> I agree with Francis. Indeed, it would be nice to have some feedback from
> the people in the companies.
>
> Apart from that here are a few quick ideas:
>
> 1) Flink, Hive, Druid, Solr, Phoenix, and many more data management systems
> provide full-fledged SQL capabilities all thanks to Apache Calcite.
> 2) Eclipse Memory Analyzer lets users efficiently query Java memory heap
> dumps via SQL by using Apache Calcite.
> 3) With Apache Calcite anybody can query anything via SQL with almost zero
> configuration through built-in connectors, from CSV and JSON files in your
> local file system to well-known NoSQL systems like Cassandra and Redis.
> 4) SuperSQL by Tencent is able to integrate and query many heterogeneous
> data-sources (e.g., RDBMS, ES, Hive, Flink, Spark, Presto, ClickHouse) by
> using Apache Calcite.
>
> Let me know what you think. Something worth submitting?
>
> Best,
> Stamatis
>
>
> On Tue, Mar 24, 2020, 11:03 AM Francis Chuang 
> wrote:
>
> > I think Alibaba and quite a few other companies[1] are heavy users of
> > Calcite. Perhaps someone from those companies can write up a nice little
> > factoid.
> >
> > [1] https://calcite.apache.org/docs/powered_by.html
> >
> > On 24/03/2020 8:26 pm, Stamatis Zampetakis wrote:
> > > Hello,
> > >
> > > There is an effort for promoting Apache projects by sharing impressive
> > > things or highly visible implementations/deployments.
> > >
> > > Examples from other projects:
> > >
> > > 1) Apple Siri completes full ring replication around the world in 10
> > > seconds using Apache HBase.
> > >
> > > 2) More than 60% of Apache projects use Apache Maven for build management.
> > >
> > > 3) Netflix uses Apache Druid to manage its 1.5 trillion-row data warehouse
> > > requirements that include what users see when tapping the Netflix icon or
> > > logging in from a browser across platforms.
> > >
> > > Does anybody have ideas for what we could write about Calcite?
> > >
> > > Best,
> > > Stamatis
> > >
> >
>


Join the community

2020-03-24 Thread Roman Kondakov
Hi everybody!

My name is Roman Kondakov. I use Calcite to build a SQL layer on
top of Apache Ignite.

I would like to join the Calcite community. I will start with minor
tasks (I have a couple on my mind) to understand your processes better.
I would also appreciate any help.

My Jira id is rkondakov.

Thank you in advance!

-- 
Kind Regards
Roman Kondakov



Re: Promotion factoids about Calcite

2020-03-24 Thread Stamatis Zampetakis
I agree with Francis. Indeed, it would be nice to have some feedback from
the people in the companies.

Apart from that here are a few quick ideas:

1) Flink, Hive, Druid, Solr, Phoenix, and many more data management systems
provide full-fledged SQL capabilities all thanks to Apache Calcite.
2) Eclipse Memory Analyzer lets users efficiently query Java memory heap
dumps via SQL by using Apache Calcite.
3) With Apache Calcite anybody can query anything via SQL with almost zero
configuration through built-in connectors, from CSV and JSON files in your
local file system to well-known NoSQL systems like Cassandra and Redis.
4) SuperSQL by Tencent is able to integrate and query many heterogeneous
data-sources (e.g., RDBMS, ES, Hive, Flink, Spark, Presto, ClickHouse) by
using Apache Calcite.

Let me know what you think. Something worth submitting?

Best,
Stamatis


On Tue, Mar 24, 2020, 11:03 AM Francis Chuang 
wrote:

> I think Alibaba and quite a few other companies[1] are heavy users of
> Calcite. Perhaps someone from those companies can write up a nice little
> factoid.
>
> [1] https://calcite.apache.org/docs/powered_by.html
>
> On 24/03/2020 8:26 pm, Stamatis Zampetakis wrote:
> > Hello,
> >
> > There is an effort for promoting Apache projects by sharing impressive
> > things or highly visible implementations/deployments.
> >
> > Examples from other projects:
> >
> > 1) Apple Siri completes full ring replication around the world in 10
> > seconds using Apache HBase.
> >
> > 2) More than 60% of Apache projects use Apache Maven for build management.
> >
> > 3) Netflix uses Apache Druid to manage its 1.5 trillion-row data warehouse
> > requirements that include what users see when tapping the Netflix icon or
> > logging in from a browser across platforms.
> >
> > Does anybody have ideas for what we could write about Calcite?
> >
> > Best,
> > Stamatis
> >
>


Re: Re: NPE at VolcanoPlanner.setRoot()

2020-03-24 Thread Haisheng Yuan
Hi João,

Can you provide minimal reproducible test cases?
You can log a JIRA if you believe this is a bug.

- Haisheng

--
From: João Silva
Date: 2020-03-24 23:40:12
To:
Subject: Re: NPE at VolcanoPlanner.setRoot()

Currently on 1.21.0. And no, I didn't implement any Convention.

On Tuesday, 24/03/2020 at 15:26, Stamatis Zampetakis wrote:

> Hi João,
>
> Which Calcite version are you using?
>
> Did you implement your own Convention (and the method getInterface)?
>
> Best,
> Stamatis
>
> On Tue, Mar 24, 2020 at 4:12 PM João Silva  wrote:
>
> > I keep getting an NPE when calling setRoot(), even though my
> > RelNode is not null. Does anyone have any idea about what could be the
> > problem?
> >
> > java.lang.NullPointerException
> >     at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1654)
> >     at org.apache.calcite.plan.volcano.VolcanoPlanner.setRoot(VolcanoPlanner.java:296)
> >     at optimizer.VolcanoTest.main(VolcanoTest.java:134)
> >
> > Thank you.
> >
>



Re: set SqlToRelConverter.ConfigBuilder#expand default to false

2020-03-24 Thread Julian Hyde
I agree that expand should default to false in ConfigBuilder.

Rationale: expand = false is the preferred modern behavior. We want 
SqlToRelConverter to leave sub-queries as sub-queries (wrapping in RexSubQuery) 
and then deal with them later with SubQueryRemoveRule. It is difficult for 
SqlToRelConverter to expand sub-queries and also to remove correlating 
variables, and there are bugs that are difficult to fix. I believe that the 
modern path has fewer bugs. We are not putting much effort into maintaining the 
old path.

Can you please log a bug? (Your images did not come through in the email. If 
they add significantly to your argument, you could attach them to the bug.)

This will be a breaking change. However I think it will be acceptable with a 
release note.

Julian


> On Mar 23, 2020, at 1:16 AM, JiaTao Tao  wrote:
> 
> Hi
> 
> Currently, the default value of "expand" in SqlToRelConverter.ConfigBuilder
> is true, but in Calcite's main code path it is actually false
> (`withExpand(THREAD_EXPAND.get())`).
>
> This means we have to explicitly set `withExpand` to false whenever we use
> SqlToRelConverter.
>
> So I think we should change the default to "false" in
> SqlToRelConverter.ConfigBuilder.
>
> If you agree, I would like to make this minor change.
> 
> Regards!
> Aron Tao



Re: STREAM keyword

2020-03-24 Thread Julian Hyde
You’re right that this is a problem.

We’d need some way to say that you don’t care which version of the product
table you are joining against. One implication would be that if you replay the
query, and the product table has changed in the meantime, you are happy to get
different results.

We could devise some syntax to add to the SQL. And/or we could add some 
annotation to the product TVR. What do you think?

Julian


> On Mar 24, 2020, at 12:11 AM, Viliam Durina  wrote:
> 
> So how would you do a simple stream enrichment query? That is, one that, for
> each new record in an append-only relation, joins a matching record from a
> mutable relation that is valid at processing time? This use case is common.
> For example, in credit card fraud detection, for each transaction you look
> up the cardholder statistics, merchant statistics, product statistics,
> transaction history, etc. that you have at hand at the moment the
> transaction is processed, and the enriched record is then fed to a
> rule-based engine or to an ML inference model. You're not interested in
> later updates to those enrichment tables. In my understanding, this is not
> possible with the proposed semantics.
> 
> For example, can you refer to the `undo`, `ptime` and `ver` columns in the
> query itself? We could filter out rows where `ver > 0`:
>
> SELECT *
> FROM (
>   SELECT *
>   FROM order_item o
>   JOIN product p USING (product_id)
>   EMIT STREAM
> ) WHERE ver = 0;
> 
>> You can optimize for the common events, and not use very much memory. For
>> the rarer events, you can pay the cost of a disk I/O.
>>
> 
> With the particular query I don't think you can do this. Let's say the
> `order_item` is backed by a Kafka topic - you might not have the full
> history. And even if you do, the receiver of the query results is not
> interested in retractions and new versions of all the zillions of orders
> with updated product name. The desired output should be specified by the
> query itself. And, for example, cardholder statistics could be updated with
> each transaction in a feedback loop.
> 
> Viliam
> 



Re: Split Join condition with CAST which only widening nullability

2020-03-24 Thread Julian Hyde
It does seem to be something that RelBuilder could do. (RexSimplify can’t 
really do it, because it doesn’t know how the expression is being used.)

It’s also worth discovering why the CAST was added in the first place. It 
doesn’t seem to be helpful. I think we should strive to eliminate all of the 
slightly unhelpful things that Calcite does; those things can add up and cause 
major inefficiencies in the planning process and/or sub-optimal plans.

Julian


> On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich  wrote:
> 
> Hey,
> 
> That's a great diagnosis :)
> I would guess that newCondition became non-nullable for some reason
> (rexSimplify runs under RexProgramBuilder, so it might be able to narrow the
> nullability). You could try invoking simplify.simplifyPreservingType() on it
> to see if that would help.
> 
> > I know it's necessary to preserve the nullability when simplifying a
> > boolean expression in project columns, but as for conditions in
> > Filter/Calc, maybe we can omit the nullability?
> I think that could probably work: we can't change the nullability of project
> columns because they could be referenced (and the reference also carries the
> type), but for filter/join conditions we don't need to care about it.
> It seems we already have a "matchNullability" flag in ReduceExpressionsRule;
> for FILTER/JOIN we should probably turn that off...  :)
> 
> cheers,
> Zoltan
> 
> 
> On 3/24/20 9:15 AM, Shuo Cheng wrote:
>> Hi Zoltan,
>> I encountered the problem when running TPC tests, and have not reproduced it
>> on Calcite master.
>> But I figured out how the problem arises:
>> There is a semi join with the condition AND(EXPANDED_INDF1, EXPANDED_INDF2);
>> the type of AND is BOOLEAN with nullable `true`.
>> After JoinPushExpressionsRule --> join condition: AND(INDF1, INDF2); the type
>> of AND is BOOLEAN with nullable `true`.
>> After SemiJoinProjectTransposeRule --> join condition: CAST(AND(INDF1,
>> INDF2)); the type of AND is BOOLEAN with nullable `false`.
>> Just as you suspected, it's in `SemiJoinProjectTransposeRule`, where a
>> forced type correction is added by `RexProgramBuilder#addCondition`, which
>> calls `RexSimplify#simplifyPreservingType` before registering an
>> expression.
>> I know it's necessary to preserve the nullability when simplifying a boolean
>> expression in project columns, but as for conditions in Filter/Calc, maybe
>> we can omit the nullability?
>> Best Regards,
>> Shuo
>> On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich wrote:
>> Hey Shuo!
>> I think that simplification should be made on join conditions; I've done a
>> quick check, and it seems to be working for me.
>> I suspected that it would be either a missing call to RexSimplify for some
>> reason, or a forced return type correction: IIRC there are some cases in
>> which the RexNode type should be retained after simplification.
>> Is this reproducible on current master? Could you share a test case?
>> cheers,
>> Zoltan
>> On 3/24/20 7:28 AM, Shuo Cheng wrote:
>> > Hi, Julian, that's what we do as a workaround: we remove CASTs that only
>> > widen nullability, as CALCITE-2695 does, before applying the
>> > hash-join/sort-merge-join rule, so that the equi predicate can be split
>> > out. I'm not sure whether it's proper for Calcite to do the "convert
>> > back" job, for example, by simplifying the join condition when creating a
>> > Join; or maybe we should let other systems that use Calcite do the
>> > "convert back" job as an optimization? What do you think?
>> >
>> > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde wrote:
>> >
>> >> Or convert it back to a not-nullable BOOLEAN? The join condition treats
>> >> UNKNOWN the same as FALSE, and besides, UNKNOWN will never occur, so the
>> >> conditions with and without the CAST are equivalent.
>> >>
>> >> Julian
>> >>
>> >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> Consider the join condition 'CAST(IS_NOT_DISTINCT_FROM($1, $2),
>> >>> BOOLEAN)', which casts the non-nullable BOOLEAN to a nullable BOOLEAN.
>> >>> Calcite cannot split out the equi predicate, so join implementations such
>> >>> as hash join or sort-merge join may not be used. Maybe we can
>> >>> extend RelOptUtil#splitJoinCondition to support this scenario?
>> >>
>> >
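Below is a toy model of the problem described in this thread. It uses hypothetical classes rather than Calcite's `RexNode` or `RelOptUtil#splitJoinCondition`. A nullability-only CAST wrapped around `AND(...)` hides the `IS NOT DISTINCT FROM` equi-keys. Dropping that CAST in condition position, where UNKNOWN behaves like FALSE, makes the keys splittable again. For simplicity, only top-level conjuncts are examined:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical toy model, not Calcite's RexNode or RelOptUtil. */
public class JoinConditionSplitSketch {

  /** Base type of the toy expression tree. */
  interface Expr {}

  /** $index over the concatenated left and right input fields. */
  static final class InputRef implements Expr {
    final int index;
    InputRef(int index) { this.index = index; }
  }

  /** left IS NOT DISTINCT FROM right; never evaluates to UNKNOWN. */
  static final class NotDistinct implements Expr {
    final Expr left;
    final Expr right;
    NotDistinct(Expr left, Expr right) { this.left = left; this.right = right; }
  }

  /** Conjunction of its operands. */
  static final class And implements Expr {
    final List<Expr> operands;
    And(List<Expr> operands) { this.operands = operands; }
  }

  /** CAST(operand AS BOOLEAN) where only the type's nullability changes. */
  static final class NullabilityCast implements Expr {
    final Expr operand;
    NullabilityCast(Expr operand) { this.operand = operand; }
  }

  /**
   * A join or filter keeps a row only when its condition is TRUE, so UNKNOWN
   * acts like FALSE and a nullability-only CAST can be dropped here.
   */
  static Expr stripNullabilityCasts(Expr e) {
    if (e instanceof NullabilityCast) {
      return stripNullabilityCasts(((NullabilityCast) e).operand);
    }
    if (e instanceof And) {
      List<Expr> ops = new ArrayList<>();
      for (Expr op : ((And) e).operands) {
        ops.add(stripNullabilityCasts(op));
      }
      return new And(ops);
    }
    return e;
  }

  /** Returns {equiKeyCount, residualCount} over the top-level conjuncts. */
  static int[] split(Expr condition, int leftFieldCount) {
    List<Expr> conjuncts;
    if (condition instanceof And) {
      conjuncts = ((And) condition).operands;
    } else {
      conjuncts = Collections.singletonList(condition);
    }
    int keys = 0;
    int residual = 0;
    for (Expr c : conjuncts) {
      if (c instanceof NotDistinct) {
        NotDistinct nd = (NotDistinct) c;
        if (nd.left instanceof InputRef && nd.right instanceof InputRef) {
          boolean firstFromLeft = ((InputRef) nd.left).index < leftFieldCount;
          boolean secondFromLeft = ((InputRef) nd.right).index < leftFieldCount;
          // An equi-key must compare a left-input field with a right-input field.
          if (firstFromLeft != secondFromLeft) {
            keys++;
            continue;
          }
        }
      }
      residual++;
    }
    return new int[] {keys, residual};
  }

  /** CAST(AND($0 IS NOT DISTINCT FROM $2, $1 IS NOT DISTINCT FROM $3)). */
  static Expr example() {
    return new NullabilityCast(new And(Arrays.<Expr>asList(
        new NotDistinct(new InputRef(0), new InputRef(2)),
        new NotDistinct(new InputRef(1), new InputRef(3)))));
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(split(example(), 2)));                        // [0, 1]
    System.out.println(Arrays.toString(split(stripNullabilityCasts(example()), 2))); // [2, 0]
  }
}
```

The stripping step is valid only in condition position. A projected column could expose the difference between UNKNOWN and FALSE.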



Re: NPE at VolcanoPlanner.setRoot()

2020-03-24 Thread João Silva
Currently on 1.21.0. And no, I didn't implement any Convention.

On Tuesday, 24/03/2020 at 15:26, Stamatis Zampetakis wrote:

> Hi João,
>
> Which Calcite version are you using?
>
> Did you implement your own Convention (and the method getInterface)?
>
> Best,
> Stamatis
>
> On Tue, Mar 24, 2020 at 4:12 PM João Silva  wrote:
>
> > I keep getting an NPE when calling setRoot(), even though my
> > RelNode is not null. Does anyone have any idea about what could be the
> > problem?
> >
> > java.lang.NullPointerException
> >     at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1654)
> >     at org.apache.calcite.plan.volcano.VolcanoPlanner.setRoot(VolcanoPlanner.java:296)
> >     at optimizer.VolcanoTest.main(VolcanoTest.java:134)
> >
> > Thank you.
> >
>


Re: NPE at VolcanoPlanner.setRoot()

2020-03-24 Thread Stamatis Zampetakis
Hi João,

Which Calcite version are you using?

Did you implement your own Convention (and the method getInterface)?

Best,
Stamatis

On Tue, Mar 24, 2020 at 4:12 PM João Silva  wrote:

> I keep getting an NPE when calling setRoot(), even though my
> RelNode is not null. Does anyone have any idea about what could be the
> problem?
>
> java.lang.NullPointerException
>     at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1654)
>     at org.apache.calcite.plan.volcano.VolcanoPlanner.setRoot(VolcanoPlanner.java:296)
>     at optimizer.VolcanoTest.main(VolcanoTest.java:134)
>
> Thank you.
>


NPE at VolcanoPlanner.setRoot()

2020-03-24 Thread João Silva
I keep getting an NPE when calling setRoot(), even though my
RelNode is not null. Does anyone have any idea about what could be the
problem?

java.lang.NullPointerException
    at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1654)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.setRoot(VolcanoPlanner.java:296)
    at optimizer.VolcanoTest.main(VolcanoTest.java:134)

Thank you.


Calcite-Master - Build # 1655 - Still Failing

2020-03-24 Thread Apache Jenkins Server
The Apache Jenkins build system has built Calcite-Master (build #1655)

Status: Still Failing

Check console output at https://builds.apache.org/job/Calcite-Master/1655/ to 
view the results.

Re: Promotion factoids about Calcite

2020-03-24 Thread Francis Chuang
I think Alibaba and quite a few other companies[1] are heavy users of
Calcite. Perhaps someone from those companies can write up a nice little 
factoid.


[1] https://calcite.apache.org/docs/powered_by.html

On 24/03/2020 8:26 pm, Stamatis Zampetakis wrote:

Hello,

There is an effort for promoting Apache projects by sharing impressive
things or highly visible implementations/deployments.

Examples from other projects:

1) Apple Siri completes full ring replication around the world in 10
seconds using Apache HBase.

2) More than 60% of Apache projects use Apache Maven for build management.

3) Netflix uses Apache Druid to manage its 1.5 trillion-row data warehouse
requirements that include what users see when tapping the Netflix icon or
logging in from a browser across platforms.

Does anybody have ideas for what we could write about Calcite?

Best,
Stamatis



Promotion factoids about Calcite

2020-03-24 Thread Stamatis Zampetakis
Hello,

There is an effort for promoting Apache projects by sharing impressive
things or highly visible implementations/deployments.

Examples from other projects:

1) Apple Siri completes full ring replication around the world in 10
seconds using Apache HBase.

2) More than 60% of Apache projects use Apache Maven for build management.

3) Netflix uses Apache Druid to manage its 1.5 trillion-row data warehouse
requirements that include what users see when tapping the Netflix icon or
logging in from a browser across platforms.

Does anybody have ideas for what we could write about Calcite?

Best,
Stamatis


Re: Split Join condition with CAST which only widening nullability

2020-03-24 Thread Zoltan Haindrich

Hey,

That's a great diagnosis :)
I would guess that newCondition became non-nullable for some reason
(rexSimplify runs under RexProgramBuilder, so it might be able to narrow the
nullability). You could try invoking simplify.simplifyPreservingType() on it
to see if that would help.

> I know it's necessary to preserve the nullability when simplifying a boolean
> expression in project columns, but as for conditions in Filter/Calc, maybe we
> can omit the nullability?
I think that could probably work: we can't change the nullability of project
columns because they could be referenced (and the reference also carries the
type), but for filter/join conditions we don't need to care about it.

It seems we already have a "matchNullability" flag in ReduceExpressionsRule;
for FILTER/JOIN we should probably turn that off...  :)

cheers,
Zoltan


On 3/24/20 9:15 AM, Shuo Cheng wrote:

Hi Zoltan,

I encountered the problem when running TPC tests, and have not reproduced it on
Calcite master.

But I figured out how the problem arises:

There is a semi join with the condition AND(EXPANDED_INDF1, EXPANDED_INDF2);
the type of AND is BOOLEAN with nullable `true`.

After JoinPushExpressionsRule --> join condition: AND(INDF1, INDF2); the type of
AND is BOOLEAN with nullable `true`.

After SemiJoinProjectTransposeRule --> join condition: CAST(AND(INDF1,
INDF2)); the type of AND is BOOLEAN with nullable `false`.

Just as you suspected, it's in `SemiJoinProjectTransposeRule`, where a forced
type correction is added by `RexProgramBuilder#addCondition`, which calls
`RexSimplify#simplifyPreservingType` before registering an expression.

I know it's necessary to preserve the nullability when simplifying a boolean
expression in project columns, but as for conditions in Filter/Calc, maybe we
can omit the nullability?



Best Regards,
Shuo

On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich <k...@rxd.hu> wrote:

Hey Shuo!

I think that simplification should be made on join conditions; I've done a
quick check, and it seems to be working for me.
I suspected that it would be either a missing call to RexSimplify for some
reason, or a forced return type correction: IIRC there are some cases in which
the RexNode type should be retained after simplification.
Is this reproducible on current master? Could you share a test case?

cheers,
Zoltan


On 3/24/20 7:28 AM, Shuo Cheng wrote:
 > Hi, Julian, that's what we do as a workaround: we remove CASTs that only
 > widen nullability, as CALCITE-2695 does, before applying the
 > hash-join/sort-merge-join rule, so that the equi predicate can be split
 > out. I'm not sure whether it's proper for Calcite to do the "convert
 > back" job, for example, by simplifying the join condition when creating a
 > Join; or maybe we should let other systems that use Calcite do the
 > "convert back" job as an optimization? What do you think?
 >
 > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
 >
 >> Or convert it back to a not-nullable BOOLEAN? The join condition treats
 >> UNKNOWN the same as FALSE, and besides, UNKNOWN will never occur, so the
 >> conditions with and without the CAST are equivalent.
 >>
 >> Julian
 >>
 >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng <njucs...@gmail.com> wrote:
 >>>
 >>> Hi all,
 >>>
 >>> Consider the join condition 'CAST(IS_NOT_DISTINCT_FROM($1, $2),
 >>> BOOLEAN)', which casts the non-nullable BOOLEAN to a nullable BOOLEAN.
 >>> Calcite cannot split out the equi predicate, so join implementations such
 >>> as hash join or sort-merge join may not be used. Maybe we can
 >>> extend RelOptUtil#splitJoinCondition to support this scenario?
 >>
 >
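The nullability argument can be checked with a small sketch of SQL three-valued logic in Java, using `null` to model UNKNOWN. The method names are hypothetical. The sketch shows that rewriting UNKNOWN to FALSE cannot change which rows a Filter or Join keeps, while in a projected column the two values stay distinguishable:

```java
import java.util.Objects;

/** Hypothetical sketch of SQL three-valued logic; null models UNKNOWN. */
public class ThreeValuedConditionSketch {

  /** SQL "a = b": UNKNOWN (null) if either side is NULL. */
  static Boolean eq(Integer a, Integer b) {
    return (a == null || b == null) ? null : a.equals(b);
  }

  /** SQL "a IS NOT DISTINCT FROM b": NULLs compare equal; never UNKNOWN. */
  static boolean notDistinct(Integer a, Integer b) {
    return Objects.equals(a, b);
  }

  /** A Filter or Join keeps a row only if its condition is TRUE. */
  static boolean keepsRow(Boolean condition) {
    return Boolean.TRUE.equals(condition);
  }

  /** Rewrites UNKNOWN to FALSE, as could be done for a condition. */
  static Boolean unknownAsFalse(Boolean condition) {
    return condition == null ? Boolean.FALSE : condition;
  }

  public static void main(String[] args) {
    Boolean c = eq(null, 1); // UNKNOWN
    // In condition position the rewrite does not change which rows survive.
    System.out.println(keepsRow(c) == keepsRow(unknownAsFalse(c))); // true
    // In a projected column the two values are observably different.
    System.out.println(Objects.equals(c, unknownAsFalse(c)));       // false
    // IS NOT DISTINCT FROM never yields UNKNOWN, so a CAST that only widens
    // its nullability cannot change its value.
    System.out.println(notDistinct(null, null));                    // true
  }
}
```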



Re: [DISCUSS] get RexExecutor from RexSimplify in method reduceExpressionsInternal

2020-03-24 Thread Chunwei Lei
IMHO, a disadvantage of supplying a default RexExecutor is that we cannot make
sure that the reduced result is the same as the result of the execution
engine, especially when there is some customized implementation.


Best,
Chunwei
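The fallback pattern debated in the quoted thread, `Util.first(planner.getExecutor(), RexUtil.EXECUTOR)`, and the RexExecutor contract it relies on ("If an expression cannot be reduced, writes the original expression...") can be modeled without Calcite. This is a simplified sketch under stated assumptions: `ConstantExecutor`, `LiteralOnlyExecutor` and `ExecutorFallback` are hypothetical names, expressions are plain strings, and the default executor only folds `a + b` over integer literals. The local `first` helper mirrors what `Util.first` does (return the first argument if non-null, else the second).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an expression executor; not Calcite's RexExecutor. */
interface ConstantExecutor {
  /** Appends one result per input; an expression it cannot reduce is written back unchanged. */
  void reduce(List<String> constExps, List<String> reducedValues);
}

/** Hypothetical default executor: only folds "a + b" where a and b are integer literals. */
final class LiteralOnlyExecutor implements ConstantExecutor {
  @Override
  public void reduce(List<String> constExps, List<String> reducedValues) {
    for (String exp : constExps) {
      reducedValues.add(tryFold(exp));
    }
  }

  private static String tryFold(String exp) {
    String[] parts = exp.split("\\+");
    if (parts.length == 2) {
      try {
        long a = Long.parseLong(parts[0].trim());
        long b = Long.parseLong(parts[1].trim());
        return Long.toString(Math.addExact(a, b));
      } catch (NumberFormatException | ArithmeticException e) {
        // Not two literals, or overflow: keep the original expression, per the contract.
      }
    }
    return exp;
  }
}

final class ExecutorFallback {
  static final ConstantExecutor DEFAULT = new LiteralOnlyExecutor();

  /** Same idea as Util.first(a, b): a if it is non-null, otherwise b. */
  static <T> T first(T a, T b) {
    return a != null ? a : b;
  }

  /** Reduces with the configured executor, falling back to the default when none is set. */
  static List<String> reduceAll(ConstantExecutor configured, List<String> exps) {
    ConstantExecutor executor = first(configured, DEFAULT);
    List<String> reduced = new ArrayList<>();
    executor.reduce(exps, reduced);
    return reduced;
  }

  public static void main(String[] args) {
    // No executor configured: the literal sum is folded, the UDF call passes through.
    System.out.println(reduceAll(null, List.of("1 + 2", "MY_UDF(x)"))); // [3, MY_UDF(x)]
  }
}
```

This shows both sides of the argument: a default that only touches what it fully understands keeps the contract (the hypothetical `MY_UDF(x)` passes through unchanged), but whatever it does fold is evaluated with its own rules (here Java `long` arithmetic), which need not match a customized execution engine.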


On Fri, Mar 20, 2020 at 11:16 AM Danny Chan  wrote:

> This is a matter of preference: I would prefer the default value not to
> throw exceptions.
>
> Best,
> Danny Chan
> On Mar 18, 2020, at 3:53 PM +0800, Stamatis Zampetakis wrote:
> > If a Janino exception comes up then it is a bug that we have to fix since
> > it violates the contract of the interface.
> >
> > From my point of view the modification is meaningful for two reasons:
> > * improves code readability;
> > * avoids confusing behavior where the rules for performing
> > constant reduction are present but this does not really happen (because
> > there is an executor missing).
> >
> > I would say that in production, if the engine does not want to perform
> > constant reduction, it is equally easy not to register the respective
> > rules.
> >
> > Best,
> > Stamatis
> >
> > On Wed, Mar 18, 2020 at 3:29 AM Danny Chan  wrote:
> >
> > > I’m a little worried about whether the default RexExecutorImpl can
> > > handle the expressions of all downstream projects. Very probably not:
> > > there would be Janino compile exceptions if it cannot translate the
> > > RexNodes correctly.
> > >
> > > So, strictly speaking, changing the RexExecutor to a default
> > > implementation may break something. I think it’s better if we have a
> > > real case to illustrate that the modification is meaningful.
> > >
> > > In production, if an engine really wants to support constant reduction
> > > for all kinds of expressions, it should set up the RexExecutor
> > > explicitly. If it does not, constant reduction just does not happen,
> > > which is better than supplying a default RexExecutor that does not
> > > really work for all expressions.
> > >
> > > So I’m +0 for this.
> > >
> > > Best,
> > > Danny Chan
> > > On Mar 17, 2020, at 4:16 PM +0800, JiaTao Tao wrote:
> > > > Hi Danny
> > > >
> > > > Thanks for your reply. I think Stamatis Zampetakis's opinion sums it
> > > > up, and the problem here, I think, is that a default RexExecutor is
> > > > better than null, especially in this case, because
> > > > `reduceExpressionsInternal` and `reduceExpressions` are on the same
> > > > path; though the use of RexExecutor may be different, it still
> > > > confuses people.
> > > >
> > > > IMHO, if "return RexUtil.EXECUTOR" >= "return null", we can make the
> > > > change.
> > > >
> > > > If you agree, I can open a JIRA and make this minor change.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > Regards!
> > > >
> > > > Aron Tao
> > > >
> > > >
> > > > On Tue, Mar 17, 2020 at 4:02 PM, JiaTao Tao wrote:
> > > >
> > > > > Hi Stamatis Zampetakis
> > > > >
> > > > > I agree with this completely: "The API of RexExecutor says the
> > > > > following "If an expression cannot be reduced, writes the original
> > > > > expression..." so we don't break anything by providing a default one."
> > > > >
> > > > >
> > > > >
> > > > > Regards!
> > > > >
> > > > > Aron Tao
> > > > >
> > > > >
> > > > > On Mon, Mar 16, 2020 at 9:52 PM, Stamatis Zampetakis wrote:
> > > > >
> > > > > > Interestingly, I was looking at this same piece of code not so long
> > > > > > ago and I agree it is a bit confusing.
> > > > > >
> > > > > > Looking around the places where we obtain a RexExecutor, most often
> > > > > > (always?) we observe the following pattern:
> > > > > >
> > > > > > RexExecutor executor =
> > > > > >     Util.first(query.getCluster().getPlanner().getExecutor(),
> > > > > >         RexUtil.EXECUTOR);
> > > > > >
> > > > > > I think it is always useful to have an executor in the planner, so
> > > > > > I am tempted to change the API of RelOptPlanner#getExecutor to
> > > > > > always return a (default) executor if an explicit one is not set.
> > > > > >
> > > > > > The API of RexExecutor says the following: "If an expression cannot
> > > > > > be reduced, writes the original expression..." so we don't break
> > > > > > anything by providing a default one.
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Best,
> > > > > > Stamatis
> > > > > >
> > > > > > On Mon, Mar 16, 2020 at 11:11 AM Danny Chan wrote:
> > > > > >
> > > > > > > Thanks. The code is a little messy; here is how I understand it:
> > > > > > >
> > > > > > > The executor from `final RexExecutor executor
> > > > > > > = Util.first(cluster.getPlanner().getExecutor(), RexUtil.EXECUTOR)`
> > > > > > > is mainly used to construct the RexSimplify. In the RexSimplify,
> > > > > > > the expressions that we evaluate are ones we can be sure
> > > > > > > RexUtil.EXECUTOR can resolve (if you check the code, it only
> > > > > > > reduces literals).
> > > > > > >
> > > > > > > But the expressions in the ReduceExpressionsRule may b

Re: Split Join condition with CAST which only widening nullability

2020-03-24 Thread Zoltan Haindrich

Hey Shuo!

I think that simplification should have been applied to join conditions. I've done a
quick check, and it seems to be working for me.
I suspect that it is either a missing call to RexSimplify for some reason, or the CAST
is added by a forced return type correction: IIRC there are some cases in which the
RexNode type should be retained after simplification.

Is this reproducible on current master? Could you share a test case?

cheers,
Zoltan


On 3/24/20 7:28 AM, Shuo Cheng wrote:

Hi, Julian, that's what we do as a workaround: we remove CASTs that only
widen nullability, as CALCITE-2695 does, before applying the
hash-join/sort-merge-join rule, so that the equi-predicate can be split
out. I'm not sure whether it's proper for Calcite to do the "convert
back" job, for example by simplifying the join condition when creating a
Join, or whether systems that use Calcite should do the "convert back"
job as an optimization. What do you think?

On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde wrote:

Or convert it back to a not-nullable BOOLEAN? The join condition treats
UNKNOWN the same as FALSE, and besides UNKNOWN will never occur, so the
conditions with and without the CAST are equivalent.

Julian

On Mar 23, 2020, at 9:34 PM, Shuo Cheng wrote:

Hi all,

Consider the join condition 'CAST(IS_NOT_DISTINCT_FROM($1, $2),
BOOLEAN)', which casts the non-nullable BOOLEAN to a nullable BOOLEAN.
Calcite cannot split out the equi-predicate, so join operators such as
hash join / sort-merge join may not be used. Maybe we can
extend RelOptUtil#splitJoinCondition to support this scenario?
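The workaround Shuo describes (strip CASTs that only widen nullability, then look for equi-join keys) can be sketched on a toy expression tree. This is not Calcite's RexNode API or the real RelOptUtil#splitJoinCondition: `Expr`, `JoinConditionSplitter` and their methods are hypothetical, only IS NOT DISTINCT FROM is recognized as a key predicate, and nested ANDs are not flattened. A CAST is removed only when it keeps the same type and does not narrow nullability; a CAST to NOT NULL is kept, since dropping it would change the type.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal expression tree; not Calcite's RexNode API. */
final class Expr {
  enum Kind { INPUT_REF, IS_NOT_DISTINCT_FROM, CAST, AND }

  final Kind kind;
  final String type;        // SQL type name, e.g. "INTEGER" or "BOOLEAN"
  final boolean nullable;
  final int index;          // field index; only meaningful for INPUT_REF
  final List<Expr> operands;

  private Expr(Kind kind, String type, boolean nullable, int index, List<Expr> operands) {
    this.kind = kind;
    this.type = type;
    this.nullable = nullable;
    this.index = index;
    this.operands = operands;
  }

  static Expr ref(int index, String type, boolean nullable) {
    return new Expr(Kind.INPUT_REF, type, nullable, index, List.of());
  }

  /** IS NOT DISTINCT FROM always yields a non-nullable BOOLEAN. */
  static Expr isNotDistinctFrom(Expr a, Expr b) {
    return new Expr(Kind.IS_NOT_DISTINCT_FROM, "BOOLEAN", false, -1, List.of(a, b));
  }

  static Expr cast(Expr operand, String type, boolean nullable) {
    return new Expr(Kind.CAST, type, nullable, -1, List.of(operand));
  }

  static Expr and(List<Expr> operands) {
    boolean nullable = operands.stream().anyMatch(e -> e.nullable);
    return new Expr(Kind.AND, "BOOLEAN", nullable, -1, List.copyOf(operands));
  }
}

final class JoinConditionSplitter {
  /** Removes CASTs that keep the type and only widen (or keep) nullability. */
  static Expr stripNullabilityCasts(Expr e) {
    switch (e.kind) {
      case CAST: {
        Expr inner = stripNullabilityCasts(e.operands.get(0));
        boolean onlyWidens = inner.type.equals(e.type) && (e.nullable || !inner.nullable);
        return onlyWidens ? inner : Expr.cast(inner, e.type, e.nullable);
      }
      case AND: {
        List<Expr> ops = new ArrayList<>();
        for (Expr op : e.operands) {
          ops.add(stripNullabilityCasts(op));
        }
        return Expr.and(ops);
      }
      default:
        return e;
    }
  }

  /**
   * Returns {leftKey, rightKey} pairs for conjuncts "leftField IS NOT DISTINCT FROM
   * rightField"; every other conjunct is added to remaining. Fields
   * [0, leftFieldCount) belong to the left input, the rest to the right input.
   */
  static List<int[]> splitKeys(Expr condition, int leftFieldCount, List<Expr> remaining) {
    Expr stripped = stripNullabilityCasts(condition);
    List<Expr> conjuncts =
        stripped.kind == Expr.Kind.AND ? stripped.operands : List.of(stripped);
    List<int[]> keys = new ArrayList<>();
    for (Expr c : conjuncts) {
      if (c.kind == Expr.Kind.IS_NOT_DISTINCT_FROM
          && c.operands.get(0).kind == Expr.Kind.INPUT_REF
          && c.operands.get(1).kind == Expr.Kind.INPUT_REF) {
        int a = c.operands.get(0).index;
        int b = c.operands.get(1).index;
        if (a < leftFieldCount && b >= leftFieldCount) {
          keys.add(new int[] {a, b - leftFieldCount});
          continue;
        }
        if (b < leftFieldCount && a >= leftFieldCount) {
          keys.add(new int[] {b, a - leftFieldCount});
          continue;
        }
      }
      remaining.add(c);
    }
    return keys;
  }
}
```

Assuming a two-field left input, so that `$1` is on the left and `$2` is field 0 of the right input, the CAST-wrapped condition from the thread yields the single key pair (1, 0) with nothing remaining; without the stripping step, this sketch would put the whole CAST into `remaining`.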






[jira] [Created] (CALCITE-3869) Stackoverflow with large OR statements

2020-03-24 Thread Stephan Pirnbaum (Jira)
Stephan Pirnbaum created CALCITE-3869:
-

 Summary: Stackoverflow with large OR statements
 Key: CALCITE-3869
 URL: https://issues.apache.org/jira/browse/CALCITE-3869
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.22.0
Reporter: Stephan Pirnbaum
 Attachments: stackoverflow.txt

As described in CALCITE-2792, large OR clauses lead to a StackOverflowError.
While that ticket was closed with the remark "Resolved in release 1.22.0", the
issue as originally stated was not (completely) resolved. To reproduce it, I
implemented the following simple test case:
{code:java}
@Test
public void testLargeOr() {
  // Builds: e."empid"=0 OR e."empid"=1 OR ... OR e."empid"=999
  String orClause = IntStream.range(0, 1000).boxed()
      .map(i -> "e.\"empid\"=" + i)
      .collect(Collectors.joining(" OR "));
  final String sql = "SELECT * FROM \"hr\".\"emps\" e WHERE " + orClause;
  CalciteAssert.model(JdbcTest.HR_MODEL)
      .query(sql)
      .runs();
}{code}

The stack overflow can be seen in the attached log (stackoverflow.txt).

Since CALCITE-2696 and CALCITE-2630 are also not fixed, this is a blocking
issue for our current use case.
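The ticket does not say which code path overflows, so the following sketch does not diagnose it. It only illustrates the general hazard, assuming (for illustration, not as a claim about Calcite's parser or planner) that a chain like `e.empid=0 OR e.empid=1 OR ...` is represented as a left-deep binary tree: a recursive walk needs one stack frame per OR, while an explicit work stack keeps the depth on the heap. `Node`, `OrFlattening` and their methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical binary expression node: a leaf predicate, or OR(left, right). */
final class Node {
  final String leaf;   // non-null only for leaves
  final Node left;
  final Node right;

  private Node(String leaf, Node left, Node right) {
    this.leaf = leaf;
    this.left = left;
    this.right = right;
  }

  static Node leaf(String predicate) {
    return new Node(predicate, null, null);
  }

  static Node or(Node left, Node right) {
    return new Node(null, left, right);
  }
}

final class OrFlattening {
  /** Builds p0 OR p1 OR ... OR p(n-1) left-associatively: depth grows linearly with n. */
  static Node leftDeepOr(int n) {
    Node acc = Node.leaf("e.empid=0");
    for (int i = 1; i < n; i++) {
      acc = Node.or(acc, Node.leaf("e.empid=" + i));
    }
    return acc;
  }

  /** Recursive walk: one stack frame per nesting level, so a long enough chain overflows. */
  static void flattenRecursive(Node node, List<String> out) {
    if (node.leaf != null) {
      out.add(node.leaf);
      return;
    }
    flattenRecursive(node.left, out);
    flattenRecursive(node.right, out);
  }

  /** Iterative walk with an explicit stack on the heap; preserves left-to-right order. */
  static List<String> flattenIterative(Node root) {
    List<String> out = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node node = stack.pop();
      if (node.leaf != null) {
        out.add(node.leaf);
      } else {
        stack.push(node.right); // pushed first, so the left subtree is visited first
        stack.push(node.left);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> disjuncts = flattenIterative(leftDeepOr(100_000));
    System.out.println(disjuncts.size()); // 100000
  }
}
```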





Re: STREAM keyword

2020-03-24 Thread Viliam Durina
So how would you do a simple stream enrichment query? That is, a query that,
for each new record in an append-only relation, joins the matching record
from a mutable relation that is valid at processing time. This use case is
common: for example, in credit card fraud detection, for each transaction
you look up the cardholder statistics, merchant statistics, product
statistics, transaction history etc. that you have at hand at the moment
the transaction is processed, and the enriched record is then fed to a
rule-based engine or to an ML inference model. You're not interested in
later updates to those enrichment tables. As I understand it, this is not
possible with the proposed semantics.
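The enrichment semantics asked for here (join each new stream record with the table contents current at processing time, and never revisit it) can be sketched with an in-memory map. This is a hypothetical model, not Calcite, Flink or Hazelcast code: `LookupJoin`, `upsert` and `onRecord` are illustrative names, the mutable relation is a `HashMap` keyed by the join key, and unmatched records are dropped as in an inner join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical processing-time lookup join: each stream record is joined with
 * whatever the mutable table holds when the record is processed. Later table
 * updates never retract or re-emit rows that were already produced.
 */
final class LookupJoin {
  private final Map<String, String> table = new HashMap<>(); // join key -> current row
  private final List<String> output = new ArrayList<>();

  /** Update to the mutable relation: affects only records processed afterwards. */
  void upsert(String key, String row) {
    table.put(key, row);
  }

  /** New record in the append-only relation: enrich it once with the current row. */
  void onRecord(String key, String record) {
    String current = table.get(key);
    if (current != null) { // inner-join semantics: unmatched records are dropped
      output.add(record + " | " + current);
    }
  }

  List<String> output() {
    return output;
  }

  public static void main(String[] args) {
    LookupJoin join = new LookupJoin();
    join.upsert("card-1", "stats-v1");
    join.onRecord("card-1", "txn-1");  // enriched with stats-v1
    join.upsert("card-1", "stats-v2"); // does not touch txn-1's output row
    join.onRecord("card-1", "txn-2");  // enriched with stats-v2
    join.onRecord("card-9", "txn-3");  // no matching row: dropped
    System.out.println(join.output()); // [txn-1 | stats-v1, txn-2 | stats-v2]
  }
}
```

The contrast with the semantics under discussion is that there, as Viliam reads them, the update to `card-1` would also produce a retraction and a new version of the `txn-1` row; in this sketch it deliberately does not.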

For example, can you refer to the `undo`, `ptime` and `ver` columns in the
query itself? We could then filter out rows where `ver > 0`:

SELECT *
FROM (
  SELECT *
  FROM order_item o
  JOIN product p USING (product_id)
  EMIT STREAM
) WHERE ver = 0;

> You can optimize for the common events, and not use very much memory. For
> the rarer events, you can pay the cost of a disk I/O.
>

With this particular query I don't think you can do this. Say `order_item`
is backed by a Kafka topic: you might not have the full history. And even
if you do, the receiver of the query results is not interested in
retractions and new versions of all the zillions of orders whose product
name was updated. The desired output should be specified by the query
itself. Moreover, the cardholder statistics, for example, could be updated
with each transaction in a feedback loop.

Viliam
