Re: Split Join condition with CAST which only widening nullability

2020-03-31 Thread Julian Hyde
Thanks Zoltan. I added a test to RelBuilderTest and I can confirm that
'_ IS TRUE' is stripped. And I agree that we cannot safely strip '_ IS
NOT FALSE'.

On Tue, Mar 31, 2020 at 5:11 AM Zoltan Haindrich  wrote:
>
>  > we should consider that 'b IS TRUE' and 'b IS NOT FALSE'
>
> I think we already do that... UnknownAs.FALSE essentially means that the 
> expression is enclosed in an "IS TRUE" - since we run filter/join condition 
> simplification in UAF
> mode; its allowed to remove "_ IS TRUE"
>
> At the same time I don't see a way how "X IS NOT FALSE" could be removed 
> because in case X is null; the expression should evaluate to true (this 
> expression translates to
> UnknownAs.TRUE mode) - which could be lost in case of a rewrite. We may 
> consider adding the UnknownAs mode to the filter/join node; but I think that 
> would just cause
> trouble; are there some other way which I've not considered?
>
> cheers,
> Zoltan
>
> On 3/30/20 7:19 PM, Julian Hyde wrote:
> > If we're going down the path, we should consider that 'b IS TRUE' and
> > 'b IS NOT FALSE' are somewhat like casts. Removing them from join
> > conditions does not affect the result of the join.
> >
> > And the same apply to filter conditions.
> >
> > I don't know whether removing casts, _ IS TRUE and _ IS NOT FALSE from
> > conditions genuinely make the world "simpler". But let's try it and
> > see.
> >
> > On Mon, Mar 30, 2020 at 8:06 AM Zoltan Haindrich  wrote:
> >>
> >> Hey Shuo!
> >>
> >> Thank you for sharing the testcase! I've seen that you were able to fix it 
> >> by calling the builder instead of copy - right now I think fixing this 
> >> thru ReduceExpressionRule
> >> might be better - as it could also fix up other cases.
> >> I've tried disabling nullability retainment for filters/join conditions - 
> >> and it seems to be working; I'll submit it under [1].
> >>
> >> Julian: I recommended to try that to provide a quick check to see if at 
> >> that point the issue could be fixed - I was confident that by disabling 
> >> "matchNullability" for
> >> "simplifyPreservingType()" will do the right thing and it doesn't add an 
> >> unnecessary cast - instead it safely removes it; however: it still added 
> >> the cast...and by doing so
> >> it didn't helped :)
> >>
> >> [1] https://issues.apache.org/jira/browse/CALCITE-3887
> >>
> >> cheers,
> >> Zoltan
> >>
> >>
> >> On 3/26/20 12:22 PM, Shuo Cheng wrote:
> >>> I think we may solve the problem from two aspects:
> >>> 1. Do not try to preserve type (nullability) of Join/Filter condition
> >>> expression when simplifying or something like pushing down.
> >>> 2. We can do some work (remove unnecessary CAST) right before create a
> >>> Join/Filter, as Julian said, something in RelBuilder could be done.
> >>> I've do some fix in above Link (remove unnecessary CAST when doing
> >>> pushDownEqualJoinConditions)
> >>>
> >>> On Thu, Mar 26, 2020 at 7:14 PM Shuo Cheng  wrote:
> >>>
>  Sorry for the late reply, I've reproduced the problem here
>  https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d4df7929
>  .
> 
>  On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde  wrote:
> 
> > It does seem to be something that RelBuilder could do. (RexSimplify 
> > can’t
> > really do it, because it doesn’t know how the expression is being used.)
> >
> > It’s also worth discovering why the CAST was added in the first place. 
> > It
> > doesn’t seem to be helpful. I think we should strive to eliminate all of
> > the slightly unhelpful things that Calcite does; those things can add up
> > and cause major inefficiencies in the planning process and/or 
> > sub-optimal
> > plans.
> >
> > Julian
> >
> >
> >> On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich  wrote:
> >>
> >> Hey,
> >>
> >> That's a great diagnosis :)
> >> I would guess that newCondition became non-nullable for some reason
> > (rexSimplify runs under RexProgramBuilder so it might be able to narrow 
> > the
> > nullability)
> >> you could try invoking simplify.simplifyPreservingType() on it to see
> > if that would help.
> >>
> >>> I know it's necessary to preserve the nullability when simplifying a
> > boolean expression in project columns, but as for condition in 
> > Filter/Calc,
> > may be we can omit the
> >>> nullability?
> >> I think that could probably work - we can't change the nullability on
> > project columns because those could be referenced (and the reference 
> > also
> > has the type) ; but for filter/join conditions we don't need to care 
> > with
> > it.
> >> It seems we already have a "matchnullability" in ReduceExpressionsRule
> > ; for FILTER/JOIN we should probably turn that off...  :)
> >>
> >> cheers,
> >> Zoltan
> >>
> >>
> >> On 3/24/20 9:15 AM, Shuo Cheng wrote:
> >>> Hi Zoltan,
> >>> I encountered th

Re: Split Join condition with CAST which only widening nullability

2020-03-31 Thread Zoltan Haindrich

> we should consider that 'b IS TRUE' and 'b IS NOT FALSE'

I think we already do that... UnknownAs.FALSE essentially means that the expression is enclosed in an "IS TRUE" - since we run filter/join condition simplification in UAF 
mode; its allowed to remove "_ IS TRUE"


At the same time I don't see a way how "X IS NOT FALSE" could be removed because in case X is null; the expression should evaluate to true (this expression translates to 
UnknownAs.TRUE mode) - which could be lost in case of a rewrite. We may consider adding the UnknownAs mode to the filter/join node; but I think that would just cause 
trouble; are there some other way which I've not considered?


cheers,
Zoltan

On 3/30/20 7:19 PM, Julian Hyde wrote:

If we're going down the path, we should consider that 'b IS TRUE' and
'b IS NOT FALSE' are somewhat like casts. Removing them from join
conditions does not affect the result of the join.

And the same apply to filter conditions.

I don't know whether removing casts, _ IS TRUE and _ IS NOT FALSE from
conditions genuinely make the world "simpler". But let's try it and
see.

On Mon, Mar 30, 2020 at 8:06 AM Zoltan Haindrich  wrote:


Hey Shuo!

Thank you for sharing the testcase! I've seen that you were able to fix it by 
calling the builder instead of copy - right now I think fixing this thru 
ReduceExpressionRule
might be better - as it could also fix up other cases.
I've tried disabling nullability retainment for filters/join conditions - and 
it seems to be working; I'll submit it under [1].

Julian: I recommended to try that to provide a quick check to see if at that point the 
issue could be fixed - I was confident that by disabling "matchNullability" for
"simplifyPreservingType()" will do the right thing and it doesn't add an 
unnecessary cast - instead it safely removes it; however: it still added the cast...and 
by doing so
it didn't helped :)

[1] https://issues.apache.org/jira/browse/CALCITE-3887

cheers,
Zoltan


On 3/26/20 12:22 PM, Shuo Cheng wrote:

I think we may solve the problem from two aspects:
1. Do not try to preserve type (nullability) of Join/Filter condition
expression when simplifying or something like pushing down.
2. We can do some work (remove unnecessary CAST) right before create a
Join/Filter, as Julian said, something in RelBuilder could be done.
I've do some fix in above Link (remove unnecessary CAST when doing
pushDownEqualJoinConditions)

On Thu, Mar 26, 2020 at 7:14 PM Shuo Cheng  wrote:


Sorry for the late reply, I've reproduced the problem here
https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d4df7929
.

On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde  wrote:


It does seem to be something that RelBuilder could do. (RexSimplify can’t
really do it, because it doesn’t know how the expression is being used.)

It’s also worth discovering why the CAST was added in the first place. It
doesn’t seem to be helpful. I think we should strive to eliminate all of
the slightly unhelpful things that Calcite does; those things can add up
and cause major inefficiencies in the planning process and/or sub-optimal
plans.

Julian



On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich  wrote:

Hey,

That's a great diagnosis :)
I would guess that newCondition became non-nullable for some reason

(rexSimplify runs under RexProgramBuilder so it might be able to narrow the
nullability)

you could try invoking simplify.simplifyPreservingType() on it to see

if that would help.



I know it's necessary to preserve the nullability when simplifying a

boolean expression in project columns, but as for condition in Filter/Calc,
may be we can omit the

nullability?

I think that could probably work - we can't change the nullability on

project columns because those could be referenced (and the reference also
has the type) ; but for filter/join conditions we don't need to care with
it.

It seems we already have a "matchnullability" in ReduceExpressionsRule

; for FILTER/JOIN we should probably turn that off...  :)


cheers,
Zoltan


On 3/24/20 9:15 AM, Shuo Cheng wrote:

Hi Zoltan,
I encountered the problem when running TPC tests, and have not

reproduced it in Calcite master.

But I figured it out how the problem is produced:
There is semi join with the condition:AND(EXPANDED_INDF1,

EXPANDED_INDF2), type of AND is BOOLEAN with nullable `true`

After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2),

type of AND is BOOLEAN with nullable `true`

After  SemiJoinProjectTransposeRule --> Join condition:

CAST(AND(INDF1, INDF2)), type of AND is BOOLEAN with nullable `false`

Just as what you suspected, It's in `SemiJoinProjectTransposeRule`

where forced type correction is added by `RexProgramBuilder#addCondition`,
which will call `RexSimplify#simplifyPreservingType` before registering an
expression.

I know it's necessary to preserve the nullability when simplifying a

boolean expression in project columns, but as for condition in Filter/Calc,
may be we can omit 

Re: Split Join condition with CAST which only widening nullability

2020-03-30 Thread Julian Hyde
If we're going down the path, we should consider that 'b IS TRUE' and
'b IS NOT FALSE' are somewhat like casts. Removing them from join
conditions does not affect the result of the join.

And the same apply to filter conditions.

I don't know whether removing casts, _ IS TRUE and _ IS NOT FALSE from
conditions genuinely make the world "simpler". But let's try it and
see.

On Mon, Mar 30, 2020 at 8:06 AM Zoltan Haindrich  wrote:
>
> Hey Shuo!
>
> Thank you for sharing the testcase! I've seen that you were able to fix it by 
> calling the builder instead of copy - right now I think fixing this thru 
> ReduceExpressionRule
> might be better - as it could also fix up other cases.
> I've tried disabling nullability retainment for filters/join conditions - and 
> it seems to be working; I'll submit it under [1].
>
> Julian: I recommended to try that to provide a quick check to see if at that 
> point the issue could be fixed - I was confident that by disabling 
> "matchNullability" for
> "simplifyPreservingType()" will do the right thing and it doesn't add an 
> unnecessary cast - instead it safely removes it; however: it still added the 
> cast...and by doing so
> it didn't helped :)
>
> [1] https://issues.apache.org/jira/browse/CALCITE-3887
>
> cheers,
> Zoltan
>
>
> On 3/26/20 12:22 PM, Shuo Cheng wrote:
> > I think we may solve the problem from two aspects:
> > 1. Do not try to preserve type (nullability) of Join/Filter condition
> > expression when simplifying or something like pushing down.
> > 2. We can do some work (remove unnecessary CAST) right before create a
> > Join/Filter, as Julian said, something in RelBuilder could be done.
> > I've do some fix in above Link (remove unnecessary CAST when doing
> > pushDownEqualJoinConditions)
> >
> > On Thu, Mar 26, 2020 at 7:14 PM Shuo Cheng  wrote:
> >
> >> Sorry for the late reply, I've reproduced the problem here
> >> https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d4df7929
> >> .
> >>
> >> On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde  wrote:
> >>
> >>> It does seem to be something that RelBuilder could do. (RexSimplify can’t
> >>> really do it, because it doesn’t know how the expression is being used.)
> >>>
> >>> It’s also worth discovering why the CAST was added in the first place. It
> >>> doesn’t seem to be helpful. I think we should strive to eliminate all of
> >>> the slightly unhelpful things that Calcite does; those things can add up
> >>> and cause major inefficiencies in the planning process and/or sub-optimal
> >>> plans.
> >>>
> >>> Julian
> >>>
> >>>
>  On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich  wrote:
> 
>  Hey,
> 
>  That's a great diagnosis :)
>  I would guess that newCondition became non-nullable for some reason
> >>> (rexSimplify runs under RexProgramBuilder so it might be able to narrow 
> >>> the
> >>> nullability)
>  you could try invoking simplify.simplifyPreservingType() on it to see
> >>> if that would help.
> 
> > I know it's necessary to preserve the nullability when simplifying a
> >>> boolean expression in project columns, but as for condition in 
> >>> Filter/Calc,
> >>> may be we can omit the
> > nullability?
>  I think that could probably work - we can't change the nullability on
> >>> project columns because those could be referenced (and the reference also
> >>> has the type) ; but for filter/join conditions we don't need to care with
> >>> it.
>  It seems we already have a "matchnullability" in ReduceExpressionsRule
> >>> ; for FILTER/JOIN we should probably turn that off...  :)
> 
>  cheers,
>  Zoltan
> 
> 
>  On 3/24/20 9:15 AM, Shuo Cheng wrote:
> > Hi Zoltan,
> > I encountered the problem when running TPC tests, and have not
> >>> reproduced it in Calcite master.
> > But I figured it out how the problem is produced:
> > There is semi join with the condition:AND(EXPANDED_INDF1,
> >>> EXPANDED_INDF2), type of AND is BOOLEAN with nullable `true`
> > After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2),
> >>> type of AND is BOOLEAN with nullable `true`
> > After  SemiJoinProjectTransposeRule --> Join condition:
> >>> CAST(AND(INDF1, INDF2)), type of AND is BOOLEAN with nullable `false`
> > Just as what you suspected, It's in `SemiJoinProjectTransposeRule`
> >>> where forced type correction is added by `RexProgramBuilder#addCondition`,
> >>> which will call `RexSimplify#simplifyPreservingType` before registering an
> >>> expression.
> > I know it's necessary to preserve the nullability when simplifying a
> >>> boolean expression in project columns, but as for condition in 
> >>> Filter/Calc,
> >>> may be we can omit the nullability?
> > Best Regards,
> > Shuo
> > On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich  >>> k...@rxd.hu>> wrote:
> > Hey Shuo!
> > I think that simplification should been made on join conditions -
> >>> I've done a quick check; an

Re: Split Join condition with CAST which only widening nullability

2020-03-30 Thread Zoltan Haindrich

Hey Shuo!

Thank you for sharing the testcase! I've seen that you were able to fix it by calling the builder instead of copy - right now I think fixing this thru ReduceExpressionRule 
might be better - as it could also fix up other cases.

I've tried disabling nullability retainment for filters/join conditions - and 
it seems to be working; I'll submit it under [1].

Julian: I recommended to try that to provide a quick check to see if at that point the issue could be fixed - I was confident that by disabling "matchNullability" for 
"simplifyPreservingType()" will do the right thing and it doesn't add an unnecessary cast - instead it safely removes it; however: it still added the cast...and by doing so 
it didn't helped :)


[1] https://issues.apache.org/jira/browse/CALCITE-3887

cheers,
Zoltan


On 3/26/20 12:22 PM, Shuo Cheng wrote:

I think we may solve the problem from two aspects:
1. Do not try to preserve type (nullability) of Join/Filter condition
expression when simplifying or something like pushing down.
2. We can do some work (remove unnecessary CAST) right before create a
Join/Filter, as Julian said, something in RelBuilder could be done.
I've do some fix in above Link (remove unnecessary CAST when doing
pushDownEqualJoinConditions)

On Thu, Mar 26, 2020 at 7:14 PM Shuo Cheng  wrote:


Sorry for the late reply, I've reproduced the problem here
https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d4df7929
.

On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde  wrote:


It does seem to be something that RelBuilder could do. (RexSimplify can’t
really do it, because it doesn’t know how the expression is being used.)

It’s also worth discovering why the CAST was added in the first place. It
doesn’t seem to be helpful. I think we should strive to eliminate all of
the slightly unhelpful things that Calcite does; those things can add up
and cause major inefficiencies in the planning process and/or sub-optimal
plans.

Julian



On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich  wrote:

Hey,

That's a great diagnosis :)
I would guess that newCondition became non-nullable for some reason

(rexSimplify runs under RexProgramBuilder so it might be able to narrow the
nullability)

you could try invoking simplify.simplifyPreservingType() on it to see

if that would help.



I know it's necessary to preserve the nullability when simplifying a

boolean expression in project columns, but as for condition in Filter/Calc,
may be we can omit the

nullability?

I think that could probably work - we can't change the nullability on

project columns because those could be referenced (and the reference also
has the type) ; but for filter/join conditions we don't need to care with
it.

It seems we already have a "matchnullability" in ReduceExpressionsRule

; for FILTER/JOIN we should probably turn that off...  :)


cheers,
Zoltan


On 3/24/20 9:15 AM, Shuo Cheng wrote:

Hi Zoltan,
I encountered the problem when running TPC tests, and have not

reproduced it in Calcite master.

But I figured it out how the problem is produced:
There is semi join with the condition:AND(EXPANDED_INDF1,

EXPANDED_INDF2), type of AND is BOOLEAN with nullable `true`

After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2),

type of AND is BOOLEAN with nullable `true`

After  SemiJoinProjectTransposeRule --> Join condition:

CAST(AND(INDF1, INDF2)), type of AND is BOOLEAN with nullable `false`

Just as what you suspected, It's in `SemiJoinProjectTransposeRule`

where forced type correction is added by `RexProgramBuilder#addCondition`,
which will call `RexSimplify#simplifyPreservingType` before registering an
expression.

I know it's necessary to preserve the nullability when simplifying a

boolean expression in project columns, but as for condition in Filter/Calc,
may be we can omit the nullability?

Best Regards,
Shuo
On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich 
k...@rxd.hu>> wrote:

Hey Shuo!
I think that simplification should been made on join conditions -

I've done a quick check; and it seems to be working for me.

I suspected that it will be either a missing call to RexSimplify

for some reason - or it is added by a forced return type correction: IIRC
there are some cases in which

the
RexNode type should retained after simplification.
Is this reproducible on current master; could you share a testcase?
cheers,
Zoltan
On 3/24/20 7:28 AM, Shuo Cheng wrote:
 > Hi, Julian, That's what we do as a workaround way. we remove

CAST which are

 > only widening nullability as what CALCITE-2695 does before

applying

 > hash-join/sort-merge-join rule, such that equiv predicate can be

split

 > out.  I'm not sure whether it's properly for Calcite to do the

'convert

 > back' job, for example, simplify the join condition when create

a Join; Or

 > maybe let other systems what use Calcite to do the "convert

back" job as an

 > optimization? What do you think?
   

Re: Split Join condition with CAST which only widening nullability

2020-03-26 Thread Shuo Cheng
I think we may solve the problem from two aspects:
1. Do not try to preserve type (nullability) of Join/Filter condition
expression when simplifying or something like pushing down.
2. We can do some work (remove unnecessary CAST) right before create a
Join/Filter, as Julian said, something in RelBuilder could be done.
I've do some fix in above Link (remove unnecessary CAST when doing
pushDownEqualJoinConditions)

On Thu, Mar 26, 2020 at 7:14 PM Shuo Cheng  wrote:

> Sorry for the late reply, I've reproduced the problem here
> https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d4df7929
> .
>
> On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde  wrote:
>
>> It does seem to be something that RelBuilder could do. (RexSimplify can’t
>> really do it, because it doesn’t know how the expression is being used.)
>>
>> It’s also worth discovering why the CAST was added in the first place. It
>> doesn’t seem to be helpful. I think we should strive to eliminate all of
>> the slightly unhelpful things that Calcite does; those things can add up
>> and cause major inefficiencies in the planning process and/or sub-optimal
>> plans.
>>
>> Julian
>>
>>
>> > On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich  wrote:
>> >
>> > Hey,
>> >
>> > That's a great diagnosis :)
>> > I would guess that newCondition became non-nullable for some reason
>> (rexSimplify runs under RexProgramBuilder so it might be able to narrow the
>> nullability)
>> > you could try invoking simplify.simplifyPreservingType() on it to see
>> if that would help.
>> >
>> > > I know it's necessary to preserve the nullability when simplifying a
>> boolean expression in project columns, but as for condition in Filter/Calc,
>> may be we can omit the
>> > > nullability?
>> > I think that could probably work - we can't change the nullability on
>> project columns because those could be referenced (and the reference also
>> has the type) ; but for filter/join conditions we don't need to care with
>> it.
>> > It seems we already have a "matchnullability" in ReduceExpressionsRule
>> ; for FILTER/JOIN we should probably turn that off...  :)
>> >
>> > cheers,
>> > Zoltan
>> >
>> >
>> > On 3/24/20 9:15 AM, Shuo Cheng wrote:
>> >> Hi Zoltan,
>> >> I encountered the problem when running TPC tests, and have not
>> reproduced it in Calcite master.
>> >> But I figured it out how the problem is produced:
>> >> There is semi join with the condition:AND(EXPANDED_INDF1,
>> EXPANDED_INDF2), type of AND is BOOLEAN with nullable `true`
>> >> After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2),
>> type of AND is BOOLEAN with nullable `true`
>> >> After  SemiJoinProjectTransposeRule --> Join condition:
>> CAST(AND(INDF1, INDF2)), type of AND is BOOLEAN with nullable `false`
>> >> Just as what you suspected, It's in `SemiJoinProjectTransposeRule`
>> where forced type correction is added by `RexProgramBuilder#addCondition`,
>> which will call `RexSimplify#simplifyPreservingType` before registering an
>> expression.
>> >> I know it's necessary to preserve the nullability when simplifying a
>> boolean expression in project columns, but as for condition in Filter/Calc,
>> may be we can omit the nullability?
>> >> Best Regards,
>> >> Shuo
>> >> On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich > k...@rxd.hu>> wrote:
>> >>Hey Shuo!
>> >>I think that simplification should been made on join conditions -
>> I've done a quick check; and it seems to be working for me.
>> >>I suspected that it will be either a missing call to RexSimplify
>> for some reason - or it is added by a forced return type correction: IIRC
>> there are some cases in which
>> >>the
>> >>RexNode type should retained after simplification.
>> >>Is this reproducible on current master; could you share a testcase?
>> >>cheers,
>> >>Zoltan
>> >>On 3/24/20 7:28 AM, Shuo Cheng wrote:
>> >> > Hi, Julian, That's what we do as a workaround way. we remove
>> CAST which are
>> >> > only widening nullability as what CALCITE-2695 does before
>> applying
>> >> > hash-join/sort-merge-join rule, such that equiv predicate can be
>> split
>> >> > out.  I'm not sure whether it's properly for Calcite to do the
>> 'convert
>> >> > back' job, for example, simplify the join condition when create
>> a Join; Or
>> >> > maybe let other systems what use Calcite to do the "convert
>> back" job as an
>> >> > optimization? What do you think?
>> >> >
>> >> > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde <
>> jhyde.apa...@gmail.com > wrote:
>> >> >
>> >> >> Or convert it back to a not-nullable BOOLEAN? The join
>> condition treats
>> >> >> UNKNOWN the same as FALSE, and besides UNKNOWN will never
>> occur, so the
>> >> >> conditions with and without the CAST are equivalent.
>> >> >>
>> >> >> Julian
>> >> >>
>> >> >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng > > wrote:
>> >> 

Re: Split Join condition with CAST which only widening nullability

2020-03-26 Thread Shuo Cheng
Sorry for the late reply, I've reproduced the problem here
https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d4df7929
.

On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde  wrote:

> It does seem to be something that RelBuilder could do. (RexSimplify can’t
> really do it, because it doesn’t know how the expression is being used.)
>
> It’s also worth discovering why the CAST was added in the first place. It
> doesn’t seem to be helpful. I think we should strive to eliminate all of
> the slightly unhelpful things that Calcite does; those things can add up
> and cause major inefficiencies in the planning process and/or sub-optimal
> plans.
>
> Julian
>
>
> > On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich  wrote:
> >
> > Hey,
> >
> > That's a great diagnosis :)
> > I would guess that newCondition became non-nullable for some reason
> (rexSimplify runs under RexProgramBuilder so it might be able to narrow the
> nullability)
> > you could try invoking simplify.simplifyPreservingType() on it to see if
> that would help.
> >
> > > I know it's necessary to preserve the nullability when simplifying a
> boolean expression in project columns, but as for condition in Filter/Calc,
> may be we can omit the
> > > nullability?
> > I think that could probably work - we can't change the nullability on
> project columns because those could be referenced (and the reference also
> has the type) ; but for filter/join conditions we don't need to care with
> it.
> > It seems we already have a "matchnullability" in ReduceExpressionsRule ;
> for FILTER/JOIN we should probably turn that off...  :)
> >
> > cheers,
> > Zoltan
> >
> >
> > On 3/24/20 9:15 AM, Shuo Cheng wrote:
> >> Hi Zoltan,
> >> I encountered the problem when running TPC tests, and have not
> reproduced it in Calcite master.
> >> But I figured it out how the problem is produced:
> >> There is semi join with the condition:AND(EXPANDED_INDF1,
> EXPANDED_INDF2), type of AND is BOOLEAN with nullable `true`
> >> After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2),
> type of AND is BOOLEAN with nullable `true`
> >> After  SemiJoinProjectTransposeRule --> Join condition: CAST(AND(INDF1,
> INDF2)), type of AND is BOOLEAN with nullable `false`
> >> Just as what you suspected, It's in `SemiJoinProjectTransposeRule`
> where forced type correction is added by `RexProgramBuilder#addCondition`,
> which will call `RexSimplify#simplifyPreservingType` before registering an
> expression.
> >> I know it's necessary to preserve the nullability when simplifying a
> boolean expression in project columns, but as for condition in Filter/Calc,
> may be we can omit the nullability?
> >> Best Regards,
> >> Shuo
> >> On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich  k...@rxd.hu>> wrote:
> >>Hey Shuo!
> >>I think that simplification should been made on join conditions -
> I've done a quick check; and it seems to be working for me.
> >>I suspected that it will be either a missing call to RexSimplify for
> some reason - or it is added by a forced return type correction: IIRC there
> are some cases in which
> >>the
> >>RexNode type should retained after simplification.
> >>Is this reproducible on current master; could you share a testcase?
> >>cheers,
> >>Zoltan
> >>On 3/24/20 7:28 AM, Shuo Cheng wrote:
> >> > Hi, Julian, That's what we do as a workaround way. we remove CAST
> which are
> >> > only widening nullability as what CALCITE-2695 does before
> applying
> >> > hash-join/sort-merge-join rule, such that equiv predicate can be
> split
> >> > out.  I'm not sure whether it's properly for Calcite to do the
> 'convert
> >> > back' job, for example, simplify the join condition when create a
> Join; Or
> >> > maybe let other systems what use Calcite to do the "convert back"
> job as an
> >> > optimization? What do you think?
> >> >
> >> > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde <
> jhyde.apa...@gmail.com > wrote:
> >> >
> >> >> Or convert it back to a not-nullable BOOLEAN? The join condition
> treats
> >> >> UNKNOWN the same as FALSE, and besides UNKNOWN will never occur,
> so the
> >> >> conditions with and without the CAST are equivalent.
> >> >>
> >> >> Julian
> >> >>
> >> >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng  > wrote:
> >> >>>
> >> >>> Hi all,
> >> >>>
> >> >>> Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1,
> $2),
> >> >>> BOOLEAN)', which cast the non-nullable BOOLEAN to nullable
> BOOLEAN,
> >> >> Calcite
> >> >>> can not split out equiv predicate, thus some join operation
> like hash
> >> >> join
> >> >>> / sort merge join may not be used. Maybe we can
> >> >>> expand RelOptUtil#splitJoinCondition to support this scenario?
> >> >>
> >> >
>
>


Re: Split Join condition with CAST which only widening nullability

2020-03-24 Thread Julian Hyde
It does seem to be something that RelBuilder could do. (RexSimplify can’t 
really do it, because it doesn’t know how the expression is being used.)

It’s also worth discovering why the CAST was added in the first place. It 
doesn’t seem to be helpful. I think we should strive to eliminate all of the 
slightly unhelpful things that Calcite does; those things can add up and cause 
major inefficiencies in the planning process and/or sub-optimal plans.

Julian


> On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich  wrote:
> 
> Hey,
> 
> That's a great diagnosis :)
> I would guess that newCondition became non-nullable for some reason 
> (rexSimplify runs under RexProgramBuilder so it might be able to narrow the 
> nullability)
> you could try invoking simplify.simplifyPreservingType() on it to see if that 
> would help.
> 
> > I know it's necessary to preserve the nullability when simplifying a 
> > boolean expression in project columns, but as for condition in Filter/Calc, 
> > may be we can omit the
> > nullability?
> I think that could probably work - we can't change the nullability on project 
> columns because those could be referenced (and the reference also has the 
> type) ; but for filter/join conditions we don't need to care with it.
> It seems we already have a "matchnullability" in ReduceExpressionsRule ; for 
> FILTER/JOIN we should probably turn that off...  :)
> 
> cheers,
> Zoltan
> 
> 
> On 3/24/20 9:15 AM, Shuo Cheng wrote:
>> Hi Zoltan,
>> I encountered the problem when running TPC tests, and have not reproduced it 
>> in Calcite master.
>> But I figured it out how the problem is produced:
>> There is semi join with the condition:AND(EXPANDED_INDF1, EXPANDED_INDF2), 
>> type of AND is BOOLEAN with nullable `true`
>> After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2), type 
>> of AND is BOOLEAN with nullable `true`
>> After  SemiJoinProjectTransposeRule --> Join condition: CAST(AND(INDF1, 
>> INDF2)), type of AND is BOOLEAN with nullable `false`
>> Just as what you suspected, It's in `SemiJoinProjectTransposeRule` where 
>> forced type correction is added by `RexProgramBuilder#addCondition`, which 
>> will call `RexSimplify#simplifyPreservingType` before registering an 
>> expression.
>> I know it's necessary to preserve the nullability when simplifying a boolean 
>> expression in project columns, but as for condition in Filter/Calc, may be 
>> we can omit the nullability?
>> Best Regards,
>> Shuo
>> On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich > > wrote:
>>Hey Shuo!
>>I think that simplification should been made on join conditions - I've 
>> done a quick check; and it seems to be working for me.
>>I suspected that it will be either a missing call to RexSimplify for some 
>> reason - or it is added by a forced return type correction: IIRC there are 
>> some cases in which
>>the
>>RexNode type should retained after simplification.
>>Is this reproducible on current master; could you share a testcase?
>>cheers,
>>Zoltan
>>On 3/24/20 7:28 AM, Shuo Cheng wrote:
>> > Hi, Julian, That's what we do as a workaround way. we remove CAST 
>> which are
>> > only widening nullability as what CALCITE-2695 does before applying
>> > hash-join/sort-merge-join rule, such that equiv predicate can be split
>> > out.  I'm not sure whether it's properly for Calcite to do the 'convert
>> > back' job, for example, simplify the join condition when create a 
>> Join; Or
>> > maybe let other systems what use Calcite to do the "convert back" job 
>> as an
>> > optimization? What do you think?
>> >
>> > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde > > wrote:
>> >
>> >> Or convert it back to a not-nullable BOOLEAN? The join condition 
>> treats
>> >> UNKNOWN the same as FALSE, and besides UNKNOWN will never occur, so 
>> the
>> >> conditions with and without the CAST are equivalent.
>> >>
>> >> Julian
>> >>
>> >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng > > wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1, $2),
>> >>> BOOLEAN)', which cast the non-nullable BOOLEAN to nullable BOOLEAN,
>> >> Calcite
>> >>> can not split out equiv predicate, thus some join operation like hash
>> >> join
>> >>> / sort merge join may not be used. Maybe we can
>> >>> expand RelOptUtil#splitJoinCondition to support this scenario?
>> >>
>> >



Re: Split Join condition with CAST which only widening nullability

2020-03-24 Thread Zoltan Haindrich

Hey,

That's a great diagnosis :)
I would guess that newCondition became non-nullable for some reason 
(rexSimplify runs under RexProgramBuilder so it might be able to narrow the 
nullability)
you could try invoking simplify.simplifyPreservingType() on it to see if that 
would help.

> I know it's necessary to preserve the nullability when simplifying a boolean 
expression in project columns, but as for condition in Filter/Calc, may be we can 
omit the
> nullability?
I think that could probably work - we can't change the nullability on project columns because those could be referenced (and the reference also has the type) ; but for 
filter/join conditions we don't need to care with it.

It seems we already have a "matchnullability" in ReduceExpressionsRule ; for 
FILTER/JOIN we should probably turn that off...  :)

cheers,
Zoltan


On 3/24/20 9:15 AM, Shuo Cheng wrote:

Hi Zoltan,

I encountered the problem when running TPC tests, and have not reproduced it in 
Calcite master.

But I figured it out how the problem is produced:

There is semi join with the condition:AND(EXPANDED_INDF1, EXPANDED_INDF2), type 
of AND is BOOLEAN with nullable `true`

After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2), type of 
AND is BOOLEAN with nullable `true`

After  SemiJoinProjectTransposeRule --> Join condition: CAST(AND(INDF1, 
INDF2)), type of AND is BOOLEAN with nullable `false`

Just as what you suspected, It's in `SemiJoinProjectTransposeRule` where forced type correction is added by `RexProgramBuilder#addCondition`, which will call 
`RexSimplify#simplifyPreservingType` before registering an expression.


I know it's necessary to preserve the nullability when simplifying a boolean expression in project columns, but as for condition in Filter/Calc, may be we can omit the 
nullability?



Best Regards,
Shuo

On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich mailto:k...@rxd.hu>> wrote:

Hey Shuo!

I think that simplification should been made on join conditions - I've done 
a quick check; and it seems to be working for me.
I suspected that it will be either a missing call to RexSimplify for some 
reason - or it is added by a forced return type correction: IIRC there are some 
cases in which
the
RexNode type should retained after simplification.
Is this reproducible on current master; could you share a testcase?

cheers,
Zoltan


On 3/24/20 7:28 AM, Shuo Cheng wrote:
 > Hi, Julian, That's what we do as a workaround way. we remove CAST which 
are
 > only widening nullability as what CALCITE-2695 does before applying
 > hash-join/sort-merge-join rule, such that equiv predicate can be split
 > out.  I'm not sure whether it's properly for Calcite to do the 'convert
 > back' job, for example, simplify the join condition when create a Join; 
Or
 > maybe let other systems what use Calcite to do the "convert back" job as 
an
 > optimization? What do you think?
 >
 > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde mailto:jhyde.apa...@gmail.com>> wrote:
 >
 >> Or convert it back to a not-nullable BOOLEAN? The join condition treats
 >> UNKNOWN the same as FALSE, and besides UNKNOWN will never occur, so the
 >> conditions with and without the CAST are equivalent.
 >>
 >> Julian
 >>
 >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng mailto:njucs...@gmail.com>> wrote:
 >>>
 >>> Hi all,
 >>>
 >>> Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1, $2),
 >>> BOOLEAN)', which cast the non-nullable BOOLEAN to nullable BOOLEAN,
 >> Calcite
 >>> can not split out equiv predicate, thus some join operation like hash
 >> join
 >>> / sort merge join may not be used. Maybe we can
 >>> expand RelOptUtil#splitJoinCondition to support this scenario?
 >>
 >



Re: Split Join condition with CAST which only widening nullability

2020-03-24 Thread Zoltan Haindrich

Hey Shuo!

I think that simplification should been made on join conditions - I've done a 
quick check; and it seems to be working for me.
I suspected that it will be either a missing call to RexSimplify for some reason - or it is added by a forced return type correction: IIRC there are some cases in which the 
RexNode type should retained after simplification.

Is this reproducible on current master; could you share a testcase?

cheers,
Zoltan


On 3/24/20 7:28 AM, Shuo Cheng wrote:

Hi, Julian, That's what we do as a workaround way. we remove CAST which are
only widening nullability as what CALCITE-2695 does before applying
hash-join/sort-merge-join rule, such that equiv predicate can be split
out.  I'm not sure whether it's properly for Calcite to do the 'convert
back' job, for example, simplify the join condition when create a Join; Or
maybe let other systems what use Calcite to do the "convert back" job as an
optimization? What do you think?

On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde  wrote:


Or convert it back to a not-nullable BOOLEAN? The join condition treats
UNKNOWN the same as FALSE, and besides UNKNOWN will never occur, so the
conditions with and without the CAST are equivalent.

Julian


On Mar 23, 2020, at 9:34 PM, Shuo Cheng  wrote:

Hi all,

Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1, $2),
BOOLEAN)', which cast the non-nullable BOOLEAN to nullable BOOLEAN,

Calcite

can not split out equiv predicate, thus some join operation like hash

join

/ sort merge join may not be used. Maybe we can
expand RelOptUtil#splitJoinCondition to support this scenario?






Re: Split Join condition with CAST which only widening nullability

2020-03-23 Thread Shuo Cheng
Hi, Julian, That's what we do as a workaround way. we remove CAST which are
only widening nullability as what CALCITE-2695 does before applying
hash-join/sort-merge-join rule, such that equiv predicate can be split
out.  I'm not sure whether it's properly for Calcite to do the 'convert
back' job, for example, simplify the join condition when create a Join; Or
maybe let other systems what use Calcite to do the "convert back" job as an
optimization? What do you think?

On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde  wrote:

> Or convert it back to a not-nullable BOOLEAN? The join condition treats
> UNKNOWN the same as FALSE, and besides UNKNOWN will never occur, so the
> conditions with and without the CAST are equivalent.
>
> Julian
>
> > On Mar 23, 2020, at 9:34 PM, Shuo Cheng  wrote:
> >
> > Hi all,
> >
> > Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1, $2),
> > BOOLEAN)', which cast the non-nullable BOOLEAN to nullable BOOLEAN,
> Calcite
> > can not split out equiv predicate, thus some join operation like hash
> join
> > / sort merge join may not be used. Maybe we can
> > expand RelOptUtil#splitJoinCondition to support this scenario?
>


Re: Split Join condition with CAST which only widening nullability

2020-03-23 Thread Julian Hyde
Or convert it back to a not-nullable BOOLEAN? The join condition treats UNKNOWN 
the same as FALSE, and besides UNKNOWN will never occur, so the conditions with 
and without the CAST are equivalent. 

Julian

> On Mar 23, 2020, at 9:34 PM, Shuo Cheng  wrote:
> 
> Hi all,
> 
> Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1, $2),
> BOOLEAN)', which cast the non-nullable BOOLEAN to nullable BOOLEAN, Calcite
> can not split out equiv predicate, thus some join operation like hash join
> / sort merge join may not be used. Maybe we can
> expand RelOptUtil#splitJoinCondition to support this scenario?


Split Join condition with CAST which only widening nullability

2020-03-23 Thread Shuo Cheng
Hi all,

Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1, $2),
BOOLEAN)', which cast the non-nullable BOOLEAN to nullable BOOLEAN, Calcite
can not split out equiv predicate, thus some join operation like hash join
/ sort merge join may not be used. Maybe we can
expand RelOptUtil#splitJoinCondition to support this scenario?