Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-10 Thread Stamatis Zampetakis
Hi Danny,

>From the initial discussion, I was leaning more towards the idea of having
a separate component (something like what is described in CALCITE-4073)
that people can use if they really need to.
>From my experience so far I can confirm that different projects  have
different needs in terms of normalization.
Putting this logic in the constructor forces people to follow a specific
pattern so it would be nice if we could avoid that.

I am not sure I understand what you mean when you say disabling it by
default. Are you talking about features that are already released or the
things currently in progress for 1.24.0?

Best,
Stamatis

On Wed, Jul 8, 2020 at 10:39 AM Danny Chan  wrote:

> In CALCITE-2450, we proposed a change to normalize the RexNode, and there
> is a discussion[1], the change is in very early  phrase and the
> normalization pattern is unstable.
>
> There is actually no common consensus about what a form (or pattern)  a
> desired normalization should be:
>
> • People may have different requests in different contexts.
> • Different downstream projects may also have different requests
>
> The problem becomes critical after CALCITE-3786 because there are more
> cases be normalized (about 50+ plan changes). In CALCITE-3786, we move the
> normalization to constructor because the digest equals and object equals
> should be equivalent for the RexCalls.
>
> The downstream project like Apache Flink would have much more cases with
> normalized plans. But actually, the normalization gains little. I think
> other downstream projects have similar situation.
>
> I would suggest to default disable the normalization until it is “stable”
> enough, at least, after we have a consensus about what is a normalized
> pattern should be, there is an issue [3] already and we can have more
> discussion based on that.
>
> Appreciate for your suggestions, thanks in advance ~
>
> [1]
> https://lists.apache.org/x/thread.html/54bf3ed733eb7e725ce3ea397334aad8f1323ead13e450b1753b1521@%3Cdev.calcite.apache.org%3E
> [2] https://issues.apache.org/jira/browse/CALCITE-2450
> [3] https://issues.apache.org/jira/browse/CALCITE-4073
>
> Best,
> Danny Chan
>


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-10 Thread Danny Chan
I mean default to not make normalization. This feature starts from release
1.22.0.

Stamatis Zampetakis 于2020年7月11日 周六上午7:04写道:

> Hi Danny,
>
> From the initial discussion, I was leaning more towards the idea of having
> a separate component (something like what is described in CALCITE-4073)
> that people can use if they really need to.
> From my experience so far I can confirm that different projects  have
> different needs in terms of normalization.
> Putting this logic in the constructor forces people to follow a specific
> pattern so it would be nice if we could avoid that.
>
> I am not sure I understand what you mean when you say disabling it by
> default. Are you talking about features that are already released or the
> things currently in progress for 1.24.0?
>
> Best,
> Stamatis
>
> On Wed, Jul 8, 2020 at 10:39 AM Danny Chan  wrote:
>
> > In CALCITE-2450, we proposed a change to normalize the RexNode, and there
> > is a discussion[1], the change is in very early  phrase and the
> > normalization pattern is unstable.
> >
> > There is actually no common consensus about what a form (or pattern)  a
> > desired normalization should be:
> >
> > • People may have different requests in different contexts.
> > • Different downstream projects may also have different requests
> >
> > The problem becomes critical after CALCITE-3786 because there are more
> > cases be normalized (about 50+ plan changes). In CALCITE-3786, we move
> the
> > normalization to constructor because the digest equals and object equals
> > should be equivalent for the RexCalls.
> >
> > The downstream project like Apache Flink would have much more cases with
> > normalized plans. But actually, the normalization gains little. I think
> > other downstream projects have similar situation.
> >
> > I would suggest to default disable the normalization until it is “stable”
> > enough, at least, after we have a consensus about what is a normalized
> > pattern should be, there is an issue [3] already and we can have more
> > discussion based on that.
> >
> > Appreciate for your suggestions, thanks in advance ~
> >
> > [1]
> >
> https://lists.apache.org/x/thread.html/54bf3ed733eb7e725ce3ea397334aad8f1323ead13e450b1753b1521@%3Cdev.calcite.apache.org%3E
> > [2] https://issues.apache.org/jira/browse/CALCITE-2450
> > [3] https://issues.apache.org/jira/browse/CALCITE-4073
> >
> > Best,
> > Danny Chan
> >
>


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-13 Thread Danny Chan
Hi, all, I’m planning to default disable the RexNode normalization in 
CALCITE-4073, if you have any objections, please let me know in 24 hours, 
thanks so much ~

Looking forward to your feedback ~

Best,
Danny Chan
在 2020年7月8日 +0800 PM4:38,Danny Chan ,写道:
> In CALCITE-2450, we proposed a change to normalize the RexNode, and there is 
> a discussion[1], the change is in very early  phrase and the normalization 
> pattern is unstable.
>
> There is actually no common consensus about what a form (or pattern)  a 
> desired normalization should be:
>
> • People may have different requests in different contexts.
> • Different downstream projects may also have different requests
>
> The problem becomes critical after CALCITE-3786 because there are more cases 
> be normalized (about 50+ plan changes). In CALCITE-3786, we move the 
> normalization to constructor because the digest equals and object equals 
> should be equivalent for the RexCalls.
>
> The downstream project like Apache Flink would have much more cases with 
> normalized plans. But actually, the normalization gains little. I think other 
> downstream projects have similar situation.
>
> I would suggest to default disable the normalization until it is “stable” 
> enough, at least, after we have a consensus about what is a normalized 
> pattern should be, there is an issue [3] already and we can have more 
> discussion based on that.
>
> Appreciate for your suggestions, thanks in advance ~
>
> [1] 
> https://lists.apache.org/x/thread.html/54bf3ed733eb7e725ce3ea397334aad8f1323ead13e450b1753b1521@%3Cdev.calcite.apache.org%3E
> [2] https://issues.apache.org/jira/browse/CALCITE-2450
> [3] https://issues.apache.org/jira/browse/CALCITE-4073
>
> Best,
> Danny Chan


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-13 Thread Vladimir Sitnikov
>Hi, all, I’m planning to default disable the RexNode normalization in
CALCITE-4073, if you have any objections, please let me know in 24 hours,
thanks so much ~

I assume it would still normalize RexNodes when building plan digest. Is it
the case?

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-13 Thread Danny Chan
Yes, it is. We can keep it as a builtin promotion.

Best,
Danny Chan
在 2020年7月13日 +0800 PM3:48,Vladimir Sitnikov ,写道:
> > Hi, all, I’m planning to default disable the RexNode normalization in
> CALCITE-4073, if you have any objections, please let me know in 24 hours,
> thanks so much ~
>
> I assume it would still normalize RexNodes when building plan digest. Is it
> the case?
>
> Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-14 Thread Haisheng Yuan
+1 to move rex node normalization out of constructor.

On a second thought, I can't help thinking, do we really need RexNode 
normalization?

There are 2 kinds of normalization in current codebase:
1. reverse the operator
$2 > $1 is reordered to  $1 < $2, so that predicate a > b and b < a can be 
reduced to a > b.
In my experience, this is very rare, unless written by human on purpose, it may 
not be worth the optimization. Perhaps some other people may find it useful in 
their specific scenario.

2. simply flipping 2 sides
$2 = $1 is normalized to $1 = $2, which can help dedup the predicates with 
different operand orders that are generated during optimization, like inferring 
predicates from Join.

However, it is very limited, because:
- it only applies to RexInputRef, not general expression. 
- It is not extensible. Customized sql operator can't benefit from this, e.g. 
geospatial operator intersect:
  boolean &&( geometry A , geometry B )

So here is another approach in my mind (only for case 2):

Add an default overridable method to SqlOperator:
boolean inputOrderSensitive() {
  return true;
}

Sql operators like EQUALS, NOT_EQUALS, AND, OR should return false to indicate 
they are not input order sensitive.

When computing the hash code of RexCall with input order insensitive sql 
operator, we can use XOR.
op0.hashCode() ^ op1.hashCode() ...

Usually XOR is not a good candidate for hash code, but here it is a very good 
scenario to use XOR, because no matter what order it is, the XOR will generate 
the same hash code, which is exactly what we want.

By doing this way, we can avoid early normalization when computing hashcode, 
because there may not be equivalent rexnodes with different input order at all. 
If the hashcode is the same, then we need to check equals() method. We don't 
need to sort them, and in some cases like the 2 operands of EQUAL are not 
RexInputRef, it may be hard to sort. We just check the 2 RexNodes have the same 
number of distinct operands and each occurred the same times. And we can limit 
the number of operands to 2, 3, 5... what ever, if we are concerned about the 
performance when sql operator like OR has thousands of operands.


Thanks,
Haisheng Yuan

On 2020/07/13 07:58:19, Danny Chan  wrote: 
> Yes, it is. We can keep it as a builtin promotion.
> 
> Best,
> Danny Chan
> 在 2020年7月13日 +0800 PM3:48,Vladimir Sitnikov ,写道:
> > > Hi, all, I’m planning to default disable the RexNode normalization in
> > CALCITE-4073, if you have any objections, please let me know in 24 hours,
> > thanks so much ~
> >
> > I assume it would still normalize RexNodes when building plan digest. Is it
> > the case?
> >
> > Vladimir
> 


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-14 Thread Danny Chan
I have extended the logic to support all the symmetrical operators(=, <> ..) 
and the binary comparison operators (>=, < ..),
not only just RexInputRef. Customized sql operator can also benefit. [1]

The way is to compare the operands with SqlKind first then fallback to their 
hashcode (eargely computed).

Add a SqlOperator interface is not that feasible because most of the operators 
share the same class (for example, SqlBinaryOperator) with different SqlKind. 
Instead, the current code has a special SqlKind to distinguish the operators we 
want to normalize which I think is a better way.

Eagerly computed hashcode for operand should not be a problem because sooner or 
later the hashcode would be computed in the digest cache of the planner and the 
RexCall already has a hashcode cache.

We actually don’t really need to care about what the operands sort algorithm 
is, only if:


• the algorithm is deterministic
• It can cover most of the cases, like the user defined function
• It does not affect the performance too much

[1]  https://github.com/apache/calcite/pull/2065

Best,
Danny Chan
在 2020年7月15日 +0800 PM12:10,dev@calcite.apache.org,写道:
>
> Sql operators like EQUALS, NOT_EQUALS, AND, OR should return false to 
> indicate they are not input order sensitive.


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Haisheng Yuan
> Customized sql operator can also benefit. [1]

I am not sure if I missed something. Can you show me how can the customized sql 
operator benefit from this?
e.g. geospatial operator intersect (it is input order insensitive):
boolean &&( geometry A , geometry B )

> Add a SqlOperator interface is not that feasible because most of the 
> operators share the same class

Really? How hard it is to create a SqlOrderInsensitiveBinaryOperator that just 
override inputOrderSensitive() method? like how SqlMonotonicBinaryOperator does.

> Eagerly computed hashcode for operand should not be a problem because sooner 
> or later the hashcode would be computed in the digest cache of the planner 
> and the RexCall already has a hashcode cache.

The thing is do we have to do all the gymnastics like normalization for input 
order insensitive operator when just computing the hash code? Is it the best we 
can do?

Currently It is limited to sql operator with 2 operands. What if I have a 
customized sql operand AND/OR that can accept more than 2 operands?

On 2020/07/15 06:34:55, Danny Chan  wrote: 
> I have extended the logic to support all the symmetrical operators(=, <> ..) 
> and the binary comparison operators (>=, < ..),
> not only just RexInputRef. Customized sql operator can also benefit. [1]
> 
> The way is to compare the operands with SqlKind first then fallback to their 
> hashcode (eargely computed).
> 
> Add a SqlOperator interface is not that feasible because most of the 
> operators share the same class (for example, SqlBinaryOperator) with 
> different SqlKind. Instead, the current code has a special SqlKind to 
> distinguish the operators we want to normalize which I think is a better way.
> 
> Eagerly computed hashcode for operand should not be a problem because sooner 
> or later the hashcode would be computed in the digest cache of the planner 
> and the RexCall already has a hashcode cache.
> 
> We actually don’t really need to care about what the operands sort algorithm 
> is, only if:
> 
> 
> • the algorithm is deterministic
> • It can cover most of the cases, like the user defined function
> • It does not affect the performance too much
> 
> [1]  https://github.com/apache/calcite/pull/2065
> 
> Best,
> Danny Chan
> 在 2020年7月15日 +0800 PM12:10,dev@calcite.apache.org,写道:
> >
> > Sql operators like EQUALS, NOT_EQUALS, AND, OR should return false to 
> > indicate they are not input order sensitive.
> 


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Vladimir Sitnikov
I agree that extensibility might be helpful, however:

1) I do not like the idea of XOR-based hash code, because it would make
($1=$1) have the same hashcode as ($2=$2) and so on.
2) "$2 > $1 is reordered to  $1 < $2, so that predicate a > b and b < a can
be reduced to a > b."
This reverting can easily happen as rule does its transformations (e.g.
swap join order and so on).
That is why ability to normalize < into > helps like it helps for $1=$2 vs
$2=$1

>when just computing the hash code?

What I do not like with the current code is it does perform
compute-intensive operations when calling equals.
Previous code (the one from CALCITE-2450) never computed the normalization
multiple times per RexNode.
It looks like now we losing that feature.

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Haisheng Yuan
> 1) I do not like the idea of XOR-based hash code, because it would make
> ($1=$1) have the same hashcode as ($2=$2) and so on.
You have a point. However, is it really a concern? How frequent will it occur? 
Especially when an operator like Join, Filter, that has the same input rel, but 
with different rex node $1=$1 vs $2=$2? Most likely they have different input. 
Even if we consider AND/OR with many these sub-expressions, OR($1=$1, $2=$2, 
$3=$3), things might be bad if we want do dedup RexNode children using 
Set, but how often will we see this in production? I haven't. 
Greenplum database has been using this strategy for many years, I haven't seen 
any performance issue that is caused by it.

> This reverting can easily happen as rule does its transformations (e.g.
> swap join order and so on).

If just swapping join order, I doubt it really helps to normalize it. The Join 
operator e.g. innerjoin(S, R) generated by join reordering is not equivalent 
with the original join innerjoin(R, S), they are in different RelSet, and with 
different input rel order, we don't have chance to compare the 2 joins, because 
the input order, rel type and hashcode are different.

> What I do not like with the current code is it does perform
> compute-intensive operations when calling equals.
I agree. I guess you mean every time we call equals, it will normalize it again 
and again. However, it is a tradeoff. Still better than normalize it in RexNode 
constructor. Sometimes, we want to specify the exact operand order, e.g. 
AND($2>10, $1 < 5). If $2>10 is much more selective, can filter out more 
tuples, it will help the query performance a lot. In that case, I don't want 
Calcite reorder it for me.

On 2020/07/15 18:06:43, Vladimir Sitnikov  wrote: 
> I agree that extensibility might be helpful, however:
> 
> 1) I do not like the idea of XOR-based hash code, because it would make
> ($1=$1) have the same hashcode as ($2=$2) and so on.
> 2) "$2 > $1 is reordered to  $1 < $2, so that predicate a > b and b < a can
> be reduced to a > b."
> This reverting can easily happen as rule does its transformations (e.g.
> swap join order and so on).
> That is why ability to normalize < into > helps like it helps for $1=$2 vs
> $2=$1
> 
> >when just computing the hash code?
> 
> What I do not like with the current code is it does perform
> compute-intensive operations when calling equals.
> Previous code (the one from CALCITE-2450) never computed the normalization
> multiple times per RexNode.
> It looks like now we losing that feature.
> 
> Vladimir
> 


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Vladimir Sitnikov
> things might be bad if we want do dedup RexNode children using
Set

I guess it is exactly the action RexSimplify does currently :-/
https://github.com/apache/calcite/blob/3fb68f6c22a7bcbc4cb1fff114bc911b1e31c4de/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1377-L1380

>The Join operator e.g. innerjoin(S, R) generated by join reordering is not
equivalent with the original join innerjoin(R, S),

I might be wrong, however, I did see "< vs >" normalization to reduce the
search space. I added the normalization performance rather than aesthetics
reasons.

Note: innerjoin(S, S) does appear in practice, and it might benefit from <
vs >

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Vladimir Sitnikov
> things might be bad if we want do dedup RexNode children using
Set

I guess the following might work better than XOR:

a) compute xor of the hashes
b) compute sum of the hashes
c) compute product of the hashes (with 0 replaced by 1 to avoid 0*n=0)
Then combine the three values somehow. For instance: (a*17+b)*17+c

That would result in a hash code that tolerates item reordering, it is
better than XOR, and it should be fast.

Unfortunately, we can't use the improved function in Pair since Map.Entry
specifies the exact computation formula :-/

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Haisheng Yuan
> a) compute xor of the hashes
> b) compute sum of the hashes
> c) compute product of the hashes (with 0 replaced by 1 to avoid 0*n=0)
> Then combine the three values somehow. For instance: (a*17+b)*17+c

That is really good one. Better than pure XOR.
I also found this in scala:
https://github.com/scala/scala/blob/2.11.x/src/library/scala/util/hashing/MurmurHash3.scala#L88


On 2020/07/15 20:16:13, Vladimir Sitnikov  wrote: 
> > things might be bad if we want do dedup RexNode children using
> Set
> 
> I guess the following might work better than XOR:
> 
> a) compute xor of the hashes
> b) compute sum of the hashes
> c) compute product of the hashes (with 0 replaced by 1 to avoid 0*n=0)
> Then combine the three values somehow. For instance: (a*17+b)*17+c
> 
> That would result in a hash code that tolerates item reordering, it is
> better than XOR, and it should be fast.
> 
> Unfortunately, we can't use the improved function in Pair since Map.Entry
> specifies the exact computation formula :-/
> 
> Vladimir
> 


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Haisheng Yuan
Guava library's approach is pretty naive:
Hashing.combineUnordered

It just adds each byte together, I don't think it is better than the one you 
proposed.

BTW, I don't understand why we need c? Isn't a*17+b good enough to avoid the 
corner case?

On 2020/07/15 20:33:55, Haisheng Yuan  wrote: 
> > a) compute xor of the hashes
> > b) compute sum of the hashes
> > c) compute product of the hashes (with 0 replaced by 1 to avoid 0*n=0)
> > Then combine the three values somehow. For instance: (a*17+b)*17+c
> 
> That is really good one. Better than pure XOR.
> I also found this in scala:
> https://github.com/scala/scala/blob/2.11.x/src/library/scala/util/hashing/MurmurHash3.scala#L88
> 
> 
> On 2020/07/15 20:16:13, Vladimir Sitnikov  
> wrote: 
> > > things might be bad if we want do dedup RexNode children using
> > Set
> > 
> > I guess the following might work better than XOR:
> > 
> > a) compute xor of the hashes
> > b) compute sum of the hashes
> > c) compute product of the hashes (with 0 replaced by 1 to avoid 0*n=0)
> > Then combine the three values somehow. For instance: (a*17+b)*17+c
> > 
> > That would result in a hash code that tolerates item reordering, it is
> > better than XOR, and it should be fast.
> > 
> > Unfortunately, we can't use the improved function in Pair since Map.Entry
> > specifies the exact computation formula :-/
> > 
> > Vladimir
> > 
> 


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Vladimir Sitnikov
>BTW, I don't understand why we need c? Isn't a*17+b good enough to avoid
the corner case?

That depends on the number of corner cases we want to avoid :)
a*17+b might be good enough.

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Haisheng Yuan
> I might be wrong, however, I did see "< vs >" normalization to reduce the
> search space. I added the normalization performance rather than aesthetics
> reasons.

Are you sure the performance gain is caused by "< vs >" normalization instead 
of "=" normalization? Can you show me the test case?

> Note: innerjoin(S, S) does appear in practice, and it might benefit from <
> vs >

JoinCommuteRule won't apply on this kind of join, so that won't be a problem.

On 2020/07/15 19:26:57, Vladimir Sitnikov  wrote: 
> > things might be bad if we want do dedup RexNode children using
> Set
> 
> I guess it is exactly the action RexSimplify does currently :-/
> https://github.com/apache/calcite/blob/3fb68f6c22a7bcbc4cb1fff114bc911b1e31c4de/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1377-L1380
> 
> >The Join operator e.g. innerjoin(S, R) generated by join reordering is not
> equivalent with the original join innerjoin(R, S),
> 
> I might be wrong, however, I did see "< vs >" normalization to reduce the
> search space. I added the normalization performance rather than aesthetics
> reasons.
> 
> Note: innerjoin(S, S) does appear in practice, and it might benefit from <
> vs >
> 
> Vladimir
> 


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Danny Chan
Well, I think I now got your idea, I agree a specific operator sub-class plus a 
special symmetric hash code is more efficient and extensible. But for the first 
version, I would only consider binary operators because I could not figure out 
how to implement an efficient equals for operator like IN.

Best,
Danny Chan
在 2020年7月16日 +0800 AM1:55,Haisheng Yuan ,写道:
> > Customized sql operator can also benefit. [1]
>
> I am not sure if I missed something. Can you show me how can the customized 
> sql operator benefit from this?
> e.g. geospatial operator intersect (it is input order insensitive):
> boolean &&( geometry A , geometry B )
>
> > Add a SqlOperator interface is not that feasible because most of the 
> > operators share the same class
>
> Really? How hard it is to create a SqlOrderInsensitiveBinaryOperator that 
> just override inputOrderSensitive() method? like how 
> SqlMonotonicBinaryOperator does.
>
> > Eagerly computed hashcode for operand should not be a problem because 
> > sooner or later the hashcode would be computed in the digest cache of the 
> > planner and the RexCall already has a hashcode cache.
>
> The thing is do we have to do all the gymnastics like normalization for input 
> order insensitive operator when just computing the hash code? Is it the best 
> we can do?
>
> Currently It is limited to sql operator with 2 operands. What if I have a 
> customized sql operand AND/OR that can accept more than 2 operands?
>
> On 2020/07/15 06:34:55, Danny Chan  wrote:
> > I have extended the logic to support all the symmetrical operators(=, <> 
> > ..) and the binary comparison operators (>=, < ..),
> > not only just RexInputRef. Customized sql operator can also benefit. [1]
> >
> > The way is to compare the operands with SqlKind first then fallback to 
> > their hashcode (eargely computed).
> >
> > Add a SqlOperator interface is not that feasible because most of the 
> > operators share the same class (for example, SqlBinaryOperator) with 
> > different SqlKind. Instead, the current code has a special SqlKind to 
> > distinguish the operators we want to normalize which I think is a better 
> > way.
> >
> > Eagerly computed hashcode for operand should not be a problem because 
> > sooner or later the hashcode would be computed in the digest cache of the 
> > planner and the RexCall already has a hashcode cache.
> >
> > We actually don’t really need to care about what the operands sort 
> > algorithm is, only if:
> >
> >
> > • the algorithm is deterministic
> > • It can cover most of the cases, like the user defined function
> > • It does not affect the performance too much
> >
> > [1]  https://github.com/apache/calcite/pull/2065
> >
> > Best,
> > Danny Chan
> > 在 2020年7月15日 +0800 PM12:10,dev@calcite.apache.org,写道:
> > >
> > > Sql operators like EQUALS, NOT_EQUALS, AND, OR should return false to 
> > > indicate they are not input order sensitive.
> >


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Haisheng Yuan
If the number of operands is greater than a threshold, like 5, or 10, we can 
just stop normalizing it.
But currently AND/OR in Calcite is always binary operator, IN is not supported 
in RexNode world.

Anyway, the discussion is orthogonal with your pull request, and it doesn't 
block the proceeding of your PR. 

On 2020/07/16 02:16:18, Danny Chan  wrote: 
> Well, I think I now got your idea, I agree a specific operator sub-class plus 
> a special symmetric hash code is more efficient and extensible. But for the 
> first version, I would only consider binary operators because I could not 
> figure out how to implement an efficient equals for operator like IN.
> 
> Best,
> Danny Chan
> 在 2020年7月16日 +0800 AM1:55,Haisheng Yuan ,写道:
> > > Customized sql operator can also benefit. [1]
> >
> > I am not sure if I missed something. Can you show me how can the customized 
> > sql operator benefit from this?
> > e.g. geospatial operator intersect (it is input order insensitive):
> > boolean &&( geometry A , geometry B )
> >
> > > Add a SqlOperator interface is not that feasible because most of the 
> > > operators share the same class
> >
> > Really? How hard it is to create a SqlOrderInsensitiveBinaryOperator that 
> > just override inputOrderSensitive() method? like how 
> > SqlMonotonicBinaryOperator does.
> >
> > > Eagerly computed hashcode for operand should not be a problem because 
> > > sooner or later the hashcode would be computed in the digest cache of the 
> > > planner and the RexCall already has a hashcode cache.
> >
> > The thing is do we have to do all the gymnastics like normalization for 
> > input order insensitive operator when just computing the hash code? Is it 
> > the best we can do?
> >
> > Currently It is limited to sql operator with 2 operands. What if I have a 
> > customized sql operand AND/OR that can accept more than 2 operands?
> >
> > On 2020/07/15 06:34:55, Danny Chan  wrote:
> > > I have extended the logic to support all the symmetrical operators(=, <> 
> > > ..) and the binary comparison operators (>=, < ..),
> > > not only just RexInputRef. Customized sql operator can also benefit. [1]
> > >
> > > The way is to compare the operands with SqlKind first then fallback to 
> > > their hashcode (eargely computed).
> > >
> > > Add a SqlOperator interface is not that feasible because most of the 
> > > operators share the same class (for example, SqlBinaryOperator) with 
> > > different SqlKind. Instead, the current code has a special SqlKind to 
> > > distinguish the operators we want to normalize which I think is a better 
> > > way.
> > >
> > > Eagerly computed hashcode for operand should not be a problem because 
> > > sooner or later the hashcode would be computed in the digest cache of the 
> > > planner and the RexCall already has a hashcode cache.
> > >
> > > We actually don’t really need to care about what the operands sort 
> > > algorithm is, only if:
> > >
> > >
> > > • the algorithm is deterministic
> > > • It can cover most of the cases, like the user defined function
> > > • It does not affect the performance too much
> > >
> > > [1]  https://github.com/apache/calcite/pull/2065
> > >
> > > Best,
> > > Danny Chan
> > > 在 2020年7月15日 +0800 PM12:10,dev@calcite.apache.org,写道:
> > > >
> > > > Sql operators like EQUALS, NOT_EQUALS, AND, OR should return false to 
> > > > indicate they are not input order sensitive.
> > >
> 


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-15 Thread Vladimir Sitnikov
>But currently AND/OR in Calcite is always binary operator

I guess we might want to add multi-arg AND in the future to address
AND(AND(AND(...))) issues.
I know IntelliJ IDEA switched to multi-arg "+" and similar representations
to reduce the complexity of certain operations.

PS. Even though we can have a nice unordered hash function, we still need
to implement `equals`, and it looks like we need to have normalization
there.
Do you suggest we should implement `equals` in such a way that it does not
need to normalize the tree?
Is there an approach?

I see currently the implementation is normalizing both expressions every
time `equals` is called.
On top of that, it s not clear how it should behave with deep normalization.

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-16 Thread Danny Chan
I would commit the PR soon because I want this into 1.24 release, we can 
continue the discussion though ~

Best,
Danny Chan
在 2020年7月16日 +0800 PM2:53,Vladimir Sitnikov ,写道:
> > But currently AND/OR in Calcite is always binary operator
>
> I guess we might want to add multi-arg AND in the future to address
> AND(AND(AND(...))) issues.
> I know IntelliJ IDEA switched to multi-arg "+" and similar representations
> to reduce the complexity of certain operations.
>
> PS. Even though we can have a nice unordered hash function, we still need
> to implement `equals`, and it looks like we need to have normalization
> there.
> Do you suggest we should implement `equals` in such a way that it does not
> need to normalize the tree?
> Is there an approach?
>
> I see currently the implementation is normalizing both expressions every
> time `equals` is called.
> On top of that, it s not clear how it should behave with deep normalization.
>
> Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-16 Thread Vladimir Sitnikov
Danny, could you please add a test case for nestes normalization (see
comment in pr) ?

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2020-07-16 Thread Danny Chan
I added and it passed.

Best,
Danny Chan
在 2020年7月16日 +0800 PM4:35,Vladimir Sitnikov ,写道:
> Danny, could you please add a test case for nestes normalization (see
> comment in pr) ?
>
> Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-08 Thread Stamatis Zampetakis
Hello,

Today I bumped into a case where digest normalization is not desirable and
brings problems to a downstream project.

To be more specific, Hive has a rule to reorder operators in a filter
condition for performance reasons [1] which becomes useless after
normalization.

The good news is that normalization can be disabled by a system property
[2] but tweaking system properties becomes a bit cumbersome in big
projects.

What do you think about making the normalizer (RexNormalize) more modular
and independent from RexCall?
If downstream projects want to use the normalizer they could pass it in the
planner (e.g., similar to how RexExecutor is plugged in).

Best,
Stamatis

[1]
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSortPredicates.java
[2]
https://github.com/apache/calcite/blob/b49693d31964657bf5058bd9387e505992cebd51/core/src/main/java/org/apache/calcite/config/CalciteSystemProperty.java#L104

On Thu, Jul 16, 2020 at 10:52 AM Danny Chan  wrote:

> I added and it passed.
>
> Best,
> Danny Chan
> 在 2020年7月16日 +0800 PM4:35,Vladimir Sitnikov  >,写道:
> > Danny, could you please add a test case for nestes normalization (see
> > comment in pr) ?
> >
> > Vladimir
>


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-08 Thread Vladimir Sitnikov
Stamatis>a case where digest normalization is not desirable and
Stamatis>brings problems to a downstream project.

It looks more like a custom normalizer to me rather than "normalization is
not desirable".
If I read the problem right, the very same idea was #1 in CALCITE-2450:
<>.

The current "per rule" and "per rel" overhead is enormous, and it makes it
doubtful to use optimizer rules for shuffling predicate order.
It might be more efficient to have a consistent normalizer, so the rels
with different expression orders are not generated at all.

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-09 Thread Stamatis Zampetakis
I think we agree on this Vladimir, although I don't want to focus too much
on the approach of Hive which I used mostly to have a concrete example but
rather on the fact that normalization should be customizable.

If downstream projects can pass their own normalizer in the planner or
possibly in RelBuilder then I guess we can cover pretty much every use
case. WDYT?

Best,
Stamatis

On Tue, Mar 9, 2021 at 8:16 AM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Stamatis>a case where digest normalization is not desirable and
> Stamatis>brings problems to a downstream project.
>
> It looks more like a custom normalizer to me rather than "normalization is
> not desirable".
> If I read the problem right, the very same idea was #1 in CALCITE-2450:
> <>.
>
> The current "per rule" and "per rel" overhead is enormous, and it makes it
> doubtful to use optimizer rules for shuffling predicate order.
> It might be more efficient to have a consistent normalizer, so the rels
> with different expression orders are not generated at all.
>
> Vladimir
>


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-09 Thread Vladimir Sitnikov
Stamatis>If downstream projects can pass their own normalizer in the
planner or
Stamatis>possibly in RelBuilder then I guess we can cover pretty much every
use
Stamatis>case.

This is my understanding as well.

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-09 Thread Julian Hyde
> normalization should be customizable.

I'll make the counter-argument. If people can't agree what should be
the "right" normalization, then it isn't normalization, it's
optimization.

A good design doesn't need 100 ways to customize. It needs just a few
powerful ones. We have rewrite rules, and we have simplifications
performed by RelBuilder (most of which can be turned off in the
RelBuilder config).

I can see two solutions, each of which are better than making
normalization more customizable:
 * If Hive doesn't want its ANDs re-ordered, introduce a new AND
operator whose order is fixed. (Or use CASE: "x AND y AND z" can be
rewritten to something like "CASE WHEN NOT x THEN FALSE WHEN NOT y
THEN FALSE WHEN NOT z THEN FALSE ELSE TRUE END".)
 * Make normalization less agressive

Julian

On Tue, Mar 9, 2021 at 1:20 AM Vladimir Sitnikov
 wrote:
>
> Stamatis>If downstream projects can pass their own normalizer in the
> planner or
> Stamatis>possibly in RelBuilder then I guess we can cover pretty much every
> use
> Stamatis>case.
>
> This is my understanding as well.
>
> Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-10 Thread Vladimir Sitnikov
Julian> If Hive doesn't want its ANDs re-ordered, i

I'm afraid that is a non-starter for Hive use case.
Replacing all ANDs with CASE would defeat simplified and it would defeat
lots of rules.

Julian> * Make normalization less aggressive

It still looks like a customization.

There's one more option: admit that "AND operand shuffling" is a peephole
optimization,
and put the shuffling into a separate Hep phase.
Then Calcite's normalization does not hurt, and the use case does not
justify the need for a custom normalizer.

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-11 Thread Stamatis Zampetakis
The shuffling is already part of a Hep phase [1] although the same planner
is reused to take advantage of metadata caching etc.

If people feel otherwise then we don't really need to have something super
customisable but just the option to use it or not in a more friendly way
than a system property.

As others mentioned, RelBuilder already has some optimizations that can be
turned on/off so it seems like a good place to put the current
normalization logic.

We could extract the current normalization from RexCall#equals, etc., and
put it in the most common methods of the RelBuilder. For example:
RelBuilder#project
RelBuilder#filter
RelBuilder#join
+ variants

Best,
Stamatis

[1]
https://github.com/apache/hive/blob/6e8936fbb725450250271552c683f7b76bc38d93/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L2492

On Wed, Mar 10, 2021 at 11:19 AM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Julian> If Hive doesn't want its ANDs re-ordered, i
>
> I'm afraid that is a non-starter for Hive use case.
> Replacing all ANDs with CASE would defeat simplified and it would defeat
> lots of rules.
>
> Julian> * Make normalization less aggressive
>
> It still looks like a customization.
>
> There's one more option: admit that "AND operand shuffling" is a peephole
> optimization,
> and put the shuffling into a separate Hep phase.
> Then Calcite's normalization does not hurt, and the use case does not
> justify the need for a custom normalizer.
>
> Vladimir
>


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-11 Thread Vladimir Sitnikov
Stamatis>just the option to use it or not in a more friendly way
Stamatis>than a system property.

As far as I remember, the key issue here is that new RexBuilder(...) is a
quite common pattern,
and what you suggest looks like "everyone would have to provide extra
argument when creating RexBuilder".

On top of that, there are use cases like "new RexCall(...)" in the static
context (see org.apache.calcite.rex.RexUtil#not).

Making the uses customizable adds significant overhead with doubtful gains.

I have not explored the route though, so there might be solutions.
For instance, it might work if we have an in-core dependency injection that
would hide the complexity
when coding :core, however, I don't think we could expose DI to Calcite
users.

Vladimir


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-11 Thread Vladimir Ozerov
in our practice, we also had some problems with normalization. First, we
observed problems with the unwanted (and sometimes
incorrect) simplification of expressions with CASTs and literals which came
from RexSimplify. I couldn't find an easy way to disable that behavior.
Note, that RexSimplify may also be considered a "normalization". Second, we
implemented a way to avoid Project when doing join reordering but had some
issues with operator signatures due to lack of automatic normalization for
expressions for permuted inputs. These two cases demonstrate two opposite
views: sometimes you want a specific normalization to happen automatically,
but sometimes you want to disable it.

Perhaps an alternative approach could be to unify all simplification and
normalization logic and split it into configurable rules. Then, we may add
these rules as a separate rule set to the planner, which would be invoked
heuristically every time an operator with expressions is registered in
MEMO. In this case, a user would not need to bother about RexNode
constructors. To clarify, under "rules" I do not mean heavy-weight rules
similar to normal rules. Instead, it might be simple pattern+method pairs,
that could even be compiled into a static program using Janino. This
approach could be very flexible and convenient: a single place in the code
where all rewrite happens, complete control of the optimization rules,
modular rules instead of monolithic code (like in RexSimplify). The obvious
downside - it would require more time to implement than other proposed
approaches.

What do you think about that?

Regards,
Vladimir.

чт, 11 мар. 2021 г. в 13:33, Vladimir Sitnikov :

> Stamatis>just the option to use it or not in a more friendly way
> Stamatis>than a system property.
>
> As far as I remember, the key issue here is that new RexBuilder(...) is a
> quite common pattern,
> and what you suggest looks like "everyone would have to provide extra
> argument when creating RexBuilder".
>
> On top of that, there are use cases like "new RexCall(...)" in the static
> context (see org.apache.calcite.rex.RexUtil#not).
>
> Making the uses customizable adds significant overhead with doubtful gains.
>
> I have not explored the route though, so there might be solutions.
> For instance, it might work if we have an in-core dependency injection that
> would hide the complexity
> when coding :core, however, I don't think we could expose DI to Calcite
> users.
>
> Vladimir
>


Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-11 Thread Julian Hyde
Without simplifications, many trivial RelNodes would be produced. It is 
beneficial to have those in RelBuilder; if they were in rules, the trivial 
RelNodes (and equivalence sets) would still be present, increasing the size of 
the search space.

I want to draw a distinction between simplification and normalization. A 
normalized form is relied upon throughout the system. Suppose for example, that 
we always normalize ‘RexLiteral = RexInputRef’ to ‘RexInputRef = RexLiteral’. 
If a rule encountered the latter case, it would not be a bug if the rule failed 
with, say, a ClassCastException.

So, I disagree with Vladimir that 'RexSimplify may also be considered a 
“normalization”’. If simplification is turned off, each rule must be able to 
deal with the unsimplified expressions.

Also, the very idea of normalizations being optional, enabled by system 
properties or other config, is rather disturbing, because the rules probably 
don’t know that the normalization has been turned off.

The only place for normalization, in my opinion, is explicitly, in a particular 
planner phase. For example, pulling up all filters before attempting to match 
materialized views.

Julian

> On Mar 11, 2021, at 10:37 AM, Vladimir Ozerov  wrote:
> 
> in our practice, we also had some problems with normalization. First, we
> observed problems with the unwanted (and sometimes
> incorrect) simplification of expressions with CASTs and literals which came
> from RexSimplify. I couldn't find an easy way to disable that behavior.
> Note, that RexSimplify may also be considered a "normalization". Second, we
> implemented a way to avoid Project when doing join reordering but had some
> issues with operator signatures due to lack of automatic normalization for
> expressions for permuted inputs. These two cases demonstrate two opposite
> views: sometimes you want a specific normalization to happen automatically,
> but sometimes you want to disable it.
> 
> Perhaps an alternative approach could be to unify all simplification and
> normalization logic and split it into configurable rules. Then, we may add
> these rules as a separate rule set to the planner, which would be invoked
> heuristically every time an operator with expressions is registered in
> MEMO. In this case, a user would not need to bother about RexNode
> constructors. To clarify, under "rules" I do not mean heavy-weight rules
> similar to normal rules. Instead, it might be simple pattern+method pairs,
> that could even be compiled into a static program using Janino. This
> approach could be very flexible and convenient: a single place in the code
> where all rewrite happens, complete control of the optimization rules,
> modular rules instead of monolithic code (like in RexSimplify). The obvious
> downside - it would require more time to implement than other proposed
> approaches.
> 
> What do you think about that?
> 
> Regards,
> Vladimir.
> 
> чт, 11 мар. 2021 г. в 13:33, Vladimir Sitnikov > :
> 
>> Stamatis>just the option to use it or not in a more friendly way
>> Stamatis>than a system property.
>> 
>> As far as I remember, the key issue here is that new RexBuilder(...) is a
>> quite common pattern,
>> and what you suggest looks like "everyone would have to provide extra
>> argument when creating RexBuilder".
>> 
>> On top of that, there are use cases like "new RexCall(...)" in the static
>> context (see org.apache.calcite.rex.RexUtil#not).
>> 
>> Making the uses customizable adds significant overhead with doubtful gains.
>> 
>> I have not explored the route though, so there might be solutions.
>> For instance, it might work if we have an in-core dependency injection that
>> would hide the complexity
>> when coding :core, however, I don't think we could expose DI to Calcite
>> users.
>> 
>> Vladimir
>> 



Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-13 Thread Vladimir Ozerov
Hi Julian,

I agree that in your example normalization may have some different concerns
comparing to simplification. However, both normalization and simplification
sometimes address similar problems either. For example, the simplification
may decrease the search space, but so does the normalization. E.g.
normalized reordering of operands in a join condition may allow for the
merge of equivalent nodes that otherwise would be considered
non-equivalent. Do any of the currently implemented rules depend on some
normalized representation?

Also, as many rules (such as join reorder rules) generate filters, I would
argue that moving the normalization to a separate phase might cause the
unnecessary expansion of the search space.

The idea I expressed above is inspired by CockroachDB (again :-)). In
CockroachDB, expressions are part of the MEMO and treated similarly to
relational operators, which allows for the unified rule infrastructure for
both operators and expressions. Expressions are created using a
context-aware builder, which knows the set of active normalization rules.
Whenever a builder is to create a new expression (not necessarily
the top-level), the normalization rules are invoked in a heuristic manner.
The code generation is used to build the heuristic rule executor. Both
normalization and simplification (in our terms) rules are invoked here. For
example, see [1] (normalization) and [2] (simplification). Finally, the
expression is registered in MEMO. As a result, every expression ever
produced is always in a normalized/simplified form.

I am not saying that we should follow this approach. But IMO (1) unified
handling of simplification and normalization through rules and (2) a single
entry point for all normalization (builder) are interesting design
decisions, as they offer both flexibility and convenience.

Regards,
Vladimir.

[1]
https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/scalar.opt#L8
[2]
https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/bool.opt#L30

пт, 12 мар. 2021 г. в 07:15, Julian Hyde :

> Without simplifications, many trivial RelNodes would be produced. It is
> beneficial to have those in RelBuilder; if they were in rules, the trivial
> RelNodes (and equivalence sets) would still be present, increasing the size
> of the search space.
>
> I want to draw a distinction between simplification and normalization. A
> normalized form is relied upon throughout the system. Suppose for example,
> that we always normalize ‘RexLiteral = RexInputRef’ to ‘RexInputRef =
> RexLiteral’. If a rule encountered the latter case, it would not be a bug
> if the rule failed with, say, a ClassCastException.
>
> So, I disagree with Vladimir that 'RexSimplify may also be considered a
> “normalization”’. If simplification is turned off, each rule must be able
> to deal with the unsimplified expressions.
>
> Also, the very idea of normalizations being optional, enabled by system
> properties or other config, is rather disturbing, because the rules
> probably don’t know that the normalization has been turned off.
>
> The only place for normalization, in my opinion, is explicitly, in a
> particular planner phase. For example, pulling up all filters before
> attempting to match materialized views.
>
> Julian
>
> > On Mar 11, 2021, at 10:37 AM, Vladimir Ozerov 
> wrote:
> >
> > in our practice, we also had some problems with normalization. First, we
> > observed problems with the unwanted (and sometimes
> > incorrect) simplification of expressions with CASTs and literals which
> came
> > from RexSimplify. I couldn't find an easy way to disable that behavior.
> > Note, that RexSimplify may also be considered a "normalization". Second,
> we
> > implemented a way to avoid Project when doing join reordering but had
> some
> > issues with operator signatures due to lack of automatic normalization
> for
> > expressions for permuted inputs. These two cases demonstrate two opposite
> > views: sometimes you want a specific normalization to happen
> automatically,
> > but sometimes you want to disable it.
> >
> > Perhaps an alternative approach could be to unify all simplification and
> > normalization logic and split it into configurable rules. Then, we may
> add
> > these rules as a separate rule set to the planner, which would be invoked
> > heuristically every time an operator with expressions is registered in
> > MEMO. In this case, a user would not need to bother about RexNode
> > constructors. To clarify, under "rules" I do not mean heavy-weight rules
> > similar to normal rules. Instead, it might be simple pattern+method
> pairs,
> > that could even be compiled into a static program using Janino. This
> > approach could be very flexible and convenient: a single place in the
> code
> > where all rewrite happens, complete control of the optimization rules,
> > modular rules instead of monolithic code (like in RexSimplify). The
> obvious
> > downside - it woul

Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-03-19 Thread Stamatis Zampetakis
Just a quick note so that we are all on the same page.
Currently, the so-called normalization in Calcite does not really perform
any re-ordering of the operands; everything happens inside the
implementation of RexCall#equals.
For example, we are saying that $1 = 'A' is equals to 'A' = $1 by
reordering the operands when equals is called.
At the moment we cannot rely on the fact that all literals are always on
the right (or left) since no real reordering happens.

The existing system property controls how equals should behave. I would
prefer if equals didn't rely on a system property and this kind of
reordering happened elsewhere.
To that end, I agree with Vladimir O., that simplifier seems like a good
place to accommodate that logic, at least better than the RelBuilder that I
was thinking initially.
I did a quick prototype to get an idea of what happens if we went that
direction. As expected, it is not easy to cover everything just using the
simplifier as we are using it today.
It may need significant effort to get to the same point that we are today
where we reorder operands inside equals.

Summing up, I would like the idea of having simplification and
normalization controlled through rules or other means (and not hardcoded in
various places) so mostly agree with points (1) and (2) outlined by
Vladimir O.

Best,
Stamatis

On Sat, Mar 13, 2021 at 6:38 PM Vladimir Ozerov  wrote:

> Hi Julian,
>
> I agree that in your example normalization may have some different concerns
> comparing to simplification. However, both normalization and simplification
> sometimes address similar problems either. For example, the simplification
> may decrease the search space, but so does the normalization. E.g.
> normalized reordering of operands in a join condition may allow for the
> merge of equivalent nodes that otherwise would be considered
> non-equivalent. Do any of the currently implemented rules depend on some
> normalized representation?
>
> Also, as many rules (such as join reorder rules) generate filters, I would
> argue that moving the normalization to a separate phase might cause the
> unnecessary expansion of the search space.
>
> The idea I expressed above is inspired by CockroachDB (again :-)). In
> CockroachDB, expressions are part of the MEMO and treated similarly to
> relational operators, which allows for the unified rule infrastructure for
> both operators and expressions. Expressions are created using a
> context-aware builder, which knows the set of active normalization rules.
> Whenever a builder is to create a new expression (not necessarily
> the top-level), the normalization rules are invoked in a heuristic manner.
> The code generation is used to build the heuristic rule executor. Both
> normalization and simplification (in our terms) rules are invoked here. For
> example, see [1] (normalization) and [2] (simplification). Finally, the
> expression is registered in MEMO. As a result, every expression ever
> produced is always in a normalized/simplified form.
>
> I am not saying that we should follow this approach. But IMO (1) unified
> handling of simplification and normalization through rules and (2) a single
> entry point for all normalization (builder) are interesting design
> decisions, as they offer both flexibility and convenience.
>
> Regards,
> Vladimir.
>
> [1]
>
> https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/scalar.opt#L8
> [2]
>
> https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/bool.opt#L30
>
> пт, 12 мар. 2021 г. в 07:15, Julian Hyde :
>
> > Without simplifications, many trivial RelNodes would be produced. It is
> > beneficial to have those in RelBuilder; if they were in rules, the
> trivial
> > RelNodes (and equivalence sets) would still be present, increasing the
> size
> > of the search space.
> >
> > I want to draw a distinction between simplification and normalization. A
> > normalized form is relied upon throughout the system. Suppose for
> example,
> > that we always normalize ‘RexLiteral = RexInputRef’ to ‘RexInputRef =
> > RexLiteral’. If a rule encountered the latter case, it would not be a bug
> > if the rule failed with, say, a ClassCastException.
> >
> > So, I disagree with Vladimir that 'RexSimplify may also be considered a
> > “normalization”’. If simplification is turned off, each rule must be able
> > to deal with the unsimplified expressions.
> >
> > Also, the very idea of normalizations being optional, enabled by system
> > properties or other config, is rather disturbing, because the rules
> > probably don’t know that the normalization has been turned off.
> >
> > The only place for normalization, in my opinion, is explicitly, in a
> > particular planner phase. For example, pulling up all filters before
> > attempting to match materialized views.
> >
> > Julian
> >
> > > On Mar 11, 2021, at 10:37 AM, Vladimir Ozerov 
> > wrote:
> > >
> > > in our practice, we also had some problems with normalization

Re: [DISCUSS] Default disable the RexNode normalization(or operands reorder)

2021-04-15 Thread Julian Hyde
I have logged https://issues.apache.org/jira/browse/CALCITE-4559,
"Create 'interface RexRule', a modular rewrite for row-expressions".
Abstracting RexNode rewrites as objects would be a major step toward
achieving the goals in this thread.

Now is a great chance to give feedback on this design. The APIs for
registering rules and applying rules will be expensive to change
later.

On Sat, Mar 13, 2021 at 9:37 AM Vladimir Ozerov  wrote:
>
> Hi Julian,
>
> I agree that in your example normalization may have some different concerns
> comparing to simplification. However, both normalization and simplification
> sometimes address similar problems either. For example, the simplification
> may decrease the search space, but so does the normalization. E.g.
> normalized reordering of operands in a join condition may allow for the
> merge of equivalent nodes that otherwise would be considered
> non-equivalent. Do any of the currently implemented rules depend on some
> normalized representation?
>
> Also, as many rules (such as join reorder rules) generate filters, I would
> argue that moving the normalization to a separate phase might cause the
> unnecessary expansion of the search space.
>
> The idea I expressed above is inspired by CockroachDB (again :-)). In
> CockroachDB, expressions are part of the MEMO and treated similarly to
> relational operators, which allows for the unified rule infrastructure for
> both operators and expressions. Expressions are created using a
> context-aware builder, which knows the set of active normalization rules.
> Whenever a builder is to create a new expression (not necessarily
> the top-level), the normalization rules are invoked in a heuristic manner.
> The code generation is used to build the heuristic rule executor. Both
> normalization and simplification (in our terms) rules are invoked here. For
> example, see [1] (normalization) and [2] (simplification). Finally, the
> expression is registered in MEMO. As a result, every expression ever
> produced is always in a normalized/simplified form.
>
> I am not saying that we should follow this approach. But IMO (1) unified
> handling of simplification and normalization through rules and (2) a single
> entry point for all normalization (builder) are interesting design
> decisions, as they offer both flexibility and convenience.
>
> Regards,
> Vladimir.
>
> [1]
> https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/scalar.opt#L8
> [2]
> https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/bool.opt#L30
>
> пт, 12 мар. 2021 г. в 07:15, Julian Hyde :
>
> > Without simplifications, many trivial RelNodes would be produced. It is
> > beneficial to have those in RelBuilder; if they were in rules, the trivial
> > RelNodes (and equivalence sets) would still be present, increasing the size
> > of the search space.
> >
> > I want to draw a distinction between simplification and normalization. A
> > normalized form is relied upon throughout the system. Suppose for example,
> > that we always normalize ‘RexLiteral = RexInputRef’ to ‘RexInputRef =
> > RexLiteral’. If a rule encountered the latter case, it would not be a bug
> > if the rule failed with, say, a ClassCastException.
> >
> > So, I disagree with Vladimir that 'RexSimplify may also be considered a
> > “normalization”’. If simplification is turned off, each rule must be able
> > to deal with the unsimplified expressions.
> >
> > Also, the very idea of normalizations being optional, enabled by system
> > properties or other config, is rather disturbing, because the rules
> > probably don’t know that the normalization has been turned off.
> >
> > The only place for normalization, in my opinion, is explicitly, in a
> > particular planner phase. For example, pulling up all filters before
> > attempting to match materialized views.
> >
> > Julian
> >
> > > On Mar 11, 2021, at 10:37 AM, Vladimir Ozerov 
> > wrote:
> > >
> > > in our practice, we also had some problems with normalization. First, we
> > > observed problems with the unwanted (and sometimes
> > > incorrect) simplification of expressions with CASTs and literals which
> > came
> > > from RexSimplify. I couldn't find an easy way to disable that behavior.
> > > Note, that RexSimplify may also be considered a "normalization". Second,
> > we
> > > implemented a way to avoid Project when doing join reordering but had
> > some
> > > issues with operator signatures due to lack of automatic normalization
> > for
> > > expressions for permuted inputs. These two cases demonstrate two opposite
> > > views: sometimes you want a specific normalization to happen
> > automatically,
> > > but sometimes you want to disable it.
> > >
> > > Perhaps an alternative approach could be to unify all simplification and
> > > normalization logic and split it into configurable rules. Then, we may
> > add
> > > these rules as a separate rule set to the planner, which would be invoked
> > > heuristically ev