Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are different
> joins, however, they are equivalent.

How is the alternative generated? I would rather find out how to stop generating this alternative than reorder predicates.

- Haisheng

--
From: Vladimir Sitnikov
Date: 2019-12-30 15:04:25
To: Apache Calcite dev list
Subject: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Danny>How many cases are there in production? This example itself seems very marginal. I’m not against it, but I’m suspicious about the value of the feature.

It improves JdbcTest#testJoinManyWay 2 times or so.

master. JdbcTest#testJoinManyWay: 5.8sec
https://travis-ci.org/apache/calcite/jobs/630602718#L1646

with normalization fix: JdbcTest#testJoinManyWay: 3.1sec
https://travis-ci.org/apache/calcite/jobs/630719276#L1432

The fix is vital for the proper support of EnumerableMergeJoin as well.
If EnumerableMergeJoin is activated, then JdbcTest#testJoinManyWay can't complete within 5 minutes (see PR 1702)

Vladimir
Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
Danny>How many cases are there in production? This example itself seems very marginal. I’m not against it, but I’m suspicious about the value of the feature.

It improves JdbcTest#testJoinManyWay 2 times or so.

master. JdbcTest#testJoinManyWay: 5.8sec
https://travis-ci.org/apache/calcite/jobs/630602718#L1646

with normalization fix: JdbcTest#testJoinManyWay: 3.1sec
https://travis-ci.org/apache/calcite/jobs/630719276#L1432

The fix is vital for the proper support of EnumerableMergeJoin as well.
If EnumerableMergeJoin is activated, then JdbcTest#testJoinManyWay can't complete within 5 minutes (see PR 1702)

Vladimir
Some question about lattices
Hi, community

I have been investigating lattices recently and ran into some problems:

1. Lattice requires the estimated number of rows of cuboids (tiles) for optimization. I found the methods `TileSuggester.StatisticsProviderImpl # getRowCount` and `LatticeStatisticProvider # cardinality`, but as far as I can tell they are never actually used. Can anyone tell me where in the code I should dig?
2. Are there any designs on how to automatically recommend lattices (how to make the initial recommendation, how to adjust the existing lattices based on the query frequency / existing lattices)? I cannot find them in JIRA or in the docs (https://calcite.apache.org/docs/lattice.html).

Looking forward to hearing from you.

--
Regards!
Aron Tao
[jira] [Created] (CALCITE-3649) Hints should be propagated correctly in planner rules if original node is transformed to different kind
Danny Chen created CALCITE-3649:
---

Summary: Hints should be propagated correctly in planner rules if original node is transformed to different kind
Key: CALCITE-3649
URL: https://issues.apache.org/jira/browse/CALCITE-3649
Project: Calcite
Issue Type: Sub-task
Components: core
Affects Versions: 1.21.0
Reporter: Danny Chen
Assignee: Danny Chen
Fix For: 1.22.0

If the AGG was transformed to PROJECT + AGG, the hints of the original AGG should be propagated to the new AGG node. In the current implementation, the hints would be lost.

The perfect solution is to identify the replaced sub-tree and, for this tree, check which node is the right one to attach the hints to. But for this patch, we only consider the newly transformed node with the pattern PROJECT + "node of the same kind as the original", which solves most of the cases.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (CALCITE-3648) MySQL DECOMPRESS function support
Ritesh created CALCITE-3648:
---

Summary: MySQL DECOMPRESS function support
Key: CALCITE-3648
URL: https://issues.apache.org/jira/browse/CALCITE-3648
Project: Calcite
Issue Type: Sub-task
Reporter: Ritesh
Assignee: Ritesh
[jira] [Created] (CALCITE-3647) MySQL COMPRESS function support
Ritesh created CALCITE-3647:
---

Summary: MySQL COMPRESS function support
Key: CALCITE-3647
URL: https://issues.apache.org/jira/browse/CALCITE-3647
Project: Calcite
Issue Type: Sub-task
Reporter: Ritesh
Assignee: Ritesh
[jira] [Created] (CALCITE-3646) MySQL compression functions
Ritesh created CALCITE-3646:
---

Summary: MySQL compression functions
Key: CALCITE-3646
URL: https://issues.apache.org/jira/browse/CALCITE-3646
Project: Calcite
Issue Type: New Feature
Reporter: Ritesh
Assignee: Ritesh
[jira] [Created] (CALCITE-3645) Add columnMappings in digest and explainTerms methods.
xzh_dz created CALCITE-3645:
---

Summary: Add columnMappings in digest and explainTerms methods.
Key: CALCITE-3645
URL: https://issues.apache.org/jira/browse/CALCITE-3645
Project: Calcite
Issue Type: Wish
Reporter: xzh_dz
Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
> We do not normalize RexNodes, thus it results in excessive planning time,
> especially when the planner is trying to reorder joins.
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
> different joins, however, they are equivalent.

How many cases are there in production? This example itself seems very marginal. I’m not against it, but I’m suspicious about the value of the feature.

> It turned out "b" (sort operands in computeDigest) is easier to implement.
> I've filed a PR: https://github.com/apache/calcite/pull/1703

I’m strongly -1 on this approach, because it breaks the plan tests, where almost all of the changes are meaningless.

Best,
Danny Chan

On 2019-12-30 at 3:09 AM +0800, dev@calcite.apache.org wrote:
>
> We do not normalize RexNodes, thus it results in excessive planning time,
> especially when the planner is trying to reorder joins.
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
> different joins, however, they are equivalent.
[jira] [Created] (CALCITE-3644) Calc on the Intersect in target is not being matched
xzh_dz created CALCITE-3644:
---

Summary: Calc on the Intersect in target is not being matched
Key: CALCITE-3644
URL: https://issues.apache.org/jira/browse/CALCITE-3644
Project: Calcite
Issue Type: Wish
Reporter: xzh_dz

{code:java}
@Test public void testIntersectToCalcOnIntersect() {
  final String mv = ""
      + "select \"deptno\",\"name\" from \"emps\"\n"
      + "intersect all\n"
      + "select \"deptno\",\"name\" from \"depts\"";
  String mv1 = "select \"name\", \"deptno\" from (" + mv + ")";
  final String query = ""
      + "select \"name\",\"deptno\" from \"depts\"\n"
      + "intersect all\n"
      + "select \"name\",\"deptno\" from \"emps\"";
  checkMaterialize(mv1, query, true);
}
{code}

error:

{code:java}
java.lang.AssertionError:
Expected: a string containing "EnumerableTableScan(table=[[hr, m0]])"
     but: was "PLAN=EnumerableIntersect(all=[true])\n EnumerableCalc(expr#0..3=[{inputs}], name=[$t1], deptno=[$t0])\n EnumerableTableScan(table=[[hr, depts]])\n EnumerableCalc(expr#0..4=[{inputs}], name=[$t2], deptno=[$t1])\n EnumerableTableScan(table=[[hr, emps]])\n\n"
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
  at org.apache.calcite.test.CalciteAssert.lambda$checkResultContains$7(CalciteAssert.java:429)
  at org.apache.calcite.test.CalciteAssert.assertQuery(CalciteAssert.java:544)
  at org.apache.calcite.test.CalciteAssert$AssertQuery.lambda$returns$1(CalciteAssert.java:1514)
{code}
Re: Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
>> =(CAST(PREV(UP.$0, 0)):INTEGER NOT NULL, 100)

I don't see the value of reordering this kind of expression. In GPDB, a variable vs constant comparison is reordered for input refs only, without additional function calls, like $1=5, because it is transformed into range constraints for rex simplification. Perhaps Calcite doesn't need this.

>> What do you think re "$n.field = 42" where $n.field is a dot operator I'm
>> not fond of adding complicated checks there, however, I think I can move the
>> literal to the right if the other side is a non-literal.

Sounds good. I am also fine with keeping it unchanged.

>> "sort by numeric value" for cases like $1=$8, so that semantics would be
>> more or less automatic.

I think this is good enough for Calcite, and it has the same effect as what I proposed, because the operand with the smaller index always comes from the left relation if the two come from different relations.

- Haisheng

--
From: Vladimir Sitnikov
Date: 2019-12-30 04:33:11
To: Apache Calcite dev list
Subject: Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Just in case, my motivation for comparing by string length first is cases like the one below:

=(CAST(PREV(UP.$0, 0)):INTEGER NOT NULL, 100)
vs
=(100, CAST(PREV(UP.$0, 0)):INTEGER NOT NULL)

As for me, the second one is easier to understand, so the expression starts with simpler bits, and the complicated parts are put later.

In the same way, it is not amusing to see cases like
AND(...,...,...,...,,,.., null, )

Vladimir
Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
Just in case, my motivation for comparing by string length first is cases like the one below:

=(CAST(PREV(UP.$0, 0)):INTEGER NOT NULL, 100)
vs
=(100, CAST(PREV(UP.$0, 0)):INTEGER NOT NULL)

As for me, the second one is easier to understand, so the expression starts with simpler bits, and the complicated parts are put later.

In the same way, it is not amusing to see cases like
AND(...,...,...,...,,,.., null, )

Vladimir
Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
Haisheng> variable always left, constant always right for applicable binary operators;

Oh, I did not think of making the behavior different for literals and variables.
What do you think re "$n.field = 42" where $n.field is a dot operator?
I'm not fond of adding complicated checks there, however, I think I can move the literal to the right if the other side is a non-literal.

Haisheng>- for join conditions, left operand always comes from left relation, right operand always comes from right relation for reversible binary operators.

I'm afraid it would be hard to implement :(
I don't like to implement operator-specific logic.

On the other hand, "sort by length, then sort by string representation" is more or less the same as "sort by numeric value" for cases like $1=$8, so those semantics would be more or less automatic.

Vladimir
Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
It turned out "b" (sort operands in computeDigest) is easier to implement.
I've filed a PR: https://github.com/apache/calcite/pull/1703

Normalizing >($0, 2) vs <(2, $0) might be less trivial to implement, but I think it is worth doing at the same time.
In any case, lots of expressions will need to be updated, and it is better if we make < vs > stable as well.

A similar question is whether we want to normalize >($0, 2) and <(2, $0).

The suggested ordering is:
1) Order by string length
2) Order by string representation

In other words, >($0, 2) normalizes to <(2, $0) because 2 is shorter,
and >($0, +(2,3)) is kept intact.

Note: I do not have an immediate demand for normalizing < vs > (and >= vs <=), however, it looks like it is worth doing to minimize the overall damage.

Vladimir
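A minimal sketch of the ordering proposed in this message, written against plain strings rather than real RexNodes. The class name, the `REVERSED` table and the `normalize` helper are hypothetical and are not taken from the PR; the sketch only illustrates "order by string length, then by string representation" together with flipping the comparison operator when the operands are swapped.

```java
import java.util.Map;

/** Hypothetical illustration only; this is not Calcite's RexCall code. */
final class OperandOrderSketch {
  /** Operator to use once the two operands have been swapped (assumed table). */
  static final Map<String, String> REVERSED = Map.of(
      "=", "=", "<>", "<>", "<", ">", ">", "<", "<=", ">=", ">=", "<=");

  /** 1) shorter text first; 2) ties broken by plain string comparison. */
  static int compareOperands(String a, String b) {
    int byLength = Integer.compare(a.length(), b.length());
    return byLength != 0 ? byLength : a.compareTo(b);
  }

  /** Returns the call text with its operands in the suggested order. */
  static String normalize(String op, String left, String right) {
    String reversed = REVERSED.get(op);
    if (reversed == null || compareOperands(left, right) <= 0) {
      // Operator is not reversible, or the operands are already ordered.
      return op + "(" + left + ", " + right + ")";
    }
    return reversed + "(" + right + ", " + left + ")";
  }
}
```

With this ordering, `normalize(">", "$0", "2")` yields `<(2, $0)`, while `normalize(">", "$0", "+(2, 3)")` leaves the call as it was, matching the two examples in the message. Comparing by length first also puts `$8` before `$10`, which a plain string comparison alone would not do.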
Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
I recommend the way GPDB does it. Normalize the logical plan expression in the preprocessing phase:
- variable always left, constant always right for applicable binary operators;
- for join conditions, left operand always comes from left relation, right operand always comes from right relation for reversible binary operators.

- Haisheng

--
From: Enrico Olivelli
Date: 2019-12-30 03:28:38
To:
Subject: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

On Sun, Dec 29, 2019, 20:09 Vladimir Sitnikov wrote:

> Hi,
>
> We have a 1-year old issue with an idea to sort RexNode operands so they
> are consistent.
>
> For instance, "x=5" and "5=x" have the same semantics, so it would make
> sense to stick to a single implementation.
> A discussion can be found in
> https://issues.apache.org/jira/browse/CALCITE-2450
>
> We do not normalize RexNodes, thus it results in excessive planning time,
> especially when the planner is trying to reorder joins.
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
> different joins, however, they are equivalent.
>
> The normalization does not seem to cost much, however, it enables me to
> activate more rules (e.g. EnumerabeMergeRule),
> so it is good as it enables to consider more sophisticated plans.
>
> I see two approaches:
> a) Normalize in RexNode constructor. This seems easy to implement, however,
> there's a catch
> if someone assumed that the order of operands would be the same as the one
> that was passed to the constructor.
> I don't think there are such assumptions in the wild, but there might be.
> The javadoc for the relevant methods says nothing regarding the operand
> order.
> However, the good thing would be RexNode would feel the same in the
> debugger and in its toString representation.
>
> b) Normalize at RexCall#computeDigest only.
> In other words, keep the operands unsorted, but make sure the digest is
> created as if the operands were sorted.
> This seems to be the most transparent change, however, it might surprise
> that `toString` does not match to whatever is seen in the debugger.
>
> In any case, making `RexCall#toString` print sorted representation would
> alter lots of tests.
> For :core it is like 5540 tests completed, 358 failed, 91 skipped :((
>
> WDYT?
>

I really would love this feature.
Just my 2 cents
Enrico

> Hopefully, making the RexNode representation sorted would reduce the number
> of `$1=$0` vs `$0=$1` plan diffs.
>
> Vladimir
>
Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form
On Sun, Dec 29, 2019, 20:09 Vladimir Sitnikov wrote:

> Hi,
>
> We have a 1-year old issue with an idea to sort RexNode operands so they
> are consistent.
>
> For instance, "x=5" and "5=x" have the same semantics, so it would make
> sense to stick to a single implementation.
> A discussion can be found in
> https://issues.apache.org/jira/browse/CALCITE-2450
>
> We do not normalize RexNodes, thus it results in excessive planning time,
> especially when the planner is trying to reorder joins.
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
> different joins, however, they are equivalent.
>
> The normalization does not seem to cost much, however, it enables me to
> activate more rules (e.g. EnumerabeMergeRule),
> so it is good as it enables to consider more sophisticated plans.
>
> I see two approaches:
> a) Normalize in RexNode constructor. This seems easy to implement, however,
> there's a catch
> if someone assumed that the order of operands would be the same as the one
> that was passed to the constructor.
> I don't think there are such assumptions in the wild, but there might be.
> The javadoc for the relevant methods says nothing regarding the operand
> order.
> However, the good thing would be RexNode would feel the same in the
> debugger and in its toString representation.
>
> b) Normalize at RexCall#computeDigest only.
> In other words, keep the operands unsorted, but make sure the digest is
> created as if the operands were sorted.
> This seems to be the most transparent change, however, it might surprise
> that `toString` does not match to whatever is seen in the debugger.
>
> In any case, making `RexCall#toString` print sorted representation would
> alter lots of tests.
> For :core it is like 5540 tests completed, 358 failed, 91 skipped :((
>
> WDYT?
>

I really would love this feature.
Just my 2 cents
Enrico

> Hopefully, making the RexNode representation sorted would reduce the number
> of `$1=$0` vs `$0=$1` plan diffs.
>
> Vladimir
>
[DISCUSS] CALCITE-2450 reorder predicates to a canonical form
Hi,

We have a 1-year old issue with an idea to sort RexNode operands so they are consistent.

For instance, "x=5" and "5=x" have the same semantics, so it would make sense to stick to a single implementation.
A discussion can be found in https://issues.apache.org/jira/browse/CALCITE-2450

We do not normalize RexNodes, which results in excessive planning time, especially when the planner is trying to reorder joins.
For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are different joins, however, they are equivalent.

The normalization does not seem to cost much, however, it enables me to activate more rules (e.g. EnumerabeMergeRule),
so it is good, as it allows the planner to consider more sophisticated plans.

I see two approaches:

a) Normalize in the RexNode constructor. This seems easy to implement, however, there's a catch if someone assumed that the order of operands would be the same as the one that was passed to the constructor.
I don't think there are such assumptions in the wild, but there might be.
The javadoc for the relevant methods says nothing regarding the operand order.
However, the good thing is that RexNode would look the same in the debugger and in its toString representation.

b) Normalize at RexCall#computeDigest only.
In other words, keep the operands unsorted, but make sure the digest is created as if the operands were sorted.
This seems to be the most transparent change, however, it might be surprising that `toString` does not match whatever is seen in the debugger.

In any case, making `RexCall#toString` print the sorted representation would alter lots of tests.
For :core it is like 5540 tests completed, 358 failed, 91 skipped :((

WDYT?

Hopefully, making the RexNode representation sorted would reduce the number of `$1=$0` vs `$0=$1` plan diffs.

Vladimir
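To make approach (b) concrete, here is a small sketch under stated assumptions: `Call` is a hypothetical stand-in for a call node, operands are represented by their digest strings, and the set of symmetric operators and the plain string sort are illustrative choices, not what Calcite or the PR does. The operand list keeps the caller's order; only the digest is built from a sorted copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; this is not Calcite's RexCall. */
final class DigestSketch {
  /** Operators whose operands may be reordered freely (assumed set). */
  static final Set<String> SYMMETRIC = Set.of("=", "<>", "AND", "OR");

  static final class Call {
    final String op;
    final List<String> operands;

    Call(String op, List<String> operands) {
      this.op = op;
      // The order passed by the caller is preserved, as in approach (b).
      this.operands = List.copyOf(operands);
    }

    /** Builds the digest as if the operands were sorted. */
    String digest() {
      List<String> forDigest = new ArrayList<>(operands);
      if (SYMMETRIC.contains(op)) {
        Collections.sort(forDigest);
      }
      return op + "(" + String.join(", ", forDigest) + ")";
    }
  }
}
```

Two calls built as `=($1, $0)` and `=($0, $1)` then share the digest `=($0, $1)`, so anything keyed on the digest treats them as the same expression, while `operands.get(0)` still returns whatever the caller passed first. That gap between the stored order and the printed form is the debugger surprise mentioned under (b).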
[DISCUSS] Revert [CALCITE-1842] Sort.computeSelfCost() calls makeCost() with arguments in wrong order
Hi,

I'm inclined to revert https://github.com/apache/calcite/commit/48a20668647b5a5e86073ef0e9ce206669ad6867
Motivation can be found in https://issues.apache.org/jira/browse/CALCITE-1842?focusedCommentId=17004696&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17004696

WDYT?

The question there is Sort#computeSelfCost.
We have (rows, cpu, io) cost fields, however, most of the time we use just **rows** to represent the cost.
For instance, EnumerableHashJoin computes the cost and returns (rows, 0, 0).

CALCITE-1842 adjusted Sort costing so that NLogN moved to the cpu field, which makes sorting virtually free, because the current Volcano uses only the rows field when comparing costs.

Unfortunately, CALCITE-1842 has no tests, so I don't really see what the problem was.

Vladimir
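The effect described in this message can be sketched with a toy cost type. Everything below is hypothetical: `Cost`, `isLe` and the two `sortCost...` helpers are illustrative names, and the rows-only comparison simply encodes the statement above that only the rows field is used when comparing costs. It is not the actual Volcano cost code.

```java
/** Hypothetical illustration only; not Calcite's cost implementation. */
final class CostSketch {
  static final class Cost {
    final double rows;
    final double cpu;
    final double io;

    Cost(double rows, double cpu, double io) {
      this.rows = rows;
      this.cpu = cpu;
      this.io = io;
    }

    /** Compares on rows only; cpu and io are ignored (assumption from the message). */
    boolean isLe(Cost other) {
      return rows <= other.rows;
    }
  }

  static double nLogN(double n) {
    return n * Math.log(n);
  }

  /** N log N charged to the rows field. */
  static Cost sortCostInRows(double n) {
    return new Cost(nLogN(n), 0, 0);
  }

  /** N log N charged to the cpu field. */
  static Cost sortCostInCpu(double n) {
    return new Cost(n, nLogN(n), 0);
  }
}
```

For 1000 input rows, a pass-through operator costed as `(1000, 0, 0)` and a sort costed as `(1000, 1000 * ln 1000, 0)` compare as equal under `isLe`, so the sort looks free. With the N log N term in the rows field the sort is strictly more expensive than the pass-through.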
Re: [DISCUSS] Avatica 1.16.0 dockerfiles broken. Release 1.17.0?
Stamatis>I was thinking that if the check says that there is no problem then apply would be a noop.

The current logic of 'apply' is that it computes the appropriate style and overwrites the file.
Do you suggest that it skip overwriting in case the only diff is line endings?
What if there are other changes? What if the lines have different style?

I would refrain from making such changes to check/apply unless there's a justification.
So far I do not care if the source build is broken for Windows, however, I do not like how the LF vs CRLF issue is ignored.

Vladimir
Re: [DISCUSS] Avatica 1.16.0 dockerfiles broken. Release 1.17.0?
I was thinking that if the check says that there is no problem then apply would be a noop.

I have the impression that source releases are necessary and obligatory so that the ASF is covered from a legal perspective. If I am not mistaken, even companies with closed source code are obliged to release the sources (to some 3rd party) so that, if for some reason the company closes, the clients who bought the software have a way to recover the code and make changes if necessary. I don't know much about legal stuff, but I guess something similar holds for the ASF. So even if nobody uses the source releases, it is necessary to provide one.

As you said, it might be helpful for a few people to have the sources with different line endings, but given that nobody has asked for it till now, we could skip it and keep our release process simpler.

On Thu, Dec 26, 2019 at 10:44 AM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Stamatis>I guess there are people who use Windows and they still have their
> editors
> Stamatis>configured to use LF endings.
>
> LF / CRLF uses Git configuration to figure out the needed line endings.
> In other words, if someone configures Git to use LF rather than "platform"
> line endings,
> the build would pick that up and accept LF files even in case the platform
> is Windows.
>
> Remember there's not just `check`, but there's `apply` as well.
> If we make `check` tolerable to wrong line endings, what do we do with
> `apply`?
> Does that mean `check` would say "it is all ok", and `apply` would change
> all the endings to their expected values?
>
> Note: if we are going to make that kind of changes, we'll need
> "[CALCITE-3623] Replace Spotless with Autostyle"
> https://github.com/apache/calcite/pull/1682
>
> Stamatis>opensource projects it is kind of rare to release
> Stamatis>source code in multiple formats with different line endings
>
> Frankly speaking, I would say it is rare to treat the source code as the
> primary release artifact.
>
> AFAIK the ASF way is to treat the source as the key release item.
> I recon in 99.42% of the times people who consume the source release would
> commit that to their VCS (Git? Mercurial? SVN? Fossil?)
>
> Our source release does include .gitattributes and .gitignore, so it would
> help people to import Calcite in their Git repositories.
> However, for other VCS it is important to have the source files in their
> "platform-expected" format.
>
> It is where having both CRLF-oriented and LF-oriented source releases help.
>
> At this point, you might say: "Vladimir, no-one imports Calcite source
> releases to their VCS repositories".
> It might be true, but what is the point of making a source release then?
> Is the policy of making source releases outdated?
>
> Stamatis>other weird behavior but these should not happen very often.
>
> Well, having .gitattributes does help a lot, but I remember that if one
> messes up with line endings, then it might be very hard to diff and
> re-commit the file appropriately.
> It is like the case when `git reset --hard` does not help.
> That is why it really helps when you can identify those issues early.
>
> Vladimir
>
Re: Concurrent execution of tests methods
It turned out to be more complicated than I thought.

The fix of EnumerableMergeJoin uncovered a well-known infinite planning time issue https://issues.apache.org/jira/browse/CALCITE-2223

The thing is, previously the rule did not even try to sort its inputs, thus it was producing value only for the case when the inputs **happen to be** sorted.
I thought it would make sense to consider plans like MergeJoin(Sort(..), Sort(..)) as well, so I altered the rule so it creates Sort nodes in case the input is not sorted.

So far so good, however, a trivial case like JdbcTest#testJoinManyWay degrades significantly because now it has many more nodes to explore.

In current master, testJoinManyWay completes in 7 seconds (see https://travis-ci.org/apache/calcite/jobs/630585211#L1624 )
However, the test does not complete if I activate the MergeJoin rule.

I did try to fix that, and I have a couple of commits to make it a bit more stable (see commits in https://github.com/apache/calcite/pull/1702)
Basically
1) I ensure RexNodes are canonicalized at construction time (e.g. "=($1, $0)" is always created as "=($0, $1)")
^^ This seems to produce lots of test failures, however, it would make test output more stable, which is good.
2) I add a couple of call.rel(0).getConvention() == call.rel(1).getConvention(); checks to rules like FilterProjectTransposeRule.

The net result is testJoinManyWay takes 140 seconds on my machine.
It is not perfect, but at least it manages to complete for 6 joins.

An alternative option is to have **two** (or three) flavours of MergeJoinRule:
A) A rule that just reuses sorted inputs if there are any (fewer sort nodes => less planning space => faster planning)
B) A rule that tries adding Sort nodes (more sort nodes => longer planning, but it might happen to produce creative plans)
C) A rule that tries adding a Sort node only in case one of its inputs is already sorted appropriately (a mix between A and B)

Any thoughts?

Vladimir
Calcite-Master - Build # 1525 - Still Failing
The Apache Jenkins build system has built Calcite-Master (build #1525) Status: Still Failing Check console output at https://builds.apache.org/job/Calcite-Master/1525/ to view the results.
[jira] [Created] (CALCITE-3643) Prevent matching JoinCommuteRule when both inputs are the same
Vladimir Sitnikov created CALCITE-3643:
--

Summary: Prevent matching JoinCommuteRule when both inputs are the same
Key: CALCITE-3643
URL: https://issues.apache.org/jira/browse/CALCITE-3643
Project: Calcite
Issue Type: Improvement
Components: core
Reporter: Vladimir Sitnikov
Assignee: Vladimir Sitnikov

Conditions like =($0, $1) and =($1, $0) are equivalent, however, adding that permutation increases the search space and gains nothing.
Calcite-Master - Build # 1524 - Failure
The Apache Jenkins build system has built Calcite-Master (build #1524) Status: Failure Check console output at https://builds.apache.org/job/Calcite-Master/1524/ to view the results.