[jira] [Commented] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer
[ https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567115#comment-17567115 ] Botong Huang commented on CALCITE-4568: --- Hi [~julianhyde], a gentle reminder for this thread. Thx! > Tempura: extending Calcite into an incremental query optimizer > -- > > Key: CALCITE-4568 > URL: https://issues.apache.org/jira/browse/CALCITE-4568 > Project: Calcite > Issue Type: New Feature >Reporter: Botong Huang >Priority: Major > > As discussed in the email thread, this is an attempt to extend the Calcite > optimizer into a general incremental query optimizer, based on our research > paper published in VLDB 2021: > Tempura: a general cost-based optimizer framework for incremental data > processing > To our best knowledge, this is the first general cost-based incremental > optimizer that can find the best plan across multiple families of incremental > computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in > the paper) shows that the generated best plan is consistently much better > than the plans from each individual method alone. > In general, incremental query planning is central to database view > maintenance and stream processing systems, and are being adopted in active > databases, resumable query execution, approximate query processing, etc. We > are hoping that this feature can help widening the spectrum of Calcite, > solicit more use cases and adoption of Calcite. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer
[ https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478613#comment-17478613 ] Botong Huang edited comment on CALCITE-4568 at 1/19/22, 11:53 AM: -- [~Leo Zhou] yes, it will probably take some time to digest and review considering the amount of code here. See https://github.com/hbtoo/calcite/commits/botong [~julianhyde] please let us know if there's anything we can help. The first commit CALCITE-4737 is in already, thanks! was (Author: botong): [~Leo Zhou] yes, it will probably take some time to digest and review considering the amount of code here. [~julianhyde] please let us know if there's anything we can help. The first commit CALCITE-4737 is in already, thanks! > Tempura: extending Calcite into an incremental query optimizer > -- > > Key: CALCITE-4568 > URL: https://issues.apache.org/jira/browse/CALCITE-4568 > Project: Calcite > Issue Type: New Feature >Reporter: Botong Huang >Priority: Major > > As discussed in the email thread, this is an attempt to extend the Calcite > optimizer into a general incremental query optimizer, based on our research > paper published in VLDB 2021: > Tempura: a general cost-based optimizer framework for incremental data > processing > To our best knowledge, this is the first general cost-based incremental > optimizer that can find the best plan across multiple families of incremental > computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in > the paper) shows that the generated best plan is consistently much better > than the plans from each individual method alone. > In general, incremental query planning is central to database view > maintenance and stream processing systems, and are being adopted in active > databases, resumable query execution, approximate query processing, etc. We > are hoping that this feature can help widening the spectrum of Calcite, > solicit more use cases and adoption of Calcite. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer
[ https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478613#comment-17478613 ] Botong Huang commented on CALCITE-4568: --- [~Leo Zhou] yes, it will probably take some time to digest and review considering the amount of code here. [~julianhyde] please let us know if there's anything we can help. The first commit CALCITE-4737 is in already, thanks! > Tempura: extending Calcite into an incremental query optimizer > -- > > Key: CALCITE-4568 > URL: https://issues.apache.org/jira/browse/CALCITE-4568 > Project: Calcite > Issue Type: New Feature >Reporter: Botong Huang >Priority: Major > > As discussed in the email thread, this is an attempt to extend the Calcite > optimizer into a general incremental query optimizer, based on our research > paper published in VLDB 2021: > Tempura: a general cost-based optimizer framework for incremental data > processing > To our best knowledge, this is the first general cost-based incremental > optimizer that can find the best plan across multiple families of incremental > computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in > the paper) shows that the generated best plan is consistently much better > than the plans from each individual method alone. > In general, incremental query planning is central to database view > maintenance and stream processing systems, and are being adopted in active > databases, resumable query execution, approximate query processing, etc. We > are hoping that this feature can help widening the spectrum of Calcite, > solicit more use cases and adoption of Calcite. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (CALCITE-4737) Add Volcano visualizer for debugging
[ https://issues.apache.org/jira/browse/CALCITE-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417002#comment-17417002 ] Botong Huang commented on CALCITE-4737: --- [~zuozhiw] and [~thomas.rebele], any updates? Please let me know if there's anything I can help. Thanks! > Add Volcano visualizer for debugging > > > Key: CALCITE-4737 > URL: https://issues.apache.org/jira/browse/CALCITE-4737 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > Add Volcano visualizer for debugging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4737) Add Volcano visualizer for debugging
[ https://issues.apache.org/jira/browse/CALCITE-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399232#comment-17399232 ] Botong Huang commented on CALCITE-4737: --- Sounds good. We will work with [~thomas.rebele] to come up with something. > Add Volcano visualizer for debugging > > > Key: CALCITE-4737 > URL: https://issues.apache.org/jira/browse/CALCITE-4737 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > Add Volcano visualizer for debugging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (CALCITE-4737) Add Volcano visualizer for debugging
[ https://issues.apache.org/jira/browse/CALCITE-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398858#comment-17398858 ] Botong Huang edited comment on CALCITE-4737 at 8/13/21, 6:43 PM: - Yes, the current visualizer provides full support for tvr, and it compiles with the following two commits. So do we want to wait for these two or do we add a stripped version of the visualizer now? [59342a654|https://github.com/hbtoo/calcite/commit/59342a6547b5357ce1a83335360a81f042b77507] 2. Tempura core memo structure, rule engine, and interfaces. Newly added files only. [0d310841d|https://github.com/hbtoo/calcite/commit/0d310841de431e1971ff7330bc3928364d06a362] 1. Tempura core memo structure, rule engine, and interfaces. Modifications to existing files only. was (Author: botong): [link title|http://example.com]Yes, the current visualizer provides full support for tvr, and it compiles with the following two commits. So do we want to wait for these two or do we add a stripped version of the visualizer now? [59342a654|https://github.com/hbtoo/calcite/commit/59342a6547b5357ce1a83335360a81f042b77507] 2. Tempura core memo structure, rule engine, and interfaces. Newly added files only. [0d310841d|https://github.com/hbtoo/calcite/commit/0d310841de431e1971ff7330bc3928364d06a362] 1. Tempura core memo structure, rule engine, and interfaces. Modifications to existing files only. > Add Volcano visualizer for debugging > > > Key: CALCITE-4737 > URL: https://issues.apache.org/jira/browse/CALCITE-4737 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > Add Volcano visualizer for debugging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4737) Add Volcano visualizer for debugging
[ https://issues.apache.org/jira/browse/CALCITE-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398858#comment-17398858 ] Botong Huang commented on CALCITE-4737: --- [link title|http://example.com]Yes, the current visualizer provides full support for tvr, and it compiles with the following two commits. So do we want to wait for these two or do we add a stripped version of the visualizer now? [59342a654|https://github.com/hbtoo/calcite/commit/59342a6547b5357ce1a83335360a81f042b77507] 2. Tempura core memo structure, rule engine, and interfaces. Newly added files only. [0d310841d|https://github.com/hbtoo/calcite/commit/0d310841de431e1971ff7330bc3928364d06a362] 1. Tempura core memo structure, rule engine, and interfaces. Modifications to existing files only. > Add Volcano visualizer for debugging > > > Key: CALCITE-4737 > URL: https://issues.apache.org/jira/browse/CALCITE-4737 > Project: Calcite > Issue Type: Bug >Reporter: Julian Hyde >Priority: Major > > Add Volcano visualizer for debugging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer
[ https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397799#comment-17397799 ] Botong Huang commented on CALCITE-4568: --- Sounds good and please take your time. Our team will be participating remotely :D > Tempura: extending Calcite into an incremental query optimizer > -- > > Key: CALCITE-4568 > URL: https://issues.apache.org/jira/browse/CALCITE-4568 > Project: Calcite > Issue Type: New Feature >Reporter: Botong Huang >Priority: Major > > As discussed in the email thread, this is an attempt to extend the Calcite > optimizer into a general incremental query optimizer, based on our research > paper published in VLDB 2021: > Tempura: a general cost-based optimizer framework for incremental data > processing > To our best knowledge, this is the first general cost-based incremental > optimizer that can find the best plan across multiple families of incremental > computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in > the paper) shows that the generated best plan is consistently much better > than the plans from each individual method alone. > In general, incremental query planning is central to database view > maintenance and stream processing systems, and are being adopted in active > databases, resumable query execution, approximate query processing, etc. We > are hoping that this feature can help widening the spectrum of Calcite, > solicit more use cases and adoption of Calcite. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer
[ https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395112#comment-17395112 ] Botong Huang commented on CALCITE-4568: --- (copied from dev email) Hi all, Please find our rebased Tempura code on top of a fairly recent version of Calcite at: https://github.com/hbtoo/calcite/tree/botong There are six new commits in total: 739119282 Hack around the type issue to make tvr unit tests work, see CALCITE-4713 c84da7216 4. Full tempura integration: all TVR rules and complete optimization procedure. Newly added files only. d3217f432 3. Full tempura integration: all TVR rules and complete optimization procedure. Modifications to existing files only. 59342a654 2. Tempura core memo structure, rule engine, and interfaces. Newly added files only. 0d310841d 1. Tempura core memo structure, rule engine, and interfaces. Modifications to existing files only. c1240ca7b 0. Add volcano visualizer for debugging. The first three (0, 1, 2) is a compilable version with extended core system support: 1. Memo extension with TvrMetaSet 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as well as links in between the nodes. The next two (3 and 4) is a full version with: All changes in this feature will consist of four parts: 3. A provided set of TvrRules, written using the upgraded rule engine API. 4. TvrVolcanoPlanner that puts everything together end to end. 5. Multi-query optimization, used to find the best incremental plan involving multiple time points. With up to 4, all existing CALCITE unit tests pass. To demonstrate how Tempura works, we have added the following two example unit tests that can be run directly (with the last commit to hack around CALCITE-4713): TvrOptimizationTest.java runs the Tempura optimizer. This program produces a progressive physical plan by the Tempura optimizer that runs across several time points. The physical plan is printed out to the console in DOT format, which can be viewed using an online graphviz tool. TvrExecutionTest.java uses the Tempura optimizer in an end-to-end query. This program generates a progressive physical plan and then uses Calcite's built-in executor to run the plan. The output at each time point is printed to the console. Everyone is welcome and encouraged to take a look and play with it. Let's take some time and figure out a plan on how to incorporate Tempura into Calcite that best suits everyone. > Tempura: extending Calcite into an incremental query optimizer > -- > > Key: CALCITE-4568 > URL: https://issues.apache.org/jira/browse/CALCITE-4568 > Project: Calcite > Issue Type: New Feature >Reporter: Botong Huang >Priority: Major > > As discussed in the email thread, this is an attempt to extend the Calcite > optimizer into a general incremental query optimizer, based on our research > paper published in VLDB 2021: > Tempura: a general cost-based optimizer framework for incremental data > processing > To our best knowledge, this is the first general cost-based incremental > optimizer that can find the best plan across multiple families of incremental > computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in > the paper) shows that the generated best plan is consistently much better > than the plans from each individual method alone. > In general, incremental query planning is central to database view > maintenance and stream processing systems, and are being adopted in active > databases, resumable query execution, approximate query processing, etc. We > are hoping that this feature can help widening the spectrum of Calcite, > solicit more use cases and adoption of Calcite. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer
[ https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346265#comment-17346265 ] Botong Huang commented on CALCITE-4568: --- Hi Rui, the current version is runnable but we did not take care of all test cases. The code can be compiled with: mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Dforbiddenapis.skip=true -Dcheckstyle.skip=true Yes, we are rebasing it to a more recent version of gradle based master. > Tempura: extending Calcite into an incremental query optimizer > -- > > Key: CALCITE-4568 > URL: https://issues.apache.org/jira/browse/CALCITE-4568 > Project: Calcite > Issue Type: New Feature >Reporter: Botong Huang >Priority: Major > > As discussed in the email thread, this is an attempt to extend the Calcite > optimizer into a general incremental query optimizer, based on our research > paper published in VLDB 2021: > Tempura: a general cost-based optimizer framework for incremental data > processing > To our best knowledge, this is the first general cost-based incremental > optimizer that can find the best plan across multiple families of incremental > computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in > the paper) shows that the generated best plan is consistently much better > than the plans from each individual method alone. > In general, incremental query planning is central to database view > maintenance and stream processing systems, and are being adopted in active > databases, resumable query execution, approximate query processing, etc. We > are hoping that this feature can help widening the spectrum of Calcite, > solicit more use cases and adoption of Calcite. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer
[ https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344955#comment-17344955 ] Botong Huang edited comment on CALCITE-4568 at 5/15/21, 3:11 AM: - Please find the slides introducing Tempura here: [https://github.com/alibaba/cost-based-incremental-optimizer/blob/main/Tempura_Calcite_presentation.pdf] was (Author: botong): Please find the slides introducing Tempura here: [^Tempura_Calcite_presentation.pdf] > Tempura: extending Calcite into an incremental query optimizer > -- > > Key: CALCITE-4568 > URL: https://issues.apache.org/jira/browse/CALCITE-4568 > Project: Calcite > Issue Type: New Feature >Reporter: Botong Huang >Priority: Major > > As discussed in the email thread, this is an attempt to extend the Calcite > optimizer into a general incremental query optimizer, based on our research > paper published in VLDB 2021: > Tempura: a general cost-based optimizer framework for incremental data > processing > To our best knowledge, this is the first general cost-based incremental > optimizer that can find the best plan across multiple families of incremental > computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in > the paper) shows that the generated best plan is consistently much better > than the plans from each individual method alone. > In general, incremental query planning is central to database view > maintenance and stream processing systems, and are being adopted in active > databases, resumable query execution, approximate query processing, etc. We > are hoping that this feature can help widening the spectrum of Calcite, > solicit more use cases and adoption of Calcite. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer
[ https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344955#comment-17344955 ] Botong Huang commented on CALCITE-4568: --- Please find the slides introducing Tempura here: [^Tempura_Calcite_presentation.pdf] > Tempura: extending Calcite into an incremental query optimizer > -- > > Key: CALCITE-4568 > URL: https://issues.apache.org/jira/browse/CALCITE-4568 > Project: Calcite > Issue Type: New Feature >Reporter: Botong Huang >Priority: Major > > As discussed in the email thread, this is an attempt to extend the Calcite > optimizer into a general incremental query optimizer, based on our research > paper published in VLDB 2021: > Tempura: a general cost-based optimizer framework for incremental data > processing > To our best knowledge, this is the first general cost-based incremental > optimizer that can find the best plan across multiple families of incremental > computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in > the paper) shows that the generated best plan is consistently much better > than the plans from each individual method alone. > In general, incremental query planning is central to database view > maintenance and stream processing systems, and are being adopted in active > databases, resumable query execution, approximate query processing, etc. We > are hoping that this feature can help widening the spectrum of Calcite, > solicit more use cases and adoption of Calcite. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-4568) Tempura: extending Calcite into an incremental query optimizer
[ https://issues.apache.org/jira/browse/CALCITE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-4568: -- Summary: Tempura: extending Calcite into an incremental query optimizer (was: Tempura: extending Calcite into a incremental query optimizer) > Tempura: extending Calcite into an incremental query optimizer > -- > > Key: CALCITE-4568 > URL: https://issues.apache.org/jira/browse/CALCITE-4568 > Project: Calcite > Issue Type: New Feature >Reporter: Botong Huang >Priority: Major > > As discussed in the email thread, this is an attempt to extend the Calcite > optimizer into a general incremental query optimizer, based on our research > paper published in VLDB 2021: > Tempura: a general cost-based optimizer framework for incremental data > processing > To our best knowledge, this is the first general cost-based incremental > optimizer that can find the best plan across multiple families of incremental > computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in > the paper) shows that the generated best plan is consistently much better > than the plans from each individual method alone. > In general, incremental query planning is central to database view > maintenance and stream processing systems, and are being adopted in active > databases, resumable query execution, approximate query processing, etc. We > are hoping that this feature can help widening the spectrum of Calcite, > solicit more use cases and adoption of Calcite. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-4568) Tempura: extending Calcite into a incremental query optimizer
Botong Huang created CALCITE-4568: - Summary: Tempura: extending Calcite into a incremental query optimizer Key: CALCITE-4568 URL: https://issues.apache.org/jira/browse/CALCITE-4568 Project: Calcite Issue Type: New Feature Reporter: Botong Huang As discussed in the email thread, this is an attempt to extend the Calcite optimizer into a general incremental query optimizer, based on our research paper published in VLDB 2021: Tempura: a general cost-based optimizer framework for incremental data processing To our best knowledge, this is the first general cost-based incremental optimizer that can find the best plan across multiple families of incremental computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in the paper) shows that the generated best plan is consistently much better than the plans from each individual method alone. In general, incremental query planning is central to database view maintenance and stream processing systems, and are being adopted in active databases, resumable query execution, approximate query processing, etc. We are hoping that this feature can help widening the spectrum of Calcite, solicit more use cases and adoption of Calcite. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4514) [CALCITE-4514] Fine tune the merge order of two RelSets
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293313#comment-17293313 ] Botong Huang commented on CALCITE-4514: --- Hi [~danny0405], openning a new Jira for it might be an overkill. Can you fix the typo for us and push? > [CALCITE-4514] Fine tune the merge order of two RelSets > --- > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Fix For: 1.27.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (CALCITE-4514) [CALCITE-4514] Fine tune the merge order of two RelSets
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293313#comment-17293313 ] Botong Huang edited comment on CALCITE-4514 at 3/2/21, 3:12 AM: Hi [~danny0405], openning a new Jira for it might be an overkill. Can you fix the typo for us and push? Thx! was (Author: botong): Hi [~danny0405], openning a new Jira for it might be an overkill. Can you fix the typo for us and push? > [CALCITE-4514] Fine tune the merge order of two RelSets > --- > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Fix For: 1.27.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4514) [CALCITE-4514] Fine tune the merge order of two RelSets
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292624#comment-17292624 ] Botong Huang commented on CALCITE-4514: --- Looks good to me, thanks! > [CALCITE-4514] Fine tune the merge order of two RelSets > --- > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Fix For: 1.27.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4514) [CALCITE-4514] Fine tune the merge order of two RelSets
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292537#comment-17292537 ] Botong Huang commented on CALCITE-4514: --- Hi [~julianhyde], I just noticed some minor issues. For instance in _isSmaller()_, when _set1.parents.size() > set2.parents.size()_, we should return false rather than fall back to relset size and id. We need this additional patch: [https://github.com/apache/calcite/compare/master...hbtoo:CALCITE-4514] > [CALCITE-4514] Fine tune the merge order of two RelSets > --- > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Fix For: 1.27.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4514) [CALCITE-4514] Fine tune the merge order of two RelSets
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292465#comment-17292465 ] Botong Huang commented on CALCITE-4514: --- Sounds good to me, thx! > [CALCITE-4514] Fine tune the merge order of two RelSets > --- > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4514) [CALCITE-4514] Fine tune the merge order of two RelSets
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292302#comment-17292302 ] Botong Huang commented on CALCITE-4514: --- Hi [~julianhyde], thanks! Your patch is almost the same as mine. I've simplified and integrated it in the latest patch. > [CALCITE-4514] Fine tune the merge order of two RelSets > --- > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4514) [CALCITE-4514] Fine tune the merge order of two RelSets
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292270#comment-17292270 ] Botong Huang commented on CALCITE-4514: --- Oh I see, updated patch back to the compute from scratch. > [CALCITE-4514] Fine tune the merge order of two RelSets > --- > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-4514) [CALCITE-4514] Fine tune the merge order of two RelSets
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-4514: -- Summary: [CALCITE-4514] Fine tune the merge order of two RelSets (was: Fine tune the merge order of two RelSets, cache RelSet's childSet computation) > [CALCITE-4514] Fine tune the merge order of two RelSets > --- > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4514) Fine tune the merge order of two RelSets, cache RelSet's childSet computation
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292265#comment-17292265 ] Botong Huang commented on CALCITE-4514: --- Hi [~vladimirsitnikov] {quote}Is it really needed to compute the set? {quote} Yes I believe so, detecting the parent child relset relationship is an optimization to disable useless cycles in the memo. It is added in CALCITE-3819, adding [~hyuan] to verify just in case. {quote}caching the Set<...> looks like to be "always faster", however, it might result in an unintentional memory leak. {quote} I did a quick test over some heavy queries. With the patch, there is no perf downgrade, at the same time the improvement is barely un-noticeable as well. This is expected because the patch does not change the behavior for most merge cases. Again, we did find noticeable perf difference in the query that hits the special case, which is resolved by this patch. That said, I agree that adding the cache comes at the cost of possibly pinning some stale RelSets in memory. I don't have a strong preference between cache vs compute from scratch every time. [~julianhyde] What do you think? > Fine tune the merge order of two RelSets, cache RelSet's childSet computation > - > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4514) Fine tune the merge order of two RelSets, cache RelSet's childSet computation
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292245#comment-17292245 ] Botong Huang commented on CALCITE-4514: --- getChildSets is now changed to compute and maintain on demand. It is essentially a cache and for most RelSets it will be null. It is at least better than compute from scratch every time it is needed. I am not sure if there is a better way. Size of RelSet is already there RelSet.rels, we do not add anything tracking in this patch. Performance side, this Jira is originally intended to fix and improve performance in a special case (when two relsets are parent sets of each other), which the added ut covered. For this special case, the performance improvement is straightforward. But for the overall perf, I am not sure if there is an easy way for a ut to quantify the improvement, without a comprehensive benchmark of queries. > Fine tune the merge order of two RelSets, cache RelSet's childSet computation > - > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4514) Prefer merge new relset into old relset when they are parent set of each other
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292007#comment-17292007 ] Botong Huang commented on CALCITE-4514: --- Hi Julian, thanks for the comments. I believe I've addressed all of them. Please take an another look. > Prefer merge new relset into old relset when they are parent set of each other > -- > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-4514) Fine tune the merge order of two RelSets, cache RelSet's childSet computation
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-4514: -- Summary: Fine tune the merge order of two RelSets, cache RelSet's childSet computation (was: Prefer merge new relset into old relset when they are parent set of each other) > Fine tune the merge order of two RelSets, cache RelSet's childSet computation > - > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4514) Prefer merge new relset into old relset when they are parent set of each other
[ https://issues.apache.org/jira/browse/CALCITE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291869#comment-17291869 ] Botong Huang commented on CALCITE-4514: --- Hi Vladimir, yes, you are right. Both preferences are already there. Except that in the case where both sets are parent of each other, the second preference is not considered. > Prefer merge new relset into old relset when they are parent set of each other > -- > > Key: CALCITE-4514 > URL: https://issues.apache.org/jira/browse/CALCITE-4514 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When merging two relsets, we have two preferences: > 1. Merge parent relset into child relset > 2. Merge newer relset into older relset > Currently, when the two relsets are parent set of each other, we randomly > pick a merge order without checking the second condition above. For > performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-4514) Prefer merge new relset into old relset when they are parent set of each other
Botong Huang created CALCITE-4514: - Summary: Prefer merge new relset into old relset when they are parent set of each other Key: CALCITE-4514 URL: https://issues.apache.org/jira/browse/CALCITE-4514 Project: Calcite Issue Type: Improvement Reporter: Botong Huang When merging two relsets, we have two preferences: 1. Merge parent relset into child relset 2. Merge newer relset into older relset Currently, when the two relsets are parent set of each other, we randomly pick a merge order without checking the second condition above. For performance reasons, we should, to avoid unnecessary churn. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4302) Improve cost propagation in volcano to avoid re-propagation
[ https://issues.apache.org/jira/browse/CALCITE-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225047#comment-17225047 ] Botong Huang commented on CALCITE-4302: --- Yes I agree. We can handle null in Volcano if needed by considering null as infinite cost. But I do think CALCITE-4372 is a better way. > Improve cost propagation in volcano to avoid re-propagation > --- > > Key: CALCITE-4302 > URL: https://issues.apache.org/jira/browse/CALCITE-4302 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 1.27.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > CALCITE-3330 changed the cost propagation in volcano from DFS to BFS. > However, there is still room for improvement. A subset can be updated more > than once in a cost propagation process. For instance, A -> D, A -> B -> C -> > D. When subset A has an update, using BFS subset D (and thus all subsets > above/after D) can be updated twice, first via A -> D and then C -> D. We can > further improve the BFS by always popping the relNode with the smallest cost > from the queue, similar to the Dijkstra algorithm. So that whenever a relNode > is popped from the queue, its current best cannot be further deceased any > more. As a result, all subsets will only be propagated at most once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4302) Improve cost propagation in volcano to avoid re-propagation
[ https://issues.apache.org/jira/browse/CALCITE-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224996#comment-17224996 ] Botong Huang commented on CALCITE-4302: --- Hi [~vladimirsitnikov], thanks for letting us know, this deserves a bit more investigation. https://github.com/apache/calcite/commit/c7fdae22fb0e6b220152f05a7343ce5569283a83#diff-23f79975c4d8d77b7a09299831fe47c69c4f65fe8639bc4ceb956321a33da634L415 As you can see, before this patch, the code didn't check for null as well. I did a quick search, everywhere else using getCost(parent, mq) doesn't seem to consider the possibiliity that it can return null. Here the RelNode parent should not be a RelSubset, so the only possibility is that mq.getNonCumulativeCost(rel) inside getCost(parent, mq) returned null. Furthermore, although getNonCumulativeCost's comment says it can return null, everywhere using getNonCumulativeCost also assumed that it won't return null. To sum up, it doen't seem like a new issue, can you double check where does the null comes from? If it is always there, it should have triggered a NPE earlier already. > Improve cost propagation in volcano to avoid re-propagation > --- > > Key: CALCITE-4302 > URL: https://issues.apache.org/jira/browse/CALCITE-4302 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 1.27.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > CALCITE-3330 changed the cost propagation in volcano from DFS to BFS. > However, there is still room for improvement. A subset can be updated more > than once in a cost propagation process. For instance, A -> D, A -> B -> C -> > D. When subset A has an update, using BFS subset D (and thus all subsets > above/after D) can be updated twice, first via A -> D and then C -> D. We can > further improve the BFS by always popping the relNode with the smallest cost > from the queue, similar to the Dijkstra algorithm. So that whenever a relNode > is popped from the queue, its current best cannot be further deceased any > more. As a result, all subsets will only be propagated at most once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-4302) Improve cost propagation in volcano to avoid re-propagation
Botong Huang created CALCITE-4302: - Summary: Improve cost propagation in volcano to avoid re-propagation Key: CALCITE-4302 URL: https://issues.apache.org/jira/browse/CALCITE-4302 Project: Calcite Issue Type: Improvement Reporter: Botong Huang CALCITE-3330 changed the cost propagation in volcano from DFS to BFS. However, there is still room for improvement. A subset can be updated more than once in a cost propagation process. For instance, A -> D, A -> B -> C -> D. When subset A has an update, using BFS subset D (and thus all subsets above/after D) can be updated twice, first via A -> D and then C -> D. We can further improve the BFS by always popping the relNode with the smallest cost from the queue, similar to the Dijkstra algorithm. So that whenever a relNode is popped from the queue, its current best cannot be further deceased any more. As a result, all subsets will only be propagated at most once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3991) The required boolean should always be provided in RelSet.getOrCreateSubset()
[ https://issues.apache.org/jira/browse/CALCITE-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3991: -- Summary: The required boolean should always be provided in RelSet.getOrCreateSubset() (was: the required boolean should always be provided in RelSet.getOrCreateSubset()) > The required boolean should always be provided in RelSet.getOrCreateSubset() > > > Key: CALCITE-3991 > URL: https://issues.apache.org/jira/browse/CALCITE-3991 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > > the required boolean should always be provided in RelSet.getOrCreateSubset(). > Deleting the old default as well as other related code cleanup -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3981) Volcano.register should not return stale/merged subset
[ https://issues.apache.org/jira/browse/CALCITE-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125322#comment-17125322 ] Botong Huang commented on CALCITE-3981: --- Thanks [~julianhyde] and [~hyuan] for the review! > Volcano.register should not return stale/merged subset > -- > > Key: CALCITE-3981 > URL: https://issues.apache.org/jira/browse/CALCITE-3981 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > When a subset is registered, registerImpl() and registerSubset() currently > simply returns the subset itself. The problem is that subset can become stale > when relSets get merged (for example in ensureRegistered() and > registerSubset() "merge(set, subset.set)"). As a result, a stale/merged > subset might be returned from registerImpl, and the newly registering subtree > might get registered recursively on top of the stale subset (see > AbstractRelNode.onRegister()). This is a leak because once a relSet/subset is > merged into others and becomes stale, it should not be used to connect new > relNodes. > With CALCITE-3755, subsets can now be directly matched by rules. This opens > another source of stale subset leak: (1) An active subset gets matched, the > RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to > relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) > In OnMatch the rule gets the stale subset, builds new rels on top of it and > regsiter the new rels. In this case, the entire new rel subtree will be > registered on top of the stale subset as is. > Rather than returning the registering subset itself, register should always > use canonize() to find and return the equivalent active subset instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4029) ProjectRemoveRule auto pruning may prevent rules from running if mixed conventions are used in a logical plan
[ https://issues.apache.org/jira/browse/CALCITE-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124143#comment-17124143 ] Botong Huang commented on CALCITE-4029: --- Exactly, closing this Jira. > ProjectRemoveRule auto pruning may prevent rules from running if mixed > conventions are used in a logical plan > -- > > Key: CALCITE-4029 > URL: https://issues.apache.org/jira/browse/CALCITE-4029 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.23.0 >Reporter: Anton Haidai >Priority: Minor > > Preconditions to reproduce the issue: > # Logical plan has mixed conventions (for example, a bottom node is a > TableScan in a final convention while other nodes are regular logical nodes > with NONE convention). > # There is a rule that expects a logical node with an input (like a rule > matching "operand(LogicalSort.class, operand(RelNode.class, any()))") > # A project over the scan is trivial (like SELECT * FROM ...) > The issue is related to https://issues.apache.org/jira/browse/CALCITE-3939, > please see comments for a detailed debugging of a real-life reproducing case. > h4. Example: > Logical plan with a leaf nodes in a custom convention: > {code:java} > LogicalSort[NONE] > LogicalProject[NONE] > CustomScan[CUSTOM_CONVENTION]{code} > A rule configured (RuleX) matches "operand(LogicalSort.class, > operand(RelNode.class, any()))". > *Without ProjectRemoveRule auto pruning* > ProjectRemoveRule recognizes LogicalProject as trivial an merges it into a > single RelSet with CustomScan. > RuleX can run on top of this change as far as LogicalProject has a logical > node (LogicalProject in RelSubset[NONE]) as an input. > > *With ProjectRemoveRule auto pruning* > ProjectRemoveRule recognizes LogicalProject as trivial but removes it with > it's RelSet so the CustomScan is the only node in it's RelSet, > RelSubset[CUSTOM_CONVENTION]. > RuleX can't run on top of this change as far as LogicalProject has an empty > input RelSubset[NONE] of the RelSet with the CustomScan. > h2. Possible workarounds > # Disable ProjectRemoveRule auto pruning. > # Use only logical nodes in a logical plan, for the example above: use > LogicalScan - > CustomScanRule - > CustomScan instead of direct use of > CustomScan. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4029) ProjectRemoveRule auto pruning may prevent rules from running if mixed conventions are used in a logical plan
[ https://issues.apache.org/jira/browse/CALCITE-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120768#comment-17120768 ] Botong Huang commented on CALCITE-4029: --- This hack I proposed indeed won't work. {quote} If you really want it to fire, consider changing the child operand in RuleX to match any convention. (btw, when you write operand(LogicalSort.class, operand(RelNode.class, any())) earlier, this should already match the customScan with a custom convention right? {quote} But again, I insist on this: logical project/sort having a non logical CustomScan as input is a weird design. This is the fundamental reason you are seeing this issue. > ProjectRemoveRule auto pruning may prevent rules from running if mixed > conventions are used in a logical plan > -- > > Key: CALCITE-4029 > URL: https://issues.apache.org/jira/browse/CALCITE-4029 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.23.0 >Reporter: Anton Haidai >Priority: Minor > > Preconditions to reproduce the issue: > # Logical plan has mixed conventions (for example, a bottom node is a > TableScan in a final convention while other nodes are regular logical nodes > with NONE convention). > # There is a rule that expects a logical node with an input (like a rule > matching "operand(LogicalSort.class, operand(RelNode.class, any()))") > # A project over the scan is trivial (like SELECT * FROM ...) > The issue is related to https://issues.apache.org/jira/browse/CALCITE-3939, > please see comments for a detailed debugging of a real-life reproducing case. > h4. Example: > Logical plan with a leaf nodes in a custom convention: > {code:java} > LogicalSort[NONE] > LogicalProject[NONE] > CustomScan[CUSTOM_CONVENTION]{code} > A rule configured (RuleX) matches "operand(LogicalSort.class, > operand(RelNode.class, any()))". > *Without ProjectRemoveRule auto pruning* > ProjectRemoveRule recognizes LogicalProject as trivial an merges it into a > single RelSet with CustomScan. > RuleX can run on top of this change as far as LogicalProject has a logical > node (LogicalProject in RelSubset[NONE]) as an input. > > *With ProjectRemoveRule auto pruning* > ProjectRemoveRule recognizes LogicalProject as trivial but removes it with > it's RelSet so the CustomScan is the only node in it's RelSet, > RelSubset[CUSTOM_CONVENTION]. > RuleX can't run on top of this change as far as LogicalProject has an empty > input RelSubset[NONE] of the RelSet with the CustomScan. > h2. Possible workarounds > # Disable ProjectRemoveRule auto pruning. > # Use only logical nodes in a logical plan, for the example above: use > LogicalScan - > CustomScanRule - > CustomScan instead of direct use of > CustomScan. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3939) Change UnionEliminatorRule and ProjectRemoveRule to auto pruning SubstitutionRule
[ https://issues.apache.org/jira/browse/CALCITE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119765#comment-17119765 ] Botong Huang commented on CALCITE-3939: --- [~anha] Understood (we are on the same page for Step 2). But still I don't think this is a bug. 1. Both before and after the RelSet merge, if there's any other logical relNode in the same RelSet as the trivial project. It will get a match and the RuleX will fire on it. 2. Logical project/sort having a non logical CustomScan input is a weird design. This is the fundamental reason you are seeing this issue. If you really want it to fire, consider changing the child operand in RuleX to match any convention. (btw, when you write operand(LogicalSort.class, operand(RelNode.class, any())) earlier, this should already match the customScan with a custom convention right?) > Change UnionEliminatorRule and ProjectRemoveRule to auto pruning > SubstitutionRule > - > > Key: CALCITE-3939 > URL: https://issues.apache.org/jira/browse/CALCITE-3939 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Major > Fix For: 1.23.0 > > Time Spent: 50m > Remaining Estimate: 0h > > UnionEliminatorRule and ProjectRemoveRule are both pruning rules for a > RelNode. They can also become SubstitutionRule with autoprune enabled -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3939) Change UnionEliminatorRule and ProjectRemoveRule to auto pruning SubstitutionRule
[ https://issues.apache.org/jira/browse/CALCITE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118941#comment-17118941 ] Botong Huang commented on CALCITE-3939: --- Thanks [~anha] for reporting the issue. However, as [~hyuan] mentioned, something is missing in your scenario. Let's say the LogicalSort, TrivialProject, CustomScan are in RelSet 1, 2, 3 respectively. Here's what it should have happened. # RuleX matches LogicalSort and TrivialProject (RuleX.matches called with ). This RuleMatch1 gets queued in RuleQueue. # ProjectRemoveRule fires, it marks TrivialProject as stale, triggers RelSet 2 and 3 to merge, say into RelSet 2. Now both TrivialProject and CustomScan are in RelSet 2. # The planner.fireRules in RelSet.mergeWith will trigger a new round of RuleMatch for all relNodes in RelSet 2. Specifically RuleX should have a new match with LogicalSort and CustomScan (RuleX.matches called with ). This RuleMatch2 is also queued in RuleQueue. # RuleMatch1 pops from ruleQueue, and since TrivialProject is stale, this match is skipped and onMatch not called. # RuleMatch2 pops from ruleQueue, and RuleX.onMatch called with . Please let us know what went wrong here. > Change UnionEliminatorRule and ProjectRemoveRule to auto pruning > SubstitutionRule > - > > Key: CALCITE-3939 > URL: https://issues.apache.org/jira/browse/CALCITE-3939 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Major > Fix For: 1.23.0 > > Time Spent: 50m > Remaining Estimate: 0h > > UnionEliminatorRule and ProjectRemoveRule are both pruning rules for a > RelNode. They can also become SubstitutionRule with autoprune enabled -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3991) the required boolean should always be provided in RelSet.getOrCreateSubset()
Botong Huang created CALCITE-3991: - Summary: the required boolean should always be provided in RelSet.getOrCreateSubset() Key: CALCITE-3991 URL: https://issues.apache.org/jira/browse/CALCITE-3991 Project: Calcite Issue Type: Bug Reporter: Botong Huang the required boolean should always be provided in RelSet.getOrCreateSubset(). Deleting the old default as well as other related code cleanup -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3961) VolcanoPlanner.prunedNodes information is lost when duplicate relNode is discarded
[ https://issues.apache.org/jira/browse/CALCITE-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103628#comment-17103628 ] Botong Huang commented on CALCITE-3961: --- Thanks [~hyuan] for the review! > VolcanoPlanner.prunedNodes information is lost when duplicate relNode is > discarded > -- > > Key: CALCITE-3961 > URL: https://issues.apache.org/jira/browse/CALCITE-3961 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Fix For: 1.23.0 > > Time Spent: 50m > Remaining Estimate: 0h > > VolcanoPlanner.prunedNodes stores the list of relNodes that are marked > useless. Whenever the planner see two identical relNode (e.g. when Relsets > are merged), one of them are discarded. However, when the preserved node is > not in the pruned list while the discarded one is, this pruned information is > lost. In general, we should preserve this info whenever duplicate relNodes > are discarded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3981) Volcano.register should not return stale/merged subset
[ https://issues.apache.org/jira/browse/CALCITE-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3981: -- Description: When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in ensureRegistered() and registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset (see AbstractRelNode.onRegister()). This is a leak because once a relSet/subset is merged into others and becomes stale, it should not be used to connect new relNodes. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the registering subset itself, register should always use canonize() to find the equivalent active subset and return it instead. was: When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in ensureRegistered() and registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset (see AbstractRelNode.onRegister()). This is a leak because once a relSet/subset is merged into others and becomes stale, it should not be used to connect new relNodes. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the subset as it, register should always use canonize() to find the equivalent active subset and return it instead. > Volcano.register should not return stale/merged subset > -- > > Key: CALCITE-3981 > URL: https://issues.apache.org/jira/browse/CALCITE-3981 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > > When a subset is registered, registerImpl() and registerSubset() currently > simply returns the subset itself. The problem is that subset can become stale > when relSets get merged (for example in ensureRegistered() and > registerSubset() "merge(set, subset.set)"). As a result, a stale/merged > subset might be returned from registerImpl, and the newly registering subtree > might get registered recursively on top of the stale subset (see > AbstractRelNode.onRegister()). This is a leak because once a relSet/subset is > merged into others and becomes stale, it should not be used to connect new > relNodes. > With CALCITE-3755, subsets can now be directly matched by rules. This opens > another source of stale subset leak: (1) An active subset gets matched, the > RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to > relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) > In OnMatch the rule gets the stale subset, builds new rels on top of it and > regsiter the new rels. In this case, the entire new rel subtree will be > registered on top of the stale subset as is. > Rather than returning the registering subset itself, register should always > use canonize() to find the equivalent active subset and return it instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3981) Volcano.register should not return stale/merged subset
[ https://issues.apache.org/jira/browse/CALCITE-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3981: -- Description: When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in ensureRegistered() and registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset (see AbstractRelNode.onRegister()). This is a leak because once a relSet/subset is merged into others and becomes stale, it should not be used to connect new relNodes. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the registering subset itself, register should always use canonize() to find and return the equivalent active subset instead. was: When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in ensureRegistered() and registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset (see AbstractRelNode.onRegister()). This is a leak because once a relSet/subset is merged into others and becomes stale, it should not be used to connect new relNodes. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the registering subset itself, register should always use canonize() to find the equivalent active subset and return it instead. > Volcano.register should not return stale/merged subset > -- > > Key: CALCITE-3981 > URL: https://issues.apache.org/jira/browse/CALCITE-3981 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > > When a subset is registered, registerImpl() and registerSubset() currently > simply returns the subset itself. The problem is that subset can become stale > when relSets get merged (for example in ensureRegistered() and > registerSubset() "merge(set, subset.set)"). As a result, a stale/merged > subset might be returned from registerImpl, and the newly registering subtree > might get registered recursively on top of the stale subset (see > AbstractRelNode.onRegister()). This is a leak because once a relSet/subset is > merged into others and becomes stale, it should not be used to connect new > relNodes. > With CALCITE-3755, subsets can now be directly matched by rules. This opens > another source of stale subset leak: (1) An active subset gets matched, the > RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to > relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) > In OnMatch the rule gets the stale subset, builds new rels on top of it and > regsiter the new rels. In this case, the entire new rel subtree will be > registered on top of the stale subset as is. > Rather than returning the registering subset itself, register should always > use canonize() to find and return the equivalent active subset instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3981) Volcano.register should not return stale/merged subset
[ https://issues.apache.org/jira/browse/CALCITE-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3981: -- Description: When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in ensureRegistered() and registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset (see AbstractRelNode.onRegister()). This is a leak because once a relSet/subset is merged into others and becomes stale, it should not be used to connect new relNodes. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the subset as it, register should always use canonize() to find the equivalent active subset and return it instead. was: When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in ensureRegistered() and registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset. This is a leak because once a relSet/subset is merged into others and becomes stale, it should not be used to connect new relNodes. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the subset as it, register should always use canonize() to find the equivalent active subset and return it instead. > Volcano.register should not return stale/merged subset > -- > > Key: CALCITE-3981 > URL: https://issues.apache.org/jira/browse/CALCITE-3981 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > > When a subset is registered, registerImpl() and registerSubset() currently > simply returns the subset itself. The problem is that subset can become stale > when relSets get merged (for example in ensureRegistered() and > registerSubset() "merge(set, subset.set)"). As a result, a stale/merged > subset might be returned from registerImpl, and the newly registering subtree > might get registered recursively on top of the stale subset (see > AbstractRelNode.onRegister()). This is a leak because once a relSet/subset is > merged into others and becomes stale, it should not be used to connect new > relNodes. > With CALCITE-3755, subsets can now be directly matched by rules. This opens > another source of stale subset leak: (1) An active subset gets matched, the > RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to > relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) > In OnMatch the rule gets the stale subset, builds new rels on top of it and > regsiter the new rels. In this case, the entire new rel subtree will be > registered on top of the stale subset as is. > Rather than returning the subset as it, register should always use canonize() > to find the equivalent active subset and return it instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3981) Volcano.register should not return stale/merged subset
[ https://issues.apache.org/jira/browse/CALCITE-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3981: -- Description: When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in ensureRegistered() and registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset. This is a leak because once a relSet/subset is merged into others and becomes stale, it should not be used to connect new relNodes. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the subset as it, register should always use canonize() to find the equivalent active subset and return it instead. was: When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset. This is a leak because once a relSet/subset is merged into others and becomes stale, it should not be used to connect new relNodes. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the subset as it, register should always use canonize() to find the equivalent active subset and return it instead. > Volcano.register should not return stale/merged subset > -- > > Key: CALCITE-3981 > URL: https://issues.apache.org/jira/browse/CALCITE-3981 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > > When a subset is registered, registerImpl() and registerSubset() currently > simply returns the subset itself. The problem is that subset can become stale > when relSets get merged (for example in ensureRegistered() and > registerSubset() "merge(set, subset.set)"). As a result, a stale/merged > subset might be returned from registerImpl, and the newly registering subtree > might get registered recursively on top of the stale subset. This is a leak > because once a relSet/subset is merged into others and becomes stale, it > should not be used to connect new relNodes. > With CALCITE-3755, subsets can now be directly matched by rules. This opens > another source of stale subset leak: (1) An active subset gets matched, the > RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to > relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) > In OnMatch the rule gets the stale subset, builds new rels on top of it and > regsiter the new rels. In this case, the entire new rel subtree will be > registered on top of the stale subset as is. > Rather than returning the subset as it, register should always use canonize() > to find the equivalent active subset and return it instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3981) Volcano.register should not return stale/merged subset
[ https://issues.apache.org/jira/browse/CALCITE-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3981: -- Description: When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset. This is a leak because once a relSet/subset is merged into others and becomes stale, it should not be used to connect new relNodes. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the subset as it, register should always use canonize() to find the equivalent active subset and return it instead. was: When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the subset as it, register should always use canonize() to find the equivalent active subset and return it instead. > Volcano.register should not return stale/merged subset > -- > > Key: CALCITE-3981 > URL: https://issues.apache.org/jira/browse/CALCITE-3981 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > > When a subset is registered, registerImpl() and registerSubset() currently > simply returns the subset itself. The problem is that subset can become stale > when relSets get merged (for example in registerSubset() "merge(set, > subset.set)"). As a result, a stale/merged subset might be returned from > registerImpl, and the newly registering subtree might get registered > recursively on top of the stale subset. This is a leak because once a > relSet/subset is merged into others and becomes stale, it should not be used > to connect new relNodes. > With CALCITE-3755, subsets can now be directly matched by rules. This opens > another source of stale subset leak: (1) An active subset gets matched, the > RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to > relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) > In OnMatch the rule gets the stale subset, builds new rels on top of it and > regsiter the new rels. In this case, the entire new rel subtree will be > registered on top of the stale subset as is. > Rather than returning the subset as it, register should always use canonize() > to find the equivalent active subset and return it instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3981) Volcano.register should not return stale/merged subset
Botong Huang created CALCITE-3981: - Summary: Volcano.register should not return stale/merged subset Key: CALCITE-3981 URL: https://issues.apache.org/jira/browse/CALCITE-3981 Project: Calcite Issue Type: Bug Reporter: Botong Huang When a subset is registered, registerImpl() and registerSubset() currently simply returns the subset itself. The problem is that subset can become stale when relSets get merged (for example in registerSubset() "merge(set, subset.set)"). As a result, a stale/merged subset might be returned from registerImpl, and the newly registering subtree might get registered recursively on top of the stale subset. With CALCITE-3755, subsets can now be directly matched by rules. This opens another source of stale subset leak: (1) An active subset gets matched, the RuleMatch gets queued in RuleQueue. (2) The subset becomes stale due to relSet merge. (3) The rule match in (1) is popped from queue and fired. (4) In OnMatch the rule gets the stale subset, builds new rels on top of it and regsiter the new rels. In this case, the entire new rel subtree will be registered on top of the stale subset as is. Rather than returning the subset as it, register should always use canonize() to find the equivalent active subset and return it instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3947) AbstractRelOptPlanner.classes should be LinkedHashSet so that rule match order is deterministic across runs
[ https://issues.apache.org/jira/browse/CALCITE-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093848#comment-17093848 ] Botong Huang commented on CALCITE-3947: --- Thanks [~julianhyde] for the detailed comment! Let me elaborate a bit and see if it makes sense to you. Firstly, the point of this patch is to make sure the behavior of the planner is identical across runs for easy debugging. Existing code did try to fulfill this purpose, that's why lots of LinkedHashSet/Map are used in the planner already. I simply found a leak in AbstractRelOptPlanner.classes, now a HashSet, whose iteration order does impact the order rules are matched and fired (see description for details), and thus making the behavior underterministic. Secondly, yes keeping it sorted, say with TreeSet, serves the purpose as well. Considering TreeSet is more costly than LinkedHashSet, and that currently there's no additional benefit of keeping it sorted for now, I think LinkedHashSet is a better choice. > AbstractRelOptPlanner.classes should be LinkedHashSet so that rule match > order is deterministic across runs > --- > > Key: CALCITE-3947 > URL: https://issues.apache.org/jira/browse/CALCITE-3947 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > AbstractRelOptPlanner.classes is used by subClasses() to determine things to > put into VolcanoPlanner.classOperands, which is then used in > VolcanoPlanner.fireRules(). Since AbstractRelOptPlanner.classes is now a > HashSet, its iteration order is not deterministic across runs, making > debugging hard. It should be LinkedHashSet just like many other fields in the > planner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3961) VolcanoPlanner.prunedNodes information is lost when duplicate relNode is discarded
Botong Huang created CALCITE-3961: - Summary: VolcanoPlanner.prunedNodes information is lost when duplicate relNode is discarded Key: CALCITE-3961 URL: https://issues.apache.org/jira/browse/CALCITE-3961 Project: Calcite Issue Type: Bug Reporter: Botong Huang VolcanoPlanner.prunedNodes stores the list of relNodes that are marked useless. Whenever the planner see two identical relNode (e.g. when Relsets are merged), one of them are discarded. However, when the preserved node is not in the pruned list while the discarded one is, this pruned information is lost. In general, we should preserve this info whenever duplicate relNodes are discarded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3948) Improve operand's RelSubset matching handling in VolcanoRuleCall
[ https://issues.apache.org/jira/browse/CALCITE-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090103#comment-17090103 ] Botong Huang commented on CALCITE-3948: --- Thanks [~hyuan] for the review! > Improve operand's RelSubset matching handling in VolcanoRuleCall > > > Key: CALCITE-3948 > URL: https://issues.apache.org/jira/browse/CALCITE-3948 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Major > Fix For: 1.23.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > For operands matching for a RelSubset, more handling under various cases are > needed to be consistent in VolcanoRuleCall -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3947) AbstractRelOptPlanner.classes should be LinkedHashSet so that rule match order is deterministic across runs
[ https://issues.apache.org/jira/browse/CALCITE-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089093#comment-17089093 ] Botong Huang commented on CALCITE-3947: --- Thanks [~hyuan] and [~julianhyde] for the quick review! > AbstractRelOptPlanner.classes should be LinkedHashSet so that rule match > order is deterministic across runs > --- > > Key: CALCITE-3947 > URL: https://issues.apache.org/jira/browse/CALCITE-3947 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Fix For: 1.23.0 > > Time Spent: 10m > Remaining Estimate: 0h > > AbstractRelOptPlanner.classes is used by subClasses() to determine things to > put into VolcanoPlanner.classOperands, which is then used in > VolcanoPlanner.fireRules(). Since AbstractRelOptPlanner.classes is now a > HashSet, its iteration order is not deterministic across runs, making > debugging hard. It should be LinkedHashSet just like many other fields in the > planner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3947) AbstractRelOptPlanner.classes should be LinkedHashSet so that rule match order is deterministic across runs
[ https://issues.apache.org/jira/browse/CALCITE-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089083#comment-17089083 ] Botong Huang commented on CALCITE-3947: --- Yes it will. Since the iteration order changes across runs, the order of rule matched and thus fired are different. > AbstractRelOptPlanner.classes should be LinkedHashSet so that rule match > order is deterministic across runs > --- > > Key: CALCITE-3947 > URL: https://issues.apache.org/jira/browse/CALCITE-3947 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > AbstractRelOptPlanner.classes is used by subClasses() to determine things to > put into VolcanoPlanner.classOperands, which is then used in > VolcanoPlanner.fireRules(). Since AbstractRelOptPlanner.classes is now a > HashSet, its iteration order is not deterministic across runs, making > debugging hard. It should be LinkedHashSet just like many other fields in the > planner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3948) Improve operand's RelSubset matching handling in VolcanoRuleCall
Botong Huang created CALCITE-3948: - Summary: Improve operand's RelSubset matching handling in VolcanoRuleCall Key: CALCITE-3948 URL: https://issues.apache.org/jira/browse/CALCITE-3948 Project: Calcite Issue Type: Improvement Reporter: Botong Huang For operands matching for a RelSubset, more handling under various cases are needed to be consistent in VolcanoRuleCall -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3947) AbstractRelOptPlanner.classes should be LinkedHashSet so that rule match order is deterministic across runs
Botong Huang created CALCITE-3947: - Summary: AbstractRelOptPlanner.classes should be LinkedHashSet so that rule match order is deterministic across runs Key: CALCITE-3947 URL: https://issues.apache.org/jira/browse/CALCITE-3947 Project: Calcite Issue Type: Improvement Reporter: Botong Huang AbstractRelOptPlanner.classes is used by subClasses() to determine things to put into VolcanoPlanner.classOperands, which is then used in VolcanoPlanner.fireRules(). Since AbstractRelOptPlanner.classes is now a HashSet, its iteration order is not deterministic across runs, making debugging hard. It should be LinkedHashSet just like many other fields in the planner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3927) RelSubset is not fired for rule when set gets merged
[ https://issues.apache.org/jira/browse/CALCITE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088982#comment-17088982 ] Botong Huang commented on CALCITE-3927: --- Thanks [~hyuan] and everyone for the review and comments! > RelSubset is not fired for rule when set gets merged > > > Key: CALCITE-3927 > URL: https://issues.apache.org/jira/browse/CALCITE-3927 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Haisheng Yuan >Priority: Major > Labels: pull-request-available > Fix For: 1.23.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > In VolcanoPlanner, when set gets merged, planner fires rules again for > RelNodes in both sets, but not for RelSubset. We might miss something because > of this. > If all the logical transformation rules and physical implementation rules are > separated out in different stage and physical rules don't do logical work, we > might be OK. But the reality is that all the things are mixed together at the > moment. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (CALCITE-3927) RelSubset is not fired for rule when set gets merged
[ https://issues.apache.org/jira/browse/CALCITE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088895#comment-17088895 ] Botong Huang edited comment on CALCITE-3927 at 4/21/20, 5:35 PM: - The unit test gives an example of what can happen in real case, where PhysSingleSubsetRule is a physical rule that has an operand matching for RelSubset. When two RelSets having none identical physical traits get merged (e.g. PHYS_CALLING_CONVENTION3 that is in the new RelSet but not in the original RelSet), the RelSubset with the new physical trait that gets merged in will never trigger the rule match again, but it should. In reality, the two physical trait PHYS_CALLING_CONVENTION and PHYS_CALLING_CONVENTION3 with the satisfy relationship can be something like "any" and "hash[0]" was (Author: botong): The unit test gives an example of what can happen in real case, where PhysSingleSubsetRule is a physical rule that has an operand matching for RelSubset. When two RelSets having none identical physical traits get merged (e.g. PHYS_CALLING_CONVENTION3 that is in the new RelSet but not in the original RelSet), the RelSubset with the new physical trait that gets merged in will never trigger the rule match again, but it should. > RelSubset is not fired for rule when set gets merged > > > Key: CALCITE-3927 > URL: https://issues.apache.org/jira/browse/CALCITE-3927 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Haisheng Yuan >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In VolcanoPlanner, when set gets merged, planner fires rules again for > RelNodes in both sets, but not for RelSubset. We might miss something because > of this. > If all the logical transformation rules and physical implementation rules are > separated out in different stage and physical rules don't do logical work, we > might be OK. But the reality is that all the things are mixed together at the > moment. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3927) RelSubset is not fired for rule when set gets merged
[ https://issues.apache.org/jira/browse/CALCITE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088895#comment-17088895 ] Botong Huang commented on CALCITE-3927: --- The unit test gives an example of what can happen in real case, where PhysSingleSubsetRule is a physical rule that has an operand matching for RelSubset. When two RelSets having none identical physical traits get merged (e.g. PHYS_CALLING_CONVENTION3 that is in the new RelSet but not in the original RelSet), the RelSubset with the new physical trait that gets merged in will never trigger the rule match again, but it should. > RelSubset is not fired for rule when set gets merged > > > Key: CALCITE-3927 > URL: https://issues.apache.org/jira/browse/CALCITE-3927 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Haisheng Yuan >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In VolcanoPlanner, when set gets merged, planner fires rules again for > RelNodes in both sets, but not for RelSubset. We might miss something because > of this. > If all the logical transformation rules and physical implementation rules are > separated out in different stage and physical rules don't do logical work, we > might be OK. But the reality is that all the things are mixed together at the > moment. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3939) Change UnionEliminatorRule and ProjectRemoveRule to auto pruning SubstitutionRule
[ https://issues.apache.org/jira/browse/CALCITE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087987#comment-17087987 ] Botong Huang commented on CALCITE-3939: --- Title changed, please let me know if it is clearer, thx! > Change UnionEliminatorRule and ProjectRemoveRule to auto pruning > SubstitutionRule > - > > Key: CALCITE-3939 > URL: https://issues.apache.org/jira/browse/CALCITE-3939 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > UnionEliminatorRule and ProjectRemoveRule are both pruning rules for a > RelNode. They can also become SubstitutionRule with autoprune enabled -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3939) Change UnionEliminatorRule and ProjectRemoveRule to auto pruning SubstitutionRule
[ https://issues.apache.org/jira/browse/CALCITE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3939: -- Summary: Change UnionEliminatorRule and ProjectRemoveRule to auto pruning SubstitutionRule (was: more auto pruning rules after SubstitutionRule is introduced) > Change UnionEliminatorRule and ProjectRemoveRule to auto pruning > SubstitutionRule > - > > Key: CALCITE-3939 > URL: https://issues.apache.org/jira/browse/CALCITE-3939 > Project: Calcite > Issue Type: Improvement >Reporter: Botong Huang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > UnionEliminatorRule and ProjectRemoveRule are both pruning rules for a > RelNode. They can also become SubstitutionRule with autoprune enabled -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3939) more auto pruning rules after SubstitutionRule is introduced
Botong Huang created CALCITE-3939: - Summary: more auto pruning rules after SubstitutionRule is introduced Key: CALCITE-3939 URL: https://issues.apache.org/jira/browse/CALCITE-3939 Project: Calcite Issue Type: Improvement Reporter: Botong Huang UnionEliminatorRule and ProjectRemoveRule are both pruning rules for a RelNode. They can also become SubstitutionRule with autoprune enabled -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3927) RelSubset is not fired for rule when set gets merged
[ https://issues.apache.org/jira/browse/CALCITE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085120#comment-17085120 ] Botong Huang commented on CALCITE-3927: --- Uploaded a patch with test case, please take a look. > RelSubset is not fired for rule when set gets merged > > > Key: CALCITE-3927 > URL: https://issues.apache.org/jira/browse/CALCITE-3927 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Haisheng Yuan >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In VolcanoPlanner, when set gets merged, planner fires rules again for > RelNodes in both sets, but not for RelSubset. We might miss something because > of this. > If all the logical transformation rules and physical implementation rules are > separated out in different stage and physical rules don't do logical work, we > might be OK. But the reality is that all the things are mixed together at the > moment. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3227) IndexOutOfBound when checking candidate parent match's input ordinal in VolcanoRuleCall
Botong Huang created CALCITE-3227: - Summary: IndexOutOfBound when checking candidate parent match's input ordinal in VolcanoRuleCall Key: CALCITE-3227 URL: https://issues.apache.org/jira/browse/CALCITE-3227 Project: Calcite Issue Type: Bug Reporter: Botong Huang In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, looking for parent operand match), we want to check that the candidate parent relNode indeed has the previously matched relNode as a child with the right ordinal. However, some candidate parent can have less number of inputs than the parent operand, and thus we hit IndexOutOfBound when trying to grab the correct child for checking. In the added unit test that repro the bug, we have a union with two inputs of class PhysLeafRel. The rule however, matches a union with three inputs, with the third child operand matching for PhysLeafRel.class. When a child relNode gets matched to the third child operand, we go up trying to see whether the union relNode can match the parent. Trying to access the union's third input hits the IndexOutOfBound error because the union only has two inputs. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (CALCITE-3118) VolcanoRuleCall should look at RelSubset rather than RelSet when checking child ordinal of a parent operand
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3118: -- Summary: VolcanoRuleCall should look at RelSubset rather than RelSet when checking child ordinal of a parent operand (was: Properly check child ordinal when matching parent operand in VolcanoRuleCall) > VolcanoRuleCall should look at RelSubset rather than RelSet when checking > child ordinal of a parent operand > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched RelNode as a child *with the* *expected > child ordinal*. > However, there is a bug in this child ordinal check: > {noformat} > 333if (ascending && operand.childPolicy != > RelOptRuleOperandChildPolicy.UNORDERED) { > 334 // We know that the previous operand was *a* child of its parent, > 335 // but now check that it is the *correct* child. > 336 final RelSubset input = > 337 (RelSubset) rel.getInput(previousOperand.ordinalInParent); > 338 List inputRels = input.set.getRelsFromAllSubsets(); > 339 if (!inputRels.contains(previous)) { > 340continue; > 341 } > 342} > {noformat} > We intend to make sure that "previous" is in Subset "input". However line 338 > looked at RelNodes from the entire RelSet, rather than RelNodes only in > Subset "input". As a result, in some cases, incorrect parent is not skipped > as expected and is matched incorrectly. > The unit test case included is a case that triggers this bug, where a second > child RelNode incorrectly get matched as the first child of the parent > RelNode. > -- > Here's a detailed explanation of the test case that triggers the bug > We constructed a RelNode structure: > {noformat} > PhysBiRel > / \ > Subset1 Subset2 > | | > leftPhyrightPhy > {noformat} > Where PhysBiRel has two inputs: leftPhy and rightPhy, both are logically > equivalent, but with different traits (Convention in this test case), and > thus are in different subsets. > (Two Rels in two subsets in the same RelSet is a necessary condition to > trigger this bug. ) > A rule AssertOperandsDifferentRule is constructed as follows: > {noformat} > operand(PhysBiRel.class, > operand(PhysLeafRel.class, any()), > operand(PhysLeafRel.class, any())) > {noformat} > Obviously the correct match would be: > {noformat} > PhysBiRel > / \ > leftPhyrightPhy > {noformat} > However, with the bug, another match is also returned: > {noformat} > PhysBiRel > / \ > rightPhyrightPhy > {noformat} > *This is wrong because rightPhy is not PhysBiRel's first input at all, and > should not match as parent operand's first child.* > > Here's how the incorrect match occurs. > 1. When rightPhy of class PhysLeafRel is registered, we attempt to match it > to both the left and right child operands of AssertOperandsDifferentRule > above. This is expected. > 2. When matched to the right child operand, it eventually leads to the > correct match result above. > 3. When matched to the left child operand, it should have skipped/failed > matching the parent operand to PhysBiRel because rightPhy is *NOT* > PhysBiRel's first input. But because of the bug, the match succeeded. After > parent is matched, then it attempt to match the right child operand, and > again matched the rightPhy. As a result, both child operand end up matching > the same RelNode rightPhy. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (CALCITE-3118) Properly check child ordinal when matching parent operand in VolcanoRuleCall
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3118: -- Summary: Properly check child ordinal when matching parent operand in VolcanoRuleCall (was: VolcanoRuleCall match parent child ordinal not properly checked) > Properly check child ordinal when matching parent operand in VolcanoRuleCall > > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched RelNode as a child *with the* *expected > child ordinal*. > However, there is a bug in this child ordinal check: > {noformat} > 333if (ascending && operand.childPolicy != > RelOptRuleOperandChildPolicy.UNORDERED) { > 334 // We know that the previous operand was *a* child of its parent, > 335 // but now check that it is the *correct* child. > 336 final RelSubset input = > 337 (RelSubset) rel.getInput(previousOperand.ordinalInParent); > 338 List inputRels = input.set.getRelsFromAllSubsets(); > 339 if (!inputRels.contains(previous)) { > 340continue; > 341 } > 342} > {noformat} > We intend to make sure that "previous" is in Subset "input". However line 338 > looked at RelNodes from the entire RelSet, rather than RelNodes only in > Subset "input". As a result, in some cases, incorrect parent is not skipped > as expected and is matched incorrectly. > The unit test case included is a case that triggers this bug, where a second > child RelNode incorrectly get matched as the first child of the parent > RelNode. > -- > Here's a detailed explanation of the test case that triggers the bug > We constructed a RelNode structure: > {noformat} > PhysBiRel > / \ > Subset1 Subset2 > | | > leftPhyrightPhy > {noformat} > Where PhysBiRel has two inputs: leftPhy and rightPhy, both are logically > equivalent, but with different traits (Convention in this test case), and > thus are in different subsets. > (Two Rels in two subsets in the same RelSet is a necessary condition to > trigger this bug. ) > A rule AssertOperandsDifferentRule is constructed as follows: > {noformat} > operand(PhysBiRel.class, > operand(PhysLeafRel.class, any()), > operand(PhysLeafRel.class, any())) > {noformat} > Obviously the correct match would be: > {noformat} > PhysBiRel > / \ > leftPhyrightPhy > {noformat} > However, with the bug, another match is also returned: > {noformat} > PhysBiRel > / \ > rightPhyrightPhy > {noformat} > *This is wrong because rightPhy is not PhysBiRel's first input at all, and > should not match as parent operand's first child.* > > Here's how the incorrect match occurs. > 1. When rightPhy of class PhysLeafRel is registered, we attempt to match it > to both the left and right child operands of AssertOperandsDifferentRule > above. This is expected. > 2. When matched to the right child operand, it eventually leads to the > correct match result above. > 3. When matched to the left child operand, it should have skipped/failed > matching the parent operand to PhysBiRel because rightPhy is *NOT* > PhysBiRel's first input. But because of the bug, the match succeeded. After > parent is matched, then it attempt to match the right child operand, and > again matched the rightPhy. As a result, both child operand end up matching > the same RelNode rightPhy. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3118: -- Description: In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, looking for parent operand match), we want to check that the matched parent indeed has the previously matched RelNode as a child *with the* *expected child ordinal*. However, there is a bug in this child ordinal check: {noformat} 333if (ascending && operand.childPolicy != RelOptRuleOperandChildPolicy.UNORDERED) { 334 // We know that the previous operand was *a* child of its parent, 335 // but now check that it is the *correct* child. 336 final RelSubset input = 337 (RelSubset) rel.getInput(previousOperand.ordinalInParent); 338 List inputRels = input.set.getRelsFromAllSubsets(); 339 if (!inputRels.contains(previous)) { 340continue; 341 } 342} {noformat} We intend to make sure that "previous" is in Subset "input". However line 338 looked at RelNodes from the entire RelSet, rather than RelNodes only in Subset "input". As a result, in some cases, incorrect parent is not skipped as expected and is matched incorrectly. The unit test case included is a case that triggers this bug, where a second child RelNode incorrectly get matched as the first child of the parent RelNode. -- Here's a detailed explanation of the test case that triggers the bug We constructed a RelNode structure: {noformat} PhysBiRel / \ Subset1 Subset2 | | leftPhyrightPhy {noformat} Where PhysBiRel has two inputs: leftPhy and rightPhy, both are logically equivalent, but with different traits (Convention in this test case), and thus are in different subsets. (Two Rels in two subsets in the same RelSet is a necessary condition to trigger this bug. ) A rule AssertOperandsDifferentRule is constructed as follows: {noformat} operand(PhysBiRel.class, operand(PhysLeafRel.class, any()), operand(PhysLeafRel.class, any())) {noformat} Obviously the correct match would be: {noformat} PhysBiRel / \ leftPhyrightPhy {noformat} However, with the bug, another match is also returned: {noformat} PhysBiRel / \ rightPhyrightPhy {noformat} *This is wrong because rightPhy is not PhysBiRel's first input at all, and should not match as parent operand's first child.* Here's how the incorrect match occurs. 1. When rightPhy of class PhysLeafRel is registered, we attempt to match it to both the left and right child operands of AssertOperandsDifferentRule above. This is expected. 2. When matched to the right child operand, it eventually leads to the correct match result above. 3. When matched to the left child operand, it should have skipped/failed matching the parent operand to PhysBiRel because rightPhy is *NOT* PhysBiRel's first input. But because of the bug, the match succeeded. After parent is matched, then it attempt to match the right child operand, and again matched the rightPhy. As a result, both child operand end up matching the same RelNode rightPhy. was: In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, looking for parent operand match), we want to check that the matched parent indeed has the previously matched RelNode as a child *with the* *expected child ordinal*. However, there is a bug in this child ordinal check: {noformat} 333if (ascending && operand.childPolicy != RelOptRuleOperandChildPolicy.UNORDERED) { 334 // We know that the previous operand was *a* child of its parent, 335 // but now check that it is the *correct* child. 336 final RelSubset input = 337 (RelSubset) rel.getInput(previousOperand.ordinalInParent); 338 List inputRels = input.set.getRelsFromAllSubsets(); 339 if (!inputRels.contains(previous)) { 340continue; 341 } 342} {noformat} We intend to make sure that "previous" is in Subset "input". However line 338 looked at RelNodes from the entire RelSet, rather than RelNodes only in Subset "input". As a result, in some cases, incorrect parent is not skipped as expected and matched incorrectly. The unit test case included is a case that triggers this bug, where a second child RelNode incorrectly get matched as the first child of the parent RelNode. -- Here's a detailed explanation of the test case that triggers the bug We constructed a RelNode structure: {noformat} PhysBiRel / \ Subset1 Subset2 | | leftPhyrightPhy {noformat} Where PhysBiRel has two inputs: leftPhy and rightPhy, both are logically equivalent, but with different traits (Convention in this test case), and thus are in different subsets.
[jira] [Updated] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3118: -- Description: In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, looking for parent operand match), we want to check that the matched parent indeed has the previously matched RelNode as a child *with the* *expected child ordinal*. However, there is a bug in this child ordinal check: {noformat} 333if (ascending && operand.childPolicy != RelOptRuleOperandChildPolicy.UNORDERED) { 334 // We know that the previous operand was *a* child of its parent, 335 // but now check that it is the *correct* child. 336 final RelSubset input = 337 (RelSubset) rel.getInput(previousOperand.ordinalInParent); 338 List inputRels = input.set.getRelsFromAllSubsets(); 339 if (!inputRels.contains(previous)) { 340continue; 341 } 342} {noformat} We intend to make sure that "previous" is in Subset "input". However line 338 looked at RelNodes from the entire RelSet, rather than RelNodes only in Subset "input". As a result, in some cases, incorrect parent is not skipped as expected and matched incorrectly. The unit test case included is a case that triggers this bug, where a second child RelNode incorrectly get matched as the first child of the parent RelNode. -- Here's a detailed explanation of the test case that triggers the bug We constructed a RelNode structure: {noformat} PhysBiRel / \ Subset1 Subset2 | | leftPhyrightPhy {noformat} Where PhysBiRel has two inputs: leftPhy and rightPhy, both are logically equivalent, but with different traits (Convention in this test case), and thus are in different subsets. (Two Rels in two subsets in the same RelSet is a necessary condition to trigger this bug. ) A rule AssertOperandsDifferentRule is constructed as follows: {noformat} operand(PhysBiRel.class, operand(PhysLeafRel.class, any()), operand(PhysLeafRel.class, any())) {noformat} Obviously the correct match would be: {noformat} PhysBiRel / \ leftPhyrightPhy {noformat} However, with the bug, another match is also returned: {noformat} PhysBiRel / \ rightPhyrightPhy {noformat} *This is wrong because rightPhy is not PhysBiRel's first input at all, and should not match as parent operand's first child.* Here's how the incorrect match occurs. 1. When rightPhy is registered, we attempt to match it to both the left and right child operand of AssertOperandsDifferentRule above. 2. When matched to the right child operand, it eventually leads to the correct match result above. 3. When matched to the left child operand, it should have skipped/failed matching the parent operand to PhysBiRel because rightPhy is not!!! PhysBiRel's first input. But because of the bug, the match succeeded. After parent is matched, then it attempt to match the right child operand, and again matched the rightPhy as expected. As a result, both child operand end up matching the same RelNode rightPhy. was:In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, looking for parent operand match), we want to check that the matched parent indeed has the previously matched child RelNode as a child with the expected ordinal. However, there is a bug in this check. As a result, some incorrect parent is not skipped as expected and matched incorrectly. See unit test included in PR for a case that triggers this bug, where a second child RelNode get matched as first child of the parent RelNode. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched RelNode as a child *with the* *expected > child ordinal*. > However, there is a bug in this child ordinal check: > {noformat} > 333if (ascending && operand.childPolicy != > RelOptRuleOperandChildPolicy.UNORDERED) { > 334 // We know that the previous operand was *a* child of its parent, > 335 // but now check that it is the *correct* child. > 336 final RelSubset input = > 337 (RelSubset) rel.getInput(previousOperand.ordinalInParent); > 338 List inputRels = input.set
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859312#comment-16859312 ] Botong Huang edited comment on CALCITE-3118 at 6/8/19 9:29 PM: --- {quote} Firstly you construct a RelNode tree like what the pattern describes, then you register that the `leftPhy` and `rightPhy` are equivalent. It's not that surprising you got a tree like: PhysBiRel / \ rightPhy rightPhy This tree also satisfied AssertOperandsDifferentRule patttern. {quote} leftPhy and rightPhy are indeed logically equivalent, but they have different traits (Convention in this test case), and thus are in different subsets. The actual Rel structure is: {noformat} PhysBiRel / \ Subset1 Subset2 | | leftPhyrightPhy {noformat} Two Rels in two subsets in the same RelSet is a necessary condition to trigger this bug. The reason that I think rightPhy should not match the left child operand is because of rightPhy's child ordinal in parent. When AssertOperandsDifferentRule says {noformat} operand(PhysBiRel.class, operand(PhysLeafRel.class, any()), operand(PhysLeafRel.class, any())) {noformat} it implies RelOptRuleOperandChildPolicy.SOME, whose javadoc says "Signifies that the operand's children must precisely match its child operands, in order." rightPhy is PhysBiRel's second input, and it should not match as the first child of the parent operand. It violates the "precisely match in order" part. In fact, rightPhy is not PhysBiRel's first input at all. I really don't think this is a valid match. Enforcing child ordinal is exactly what line 334-341 in VolcanoRuleCall is trying to do, as the code comment says: {noformat} // We know that the previous operand was *a* child of its parent, // but now check that it is the *correct* child. {noformat} Except that in this unit test's case, the code failed to detect and skip (continue at line 340) this wrong match because it looked at all Rels in the RelSet instead of looking at the Rels in the Subset only. This is why I insist this is a bug. Hopefully it is clearer now. was (Author: botong): {quote} Firstly you construct a RelNode tree like what the pattern describes, then you register that the `leftPhy` and `rightPhy` are equivalent. It's not that surprising you got a tree like: PhysBiRel / \ rightPhy rightPhy This tree also satisfied AssertOperandsDifferentRule patttern. {quote} leftPhy and rightPhy are indeed logically equivalent, but they have different traits (Convention in this test case), and thus are in different subsets. The actual Rel structure is: {noformat} PhysBiRel / \ Subset1 Subset2 | | leftPhyrightPhy {noformat} Two Rels in two subsets in the same RelSet is a necessary condition to trigger this bug. The reason that I think rightPhy should not match the left child operand is because of rightPhy's child ordinal in parent. When AssertOperandsDifferentRule says {noformat} operand(PhysBiRel.class, operand(PhysLeafRel.class, any()), operand(PhysLeafRel.class, any())) {noformat} it implies RelOptRuleOperandChildPolicy.SOME, whose javadoc says "Signifies that the operand's children must precisely match its child operands, in order." rightPhy is PhysBiRel's second input, and it should not match as the first child of the parent operand. It violates the "precisely match in order" part. Enforcing child ordinal is exactly what line 334-341 in VolcanoRuleCall is trying to do, as the code comment says: {noformat} // We know that the previous operand was *a* child of its parent, // but now check that it is the *correct* child. {noformat} Except that in this unit test's case, the code failed to detect and skip (continue at line 340) this wrong match because it looked at all Rels in the RelSet instead of looking at the Rels in the Subset only. This is why I insist this is a bug. Hopefully it is clearer now. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case t
[jira] [Commented] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859312#comment-16859312 ] Botong Huang commented on CALCITE-3118: --- {quote} Firstly you construct a RelNode tree like what the pattern describes, then you register that the `leftPhy` and `rightPhy` are equivalent. It's not that surprising you got a tree like: PhysBiRel / \ rightPhy rightPhy This tree also satisfied AssertOperandsDifferentRule patttern. {quote} leftPhy and rightPhy are indeed logically equivalent, but they have different traits (Convention in this test case), and thus are in different subsets. The actual Rel structure is: {noformat} PhysBiRel / \ Subset1 Subset2 | | leftPhyrightPhy {noformat} Two Rels in two subsets in the same RelSet is a necessary condition to trigger this bug. The reason that I think rightPhy should not match the left child operand is because of rightPhy's child ordinal in parent. When AssertOperandsDifferentRule says {noformat} operand(PhysBiRel.class, operand(PhysLeafRel.class, any()), operand(PhysLeafRel.class, any())) {noformat} it implies RelOptRuleOperandChildPolicy.SOME, whose javadoc says "Signifies that the operand's children must precisely match its child operands, in order." rightPhy is PhysBiRel's second input, and it should not match as the first child of the parent operand. It violates the "precisely match in order" part. Enforcing child ordinal is exactly what line 334-341 in VolcanoRuleCall is trying to do, as the code comment says: {noformat} // We know that the previous operand was *a* child of its parent, // but now check that it is the *correct* child. {noformat} Except that in this unit test's case, the code failed to detect and skip (continue at line 340) this wrong match because it looked at all Rels in the RelSet instead of looking at the Rels in the Subset only. This is why I insist this is a bug. Hopefully it is clearer now. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where a second child > RelNode get matched as first child of the parent RelNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858693#comment-16858693 ] Botong Huang edited comment on CALCITE-3118 at 6/8/19 4:13 AM: --- {quote}It is not wrong provided the children are identical {quote} The leftPhy and rightPhy are logically equivalent, but has different traits and are in different subsets. I think you missed my point. The structure the _AssertOperandsDifferentRule_ is looking for is: A / \ B C where *B is A's first child*. What the match returned was: Rel1 / \ Rel2.1 Rel2.2 where Rel2.1 is *NOT* Rel1's first child. was (Author: botong): {quote}It is not wrong provided the children are identical {quote} I think you missed my point. The structure the _AssertOperandsDifferentRule_ is looking for is: A / \ B C where *B is A's first child*. What the match returned was: Rel1 / \ Rel2.1 Rel2.2 where Rel2.1 is *NOT* Rel1's first child. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where a second child > RelNode get matched as first child of the parent RelNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated CALCITE-3118: -- Description: In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, looking for parent operand match), we want to check that the matched parent indeed has the previously matched child RelNode as a child with the expected ordinal. However, there is a bug in this check. As a result, some incorrect parent is not skipped as expected and matched incorrectly. See unit test included in PR for a case that triggers this bug, where a second child RelNode get matched as first child of the parent RelNode. (was: In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, looking for parent operand match), we want to check that the matched parent indeed has the previously matched child RelNode as a child with the expected ordinal. However, there is a bug in this check. As a result, some incorrect parent is not skipped as expected and matched incorrectly. See unit test included in PR for a case that triggers this bug, where the same RelNode get matched for two operands at the same time. ) > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where a second child > RelNode get matched as first child of the parent RelNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858699#comment-16858699 ] Botong Huang commented on CALCITE-3118: --- bq. Or it might happen you could rewrite the test-case in easy to understand way. The unit test is indeed involved because of the tricky condition of the bug. However I don't know if there's an easier way to trigger it... At least I hope you agree with the one line fix... > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858178#comment-16858178 ] Botong Huang edited comment on CALCITE-3118 at 6/7/19 2:24 PM: --- Agree in general. But in this test case, the rule in this unit test (AssertOperandsDifferentRule) is to match a parent and its first and second child, it is wrong to return a match with a parent relNode and its second child twice. was (Author: botong): Agree in general. But in this test case, the rule in this unit test (AssertOperandsDifferentRule) is to match a parent and its first and second child, it is wrong to return a match with a parent relNode and both its second child. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858693#comment-16858693 ] Botong Huang edited comment on CALCITE-3118 at 6/7/19 2:20 PM: --- {quote}It is not wrong provided the children are identical {quote} I think you missed my point. The structure the _AssertOperandsDifferentRule_ is looking for is: A / \ B C where *B is A's first child*. What the match returned was: Rel1 / \ Rel2.1 Rel2.2 where Rel2.1 is *NOT* Rel1's first child. was (Author: botong): {quote}It is not wrong provided the children are identical {quote} I think you missed my point. The structure the _AssertOperandsDifferentRule_ is looking for is: A / \ B C where *B is A's first child*. What the match returned was: Rel1 / \ Rel2.1 Rel2.2 where Rel2.1 is *NOT* A's first child. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858693#comment-16858693 ] Botong Huang edited comment on CALCITE-3118 at 6/7/19 2:19 PM: --- {quote}It is not wrong provided the children are identical {quote} I think you missed my point. The structure the _AssertOperandsDifferentRule_ is looking for is: A / \ B C where *B is A's first child*. What the match returned was: Rel1 / \ Rel2.1 Rel2.2 where Rel2.1 is *NOT* A's first child. was (Author: botong): bq. It is not wrong provided the children are identical I think you missed my point. The structure the _AssertOperandsDifferentRule_ is looking for is: A / \ B C where *B is A's first child*. What the match returned was: Rel1 / \ Rel2.1 Rel2.2 where Rel2.1 is *NOT* A's first child. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858693#comment-16858693 ] Botong Huang commented on CALCITE-3118: --- bq. It is not wrong provided the children are identical I think you missed my point. The structure the _AssertOperandsDifferentRule_ is looking for is: A / \ B C where *B is A's first child*. What the match returned was: Rel1 / \ Rel2.1 Rel2.2 where Rel2.1 is *NOT* A's first child. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858653#comment-16858653 ] Botong Huang edited comment on CALCITE-3118 at 6/7/19 1:39 PM: --- Thanks [~danny0405] for reviewing! Having two relNodes of the same class in two RelSubsets in the same RelSet (thus the equivalent line) is the necessary condition to trigger this bug. However, using convention in the unit test is not a must. It's merely to produce the two subSets. Any two traits that are different would suffice. Also as in AssertOperandsDifferentRule, it is not trying to match any convention. We are simply trying to match a parent with its first and second children being PhysLeafRel.class, it is wrong to return a match with a parent relNode and its second children twice. was (Author: botong): Thanks [~danny0405] for reviewing! Having two relNodes of the same class in two RelSubsets in the same RelSet (thus the equivalent line) is the necessary condition to trigger this bug. However, using convention in the unit test is not a must. It's merely to produce the two subSets. Any two traits that are different would suffice. Also as in AssertOperandsDifferentRule, it is not trying to match any convention. We are simply trying to match a parent with its first and second children being PhysLeafRel.class, it is wrong to return a match with a parent relNode and both its second children. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858053#comment-16858053 ] Botong Huang edited comment on CALCITE-3118 at 6/7/19 1:38 PM: --- Sure, supposedly this rule should only match once, with left child operand == leftPhy (PHYS, label = a) and right child operand == rightPhy (PHYS_2, label == b). However, without the fix, when rightPhy gets generated, it triggers two match attempts. The first one is expected as above. In the second one, rightPhy (PHYS_2, label == b) matches the left child operand, then we attempt to match the PhyBiRel as parent operand. Supposedly this parent should not match, because rightPhy is PhyBiRel's second child, but rightPhy's matched left child operand is parent operand's first child. Because of the bug, it didn't skip as expected and still proceed with the matching. After parent is matched, then it attempt to match the right child operand, and again matched the rightPhy (PHYS_2, label == b), as expected because it is PhyBiRel's second child. Overall as a result, both child operand end up matching the same RelNode rightPhy (PHYS_2, label == b). To recap, AssertOperandsDifferentRule is trying to match a parent with its first and second children being PhysLeafRel.class, it is wrong to return a match with a parent relNode and both its second children. Hope this is clearer now. was (Author: botong): Sure, supposedly this rule should only match once, with left child operand == leftPhy (PHYS, label = a) and right child operand == rightPhy (PHYS_2, label == b). However, without the fix, when rightPhy gets generated, it triggers two match attempts. The first one is expected as above. In the second one, rightPhy (PHYS_2, label == b) matches the left child operand, then we attempt to match the PhyBiRel as parent operand. Supposedly this parent should not match, because rightPhy is PhyBiRel's second child, but rightPhy's matched left child operand is parent operand's first child. Because of the bug, it didn't skip as expected and still proceed with the matching. After parent is matched, then it attempt to match the right child operand, and again matched the rightPhy (PHYS_2, label == b), as expected because it is PhyBiRel's second child. Overall as a result, both child operand end up matching the same RelNode rightPhy (PHYS_2, label == b). Hope this is clearer now. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858053#comment-16858053 ] Botong Huang edited comment on CALCITE-3118 at 6/7/19 1:39 PM: --- Sure, supposedly this rule should only match once, with left child operand == leftPhy (PHYS, label = a) and right child operand == rightPhy (PHYS_2, label == b). However, without the fix, when rightPhy gets generated, it triggers two match attempts. The first one is expected as above. In the second one, rightPhy (PHYS_2, label == b) matches the left child operand, then we attempt to match the PhyBiRel as parent operand. Supposedly this parent should not match, because rightPhy is PhyBiRel's second child, but rightPhy's matched left child operand is parent operand's first child. Because of the bug, it didn't skip as expected and still proceed with the matching. After parent is matched, then it attempt to match the right child operand, and again matched the rightPhy (PHYS_2, label == b), as expected because it is PhyBiRel's second child. Overall as a result, both child operand end up matching the same RelNode rightPhy (PHYS_2, label == b). To recap, AssertOperandsDifferentRule is trying to match a parent with its first and second children being PhysLeafRel.class, it is wrong to return a match with a parent relNode and its second children twice. Hope this is clearer now. was (Author: botong): Sure, supposedly this rule should only match once, with left child operand == leftPhy (PHYS, label = a) and right child operand == rightPhy (PHYS_2, label == b). However, without the fix, when rightPhy gets generated, it triggers two match attempts. The first one is expected as above. In the second one, rightPhy (PHYS_2, label == b) matches the left child operand, then we attempt to match the PhyBiRel as parent operand. Supposedly this parent should not match, because rightPhy is PhyBiRel's second child, but rightPhy's matched left child operand is parent operand's first child. Because of the bug, it didn't skip as expected and still proceed with the matching. After parent is matched, then it attempt to match the right child operand, and again matched the rightPhy (PHYS_2, label == b), as expected because it is PhyBiRel's second child. Overall as a result, both child operand end up matching the same RelNode rightPhy (PHYS_2, label == b). To recap, AssertOperandsDifferentRule is trying to match a parent with its first and second children being PhysLeafRel.class, it is wrong to return a match with a parent relNode and both its second children. Hope this is clearer now. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858653#comment-16858653 ] Botong Huang commented on CALCITE-3118: --- Thanks [~danny0405] for reviewing! Having two relNodes of the same class in two RelSubsets in the same RelSet (thus the equivalent line) is the necessary condition to trigger this bug. However, using convention in the unit test is not a must. It's merely to produce the two subSets. Any two traits that are different would suffice. Also as in AssertOperandsDifferentRule, it is not trying to match any convention. We are simply trying to match a parent with its first and second children being PhysLeafRel.class, it is wrong to return a match with a parent relNode and both its second children. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858178#comment-16858178 ] Botong Huang edited comment on CALCITE-3118 at 6/7/19 1:27 PM: --- Agree in general. But in this test case, the rule in this unit test (AssertOperandsDifferentRule) is to match a parent and its first and second child, it is wrong to return a match with a parent relNode and both its second child. was (Author: botong): Agree in general. But in this test case, the rule in this unit test (AssertOperandsDifferentRule) is to match a parent and its first and second child, it is wrong to return a match with a relNode and both its second child. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858178#comment-16858178 ] Botong Huang edited comment on CALCITE-3118 at 6/6/19 11:29 PM: Agree in general. But in this test case, the rule in this unit test (AssertOperandsDifferentRule) is to match a parent and its first and second child, it is wrong to return a match with a relNode and both its second child. was (Author: botong): Agree in general. But in this test case, the rule (AssertOperandsDifferentRule) is to match a parent and its first and second child, it is wrong to return a match with a relNode and both its second child. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858178#comment-16858178 ] Botong Huang commented on CALCITE-3118: --- Agree in general. But in this test case, the rule (AssertOperandsDifferentRule) is to match a parent and its first and second child, it is wrong to return a match with a relNode and both its second child. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858053#comment-16858053 ] Botong Huang commented on CALCITE-3118: --- Sure, supposedly this rule should only match once, with left child operand == leftPhy (PHYS, label = a) and right child operand == rightPhy (PHYS_2, label == b). However, without the fix, when rightPhy gets generated, it triggers two match attempts. The first one is expected as above. In the second one, rightPhy (PHYS_2, label == b) matches the left child operand, then we attempt to match the PhyBiRel as parent operand. Supposedly this parent should not match, because rightPhy is PhyBiRel's second child, but rightPhy's matched left child operand is parent operand's first child. Because of the bug, it didn't skip as expected and still proceed with the matching. After parent is matched, then it attempt to match the right child operand, and again matched the rightPhy (PHYS_2, label == b), as expected because it is PhyBiRel's second child. Overall as a result, both child operand end up matching the same RelNode rightPhy (PHYS_2, label == b). Hope this is clearer now. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858038#comment-16858038 ] Botong Huang commented on CALCITE-3118: --- New version pushed, can you take another look and see if it is better now? > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858020#comment-16858020 ] Botong Huang commented on CALCITE-3118: --- Sure, let me try make the unit test more readable. On a side note, is unit test required for this kind of fix? > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857997#comment-16857997 ] Botong Huang edited comment on CALCITE-3118 at 6/6/19 7:08 PM: --- Thanks [~vladimirsitnikov] for reviewing. The real check in the unit test is the assert inside TwoChildrenSameSetMatchRule. Without the one line fix, this assert will fail because it ends up matching the same PhysLeafRel for both child operands. I can add some comments in the unit test in needed. was (Author: botong): Thanks [~vladimirsitnikov] for reviewing. The real check in the unit test is the assert inside TwoChildrenSameSetMatchRule. Without the one line fix, this assert will fail because it ends up matching the same PhysLeafRel as both child operand. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
[ https://issues.apache.org/jira/browse/CALCITE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857997#comment-16857997 ] Botong Huang commented on CALCITE-3118: --- Thanks [~vladimirsitnikov] for reviewing. The real check in the unit test is the assert inside TwoChildrenSameSetMatchRule. Without the one line fix, this assert will fail because it ends up matching the same PhysLeafRel as both child operand. > VolcanoRuleCall match parent child ordinal not properly checked > --- > > Key: CALCITE-3118 > URL: https://issues.apache.org/jira/browse/CALCITE-3118 > Project: Calcite > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Labels: pull-request-available > > In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, > looking for parent operand match), we want to check that the matched parent > indeed has the previously matched child RelNode as a child with the expected > ordinal. However, there is a bug in this check. As a result, some incorrect > parent is not skipped as expected and matched incorrectly. See unit test > included in PR for a case that triggers this bug, where the same RelNode get > matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-3118) VolcanoRuleCall match parent child ordinal not properly checked
Botong Huang created CALCITE-3118: - Summary: VolcanoRuleCall match parent child ordinal not properly checked Key: CALCITE-3118 URL: https://issues.apache.org/jira/browse/CALCITE-3118 Project: Calcite Issue Type: Bug Reporter: Botong Huang In VolcanoRuleCall.matchRecurse(), when ascending (child operand is matched, looking for parent operand match), we want to check that the matched parent indeed has the previously matched child RelNode as a child with the expected ordinal. However, there is a bug in this check. As a result, some incorrect parent is not skipped as expected and matched incorrectly. See unit test included in PR for a case that triggers this bug, where the same RelNode get matched for two operands at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)