[jira] [Commented] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread Ulrich Kramer (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813084#comment-17813084
 ] 

Ulrich Kramer commented on CALCITE-6236:


I closed 3660

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> ---
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
>  Issue Type: Bug
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by it's selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
> cumulative_costs=1005.0
> JdbcToEnumerableConverter rows=100.0 self_costs=10.0 
> cumulative_costs=190.0
>   JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>   JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
> JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>   JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6193) If a query has more than one subexpression that matches a materialized view, only the first is substituted

2024-01-31 Thread Mou Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813016#comment-17813016
 ] 

Mou Wu commented on CALCITE-6193:
-

[~jiajunbernoulli] Did you review my comments in your commit?

> If a query has more than one subexpression that matches a materialized view, 
> only the first is substituted
> --
>
> Key: CALCITE-6193
> URL: https://issues.apache.org/jira/browse/CALCITE-6193
> Project: Calcite
>  Issue Type: Bug
>Reporter: Mou Wu
>Assignee: Mou Wu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.37.0
>
>
> {code:java}
> @Test void testStopTryIncorrectSubtree() {
>   final String mv = ""
>   + "select \"empid\", \"deptno\"\n"
>   + "from \"emps\"\n"
>   + "where \"salary\" > 1000\n"
>   + "group by \"empid\", \"deptno\"";
>   final String query = ""
>   + "select t1.\"deptno\"\n"
>   + "from (\n"
>   + "select \"deptno\"\n"
>   + "from \"emps\"\n"
>   + "where \"salary\" > 1000\n"
>   + "union all\n"
>   + "select \"deptno\"\n"
>   + "from \"emps\"\n"
>   + "where \"salary\" > 1000\n"
>   + "group by \"deptno\"\n"
>   + ") as t1 inner join (\n"
>   + "select \"deptno\"\n"
>   + "from \"emps\"\n"
>   + "where \"salary\" > 1000\n"
>   + "group by \"deptno\"\n"
>   + ") as t2 on t1.\"deptno\" = t2.\"deptno\"\n";
>   sql(mv, query)
>   .checkingThatResultContains(""
>   + "LogicalCalc(expr#0..1=[{inputs}], deptno=[$t0])\n"
>   + "  LogicalJoin(condition=[=($0, $1)], joinType=[inner])\n"
>   + "LogicalUnion(all=[true])\n"
>   + "  LogicalCalc(expr#0..4=[{inputs}], expr#5=[1000], 
> expr#6=[>($t3, $t5)], deptno=[$t1], $condition=[$t6])\n"
>   + "LogicalTableScan(table=[[hr, emps]])\n"
>   + "  LogicalAggregate(group=[{1}])\n"
>   + "EnumerableTableScan(table=[[hr, MV0]])\n"
>   + "LogicalAggregate(group=[{1}])\n"
>   + "  EnumerableTableScan(table=[[hr, MV0]])"
>   ).ok();
> }{code}
> The test case above will fail because the second mv0 not be matched.
> Two conditions these kind of bug matchs:
> (a) There is a third expression, earlier in the view, that has a 
> subexpression in common with the matching fragments but does not match the 
> view.
> (b) The matching fragments require some compensation.
> The root cause is that SubstitutionVisitor replace child nodes with 
> targetDescendant node itself, not a deep-copy replica, so they may share the 
> same node and the same parent node, thus the incorrect parent relationship 
> may occur, it will make stopTrying be wrong.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-2067) RexLiteral cannot represent accurately floating point values, including NaN, Infinity

2024-01-31 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu updated CALCITE-2067:
-
Summary: RexLiteral cannot represent accurately floating point values, 
including NaN, Infinity  (was: RexBuilder can't handle NaN,Infinity double 
constants)

> RexLiteral cannot represent accurately floating point values, including NaN, 
> Infinity
> -
>
> Key: CALCITE-2067
> URL: https://issues.apache.org/jira/browse/CALCITE-2067
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Mihai Budiu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we are working in Drill to allow support for Nan an Infinity values 
> . 
> When using such values with functions we see the NumberFormatException 
> exception. 
> Query example: _select sin(cast('NaN' as float))_
> RexBuilder#clean and other RexBuilder's methods use BigDecimal for handling 
> Double values, that's why we get NumberFormatException for nan,inf values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-2067) RexBuilder can't handle NaN,Infinity double constants

2024-01-31 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu reassigned CALCITE-2067:


Assignee: Mihai Budiu

> RexBuilder can't handle NaN,Infinity double constants
> -
>
> Key: CALCITE-2067
> URL: https://issues.apache.org/jira/browse/CALCITE-2067
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Mihai Budiu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we are working in Drill to allow support for Nan an Infinity values 
> . 
> When using such values with functions we see the NumberFormatException 
> exception. 
> Query example: _select sin(cast('NaN' as float))_
> RexBuilder#clean and other RexBuilder's methods use BigDecimal for handling 
> Double values, that's why we get NumberFormatException for nan,inf values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812807#comment-17812807
 ] 

Mihai Budiu commented on CALCITE-6236:
--

Should 3660 be closed?

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> ---
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
>  Issue Type: Bug
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by it's selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
> cumulative_costs=1005.0
> JdbcToEnumerableConverter rows=100.0 self_costs=10.0 
> cumulative_costs=190.0
>   JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>   JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
> JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>   JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812799#comment-17812799
 ] 

Ruben Q L commented on CALCITE-6236:


See https://github.com/apache/calcite/pull/3661

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> ---
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
>  Issue Type: Bug
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by it's selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
> cumulative_costs=1005.0
> JdbcToEnumerableConverter rows=100.0 self_costs=10.0 
> cumulative_costs=190.0
>   JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>   JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
> JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>   JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812730#comment-17812730
 ] 

Ruben Q L edited comment on CALCITE-6236 at 1/31/24 3:23 PM:
-

[~kramerul], I have taken a quick look at the 
[PR#3660|https://github.com/apache/calcite/pull/3660], I'm afraid this solution 
might not be 100% bullet-proof:
- What if the filter that you find is not the filter introduced by the 
BatchNestedLoop? (but a different one that was part of the original plan, or 
maybe a combination of both after FilterMergeRule was applied).
- What if in the BNLJ's Right Hand Side there is another join, and the 
BatchNestedLoop filter has been pushed inside this inside join (by the relevant 
rule for that purpose), shall we examine the Left or the Right hand side of 
this inside join? What if this inside join is also a BatchNestedLoop with its 
own filter inside, how can we distinguish the outside BNL filter from the 
inside's?

We faced this issue in our project, the solution that we put in place was:
- Inside EnumerableBatchNestedLoop add a new field "originalJoin", include it 
on the constructor and create methods, add a getter for it.
- In EnumerableBatchNestedLoopRule, when it creates the 
EnumerableBatchNestedLoopJoin, pass the Join that fired the rule as 
"originalJoin".
- In RelMdRowCount, "override" the rowCount computation for BNLJ so that:
{code}
public Double getRowCount(EnumerableBatchNestedLoopJoin join, RelMetadataQuery 
mq)
{
  return mq.getRowCount(join.getOriginalJoin());
}
{code}

This results in EnumerableBatchNestedLoopJoin's rowCount and cost estimation to 
use the same rowCount value as the original join that generated it. I think it 
would be a valid approach to fix your problem.

I can prepare a PR with this solution, if it is accepted by the community we 
can consider it in order to fix this situation.


was (Author: rubenql):
[~kramerul], I have taken a quick look at the 
[PR#3660|https://github.com/apache/calcite/pull/3660], I'm afraid this solution 
might not be 100% bullet-proof:
- What if the filter that you find is not the filter introduced by the 
BatchNestedLoop? (but a different one that was part of the original plan, or 
maybe a combination of both after FilterMergeRule was applied).
- What if in the BNLJ's Right Hand Side there is another join, and the 
BatchNestedLoop filter has been pushed inside this inside join (by the relevant 
rule for that purpose), shall we examine the Left or the Right hand side of 
this inside join? What if this inside join is also a BatchNestedLoop with its 
own filter inside, how can we distinguish the outside BNL filter from the 
inside's?

We faced this issue in our project, the solution that we put in place was:
- Inside EnumerableBatchNestedLoop add a new field "originalJoin", include it 
on the constructor and create methods, add a getter for it.
- In EnumerableBatchNestedLoopRule, when it creates the 
EnumerableBatchNestedLoopJoin, pass the Join that fired the rule as 
"originalJoin".
- In RelMdRowCount, "override" the rowCount computation for BNLJ so that:
{code}
public Double getRowCount(EnumerableBatchNestedLoopJoin join, RelMetadataQuery 
mq)
{
  return mq.getRowCount(join.getOriginalJoin());
}
{code}

This results in EnumerableBatchNestedLoopJoin rowCount and cost estimation to 
use the same rowCount value as the original join that generated it. I think it 
would be a valid approach to fix your problem.

I can prepare a PR with this solution, if it is accepted by the community we 
can consider is in order to fix this situation.

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> ---
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
>  Issue Type: Bug
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by it's selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
> cumulative_costs=1005.0
> JdbcToEnumerableConverter rows=100.0 self_costs=10.0 
> cumulative_costs=190.0
>   JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> 

[jira] [Commented] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812747#comment-17812747
 ] 

Ruben Q L commented on CALCITE-6236:


Working on it...

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> ---
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
>  Issue Type: Bug
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by it's selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
> cumulative_costs=1005.0
> JdbcToEnumerableConverter rows=100.0 self_costs=10.0 
> cumulative_costs=190.0
>   JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>   JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
> JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>   JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread Ulrich Kramer (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812741#comment-17812741
 ] 

Ulrich Kramer commented on CALCITE-6236:


It would be great, if you could prepare a PR.

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> ---
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
>  Issue Type: Bug
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by it's selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
> cumulative_costs=1005.0
> JdbcToEnumerableConverter rows=100.0 self_costs=10.0 
> cumulative_costs=190.0
>   JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>   JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
> JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>   JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812730#comment-17812730
 ] 

Ruben Q L edited comment on CALCITE-6236 at 1/31/24 3:03 PM:
-

[~kramerul], I have taken a quick look at the 
[PR#3660|https://github.com/apache/calcite/pull/3660], I'm afraid this solution 
might not be 100% bullet-proof:
- What if the filter that you find is not the filter introduced by the 
BatchNestedLoop? (but a different one that was part of the original plan, or 
maybe a combination of both after FilterMergeRule was applied).
- What if in the BNLJ's Right Hand Side there is another join, and the 
BatchNestedLoop filter has been pushed inside this inside join (by the relevant 
rule for that purpose), shall we examine the Left or the Right hand side of 
this inside join? What if this inside join is also a BatchNestedLoop with its 
own filter inside, how can we distinguish the outside BNL filter from the 
inside's?

We faced this issue in our project, the solution that we put in place was:
- Inside EnumerableBatchNestedLoop add a new field "originalJoin", include it 
on the constructor and create methods, add a getter for it.
- In EnumerableBatchNestedLoopRule, when it creates the 
EnumerableBatchNestedLoopJoin, pass the Join that fired the rule as 
"originalJoin".
- In RelMdRowCount, "override" the rowCount computation for BNLJ so that:
{code}
public Double getRowCount(EnumerableBatchNestedLoopJoin join, RelMetadataQuery 
mq)
{
  return mq.getRowCount(join.getOriginalJoin());
}
{code}

This results in EnumerableBatchNestedLoopJoin rowCount and cost estimation to 
use the same rowCount value as the original join that generated it. I think it 
would be a valid approach to fix your problem.

I can prepare a PR with this solution, if it is accepted by the community we 
can consider is in order to fix this situation.


was (Author: rubenql):
[~kramerul], I have taken a quick look at the 
[PR#3660|https://github.com/apache/calcite/pull/3660], I'm afraid this solution 
might not be 100% bullet-proof:
- What if the filter that you find is not the filter introduced by the 
BatchNestedLoop? (but a different one that was part of the original plan, or 
maybe a combination of both after FilterMergeRule was applied).
- What if in the RHS there is another join, and the BatchNestedLoop filter has 
been pushed inside the join (by the relevant rule for that purpose), shall we 
examine the Left or the Right hand side of this inner join? What if this inner 
join is also a BatchNestedLoop with its own filter inside, how can we 
distinguish the outer BNL filter from the inner's?

We face this issue in our project, the solution that we put in place was:
- Inside EnumerableBatchNestedLoop add a new field "originalJoin", include it 
on the constructor and create methods, add a getter for it.
- In EnumerableBatchNestedLoopRule, when it creates the 
EnumerableBatchNestedLoopJoin, pass the Join that fired the rule as 
"originalJoin"
- In RelMdRowCount, "override" the rowCount computation for BNLJ so that:
{code}
public Double getRowCount(EnumerableBatchNestedLoopJoin join, RelMetadataQuery 
mq)
{
  return mq.getRowCount(join.getOriginalJoin());
}
{code}

This results in EnumerableBatchNestedLoopJoin rowCount and cost estimation to 
use the same rowCount value as the original join that generated it.

I can prepare a PR with this solution, if it is accepted by the community we 
can consider is in order to fix this situation.

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> ---
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
>  Issue Type: Bug
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by it's selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
> cumulative_costs=1005.0
> JdbcToEnumerableConverter rows=100.0 self_costs=10.0 
> cumulative_costs=190.0
>   JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>   JdbcFilter rows=25.0 

[jira] [Comment Edited] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812730#comment-17812730
 ] 

Ruben Q L edited comment on CALCITE-6236 at 1/31/24 3:01 PM:
-

[~kramerul], I have taken a quick look at the 
[PR#3660|https://github.com/apache/calcite/pull/3660], I'm afraid this solution 
might not be 100% bullet-proof:
- What if the filter that you find is not the filter introduced by the 
BatchNestedLoop? (but a different one that was part of the original plan, or 
maybe a combination of both after FilterMergeRule was applied).
- What if in the RHS there is another join, and the BatchNestedLoop filter has 
been pushed inside the join (by the relevant rule for that purpose), shall we 
examine the Left or the Right hand side of this inner join? What if this inner 
join is also a BatchNestedLoop with its own filter inside, how can we 
distinguish the outer BNL filter from the inner's?

We face this issue in our project, the solution that we put in place was:
- Inside EnumerableBatchNestedLoop add a new field "originalJoin", include it 
on the constructor and create methods, add a getter for it.
- In EnumerableBatchNestedLoopRule, when it creates the 
EnumerableBatchNestedLoopJoin, pass the Join that fired the rule as 
"originalJoin"
- In RelMdRowCount, "override" the rowCount computation for BNLJ so that:
{code}
public Double getRowCount(EnumerableBatchNestedLoopJoin join, RelMetadataQuery 
mq)
{
  return mq.getRowCount(join.getOriginalJoin());
}
{code}

This results in EnumerableBatchNestedLoopJoin rowCount and cost estimation to 
use the same rowCount value as the original join that generated it.

I can prepare a PR with this solution, if it is accepted by the community we 
can consider is in order to fix this situation.


was (Author: rubenql):
[~kramerul], I have taken a quick look at the 
[PR#3660|https://github.com/apache/calcite/pull/3660], I'm afraid this solution 
might not be 100% bullet-proof:
- What if the filter that you find is not the filter introduced by the 
BatchNestedLoop? (but a different one that was part of the original one, or 
maybe a combination of both after FilterMergeRule was applied).
- What if in the RHS there is another join, and the BatchNestedLoop filter has 
been pushed inside the join (by the relevant rule for that purpose), shall we 
examine the Left or the Right hand side of this inner join? What if this inner 
join is also a BatchNestedLoop with its own filter inside, how can we 
distinguish the outer BNL filter from the inner's?

We face this issue in our project, the solution that we put in place was:
- Inside EnumerableBatchNestedLoop add a new field "originalJoin", include it 
on the constructor and create methods, add a getter for it.
- In EnumerableBatchNestedLoopRule, when it creates the 
EnumerableBatchNestedLoopJoin, pass the Join that fired the rule as 
"originalJoin"
- In RelMdRowCount, "override" the rowCount computation for BNLJ so that:
{code}
public Double getRowCount(EnumerableBatchNestedLoopJoin join, RelMetadataQuery 
mq)
{
  return mq.getRowCount(join.getOriginalJoin());
}
{code}

This results in EnumerableBatchNestedLoopJoin rowCount and cost estimation to 
use the same rowCount value as the original join that generated it.

I can prepare a PR with this solution, if it is accepted by the community we 
can consider is in order to fix this situation.

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> ---
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
>  Issue Type: Bug
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by it's selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
> cumulative_costs=1005.0
> JdbcToEnumerableConverter rows=100.0 self_costs=10.0 
> cumulative_costs=190.0
>   JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>   JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
> JdbcTableScan rows=100.0 self_costs=100.0 

[jira] [Commented] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812730#comment-17812730
 ] 

Ruben Q L commented on CALCITE-6236:


[~kramerul], I have taken a quick look at the 
[PR#3660|https://github.com/apache/calcite/pull/3660], I'm afraid this solution 
might not be 100% bullet-proof:
- What if the filter that you find is not the filter introduced by the 
BatchNestedLoop? (but a different one that was part of the original one, or 
maybe a combination of both after FilterMergeRule was applied).
- What if in the RHS there is another join, and the BatchNestedLoop filter has 
been pushed inside the join (by the relevant rule for that purpose), shall we 
examine the Left or the Right hand side of this inner join? What if this inner 
join is also a BatchNestedLoop with its own filter inside, how can we 
distinguish the outer BNL filter from the inner's?

We face this issue in our project, the solution that we put in place was:
- Inside EnumerableBatchNestedLoop add a new field "originalJoin", include it 
on the constructor and create methods, add a getter for it.
- In EnumerableBatchNestedLoopRule, when it creates the 
EnumerableBatchNestedLoopJoin, pass the Join that fired the rule as 
"originalJoin"
- In RelMdRowCount, "override" the rowCount computation for BNLJ so that:
{code}
public Double getRowCount(EnumerableBatchNestedLoopJoin join, RelMetadataQuery 
mq)
{
  return mq.getRowCount(join.getOriginalJoin());
}
{code}

This results in EnumerableBatchNestedLoopJoin rowCount and cost estimation to 
use the same rowCount value as the original join that generated it.

I can prepare a PR with this solution, if it is accepted by the community we 
can consider is in order to fix this situation.

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> ---
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
>  Issue Type: Bug
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by it's selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
> cumulative_costs=1005.0
> JdbcToEnumerableConverter rows=100.0 self_costs=10.0 
> cumulative_costs=190.0
>   JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>   JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
> JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>   JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-6236:

Labels: pull-request-available  (was: )

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> ---
>
> Key: CALCITE-6236
> URL: https://issues.apache.org/jira/browse/CALCITE-6236
> Project: Calcite
>  Issue Type: Bug
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by it's selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
> cumulative_costs=1005.0
> JdbcToEnumerableConverter rows=100.0 self_costs=10.0 
> cumulative_costs=190.0
>   JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>   JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
> JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>   JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6221) JDBC adapter generates invalid query when the same table is joined multiple times

2024-01-31 Thread Ulrich Kramer (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812646#comment-17812646
 ] 

Ulrich Kramer commented on CALCITE-6221:


I will try to modify the PR in this direction

> JDBC adapter generates invalid query when the same table is joined multiple 
> times
> -
>
> Key: CALCITE-6221
> URL: https://issues.apache.org/jira/browse/CALCITE-6221
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.36.0
> Environment: Local development
>Reporter: Ulrich Kramer
>Priority: Major
>  Labels: pull-request-available
>
> Adding the following unit test to {{JdbcAdapterTest}}
> {code:java}
>   @Test void testUnknownColumn() {
> CalciteAssert.model(JdbcTest.SCOTT_MODEL)
> .query("SELECT\n" +
> "\"content-format-owner\",\n" +
> "\"content-owner\"\n" +
> "FROM\n" +
> "(\n" +
> "SELECT\n" +
> "d1.dname AS \"content-format-owner\",\n" +
> "d2.dname || ' ' AS \"content-owner\"\n" +
> "FROM\n" +
> "scott.emp e1\n" +
> "left outer join scott.dept d1 on e1.deptno = 
> d1.deptno\n" +
> "left outer join scott.dept d2 on e1.deptno = 
> d2.deptno\n" +
> "left outer join scott.emp e2 on e1.deptno = 
> e2.deptno\n" +
> "GROUP BY\n" +
> "d1.dname,\n" +
> "d2.dname\n" +
> ")\n" +
> "WHERE\n" +
> "\"content-owner\" IN (?)")
> .runs();
>   }
> {code}
> Fails because the following SQL is sent to the underlying database
> {code:SQL}
> SELECT
> "t2"."DNAME" AS "content-format-owner",
> "t2"."DNAME0" || ' ' AS "content-owner"
> FROM
> (
> SELECT
> *
> FROM
> (
> SELECT
> "DEPTNO"
> FROM
> "SCOTT"."EMP"
> ) AS "t"
> LEFT JOIN (
> SELECT
> "DEPTNO",
> "DNAME"
> FROM
> "SCOTT"."DEPT"
> ) AS "t0" ON "t"."DEPTNO" = "t0"."DEPTNO"
> LEFT JOIN (
> SELECT
> "DEPTNO",
> "DNAME"
> FROM
> "SCOTT"."DEPT"
> ) AS "t1" ON "t"."DEPTNO" = "t1"."DEPTNO"
> WHERE
> "t1"."DNAME" || ' ' = ?
> ) AS "t2"
> LEFT JOIN (
> SELECT
> "DEPTNO"
> FROM
> "SCOTT"."EMP"
> ) AS "t3" ON "t2"."DEPTNO" = "t3"."DEPTNO"
> GROUP BY
> "t2"."DNAME",
> "t2"."DNAME0"
> {code}
> The column {{"t2"."DNAME0"}} does not exist.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

2024-01-31 Thread Ulrich Kramer (Jira)
Ulrich Kramer created CALCITE-6236:
--

 Summary: EnumerableBatchNestedLoopJoin uses wrong row count for 
cost calculation
 Key: CALCITE-6236
 URL: https://issues.apache.org/jira/browse/CALCITE-6236
 Project: Calcite
  Issue Type: Bug
Reporter: Ulrich Kramer


{{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
relation.
This filter reduces the number of rows by it's selectivity (in our case by a 
factor of 4).
Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower compared 
to the one returned for a {{JdbcJoin}}. 
This leads to the fact that in most cases {{EnumerableBatchNestedLoopJoin}} is 
preferred over {{JdbcJoin}}.

This is an example for the different costs

{code}
EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
  EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 
cumulative_costs=1005.0
JdbcToEnumerableConverter rows=100.0 self_costs=10.0 cumulative_costs=190.0
  JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
  JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
{code}

vs.

{code}
JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
  JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
  JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)