[ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812730#comment-17812730
 ] 

Ruben Q L commented on CALCITE-6236:
------------------------------------

[~kramerul], I have taken a quick look at 
[PR#3660|https://github.com/apache/calcite/pull/3660], and I'm afraid this 
solution might not be 100% bullet-proof:
- What if the filter that you find is not the filter introduced by the 
BatchNestedLoop, but a different one that was already part of the original 
plan, or maybe a combination of both after FilterMergeRule was applied?
- What if the RHS contains another join, and the BatchNestedLoop filter has 
been pushed inside that join (by the relevant rule for that purpose)? Should 
we examine the left or the right-hand side of this inner join? What if this 
inner join is also a BatchNestedLoop with its own filter inside? How can we 
distinguish the outer BNL filter from the inner one's?

We faced this issue in our project; the solution that we put in place was:
- Inside EnumerableBatchNestedLoopJoin, add a new field "originalJoin", 
include it in the constructor and the create methods, and add a getter for it.
- In EnumerableBatchNestedLoopJoinRule, when it creates the 
EnumerableBatchNestedLoopJoin, pass the Join that fired the rule as 
"originalJoin".
- In RelMdRowCount, "override" the rowCount computation for BNLJ so that:
{code}
public Double getRowCount(EnumerableBatchNestedLoopJoin join,
    RelMetadataQuery mq) {
  return mq.getRowCount(join.getOriginalJoin());
}
{code}

As a result, the EnumerableBatchNestedLoopJoin rowCount and cost estimation 
use the same rowCount value as the original join that generated it.
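
To illustrate the idea outside of Calcite, here is a minimal, self-contained sketch. All class names in it (Node, Scan, Filter, Join, BatchJoin, toBatchJoin) are hypothetical stand-ins and not Calcite's API, the join row count is simplified to left * right * selectivity, and the numbers are made up rather than taken from the issue. It only shows why the extra filter on the right input shrinks the estimate, and how delegating to the original join avoids that.

```java
/** Toy model of the row-count problem; these are NOT Calcite classes. */
final class BnljRowCountSketch {

  /** Hypothetical stand-in for a relational node with a row-count estimate. */
  interface Node {
    double rowCount();
  }

  /** Table scan with a fixed row count. */
  static final class Scan implements Node {
    final double rows;
    Scan(double rows) { this.rows = rows; }
    @Override public double rowCount() { return rows; }
  }

  /** Filter that scales its input's row count by a selectivity. */
  static final class Filter implements Node {
    final Node input;
    final double selectivity;
    Filter(Node input, double selectivity) {
      this.input = input;
      this.selectivity = selectivity;
    }
    @Override public double rowCount() { return input.rowCount() * selectivity; }
  }

  /** Logical join; simplified estimate: left * right * selectivity. */
  static final class Join implements Node {
    final Node left;
    final Node right;
    final double selectivity;
    Join(Node left, Node right, double selectivity) {
      this.left = left;
      this.right = right;
      this.selectivity = selectivity;
    }
    @Override public double rowCount() {
      return left.rowCount() * right.rowCount() * selectivity;
    }
  }

  /** Batch nested loop join that remembers the join it was created from. */
  static final class BatchJoin implements Node {
    final Node left;
    final Node right;          // already wrapped in the batch filter
    final double selectivity;
    final Join originalJoin;   // the extra field proposed above
    BatchJoin(Node left, Node right, double selectivity, Join originalJoin) {
      this.left = left;
      this.right = right;
      this.selectivity = selectivity;
      this.originalJoin = originalJoin;
    }
    /** Estimate from the physical inputs; too low because of the filter. */
    double naiveRowCount() {
      return left.rowCount() * right.rowCount() * selectivity;
    }
    /** Estimate delegated to the original join, as in the proposal. */
    @Override public double rowCount() { return originalJoin.rowCount(); }
  }

  /** Mimics the rule: adds a filter on the right input and keeps the join. */
  static BatchJoin toBatchJoin(Join join, double batchFilterSelectivity) {
    Node filteredRight = new Filter(join.right, batchFilterSelectivity);
    return new BatchJoin(join.left, filteredRight, join.selectivity, join);
  }

  public static void main(String[] args) {
    Join join = new Join(new Scan(100), new Scan(100), 0.125);
    BatchJoin bnl = toBatchJoin(join, 0.25);
    System.out.println("naive: " + bnl.naiveRowCount());  // prints "naive: 312.5"
    System.out.println("delegated: " + bnl.rowCount());   // prints "delegated: 1250.0"
  }
}
```

With these made-up numbers the naive estimate is 4 times lower than the original join's, purely because of the batch filter, while the delegated estimate matches the original join exactly.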

I can prepare a PR with this solution; if it is accepted by the community, we 
can consider it in order to fix this situation.

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-6236
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6236
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Ulrich Kramer
>            Priority: Major
>              Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by its selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> As a result, in most cases {{EnumerableBatchNestedLoopJoin}} 
> is preferred over {{JdbcJoin}}.
> This is an example of the different costs:
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 cumulative_costs=1005.0
>     JdbcToEnumerableConverter rows=100.0 self_costs=10.0 cumulative_costs=190.0
>       JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>         JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
>     JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>       JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
>         JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
>     JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>       JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
>     JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
