[jira] [Commented] (CALCITE-6236) EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation

Ruben Q L (Jira) Thu, 01 Feb 2024 07:07:04 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813276#comment-17813276
 ]


Ruben Q L commented on CALCITE-6236:
------------------------------------

{quote}Rules create semantically equivalent plans. Someone could argue that 
equivalent means that they should have the same number of rows/costs.
{quote}
I'd argue that they would have the same rowCount, but can have different costs 
(e.g. a MergeJoin can have higher cost than its equivalent HashJoin, since the 
former requires the inputs to be sorted).

Circling back to the "correction factor" approach for EBNLJ, what if:
 - When creating the EBNLJ, we store the selectivity of the correlate filter 
upon the (original) RHS.
 - We know that, from that point on, the EBNLJ's new RHS will have its rowCount 
reduced due to the correlate filter that has been applied.
 - For the rowCount estimation of the EBNLJ, we can get back the original 
rowCount of the RHS by doing something like:
{code:java}
adjusted_rowCount_RHS = rowCount_RHS / selectivity_of_correlate_filter
{code}
We could then use that adjusted_rowCount_RHS when computing the EBNLJ's rowCount?
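
To make the idea concrete, here is a minimal sketch of the correction in plain Java. It is not Calcite code: the class and method names are hypothetical, and it assumes the selectivity of the correlate filter was captured when the EBNLJ was created. With the numbers from the issue description (filtered RHS of 25 rows, selectivity 0.25), it recovers the original 100 rows.

```java
/**
 * Sketch only: hypothetical helper, not part of Calcite.
 * Assumes the selectivity of the correlate filter was stored
 * when the EnumerableBatchNestedLoopJoin was created.
 */
final class CorrectionFactorSketch {
  private CorrectionFactorSketch() {}

  /**
   * Recovers the pre-filter row count of the RHS from its filtered estimate.
   *
   * @param filteredRhsRowCount estimated row count of the RHS after the correlate filter
   * @param correlateFilterSelectivity selectivity of that filter, in (0, 1]
   */
  static double adjustedRhsRowCount(double filteredRhsRowCount,
      double correlateFilterSelectivity) {
    // A selectivity of 0 (or NaN) would make the division meaningless.
    if (!(correlateFilterSelectivity > 0.0 && correlateFilterSelectivity <= 1.0)) {
      throw new IllegalArgumentException(
          "selectivity must be in (0, 1]: " + correlateFilterSelectivity);
    }
    return filteredRhsRowCount / correlateFilterSelectivity;
  }
}
```

The adjusted value would be used only for the join's rowCount estimate; the cost of reading the RHS could still be based on the filtered rowCount.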

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-6236
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6236
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Ulrich Kramer
>            Priority: Major
>              Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right 
> relation.
> This filter reduces the number of rows by its selectivity (in our case by a 
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower 
> compared to the one returned for a {{JdbcJoin}}. 
> As a result, {{EnumerableBatchNestedLoopJoin}} is preferred over {{JdbcJoin}} 
> in most cases.
> Here is an example of the different costs:
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 cumulative_costs=1005.0
>     JdbcToEnumerableConverter rows=100.0 self_costs=10.0 cumulative_costs=190.0
>       JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>         JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
>     JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>       JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
>         JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
>     JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>       JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
>     JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
