[ https://issues.apache.org/jira/browse/CALCITE-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813276#comment-17813276 ]
Ruben Q L commented on CALCITE-6236:
------------------------------------

{quote}Rules create semantically equivalent plans. Someone could argue that equivalent means that they should have the same number of rows/costs.
{quote}
I'd argue that they would have the same rowCount, but can have different costs (e.g. a MergeJoin can have a higher cost than its equivalent HashJoin, since the former requires the inputs to be sorted).

Circling back to the "correction factor" approach for EBNLJ, what if:
- When creating the EBNLJ, we store the selectivity of the correlate filter upon the (original) RHS.
- We know that, from that point on, the EBNLJ's new RHS will have its rowCount reduced due to the correlate filter that has been applied.
- For the rowCount estimation of the EBNLJ, we can get back the original rowCount of the RHS by doing something like:
{code:java}
adjusted_rowCount_RHS = rowCount_RHS / selectivity_of_correlate_filter
{code}
And we use that adjusted_rowCount_RHS in the computation of EBNLJ's rowCount?

> EnumerableBatchNestedLoopJoin uses wrong row count for cost calculation
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-6236
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6236
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Ulrich Kramer
>            Priority: Major
>              Labels: pull-request-available
>
> {{EnumerableBatchNestedLoopJoin}} always adds a {{Filter}} on the right
> relation.
> This filter reduces the number of rows by its selectivity (in our case by a
> factor of 4).
> Therefore, {{RelMdUtil.getJoinRowCount}} returns a value 4 times lower
> compared to the one returned for a {{JdbcJoin}}.
> As a result, in most cases {{EnumerableBatchNestedLoopJoin}}
> is preferred over {{JdbcJoin}}.
> This is an example for the different costs
> {code}
> EnumerableProject rows=460.0 self_costs=460.0 cumulative_costs=1465.0
>   EnumerableBatchNestedLoopJoin rows=460.0 self_costs=687.5 cumulative_costs=1005.0
>     JdbcToEnumerableConverter rows=100.0 self_costs=10.0 cumulative_costs=190.0
>       JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>         JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
>     JdbcToEnumerableConverter rows=25.0 self_costs=2.5 cumulative_costs=127.5
>       JdbcFilter rows=25.0 self_costs=25.0 cumulative_costs=125.0
>         JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}
> vs.
> {code}
> JdbcToEnumerableConverter rows=1585.0 self_costs=158.5 cumulative_costs=2023.5
>   JdbcJoin rows=1585.0 self_costs=1585.0 cumulative_costs=1865.0
>     JdbcProject rows=100.0 self_costs=80.0 cumulative_costs=180.0
>       JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
>     JdbcTableScan rows=100.0 self_costs=100.0 cumulative_costs=100.0
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
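
The "correction factor" idea from the comment can be illustrated with a small standalone helper. This is only a sketch under assumptions: the class {{CorrectionFactorSketch}} and the method {{adjustedRhsRowCount}} are hypothetical names and not part of Calcite, and the sample numbers (25 filtered rows out of 100 scanned, i.e. a selectivity of 0.25) are taken from the plan in the issue description.

```java
/** Hypothetical illustration of the proposed correction factor; not a Calcite API. */
public final class CorrectionFactorSketch {
  private CorrectionFactorSketch() {}

  /**
   * Recovers the row count the RHS had before the correlate filter was applied.
   *
   * @param rhsRowCount estimated row count of the filtered RHS
   * @param selectivity selectivity of the correlate filter, expected in (0, 1]
   */
  public static double adjustedRhsRowCount(double rhsRowCount, double selectivity) {
    // Reject NaN, zero, negative and >1 values so the division stays meaningful.
    if (!(selectivity > 0.0 && selectivity <= 1.0)) {
      throw new IllegalArgumentException("selectivity must be in (0, 1]: " + selectivity);
    }
    return rhsRowCount / selectivity;
  }

  public static void main(String[] args) {
    // From the issue's plan: JdbcFilter rows=25.0 over JdbcTableScan rows=100.0.
    double selectivity = 25.0 / 100.0;
    System.out.println(adjustedRhsRowCount(25.0, selectivity)); // prints 100.0
  }
}
```

With the adjusted value of 100 rows, the RHS input to the row count estimation would match the unfiltered scan that {{JdbcJoin}} sees. How the stored selectivity would be wired into the EBNLJ itself is left open here, as in the comment.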