[
https://issues.apache.org/jira/browse/IGNITE-25218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladimir Steshin updated IGNITE-25218:
--------------------------------------
Ignite Flags: Release Notes Required (was: Docs Required,Release Notes
Required)
> Calcite. Revise joins rows estimation.
> --------------------------------------
>
> Key: IGNITE-25218
> URL: https://issues.apache.org/jira/browse/IGNITE-25218
> Project: Ignite
> Issue Type: Improvement
> Reporter: Vladimir Steshin
> Priority: Minor
>
> Current TPCH queries #5, #7 are slow (scale 0.1). Let's consider {*}#5{*}.
> The *plan* is:
> {code:java}
> IgniteSort(sort0=[$1], dir0=[DESC-nulls-last])
> IgniteColocatedHashAggregate(group=[{0}], REVENUE=[SUM($1)])
> IgniteProject(N_NAME=[$12], $f1=[*($7, -(1, $8))])
> [1] IgniteNestedLoopJoin(condition=[AND(=($4, $10), =($5, $0))],
> joinType=[inner])
> [2] IgniteMergeJoin(condition=[=($3, $1)], joinType=[inner],
> leftCollation=[[1 ASC-nulls-first, 0 ASC-nulls-first]], rightCollation=[[0
> ASC-nulls-first]])
> IgniteExchange(distribution=[single])
> IgniteIndexScan(table=[[PUBLIC, ORDERS]], index=[O_CK_proxy],
> filters=[AND(>=($t2, 1994-01-01), <($t2, +(1994-01-01, *(12:INTERVAL YEAR,
> 1))))], requiredColumns=[{2, 3, 6}], collation=[[3 ASC-nulls-first, 2
> ASC-nulls-first]])
> IgniteExchange(distribution=[single])
> IgniteIndexScan(table=[[PUBLIC, CUSTOMER]],
> index=[_key_PK_proxy], requiredColumns=[{2, 5}], collation=[[2
> ASC-nulls-first]])
> [3] IgniteNestedLoopJoin(condition=[=($1, $4)], joinType=[inner])
> IgniteExchange(distribution=[single])
> IgniteTableScan(table=[[PUBLIC, LINEITEM]], requiredColumns=[{2,
> 4, 7, 8}])
> [4] IgniteMergeJoin(condition=[=($1, $2)], joinType=[inner],
> leftCollation=[[1 ASC-nulls-first, 0 ASC-nulls-first]], rightCollation=[[0
> ASC-nulls-first]])
> IgniteExchange(distribution=[single])
> IgniteIndexScan(table=[[PUBLIC, SUPPLIER]], index=[S_NK_proxy],
> requiredColumns=[{2, 5}], collation=[[5 ASC-nulls-first, 2 ASC-nulls-first]])
> [5] IgniteNestedLoopJoin(condition=[=($2, $3)], joinType=[inner])
> IgniteExchange(distribution=[single])
> IgniteIndexScan(table=[[PUBLIC, NATION]],
> index=[_key_PK_proxy], requiredColumns=[{2, 3, 4}], collation=[[2
> ASC-nulls-first]])
> IgniteExchange(distribution=[single])
> IgniteTableScan(table=[[PUBLIC, REGION]], filters=[=($t1,
> _UTF-8'ASIA')], requiredColumns=[{2, 3}])
> {code}
> **
> *The problem* is the NL join [1].
> Calcite chooses NL because it *estimanes only few rows* producted by another
> joins in the plan.
> Compare estimations of join rows number against real issued join rows:
> ||Join||Estimated rows||Created rows||
> |[1]|0.0225|865|
> |[2]|2250|22958|
> |[3]|0.15|134374|
> |[4]|0.15|225|
> |[5]|0.15|5|
>
> {*}Workaround{*}:
> /*+ MERGE_JOIN */
--
This message was sent by Atlassian Jira
(v8.20.10#820010)