Vladimir Steshin created IGNITE-25218:
-----------------------------------------
Summary: Calcite. Revise joins rows estimation.
Key: IGNITE-25218
URL: https://issues.apache.org/jira/browse/IGNITE-25218
Project: Ignite
Issue Type: Improvement
Environment: *strong text*
Reporter: Vladimir Steshin
Current TPCH queries #5, #7 are slow (scale 0.1). Let's consider *#5*.
The *plan* is:
{code:java}
IgniteSort(sort0=[$1], dir0=[DESC-nulls-last])
IgniteColocatedHashAggregate(group=[{0}], REVENUE=[SUM($1)])
IgniteProject(N_NAME=[$12], $f1=[*($7, -(1, $8))])
[1] IgniteNestedLoopJoin(condition=[AND(=($4, $10), =($5, $0))],
joinType=[inner])
[2] IgniteMergeJoin(condition=[=($3, $1)], joinType=[inner],
leftCollation=[[1 ASC-nulls-first, 0 ASC-nulls-first]], rightCollation=[[0
ASC-nulls-first]])
IgniteExchange(distribution=[single])
IgniteIndexScan(table=[[PUBLIC, ORDERS]], index=[O_CK_proxy],
filters=[AND(>=($t2, 1994-01-01), <($t2, +(1994-01-01, *(12:INTERVAL YEAR,
1))))], requiredColumns=[{2, 3, 6}], collation=[[3 ASC-nulls-first, 2
ASC-nulls-first]])
IgniteExchange(distribution=[single])
IgniteIndexScan(table=[[PUBLIC, CUSTOMER]], index=[_key_PK_proxy],
requiredColumns=[{2, 5}], collation=[[2 ASC-nulls-first]])
[3] IgniteNestedLoopJoin(condition=[=($1, $4)], joinType=[inner])
IgniteExchange(distribution=[single])
IgniteTableScan(table=[[PUBLIC, LINEITEM]], requiredColumns=[{2, 4,
7, 8}])
[4] IgniteMergeJoin(condition=[=($1, $2)], joinType=[inner],
leftCollation=[[1 ASC-nulls-first, 0 ASC-nulls-first]], rightCollation=[[0
ASC-nulls-first]])
IgniteExchange(distribution=[single])
IgniteIndexScan(table=[[PUBLIC, SUPPLIER]], index=[S_NK_proxy],
requiredColumns=[{2, 5}], collation=[[5 ASC-nulls-first, 2 ASC-nulls-first]])
[5] IgniteNestedLoopJoin(condition=[=($2, $3)], joinType=[inner])
IgniteExchange(distribution=[single])
IgniteIndexScan(table=[[PUBLIC, NATION]],
index=[_key_PK_proxy], requiredColumns=[{2, 3, 4}], collation=[[2
ASC-nulls-first]])
IgniteExchange(distribution=[single])
IgniteTableScan(table=[[PUBLIC, REGION]], filters=[=($t1,
_UTF-8'ASIA')], requiredColumns=[{2, 3}])
{code}
The *problem is the NL join [1]*. /*+ MERGE_JOIN */ fixes the issue.
Calcite chooses NL because *estimanes only few rows* on its inputs. Consider
estimations of join rows production against real issued join rows numbers of
this query:
|| Join || Estimated rows || Created rows ||
| [1] | 0.0225 | 865 |
| [2] | 2250 | 22958 |
| [3] | 0.15 | 134374 |
| [4] | 0.15 | 225 |
| [5] | 0.0225 | 5 |
--
This message was sent by Atlassian Jira
(v8.20.10#820010)