Konstantin Orlov created IGNITE-24678:
-----------------------------------------
Summary: Sql. Introduce heuristic to exclude NLJ when HJ may be
applied
Key: IGNITE-24678
URL: https://issues.apache.org/jira/browse/IGNITE-24678
Project: Ignite
Issue Type: Improvement
Components: sql
Reporter: Konstantin Orlov
Currently, we have very primitive statistics, which include only the table size.
Moreover, they are gathered with throttling that prevents the statistics for
the same table from being updated more often than once per minute.
The problem arises when a heavy query is executed immediately after all data has
been uploaded to a table (which is the case in practically every benchmark
scenario): the first insert triggers gathering of table stats, so a table size
close to 1 is cached in the statistics manager. During the planning phase, the
cost-based optimizer makes wrong choices because of the misleading statistics.
The most expensive one is choosing NestedLoopJoin (NLJ) over HashJoin (HJ). For
instance, query 5 from the TPC-H suite at scale factor 0.1, which normally
completes in under 1 second (373ms on my laptop), takes tens of minutes with the
wrong join algorithm (it hadn't finished after 15 minutes, so I killed it).
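To illustrate why a cached table size close to 1 pushes the optimizer towards NLJ, here is a minimal sketch with a toy cost model. The formulas and the class name StaleStatsCostSketch are assumptions made for illustration only; they are not the cost formulas used by the actual optimizer, and the row counts in the usage note are illustrative.

```java
// Toy cost model for illustration only; NOT the optimizer's real formulas.
final class StaleStatsCostSketch {
    /** Nested loop join: every left row is compared with every right row. */
    static long nestedLoopCost(long leftRows, long rightRows) {
        return leftRows * rightRows;
    }

    /** Hash join: build a hash table from one input, probe it with the other. */
    static long hashJoinCost(long leftRows, long rightRows) {
        return leftRows + rightRows;
    }

    /** Returns the name of the algorithm with the lower estimated cost. */
    static String cheapest(long leftRows, long rightRows) {
        return nestedLoopCost(leftRows, rightRows) < hashJoinCost(leftRows, rightRows)
                ? "NestedLoopJoin"
                : "HashJoin";
    }
}
```

With both inputs estimated at 1 row, cheapest(1, 1) picks NestedLoopJoin under this model, while with inputs of, say, 15,000 and 150,000 rows the same comparison picks HashJoin by a wide margin. The plan is chosen from the first estimate but executed against the second.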
To mitigate the issue, we may introduce a heuristic that avoids NLJ for joins
that can be executed with HJ.
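A minimal sketch of what such a heuristic could look like is given below. It assumes that HJ is applicable whenever the join condition contains at least one equality between a column of the left input and a column of the right input. The types NljExclusionSketch, Conjunct and JoinAlgorithm are hypothetical and exist only for this example; they are not part of the planner's API.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposed heuristic; not the planner's real API.
final class NljExclusionSketch {
    enum JoinAlgorithm { NESTED_LOOP, HASH }

    /** One conjunct of a join condition, reduced to the two facts we need. */
    static final class Conjunct {
        final boolean equality;                 // operator is '='
        final boolean leftColumnVsRightColumn;  // compares a left column with a right column

        Conjunct(boolean equality, boolean leftColumnVsRightColumn) {
            this.equality = equality;
            this.leftColumnVsRightColumn = leftColumnVsRightColumn;
        }
    }

    /** Assumption: HJ needs at least one equi-condition across both inputs. */
    static boolean hashJoinApplicable(List<Conjunct> condition) {
        for (Conjunct c : condition) {
            if (c.equality && c.leftColumnVsRightColumn) {
                return true;
            }
        }
        return false;
    }

    /** Drops NLJ from the candidates whenever HJ can execute the join. */
    static Set<JoinAlgorithm> candidates(List<Conjunct> condition) {
        return hashJoinApplicable(condition)
                ? EnumSet.of(JoinAlgorithm.HASH)
                : EnumSet.of(JoinAlgorithm.NESTED_LOOP);
    }
}
```

Because the decision depends only on the shape of the join condition and not on row counts, it is not affected by stale statistics. The trade-off is that NLJ is also excluded in the cases where it would really be cheaper, for example when one input is truly tiny.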
--
This message was sent by Atlassian Jira
(v8.20.10#820010)