Konstantin Orlov created IGNITE-24678:
-----------------------------------------

             Summary: Sql. Introduce heuristic to exclude NLJ when HJ may be 
applied
                 Key: IGNITE-24678
                 URL: https://issues.apache.org/jira/browse/IGNITE-24678
             Project: Ignite
          Issue Type: Improvement
          Components: sql
            Reporter: Konstantin Orlov


Currently, we have very primitive statistics that include only the table size. 
Moreover, they are gathered with throttling, which prevents the statistics 
for the same table from being updated more often than once per minute.
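
To make the throttling concrete, here is a minimal sketch of a size-only 
statistics cache with a once-per-minute refresh window. The class and method 
names (ThrottledTableStats, tableSize) are hypothetical and are not taken from 
the Ignite code base; the clock is passed in explicitly so that the behaviour 
is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch of throttled statistics; not Ignite's actual statistic manager. */
public class ThrottledTableStats {
    /** Minimum interval between two refreshes for the same table: once per minute. */
    static final long REFRESH_INTERVAL_MS = 60_000;

    private final Map<String, Long> cachedSize = new HashMap<>();
    private final Map<String, Long> lastRefreshMs = new HashMap<>();

    /** Returns the cached size; the real size is re-read only after the throttle window has passed. */
    public long tableSize(String table, long nowMs, LongSupplier actualSize) {
        Long last = lastRefreshMs.get(table);
        if (last == null || nowMs - last >= REFRESH_INTERVAL_MS) {
            cachedSize.put(table, actualSize.getAsLong());
            lastRefreshMs.put(table, nowMs);
        }
        return cachedSize.get(table);
    }

    public static void main(String[] args) {
        ThrottledTableStats stats = new ThrottledTableStats();
        long[] rows = {1}; // state of the table right after the first insert

        System.out.println(stats.tableSize("T", 0, () -> rows[0]));      // prints 1
        rows[0] = 150_000; // bulk load finishes a few seconds later
        System.out.println(stats.tableSize("T", 5_000, () -> rows[0]));  // prints 1 (stale)
        System.out.println(stats.tableSize("T", 60_000, () -> rows[0])); // prints 150000
    }
}
```

Any query planned inside the one-minute window sees the size recorded at the 
first refresh, no matter how much data has been loaded since then.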

The problem arises when a heavy query is executed immediately after all data 
has been uploaded to a table (which is what practically every benchmark 
scenario does): the first insert triggers gathering of table stats, which 
results in a table size close to 1 being cached in the statistic manager. 
During the planning phase, the cost-based optimizer makes wrong choices due to 
the misleading statistics. The most expensive one is choosing NestedLoopJoin 
(NLJ) over HashJoin (HJ). For instance, query 5 from the TPC-H suite at scale 
factor 0.1, which normally completes in under 1 second (373ms on my laptop), 
takes tens of minutes to complete with the wrong join algorithm (it didn't 
finish in 15min, so I killed it).
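
The effect of a cached size of 1 on the join choice can be illustrated with a 
toy cost model. The formulas, the setup constant and the row counts below are 
assumptions made for illustration only; they are not the cost functions used 
by the Ignite planner.

```java
/** Toy cost comparison; the formulas are assumptions, not Ignite's cost model. */
public class JoinCostSketch {
    /** Nested loop: every left row is compared with every right row. */
    public static double nestedLoopCost(double leftRows, double rightRows) {
        return leftRows * rightRows;
    }

    /** Hash join: build a hash table on one input and probe it with the other. */
    public static double hashJoinCost(double leftRows, double rightRows) {
        final double setupCost = 10; // assumed fixed overhead of the hash table
        return setupCost + leftRows + rightRows;
    }

    /** Returns the algorithm that looks cheaper for the given row count estimates. */
    public static String cheaper(double leftRows, double rightRows) {
        return nestedLoopCost(leftRows, rightRows) <= hashJoinCost(leftRows, rightRows)
                ? "NLJ"
                : "HJ";
    }

    public static void main(String[] args) {
        // Estimates taken from stale statistics: both tables look like one row.
        System.out.println(cheaper(1, 1));            // prints NLJ
        // Illustrative row counts after the load has completed.
        System.out.println(cheaper(15_000, 150_000)); // prints HJ
    }
}
```

With both inputs estimated at one row, NLJ looks cheapest because it has no 
setup cost. At execution time the same plan performs work proportional to the 
product of the real row counts, which is where the slowdown comes from.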

To mitigate the issue, we may introduce a heuristic that avoids using NLJ for 
joins that can be executed with HJ.
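
One possible shape of such a heuristic is sketched below. It assumes that HJ 
is applicable whenever the join condition contains at least one equality 
between a left and a right column; the ticket does not define the exact 
criterion. All types and names here (JoinAlgorithmHeuristic, Conjunct, Algo) 
are hypothetical and do not correspond to planner classes.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed heuristic; not the actual planner rule. */
public class JoinAlgorithmHeuristic {
    public enum Algo { NESTED_LOOP, HASH }

    /** One conjunct of a join condition: leftColumn OP rightColumn. */
    public static final class Conjunct {
        final String leftColumn;
        final String op;
        final String rightColumn;

        public Conjunct(String leftColumn, String op, String rightColumn) {
            this.leftColumn = leftColumn;
            this.op = op;
            this.rightColumn = rightColumn;
        }
    }

    /** Assumption: HJ is applicable when at least one conjunct is an equality. */
    public static boolean hashJoinApplicable(List<Conjunct> condition) {
        return condition.stream().anyMatch(c -> c.op.equals("="));
    }

    /** Excludes NLJ from the candidates whenever HJ can execute the join. */
    public static Set<Algo> candidates(List<Conjunct> condition) {
        return hashJoinApplicable(condition)
                ? EnumSet.of(Algo.HASH)
                : EnumSet.of(Algo.NESTED_LOOP);
    }

    public static void main(String[] args) {
        List<Conjunct> equi = List.of(new Conjunct("o.custkey", "=", "c.custkey"));
        List<Conjunct> range = List.of(new Conjunct("a.x", "<", "b.y"));

        System.out.println(candidates(equi));  // prints [HASH]
        System.out.println(candidates(range)); // prints [NESTED_LOOP]
    }
}
```

Because the decision depends only on the shape of the join condition, it is 
not affected by stale row count estimates.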



--
This message was sent by Atlassian Jira
(v8.20.10#820010)