[ 
https://issues.apache.org/jira/browse/IGNITE-24678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Orlov reassigned IGNITE-24678:
-----------------------------------------

    Assignee: Konstantin Orlov

> Sql. Introduce heuristic to exclude NLJ when HJ may be applied
> --------------------------------------------------------------
>
>                 Key: IGNITE-24678
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24678
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>            Reporter: Konstantin Orlov
>            Assignee: Konstantin Orlov
>            Priority: Major
>              Labels: ignite-3
>
> Currently, we have very primitive statistics, which include only table size. 
> Moreover, they are gathered with throttling that prevents statistics for 
> the same table from being updated more often than once per minute.
> The problem arises when a heavy query is executed immediately after all data 
> has been uploaded to a table (which is essentially every benchmark scenario): 
> the first insert triggers gathering of table stats, so a table size 
> close to 1 is cached in the statistics manager. During the planning phase, 
> the cost-based optimizer makes wrong choices because of the misleading 
> statistics. The most expensive one is choosing NestedLoopJoin over HashJoin. 
> For instance, query 5 from the TPC-H suite at scale factor 0.1, which normally 
> completes in under 1 second (373ms on my laptop), takes tens of minutes to 
> complete with the wrong join algorithm (it didn't finish in 15min, so I 
> killed it).
> To mitigate the issue, we may introduce a heuristic to avoid using NLJ for 
> joins that can be executed with HJ.
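
The proposed heuristic could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Ignite planner code: the class JoinHeuristic, the JoinPredicate type, and the rule "HJ is applicable when the join condition contains at least one equality between a left and a right column" are all hypothetical simplifications introduced here.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: prune NLJ from the candidate set whenever HJ is applicable. */
final class JoinHeuristic {
    /** Join algorithms considered in this sketch (assumption: only these two). */
    enum JoinAlgorithm { NESTED_LOOP, HASH }

    /** Simplified join predicate of the form: leftColumn op rightColumn. */
    static final class JoinPredicate {
        final String leftColumn;
        final String op;
        final String rightColumn;

        JoinPredicate(String leftColumn, String op, String rightColumn) {
            this.leftColumn = leftColumn;
            this.op = op;
            this.rightColumn = rightColumn;
        }

        boolean isEquality() {
            return "=".equals(op);
        }
    }

    /** Assumption: a hash join needs at least one equality predicate to build a key on. */
    static boolean hashJoinApplicable(List<JoinPredicate> condition) {
        return condition.stream().anyMatch(JoinPredicate::isEquality);
    }

    /**
     * Returns the algorithms the planner is allowed to consider.
     * The decision ignores table statistics, so a stale table size cannot affect it.
     */
    static Set<JoinAlgorithm> candidates(List<JoinPredicate> condition) {
        if (hashJoinApplicable(condition)) {
            // HJ can be applied: exclude NLJ regardless of estimated cost.
            return EnumSet.of(JoinAlgorithm.HASH);
        }
        // No equality predicate: NLJ remains the only option in this sketch.
        return EnumSet.of(JoinAlgorithm.NESTED_LOOP);
    }
}
```

With this sketch, a condition such as c_nationkey = s_nationkey yields only HASH, while a pure range condition such as a.x < b.y still falls back to NESTED_LOOP. The point of the design is that the choice depends on the shape of the join condition alone and not on the cached table size.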



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
