[ 
https://issues.apache.org/jira/browse/IMPALA-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Taras Bobrovytsky resolved IMPALA-6388.
---------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

{noformat}
commit f8b406222de8f41765ef1d130e2debbd8ab06369
Author: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Date: Thu Jan 11 17:01:07 2018 -0800

IMPALA-6388: Fix the Union node number of hosts estimation

Before this patch, we estimated the number of hosts for the union
node by looking only at the first union operand. This is incorrect
and led us to underestimate the value.

We fix the problem by setting the estimate to the maximum of the
estimates of its children.

Testing:
- Added a planner test that reproduces the issue

Change-Id: I51e1ecca8dbc84b2b5a72708667b2799d00279f0
Reviewed-on: http://gerrit.cloudera.org:8080/9017
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
Tested-by: Impala Public Jenkins
{noformat}

> UnionNode sets the number of nodes incorrectly
> ----------------------------------------------
>
>                 Key: IMPALA-6388
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6388
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>            Reporter: Alexander Behm
>            Assignee: Taras Bobrovytsky
>            Priority: Critical
>              Labels: planner
>             Fix For: Impala 2.12.0
>
>
> The UnionNode plan node incorrectly sets the number of nodes based on its 
> first child. An inaccurate number of nodes can lead to bad planning 
> decisions, e.g. wrong join order or strategy.
> A better policy would be to set the number of nodes based on the max nodes 
> over all the union's children. That number might still underestimate the real 
> number of nodes, but significantly less so.
> Getting a more accurate estimate would involve keeping track of the actual 
> list of hosts in all plan nodes. Let's focus on the simpler solution outlined 
> above first.
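
The three policies discussed in the description can be compared with a small sketch. This is not Impala's planner code (the frontend is written in Java); the function names and the representation of each union operand as a set of host names are assumptions made purely for illustration.

```python
from typing import List, Set


def num_nodes_first_child(children_hosts: List[Set[str]]) -> int:
    """Behavior reported in the issue: only the first operand is considered."""
    return len(children_hosts[0]) if children_hosts else 0


def num_nodes_max(children_hosts: List[Set[str]]) -> int:
    """Proposed policy: take the maximum node count over all operands."""
    return max((len(hosts) for hosts in children_hosts), default=0)


def num_nodes_exact(children_hosts: List[Set[str]]) -> int:
    """More accurate alternative: count distinct hosts across all operands.

    This needs the actual host list of every operand, which is the extra
    bookkeeping the description defers.
    """
    return len(set().union(*children_hosts))


# Hypothetical union with three operands running on different hosts.
operands = [{"h1"}, {"h1", "h2", "h3"}, {"h4", "h5"}]

first_estimate = num_nodes_first_child(operands)  # 1
max_estimate = num_nodes_max(operands)            # 3
exact_count = num_nodes_exact(operands)           # 5
```

In this made-up case the first-child policy reports 1 node and the max policy reports 3, while the operands together span 5 distinct hosts. The max policy can still underestimate, as the description notes, but by much less than the first-child policy.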



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
