[ 
https://issues.apache.org/jira/browse/IMPALA-10973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430877#comment-17430877
 ] 

ASF subversion and git services commented on IMPALA-10973:
----------------------------------------------------------

Commit 7bf39968bb95ac3aa66ff50b03495d4bdb97293b in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7bf3996 ]

IMPALA-10973: Do not schedule empty scan nodes to coordinator

Until now, fragments with scan nodes that have no scan ranges were
scheduled to the coordinator, even if it is an exclusive coordinator.

This could lead to a lot of work being scheduled on the coordinator.
This patch changes the logic to choose a random executor instead.

Change-Id: Ie31df3861aad2e3e91cab621ff122a4f721905ef
Reviewed-on: http://gerrit.cloudera.org:8080/17954
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Abhishek Rawat <[email protected]>
Reviewed-by: Bikramjeet Vig <[email protected]>
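
The placement rule described in the commit message above can be illustrated with a minimal Python sketch. It is not the code from scheduler.cc: the function name, the host strings, and the decision to raise an error when no executor is available are all assumptions made for illustration.

```python
import random
from typing import Optional, Sequence


def pick_host_for_empty_scan(
    executors: Sequence[str],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a host for a fragment whose scan nodes have no scan ranges.

    Hypothetical helper: `executors` is assumed to list only hosts that
    may run fragments, so an exclusive coordinator is never in it.
    """
    if not executors:
        # Assumption: the sketch refuses to guess a fallback host.
        raise ValueError("no executors available")
    rng = rng or random.Random()
    # Any executor will do, since there is no data locality to honor.
    return rng.choice(list(executors))


# Hypothetical cluster: one exclusive coordinator and two executors.
executors = ["exec-1", "exec-2"]
chosen = pick_host_for_empty_scan(executors, random.Random(0))
```

Because the exclusive coordinator is not in the candidate list, `chosen` is always one of the executors, whichever random seed is used.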


> Empty scan nodes are scheduled to the (exclusive) coordinator
> -------------------------------------------------------------
>
>                 Key: IMPALA-10973
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10973
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Critical
>              Labels: scalability, scheduler
>
> Currently fragments with scan nodes that have no scan ranges are scheduled to 
> the coordinator, even if it is an exclusive coordinator:
> https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler.cc#L805
> As "parent" fragments are often scheduled to be collocated with their 
> children, the condition of "being scheduled to the coordinator" can spread 
> through the plan tree.
> This can be disastrous for scalability in clusters with many executors but 
> few coordinators. It is also very counter-intuitive, as scanning an empty 
> table shouldn't have a major effect on the query. 
>  
> To reproduce locally:
> bin/start-impala-cluster.py --use_exclusive_coordinators -c 1
> Then, in Impala shell:
> select id from functional.alltypes;
> profile; -- scan nodes will be scheduled to 2 hosts
> select f2 from functional.emptytable union all select id from functional.alltypes;
> profile; -- scan nodes will be scheduled to 3 hosts
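
The way coordinator placement spreads through the plan tree, as described in the issue above, can be shown with a toy model. This is a sketch under stated assumptions, not Impala's scheduler: the `Fragment` class, the `schedule` function, and the host names are hypothetical, and a parent fragment is simply assumed to run on the union of its children's hosts.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Fragment:
    """Hypothetical plan fragment for this toy model."""
    name: str
    is_scan: bool = False
    # Hosts holding scan ranges; empty for a scan of an empty table.
    scan_hosts: Set[str] = field(default_factory=set)
    children: List["Fragment"] = field(default_factory=list)


def schedule(fragment: Fragment, empty_scan_host: str) -> Set[str]:
    """Return the hosts a fragment runs on in this simplified model."""
    if fragment.is_scan:
        # A scan with no ranges falls back to `empty_scan_host`.
        return set(fragment.scan_hosts) or {empty_scan_host}
    hosts: Set[str] = set()
    for child in fragment.children:
        # Assumption: a parent is collocated with all of its children.
        hosts |= schedule(child, empty_scan_host)
    return hosts


# Modeled on the reproduction: one coordinator and two executors.
alltypes = Fragment("scan alltypes", is_scan=True,
                    scan_hosts={"exec-1", "exec-2"})
emptytable = Fragment("scan emptytable", is_scan=True)
union = Fragment("union all", children=[emptytable, alltypes])

before_patch = schedule(union, empty_scan_host="coord")
after_patch = schedule(union, empty_scan_host="exec-1")
```

In this model `before_patch` contains the coordinator in addition to both executors, while `after_patch` contains only the executors, which mirrors the 3-host versus 2-host difference reported in the reproduction steps.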



--
This message was sent by Atlassian Jira
(v8.3.4#803005)