[ 
https://issues.apache.org/jira/browse/IMPALA-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595215#comment-16595215
 ] 

Tim Armstrong commented on IMPALA-7496:
---------------------------------------

We want to do something like this at some point but it's a big change with 
various pros and cons to consider and lots of testing required.

WRT the specific situation described, the question I'd ask is why the memory 
isn't being freed up on those particular nodes. Are the queries actually 
actively executing, or are they just being held open by a client? Is this a 
version of Impala with IMPALA-1575?

REPLICA_PREFERENCE=remote and SCHEDULE_RANDOM_REPLICA=true would avoid this, but 
at the cost of locality. (Random distribution doesn't *guarantee* that 
something is scheduled on an idle node, but statistically, if query mem_limits 
are relatively small compared to the process mem_limit, it will even out 
the workload enough that the expected queueing delay isn't likely to be that 
much higher than with a resource-aware implementation.)
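For reference, both are per-session query options (a minimal sketch; these are the option names as given above, values as documented for Impala 2.7+):

```sql
-- Prefer remote replicas and randomize the replica choice, so identical
-- queries don't keep landing on the same hot set of nodes.
SET REPLICA_PREFERENCE=remote;
SET SCHEDULE_RANDOM_REPLICA=true;
```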
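The evening-out argument can be illustrated with a toy simulation. This is not Impala's actual admission logic; the node count, memory units, and fan-out below are made-up numbers chosen only to show the statistical point:

```python
import random

NODES = 50       # cluster size (hypothetical)
PROC_MEM = 100   # per-node process mem_limit, in arbitrary units
QUERY_MEM = 10   # per-node memory reservation of each query
FANOUT = 5       # number of nodes each query is scheduled on

def admitted(schedule_fn, num_queries, seed=0):
    """Count how many queries fit before some chosen node runs out of memory."""
    rng = random.Random(seed)
    used = [0] * NODES
    count = 0
    for _ in range(num_queries):
        chosen = schedule_fn(rng)
        # Admission fails if any chosen node would exceed its process limit.
        if all(used[n] + QUERY_MEM <= PROC_MEM for n in chosen):
            for n in chosen:
                used[n] += QUERY_MEM
            count += 1
    return count

# Locality-driven scheduling: identical queries always hit the same nodes.
fixed = lambda rng: range(FANOUT)
# SCHEDULE_RANDOM_REPLICA-style: pick the nodes at random each time.
rand = lambda rng: rng.sample(range(NODES), FANOUT)

print(admitted(fixed, 100))  # 10: the five hot nodes fill up, everything else queues
print(admitted(rand, 100))   # far more: the load spreads across the cluster
```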

> Schedule query taking in account the mem available on the impalad nodes
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-7496
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7496
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>            Reporter: Adriano
>            Priority: Major
>              Labels: admission-control, scheduler
>
> Environment description: cluster scale (50/100/150 nodes and terabytes of RAM 
> available) - Admission Control enabled.
> Issue description:
> Regardless of which coordinator is chosen (with data and statistics 
> unchanged), a query is always planned the same way, based on the metadata 
> the coordinator has.
> The query is always scheduled on the same nodes if the memory requirements 
> for admission are satisfied:
> https://github.com/cloudera/Impala/blob/cdh5-2.7.0_5.9.1/be/src/scheduling/admission-controller.cc#L307-L333
> Identical queries are always planned/scheduled the same way (always hitting 
> the same nodes).
> This often leads to those queries being queued (not admitted), because the 
> nodes they hit have no memory left within their process limit, even though 
> the pool has plenty of free memory and the overall cluster load is low.
> When planning finishes and the query is evaluated for admission, it often 
> happens that admission is denied because one of the nodes does not have 
> enough memory to run its part of the query (and the query is moved to the 
> pool queue), even though the cluster has 50/100/150 nodes and terabytes of 
> RAM available.
> Why doesn't the scheduler take into consideration the memory available on 
> the nodes involved in the query before building the schedule (perhaps 
> preferring a remote read/operation on a node with free memory instead of 
> always including the same nodes in the plan)? Those nodes end up:
> 1- overloaded
> 2- with the query not immediately admitted, risking a timeout in the pool 
> queue
> Since 2.7, REPLICA_PREFERENCE can possibly help, but it's not good enough, 
> as it does not prevent the scheduler from choosing busy nodes (with the same 
> potential effect: the query queued for lack of resources on a specific node 
> even though terabytes of memory are free).
> Feature Request:
> It would be good if Impala had an option to execute queries (even with worse 
> performance) excluding overloaded nodes and including different nodes, so 
> that the query is immediately admitted and executed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
