[ https://issues.apache.org/jira/browse/MAPREDUCE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated MAPREDUCE-5489:
-----------------------------------

    Attachment: MAPREDUCE-5489.1.patch

I've created a patch that makes the AM send blacklisted nodes to the RM. The
logic is as follows:

1. Add blacklistAdditions and blacklistRemovals to record the nodes added to or
removed from the blacklist between two allocate calls. Both collections are
sent to the RM in the next allocate call.

2. Whenever a container fails on a host, the host is blacklisted and, if the
blacklist is not being ignored, added to blacklistAdditions.

3. When switching from not ignoring the blacklist to ignoring it, all
blacklisted nodes are added to blacklistRemovals.

4. When switching from ignoring the blacklist to not ignoring it, all
blacklisted nodes are added to blacklistAdditions.

5. Switching between ignoring and not ignoring the blacklist does not take
effect until the next allocate call, but it does take effect eventually.
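A minimal Java sketch of the bookkeeping in steps 1 to 5 follows. Only
blacklistAdditions and blacklistRemovals come from the description above; the
class and method names (BlacklistTracker, containerFailedOnHost,
setIgnoreBlacklist, drainForAllocate) are hypothetical and are not the patch's
API. On each switch the sketch also clears the opposite pending set so that a
host is never sent in both lists in one allocate call; that detail is an
assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical AM-side bookkeeping for steps 1 to 5; names are illustrative only.
class BlacklistTracker {

  /** Snapshot of the deltas handed to one allocate call (sorted for readability). */
  static final class Delta {
    final List<String> additions;
    final List<String> removals;

    Delta(List<String> additions, List<String> removals) {
      this.additions = Collections.unmodifiableList(additions);
      this.removals = Collections.unmodifiableList(removals);
    }
  }

  // Every host the AM currently considers bad, whether or not the RM knows yet.
  private final Set<String> blacklistedNodes = new HashSet<>();
  // Step 1: changes accumulated since the last allocate call.
  private final Set<String> blacklistAdditions = new HashSet<>();
  private final Set<String> blacklistRemovals = new HashSet<>();
  private boolean ignoreBlacklist = false;

  // Step 2: a container failed on this host, so blacklist it.
  synchronized void containerFailedOnHost(String host) {
    boolean newlyBlacklisted = blacklistedNodes.add(host);
    if (newlyBlacklisted && !ignoreBlacklist) {
      blacklistAdditions.add(host);
    }
  }

  // Steps 3 and 4 only queue deltas; per step 5 the RM sees the change
  // on the next allocate call.
  synchronized void setIgnoreBlacklist(boolean ignore) {
    if (ignore == ignoreBlacklist) {
      return; // no transition, nothing to tell the RM
    }
    ignoreBlacklist = ignore;
    if (ignore) {
      // Step 3: ask the RM to drop every blacklisted host.
      blacklistAdditions.clear(); // assumption: pending additions are moot now
      blacklistRemovals.addAll(blacklistedNodes);
    } else {
      // Step 4: ask the RM to re-apply every blacklisted host.
      blacklistRemovals.clear(); // assumption: pending removals are moot now
      blacklistAdditions.addAll(blacklistedNodes);
    }
  }

  // Called once per allocate call: return the pending deltas and reset them.
  synchronized Delta drainForAllocate() {
    Delta delta = new Delta(sortedCopy(blacklistAdditions), sortedCopy(blacklistRemovals));
    blacklistAdditions.clear();
    blacklistRemovals.clear();
    return delta;
  }

  private static List<String> sortedCopy(Set<String> hosts) {
    List<String> copy = new ArrayList<>(hosts);
    Collections.sort(copy);
    return copy;
  }
}
```

In this sketch the AM would call drainForAllocate() once per heartbeat and
forward the two lists to the RM; in YARN the allocate request carries such
lists in a ResourceBlacklistRequest.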

Test cases have been modified to test whether the RM is aware of the
blacklisted nodes.
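To make "the RM is aware of the blacklisted nodes" concrete, here is a
hypothetical RM-side view in Java that applies the two lists from each allocate
call and filters candidate hosts. RmBlacklistView, onAllocate and eligibleHosts
are illustrative names, not YARN's scheduler code, and applying additions before
removals is an assumption that only matters if a host appears in both lists.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical RM-side view of one application's blacklist (illustrative only).
class RmBlacklistView {
  private final Set<String> blacklist = new HashSet<>();

  // Apply the deltas carried by one allocate call.
  void onAllocate(List<String> additions, List<String> removals) {
    // Assumption: additions are applied first, then removals.
    blacklist.addAll(additions);
    blacklist.removeAll(removals);
  }

  boolean isBlacklisted(String host) {
    return blacklist.contains(host);
  }

  // Hosts on which this application's containers may still be placed.
  List<String> eligibleHosts(List<String> candidates) {
    List<String> eligible = new ArrayList<>();
    for (String host : candidates) {
      if (!blacklist.contains(host)) {
        eligible.add(host);
      }
    }
    return eligible;
  }
}
```

Once a bad node has arrived in an allocate call, this view stops offering it,
which is the behavior the issue below says was missing.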

> MR jobs hangs as it does not use the node-blacklisting feature in RM requests
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Zhijie Shen
>         Attachments: MAPREDUCE-5489.1.patch
>
>
> When the RM restarted and one NM went bad (bad disk) during the restart, the
> NM got blacklisted by the AM, but the RM kept giving containers on the same
> node even though the AM didn't want them there.
> The AM needs to be changed to explicitly blacklist the node in its RM requests.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
