[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103381#comment-17103381
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--------------------------------------

Hi [~ahussein]

What we are trying to achieve here is that a speculative attempt should not be 
launched on a faulty node. So even if the task gets killed, there is no point 
launching it on that node, as it will be slow. This is the expected behaviour.
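A minimal sketch of the intended placement behaviour, using hypothetical simplified names (the real logic lives in the RMContainerAllocator and works on blacklisted nodes and skipped racks):

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: when placing a speculative attempt, skip the hosts
// blacklisted because of the original (slow) attempt, falling back to any
// candidate only if nothing else is available, so the attempt is not starved.
class SpeculativePlacementSketch {

    static String pickHost(List<String> candidates, Set<String> blacklistedHosts) {
        // Prefer a host outside the blacklist derived from the original attempt.
        for (String host : candidates) {
            if (!blacklistedHosts.contains(host)) {
                return host;
            }
        }
        // Fall back rather than never scheduling the attempt.
        return candidates.isEmpty() ? null : candidates.get(0);
    }
}
```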
{quote}
 * Assuming that a new speculative attempt is created. Following the 
implementation, the new attempt X will have the blacklisted nodes and skipped 
racks relevant to the original taskAttempt Y.
 * Assuming taskAttempt Y is killed before attempt X gets assigned.
 * The RMContainerAllocator would still assign a host to attempt X based on the 
outdated blacklists.

Is this the expected behavior, or is it supposed to clear attempt X's 
blacklisted nodes?{quote}
Yes, I think these two cases should be handled.
{quote}
 * Should that object be synchronized? I believe there is more than one thread 
reading/writing to that object. Perhaps changing {{taskAttemptToEventMapping}} 
to a {{ConcurrentHashMap}} would be sufficient. What do you think?
 * In {{taskAttemptToEventMapping}}, the data is only removed when the 
taskAttempt is assigned. If the taskAttempt is killed before being assigned, 
{{taskAttemptToEventMapping}} would still hold the taskAttempt.
{quote}
Will update this.
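A sketch of what the update could look like, with simplified stand-in types (the real map is keyed by TaskAttemptId and stores ContainerRequestEvent objects): make the map a {{ConcurrentHashMap}} and remove the entry on kill as well as on assignment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: thread-safe mapping, cleaned up both when the attempt
// is assigned a container and when it is killed before assignment, so killed
// attempts do not leak stale entries.
class BlacklistMappingSketch {
    private final Map<String, Object> taskAttemptToEventMapping =
            new ConcurrentHashMap<>();

    // Register the request event when the attempt asks for a container.
    void onRequest(String attemptId, Object event) {
        taskAttemptToEventMapping.put(attemptId, event);
    }

    // Consume and remove the entry when the attempt is assigned.
    Object onAssigned(String attemptId) {
        return taskAttemptToEventMapping.remove(attemptId);
    }

    // Also remove on kill, covering the killed-before-assignment case.
    void onKilled(String attemptId) {
        taskAttemptToEventMapping.remove(attemptId);
    }

    int size() {
        return taskAttemptToEventMapping.size();
    }
}
```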
{quote} * Racks are going to be blacklisted too, not just nodes. I believe 
that the javadoc and the description in default.xml should emphasize that 
enabling the flag also avoids the local rack unless no other rack is available 
for scheduling.{quote}
Actually, when a task attempt is killed, by default its Avataar is VIRGIN; this 
is a defect which needs to be addressed. If a speculative task attempt is 
killed, it is relaunched as a normal task attempt.
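The defect can be sketched as follows (hypothetical simplified code; Hadoop's real {{Avataar}} enum has the values {{VIRGIN}} and {{SPECULATIVE}}):

```java
// Hypothetical sketch of the defect described above: when a killed speculative
// attempt is rescheduled, the new attempt currently starts as VIRGIN (a normal
// attempt); inheriting the avataar keeps it SPECULATIVE so the node/rack
// blacklist derived from the original attempt still applies.
class AvataarSketch {
    enum Avataar { VIRGIN, SPECULATIVE }

    // Current behaviour: the rescheduled attempt ignores its predecessor.
    static Avataar currentBehaviour(Avataar killedAttempt) {
        return Avataar.VIRGIN;
    }

    // Proposed fix: carry the killed attempt's avataar forward.
    static Avataar proposedBehaviour(Avataar killedAttempt) {
        return killedAttempt;
    }
}
```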
{quote} * why do we need {{mapTaskAttemptToAvataar}} when each taskAttempt has 
a field called {{avataar}} ?{quote}
How do you get the taskAttempt details in RMContainerAllocator?
{quote} * That's a design issue. One would expect that a RequestEvent's 
lifetime should not survive the {{handle()}} call. Therefore, the metadata 
should be consumed by the handlers. In the patch, 
{{ContainerRequestEvent.blacklistedNodes}} could be a field in taskAttempt. 
Then you won't need the {{TaskAttemptBlacklistManager}} class.{quote}
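The alternative design quoted above could be sketched like this (hypothetical simplified names): the event carries the blacklist, and the handler consumes it within the {{handle()}} call, so no separate manager class has to track state.

```java
import java.util.Set;

// Hypothetical sketch of the quoted suggestion: the request event carries the
// blacklist derived from the original attempt, and the scheduling handler
// consumes it directly, so nothing outlives handle().
class ContainerRequestSketch {
    static class RequestEvent {
        final String attemptId;
        final Set<String> blacklistedNodes;

        RequestEvent(String attemptId, Set<String> blacklistedNodes) {
            this.attemptId = attemptId;
            this.blacklistedNodes = blacklistedNodes;
        }
    }

    // The metadata is read here, inside the handler, and discarded with the event.
    static boolean isHostAllowed(RequestEvent event, String host) {
        return !event.blacklistedNodes.contains(host);
    }
}
```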

> Speculative attempts should not run on the same node
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-7169
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: yarn
>    Affects Versions: 2.7.2
>            Reporter: Lee chen
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>           I found that in all versions of YARN, Speculative Execution may 
> place the speculative task on the node of the original task. What I have read 
> only says it will try to have one more task attempt; I haven't seen any place 
> mentioning that it should not be on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>          In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
