[ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592379#comment-13592379 ]

Jason Lowe commented on MAPREDUCE-5042:
---------------------------------------

Sorry, I was wrong.  It appears this will happen without security as well.  The
problem is that the job token is regenerated from scratch each time the AM
starts up, so a subsequent AM attempt has no idea which job token the previous
attempt used.  My non-secure cluster had only one node, which hid the problem:
any node that launches a container for the new AM attempt overwrites the old
shuffle token with the new one.  Any node that only ran tasks for the old AM
attempt will report shuffle verification failures for fetches from reduce
tasks launched by the new AM attempt.
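To make the failure mode concrete, here is a minimal, simplified model of it in Java. It assumes each node keeps one shuffle secret per job ID that a later registration silently replaces, and that requests are signed with HmacSHA1 over a request string. The class `ShuffleTokenSketch`, the nested `NodeShuffleService`, and the methods `registerJob`, `verifyRequest`, `newJobSecret`, and `hmac` are all hypothetical illustrations, not Hadoop's `ShuffleHandler` or `SecureShuffleUtils` API, and the signed message format is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class ShuffleTokenSketch {

    /** Hypothetical stand-in for a node's shuffle service: it stores one secret per job ID. */
    static class NodeShuffleService {
        private final Map<String, byte[]> jobSecrets = new HashMap<>();

        /** Called when a container for the job launches here; a later call overwrites the earlier secret. */
        void registerJob(String jobId, byte[] secret) {
            jobSecrets.put(jobId, secret);
        }

        /** Returns true only if the request hash matches the hash computed with this node's stored secret. */
        boolean verifyRequest(String jobId, String msg, String requestHash) {
            byte[] secret = jobSecrets.get(jobId);
            if (secret == null) {
                return false;
            }
            String expected = hmac(secret, msg);
            // Constant-time comparison of the two Base64 strings.
            return MessageDigest.isEqual(
                    expected.getBytes(StandardCharsets.UTF_8),
                    requestHash.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** A fresh random secret, standing in for the job token generated when an AM attempt starts. */
    static byte[] newJobSecret(SecureRandom rng) {
        byte[] secret = new byte[20];
        rng.nextBytes(secret);
        return secret;
    }

    /** Base64-encoded HmacSHA1 of msg under the given secret. */
    static String hmac(byte[] secret, String msg) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secret, "HmacSHA1"));
            byte[] digest = mac.doFinal(msg.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecureRandom rng = new SecureRandom();
        String jobId = "job_1361569180491_21845";
        String msg = "/mapOutput?job=" + jobId + "&reduce=0&map=attempt_1361569180491_21845_m_000016_0";

        NodeShuffleService nodeA = new NodeShuffleService();
        NodeShuffleService nodeB = new NodeShuffleService();

        // AM attempt 1 generates its job token; both nodes run tasks for it.
        byte[] attempt1Secret = newJobSecret(rng);
        nodeA.registerJob(jobId, attempt1Secret);
        nodeB.registerJob(jobId, attempt1Secret);

        // AM attempt 2 generates a brand-new token; only node A launches a container for it.
        byte[] attempt2Secret = newJobSecret(rng);
        nodeA.registerJob(jobId, attempt2Secret);

        // A reducer launched by attempt 2 signs its fetch with the new token.
        String requestHash = hmac(attempt2Secret, msg);
        System.out.println("node A (saw attempt 2): " + nodeA.verifyRequest(jobId, msg, requestHash));
        System.out.println("node B (attempt 1 only): " + nodeB.verifyRequest(jobId, msg, requestHash));
    }
}
```

Running `main` prints `true` for node A and `false` for node B: node B still holds only the old attempt's secret, so it cannot verify a fetch signed with the new one. This corresponds to the 401 responses described in the issue. On a one-node cluster every node is "node A", which is why the failure does not show up there.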
                
> Reducer unable to fetch for a map task that was recovered
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-5042
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, security
>    Affects Versions: 0.23.7, 2.0.4-beta
>            Reporter: Jason Lowe
>            Priority: Blocker
>
> If an application attempt fails and is relaunched, the AM will try to recover
> previously completed tasks.  If a reducer needs to fetch the output of a map
> task attempt that was recovered, it will fail with a 401 error like this:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 401 for URL: http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_000016_0
>       at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615)
>       at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231)
>       at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156)
> {noformat}
> Looking at the corresponding NM's logs, we see that the shuffle failed with
> "Verification of the hashReply failed".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira