[jira] [Commented] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered

Vinod Kumar Vavilapalli (JIRA) Thu, 07 Mar 2013 11:57:38 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596289#comment-13596289 ]


Vinod Kumar Vavilapalli commented on MAPREDUCE-5042:
----------------------------------------------------

In my preliminary security work, I originally had the JobClient generate the 
secret, and later had the MR AM generate the tokens and reupload the tokens 
file into the submit directory. That meant another hop to DFS, and we have 
since changed that, but this recovery code bug fell through. So there are 
multiple solutions:
 - Have a single secret but let the client generate it
 - Have a single secret but upload the tokens file for future app-attempts
 - Have multiple tokens

Separating the task and shuffle security secrets is future-proof, but I am not 
sure it is tied directly to this issue if we go with the reupload solution.

I don't feel strongly about any solution, but one thing we should keep in mind 
is to move as much as possible into the AM, so that the client is thinner and 
we can eventually do submits via web services.
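
To illustrate why a single secret across app-attempts matters, here is a
minimal sketch. It assumes the shuffle request is authenticated with an HMAC
keyed by the job secret, and that a relaunched AM generates a fresh secret
while the other side still holds the one from the first attempt. The class
name, helper methods, and URL are hypothetical and are not the actual Hadoop
code.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;

// Hypothetical illustration only; not the Hadoop shuffle implementation.
public class ShuffleSecretSketch {

    // Assumption: the hash is an HMAC-SHA1 of the request URL keyed by the job secret.
    static byte[] sign(byte[] secret, String url) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret, "HmacSHA1"));
        return mac.doFinal(url.getBytes(StandardCharsets.UTF_8));
    }

    // The verifier recomputes the HMAC with its own copy of the secret
    // and compares it to the hash it received.
    static boolean verify(byte[] secret, String url, byte[] hash) throws Exception {
        return MessageDigest.isEqual(sign(secret, url), hash);
    }

    // Stand-in for "an app-attempt generates a new random secret".
    static byte[] newSecret() {
        byte[] secret = new byte[20];
        new SecureRandom().nextBytes(secret);
        return secret;
    }

    public static void main(String[] args) throws Exception {
        // Made-up URL in the same shape as the one in the bug report.
        String url = "/mapOutput?job=job_1&reduce=0&map=attempt_1_m_000000_0";

        byte[] firstAttemptSecret = newSecret();   // used when the map output was produced
        byte[] secondAttemptSecret = newSecret();  // regenerated after the AM relaunch

        // Same secret on both sides: verification succeeds.
        System.out.println("same secret:        "
                + verify(firstAttemptSecret, url, sign(firstAttemptSecret, url)));

        // Signed with the new secret, verified with the old one: verification
        // fails, which would surface to the reducer as an HTTP 401.
        System.out.println("regenerated secret: "
                + verify(firstAttemptSecret, url, sign(secondAttemptSecret, url)));
    }
}
```

Under these assumptions, the first two options above (a client-generated
secret, or reuploading the tokens file for later app-attempts) both amount to
making sure every attempt signs with the same secret.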
                
> Reducer unable to fetch for a map task that was recovered
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-5042
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, security
>    Affects Versions: 0.23.7, 2.0.5-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: MAPREDUCE-5042.patch, MAPREDUCE-5042.patch
>
>
> If an application attempt fails and is relaunched, the AM will try to recover 
> previously completed tasks.  If a reducer needs to fetch the output of a map 
> task attempt that was recovered, then it will fail with a 401 error like this:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 401 for URL: http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_000016_0
>       at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615)
>       at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231)
>       at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156)
> {noformat}
> Looking at the corresponding NM's logs, we see the shuffle failed due to 
> "Verification of the hashReply failed".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
