[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

Jason Lowe (JIRA) Fri, 14 Feb 2014 11:26:33 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901821#comment-13901821
 ]


Jason Lowe commented on MAPREDUCE-5641:
---------------------------------------

bq. Do you have any alternatives on how to allow the JHS to have access to 
those files?

Outside of imposing new restrictions on where the staging directory can be and 
how it has to be configured, no, I don't know of an easy way to do that.  To 
allow the JHS to access these files, we'd at a minimum have to require the user 
directories in the staging area to have their group set to the "hadoop" group 
(see 
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html#Running_Hadoop_in_Secure_Mode
 for details on that group) and have permissions of 0750 all the way down to 
the specific staging directory for a job.  Read permission is required so the 
history server can scan for the proper jhist file to grab: when a job has 
multiple AM attempts, the JHS can't know in advance the name of the correct 
jhist file, so it would have to scan the directory to find the latest one.  
That would relax the permissions on a user's staging files to include the 
hadoop group.  That's probably OK and far better than letting everyone in, but 
I haven't thought through all of the security ramifications of doing so.
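As a rough illustration of the policy above, here is a minimal Python sketch that checks a chain of directories, from the user's staging directory down to the job's staging directory, against the "hadoop group, 0750 at every level" requirement. The `DirInfo` record, the `jhs_can_scan` function, and the example paths (which assume a `<staging-dir>/<user>/.staging/<jobId>` layout) are hypothetical; the sketch only inspects group ownership and mode bits, and ignores ACLs, superusers, and how the metadata would actually be fetched from HDFS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DirInfo:
    """Hypothetical record of one directory's group owner and permission bits."""
    path: str
    group: str
    mode: int  # permission bits, e.g. 0o750


def jhs_can_scan(chain: List[DirInfo], required_group: str = "hadoop") -> bool:
    """Check the proposed policy on every directory from the user's staging dir
    down to the job's staging dir: group-owned by `required_group` and mode
    exactly 0750 (owner rwx, group r-x, others nothing)."""
    if not chain:
        return False
    return all(d.group == required_group and (d.mode & 0o777) == 0o750 for d in chain)


# Example chain using illustrative (assumed) paths.
chain = [
    DirInfo("/tmp/hadoop-yarn/staging/alice", "hadoop", 0o750),
    DirInfo("/tmp/hadoop-yarn/staging/alice/.staging", "hadoop", 0o750),
    DirInfo("/tmp/hadoop-yarn/staging/alice/.staging/job_1392400000000_0001", "hadoop", 0o750),
]
print(jhs_can_scan(chain))  # True

# A single level left with the user's private group breaks the chain.
chain[1] = DirInfo(chain[1].path, "alice", 0o700)
print(jhs_can_scan(chain))  # False
```

Requiring every level, not just the job directory, matters because a single non-traversable ancestor is enough to hide the jhist files from the history server.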

bq. Or to somehow get those files into the done_intermediate dir?

A proper way to do this would be to have something running as the job's user 
perform the move, since that doesn't require any additional security beyond 
what's already done today.  However, that probably involves adding the ability 
in YARN for a specified task to run when an application fails or is killed, to 
clean up after the unsuccessful run.  It's a non-trivial task, but it would 
also help solve the problem we have today where staging directories are leaked 
for applications that are killed before the AM launches.
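The move itself, run as the job's user, could look roughly like the Python sketch below: scan the job's staging directory for the history file from the latest AM attempt and publish it into the intermediate directory. It uses the local filesystem as a stand-in for HDFS, and the file naming (`<jobId>_<attempt>.jhist` in staging, `<user>/<jobId>.jhist` under the intermediate directory) as well as the `latest_jhist` and `publish_failed_history` helpers are hypothetical; the real JHS expects its own naming scheme in done_intermediate.

```python
import re
import shutil
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical naming assumption: one history file per AM attempt,
# named "<jobId>_<attempt>.jhist" inside the job's staging directory.
JHIST_RE = re.compile(r"^(?P<job>job_\d+_\d+)_(?P<attempt>\d+)\.jhist$")


def latest_jhist(staging_job_dir: Path, job_id: str) -> Optional[Path]:
    """Scan the staging dir and return the .jhist file with the highest attempt."""
    best, best_attempt = None, -1
    for f in staging_job_dir.iterdir():
        m = JHIST_RE.match(f.name)
        if m and m.group("job") == job_id:
            attempt = int(m.group("attempt"))  # numeric, so 10 beats 9
            if attempt > best_attempt:
                best, best_attempt = f, attempt
    return best


def publish_failed_history(staging_job_dir: Path, intermediate_dir: Path,
                           user: str, job_id: str) -> Optional[Path]:
    """Copy the latest attempt's history into <intermediate_dir>/<user>/."""
    src = latest_jhist(staging_job_dir, job_id)
    if src is None:
        return None
    user_dir = intermediate_dir / user
    user_dir.mkdir(parents=True, exist_ok=True)
    tmp = user_dir / (job_id + ".jhist.tmp")
    dst = user_dir / (job_id + ".jhist")
    shutil.copyfile(src, tmp)  # write under a temporary name first
    tmp.replace(dst)           # then rename into place
    return dst


# Usage: three AM attempts left behind in a (temporary, local) staging dir.
with tempfile.TemporaryDirectory() as root:
    job = "job_1392400000000_0001"
    staging = Path(root) / "staging" / "alice" / ".staging" / job
    staging.mkdir(parents=True)
    for attempt in (1, 2, 10):
        (staging / f"{job}_{attempt}.jhist").write_text(f"attempt {attempt}")
    dst = publish_failed_history(staging, Path(root) / "done_intermediate", "alice", job)
    print(dst.read_text())  # attempt 10
```

Attempts are compared as integers rather than strings so that attempt 10 is not mistaken for an older attempt than 9, and copying under a temporary name before renaming keeps a scanning history server from picking up a half-written file.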

> History for failed Application Masters should be made available to the Job 
> History Server
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5641
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, jobhistoryserver
>    Affects Versions: 2.2.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: MAPREDUCE-5641.patch
>
>
> Currently, the JHS has no information about jobs whose AMs have failed.  This 
> is because the History is written by the AM to the intermediate folder just 
> before finishing, so when it fails for any reason, this information isn't 
> copied there.  However, it is not lost, as it's in the AM's staging directory.  
> To make the History available in the JHS, all we need to do is have another 
> mechanism to move the History from the staging directory to the intermediate 
> directory.  The AM also writes a "Summary" file before exiting normally, 
> which is also unavailable when the AM fails.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
