[
https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649065#action_12649065
]
Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------
Before we begin discussing the approach, I wanted to summarize my
understanding of this task and raise a few points that I have questions
about.
The following are some salient points:
# We want to run tasks as the user who submitted the job, rather than as the
user running the daemon.
# I think we also don't want to run the daemon as a privileged user (such as
root) in order to solve this requirement. Right? (A rough sketch of one
possible mechanism is at the end of this comment.)
# The directories and files used by the task should have appropriate
permissions. Currently, these directories and files are mostly created by the
daemons but used by the task; a few are also used or accessed by the daemons.
Some of these directories and files are the following:
## {{mapred.local.dir/taskTracker/archive}} - directories containing
distributed cache archives
## {{mapred.local.dir/taskTracker/jobcache/$jobid/}} - includes {{work}}
(a scratch space), {{jars}} (containing the job jars) and {{job.xml}}.
## {{mapred.local.dir/taskTracker/jobcache/$jobid/$taskid}} - includes the
{{job.xml}}, {{output}} (intermediate files), {{work}} (current working dir)
and {{temp}} ({{work/tmp}}) directories for the task.
## {{mapred.local.dir/taskTracker/pids/$taskid}} - written by the shell
launching the task, but read by the daemons.
# What should 'appropriate' permissions mean? I guess read/write/execute (on
directories) for the owner of the job is required. What should the permissions
be for others? If the task is the only consumer, the permissions for others
can be turned off. However, there are cases where the daemon / other processes
might read the files (a sketch of the split I have in mind is at the end of
this comment). For instance:
## The distributed cache files can be shared across jobs.
## Jetty seems to require read permission on the intermediate files to serve
them to the reducers.
In the above cases, can we make these world-readable?
## Task logs are currently generated under
{{${hadoop.log.dir}/userlogs/$taskid}}. These are served by the
{{TaskLogServlet}} of the TaskTracker.
# Apart from launching the task itself, we may need some other actions to be
performed as the job owner (see the last sketch at the end of this comment).
For instance:
## Killing of a task
## Maybe setting up and cleaning up of the directories / files
## Running the debug script - {{mapred.map|reduce.task.debug.script}}
Is there anything that I am missing? Comments on the questions about shared
directories / files - the distributed cache, intermediate outputs and log
files?
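To make point 2 concrete, here is a rough sketch of one mechanism I can think
of: the TaskTracker stays unprivileged and delegates the actual fork/exec to a
small setuid wrapper binary that switches to the job owner before exec'ing the
task JVM. The wrapper path and argument layout below are purely my assumptions
for illustration, not existing code:
{code:java}
import java.io.IOException;

// Hypothetical: the TaskTracker (running as an unprivileged user) hands
// the launch over to a small setuid-root wrapper that drops privileges
// to the job owner, chdirs to the task's work dir, and execs the task
// JVM command line. Neither the binary nor this class exists today.
public class TaskLauncherSketch {
  private static final String WRAPPER =
      "/usr/local/hadoop/bin/task-controller"; // assumed install path

  public static Process launchAsUser(String user, String taskId,
                                     String workDir, String command)
      throws IOException {
    ProcessBuilder pb =
        new ProcessBuilder(WRAPPER, user, taskId, workDir, command);
    pb.redirectErrorStream(true); // fold stderr into stdout for logging
    return pb.start();
  }
}
{code}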
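For point 4, the split I have in mind is roughly: directories private to the
task (job.xml, jars, work, temp, pids) get owner-only permissions, while the
shared ones (distributed cache, intermediate outputs served by Jetty, task
logs) stay world-readable. A minimal sketch using the existing
{{FileSystem#setPermission}} API on the local file system; the class and
method names here are mine:
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Sketch of the permission split discussed in point 4. Only
// FileSystem#setPermission is real API; the helper itself is illustrative.
public class TaskDirPermissions {
  // Private to the job owner: job.xml, jars, work, temp, pids.
  static final FsPermission OWNER_ONLY = new FsPermission((short) 0700);
  // Shared with the daemons: distributed cache, intermediate map
  // outputs (read by Jetty), task logs - world-readable.
  static final FsPermission SHARED = new FsPermission((short) 0755);

  public static void setUp(Configuration conf, Path privateDir,
                           Path sharedDir) throws IOException {
    FileSystem localFs = FileSystem.getLocal(conf);
    localFs.setPermission(privateDir, OWNER_ONLY);
    localFs.setPermission(sharedDir, SHARED);
  }
}
{code}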
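And for point 5, the same wrapper could take a 'verb' so that killing a task,
setting up / cleaning up directories and running the debug script all go
through the job owner as well. Again, the binary and its argument convention
are hypothetical:
{code:java}
import java.io.IOException;

// Hypothetical extension of the wrapper idea from the first sketch:
// every privileged action is expressed as a verb plus arguments, and the
// wrapper performs it after switching to the job owner.
public class OwnerActionsSketch {
  private static final String WRAPPER =
      "/usr/local/hadoop/bin/task-controller"; // assumed install path

  public static int runAsOwner(String user, String verb, String... args)
      throws IOException, InterruptedException {
    String[] cmd = new String[args.length + 3];
    cmd[0] = WRAPPER;
    cmd[1] = verb; // e.g. "kill", "cleanup", "run-debug-script"
    cmd[2] = user;
    System.arraycopy(args, 0, cmd, 3, args.length);
    Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
    return p.waitFor(); // the wrapper's exit code
  }
}
{code}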
> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
> Key: HADOOP-4490
> URL: https://issues.apache.org/jira/browse/HADOOP-4490
> Project: Hadoop Core
> Issue Type: Sub-task
> Components: mapred, security
> Reporter: Arun C Murthy
> Assignee: Hemanth Yamijala
> Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them
> running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.