[
https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655207#action_12655207
]
Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------
I had an offline discussion with Devaraj about the implementation, and we
also went over the impact it would have when combined with JVM reuse.
I am documenting a few of his comments here:
- Task directories under the tasktracker system or root directory, to which
files (such as intermediate outputs) are copied after task completion, should be
on the same disk as the original user's task directories. This avoids
cross-disk copies.
- Regarding the problem of serving log outputs, which I've mentioned
[here|#action_12653375], one approach we discussed is to add a command
to the executable that reads the data and returns it to the TaskLogServlet on
demand. This would happen reasonably rarely and does not affect any other
functionality, so the performance overhead can probably be ignored.
- Another comment was to reduce the number of times the executable is launched.
For example, *without* JVM reuse, I can set up the directories, run the task, and
then move the outputs with a single launch of the executable. This is possible
because all actions are per task, and there is one JVM per task. Hence the
lifecycle of the task fits well with the setuid changes.
With JVM reuse, though, the last point becomes problematic. We can easily set up
the directories before the task and move the output after it. However, that
needs to be done with separate launches of the executable - three, actually.
The performance impact of this (and whether it would offset the advantage of
JVM reuse) is something to measure.
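To make the launch-count concern concrete, here is a rough sketch of how the
number of executable launches might compare in the two modes. It is only a
back-of-the-envelope model, not the actual implementation: the class and
method names are hypothetical, and the reuse model (one launch to start each
shared JVM, plus one setup launch and one move launch per task) is my
assumption about how the "three" launches break down.

```java
public class LaunchCount {

    // Without JVM reuse: a single launch per task covers directory setup,
    // running the task, and moving the outputs.
    static long launchesWithoutReuse(long tasks) {
        return tasks;
    }

    // With JVM reuse (ASSUMED model): one launch starts each shared JVM,
    // and every task needs its own setup launch and its own move launch.
    static long launchesWithReuse(long tasks, long tasksPerJvm) {
        if (tasksPerJvm <= 0) {
            throw new IllegalArgumentException("tasksPerJvm must be positive");
        }
        long jvms = (tasks + tasksPerJvm - 1) / tasksPerJvm; // ceiling division
        return jvms + 2 * tasks;
    }

    public static void main(String[] args) {
        long tasks = 10;
        long tasksPerJvm = 5; // hypothetical reuse factor
        System.out.println("without reuse: " + launchesWithoutReuse(tasks));
        System.out.println("with reuse:    " + launchesWithReuse(tasks, tasksPerJvm));
    }
}
```

Under this model the reuse case always launches the executable more often than
the non-reuse case, so whether JVM reuse still pays off depends on how the cost
of one launch compares with the cost of one JVM startup. That ratio is what
would need to be measured.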
> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
> Key: HADOOP-4490
> URL: https://issues.apache.org/jira/browse/HADOOP-4490
> Project: Hadoop Core
> Issue Type: Sub-task
> Components: mapred, security
> Reporter: Arun C Murthy
> Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them
> running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.