[ 
https://issues.apache.org/jira/browse/HADOOP-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696382#action_12696382
 ] 

Amar Kamat commented on HADOOP-3578:
------------------------------------

Here is the proposal :

_Terms :_
# mapred.system.dir : the common location where users (via the jobclient) upload 
job files (job split and job jars). This dir will have rwx-w--w- permissions.
# mapred.system.dir/jobtracker : jobtracker's private scratch space with 
rwx------ permissions. This is the place where the jobtracker moves files upon 
successful job submission (upload + validation).
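
To make the two permission strings concrete, here is a small sketch that parses them with the standard java.nio PosixFilePermissions helper. The class and method names are hypothetical and chosen only for illustration; Hadoop's own permission classes are deliberately not used here.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical helper: inspects the permission strings quoted in the proposal. */
final class SystemDirPermissions {
    // mapred.system.dir: shared upload location.
    static final String SHARED_DIR = "rwx-w--w-";
    // mapred.system.dir/jobtracker: jobtracker's private scratch space.
    static final String PRIVATE_DIR = "rwx------";

    /** Parses a 9-character symbolic string into a set of permission bits. */
    static Set<PosixFilePermission> parse(String symbolic) {
        return PosixFilePermissions.fromString(symbolic);
    }

    /** Returns true if anyone other than the owner holds at least one permission bit. */
    static boolean openToNonOwners(String symbolic) {
        for (PosixFilePermission p : parse(symbolic)) {
            if (!p.name().startsWith("OWNER_")) {
                return true;
            }
        }
        return false;
    }
}
```

With these strings, the shared dir grants group and others the write bit only, while the private dir grants nothing outside the owner.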

The process of job submission is as follows:
# jobclient/user asks jobtracker for a new jobid
# jobclient generates a new x-digit random number and uploads the job files 
(split and jar) to mapred.system.dir/jobid-random-number
# jobclient/user passes this information and the jobconf to the jobtracker via 
the rpc (submitJob api). 
# jobtracker loads the conf via the rpc, does the acls check and only then the 
job is *accepted* (moved to mapred.system.dir/jobtracker)
# jobtracker serializes the job.xml (updating the locations of the split and jar 
files in the conf) to mapred.system.dir/jobtracker/jobid, and moves job.jar 
and job.split to mapred.system.dir/jobtracker/jobid (this is important because 
the tasktracker relies on the information in the conf for job.jar and job.split). 
# Upon restart, all the jobs that are present in mapred.system.dir/jobtracker/ 
will be blindly loaded, and any job dirs left directly under mapred.system.dir/ 
will be queued for cleanup.
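
The steps above can be sketched roughly as follows. This is a minimal illustration on a local file system using java.nio, not the actual jobclient or jobtracker code: the class name, method names, the 8-digit suffix and the boolean standing in for the ACL check are all assumptions, and rewriting job.xml is left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed stage / accept / recover flow. */
final class JobSubmissionSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Client side: upload job files to mapred.system.dir/jobid-random-number. */
    static Path stageJob(Path systemDir, String jobId) throws IOException {
        // The random suffix makes the staging dir name hard to guess.
        String suffix = String.format("%08d", RANDOM.nextInt(100_000_000));
        Path staging = Files.createDirectories(systemDir.resolve(jobId + "-" + suffix));
        Files.write(staging.resolve("job.jar"), new byte[] {1});   // placeholder content
        Files.write(staging.resolve("job.split"), new byte[] {2}); // placeholder content
        return staging;
    }

    /** Jobtracker side: move files to the private dir only if the ACL check passed. */
    static Path acceptJob(Path systemDir, String jobId, Path staging, boolean aclCheckPassed)
            throws IOException {
        if (!aclCheckPassed) {
            return null; // rejected: files stay in the staging dir until cleanup
        }
        Path accepted = Files.createDirectories(systemDir.resolve("jobtracker").resolve(jobId));
        for (String name : new String[] {"job.jar", "job.split"}) {
            Files.move(staging.resolve(name), accepted.resolve(name));
        }
        Files.delete(staging); // staging dir is empty after the moves
        return accepted;
    }

    /** On restart: every job under mapred.system.dir/jobtracker is reloaded. */
    static List<String> recoverableJobs(Path systemDir) throws IOException {
        return listNames(systemDir.resolve("jobtracker"), null);
    }

    /** On restart: dirs left directly under mapred.system.dir are cleanup candidates. */
    static List<String> cleanupCandidates(Path systemDir) throws IOException {
        return listNames(systemDir, "jobtracker");
    }

    /** Lists entry names in dir, skipping the entry called 'skip' (may be null). */
    private static List<String> listNames(Path dir, String skip) throws IOException {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return names;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                if (!name.equals(skip)) {
                    names.add(name);
                }
            }
        }
        return names;
    }
}
```

In this sketch a rejected job never reaches the jobtracker dir, so recovery only has to look at one place, which is the separation the proposal is after.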

_Benefits :_
# guessing the job dir will be hard, as a random number will be appended 
# separation between faulty jobs (jobs failing on access etc) and accepted jobs 
will be clear (helps in recovery)
# jobtracker system dir will be clean and cannot be garbled 
# jobconf need not be read from fs as it will be passed via rpc; this helps in 
making quick decisions on whether the job is faulty or not
# re-initing jobtracker is as simple as deleting jobtracker's system.dir 
(mapred.system.dir/jobtracker) without touching the mapred.system.dir

_Questions :_
# Should the default api assume that the job.xml, job.jar and job.split are still 
present in mapred.system.dir/jobid?

----
Thoughts? Comments?

> mapred.system.dir should be accessible only to hadoop daemons 
> --------------------------------------------------------------
>
>                 Key: HADOOP-3578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3578
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten/tampered after the job submission. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
