[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870835#comment-13870835
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5663:
-----------------------------------------------

The Oozie server is responsible for obtaining all the tokens the main job may 
need:

* tokens to run the job (working dir, jobtokens)
* tokens for the Input and Output data (typically HDFS tokens, but they can be 
for different file systems, for HBase, for HCatalog, etc.).

For the typical case of running an MR job (directly or via Pig/Hive), the 
tokens of the launcher job are sufficient for the main job; they just need to 
be propagated. The Oozie server makes sure the 
"mapreduce.job.complete.cancel.delegation.tokens" property is set to FALSE for 
the launcher job (Oozie gets rid of the launcher job for MR jobs once the main 
job is running).

For scenarios where the main job needs to interact with different services, 
Oozie must acquire the tokens for those services in advance. For HDFS this is 
done by simply setting the "MRJobConfig.JOB_NAMENODES" property; the launcher 
job submission will then get those tokens. For HBase or HCatalog, Oozie has a 
CredentialsProvider that obtains those tokens (the requirement here is that 
Oozie is configured as a proxy user in those services in order to get tokens 
for the user submitting the job).
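
To make the two launcher settings above concrete, here is a minimal sketch in 
plain Java. It is not Oozie's code: a Map stands in for the launcher job's 
Hadoop Configuration, the class and method names are hypothetical, and the 
string behind MRJobConfig.JOB_NAMENODES is assumed to be 
"mapreduce.job.hdfs-servers" (worth checking against your Hadoop version).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of the launcher job's Configuration.
final class LauncherConfSketch {
    // Property name quoted in the comment above.
    static final String CANCEL_TOKENS =
        "mapreduce.job.complete.cancel.delegation.tokens";
    // Assumed value of MRJobConfig.JOB_NAMENODES; verify before relying on it.
    static final String JOB_NAMENODES = "mapreduce.job.hdfs-servers";

    static Map<String, String> launcherConf(List<String> extraNameNodes) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Keep the launcher's tokens alive after it exits so the main job can reuse them.
        conf.put(CANCEL_TOKENS, "false");
        // Listing extra namenodes asks job submission to fetch HDFS tokens for them too.
        if (!extraNameNodes.isEmpty()) {
            conf.put(JOB_NAMENODES, String.join(",", extraNameNodes));
        }
        return conf;
    }

    public static void main(String[] args) {
        // Example hosts are placeholders.
        System.out.println(launcherConf(Arrays.asList(
            "hdfs://nn1.example.com:8020", "hdfs://nn2.example.com:8020")));
    }
}
```

With a real Hadoop Configuration the same two values would be set through 
conf.setBoolean(...) and conf.set(...) on the launcher job's configuration.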

From what it seems, you are after generalizing this. I think we should do it 
with a slight twist on what you are proposing:

* DelegationTokens should always be requested by the client, whether or not 
security is enabled and whether or not the splits are computed on the client.
* DelegationTokens fetching should be done regardless of the IF/OF 
implementation (consider talking to HBase or HCatalog, or the job working dir 
service).
* DelegationTokens fetching should not be tied to split computation.

We could have a utility class that takes a UGI and a list of service URIs and 
returns a populated Credentials with tokens for all the specified services.
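
A rough sketch of the shape such a utility could take, using only plain Java 
so it runs on its own. Everything here is hypothetical: a String user name 
stands in for the UGI, a Map from service URI to token string stands in for 
Credentials, and TokenSource is an invented hook for whatever client (HDFS, 
HBase, HCatalog) actually issues the token. It is not an existing Hadoop API.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TokenFetcherSketch {
    /** Hypothetical per-service hook that knows how to obtain a token. */
    interface TokenSource {
        boolean handles(URI service);
        String fetchToken(String user, URI service);
    }

    /** Returns service URI -> token, a stand-in for a populated Credentials. */
    static Map<URI, String> fetchAll(String user, List<URI> services,
                                     List<TokenSource> sources) {
        Map<URI, String> credentials = new LinkedHashMap<>();
        for (URI service : services) {
            if (credentials.containsKey(service)) {
                continue; // one token per service is enough
            }
            TokenSource match = null;
            for (TokenSource source : sources) {
                if (source.handles(service)) {
                    match = source;
                    break;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException("No token source for " + service);
            }
            credentials.put(service, match.fetchToken(user, service));
        }
        return credentials;
    }

    /** Fake source keyed on URI scheme; a real one would call the service. */
    static TokenSource forScheme(final String scheme) {
        return new TokenSource() {
            public boolean handles(URI service) {
                return scheme.equals(service.getScheme());
            }
            public String fetchToken(String user, URI service) {
                return scheme + "-token:" + user + "@" + service.getAuthority();
            }
        };
    }

    public static void main(String[] args) {
        // Placeholder service URIs and user name.
        List<URI> services = Arrays.asList(
            URI.create("hdfs://nn1.example.com:8020"),
            URI.create("hbase://zk1.example.com:2181"));
        List<TokenSource> sources = Arrays.asList(forScheme("hdfs"), forScheme("hbase"));
        System.out.println(fetchAll("alice", services, sources));
    }
}
```

The point of the shape is that the caller only supplies URIs, so token 
fetching is decoupled from split computation and from any particular IF/OF.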

The IF/OF/Job would have to be able to extract the required URIs for the job.

Also, this mechanism could be used to obtain ALL tokens the AM needs.


> Add an interface to Input/Ouput Formats to obtain delegation tokens
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5663
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Michael Weng
>         Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, 
> MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, 
> MAPREDUCE-5663.patch.txt3
>
>
> Currently, delegation tokens are obtained as part of the getSplits / 
> checkOutputSpecs calls to the InputFormat / OutputFormat respectively.
> This works as long as the splits are generated on a node with kerberos 
> credentials. For split generation elsewhere (AM for example), an explicit 
> interface is required.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
