[
https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872542#comment-13872542
]
Alejandro Abdelnur commented on MAPREDUCE-5663:
-----------------------------------------------
bq. ... I’m not too sure about - mainly from the perspective of services not
handling getToken requests correctly if security is disabled
We are moving away from this; in Yarn we always use tokens, regardless of the
security configuration. Oozie needs the tokens to be there in order to work
correctly.
bq. ... The JobClient currently doesn't do this, at least for HDFS.
Actually, it does do this if you set the {{MRJobConfig.JOB_NAMENODES}}
property. This happens in the {{JobSubmitter#populateTokenCache()}} method,
which is called by {{JobSubmitter#submitJobInternal()}}, which in turn is
called by {{JobSubmitter#submit()}}. All of this is in the main execution path,
so it always happens on submit. It is independent of split computation.
bq. ... For HBase / HCatalog sources which are outside of the IF/OF for a MR
job - I don't think we have the capability for fetching tokens, and rely on the
user providing them up front.
Actually, we are fetching them upfront only because this was needed for MR
jobs, but MR shouldn’t be a special case. Oozie has the concept of
{{CredentialsProvider}} for this very same reason. I think with this JIRA
we can fix this for the general case.
bq. ... Would this utility class know how to handle all kinds of URIs ?
Yes, based on registered handlers for different schemes, more on this follows.
My thinking on how to address this is to use the same pattern we use today
for loading/registering {{FileSystem}}, {{CompressionCodec}},
{{TokenRenewers}}, {{SecurityInfo}} implementations: use the JDK’s
{{ServiceLoader}} mechanism to load all available implementations of the
following interface:
{code}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;

/**
 * Implementations must be thread-safe.
 */
public interface CredentialsProvider {

  /**
   * Reports the scheme being supported by this provider.
   */
  public String getScheme();

  /**
   * Obtains delegation tokens for the provided URIs.
   *
   * @param conf configuration used to initialize the components that connect
   *        to the specified URIs.
   * @param uris URIs of services to obtain delegation tokens from.
   * @param targetCredentials credentials to add the fetched delegation tokens to.
   */
  public void obtainCredentials(Configuration conf, URI[] uris,
      Credentials targetCredentials) throws IOException;
}
{code}
Then we would have a {{CredentialsProvider}} class that would use a
{{ServiceLoader}} to load all credentials providers available in the classpath.
The nice thing about the ServiceLoader mechanism is that you drop in a JAR file
with a service implementation and you don’t have to configure anything; it just
works, provided the JAR has the META-INF/services/... file for it. This loading
would be done in a static initialization block of the class.
The {{CredentialsProvider}} class would have a static method
{{fetchCredentials(Configuration, URI[], Credentials)}} which groups the
URIs by scheme and then invokes the corresponding {{CredentialsProvider}} impl
for each scheme.
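As a rough illustration of the load-and-dispatch idea, here is a minimal, self-contained sketch. It is not the proposed patch: to run without Hadoop on the classpath it replaces {{Configuration}} and {{Credentials}} with plain maps, and the names {{SchemeCredentialsProvider}}, {{CredentialsDispatcher}} and {{register()}} are hypothetical, introduced only for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the proposed interface. The real one
// would take Hadoop's Configuration and Credentials instead of plain maps.
interface SchemeCredentialsProvider {
  String getScheme();

  void obtainCredentials(Map<String, String> conf, URI[] uris,
      Map<String, String> targetCredentials) throws IOException;
}

// Hypothetical dispatcher: loads providers once, then routes URIs by scheme.
final class CredentialsDispatcher {
  private static final Map<String, SchemeCredentialsProvider> PROVIDERS =
      new ConcurrentHashMap<>();

  static {
    // Picks up every implementation named in a META-INF/services file for
    // the interface on the classpath; yields nothing if no such file exists.
    for (SchemeCredentialsProvider p
        : ServiceLoader.load(SchemeCredentialsProvider.class)) {
      PROVIDERS.put(p.getScheme(), p);
    }
  }

  private CredentialsDispatcher() {
  }

  // Programmatic registration, added here only so the sketch can be
  // exercised without packaging a service JAR.
  static void register(SchemeCredentialsProvider provider) {
    PROVIDERS.put(provider.getScheme(), provider);
  }

  static void fetchCredentials(Map<String, String> conf, URI[] uris,
      Map<String, String> targetCredentials) throws IOException {
    // Group the URIs by scheme, preserving first-seen order.
    Map<String, List<URI>> byScheme = new LinkedHashMap<>();
    for (URI uri : uris) {
      String scheme = uri.getScheme();
      if (scheme == null) {
        throw new IOException("URI has no scheme: " + uri);
      }
      byScheme.computeIfAbsent(scheme, s -> new ArrayList<>()).add(uri);
    }
    // Hand each group to the provider registered for its scheme.
    for (Map.Entry<String, List<URI>> entry : byScheme.entrySet()) {
      SchemeCredentialsProvider provider = PROVIDERS.get(entry.getKey());
      if (provider == null) {
        throw new IOException(
            "No credentials provider for scheme: " + entry.getKey());
      }
      provider.obtainCredentials(conf,
          entry.getValue().toArray(new URI[0]), targetCredentials);
    }
  }
}
```

Failing on an unknown scheme is one possible choice in this sketch; skipping such URIs with a warning would be an equally reasonable policy.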
Then the different Yarn applications define a property in the conf to indicate
the URIs of the services to get tokens from, and their client submission code
fetches them (like the {{JobSubmitter}} does with
{{MRJobConfig.JOB_NAMENODES}}, but in a general way). Frameworks may choose to
be smarter (in the case of MR, get the URIs from the splits and the output dir
and get the tokens automatically).
> Add an interface to Input/Ouput Formats to obtain delegation tokens
> -------------------------------------------------------------------
>
> Key: MAPREDUCE-5663
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Michael Weng
> Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt,
> MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2,
> MAPREDUCE-5663.patch.txt3
>
>
> Currently, delegation tokens are obtained as part of the getSplits /
> checkOutputSpecs calls to the InputFormat / OutputFormat respectively.
> This works as long as the splits are generated on a node with kerberos
> credentials. For split generation elsewhere (AM for example), an explicit
> interface is required.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)