[ https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969374#comment-15969374 ]
Michael Gummelt edited comment on MAPREDUCE-6876 at 4/14/17 6:48 PM:
---------------------------------------------------------------------
bq. The input format must obtain the necessary tokens for the tasks to be able
to access the input splits, and this is how FileInputFormat accomplishes that.
But the {{FileInputFormat}} is just fetching split information. It doesn't
create tasks, so it shouldn't need to fetch delegation tokens. That should be
the responsibility of the job-submission code.
Client code that only creates a {{FileInputFormat}} in order to fetch split
information, as we do in Spark, shouldn't need to fetch delegation tokens.
I'm not saying that delegation tokens aren't eventually needed for MapReduce
jobs; it's just that this seems like the wrong place to fetch them.
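A minimal sketch of the separation argued for above, not actual Hadoop code.
All names here ({{SplitListingSketch}}, {{TokenFetcher}}, {{submitJob}}) are
hypothetical stand-ins, and the directory listing is a placeholder. The point
is only that listing inputs never touches tokens, while job submission does:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitListingSketch {

    /** Hypothetical stand-in for a delegation-token fetcher; not a Hadoop API. */
    interface TokenFetcher {
        void obtainTokens(List<String> paths);
    }

    /** Counts calls so the example can show when tokens are fetched. */
    static class CountingTokenFetcher implements TokenFetcher {
        int calls = 0;

        @Override
        public void obtainTokens(List<String> paths) {
            calls++;
        }
    }

    /** Lists inputs only; it takes no TokenFetcher, so it cannot fetch tokens. */
    static List<String> listStatus(List<String> inputDirs) {
        List<String> files = new ArrayList<>();
        for (String dir : inputDirs) {
            // Placeholder for a real directory listing.
            files.add(dir + "/part-00000");
        }
        return files;
    }

    /** Job submission hands work to other processes, so tokens are fetched here. */
    static List<String> submitJob(List<String> inputDirs, TokenFetcher fetcher) {
        fetcher.obtainTokens(inputDirs);
        return listStatus(inputDirs);
    }

    public static void main(String[] args) {
        CountingTokenFetcher fetcher = new CountingTokenFetcher();
        List<String> dirs = Arrays.asList("/data/a", "/data/b");

        // A caller that only wants split information triggers no token fetch.
        List<String> files = listStatus(dirs);
        System.out.println(files.size() + " files, token fetches: " + fetcher.calls);

        // Only the submission path fetches tokens.
        submitJob(dirs, fetcher);
        System.out.println("after submit, token fetches: " + fetcher.calls);
    }
}
```

With this shape, a caller in the position of Spark would only ever call the
listing method and would not depend on any MapReduce-specific configuration
that the token-fetching path assumes.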
> FileInputFormat.listStatus should not fetch delegation tokens
> -------------------------------------------------------------
>
> Key: MAPREDUCE-6876
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6876
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Michael Gummelt
>
> {{FileInputFormat.listStatus}} fetches delegation tokens:
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213
> AFAICT, this is unnecessary. {{listStatus}} doesn't delegate those tokens to
> another process. This is causing issues described in the attached Spark
> Kerberos ticket, because {{TokenCache.obtainTokensForNameNodes}}, which is
> used to fetch the delegation tokens, assumes that certain MapReduce
> configuration variables are set, which isn't true in the Spark calling code.
> This is a separate problem, but nonetheless it wouldn't have arisen if
> {{listStatus}} weren't fetching delegation tokens.