[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969374#comment-15969374
 ] 

Michael Gummelt edited comment on MAPREDUCE-6876 at 4/14/17 6:42 PM:
---------------------------------------------------------------------

bq. The input format must obtain the necessary tokens for the tasks to be able 
to access the input splits, and this is how FileInputFormat accomplishes that.

But {{FileInputFormat}} just returns split information.  It doesn't create 
tasks, so it shouldn't need to fetch delegation tokens.  That should be the 
responsibility of the job-submitting code. 

Client code that merely creates a {{FileInputFormat}} in order to fetch split 
information, as we do in Spark, shouldn't need to fetch delegation tokens.
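
To make the proposed split of responsibilities concrete, here is a rough sketch. It is not Hadoop code: {{TokenService}}, {{SplitLister}}, and {{JobSubmitter}} are hypothetical stand-ins I made up for the NameNode token endpoint, the input format, and the job-submitting code. The only point is that listing splits touches no tokens, and the one component that launches tasks is the one that fetches them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a service that issues delegation tokens.
// Not a Hadoop API; it only counts how many tokens were requested.
final class TokenService {
    private int tokensIssued = 0;

    String issueToken(String path) {
        tokensIssued++;
        return "token-for-" + path;
    }

    int tokensIssued() {
        return tokensIssued;
    }
}

// Hypothetical input format: computes split descriptions and nothing else.
// It holds no reference to TokenService, so it cannot fetch tokens.
final class SplitLister {
    List<String> listSplits(List<String> paths, int splitsPerPath) {
        List<String> splits = new ArrayList<>();
        for (String path : paths) {
            for (int i = 0; i < splitsPerPath; i++) {
                splits.add(path + "#" + i);
            }
        }
        return splits;
    }
}

// Hypothetical job submitter: the only component that hands work to
// other processes, so the only one that needs delegation tokens.
final class JobSubmitter {
    private final TokenService tokens;
    private final SplitLister lister;

    JobSubmitter(TokenService tokens, SplitLister lister) {
        this.tokens = tokens;
        this.lister = lister;
    }

    // Fetches one token per input path, then lists the splits.
    // Returns the tokens that would be shipped along with the tasks.
    List<String> submit(List<String> paths, int splitsPerPath) {
        List<String> credentials = new ArrayList<>();
        for (String path : paths) {
            credentials.add(tokens.issueToken(path));
        }
        List<String> splits = lister.listSplits(paths, splitsPerPath);
        // A real submitter would launch one task per entry in `splits` here.
        return credentials;
    }
}
```

With this shape, a caller like Spark that only wants splits would use {{SplitLister}} directly and never hit the token path, while a MapReduce-style submitter would go through {{JobSubmitter}}.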


> FileInputFormat.listStatus should not fetch delegation tokens
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-6876
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6876
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Michael Gummelt
>
> {{FileInputFormat.listStatus}} fetches delegation tokens: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213
> AFAICT, this is unnecessary.  {{listStatus}} doesn't delegate those tokens to 
> another process.  This is causing issues described in the attached Spark 
> Kerberos ticket, because {{TokenCache.obtainTokensForNameNodes}}, which is 
> used to fetch the delegation tokens, assumes that certain MapReduce 
> configuration variables are set, which isn't true in the Spark calling code.  
> This is a separate problem, but nonetheless it wouldn't have arisen if 
> {{listStatus}} weren't fetching delegation tokens.



