[ https://issues.apache.org/jira/browse/SPARK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847923#comment-16847923 ]

Dhruve Ashar commented on SPARK-24149:
--------------------------------------

 

Spark should be agnostic about which namenodes it needs delegation tokens for. 
Either HDFS figures this out, or the user specifies it explicitly.

You should get tokens only for the namenodes you are actually going to access. 
If the namespaces are related, viewfs handles this for you. If they are 
unrelated, the user has to list them explicitly. Unrelated namespaces may or 
may not use HDFS federation, and a federation can also contain namespaces that 
a given job never accesses at all.
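For illustration, a minimal sketch of the related-namespaces case: a viewfs 
mount table that fronts two federated namespaces behind a single URI, so 
clients (and token handling driven by the defaultFS) see one filesystem. The 
hostnames, mount points, and the mount-table name "clusterX" are hypothetical, 
and in a real deployment these properties would live in core-site.xml rather 
than be set programmatically.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Hypothetical mount table "clusterX" exposing two federated namespaces
// under one viewfs:// URI (normally configured in core-site.xml).
val conf = new Configuration()
conf.set("fs.defaultFS", "viewfs://clusterX")
conf.set("fs.viewfs.mounttable.clusterX.link./data",
  "hdfs://nn1.example.com:8020/data")
conf.set("fs.viewfs.mounttable.clusterX.link./logs",
  "hdfs://nn2.example.com:8020/logs")

// Paths under /data and /logs are routed to the backing namenodes, while
// the client only ever deals with the single viewfs filesystem.
val fs = FileSystem.get(conf)
println(fs.getUri) // viewfs://clusterX
{code}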


> Automatic namespaces discovery in HDFS federation
> -------------------------------------------------
>
>                 Key: SPARK-24149
>                 URL: https://issues.apache.org/jira/browse/SPARK-24149
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.4.0
>            Reporter: Marco Gaido
>            Assignee: Marco Gaido
>            Priority: Minor
>             Fix For: 2.4.0
>
>
> Hadoop 3 introduced HDFS federation.
> Spark fails to write to different namespaces when Hadoop federation is turned 
> on and the cluster is secure. This happens because Spark looks for the 
> delegation token only for the configured defaultFS and not for all the 
> available namespaces. A workaround is to use the property 
> {{spark.yarn.access.hadoopFileSystems}}.
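For completeness, a minimal sketch of that workaround with hypothetical 
namenode URIs: the filesystems listed in 
{{spark.yarn.access.hadoopFileSystems}} get delegation tokens fetched at 
application submission in addition to the token for the defaultFS (later Spark 
versions rename the property to {{spark.kerberos.access.hadoopFileSystems}}). 
The same value can equally be passed via {{--conf}} on spark-submit.

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical namenode URIs for two federated namespaces; delegation tokens
// for the filesystems listed here are obtained at submission time in addition
// to the token for the configured defaultFS.
val spark = SparkSession.builder()
  .appName("federated-write")
  .config("spark.yarn.access.hadoopFileSystems",
    "hdfs://nn1.example.com:8020,hdfs://nn2.example.com:8020")
  .getOrCreate()

// With the extra tokens in place, writes to the non-default namespace also
// succeed on a Kerberized, federated cluster.
spark.range(10).write.parquet("hdfs://nn2.example.com:8020/tmp/out")
{code}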


