[ 
https://issues.apache.org/jira/browse/SPARK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850096#comment-16850096
 ] 

Dhruve Ashar commented on SPARK-24149:
--------------------------------------

Thanks for the missing context.

The current behavior doesn't seem to address the use case mentioned. We could 
have a table whose partitions live in a different namespace on a different 
cluster (NN). In that case the user still has to know the underlying 
namespaces and their corresponding NNs to get the tokens.
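
To make that burden concrete, here is a minimal Python sketch of what a user 
effectively has to do by hand: collect the distinct filesystem URIs behind a 
table's partition locations and pass them via 
{{spark.yarn.access.hadoopFileSystems}}. The host names, paths and the helper 
name are made up for illustration; only the property name comes from the issue.

```python
from urllib.parse import urlsplit


def filesystems_for(locations):
    """Return the distinct scheme://authority roots, in first-seen order."""
    roots = []
    for loc in locations:
        parts = urlsplit(loc)
        root = f"{parts.scheme}://{parts.netloc}"
        if root not in roots:
            roots.append(root)
    return roots


# Hypothetical partition locations spread over two clusters.
partition_locations = [
    "hdfs://nn1.cluster-a.example.com:8020/warehouse/t/dt=2019-05-01",
    "hdfs://nn1.cluster-a.example.com:8020/warehouse/t/dt=2019-05-02",
    "hdfs://nn7.cluster-b.example.com:8020/warehouse/t/dt=2019-05-03",
]

# Comma-separated list of filesystems to request delegation tokens for.
access_value = ",".join(filesystems_for(partition_locations))
print(f"--conf spark.yarn.access.hadoopFileSystems={access_value}")
```

The point is that this list has to be kept in sync manually whenever a 
partition lands on a new NN.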

The current logic seems to address only a specific use case where a table is 
stored across multiple namespaces (configured without viewfs) that luckily 
happen to be on the same cluster (using HDFS federation). What if they are on 
different clusters?

I would expect that if data for a given table is stored across different 
namespaces, those namespaces are related and addressed using viewfs. This has 
the advantage of getting the tokens for all the NNs the data resides on, 
whether the partitions happen to be on the same cluster or on different ones. 
It is also much better from a user transparency standpoint, since it covers 
all the use cases.
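
As a rough illustration of why viewfs helps: the mount table already names 
every backing namespace, so the set of NNs that need tokens can be derived 
from configuration instead of user knowledge. The Python sketch below only 
models that idea. The mount table name, mount points and hosts are 
hypothetical, and this is not a description of how Hadoop or Spark collect 
tokens internally.

```python
from urllib.parse import urlsplit

# Hypothetical core-site.xml entries, following the ViewFs naming scheme
# fs.viewfs.mounttable.<table name>.link.<mount point> = <target URI>.
mount_table_conf = {
    "fs.viewfs.mounttable.warehouse.link./current":
        "hdfs://nn1.cluster-a.example.com:8020/warehouse/current",
    "fs.viewfs.mounttable.warehouse.link./archive":
        "hdfs://nn7.cluster-b.example.com:8020/warehouse/archive",
    "fs.viewfs.mounttable.warehouse.link./staging":
        "hdfs://nn1.cluster-a.example.com:8020/warehouse/staging",
}


def backing_namenodes(conf, table):
    """Return the sorted, distinct target filesystems of one mount table."""
    prefix = f"fs.viewfs.mounttable.{table}.link."
    targets = set()
    for key, target in conf.items():
        if key.startswith(prefix):
            parts = urlsplit(target)
            targets.add(f"{parts.scheme}://{parts.netloc}")
    return sorted(targets)


print(backing_namenodes(mount_table_conf, "warehouse"))
```

With this setup the user only addresses viewfs://warehouse/..., regardless of 
which cluster a partition ends up on.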

> Automatic namespaces discovery in HDFS federation
> -------------------------------------------------
>
>                 Key: SPARK-24149
>                 URL: https://issues.apache.org/jira/browse/SPARK-24149
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.4.0
>            Reporter: Marco Gaido
>            Assignee: Marco Gaido
>            Priority: Minor
>             Fix For: 2.4.0
>
>
> Hadoop 3 introduced HDFS federation.
> Spark fails to write on different namespaces when Hadoop federation is turned 
> on and the cluster is secure. This happens because Spark looks for the 
> delegation token only for the defaultFS configured and not for all the 
> available namespaces. A workaround is the usage of the property 
> {{spark.yarn.access.hadoopFileSystems}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
