paul mackles created SPARK-22528:
------------------------------------

             Summary: History service and non-HDFS filesystems
                 Key: SPARK-22528
                 URL: https://issues.apache.org/jira/browse/SPARK-22528
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: paul mackles
            Priority: Minor


We are using Azure Data Lake (ADL) to store our event logs. This worked fine in 
2.1.x but in 2.2.0, the event logs are no longer visible to the history server. 
I tracked it down to the call to:

{code}
SparkHadoopUtil.get.checkAccessPermission()
{code}

which was added to "FsHistoryProvider" in 2.2.0.

I was able to work around it by:
* setting the files to world-readable
* setting HADOOP_PROXY to the Azure objectId of the service principal that owns 
the file

Neither of those workarounds is particularly desirable in our environment. That 
said, I am not sure how this should be addressed:
* Is this an issue with the Azure/Hadoop bindings not setting up the user 
context correctly so that the "checkAccessPermission()" call succeeds w/out 
having to use the username under which the process is running?
* Is this an issue with "checkAccessPermission()" not really accounting for all 
of the possible FileSystem implementations? If so, I would imagine that there 
are similar issues with using S3.

In spite of this check, I know the files are accessible through the underlying 
FileSystem object, so it feels like the latter, but I don't know that the 
FileSystem object alone could be used to implement this check.
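
To illustrate what I think is going on, here is a minimal, self-contained sketch of an owner/group/other permission-bit check. It is not Spark's actual code: the class, the "FileMeta" type, the "canRead" method and the sample ids are all hypothetical, and it assumes the check compares the process user name against the owner string reported by the FileSystem. Under that assumption, a file owned by an Azure objectId is never matched as "owner" by a process running as an ordinary OS user, which would be consistent with both workarounds above.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical model of a permission-bit check; NOT Spark's or Hadoop's real code.
public class PermissionCheckSketch {

    // Stand-in for the metadata a FileSystem reports for one file.
    static final class FileMeta {
        final String owner;  // on ADL this may be an objectId rather than a user name
        final String group;
        final int mode;      // POSIX-style bits, e.g. 0640

        FileMeta(String owner, String group, int mode) {
            this.owner = owner;
            this.group = group;
            this.mode = mode;
        }
    }

    static final int READ = 4;

    // Pick the owner, group, or other bits by string comparison, then test READ.
    static boolean canRead(String processUser, Set<String> processGroups, FileMeta f) {
        int bits;
        if (processUser.equals(f.owner)) {
            bits = (f.mode >> 6) & 7;   // owner bits
        } else if (processGroups.contains(f.group)) {
            bits = (f.mode >> 3) & 7;   // group bits
        } else {
            bits = f.mode & 7;          // other bits
        }
        return (bits & READ) != 0;
    }

    public static void main(String[] args) {
        Set<String> groups = Collections.singleton("spark");
        String objectId = "hypothetical-object-id";

        // Owner is an objectId, process runs as "spark": falls through to "other".
        FileMeta restricted = new FileMeta(objectId, "hypothetical-group-id", 0640);
        System.out.println(canRead("spark", groups, restricted));   // false

        // Workaround 1: make the file world-readable.
        FileMeta worldReadable = new FileMeta(objectId, "hypothetical-group-id", 0644);
        System.out.println(canRead("spark", groups, worldReadable)); // true

        // Workaround 2: make the process user name equal the objectId.
        System.out.println(canRead(objectId, groups, restricted));   // true
    }
}
```

A check of this shape only looks at reported metadata, so it can say "no" even when the store's own authorization would allow the read, which is the mismatch I am seeing.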




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
