paul mackles created SPARK-22528:
------------------------------------

             Summary: History service and non-HDFS filesystems
                 Key: SPARK-22528
                 URL: https://issues.apache.org/jira/browse/SPARK-22528
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: paul mackles
            Priority: Minor

We are using Azure Data Lake (ADL) to store our event logs. This worked fine in 2.1.x, but in 2.2.0 the event logs are no longer visible to the history server. I tracked it down to the call to:

{code}
SparkHadoopUtil.get.checkAccessPermission()
{code}

which was added to "FSHistoryProvider" in 2.2.0.

I was able to work around it by:
* setting the files to world readable
* setting HADOOP_PROXY to the Azure objectId of the service principal that owns the file

Neither of those workarounds is particularly desirable in our environment. That said, I am not sure how this should be addressed:
* Is this an issue with the Azure/Hadoop bindings not setting up the user context correctly so that the "checkAccessPermission()" call succeeds without having to use the username under which the process is running?
* Is this an issue with "checkAccessPermission()" not really accounting for all of the possible FileSystem implementations? If so, I would imagine that there are similar issues with using S3.

In spite of this check, I know the files are accessible through the underlying FileSystem object, so it feels like the latter, but I don't think that the FileSystem object alone could be used to implement this check.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
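
For illustration only, here is a minimal, dependency-free sketch of why a check based on owner, group and permission bits can reject a file that the FileSystem itself can read. It is a simplified model, not the Spark or Hadoop code: `FileMeta`, `canRead`, the user name `spark`, the group name and the all-zero objectId are hypothetical, and it assumes ADL reports the service principal's objectId as the file owner, as the second workaround suggests.

```scala
// Simplified model of a POSIX-style read check. Not Spark's or Hadoop's actual code.
object PermissionCheckSketch {

  // Hypothetical stand-in for the owner, group and read bits a FileSystem reports.
  final case class FileMeta(
      owner: String,
      group: String,
      ownerRead: Boolean,
      groupRead: Boolean,
      otherRead: Boolean)

  // Decide readability purely from reported metadata and the local identity.
  def canRead(user: String, groups: Set[String], f: FileMeta): Boolean =
    if (user == f.owner) f.ownerRead
    else if (groups.contains(f.group)) f.groupRead
    else f.otherRead

  def main(args: Array[String]): Unit = {
    // Made-up objectId; assumed to be what the store reports as the owner.
    val objectId = "00000000-0000-0000-0000-000000000000"
    val log = FileMeta(objectId, "hypothetical-group",
      ownerRead = true, groupRead = true, otherRead = false)

    // Process user name does not match the reported owner or group: rejected.
    println(canRead("spark", Set("spark"), log)) // false

    // Workaround 1: file made world readable.
    println(canRead("spark", Set("spark"), log.copy(otherRead = true))) // true

    // Workaround 2: the process identity is set to the owner's objectId.
    println(canRead(objectId, Set.empty[String], log)) // true
  }
}
```

The first call models the reported failure; the other two correspond to the workarounds listed above. In none of the three cases does the model consult the storage service, which is why its answer can disagree with what the FileSystem would actually allow.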