[ https://issues.apache.org/jira/browse/HUDI-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
leesf closed HUDI-539.
----------------------

> RO Path filter does not pick up hadoop configs from the spark context
> ---------------------------------------------------------------------
>
> Key: HUDI-539
> URL: https://issues.apache.org/jira/browse/HUDI-539
> Project: Apache Hudi
> Issue Type: Bug
> Components: Common Core
> Affects Versions: 0.5.1
> Environment: Spark version : 2.4.4
> Hadoop version : 2.7.3
> Databricks Runtime: 6.1
> Reporter: Sam Somuah
> Assignee: Vinoth Chandar
> Priority: Major
> Labels: bug-bash-0.6.0, pull-request-available
> Fix For: 0.6.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Hi,
> I'm trying to use Hudi to write to one of the Azure storage container file
> systems, ADLS Gen 2 (abfs://). ABFS:// is one of the whitelisted file
> schemes. The issue I'm facing is that {{HoodieROTablePathFilter}} tries to
> get the file system for a path by passing in a blank Hadoop configuration.
> This manifests as {{java.io.IOException: No FileSystem for scheme: abfss}}
> because the blank configuration has none of the settings from the
> environment.
>
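For readers following along, the failure mode described above can be sketched without Hadoop on the classpath. Hadoop resolves a file system implementation for a URI scheme partly through the `fs.<scheme>.impl` configuration key, so a freshly constructed configuration that lacks that key cannot resolve the scheme. The class `SchemeLookupSketch`, the method `resolveImpl`, and the use of a plain `Map` in place of a Hadoop configuration are all hypothetical stand-ins for illustration; this is not the Hudi or Hadoop code, and the implementation class name is only an example value.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the scheme lookup; not Hadoop or Hudi code.
class SchemeLookupSketch {

    // Looks up "fs.<scheme>.impl" in the given settings and fails with the
    // same message as the reported stack trace when the key is absent.
    static String resolveImpl(Map<String, String> conf, String scheme) throws IOException {
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) throws IOException {
        // Settings as the environment might supply them (example value only).
        Map<String, String> environmentConf = new HashMap<>();
        environmentConf.put("fs.abfss.impl",
                "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem");

        // With the environment's settings, the scheme resolves.
        System.out.println(resolveImpl(environmentConf, "abfss"));

        // With a blank configuration, the lookup fails as in the report.
        try {
            resolveImpl(new HashMap<>(), "abfss");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch only shows why a blank configuration cannot see a scheme that the environment has registered; how the path filter should obtain the environment's configuration is a separate question that the sketch does not address.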
> The problematic line is
> [https://github.com/apache/incubator-hudi/blob/2bb0c21a3dd29687e49d362ed34f050380ff47ae/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L96]
>
> {code:java}
> Stacktrace
> java.io.IOException: No FileSystem for scheme: abfss
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
> at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:96)
> at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:349){code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)