[ https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495731#comment-16495731 ]
ASF GitHub Bot commented on DRILL-5365:
---------------------------------------

ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191933248

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
##########
@@ -74,6 +74,7 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String
           fsConf.set(s, config.config.get(s));
         }
       }
+      fsConf.set("fs.default.name", config.connection);

Review comment:
Okay, I think I have a better handle on this now. The original issue was that Drill's Hive storage plugin had its fs.default.name option set to file:// (the local filesystem). Somehow, when a Hive table was dropped and then recreated with a CTAS statement in Drill, the CTAS statement picked up the fs.default.name setting from the Hive storage plugin and passed it on to DrillFileSystem. Apparently, if both **fs.default.name** and **fs.defaultFS** are present with different values, the value of **fs.default.name** wins even though it is deprecated. So the CTAS statement would end up creating the table on one Drill node's local filesystem, where the other nodes cannot see it.

I believe the crux of this PR is to force "fs.default.name" to have the correct value in the event that a different value is defined in the Hive storage plugin. That said, several questions remain:

1. How the heck does a property in the HiveStoragePlugin make its way into the FileSystem configuration? I spent a good amount of time looking at the code and, for the life of me, I can't figure that out.
2. As a follow-up to (1), do we actually want that behavior? We can force fs.default.name to have the right value, but what about other properties we might suck in from a HiveStoragePlugin configuration?
3. If we don't want this behavior, what would be the real fix?
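The precedence problem described in the review, and why the one-line change in the diff fixes it, can be modeled with a small dependency-free sketch. It uses a plain HashMap rather than Hadoop's Configuration class, and the helpers `buildFsConf` and `resolveDefaultFs`, as well as the URI `hdfs://namenode:8020/`, are hypothetical. Two assumptions are baked in: the plugin first sets fs.defaultFS from its connection before copying user properties, and the "deprecated key wins" rule in `resolveDefaultFs` encodes the reviewer's observation rather than verified Hadoop internals.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the DRILL-5365 precedence issue.
 * Does NOT use Hadoop's Configuration; a HashMap stands in for it.
 */
public class DefaultFsSketch {
    static final String DEPRECATED_KEY = "fs.default.name";
    static final String CURRENT_KEY = "fs.defaultFS";

    /** Assumption from the review comment: the deprecated key wins when both are set. */
    static String resolveDefaultFs(Map<String, String> conf) {
        String legacy = conf.get(DEPRECATED_KEY);
        return legacy != null ? legacy : conf.get(CURRENT_KEY);
    }

    /**
     * Hypothetical stand-in for the FileSystemPlugin constructor: set the default
     * filesystem from the connection (assumed), copy every property it was handed,
     * then optionally force fs.default.name to the connection (the diff's change).
     */
    static Map<String, String> buildFsConf(Map<String, String> props, String connection,
                                           boolean forceDefaultFs) {
        Map<String, String> fsConf = new HashMap<>();
        fsConf.put(CURRENT_KEY, connection);
        for (Map.Entry<String, String> e : props.entrySet()) {
            fsConf.put(e.getKey(), e.getValue()); // stray keys are copied verbatim
        }
        if (forceDefaultFs) {
            fsConf.put(DEPRECATED_KEY, connection); // overrides any stray value
        }
        return fsConf;
    }

    public static void main(String[] args) {
        Map<String, String> stray = new HashMap<>();
        stray.put(DEPRECATED_KEY, "file:///"); // value leaked from the Hive storage plugin
        String connection = "hdfs://namenode:8020/"; // hypothetical connection URI

        String before = resolveDefaultFs(buildFsConf(stray, connection, false));
        String after = resolveDefaultFs(buildFsConf(stray, connection, true));
        System.out.println("without fix: " + before); // file:///
        System.out.println("with fix:    " + after);  // hdfs://namenode:8020/
    }
}
```

Forcing the deprecated key, rather than only fs.defaultFS, matters in this model because the deprecated key is the one that wins at resolution time; it does not address question 2, since every other stray property is still copied through.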
In the face of all this ambiguity, I think we should move forward with a minimal PR that forces fs.default.name to be correct now. We can open a follow-up JIRA to fix the underlying problem of sucking in stray configs down the road, if someone complains about it.

> FileNotFoundException when reading a parquet file
> -------------------------------------------------
>
>                 Key: DRILL-5365
>                 URL: https://issues.apache.org/jira/browse/DRILL-5365
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive
>    Affects Versions: 1.10.0
>            Reporter: Chun Chang
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue:
> 1) use a cluster with two or more nodes;
> 2) enable impersonation;
> 3) set "fs.default.name": "file:///" in the Hive storage plugin;
> 4) restart the drillbits;
> 5) as a regular user, on node A, drop the table/file;
> 6) run a CTAS from a large enough Hive table as the source to recreate the table/file;
> 7) querying the table from node A should work;
> 8) querying it from node B as the same user should reproduce the issue.