[ 
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495731#comment-16495731
 ] 

ASF GitHub Bot commented on DRILL-5365:
---------------------------------------

ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem 
setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191933248
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
 ##########
 @@ -74,6 +74,7 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String
           fsConf.set(s, config.config.get(s));
         }
       }
+      fsConf.set("fs.default.name", config.connection);
 
 Review comment:
   Okay, I think I have a better handle on this now. The original issue was that 
Drill's Hive storage plugin had the configuration option fs.default.name = 
file:///. Somehow, when a Hive table was dropped and then recreated with a CTAS 
statement in Drill, the CTAS statement picked up the fs.default.name 
setting from the Hive storage plugin and passed it on to 
DrillFileSystem. And apparently, if both **fs.default.name** and 
**fs.defaultFS** are present with different values, the value for 
**fs.default.name** wins even though it is deprecated. So the CTAS statement 
would end up creating the table on a Drill node's local filesystem.
   
   I believe the crux of this PR is to force "fs.default.name" to have the 
correct value in the event that a different value is defined in the Hive storage 
plugin.
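
A minimal sketch of that idea, using a plain Java map as a stand-in for the Hadoop configuration object. The class `FsConfSketch`, its two methods, and the URI `hdfs://namenode:8020` are hypothetical names for illustration, not Drill or Hadoop APIs. The "deprecated key wins" lookup only models the precedence described above; it is an assumption, not verified Hadoop behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the constructor logic in the diff; not Drill's actual code. */
final class FsConfSketch {

  // Real Hadoop property names; fs.default.name is the deprecated one.
  static final String DEPRECATED_KEY = "fs.default.name";
  static final String CURRENT_KEY = "fs.defaultFS";

  /**
   * Copies externally supplied properties into a fresh config map, then
   * writes the deprecated key last so a stray value cannot survive.
   */
  static Map<String, String> buildFsConf(Map<String, String> props, String connection) {
    Map<String, String> fsConf = new LinkedHashMap<>();
    fsConf.put(CURRENT_KEY, connection);
    if (props != null) {
      fsConf.putAll(props); // may carry a stray fs.default.name
    }
    fsConf.put(DEPRECATED_KEY, connection); // the forced override
    return fsConf;
  }

  /**
   * Assumption modeled on the review comment: when both keys are present,
   * the deprecated key takes precedence.
   */
  static String effectiveDefaultFs(Map<String, String> fsConf) {
    String deprecated = fsConf.get(DEPRECATED_KEY);
    return deprecated != null ? deprecated : fsConf.get(CURRENT_KEY);
  }

  public static void main(String[] args) {
    Map<String, String> stray = new LinkedHashMap<>();
    stray.put(DEPRECATED_KEY, "file:///"); // value leaked from another plugin
    Map<String, String> fsConf = buildFsConf(stray, "hdfs://namenode:8020");
    System.out.println(effectiveDefaultFs(fsConf)); // prints hdfs://namenode:8020
  }
}
```

Without the final `put`, the same lookup would return `file:///`, which matches the symptom of the table landing on a node's local filesystem.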
   
   With that said, there are several questions. 
   
    1. How the heck does a property in the HiveStoragePlugin make its way into 
the FileSystem configuration? I spent a good amount of time looking at the code 
and for the life of me I can't figure that out.
    2. The follow-up to (1) is: do we actually want that behavior? We can force 
fs.default.name to have the right value, but what about other properties we 
might suck in from a HiveStoragePlugin configuration?
    3. If we don't want this behavior, what would be the real fix?
   
   In the face of all this ambiguity, I think we should move forward with a 
minimal PR that forces fs.default.name to be correct now. We can file a 
follow-up Jira that actually fixes the underlying problem of sucking in stray 
configs down the road if someone complains about it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileNotFoundException when reading a parquet file
> -------------------------------------------------
>
>                 Key: DRILL-5365
>                 URL: https://issues.apache.org/jira/browse/DRILL-5365
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive
>    Affects Versions: 1.10.0
>            Reporter: Chun Chang
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue:
>   1) use a cluster of two or more nodes;
>   2) enable impersonation;
>   3) set "fs.default.name": "file:///" in the Hive storage plugin;
>   4) restart the drillbits;
>   5) as a regular user, on node A, drop the table/file;
>   6) run a CTAS from a large enough Hive table as the source to recreate
>      the table/file;
>   7) query the table from node A, which should work;
>   8) query the table from node B as the same user, which should reproduce
>      the issue.



