[jira] [Commented] (DRILL-6609) Investigate Creation of FileSystem Configuration for Hive Parquet Files: FileNotFoundException when reading a parquet file

Sorabh Hamirwasia (JIRA) Mon, 27 Aug 2018 08:56:26 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-6609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593865#comment-16593865
 ]


Sorabh Hamirwasia commented on DRILL-6609:
------------------------------------------

So _fs.default.name_ in the HiveStoragePlugin points to file:///, which is the 
local file system. This means the table is available only on node A. Why, then, 
is step 8 an issue? With this configuration there is no guarantee that the 
query will work: the outcome depends on the node on which the scan minor 
fragment is scheduled to run. If the scan minor fragments are located on node 
A, the query will work; otherwise it won't.

You should either use a distributed file system for the HiveStoragePlugin or 
copy the table from node A to all other nodes.
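
To make the reasoning above concrete, here is a minimal sketch in plain Java. 
It is not Drill or Hive code: the class DefaultFsCheck and its methods are 
hypothetical names used only for illustration, and it assumes that any 
non-"file" scheme (the hdfs://namenode:8020/ URI is also made up) is readable 
from every node.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Drill or Hive.
final class DefaultFsCheck {

    // True when the default file system URI uses the "file" scheme,
    // i.e. paths resolve against the disk of whichever node does the read.
    static boolean isNodeLocal(String defaultFs) {
        String scheme = URI.create(defaultFs).getScheme();
        return "file".equalsIgnoreCase(scheme);
    }

    // Models the argument in the comment: with a node-local file system a
    // scan succeeds only if it runs on the node that holds the table.
    // Assumption: any other file system is visible from every node.
    static boolean scanCanRead(String defaultFs, String nodeWithTable, String scanNode) {
        return !isNodeLocal(defaultFs) || nodeWithTable.equals(scanNode);
    }

    public static void main(String[] args) {
        // Step 7: table on node A, scan fragment scheduled on node A.
        System.out.println(scanCanRead("file:///", "A", "A"));
        // Step 8: table on node A, scan fragment scheduled on node B.
        System.out.println(scanCanRead("file:///", "A", "B"));
        // Same scheduling, but with a (made-up) distributed file system URI.
        System.out.println(scanCanRead("hdfs://namenode:8020/", "A", "B"));
    }
}
```

The only case that fails in this model is the step 8 scenario: a node-local 
file system combined with a scan fragment scheduled away from node A.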

> Investigate Creation of FileSystem Configuration for Hive Parquet Files: 
> FileNotFoundException when reading a parquet file
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6609
>                 URL: https://issues.apache.org/jira/browse/DRILL-6609
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>
> Currently, when reading a parquet file in Hive, we try to speed things up by 
> doing a native parquet scan with HiveDrillNativeParquetRowGroupScan. When 
> retrieving the FileSystem Configuration to use in 
> HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties 
> defined for the HiveStoragePlugin. As a result, a misconfiguration in the 
> HiveStoragePlugin could influence the configuration of our FileSystem.
> Currently it is unclear whether this was the desired behavior. If it is 
> desired, we need to document why it was done. If it is not desired, we need 
> to fix the issue.
> This may be the root cause of the issue discovered by chun.
> To reproduce the issue:
> 1) two or more nodes cluster;
> 2) enable impersonation;
> 3) set "fs.default.name": "file:///" in hive storage plugin;
> 4) restart drillbits;
> 5) as a regular user, on node A, drop the table/file;
> 6) ctas from a large enough hive table as source to recreate the table/file;
> 7) query the table from node A should work;
> 8) query from node B as same user should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
