[jira] [Commented] (DRILL-6609) Investigate Creation of FileSystem Configuration for Hive Parquet Files: FileNotFoundException when reading a parquet file

Timothy Farkas (JIRA) Mon, 27 Aug 2018 22:03:50 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-6609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594523#comment-16594523
 ]


Timothy Farkas commented on DRILL-6609:
---------------------------------------

[~shamirwasia] I still think there is an issue here. In the repro done by Chun, 
the files were stored on a distributed filesystem, so they are never 
accessible through the local filesystem. If the local filesystem were being 
used as the location to read the parquet files, the query should fail in all 
cases. However, the query only fails in the particular circumstance described 
in the summary. If you are interested in more details about the issue, I can 
share more info with you offline.

However, after thinking about it again, I believe 
HiveDrillNativeParquetRowGroupScan.getFsConf should never return the value of 
fs.default.name configured in the HiveStoragePlugin itself. It should only 
return the fs.default.name saved for the table in the HiveMetaStore! I'm not 
sure whether this is the cause of this particular issue, but I'm again 
starting to think that there is something wrong with the logic in 
HiveDrillNativeParquetRowGroupScan.getFsConf.
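
To make the distinction concrete, here is a minimal sketch of the two 
behaviors. It is not the actual Drill code: the class name, method names, and 
the use of plain maps in place of Hadoop Configuration objects are all 
hypothetical, and "hdfs://namenode:8020" is a made-up value. It also assumes 
that the other plugin properties may still be applied; only fs.default.name is 
taken from the table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; plain maps stand in for Hadoop Configuration.
class FsConfSketch {
    static final String FS_DEFAULT_NAME = "fs.default.name";

    // Behavior described in the issue: every storage plugin property is
    // copied over the base configuration, including fs.default.name.
    static Map<String, String> fsConfFromPlugin(Map<String, String> base,
                                                Map<String, String> pluginProps) {
        Map<String, String> conf = new HashMap<>(base);
        conf.putAll(pluginProps);
        return conf;
    }

    // Behavior suggested in the comment: the plugin's fs.default.name is
    // never used; only the value stored for the table (if any) is applied.
    static Map<String, String> fsConfFromTable(Map<String, String> base,
                                               Map<String, String> pluginProps,
                                               Map<String, String> tableProps) {
        Map<String, String> conf = new HashMap<>(base);
        for (Map.Entry<String, String> e : pluginProps.entrySet()) {
            if (!FS_DEFAULT_NAME.equals(e.getKey())) {
                conf.put(e.getKey(), e.getValue());
            }
        }
        String tableFs = tableProps.get(FS_DEFAULT_NAME);
        if (tableFs != null) {
            conf.put(FS_DEFAULT_NAME, tableFs);
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        Map<String, String> plugin = new HashMap<>();
        plugin.put(FS_DEFAULT_NAME, "file:///");             // misconfigured plugin
        Map<String, String> table = new HashMap<>();
        table.put(FS_DEFAULT_NAME, "hdfs://namenode:8020");  // made-up table value

        System.out.println(fsConfFromPlugin(base, plugin).get(FS_DEFAULT_NAME));
        System.out.println(fsConfFromTable(base, plugin, table).get(FS_DEFAULT_NAME));
    }
}
```

With the first method the misconfigured "file:///" from the plugin ends up in 
the resulting configuration; with the second it does not, and the value 
recorded for the table is used instead.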

> Investigate Creation of FileSystem Configuration for Hive Parquet Files: 
> FileNotFoundException when reading a parquet file
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6609
>                 URL: https://issues.apache.org/jira/browse/DRILL-6609
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>
> Currently, when reading a parquet file in Hive, we try to speed things up by 
> doing a native parquet scan with HiveDrillNativeParquetRowGroupScan. When 
> retrieving the FileSystem Configuration to use in 
> HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties 
> defined for the HiveStoragePlugin. This could cause a misconfiguration in the 
> HiveStoragePlugin to influence the configuration of our FileSystem.
> Currently it is unclear whether this was desired behavior. If it is desired, 
> we need to document why it was done. If it is not desired, we need to fix the 
> issue.
> This may be the root cause of the issue discovered by Chun.
> To reproduce the issue:
> 1) Set up a cluster of two or more nodes.
> 2) Enable impersonation.
> 3) Set "fs.default.name": "file:///" in the hive storage plugin.
> 4) Restart the drillbits.
> 5) As a regular user, on node A, drop the table/file.
> 6) Run a CTAS from a large enough hive table as source to recreate the 
>    table/file.
> 7) Query the table from node A; this should work.
> 8) Query the table from node B as the same user; this should reproduce the 
>    issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
