[ https://issues.apache.org/jira/browse/DRILL-6609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594523#comment-16594523 ]
Timothy Farkas commented on DRILL-6609:
---------------------------------------

[~shamirwasia] I still think there is an issue here. In the repro done by Chun, the files were stored on a distributed filesystem, so they are never accessible on the local filesystem. If the local filesystem were being used as the location to read the parquet files, the query should fail in all cases. However, the query only fails in the particular circumstance described in the summary. If you are interested in more details about the issue, I can share more info with you offline.

That said, after thinking about it again, HiveDrillNativeParquetRowGroupScan.getFsConf should never return the value of fs.default.name configured in the HiveStoragePlugin itself; it should only return the fs.default.name saved for the table in the HiveMetaStore! I'm not sure if this is the cause of this particular issue, but I'm again starting to think that there is something wrong with the logic in HiveDrillNativeParquetRowGroupScan.getFsConf.

> Investigate Creation of FileSystem Configuration for Hive Parquet Files: FileNotFoundException when reading a parquet file
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6609
>                 URL: https://issues.apache.org/jira/browse/DRILL-6609
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>
> Currently, when reading a parquet file in Hive, we try to speed things up by
> doing a native parquet scan with HiveDrillNativeParquetRowGroupScan. When
> retrieving the FileSystem Configuration in
> HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties defined
> for the HiveStoragePlugin. This could cause a misconfiguration in the
> HiveStoragePlugin to influence the configuration of our FileSystem.
> Currently it is unclear whether this was desired behavior. If it is desired,
> we need to document why it was done. If it is not desired, we need to fix the
> issue.
>
> This may be the root cause of the issue discovered by Chun.
>
> To reproduce the issue:
> 1) use a cluster of two or more nodes;
> 2) enable impersonation;
> 3) set "fs.default.name": "file:///" in the Hive storage plugin;
> 4) restart the drillbits;
> 5) as a regular user, on node A, drop the table/file;
> 6) run a CTAS from a large enough Hive table as the source to recreate the table/file;
> 7) query the table from node A, which should work;
> 8) query the table from node B as the same user, which should reproduce the issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)