Hi Tim,

I don't have an answer, but I can point out some factors to consider.

Hive describes a set of data in a specific file system, so it would make sense 
to associate that file system with the Hive configuration. Otherwise, I could 
use a Hive metastore for FS A with a DFS configured for FS B, and nothing would 
work, for reasons that would be hard to figure out.

Further, isn't Hive its own storage plugin, and thus would be referenced as, 
say, "myHive.customers"? What would be the implied relationship between the 
Hive plugin config and the DFS plugin config?

Suppose I had two Hive plugin configs, Hive1 and Hive2, and two DFS configs, 
DFS1 and DFS2. What is the implied relationship (if any) between Hive1 and 
either DFS1 or DFS2? Between Hive2 and DFS1 or DFS2?

These ambiguities would seem to explain why Hive's HDFS URL is configured with 
Hive and kept distinct from a similar HDFS URL defined for DFS.

Can you suggest a way to avoid the duplication and link the two? Perhaps the 
Hive config could name a DFS config rather than duplicating the HDFS settings?
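A rough sketch of that idea, assuming plugin configs are modeled as plain Python dicts. The "dfsPlugin" key is hypothetical (no such option exists today); "configProps", "connection", and fs.defaultFS are meant to mirror the existing Hive and DFS configs. The resolver prefers the named DFS config and rejects a Hive config whose own fs.defaultFS disagrees with it, which would catch the FS A / FS B mismatch early instead of failing obscurely.

```python
# Hypothetical sketch: plugin configs as dicts mirroring Drill's JSON
# storage-plugin configs. "dfsPlugin" is an invented key, not a Drill option.

def resolve_hive_fs(hive_name, plugins):
    """Return the file system URI a Hive plugin config should use."""
    hive = plugins[hive_name]
    props = hive.get("configProps", {})
    explicit = props.get("fs.defaultFS")
    ref = hive.get("dfsPlugin")  # hypothetical link to a DFS plugin config
    if ref is None:
        if explicit is None:
            raise ValueError(f"{hive_name}: no file system configured")
        return explicit  # current style: URL duplicated in the Hive config
    dfs = plugins.get(ref)
    if dfs is None or dfs.get("type") != "file":
        raise ValueError(f"{hive_name}: '{ref}' is not a DFS plugin config")
    linked = dfs["connection"]
    if explicit is not None and explicit != linked:
        # Reject "metastore for FS A, DFS for FS B" up front.
        raise ValueError(
            f"{hive_name}: fs.defaultFS {explicit} conflicts with {ref} ({linked})")
    return linked


plugins = {
    "dfs1": {"type": "file", "connection": "hdfs://namenode-a:8020/"},
    "dfs2": {"type": "file", "connection": "hdfs://namenode-b:8020/"},
    # Hive1 names DFS1 instead of repeating the HDFS URL.
    "hive1": {"type": "hive", "dfsPlugin": "dfs1",
              "configProps": {"hive.metastore.uris": "thrift://metastore-a:9083"}},
    # Hive2 keeps its own fs.defaultFS, as configs do today.
    "hive2": {"type": "hive",
              "configProps": {"fs.defaultFS": "hdfs://namenode-b:8020/"}},
}

print(resolve_hive_fs("hive1", plugins))  # hdfs://namenode-a:8020/
print(resolve_hive_fs("hive2", plugins))  # hdfs://namenode-b:8020/
```

With an explicit link, the Hive1/DFS1 relationship is stated rather than implied, and an unlinked config such as Hive2 behaves as it does now.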

Thanks,
- Paul

 

    On Wednesday, August 22, 2018, 4:41:37 PM PDT, Timothy Farkas 
<[email protected]> wrote:  
 
 Hi All,

I'm a bit confused and I was hoping to get some clarification about how the
HiveStoragePlugin interacts with the FileSystem plugin. Currently the
HiveStoragePlugin allows the user to configure their own value for
fs.defaultFS in the plugin properties, which overrides the defaultFS used
when doing a native Parquet scan for Hive. Is this intentional? Also, what
is the high-level theory about how the Hive and FileSystem plugins
interact? Specifically, does Drill support querying Hive when Hive is using
a different FileSystem than the one specified in the FileSystem plugin? Or
does Drill assume that Hive is using the same FileSystem as the one
defined in the Drill FileSystem plugin?

Thanks,
Tim
  
