Hi Tim,

Some comments from me.

*HiveStoragePlugin*

In the Hive storage plugin, *fs.defaultFS* is a Hive-specific property. It is the URI the Hive Metastore uses to point to where tables are stored. There is no need to specify this property if the default value from *core-site.xml* is acceptable; see:
https://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-common/core-default.xml

*Hive Native readers*

Currently Drill has two Hive native readers: Parquet and MapR JSON. Both use the corresponding default File Format Plugins. This is a limitation: for now there is no way to change the FormatPlugin config for them. There is a Jira ticket for it:
https://issues.apache.org/jira/browse/DRILL-6621

Kind regards
Vitalii

On Thu, Aug 23, 2018 at 3:02 AM Paul Rogers <[email protected]> wrote:

> Hi Tim,
>
> I don't have an answer. But, I can point out some factors to consider.
>
> Hive describes a set of data in a specific file system. Would make sense
> to associate that file system with the Hive configuration. Else, I could
> use a Hive metastore for FS A, with a DFS configured for FS B, and have
> nothing work for reasons that would be hard to figure out.
>
> Further, isn't Hive its own storage plugin, and thus would be referenced
> as, say, "myHive.customers"? What would be the implied relationship between
> the Hive plugin config and the DFS plugin config?
>
> Suppose I had two Hive plugin configs, Hive1 and Hive2. And, two DFS
> configs: DFS1 and DFS2. What is the implied relationship (if any) between
> Hive1 and either DFS1 or DFS2? Between Hive2 and DFS1 or DFS2?
>
> Given these ambiguities, it would seem to explain why Hive's HDFS URL is
> configured with Hive and is distinct from a similar HDFS URL defined
> for DFS.
>
> Can you suggest a way to avoid duplication and link the two? Perhaps, in
> Hive config, name a DFS config rather than duplicating the HDFS config for
> Hive?
>
> Thanks,
> - Paul
>
> On Wednesday, August 22, 2018, 4:41:37 PM PDT, Timothy Farkas <[email protected]> wrote:
>
> Hi All,
>
> I'm a bit confused and I was hoping to get some clarification about how the
> HiveStoragePlugin interacts with the FileSystem plugin. Currently the
> HiveStoragePlugin allows the user to configure their own value for
> fs.defaultFS in the plugin properties, which overrides the defaultFS used
> when doing a native parquet scan for Hive. Is this intentional? Also what
> is the high level theory about how Hive and the FileSystem plugins
> interact? Specifically does Drill support querying Hive when Hive is using
> a different FileSystem than the one specified in the file system plugin? Or
> does Drill assume that the Hive is using the same FileSystem as the one
> defined in the Drill FileSystem plugin?
>
> Thanks,
> Tim
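
Addendum on the fs.defaultFS point at the top of this reply: below is a minimal Python sketch of the override behaviour discussed in the thread. It is not Drill's actual code. It only illustrates the idea that a value set in the Hive plugin properties takes precedence and that the core-site value applies otherwise. The host names, the dict layout and the helper effective_default_fs are hypothetical; the property names fs.defaultFS and hive.metastore.uris are real, and file:/// is the default that core-default.xml ships for fs.defaultFS.

```python
# Illustrative only: models "plugin property overrides core-site default".
# Host names and the helper below are hypothetical, not part of Drill.

# Hadoop's core-default.xml ships fs.defaultFS = file:///
CORE_SITE_DEFAULTS = {"fs.defaultFS": "file:///"}


def effective_default_fs(config_props, core_site=CORE_SITE_DEFAULTS):
    """Return the fs.defaultFS in effect: the plugin value if set, else core-site."""
    merged = dict(core_site)      # start from the core-site values
    merged.update(config_props)   # plugin properties win on conflict
    return merged["fs.defaultFS"]


# Hypothetical Hive storage plugin config that sets its own fs.defaultFS.
hive_plugin = {
    "type": "hive",
    "enabled": True,
    "configProps": {
        "hive.metastore.uris": "thrift://metastore.example.com:9083",
        "fs.defaultFS": "hdfs://hive-nn.example.com:8020",
    },
}

with_override = effective_default_fs(hive_plugin["configProps"])
without_override = effective_default_fs(
    {"hive.metastore.uris": "thrift://metastore.example.com:9083"}
)

print(with_override)     # value taken from the plugin properties
print(without_override)  # value taken from the core-site defaults
```

If the core-site value already points at the file system Hive uses, the second form (no fs.defaultFS in the plugin properties) is enough.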
