Re: [Question] HiveStoragePlugin and NativeParquetRowGroupScan

Vitalii Diravka Thu, 23 Aug 2018 02:48:01 -0700

Hi Tim,

Some comments from me.


*HiveStoragePlugin*
*fs.defaultFS* is a Hive-specific property. It is the URI that the Hive
Metastore uses to locate where tables are placed. There is no need to
specify this property if the default value from *core-site.xml* is
acceptable; see:
https://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-common/core-default.xml
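
As a rough illustration only, a Hive storage plugin config that sets this
property explicitly might look like the sketch below. The host names and
ports are placeholders and are not taken from this thread; leaving out the
*fs.defaultFS* entry would fall back to the value from *core-site.xml*.

```json
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://<metastore-host>:9083",
    "fs.defaultFS": "hdfs://<namenode-host>:8020/"
  }
}
```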

*Hive Native readers*
Currently Drill has two Hive Native readers: Parquet and MapR JSON. Both
use the corresponding default File Format Plugins. This is a limitation:
for now there is no way to change the FormatPlugins config for them.
There is a Jira ticket for it:
https://issues.apache.org/jira/browse/DRILL-6621


Kind regards
Vitalii


On Thu, Aug 23, 2018 at 3:02 AM Paul Rogers <[email protected]>
wrote:

> Hi Tim,
>
> I don't have an answer. But, I can point out some factors to consider.
>
> Hive describes a set of data in a specific file system. It would make sense
> to associate that file system with the Hive configuration. Otherwise, I could
> use a Hive metastore for FS A, with a DFS configured for FS B, and have
> nothing work for reasons that would be hard to figure out.
>
> Further, isn't Hive its own storage plugin, and thus would be referenced
> as, say, "myHive.customers"? What would be the implied relationship between
> the Hive plugin config and the DFS plugin config?
>
> Suppose I had two Hive plugin configs, Hive1 and Hive2. And, two DFS
> configs: DFS1 and DFS2. What is the implied relationship (if any) between
> Hive1 and either DFS1 or DFS2? Between Hive2 and DFS1 or DFS2?
>
> These ambiguities would seem to explain why Hive's HDFS URL is configured
> with Hive and is distinct from the similar HDFS URL defined for DFS.
>
> Can you suggest a way to avoid duplication and link the two? Perhaps, in
> Hive config, name a DFS config rather than duplicating the HDFS config for
> Hive?
>
> Thanks,
> - Paul
>
>
>
>     On Wednesday, August 22, 2018, 4:41:37 PM PDT, Timothy Farkas <
> [email protected]> wrote:
>
>  Hi All,
>
> I'm a bit confused and I was hoping to get some clarification about how the
> HiveStoragePlugin interacts with the FileSystem plugin. Currently the
> HiveStoragePlugin allows the user to configure their own value for
> fs.defaultFS in the plugin properties, which overrides the defaultFS used
> when doing a native parquet scan for Hive. Is this intentional? Also what
> is the high level theory about how Hive and the FileSystem plugins
> interact? Specifically does Drill support querying Hive when Hive is using
> a different FileSystem than the one specified in the file system plugin? Or
> does Drill assume that Hive is using the same FileSystem as the one
> defined in the Drill FileSystem plugin?
>
> Thanks,
> Tim
>
