Hi Paul / Vitalii,

Thanks for the info. I was asking about this because of https://issues.apache.org/jira/browse/DRILL-6609 in which some strange behavior was observed if the user defined fs.default.name in the HivePlugin config. I also saw that the filesystem specified in the HivePlugin config influences the FileSystem used for native scans. This happens because in HiveDrillNativeParquetRowGroupScan.getFsConf we use the HiveStoragePlugin to create the filesystem configuration, which is then used by DrillFileSystem.
However, based on your feedback it looks like this is desirable behavior, since the user may want to define a different filesystem for the HivePlugin along with different format plugins. That means the root cause of https://issues.apache.org/jira/browse/DRILL-6609 is something else. I'll probably abandon that issue at this point, since it's not reproducible and I have no further leads as to what could cause it.

Thanks,
Tim

On Thu, Aug 23, 2018 at 2:46 AM, Vitalii Diravka <[email protected]> wrote:

> Hi Tim,
>
> Some comments from me.
>
> *HiveStoragePlugin*
> *fs.defaultFS* is Hive specific property. This is the URI used by Hive
> Metastore to point where tables are placed. There is no need to specify
> this property, if default value from *core-site.xml* is acceptable, see
> more:
> https://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-common/core-default.xml
>
> *Hive Native readers.*
> Currently Drill has two Hive Native readers: Parquet and MapR Json. Both of
> them use appropriate default File Format Plugins. It is a limitation and
> there is no way for now to change FormatPlugins config for them.
> There is Jira ticket for it:
> https://issues.apache.org/jira/browse/DRILL-6621
>
> Kind regards
> Vitalii
>
>
> On Thu, Aug 23, 2018 at 3:02 AM Paul Rogers <[email protected]>
> wrote:
>
> > Hi Tim,
> >
> > I don't have an answer. But, I can point out some factors to consider.
> >
> > Hive describes a set of data in a specific file system. Would make sense
> > to associate that file system with the Hive configuration. Else, I could
> > use a Hive metastore for FS A, with a DFS configured for FS B, and have
> > nothing work for reasons that would be hard to figure out.
> >
> > Further, isn't Hive its own storage plugin, and thus would be referenced
> > as, say, "myHive.customers"? What would be the implied relationship between
> > the Hive plugin config and the DFS plugin config?
> >
> > Suppose I had two Hive plugin configs, Hive1 and Hive2. And, two DFS
> > configs: DFS1 and DFS2. What is the implied relationship (if any) between
> > Hive1 and either DFS1 or DFS2? Between Hive2 and DFS1 or DFS2?
> >
> > Given these ambiguities, it would seem to explain why Hive's HDFS URL is
> > configured with Hive and is distinct from other a similar HDFS URL defined
> > for DFS.
> >
> > Can you suggest a way to avoid duplication and link the two? Perhaps, in
> > Hive config, name a DFS config rather than duplicating the HDFS config for
> > Hive?
> >
> > Thanks,
> > - Paul
> >
> >
> >
> > On Wednesday, August 22, 2018, 4:41:37 PM PDT, Timothy Farkas <
> > [email protected]> wrote:
> >
> > Hi All,
> >
> > I'm a bit confused and I was hoping to get some clarification about how the
> > HiveStoragePlugin interacts with the FileSystem plugin. Currently the
> > HiveStoragePlugin allows the user to configure their own value for
> > fs.defaultFS in the plugin properties, which overrides the defaultFS used
> > when doing a native parquet scan for Hive. Is this intentional? Also what
> > is the high level theory about how Hive and the FileSystem plugins
> > interact? Specifically does Drill support querying Hive when Hive is using
> > a different FileSystem than the one specified in the file system plugin? Or
> > does Drill assume that the Hive is using the same FileSystem as the one
> > defined in the Drill FileSystem plugin?
> >
> > Thanks,
> > Tim
> >
>
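
For readers following the thread, the override behavior discussed above (a Hive plugin property taking precedence over the cluster default when the native scan builds its filesystem configuration) can be pictured with a small sketch. This is only an illustration under assumptions: plain dicts stand in for Hadoop configuration objects, and the function name, host names, and ports are hypothetical. It is not Drill's actual code.

```python
# Illustrative sketch only. Plain dicts stand in for Hadoop configuration
# objects; effective_fs_conf and all host names are hypothetical and are
# not part of Drill or Hadoop.

def effective_fs_conf(core_site_defaults, plugin_config_props):
    """Overlay plugin properties on the cluster defaults; plugin values win."""
    conf = dict(core_site_defaults)   # copy so the defaults stay untouched
    conf.update(plugin_config_props)  # plugin-level keys override defaults
    return conf


# Cluster-wide default, as it might come from core-site.xml (assumed value).
core_site = {"fs.defaultFS": "hdfs://cluster-a:8020"}

# Properties set on the Hive storage plugin (assumed values).
hive_plugin_props = {
    "hive.metastore.uris": "thrift://metastore-host:9083",
    "fs.defaultFS": "hdfs://cluster-b:8020",
}

native_scan_conf = effective_fs_conf(core_site, hive_plugin_props)
print(native_scan_conf["fs.defaultFS"])  # prints hdfs://cluster-b:8020
```

If the plugin properties omit fs.defaultFS, the value from the defaults is kept, which matches Vitalii's note that the property need not be set when the core-site.xml default is acceptable.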
