Hi Qian,

You are right on the choice of tools for 2 and 3. But for 1, if you want to do a one-time bulk load, you can look into the options in the migration guide: http://hudi.apache.org/migration_guide.html (HiveSyncTool is orthogonal to this; it simply registers a Hudi dataset with the Hive metastore).
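For the one-time load, here is a rough sketch of what the Spark datasource write options could look like with the bulk_insert operation. The table name, column names and paths below are made up for illustration, and the commented-out write call assumes a SparkSession with the Hudi Spark bundle on the classpath (older releases used the com.uber.hoodie format name instead), so please treat it as a starting point rather than a tested recipe.

```python
def bulk_insert_options(table_name, record_key, partition_path, precombine):
    """Build Hudi Spark datasource options for a one-time bulk load.

    All argument values are supplied by the caller; the ones used below
    are hypothetical placeholders, not names from a real table.
    """
    return {
        "hoodie.table.name": table_name,
        # bulk_insert is intended for large initial loads
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_path,
        "hoodie.datasource.write.precombine.field": precombine,
    }


# Hypothetical table and column names, for illustration only.
opts = bulk_insert_options("history_table", "id", "dt", "updated_at")

# Assumed usage with a SparkSession (not executed here; paths are placeholders):
# df = spark.read.parquet("hdfs:///path/to/existing/hive/table")
# (df.write.format("org.apache.hudi")
#    .options(**opts)
#    .mode("overwrite")
#    .save("hdfs:///path/to/hudi/history_table"))
```

Reading the existing table's files directly, as in the commented part, avoids going through the Hive metastore for the load itself.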
On your questions:

1. You need the appropriate Hudi bundle jar to write data: http://hudi.apache.org/writing_data.html . For reading, there are similar instructions depending on the query engine, and yes, you would copy a bundle jar and install it.
2. You can choose to use Hudi without HiveMetastore, and it will give you access to the ReadOptimized and Incremental views (not the Realtime view; that needs Hive atm). Hudi can use Hive JDBC to talk to the metastore, if that's what you are asking.
3. Hudi saves metadata in a special .hoodie folder on your DFS itself. It's used for building features like incremental pull.

Hope that helps

On Wed, Oct 2, 2019 at 3:12 PM Qian Wang <[email protected]> wrote:

> Hi Kabeer,
>
> I plan to do an incremental query PoC. My use case including:
>
> 1. Load one big Hive table located in HDFS to Hudi as a history table (I
> think should use HiveSyncTool)
> 2. Sink streaming data from Kafka to Hudi as real time table(use
> HoodieDeltaStreamer?)
> 3. Join both of two table get the incremental metrics (Spark SQL?)
>
> My questions:
>
> 1. Do I just copy the Hudi packages to the server client for deployment?
> 2. Does Hudi must require access to HiveMetastore? My company has
> restricted to access HiveMetastore? Can Hudi use Hive JDBC to get metadata?
> 3. What is the HoodieTableMeta use for? Where is the HoodieTableMeta saved?
>
> Best,
> Qian
>
> On Oct 2, 2019, 2:59 PM -0700, Kabeer Ahmed <[email protected]>, wrote:
> > Qian
> >
> > Welcome!
> > Are you able to tell us a bit more about your use case? Eg: type of the
> > project, industry, complexity of the pipeline that you plan to write (eg:
> > pulling data from external APIs like New York taxi dataset and writing them
> > into Hive for analysis) etc.
> > This will give us a bit more context.
> > Thanks
> > Kabeer.
> >
> > On Oct 2 2019, at 10:55 pm, Vinoth Chandar <[email protected]> wrote:
> > > edit:
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed?
> > > with the ? at the end
> > >
> > > On Wed, Oct 2, 2019 at 2:54 PM Vinoth Chandar <[email protected]> wrote:
> > > > Hi Qian,
> > > > Welcome! Does
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed?
> > > > help ?
> > > >
> > > > On Wed, Oct 2, 2019 at 10:18 AM Qian Wang <[email protected]> wrote:
> > > > > Hi,
> > > > > I am new to Apache Hudi. Currently I am working on a PoC using Hudi and
> > > > > anyone can give me some documents what how to deploy Apache Hudi? Thanks.
> > > > >
> > > > > Best,
> > > > > Eric
