Hi Vinoth,

As you said, Hudi can use Hive JDBC to talk to the metastore. Was this functionality added after version 0.4.5? I ask because I am getting this error:

Exception in thread "main" com.uber.hoodie.com.beust.jcommander.ParameterException: Was passed main parameter '--use-jdbc' but no main parameter was defined in your arg class
Thanks,
Jaimin

On Thu, 3 Oct 2019 at 10:56, Vinoth Chandar <[email protected]> wrote:

> Hi Qian,
>
> You are right on the choice of tools for 2 and 3. But for 1, if you want to
> do a 1-time bulk load, you can look into the options in the migration guide
> http://hudi.apache.org/migration_guide.html (HiveSyncTool is orthogonal to
> this, it simply registers a Hudi dataset to the Hive metastore)
>
> On your questions
> 1. You need the appropriate Hudi bundle jar to write data
> http://hudi.apache.org/writing_data.html . For reading also, there are
> similar instructions depending on the query engine and yes, you would copy a
> bundle jar and install it.
> 2. You can choose to use Hudi without HiveMetastore and it will give you
> access to the ReadOptimized and Incremental Views (not the realtime view, that
> needs Hive atm). Hudi can use Hive JDBC to talk to the metastore if that's what
> you are asking.
> 3. Hudi saves metadata in a special .hoodie folder on your DFS itself. It is
> used for building features like incremental pull.
>
> Hope that helps
>
>
> On Wed, Oct 2, 2019 at 3:12 PM Qian Wang <[email protected]> wrote:
>
> > Hi Kabeer,
> >
> > I plan to do an incremental query PoC. My use case includes:
> >
> > 1. Load one big Hive table located in HDFS into Hudi as a history table (I
> > think I should use HiveSyncTool)
> > 2. Sink streaming data from Kafka to Hudi as a real-time table (use
> > HoodieDeltaStreamer?)
> > 3. Join the two tables to get the incremental metrics (Spark SQL?)
> >
> > My questions:
> >
> > 1. Do I just copy the Hudi packages to the server client for deployment?
> > 2. Does Hudi require access to HiveMetastore? My company has
> > restricted access to HiveMetastore. Can Hudi use Hive JDBC to get metadata?
> > 3. What is the HoodieTableMeta used for? Where is the HoodieTableMeta saved?
> >
> >
> > Best,
> > Qian
> > On Oct 2, 2019, 2:59 PM -0700, Kabeer Ahmed <[email protected]>, wrote:
> > > Qian
> > >
> > > Welcome!
> > > Are you able to tell us a bit more about your use case? Eg: type of the
> > > project, industry, complexity of the pipeline that you plan to write (eg:
> > > pulling data from external APIs like the New York taxi dataset and writing
> > > them into Hive for analysis) etc.
> > > This will give us a bit more context.
> > > Thanks
> > > Kabeer.
> > >
> > > On Oct 2 2019, at 10:55 pm, Vinoth Chandar <[email protected]> wrote:
> > > > edit:
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed?
> > > > with the ? at the end
> > > >
> > > > On Wed, Oct 2, 2019 at 2:54 PM Vinoth Chandar <[email protected]> wrote:
> > > > > Hi Qian,
> > > > > Welcome! Does
> > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed?
> > > > > help ?
> > > > >
> > > > >
> > > > > On Wed, Oct 2, 2019 at 10:18 AM Qian Wang <[email protected]> wrote:
> > > > > > Hi,
> > > > > > I am new to Apache Hudi. Currently I am working on a PoC using Hudi.
> > > > > > Can anyone give me some documents on how to deploy Apache Hudi? Thanks.
> > > > > >
> > > > > > Best,
> > > > > > Eric
