Re: How to deploy Hudi

Vinoth Chandar Wed, 02 Oct 2019 22:27:51 -0700

Hi Qian,

You are right on the choice of tools for 2 and 3. But for 1, if you want to
do a one-time bulk load, you can look into the options in the migration guide
http://hudi.apache.org/migration_guide.html (HiveSyncTool is orthogonal to
this; it simply registers a Hudi dataset with the Hive metastore).


On your questions:
1. You need the appropriate Hudi bundle jar to write data; see
http://hudi.apache.org/writing_data.html . For reading, there are similar
instructions depending on the query engine, and yes, you would copy a
bundle jar and install it.
2. You can choose to use Hudi without the HiveMetastore, and it will still give
you access to the ReadOptimized and Incremental views (not the Realtime view,
which needs Hive at the moment). Hudi can use Hive JDBC to talk to the
metastore, if that's what you are asking.
3. Hudi saves its metadata in a special .hoodie folder on your DFS itself. It's
used for building features like incremental pull.
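To make point 3 a little more concrete, here is a minimal Python sketch (not part of Hudi, and not how Hudi itself reads its timeline) that lists completed commits from a table's .hoodie folder on a local filesystem and picks the ones an incremental pull starting after a given instant would cover. It assumes completed instants appear as files named `<instant_time>.commit` (copy-on-write) or `<instant_time>.deltacommit` (merge-on-read), with in-flight instants carrying an extra suffix; the helper names `completed_instants` and `instants_after` are hypothetical.

```python
import os
from typing import List

# Assumption: completed instants are files named "<instant_time>.commit"
# or "<instant_time>.deltacommit"; in-flight/requested instants end with a
# different suffix (e.g. ".inflight"), so they are skipped here.
COMPLETED_SUFFIXES = (".commit", ".deltacommit")


def completed_instants(base_path: str) -> List[str]:
    """Return sorted instant times of completed commits in <base_path>/.hoodie."""
    meta_dir = os.path.join(base_path, ".hoodie")
    instants = []
    for name in os.listdir(meta_dir):
        for suffix in COMPLETED_SUFFIXES:
            if name.endswith(suffix):
                instants.append(name[: -len(suffix)])
                break
    # Instant times are fixed-width timestamps (yyyyMMddHHmmss), so
    # lexicographic order matches chronological order.
    return sorted(instants)


def instants_after(base_path: str, begin_instant: str) -> List[str]:
    """Hypothetical helper: commits an incremental pull after begin_instant would read."""
    return [t for t in completed_instants(base_path) if t > begin_instant]
```

In practice you would use Hudi's own incremental view (through the Spark datasource or Hive) rather than scanning files yourself; the sketch only shows why keeping the commit timeline in .hoodie makes "give me everything after instant X" cheap to answer.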

Hope that helps


On Wed, Oct 2, 2019 at 3:12 PM Qian Wang <[email protected]> wrote:

> Hi Kabeer,
>
> I plan to do an incremental query PoC. My use case includes:
>
> 1. Load one big Hive table located in HDFS into Hudi as a history table (I
> think I should use HiveSyncTool)
> 2. Sink streaming data from Kafka into Hudi as a real-time table (use
> HoodieDeltaStreamer?)
> 3. Join the two tables to get the incremental metrics (Spark SQL?)
>
> My questions:
>
> 1. Do I just copy the Hudi packages to the server client for deployment?
> 2. Does Hudi require access to the HiveMetastore? My company restricts
> access to the HiveMetastore. Can Hudi use Hive JDBC to get metadata?
> 3. What is the HoodieTableMeta used for? Where is the HoodieTableMeta saved?
>
>
> Best,
> Qian
> On Oct 2, 2019, 2:59 PM -0700, Kabeer Ahmed <[email protected]>, wrote:
> > Qian
> >
> > Welcome!
> > Are you able to tell us a bit more about your use case? E.g. the type of
> > project, industry, and complexity of the pipeline that you plan to write
> > (e.g. pulling data from external APIs like the New York taxi dataset and
> > writing it into Hive for analysis) etc.
> > This will give us a bit more context.
> > Thanks
> > Kabeer.
> >
> > On Oct 2 2019, at 10:55 pm, Vinoth Chandar <[email protected]> wrote:
> > > edit:
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed?
> > > with the ? at the end
> > >
> > > On Wed, Oct 2, 2019 at 2:54 PM Vinoth Chandar <[email protected]> wrote:
> > > > Hi Qian,
> > > > Welcome! Does
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed?
> > > > help?
> > > >
> > > >
> > > > On Wed, Oct 2, 2019 at 10:18 AM Qian Wang <[email protected]> wrote:
> > > > > Hi,
> > > > > I am new to Apache Hudi. Currently I am working on a PoC using Hudi.
> > > > > Can anyone give me some documents on how to deploy Apache Hudi?
> > > > > Thanks.
> > > > >
> > > > > Best,
> > > > > Eric
> > > >
> > >
> > >
> >
>
