Hi Syed, as Vinoth mentioned, the HoodieSnapshotCopier is meant for this
purpose.

You may also read more about RFC-9, which plans to introduce a
backward-compatible tool covering HoodieSnapshotCopier's functionality:
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+09+%3A+Hudi+Dataset+Snapshot+Exporter
Unfortunately I'm not actively working on this. If you're interested, feel
free to pick it up. I'd be happy to help with that.

On Wed, Feb 12, 2020 at 7:25 PM Vinoth Chandar <[email protected]> wrote:

> Hi Syed,
>
> Apologies for the delay. If you are using copy-on-write, you can look into
> savepoints (although I realize they're only exposed at the RDD API level).
> We do have a tool called HoodieSnapshotCopier in hudi-utilities that takes
> periodic copies/snapshots of a table, as of a given commit, for backup
> purposes. Raymond (if you are here) has an RFC to enhance it further.
> Running the copier periodically, say every day, would achieve your goals, I
> believe (please test it first, since it's not used that much in OSS, IIUC).
>
>
> https://github.com/apache/incubator-hudi/blob/c2c0f6b13d5b72b3098ed1b343b0a89679f854b3/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java
>
> Any issues in the tool should be simple to fix; the tool itself is only a
> couple hundred lines, that's all.
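>
> If it helps as a stop-gap, a point-in-time copy can also be taken with
> plain Spark. Unlike HoodieSnapshotCopier, this exports only the data, not
> the Hudi commit metadata, and the paths and partition glob below are just
> placeholders for your layout:
>
>   // from spark-shell with the Hudi spark bundle on the classpath
>   import org.apache.spark.sql.SaveMode
>
>   // Read the latest snapshot of the Hudi table; adjust the glob to match
>   // your partition layout.
>   val snapshot = spark.read
>     .format("org.apache.hudi")
>     .load("s3://my-bucket/hudi/my_table/*/*/*")
>
>   // Write a dated plain-parquet backup; rerun daily from a scheduler.
>   snapshot.write
>     .mode(SaveMode.Overwrite)
>     .parquet("s3://my-backup-bucket/my_table/2020-02-12")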
>
> Thanks
> Vinoth
>
> On Mon, Feb 10, 2020 at 3:56 AM Syed Abdul Kather <[email protected]>
> wrote:
>
> > Yes. Also for restoring the data from cold storage.
> >
> > Our use case:
> > We stream data using Debezium and push it to Kafka, where we have a
> > retention of 7 days. In case the destination table created with Hudi gets
> > corrupted, or we need to repopulate it, we need a way to restore the
> > data.
> >
> > Thanks and Regards,
> > S SYED ABDUL KATHER
> > *Data platform Lead @ Tathastu.ai*
> >
> > *+91 - 7411011661*
> >
> >
> > On Mon, Jan 13, 2020 at 10:17 PM Vinoth Chandar <[email protected]>
> > wrote:
> >
> > > Hi Syed,
> > >
> > > If I follow correctly, are you asking how to do a bulk load first and
> > > then use the DeltaStreamer on top of that dataset to apply binlogs from
> > > Kafka?
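> > >
> > > If so, the bulk-load half could look roughly like this with the Spark
> > > datasource (the record key, precombine/partition fields and paths below
> > > are placeholders, and this assumes the Sqoop dump was imported as
> > > parquet):
> > >
> > >   import org.apache.spark.sql.SaveMode
> > >
> > >   // One-time load of the Sqoop dump into a new Hudi table.
> > >   val dump = spark.read.parquet("s3://my-bucket/sqoop/my_table/")
> > >   dump.write
> > >     .format("org.apache.hudi")
> > >     .option("hoodie.datasource.write.operation", "bulk_insert")
> > >     .option("hoodie.datasource.write.recordkey.field", "id")
> > >     .option("hoodie.datasource.write.precombine.field", "updated_at")
> > >     .option("hoodie.datasource.write.partitionpath.field", "created_date")
> > >     .option("hoodie.table.name", "my_table")
> > >     .mode(SaveMode.Overwrite)
> > >     .save("s3://my-bucket/hudi/my_table")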
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Mon, Jan 13, 2020 at 12:39 AM Syed Abdul Kather <[email protected]>
> > > wrote:
> > >
> > > > Hi Team,
> > > >
> > > > We have to onboard a few tables that have a really huge number of
> > > > records (100 M records). The plan is to enable the binlog for the
> > > > database; that is no issue, as the stream can handle the load. But for
> > > > loading the initial snapshot, we have used Sqoop to import the whole
> > > > table to S3.
> > > >
> > > > What we need here:
> > > > Can we load the whole Sqooped dump into a Hudi table, and then apply
> > > > the stream (binlog data comes via Kafka) on top of it?
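> > > >
> > > > Something like the following is roughly what we have in mind for the
> > > > streaming half, once the binlog records from Kafka are parsed into a
> > > > DataFrame with the table's schema (a rough sketch only; `changes` and
> > > > the field names are placeholders, and we understand the DeltaStreamer
> > > > with a Kafka source is the purpose-built tool for doing this
> > > > continuously):
> > > >
> > > >   import org.apache.spark.sql.SaveMode
> > > >
> > > >   // `changes` is a placeholder DataFrame of parsed binlog rows.
> > > >   changes.write
> > > >     .format("org.apache.hudi")
> > > >     .option("hoodie.datasource.write.operation", "upsert")
> > > >     .option("hoodie.datasource.write.recordkey.field", "id")
> > > >     .option("hoodie.datasource.write.precombine.field", "updated_at")
> > > >     .option("hoodie.datasource.write.partitionpath.field", "created_date")
> > > >     .option("hoodie.table.name", "my_table")
> > > >     .mode(SaveMode.Append)
> > > >     .save("s3://my-bucket/hudi/my_table")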
> > > >
> > > >             Thanks and Regards,
> > > >         S SYED ABDUL KATHER
> > > >          *Bigdata [email protected]*
> > > > *           +91-7411011661*
> > > >
> > >
> >
>
