Re: [DISCUSS] Hudi Reverse Streamer

Pratyaksh Sharma Thu, 30 Mar 2023 23:59:22 -0700

+1 to this.

I can help drive some of this work.


On Fri, Mar 31, 2023 at 10:09 AM Prashant Wason <[email protected]> wrote:

> This could be useful. It may also help with backup/replication scenarios
> (keeping a copy of the data in an alternate or cloud data center).
>
> HoodieDeltaStreamer already has the concept of "sources"; this could be
> implemented as an analogous "sink" concept.
>
> On Thu, Mar 30, 2023 at 8:12 PM Vinoth Chandar <[email protected]> wrote:
>
> > Essentially:
> >
> > Old architecture: (operational database) ==> some tool ==> (data
> > warehouse raw data) ==> SQL ETL ==> (data warehouse derived data)
> >
> > New architecture: (operational database) ==> Hudi DeltaStreamer ==>
> > (Hudi raw data) ==> Spark/Flink Hudi ETL ==> (Hudi derived data) ==>
> > Hudi Reverse Streamer ==> (Data Warehouse/Kafka/Operational Database)
> >
> > On Thu, Mar 30, 2023 at 8:09 PM Vinoth Chandar <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > Is there any interest in building a reverse streaming tool that does the
> > > reverse of what the DeltaStreamer tool does? It would read a Hudi table
> > > incrementally (as its only source) and write the data out to a variety
> > > of sinks: Kafka, JDBC databases, DFS.
> > >
> > > This has come up many times with data warehouse users. Oftentimes they
> > > want to use Hudi to speed up or reduce the cost of their data ingestion
> > > and ETL (using Spark/Flink), but want to move the derived data back into
> > > a data warehouse or an operational database for serving.
> > >
> > > What do you all think?
> > >
> > > Thanks
> > > Vinoth
> > >
> >
>
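A minimal Python sketch of the "sink" idea discussed above, for illustration only. Everything in it is hypothetical and not a Hudi API: `ChangeRecord`, `Sink`, `InMemorySink`, `incremental_pull`, and `reverse_stream_once` are invented names, the Hudi table is modeled as an in-memory list of change records, and commit times are assumed to be timestamp strings that sort chronologically. It shows the core loop of a reverse streamer: pull only the records committed after the last checkpoint, fan them out to every sink, and advance the checkpoint.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


@dataclass
class ChangeRecord:
    """Hypothetical change record from one commit of a (simulated) Hudi table."""
    commit_time: str  # assumed sortable timestamp string, e.g. "20230330201200"
    key: str          # record key
    value: dict       # row payload


class Sink(Protocol):
    """Hypothetical sink interface, mirroring DeltaStreamer's notion of sources."""
    def write(self, records: List[ChangeRecord]) -> None: ...


@dataclass
class InMemorySink:
    """Stand-in for a Kafka topic, JDBC table, or DFS path: last write wins per key."""
    rows: Dict[str, dict] = field(default_factory=dict)

    def write(self, records: List[ChangeRecord]) -> None:
        for r in records:
            self.rows[r.key] = r.value


def incremental_pull(table: List[ChangeRecord],
                     checkpoint: str) -> Tuple[List[ChangeRecord], str]:
    """Return records committed strictly after `checkpoint`, plus the new checkpoint."""
    batch = sorted((r for r in table if r.commit_time > checkpoint),
                   key=lambda r: r.commit_time)
    new_checkpoint = batch[-1].commit_time if batch else checkpoint
    return batch, new_checkpoint


def reverse_stream_once(table: List[ChangeRecord], sinks: List[Sink],
                        checkpoint: str) -> str:
    """One round: write new changes to every sink, then return the advanced checkpoint."""
    batch, new_checkpoint = incremental_pull(table, checkpoint)
    if batch:
        for sink in sinks:
            sink.write(batch)
    return new_checkpoint


# Usage: the first round copies everything, the second only the newer commit.
table = [
    ChangeRecord("20230330201200", "u1", {"total": 10}),
    ChangeRecord("20230330201200", "u2", {"total": 5}),
]
warehouse = InMemorySink()
checkpoint = reverse_stream_once(table, [warehouse], "0")
table.append(ChangeRecord("20230330211500", "u1", {"total": 12}))
checkpoint = reverse_stream_once(table, [warehouse], checkpoint)
```

Persisting the checkpoint between rounds is what makes the pull incremental: each run reads only commits newer than the last one it delivered, which is the same role checkpoints play for DeltaStreamer sources.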
