+1 to this. I can help drive some of this work.
On Fri, Mar 31, 2023 at 10:09 AM Prashant Wason <pwa...@uber.com.invalid> wrote:

> Could be useful. Also, may be useful for backup / replication scenario
> (keeping a copy of data in alternate/cloud DC).
>
> HoodieDeltaStreamer already has the concept of "sources". This can be
> implemented as a "sink" concept.
>
> On Thu, Mar 30, 2023 at 8:12 PM Vinoth Chandar <vin...@apache.org> wrote:
>
> > Essentially.
> >
> > Old architecture : (operational database) ==> some tool ==> (data
> > warehouse raw data) ==> SQL ETL ==> (data warehouse derived data)
> >
> > New architecture : (operational database) ==> Hudi delta Streamer ==>
> > (Hudi raw data) ==> Spark/Flink Hudi ETL ==> (Hudi derived data) ==>
> > Hudi Reverse Streamer ==> (Data Warehouse/Kafka/Operational Database)
> >
> > On Thu, Mar 30, 2023 at 8:09 PM Vinoth Chandar <vin...@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > Any interest in building a reverse streaming tool, that does the
> > > reverse of what the DeltaStreamer tool does? It will read Hudi table
> > > incrementally (only source) and write out the data to a variety of
> > > sinks - Kafka, JDBC Databases, DFS.
> > >
> > > This has come up many times with data warehouse users. Often times,
> > > they want to use Hudi to speed up or reduce costs on their data
> > > ingestion and ETL (using Spark/Flink), but want to move the derived
> > > data back into a data warehouse or an operational database for
> > > serving.
> > >
> > > What do you all think?
> > >
> > > Thanks
> > > Vinoth
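
To make the "sink" idea above a bit more concrete, here is a rough sketch in Python of how the reverse flow could hang together. Everything in it (Commit, Sink, InMemorySink, ReverseStreamer, the checkpoint handling) is a made-up name for illustration and not Hudi's API. The in-memory timeline only stands in for an incremental read of a Hudi table, and the in-memory sink stands in for Kafka / JDBC / DFS.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """One commit on a hypothetical table timeline (not a Hudi class)."""
    instant: str               # sortable commit time, e.g. "20230330200900"
    records: Tuple[dict, ...]  # rows written by that commit


class Sink(ABC):
    """Hypothetical mirror of DeltaStreamer's "source" concept."""

    @abstractmethod
    def write(self, records: Iterable[dict]) -> None:
        """Deliver a batch of rows to the downstream system."""


class InMemorySink(Sink):
    """Stand-in for a Kafka / JDBC / DFS sink; just collects rows."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def write(self, records: Iterable[dict]) -> None:
        self.rows.extend(records)


class ReverseStreamer:
    """Pushes commits newer than the last checkpoint to a sink."""

    def __init__(self, timeline: List[Commit], sink: Sink,
                 checkpoint: Optional[str] = None) -> None:
        self.timeline = timeline      # assumed: list of commits, re-read on every sync
        self.sink = sink
        self.checkpoint = checkpoint  # last instant already delivered, if any

    def sync_once(self) -> int:
        """Write every commit after the checkpoint; return the row count."""
        written = 0
        for commit in sorted(self.timeline, key=lambda c: c.instant):
            if self.checkpoint is not None and commit.instant <= self.checkpoint:
                continue  # already delivered in an earlier run
            self.sink.write(commit.records)
            self.checkpoint = commit.instant  # advance only after the write
            written += len(commit.records)
        return written
```

Calling sync_once() repeatedly only forwards commits that arrived since the previous call, which is the incremental behaviour described in the thread. A real tool would also need to persist the checkpoint and decide on delivery guarantees per sink, which this sketch leaves out.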