What you suggest sounds more like an `Exporter` tool. I imagine you will support MOR (merge-on-read) tables as well? +1 on the idea itself. It would be useful to have a plain Parquet snapshot generated as a backup.
On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu <xu.shiyan.raym...@gmail.com> wrote:

> Hi All,
>
> The existing SnapshotCopier under Hudi Utilities is a Hudi-to-Hudi copy and
> primarily for backup purpose.
>
> I would like to start a RFC for a more generic Hudi snapshotter, which
>
> - Supports existing SnapshotCopier features
> - Add option to export a Hudi dataset to plain parquet files
> - output latest records via Spark dataframe writer
> - remove Hudi metadata fields
> - support custom repartition requirements
>
> Is this a good idea to start an RFC?
>
> Thank you.
>
> Regards,
> Raymond Xu