+1 on the exporter tool idea.

-Nishith

On Tue, Nov 12, 2019 at 5:06 AM leesf <[email protected]> wrote:

> +1. We can discuss it further when the design docs are available.
>
> Best,
> Leesf
>
> Balaji Varadarajan <[email protected]> wrote on Tue, Nov 12, 2019 at 4:17 PM:
>
> > +1 on the exporter tool idea.
> >
> > On Mon, Nov 11, 2019 at 10:36 PM vino yang <[email protected]> wrote:
> >
> > > Hi Shiyan,
> > >
> > > +1 for this proposal. Also, it looks like an exporter tool.
> > >
> > > @Vinoth Chandar <[email protected]>  Any thoughts about where to place it?
> > >
> > > Best,
> > > Vino
> > >
> > > Vinoth Chandar <[email protected]> wrote on Tue, Nov 12, 2019 at 8:58 AM:
> > >
> > > > We can wait for others to chime in as well. :)
> > > >
> > > > On Mon, Nov 11, 2019 at 4:37 PM Shiyan Xu <[email protected]> wrote:
> > > >
> > > > > Yes, Vinoth, you're right that it is more of an exporter, which
> > > > > exports a snapshot from a Hudi dataset.
> > > > >
> > > > > It should support MOR too; it would just leverage the existing
> > > > > SnapshotCopier logic to find the latest file slices.
> > > > >
> > > > > So would it be good to create an RFC for further discussion?
> > > > >
> > > > >
> > > > > On Mon, Nov 11, 2019 at 4:31 PM Vinoth Chandar <[email protected]> wrote:
> > > > >
> > > > > > What you suggest sounds more like an `Exporter` tool?  I imagine
> > > > > > you will support MOR as well?  +1 on the idea itself. It could be
> > > > > > useful if a plain parquet snapshot was generated as a backup.
> > > > > >
> > > > > > On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu <[email protected]> wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > The existing SnapshotCopier under Hudi Utilities is a Hudi-to-Hudi
> > > > > > > copy and is primarily for backup purposes.
> > > > > > >
> > > > > > > I would like to start an RFC for a more generic Hudi snapshotter,
> > > > > > > which
> > > > > > >
> > > > > > >    - Supports existing SnapshotCopier features
> > > > > > >    - Adds an option to export a Hudi dataset to plain parquet files
> > > > > > >       - output latest records via Spark dataframe writer
> > > > > > >       - remove Hudi metadata fields
> > > > > > >       - support custom repartition requirements
> > > > > > >
> > > > > > > Is it a good idea to start an RFC for this?
> > > > > > >
> > > > > > > Thank you.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Raymond Xu
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
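
For reference, the "remove Hudi metadata fields" step in the original proposal can be sketched as below. This is only an illustration, not the proposed tool: the helper names are hypothetical, and the only thing assumed about Hudi is that its metadata columns share the `_hoodie_` prefix (for example `_hoodie_commit_time` and `_hoodie_record_key`).

```python
# Sketch only: helper names are hypothetical and not part of Hudi.
# Assumption: Hudi metadata columns all start with the "_hoodie_" prefix.
HOODIE_META_PREFIX = "_hoodie_"


def data_columns(columns):
    """Return the columns that remain after dropping Hudi metadata fields."""
    return [c for c in columns if not c.startswith(HOODIE_META_PREFIX)]


def strip_metadata(record):
    """Return a copy of one record (a dict) without Hudi metadata fields."""
    return {k: v for k, v in record.items()
            if not k.startswith(HOODIE_META_PREFIX)}


if __name__ == "__main__":
    cols = ["_hoodie_commit_time", "_hoodie_record_key", "id", "ts"]
    print(data_columns(cols))
    print(strip_metadata({"_hoodie_commit_time": "20191111", "id": 1}))
```

With a Spark DataFrame `df`, the same column list could presumably feed `df.select(*data_columns(df.columns))`, followed by `repartition(...)` and `write.parquet(...)` for the custom repartition and plain parquet output; how the latest file slices are read is left to the RFC.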
