Came up with the first draft. Thank you.
https://cwiki.apache.org/confluence/display/HUDI/RFC-9%3A+%28WIP%29+Hudi+Dataset+Snapshotter


On Tue, Nov 12, 2019 at 12:44 PM Shiyan Xu <[email protected]>
wrote:

> Thank you all for the +1s! I'll go ahead and add an RFC page then.
>
> On Tue, Nov 12, 2019 at 8:41 AM nishith agarwal <[email protected]>
> wrote:
>
>> +1 on the exporter tool idea.
>>
>> -Nishith
>>
>> On Tue, Nov 12, 2019 at 5:06 AM leesf <[email protected]> wrote:
>>
>> > +1, and we can discuss it further when the design docs are available.
>> >
>> > Best,
>> > Leesf
>> >
>> > On Tue, Nov 12, 2019 at 4:17 PM Balaji Varadarajan <[email protected]> wrote:
>> >
>> > > +1 on the exporter tool idea.
>> > >
>> > > On Mon, Nov 11, 2019 at 10:36 PM vino yang <[email protected]> wrote:
>> > >
>> > > > Hi Shiyan,
>> > > >
>> > > > +1 for this proposal. Also, it looks like an exporter tool.
>> > > >
>> > > > @Vinoth Chandar <[email protected]> Any thoughts about where to place it?
>> > > >
>> > > > Best,
>> > > > Vino
>> > > >
>> > > > On Tue, Nov 12, 2019 at 8:58 AM Vinoth Chandar <[email protected]> wrote:
>> > > >
>> > > > > We can wait for others to chime in as well. :)
>> > > > >
>> > > > > On Mon, Nov 11, 2019 at 4:37 PM Shiyan Xu <[email protected]> wrote:
>> > > > >
>> > > > > > Yes, Vinoth, you're right that it is more of an exporter, which exports a
>> > > > > > snapshot from a Hudi dataset.
>> > > > > >
>> > > > > > It should support MOR too; it will just leverage the existing
>> > > > > > SnapshotCopier logic to find the latest file slices.
>> > > > > >
>> > > > > > So is it good to create an RFC for further discussion?
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Nov 11, 2019 at 4:31 PM Vinoth Chandar <[email protected]> wrote:
>> > > > > >
>> > > > > > > What you suggest sounds more like an `Exporter` tool? I imagine you will
>> > > > > > > support MOR as well? +1 on the idea itself. It could be useful if a plain
>> > > > > > > parquet snapshot was generated as a backup.
>> > > > > > >
>> > > > > > > On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu <[email protected]> wrote:
>> > > > > > >
>> > > > > > > > Hi All,
>> > > > > > > >
>> > > > > > > > The existing SnapshotCopier under Hudi Utilities is a Hudi-to-Hudi copy and
>> > > > > > > > is primarily for backup purposes.
>> > > > > > > >
>> > > > > > > > I would like to start an RFC for a more generic Hudi snapshotter, which
>> > > > > > > >
>> > > > > > > >    - Supports existing SnapshotCopier features
>> > > > > > > >    - Adds an option to export a Hudi dataset to plain parquet files
>> > > > > > > >       - output latest records via Spark dataframe writer
>> > > > > > > >       - remove Hudi metadata fields
>> > > > > > > >       - support custom repartition requirements
>> > > > > > > >
>> > > > > > > > Is it a good idea to start an RFC for this?
>> > > > > > > >
>> > > > > > > > Thank you.
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > > Raymond Xu
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
