This is a great discussion! Thanks!

On Mon, Jun 22, 2020 at 6:33 PM vino yang <yanghua1...@gmail.com> wrote:

> Hi everyone,
>
> Thanks for sharing your thoughts.
>
> We have created a Jira issue to track this work.[1]
>
> Best,
> Vino
>
> [1]: https://issues.apache.org/jira/browse/HUDI-1037
>
> Vinoth Chandar <vin...@apache.org> wrote on Tue, Jun 23, 2020 at 6:38 AM:
>
> > Great. Given that we all agree enthusiastically, it looks like a JIRA is
> > in order? :)
> >
> > On Sun, Jun 21, 2020 at 8:10 PM Gary Li <yanjia.gary...@gmail.com> wrote:
> >
> > > +1.
> > > It would be great to have a communication mechanism between the
> > > applications in a downstream CDC chain, e.g. A->B->C->D. Right now I am
> > > using the commit timestamp to identify whether a new commit has come in.
> > > But if I need to recompute app B, it is difficult for C and D to be
> > > aware that they have to recompute as well, especially when the
> > > triggering frequencies are different.
> > >
> > > On Sun, Jun 21, 2020 at 6:11 PM hddong <hongdd2...@gmail.com> wrote:
> > > +1. A great feature.
> > >
> > > Sivabalan <n.siv...@gmail.com> wrote on Mon, Jun 22, 2020 at 7:50 AM:
> > >
> > > > > +1. Would be a nice addition.
> > > >
> > > > > On Sun, Jun 21, 2020 at 12:02 PM vbal...@apache.org
> > > > > <vbal...@apache.org> wrote:
> > > >
> > > > >
> > > > > > +1. This would be a really good feature to have when building
> > > > > > dependent ETL pipelines.
> > > > >
> > > > > > On Friday, June 19, 2020, 05:13:45 PM PDT, vino yang
> > > > > > <vinoy...@apache.org> wrote:
> > > > > >
> > > > > > Hi all,
> > > > >
> > > > > > Currently, we need to incrementally process an original hoodie
> > > > > > table and build a new table from it. We expect that once a new
> > > > > > commit is completed on the original hoodie table, it can be
> > > > > > retrieved ASAP so that it can be used for incremental view queries.
> > > > > > With the existing capabilities, one approach is to continuously
> > > > > > poll Hoodie's Timeline to check for new commits. This is a very
> > > > > > common approach, but it wastes resources unnecessarily.
> > > > >
> > > > > > We would like to introduce a proactive notification (event
> > > > > > callback) mechanism. For example, a hook can be introduced after a
> > > > > > successful commit. External processors interested in the commit,
> > > > > > such as scheduling systems, can use the hook as their own trigger.
> > > > > > When a commit is completed, the scheduling system can, through the
> > > > > > API in the callback, launch the task that fetches the incremental
> > > > > > data, thereby completing the processing of the incremental data.
> > > > >
> > > > > > There is currently a `postCommit` method in Hudi's client module,
> > > > > > and the existing implementation is mainly used for compaction and
> > > > > > cleanup after a commit. However, it is triggered a little too
> > > > > > early: it does not run after everything has been processed, and we
> > > > > > found that an exception at that point may still cause the commit to
> > > > > > be rolled back. We need to find a new location to trigger this
> > > > > > hook, so that it fires only once the commit is final.
> > > > >
> > > > > > This is one of the requirements from our scenario. Combined with
> > > > > > the incremental query, it will be a very useful feature that makes
> > > > > > incremental processing more timely.
> > > > >
> > > > > > We hope to hear what the community thinks of this proposal. Any
> > > > > > comments and opinions are appreciated.
> > > > >
> > > > > Best,
> > > > > Vino
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > -Sivabalan
> > > >
> > >
> >
>
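
For anyone skimming the thread, here is a rough sketch of the kind of post-commit callback described in the proposal above. It is only an illustration under my own assumptions: `CommitEvent`, `CommitCallback`, `CommitNotifier`, and `commitAndNotify` are hypothetical names, not existing Hudi classes or methods, and the table path and commit times are made up. The point it tries to show is that listeners are notified only after the commit work has finished without an exception, so a commit that would be rolled back never triggers downstream jobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CommitCallbackSketch {

    /** Hypothetical event describing a finished commit; not a Hudi class. */
    public static final class CommitEvent {
        private final String tablePath;
        private final String commitTime;

        public CommitEvent(String tablePath, String commitTime) {
            this.tablePath = tablePath;
            this.commitTime = commitTime;
        }

        public String getTablePath() { return tablePath; }

        public String getCommitTime() { return commitTime; }
    }

    /** Hypothetical hook that an external processor (e.g. a scheduler) implements. */
    public interface CommitCallback {
        void onCommitCompleted(CommitEvent event);
    }

    /** Hypothetical writer-side helper that owns the registered callbacks. */
    public static final class CommitNotifier {
        private final List<CommitCallback> callbacks = new CopyOnWriteArrayList<>();

        public void register(CommitCallback callback) {
            callbacks.add(callback);
        }

        /**
         * Runs the commit work and notifies listeners only if it finished
         * without throwing. Returns true when the commit succeeded.
         */
        public boolean commitAndNotify(String tablePath, String commitTime, Runnable commitWork) {
            try {
                commitWork.run();
            } catch (RuntimeException e) {
                // The commit failed and would be rolled back: do not notify anyone.
                return false;
            }
            CommitEvent event = new CommitEvent(tablePath, commitTime);
            for (CommitCallback callback : callbacks) {
                try {
                    callback.onCommitCompleted(event);
                } catch (RuntimeException e) {
                    // A failing listener must not affect the commit or other listeners.
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        CommitNotifier notifier = new CommitNotifier();
        List<String> triggered = new ArrayList<>();

        // A downstream job records the commit times it was triggered for.
        notifier.register(event -> triggered.add(event.getCommitTime()));

        // Successful commit: the listener is called.
        notifier.commitAndNotify("/tmp/hoodie/source_table", "20200619171345", () -> { });

        // Failed commit: the listener is not called.
        notifier.commitAndNotify("/tmp/hoodie/source_table", "20200619171500",
                () -> { throw new IllegalStateException("simulated failure"); });

        System.out.println(triggered);
    }
}
```

Compared with polling the Timeline, the downstream job here does nothing until it is called. In a real deployment the callback would more likely post a message to a queue or an HTTP endpoint than run in-process, but that choice is outside this sketch.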
