Re: [DISCUSS] Introduce a write committed callback hook

Shiyan Xu Sun, 21 Jun 2020 08:15:49 -0700

+1. It is a great complement to the pull model and helpful for fan-out scenarios.


On Sun, Jun 21, 2020 at 8:07 AM Bhavani Sudha <bhavanisud...@gmail.com>
wrote:

> +1 . I think this is a valid use case and would be useful in general.
>
> On Sun, Jun 21, 2020 at 7:11 AM Vinoth Chandar <vin...@apache.org> wrote:
>
> > +1 as well
> >
> > > We expect to introduce a proactive notification (event callback)
> > > mechanism. For example, a hook can be introduced after a successful
> > > commit.
> >
> > This would be very useful. We could write to a variety of event buses
> > and send notifications when new data arrives.
> >
> > On Sat, Jun 20, 2020 at 2:51 AM wangxianghu <wxhj...@126.com> wrote:
> >
> > > +1 for this, I think this is a feature worth doing.
> > > Consider offline computing, where data changes arrive hourly or daily.
> > > Without a notification mechanism to inform downstream tasks, those
> > > tasks have to keep running all day long, even though the time they
> > > actually spend processing data may be very short. This will surely
> > > waste resources.
> > > > On Jun 20, 2020, at 8:13 AM, vino yang <vinoy...@apache.org> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > > Currently, we need to incrementally process an original hoodie
> > > > > table and build a new table from it. We expect that once a new
> > > > > commit completes on the original hoodie table, it can be retrieved
> > > > > ASAP so that it can be used for incremental view queries. With the
> > > > > existing capabilities, one approach is to continuously poll
> > > > > Hoodie's Timeline to check for new commits. This is a very common
> > > > > approach, but it wastes resources unnecessarily.
> > > >
> > > > > We expect to introduce a proactive notification (event callback)
> > > > > mechanism. For example, a hook can be introduced after a successful
> > > > > commit. External processors interested in the commit, such as
> > > > > scheduling systems, can use the hook as their own trigger. When a
> > > > > commit completes, the scheduling system can use the API in the
> > > > > callback to launch the task that fetches the incremental data,
> > > > > thereby completing the processing of the incremental data.
> > > >
> > > > > There is currently a `postCommit` method in Hudi's client module,
> > > > > and the existing implementation is mainly used for compaction and
> > > > > cleanup after commit. However, it is triggered a little too early:
> > > > > it does not run after everything has been processed, and we found
> > > > > that an exception can still cause the commit to be rolled back. We
> > > > > need to find a new location to trigger this hook, so that the
> > > > > commit is guaranteed to be final when the hook fires.
> > > >
> > > > > This is one of our scenario requirements. Combined with incremental
> > > > > queries, it would be a very useful feature that makes incremental
> > > > > processing more timely.
> > > >
> > > > > We hope to hear what the community thinks of this proposal. Any
> > > > > comments and opinions are appreciated.
> > > >
> > > > Best,
> > > > Vino
> > >
> > >
> >
>
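
The hook discussed in this thread can be sketched roughly as below. This is
only an illustration of the callback idea, not Hudi's API: `CommitEvent`,
`CommitNotifier`, `register` and `commit` are hypothetical names, and the
write itself is stubbed out as a plain function. The point it tries to show
is the ordering concern raised in the proposal: listeners are notified only
after the write and all follow-up work have finished without an exception.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CommitEvent:
    """Hypothetical payload passed to listeners once a commit is final."""
    table_path: str
    commit_time: str


class CommitNotifier:
    """Hypothetical listener registry; an assumption, not a Hudi class."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[CommitEvent], None]] = []

    def register(self, listener: Callable[[CommitEvent], None]) -> None:
        self._listeners.append(listener)

    def commit(self, table_path: str, commit_time: str,
               write_fn: Callable[[], None]) -> CommitEvent:
        # Run the write and any post-commit work first. If this raises,
        # the exception propagates and no listener is ever notified.
        write_fn()
        event = CommitEvent(table_path, commit_time)
        for listener in self._listeners:
            try:
                listener(event)
            except Exception as exc:
                # A failing listener should not affect a finished commit.
                print(f"listener failed for commit {commit_time}: {exc}")
        return event
```

A scheduling system would register a listener that launches its incremental
job, instead of polling the timeline for new commits.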
