FWIW we built this out at Uber, at the ingest tool level (i.e.
deltastreamer) and used it to notify the workflow scheduler to trigger
pipelines by data availability, not by time. So if we can do some
Airflow integration, that would be awesome (though that's probably not in
the scope of this work).
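
Not a commitment to any design, but to make the idea concrete, here is a
rough sketch of what a callback-driven Airflow trigger could look like. The
base URL and DAG id are assumptions, auth is omitted, and the /api/v1 dagRuns
endpoint is the Airflow 2.x stable REST API (older versions expose a
different experimental endpoint):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class AirflowTrigger {
      // Hypothetical helper: ask Airflow to run a DAG once a Hudi commit lands,
      // so downstream pipelines are driven by data availability, not by time.
      public static void triggerDag(String airflowBaseUrl, String dagId, String commitTime)
          throws Exception {
        String body = String.format("{\"conf\": {\"commit_time\": \"%s\"}}", commitTime);
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(airflowBaseUrl + "/api/v1/dags/" + dagId + "/dagRuns"))
            .header("Content-Type", "application/json") // auth header omitted for brevity
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Airflow responded with HTTP " + response.statusCode());
      }
    }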

Not sure if Nick is still actively following this list. This is a feature
he has brought up time and again as well.

On Sun, Jun 21, 2020 at 8:15 AM Shiyan Xu <xu.shiyan.raym...@gmail.com>
wrote:

> +1. It is a great complement to the pull model; helpful to fan-out
> scenarios
>
> On Sun, Jun 21, 2020 at 8:07 AM Bhavani Sudha <bhavanisud...@gmail.com>
> wrote:
>
> > +1 . I think this is a valid use case and would be useful in general.
> >
> > On Sun, Jun 21, 2020 at 7:11 AM Vinoth Chandar <vin...@apache.org>
> > wrote:
> >
> > > +1 as well
> > >
> > > > We expect to introduce a proactive notification (event callback)
> > > > mechanism. For example, a hook can be introduced after a successful
> > > > commit.
> > >
> > > This would be very useful. We could write to a variety of event buses
> > > and notify downstream consumers of new data arrival.
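
Purely as an illustration of that idea (Kafka picked arbitrarily as the bus;
the topic name, broker address and String serialization are assumptions):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CommitEventPublisher {
      // Hypothetical publisher: announce "new data available for <table> at <commitTime>"
      // on an event bus so any interested downstream system can react.
      public static void publish(String tableName, String commitTime) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
          producer.send(new ProducerRecord<>("hoodie-commit-events", tableName, commitTime));
        } // close() flushes the pending send
      }
    }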
> > >
> > > On Sat, Jun 20, 2020 at 2:51 AM wangxianghu <wxhj...@126.com> wrote:
> > >
> > > > +1 for this, I think this is a feature worth doing.
> > > > Think about it in the field of offline computing: data changes happen
> > > > hourly or daily. If there is no notification mechanism to inform the
> > > > downstream, then the downstream tasks will keep running all day long,
> > > > even though the time actually spent processing data may be very
> > > > short. This situation will surely waste resources.
> > > > > On Jun 20, 2020, at 8:13 AM, vino yang <vinoy...@apache.org> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Currently, we have a need to incrementally process and build a new
> > > > > table based on an original hoodie table. We expect that after a new
> > > > > commit is completed on the original hoodie table, it can be picked
> > > > > up ASAP so that it can be used for incremental view queries. With
> > > > > the existing capabilities, one approach is to continuously poll
> > > > > Hoodie's Timeline to check for new commits. This is a very common
> > > > > approach, but it wastes resources unnecessarily.
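
A minimal polling sketch along those lines, for reference only. It assumes
the Hudi Java client's HoodieTableMetaClient/timeline API; the builder-style
construction and exact class locations may differ across Hudi versions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hudi.common.table.HoodieTableMetaClient;
    import org.apache.hudi.common.table.timeline.HoodieInstant;
    import org.apache.hudi.common.table.timeline.HoodieTimeline;
    import org.apache.hudi.common.util.Option;

    public class CommitPoller {
      public static void main(String[] args) throws InterruptedException {
        String basePath = "hdfs:///tmp/hudi/source_table"; // hypothetical table path
        String lastSeen = null;
        while (true) {
          // Re-read the timeline each iteration to pick up newly completed commits.
          HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
              .setConf(new Configuration())
              .setBasePath(basePath)
              .build();
          HoodieTimeline completed =
              metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants();
          Option<HoodieInstant> latest = completed.lastInstant();
          if (latest.isPresent() && !latest.get().getTimestamp().equals(lastSeen)) {
            lastSeen = latest.get().getTimestamp();
            System.out.println("New commit detected: " + lastSeen + ", kick off incremental job");
            // launch the downstream incremental pipeline here
          }
          Thread.sleep(60_000); // poll every minute; this idle looping is the waste a callback avoids
        }
      }
    }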
> > > > >
> > > > > We expect to introduce a proactive notification (event callback)
> > > > > mechanism. For example, a hook can be introduced after a successful
> > > > > commit. External processors interested in the commit, such as
> > > > > scheduling systems, can use the hook as their own trigger. When a
> > > > > commit is completed, the scheduling system can, in the callback,
> > > > > launch the task that fetches the incremental data through the API,
> > > > > thereby completing the processing of the incremental data.
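
For illustration, one possible shape of such a hook. The interface and
method names below are made up for this sketch; they are not an existing
Hudi API:

    // Hypothetical callback contract for the proposed post-commit hook.
    public interface CommitNotificationCallback {
      // Invoked only once the commit is fully complete on the timeline.
      void onCommitCompleted(String tableName, String commitTime);
    }

    // Example: notify a scheduling system, which can then start a job that
    // runs an incremental query for data newer than the previous commit.
    public class SchedulerTriggerCallback implements CommitNotificationCallback {
      @Override
      public void onCommitCompleted(String tableName, String commitTime) {
        System.out.printf("Commit %s on %s completed, notifying scheduler%n",
            commitTime, tableName);
        // e.g. publish an event or call the scheduler's API here
      }
    }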
> > > > >
> > > > > There is currently a `postCommit` method in Hudi's client module,
> > > > > and the existing implementation is mainly used for compaction and
> > > > > cleanup after a commit. However, it is triggered a little too early:
> > > > > it does not run after everything has been processed, and we found
> > > > > that the commit may still be rolled back due to an exception
> > > > > afterwards. We need to find a new place to trigger this hook to
> > > > > ensure the commit outcome is deterministic by the time it fires.
> > > > >
> > > > > This is one of the requirements from our scenario. Combined with
> > > > > incremental queries, this will be a very useful feature that makes
> > > > > incremental processing more timely.
> > > > >
> > > > > We hope to hear what the community thinks of this proposal. Any
> > > > > comments and opinions are appreciated.
> > > > >
> > > > > Best,
> > > > > Vino
> > > >
> > > >
> > >
> >
>
