+1. It is a great complement to the pull model, and helpful for fan-out scenarios.

On Sun, Jun 21, 2020 at 8:07 AM Bhavani Sudha <bhavanisud...@gmail.com> wrote:

> +1. I think this is a valid use case and would be useful in general.
>
> On Sun, Jun 21, 2020 at 7:11 AM Vinoth Chandar <vin...@apache.org> wrote:
>
> > +1 as well
> >
> > > We expect to introduce a proactive notification (event callback)
> > > mechanism. For example, a hook can be introduced after a successful
> > > commit.
> >
> > This would be very useful. We could write to a variety of event buses
> > and notify new data arrival.
> >
> > On Sat, Jun 20, 2020 at 2:51 AM wangxianghu <wxhj...@126.com> wrote:
> >
> > > +1 for this, I think this is a feature worth doing.
> > > Think about it in the field of offline computing: data changes happen
> > > hourly or daily. If there is no notification mechanism to inform the
> > > downstream, the downstream tasks will keep running all day long,
> > > even though the time spent actually processing data may be very
> > > short. This will surely waste resources.
> > >
> > > > On Jun 20, 2020, at 8:13 AM, vino yang <vinoy...@apache.org> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Currently, we have a need to incrementally process and build a new
> > > > table based on an original hoodie table. We expect that after a new
> > > > commit is completed on the original hoodie table, it can be
> > > > retrieved ASAP, so that it can be used for incremental view
> > > > queries. Based on the existing capabilities, one approach we can
> > > > use is to continuously poll Hoodie's Timeline to check for new
> > > > commits. This is a very common approach, but it wastes resources
> > > > unnecessarily.
> > > >
> > > > We expect to introduce a proactive notification (event callback)
> > > > mechanism. For example, a hook can be introduced after a successful
> > > > commit. External processors interested in the commit, such as
> > > > scheduling systems, can use the hook as their own trigger. When a
> > > > certain commit is completed, the scheduling system can, in the
> > > > callback, launch the task that fetches incremental data through
> > > > the API, thereby completing the processing of the incremental data.
> > > >
> > > > There is currently a `postCommit` method in Hudi's client module,
> > > > and the existing implementation is mainly used for compaction and
> > > > cleanup after commit. Its trigger point is also a little early: it
> > > > does not run after everything has been processed, and we found
> > > > that the commit may still be rolled back because of an exception.
> > > > We need to find a new location to trigger this hook to ensure that
> > > > the commit is final.
> > > >
> > > > This is one of the requirements from our scenario, and it will be
> > > > a very useful feature. Combined with the incremental query, it can
> > > > make incremental processing more timely.
> > > >
> > > > We hope to hear what the community thinks of this proposal. Any
> > > > comments and opinions are appreciated.
> > > >
> > > > Best,
> > > > Vino
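
For illustration only, the "hook after a successful commit" idea in the
original proposal could be sketched roughly as follows. This is a
language-neutral sketch written in Python, not Hudi code: `CommitEvent`,
`CommitNotifier`, `write_and_commit` and the `finalize` step are all
hypothetical names. It assumes listeners should be notified only once the
commit can no longer fail, and that a failing listener must not affect
the commit itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CommitEvent:
    """Hypothetical payload handed to listeners after a commit is final."""
    table_path: str
    commit_time: str


class CommitNotifier:
    """Hypothetical registry that fans one commit event out to listeners."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[CommitEvent], None]] = []

    def register(self, listener: Callable[[CommitEvent], None]) -> None:
        self._listeners.append(listener)

    def notify(self, event: CommitEvent) -> List[Exception]:
        """Call every listener; collect failures instead of raising."""
        failures: List[Exception] = []
        for listener in self._listeners:
            try:
                listener(event)
            except Exception as exc:  # a broken listener must not undo the commit
                failures.append(exc)
        return failures


def write_and_commit(
    notifier: CommitNotifier,
    table_path: str,
    commit_time: str,
    finalize: Callable[[], None],
) -> List[Exception]:
    """Run the (hypothetical) finalize step, then notify only on success."""
    finalize()  # if this raises, the commit is not final and nobody is notified
    return notifier.notify(CommitEvent(table_path, commit_time))
```

A scheduling system would register a listener that launches its
incremental-pull task, instead of polling the timeline. Placing `notify`
after `finalize` reflects the concern raised above that a hook fired too
early may announce a commit that is later rolled back.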