+1. It would be great to have a communication mechanism between a chain of downstream CDC applications, e.g. A->B->C->D. Right now I am using the commit timestamp to identify whether a new commit has come in. But if I need to recompute app B, it’s difficult for C and D to become aware that they have to recompute as well, especially when the triggering frequencies are different.
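To make the discussion concrete, here is a rough sketch of what such a hook could look like. Everything in it is hypothetical: `CommitEvent`, `CommitCallback`, and `CommitCallbackRegistry` are illustrative names, not existing Hudi classes, and the `recompute` flag is only an assumption about how an upstream app like B could tell C and D that earlier output was rewritten.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical event describing a finished commit (not an existing Hudi class). */
final class CommitEvent {
    final String tableName;
    final String commitTime;
    final boolean recompute; // assumption: true when the upstream app rewrote earlier output

    CommitEvent(String tableName, String commitTime, boolean recompute) {
        this.tableName = tableName;
        this.commitTime = commitTime;
        this.recompute = recompute;
    }
}

/** Hypothetical callback that downstream apps or schedulers would implement. */
interface CommitCallback {
    void onCommit(CommitEvent event);
}

/** Hypothetical registry; the writer would call notifyCommitted only once the commit is final. */
final class CommitCallbackRegistry {
    private final List<CommitCallback> callbacks = new CopyOnWriteArrayList<>();

    void register(CommitCallback callback) {
        callbacks.add(callback);
    }

    /** Invokes every callback and returns how many of them threw. */
    int notifyCommitted(CommitEvent event) {
        int failures = 0;
        for (CommitCallback callback : callbacks) {
            try {
                callback.onCommit(event);
            } catch (RuntimeException e) {
                failures++; // a failing listener must not affect the commit or other listeners
            }
        }
        return failures;
    }
}

final class CommitCallbackDemo {
    public static void main(String[] args) {
        CommitCallbackRegistry registry = new CommitCallbackRegistry();
        List<String> triggered = new ArrayList<>();

        // App C reacts to commits on B's table instead of polling the timeline.
        registry.register(e ->
            triggered.add((e.recompute ? "full recompute from " : "incremental pull from ")
                + e.tableName + " at " + e.commitTime));

        registry.notifyCommitted(new CommitEvent("B", "20200622075000", true));
        triggered.forEach(System.out::println);
    }
}
```

With something like this, C would not need to compare commit timestamps on its own schedule; it would be told when B commits and whether the commit is a recompute.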

On Sun, Jun 21, 2020 at 6:11 PM hddong <hongdd2...@gmail.com> wrote:

+1. A great feature.

Sivabalan <n.siv...@gmail.com> wrote on Mon, Jun 22, 2020 at 7:50 AM:

> +1. Would be a nice addition.
>
> On Sun, Jun 21, 2020 at 12:02 PM vbal...@apache.org <vbal...@apache.org>
> wrote:
>
> > +1. This would be a really good feature to have when building dependent
> > ETL pipelines.
> >
> > On Friday, June 19, 2020, 05:13:45 PM PDT, vino yang
> > <vinoy...@apache.org> wrote:
> >
> > Hi all,
> >
> > Currently, we need to incrementally process an original hoodie table and
> > build a new table from it. We expect that once a new commit is completed
> > on the original hoodie table, it can be retrieved ASAP, so that it can
> > be used for incremental view queries. With the existing capabilities,
> > one approach is to continuously poll Hoodie's Timeline to check for new
> > commits. This is a very common approach, but it wastes resources
> > unnecessarily.
> >
> > We would like to introduce a proactive notification (event callback)
> > mechanism. For example, a hook can be introduced after a successful
> > commit. External processors interested in the commit, such as
> > scheduling systems, can use the hook as their own trigger. When a
> > certain commit is completed, the scheduling system can, from the
> > callback, launch the task that obtains the incremental data through the
> > API, thereby completing the processing of the incremental data.
> >
> > There is currently a `postCommit` method in Hudi's client module, and
> > the existing implementation is mainly used for compaction and cleanup
> > after commit. It is also triggered a little too early: it does not run
> > after everything has been processed, and we found that the commit may
> > still be rolled back because of an exception. We need to find a new
> > location to trigger this hook so that the commit is final when the hook
> > fires.
> >
> > This is one of our use cases, and it would be a very useful feature:
> > combined with the incremental query, it can make incremental processing
> > more timely.
> >
> > We hope to hear what the community thinks of this proposal. Any
> > comments and opinions are appreciated.
> >
> > Best,
> > Vino
>
> --
> Regards,
> -Sivabalan