This is a great discussion! Thanks!

On Mon, Jun 22, 2020 at 6:33 PM vino yang <[email protected]> wrote:
> Hi everyone,
>
> Thanks for sharing your thoughts.
>
> We have created a Jira issue to track this work.[1]
>
> Best,
> Vino
>
> [1]: https://issues.apache.org/jira/browse/HUDI-1037
>
> Vinoth Chandar <[email protected]> wrote on Tue, Jun 23, 2020 at 6:38 AM:
>
> > Great, looks like a JIRA is in order? :), given we all agree
> > enthusiastically
> >
> > On Sun, Jun 21, 2020 at 8:10 PM Gary Li <[email protected]> wrote:
> >
> > > +1.
> > > It would be great to have a communication mechanism along a chain
> > > of downstream CDC applications, e.g. A->B->C->D. Right now I am
> > > using the commit timestamp to identify whether a new commit has
> > > come in. But if I need to recompute app B, it's difficult for C
> > > and D to be aware that they have to recompute as well, especially
> > > when the triggering frequencies are different.
> > >
> > > On Sun, Jun 21, 2020 at 6:11 PM hddong <[email protected]> wrote:
> > > +1. a great feature.
> > >
> > > Sivabalan <[email protected]> wrote on Mon, Jun 22, 2020 at 7:50 AM:
> > >
> > > > +1. would be a nice addition.
> > > >
> > > > On Sun, Jun 21, 2020 at 12:02 PM [email protected]
> > > > <[email protected]> wrote:
> > > >
> > > > > +1. This would be a really good feature to have when building
> > > > > dependent ETL pipelines.
> > > > >
> > > > > On Friday, June 19, 2020, 05:13:45 PM PDT, vino yang
> > > > > <[email protected]> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Currently, we have a need to incrementally process and build a
> > > > > new table based on an original hoodie table. We expect that
> > > > > after a new commit is completed on the original hoodie table,
> > > > > it can be retrieved ASAP, so that it can be used for
> > > > > incremental view queries. Based on the existing capabilities,
> > > > > one approach we can use is to continuously poll Hoodie's
> > > > > Timeline to check for new commits. This is a very common way
> > > > > of processing, but it wastes resources unnecessarily.
> > > > >
> > > > > We expect to introduce a proactive notification (event
> > > > > callback) mechanism. For example, a hook can be introduced
> > > > > after a successful commit. External processors interested in
> > > > > the commit, such as scheduling systems, can use the hook as
> > > > > their own trigger. When a certain commit is completed, the
> > > > > scheduling system can, through the API in the callback, launch
> > > > > the task that obtains the incremental data, thereby completing
> > > > > the processing of the incremental data.
> > > > >
> > > > > There is currently a `postCommit` method in Hudi's client
> > > > > module, and the existing implementation is mainly used for
> > > > > compression and cleanup after commit. Its trigger point is
> > > > > also a little early: it fires before everything has been
> > > > > processed, and we found that the commit may still be rolled
> > > > > back afterwards due to an exception. We need to find a new
> > > > > location to trigger this hook to ensure that the commit is
> > > > > deterministic.
> > > > >
> > > > > This is one of our scenario requirements, and it will be a
> > > > > very useful feature: combined with the incremental query, it
> > > > > can make incremental processing more timely.
> > > > >
> > > > > We hope to hear what the community thinks of this proposal.
> > > > > Any comments and opinions are appreciated.
> > > > >
> > > > > Best,
> > > > > Vino
> > > > >
> > > >
> > > > --
> > > > Regards,
> > > > -Sivabalan
> > > >
> > >
> >
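
For illustration only, the commit callback idea from the original proposal could be sketched roughly as below. This is not Hudi's API: `CommitCallback`, `CommitNotifier`, `notifyCommitCompleted`, and the table name and commit time in the demo are all hypothetical names made up for this sketch. It only shows the shape of the mechanism, i.e. listeners are registered once and invoked when the writer knows the commit has completed, instead of each downstream job polling the timeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback type for this sketch; not part of Hudi's API.
interface CommitCallback {
    void onCommitCompleted(String tableName, String commitTime);
}

// Hypothetical registry that a writer would call once a commit has completed.
class CommitNotifier {
    private final List<CommitCallback> callbacks = new CopyOnWriteArrayList<>();

    void register(CommitCallback callback) {
        callbacks.add(callback);
    }

    // Assumption: invoked only after the point where the commit can no longer be rolled back.
    void notifyCommitCompleted(String tableName, String commitTime) {
        for (CommitCallback callback : callbacks) {
            try {
                callback.onCommitCompleted(tableName, commitTime);
            } catch (RuntimeException e) {
                // A failing listener should not affect the commit or the other listeners.
                System.err.println("Commit callback failed: " + e.getMessage());
            }
        }
    }
}

class CommitCallbackDemo {
    public static void main(String[] args) {
        CommitNotifier notifier = new CommitNotifier();
        List<String> triggered = new ArrayList<>();

        // A downstream scheduler would start its incremental job here.
        notifier.register((table, commitTime) -> triggered.add(table + "@" + commitTime));

        // Made-up table name and commit time, for demonstration only.
        notifier.notifyCommitCompleted("orders", "20200619171345");
        System.out.println(triggered);
    }
}
```

Catching listener exceptions inside the notifier is a design choice in this sketch: a broken downstream trigger should not be able to fail or roll back the commit that has already succeeded.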
