+1. This would be a nice addition.

On Sun, Jun 21, 2020 at 12:02 PM [email protected] <[email protected]> wrote:
> +1. This would be a really good feature to have when building dependent
> ETL pipelines.
>
> On Friday, June 19, 2020, 05:13:45 PM PDT, vino yang <[email protected]> wrote:
>
> Hi all,
>
> Currently, we have a need to incrementally process and build a new table
> based on an original hoodie table. We expect that after a new commit is
> completed on the original hoodie table, it can be retrieved ASAP, so that
> it can be used for incremental view queries. Based on the existing
> capabilities, one approach we can use is to continuously poll Hoodie's
> Timeline to check for new commits. This is a very common approach, but it
> wastes resources unnecessarily.
>
> We would like to introduce a proactive notification (event callback)
> mechanism. For example, a hook can be introduced after a successful
> commit. External processors interested in the commit, such as scheduling
> systems, can use the hook as their own trigger. When a certain commit is
> completed, the scheduling system can, through the API in the callback,
> launch the task that obtains the incremental data, thereby completing the
> processing of incremental data.
>
> There is currently a `postCommit` method in Hudi's client module, and the
> existing implementation is mainly used for compaction and cleanup after
> commit. Its trigger point is also a little early: it does not run after
> everything has been processed, and we found that an exception may still
> cause the commit to be rolled back. We need to find a new location to
> trigger this hook to ensure that the commit is deterministic.
>
> This is one of our use cases, and it would be a very useful feature.
> Combined with the incremental query, it can make incremental processing
> more timely.
>
> We hope to hear what the community thinks of this proposal. Any comments
> and opinions are appreciated.
>
> Best,
> Vino

--
Regards,
-Sivabalan
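
For illustration only, here is a rough sketch of the callback idea in the quoted proposal: listeners are notified only after the write has completed without an exception, so a rolled-back commit never fires the hook. This is not Hudi code. `CommitEvent`, `CommitNotifier`, `register`, and `commit` are hypothetical names made up for this sketch, and the write itself is stood in for by a plain function.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

log = logging.getLogger(__name__)


@dataclass(frozen=True)
class CommitEvent:
    """Hypothetical payload handed to listeners after a successful commit."""
    table_path: str
    commit_time: str


CommitCallback = Callable[[CommitEvent], None]


class CommitNotifier:
    """Hypothetical registry that fires callbacks once a commit has succeeded."""

    def __init__(self) -> None:
        self._callbacks: List[CommitCallback] = []

    def register(self, callback: CommitCallback) -> None:
        self._callbacks.append(callback)

    def commit(self, table_path: str, commit_time: str,
               write_fn: Callable[[], None]) -> CommitEvent:
        # If the write raises, the exception propagates (the caller would
        # roll back) and no listener is notified.
        write_fn()
        event = CommitEvent(table_path, commit_time)
        for callback in self._callbacks:
            try:
                callback(event)
            except Exception:
                # A failing listener must not undo an already finished commit.
                log.exception("commit callback failed for %s", commit_time)
        return event
```

A scheduling system would register a callback that launches its incremental job for the reported `commit_time`, instead of polling the timeline for new commits.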
