+1. This would be a really good feature to have when building dependent ETL pipelines.
On Friday, June 19, 2020, 05:13:45 PM PDT, vino yang <vinoy...@apache.org> wrote:

Hi all,

Currently, we need to incrementally process an original hoodie table and build a new table from it. We expect that once a new commit completes on the original hoodie table, it can be picked up as soon as possible and used for incremental view queries.

With the existing capabilities, one approach is to continuously poll Hoodie's Timeline to check for new commits. This is a very common way of doing it, but it wastes resources, since most polls find nothing new.

We would like to introduce a proactive notification (event callback) mechanism. For example, a hook could be introduced after a successful commit. External processors interested in the commit, such as scheduling systems, could use the hook as their own trigger. When a commit completes, the scheduling system can launch, from within the callback, the task that fetches the incremental data through the API, and so complete the incremental processing.

There is currently a `postCommit` method in Hudi's client module. The existing implementation is mainly used for compaction and cleanup after a commit, and it is triggered a little too early: it runs before everything has been processed, and we found that an exception at that point may still cause the commit to be rolled back. We need to find a new location to trigger this hook, so that the commit is guaranteed to be final when the hook fires.

This is one of our use-case requirements. Combined with incremental queries, it would be a very useful feature, because it makes incremental processing more timely.

We hope to hear what the community thinks of this proposal. Any comments and opinions are appreciated.

Best,
Vino
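
For illustration only, the hook proposed in the quoted message might look roughly like the sketch below. It is a language-neutral toy written in Python, not Hudi code: `CommitEvent`, `WriteClientSketch`, `register_callback`, the table path, and the commit times are all hypothetical names and values made up for this example. The one point it tries to show is the ordering concern raised above: listeners are notified only after the post-commit work has succeeded, so a commit that ends up rolled back never triggers a downstream job.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class CommitEvent:
    """Hypothetical payload handed to listeners once a commit is final."""
    table_path: str
    commit_time: str


CommitCallback = Callable[[CommitEvent], None]


@dataclass
class WriteClientSketch:
    """Toy stand-in for a write client; NOT Hudi's actual client class."""
    table_path: str
    callbacks: List[CommitCallback] = field(default_factory=list)
    timeline: List[str] = field(default_factory=list)

    def register_callback(self, callback: CommitCallback) -> None:
        # External processors (e.g. a scheduler) subscribe here instead of polling.
        self.callbacks.append(callback)

    def commit(self, commit_time: str, post_commit_fails: bool = False) -> bool:
        try:
            # Stand-in for the compaction/cleanup work done after a commit.
            self._post_commit(post_commit_fails)
        except RuntimeError:
            # Post-commit work failed: treat the commit as rolled back
            # and do not notify anyone.
            return False
        # Only now is the commit considered final, so record it and notify.
        self.timeline.append(commit_time)
        event = CommitEvent(self.table_path, commit_time)
        for callback in self.callbacks:
            callback(event)
        return True

    def _post_commit(self, fail: bool) -> None:
        if fail:
            raise RuntimeError("simulated post-commit failure")


# Usage: a downstream job is triggered only by the commit that survives.
triggered: List[str] = []
client = WriteClientSketch("/tmp/hoodie/source_table")  # hypothetical path
client.register_callback(lambda event: triggered.append(event.commit_time))
client.commit("20200619171345")                          # succeeds, callback fires
client.commit("20200619180000", post_commit_fails=True)  # rolled back, no callback
```

In a real deployment the callback body would more likely hand the commit time to a scheduler or message queue than do the incremental read itself, so that a slow or failing consumer cannot hold up the writer; that choice is outside what the proposal specifies.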