+1. This would be a really good feature to have when building dependent ETL 
pipelines.

    On Friday, June 19, 2020, 05:13:45 PM PDT, vino yang <vinoy...@apache.org> wrote:

Hi all,

Currently, we need to incrementally process an original hoodie table and
build a new table from it. We would like each new commit on the original
hoodie table to be detected as soon as it completes, so that it can be
consumed through incremental view queries. With the existing capabilities,
one approach is to continuously poll Hoodie's Timeline for new commits. This
is a very common approach, but it wastes resources on polls that find
nothing new.
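To make the cost concrete, here is a minimal Java sketch of the polling approach. It assumes a hypothetical `latestCompletedInstant` supplier that stands in for reading the timeline; it is not a Hudi API. Every call to `pollOnce` costs one timeline read, whether or not a new commit has arrived.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: the supplier stands in for "read the latest completed
// instant from the table's timeline"; it is not a Hudi API.
final class TimelinePoller {
    private final Supplier<Optional<String>> latestCompletedInstant;
    private String lastSeen; // null until the first commit is observed
    private int polls;

    TimelinePoller(Supplier<Optional<String>> latestCompletedInstant) {
        this.latestCompletedInstant = latestCompletedInstant;
    }

    /** One polling round: returns the newly completed instant, if any. */
    Optional<String> pollOnce() {
        polls++; // every round costs a timeline read, useful or not
        Optional<String> latest = latestCompletedInstant.get();
        if (latest.isPresent() && !latest.get().equals(lastSeen)) {
            lastSeen = latest.get();
            return latest;
        }
        return Optional.empty(); // wasted round: nothing new since the last poll
    }

    int polls() {
        return polls;
    }
}
```

In practice `pollOnce` would run on a timer, so the polling interval trades detection latency against the number of wasted reads.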

We propose introducing a proactive notification (event callback) mechanism.
For example, a hook could be invoked after a successful commit. External
processors interested in the commit, such as scheduling systems, could use
the hook as their trigger: when a commit completes, the scheduling system
launches a task from the callback that pulls the incremental data through
the API, which completes the processing of that incremental data.
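A minimal Java sketch of such a hook follows. `CommitCallback` and `CommitCallbackRegistry` are hypothetical names for illustration, not existing Hudi classes. The sketch shows only the shape of the mechanism: consumers register a callback, and the writer fires it once per completed commit with the table path and instant time. A scheduler could then use those values to decide which incremental range to pull.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical hook a downstream system (e.g. a scheduler) would implement.
interface CommitCallback {
    void onCommitCompleted(String tablePath, String instantTime);
}

// Hypothetical registry the writer would invoke once per completed commit.
final class CommitCallbackRegistry {
    // Copy-on-write so registration can happen while callbacks are firing.
    private final List<CommitCallback> callbacks = new CopyOnWriteArrayList<>();

    void register(CommitCallback callback) {
        callbacks.add(callback);
    }

    /** Notifies every callback; returns how many completed without throwing. */
    int fire(String tablePath, String instantTime) {
        int delivered = 0;
        for (CommitCallback cb : callbacks) {
            try {
                cb.onCommitCompleted(tablePath, instantTime);
                delivered++;
            } catch (RuntimeException e) {
                // A failing consumer must not affect the commit or other consumers;
                // a real implementation would log the failure here.
            }
        }
        return delivered;
    }
}
```

Isolating each callback keeps a misbehaving consumer from blocking the others.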

Hudi's client module currently has a `postCommit` method, but the existing
implementation is mainly used for compaction and cleanup after a commit. Its
trigger point is also a little too early: it runs before everything has been
processed, and we found that an exception at that stage can still cause the
commit to be rolled back. We need to find a new location to trigger this hook
so that the commit is already final (deterministic) when the hook fires.
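The ordering requirement can be sketched as follows. The sketch assumes a hypothetical `DeterministicCommitter` (not a Hudi class) in which the post-commit step is the last thing that can still fail. Listeners are notified only after both the commit and the post-commit work have completed. If either step throws, the instant is recorded as rolled back and no notification is sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical write path: notify listeners only once the commit can no longer
// be rolled back (assumed here to be after post-commit work has finished).
final class DeterministicCommitter {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private final List<String> rolledBack = new ArrayList<>();

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    /**
     * Runs the commit, then the post-commit step (e.g. compaction or cleaning).
     * If either throws, the instant is marked rolled back and nobody is notified.
     */
    boolean commit(String instantTime, Runnable writeAndCommit, Runnable postCommit) {
        try {
            writeAndCommit.run();
            postCommit.run();
        } catch (RuntimeException e) {
            rolledBack.add(instantTime);
            return false;
        }
        // Only reached when every step that could trigger a rollback has succeeded.
        for (Consumer<String> listener : listeners) {
            listener.accept(instantTime);
        }
        return true;
    }

    List<String> rolledBack() {
        return rolledBack;
    }
}
```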

This is one of our use-case requirements. Combined with incremental queries,
it would be a very useful feature that makes incremental processing more
timely.

We hope to hear what the community thinks of this proposal. Any comments
and opinions are appreciated.

Best,
Vino