+1. This would be a nice addition.

On Sun, Jun 21, 2020 at 12:02 PM vbal...@apache.org <vbal...@apache.org>
wrote:

>
> +1. This would be a really good feature to have when building dependent
> ETL pipelines.
>
>     On Friday, June 19, 2020, 05:13:45 PM PDT, vino yang <
> vinoy...@apache.org> wrote:
>
>  Hi all,
>
> Currently, we need to incrementally process and build a new table based on
> an original hoodie table. We expect that once a new commit completes on the
> original hoodie table, it can be retrieved ASAP and used for incremental
> view queries. With the existing capabilities, one approach is to
> continuously poll Hoodie's Timeline to check for new commits. This is a
> very common approach, but it wastes resources unnecessarily.
>
> We would like to introduce a proactive notification (event callback)
> mechanism. For example, a hook could be introduced after a successful
> commit. External processors interested in the commit, such as scheduling
> systems, could use the hook as their own trigger. When a given commit
> completes, the scheduling system could launch the task that fetches the
> incremental data through the API in the callback, thereby completing the
> processing of the incremental data.
>
> There is currently a `postCommit` method in Hudi's client module, and the
> existing implementation is mainly used for compaction and cleaning after a
> commit. However, it is triggered a little too early: it does not run after
> everything has been processed, and we found that an exception can still
> cause the commit to be rolled back afterwards. We need to find a new
> location to trigger this hook so that the commit is guaranteed to be final
> when the hook fires.
>
> This is one of our scenario requirements. Combined with incremental
> queries, it would be a very useful feature that makes incremental
> processing more timely.
>
> We hope to hear what the community thinks of this proposal. Any comments
> and opinions are appreciated.
>
> Best,
> Vino
>
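
To make the idea above concrete, here is a rough sketch of what such a
post-commit hook could look like. Everything in it is hypothetical:
`CommitCallback`, `CommitNotifier`, and `commitCompleted` are names made up
for illustration and are not part of Hudi's API, and the commit time is just
an example value. It only assumes that the writer calls the notifier once the
commit can no longer be rolled back.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback interface; not part of Hudi's API.
interface CommitCallback {
    void onCommitCompleted(String commitTime);
}

// Hypothetical registry that a writer could call once a commit is final.
class CommitNotifier {
    private final List<CommitCallback> callbacks = new CopyOnWriteArrayList<>();

    void register(CommitCallback callback) {
        callbacks.add(callback);
    }

    // Intended to be invoked only after the commit can no longer be rolled back.
    void commitCompleted(String commitTime) {
        for (CommitCallback callback : callbacks) {
            try {
                callback.onCommitCompleted(commitTime);
            } catch (RuntimeException e) {
                // A failing listener must not affect the commit or other listeners.
                System.err.println("Callback failed for commit " + commitTime
                        + ": " + e.getMessage());
            }
        }
    }
}

class CommitCallbackSketch {
    public static void main(String[] args) {
        CommitNotifier notifier = new CommitNotifier();
        // A scheduler would start its incremental job here instead of printing.
        notifier.register(commitTime ->
                System.out.println("Trigger incremental job for commit " + commitTime));
        notifier.commitCompleted("20200619171345"); // example commit time
    }
}
```

A downstream scheduler would register a callback once and be notified per
commit, instead of polling the Timeline. Exceptions thrown by a listener are
caught so that a misbehaving consumer cannot fail the write path.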



-- 
Regards,
-Sivabalan
