Pranoti Shanbhag created HUDI-1947:
--------------------------------------

             Summary: Hudi Commit Callback and commit in a single transaction
                 Key: HUDI-1947
                 URL: https://issues.apache.org/jira/browse/HUDI-1947
             Project: Apache Hudi
          Issue Type: New Feature
            Reporter: Pranoti Shanbhag


Hello,

I am using Hudi commit callbacks to call an internal service. As I understand 
it, the callback is invoked after the commit on the dataset has completed, and 
if the callback service fails, the commit is not rolled back.

The service we call saves the commit time in a database, which multiple 
pipelines read to get the incremental delta. For example, when there are 4 
commits in the Hudi dataset, we register 4 commit timestamps in the database. 
The pipelines that need the incremental delta run at different frequencies and 
use this database to fetch the data committed since their respective last runs.

For this to work well, we need the Hudi commit and the callback to be atomic, 
i.e. part of a single transaction. Otherwise, on a callback failure, there may 
be data in the Hudi dataset that is not registered in the DB.
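
To make the gap concrete, here is a minimal sketch in Python. It is not Hudi 
code: the function names, the in-memory set standing in for the database, and 
the sample commit times are all hypothetical. It also assumes commit times are 
fixed-width timestamp strings, so that lexical order matches chronological 
order. It shows how a failed callback leaves a commit on the timeline that the 
registry never sees, and how a periodic reconciliation pass might detect and 
backfill it. This is only a possible workaround, not the atomic behaviour 
requested here.

```python
from typing import Iterable, List, Set


def unregistered_commits(timeline: Iterable[str], registered: Set[str]) -> List[str]:
    """Return commit times on the timeline that are missing from the registry, oldest first."""
    return sorted(t for t in set(timeline) if t not in registered)


def reconcile(timeline: Iterable[str], registered: Set[str]) -> List[str]:
    """Add every missing commit time to the registry and return what was added.

    Re-running it is harmless: a second pass finds nothing left to add.
    """
    missing = unregistered_commits(timeline, registered)
    registered.update(missing)
    return missing


# Hypothetical data: four completed commits, but the callback for the third one failed,
# so the registry (a set standing in for the database table) only knows about three.
timeline = ["20210528101500", "20210528111500", "20210528121500", "20210528131500"]
registry = {"20210528101500", "20210528111500", "20210528131500"}

repaired = reconcile(timeline, registry)
```

A pass like this would have to read the completed commits from the dataset's 
timeline by some other means, and downstream pipelines could still observe the 
gap until it runs, which is why an atomic commit plus callback would be 
preferable.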

Could you please let me know whether this can be supported, and whether there 
is a way to achieve it with the current implementation? We do have retries set 
up and do not expect failures, but we want to keep the Hudi commits in sync with 
what we register in the DB.

Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
