[ https://issues.apache.org/jira/browse/HUDI-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan reassigned HUDI-7503:
-----------------------------------------

    Assignee: sivabalan narayanan

> Concurrent executions of table service plan should not corrupt dataset
> ----------------------------------------------------------------------
>
>                 Key: HUDI-7503
>                 URL: https://issues.apache.org/jira/browse/HUDI-7503
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: compaction, table-service
>            Reporter: Krishen Bhan
>            Assignee: sivabalan narayanan
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.16.0, 1.0.0
>
>
> Some external workflow schedulers can misbehave or accidentally schedule
> duplicate executions of the same compaction plan. We need a way to guard
> against this inside Hudi (vs. the user taking a lock externally). In such a
> world, two instances of the job concurrently call
> `org.apache.hudi.client.BaseHoodieTableServiceClient#compact` on the same
> compaction instant.
> This can corrupt the dataset, because one writer might execute the instant
> and create an inflight, while the other writer sees the inflight and tries to
> roll it back before re-attempting to execute it (since it will assume said
> inflight was a previously failed compaction attempt).
> This logic should be updated such that only one writer will actually execute
> the compaction plan at a time (and the others will fail/abort).
> One approach is to use a transaction (base table lock) in conjunction with
> heartbeating, to ensure that the writer triggers a heartbeat before executing
> compaction, and any concurrent writers will use the heartbeat to check whether
> the compaction is currently being executed by another writer.
> Specifically, the compact API should execute the following steps:
> # Get the instant to compact, C (as usual)
> # Start a transaction
> # Check whether C has an active heartbeat; if so, finish the transaction and
> throw an exception
> # Start a heartbeat for C (this will implicitly re-start the heartbeat if it
> has been started before by another job)
> # Finish the transaction
> # Run the existing compact API logic on C
> # If execution succeeds, clean up the heartbeat file. If it fails, do nothing
> (as the heartbeat will expire automatically later anyway).
> Note that this approach only holds the table lock temporarily, while
> checking/starting the heartbeat.
> Also, this flow can be applied to the execution of clean plans and other
> table services.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
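
The steps proposed in the issue can be sketched roughly as below. This is only an illustrative, in-memory model and not Hudi code: `GuardedCompactionSketch`, `tryClaim`, `hasActiveHeartbeat` and `expiryMillis` are hypothetical names, a `ReentrantLock` stands in for the base table lock, and a map of timestamps stands in for heartbeat files. A real heartbeat would also be refreshed periodically while the job runs; the sketch records a single timestamp to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of the proposed flow; none of these names are Hudi APIs. */
class GuardedCompactionSketch {
    // Stand-in for the base table lock taken by the transaction.
    private final ReentrantLock tableLock = new ReentrantLock();
    // Stand-in for heartbeat files: instant time -> last heartbeat (millis).
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();
    // Assumed expiry window after which a heartbeat counts as dead.
    private final long expiryMillis;

    GuardedCompactionSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    private boolean hasActiveHeartbeat(String instant, long nowMillis) {
        Long last = heartbeats.get(instant);
        return last != null && nowMillis - last < expiryMillis;
    }

    /** Steps 2-5: check and start the heartbeat while holding the lock. */
    boolean tryClaim(String instant, long nowMillis) {
        tableLock.lock();                       // step 2: start transaction
        try {
            if (hasActiveHeartbeat(instant, nowMillis)) {
                return false;                   // step 3: another writer owns C
            }
            heartbeats.put(instant, nowMillis); // step 4: (re)start heartbeat
            return true;
        } finally {
            tableLock.unlock();                 // step 5: finish transaction
        }
    }

    /** Steps 3-7 for an already chosen instant C (step 1 happens upstream). */
    void compact(String instant, Runnable existingCompactLogic, long nowMillis) {
        if (!tryClaim(instant, nowMillis)) {
            throw new IllegalStateException(
                "Instant " + instant + " is being executed by another writer");
        }
        // Step 6: runs outside the lock. If it throws, the heartbeat is
        // deliberately left in place and simply expires later.
        existingCompactLogic.run();
        // Step 7: success, so clean up the heartbeat.
        heartbeats.remove(instant);
    }

    public static void main(String[] args) {
        GuardedCompactionSketch sketch = new GuardedCompactionSketch(60_000L);
        long now = System.currentTimeMillis();
        // First writer claims the instant; a duplicate arriving while the
        // heartbeat is still fresh is rejected.
        System.out.println("first claim:  " + sketch.tryClaim("001", now));
        System.out.println("second claim: " + sketch.tryClaim("001", now + 1_000L));
    }
}
```

Keeping the plan execution outside the lock mirrors the note in the issue that the table lock is only held while checking and starting the heartbeat, so a long-running compaction does not block other writers from taking the lock.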