[ https://issues.apache.org/jira/browse/HUDI-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenbing Shen updated HUDI-7447:
-------------------------------
    Affects Version/s: 0.14.1
                       0.13.1

> Fix not bootstrap when subTask restart when OPCoordinator handle
> CheckPointComplete not finished
> ------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-7447
>                 URL: https://issues.apache.org/jira/browse/HUDI-7447
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: bootstrap
>    Affects Versions: 0.13.1, 0.14.1
>            Reporter: Wenbing Shen
>            Priority: Major
>              Labels: pull-request-available
>
> 1. In Insert mode, a SubTask restarts while the OperatorCoordinator has been
> inside notifyCheckpointComplete for CheckpointId-100 for a long time. The
> delay may come from a tableService scanning HDFS, or from slow HDFS
> operations during Rollback and initInstant.
> 2. At this point ckp-meta/instantId.INFLIGHT has not been completed, but the
> corresponding commit file has already been submitted. When the subTask
> restarts in this state, it sends a bootstrap event.
> 3. Once the OperatorCoordinator finishes processing
> notifyCheckpointComplete, it creates a new Instant, and the subTask creates
> the corresponding parquet files, etc. based on that Instant.
> 4. The OperatorCoordinator then processes the bootstrap event, creates
> another new Instant, and rolls back the Instant created in step 3. From this
> point on, the OperatorCoordinator and the Operator are inconsistent.
> This is related to Hudi's three-stage submission: taking the data snapshot,
> submitting the commit file, and submitting the ckp_meta file.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
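
The event ordering in steps 1 to 4 can be sketched as a small, single-threaded toy model. This is not Hudi code: the `Coordinator` class, the `"BOOTSTRAP"` event string, and the `instant-N` naming are all hypothetical stand-ins, and the model assumes only what the report states, namely that the bootstrap event is queued while `notifyCheckpointComplete` is still running and is handled afterwards.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of the ordering described in HUDI-7447; all names are hypothetical. */
public class BootstrapRaceSketch {

    /** Stand-in for the coordinator: queued events are handled strictly in arrival order. */
    static class Coordinator {
        final Deque<String> pendingEvents = new ArrayDeque<>();
        final List<String> rolledBack = new ArrayList<>();
        String currentInstant = null;
        private int nextInstantId = 1;

        String startNewInstant() {
            currentInstant = "instant-" + nextInstantId++;
            return currentInstant;
        }

        /** Step 3: finishing the checkpoint callback opens a new instant. */
        String finishCheckpointComplete() {
            return startNewInstant();
        }

        /** Step 4: a late bootstrap event rolls back the open instant and opens another. */
        void drainEvents() {
            while (!pendingEvents.isEmpty()) {
                String event = pendingEvents.poll();
                if (event.equals("BOOTSTRAP")) {
                    if (currentInstant != null) {
                        rolledBack.add(currentInstant);
                    }
                    startNewInstant();
                }
            }
        }
    }

    /** Returns {instant the subtask writes to, instant the coordinator holds, rolled-back instants}. */
    public static String[] simulate() {
        Coordinator coordinator = new Coordinator();
        // Steps 1-2: the coordinator is still busy in the checkpoint callback
        // when the restarted subtask sends its bootstrap event, so it is queued.
        coordinator.pendingEvents.add("BOOTSTRAP");
        // Step 3: the callback finishes; the subtask starts writing to the new instant.
        String subtaskInstant = coordinator.finishCheckpointComplete();
        // Step 4: the queued bootstrap event is only handled now.
        coordinator.drainEvents();
        return new String[] {
            subtaskInstant,
            coordinator.currentInstant,
            String.join(",", coordinator.rolledBack)
        };
    }

    public static void main(String[] args) {
        String[] result = simulate();
        System.out.println("subtask writes to:  " + result[0]);
        System.out.println("coordinator holds:  " + result[1]);
        System.out.println("rolled back:        " + result[2]);
    }
}
```

In this model the subtask ends up writing to an instant that the coordinator has already rolled back, which is the inconsistency the report describes. The real coordinator is asynchronous; the sketch only fixes the ordering of the two events.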