[ 
https://issues.apache.org/jira/browse/HUDI-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7447:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix not bootstrap when subTask restart when OPCoordinator handle 
> CheckPointComplete not finished
> ------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-7447
>                 URL: https://issues.apache.org/jira/browse/HUDI-7447
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: bootstrap
>            Reporter: Wenbing Shen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.1, 0.14.2
>
>
> 1. In insert mode, a subTask restarts while the OperatorCoordinator is still 
> inside notifyCheckpointComplete for CheckpointId-100. That call can run for a 
> long time, for example because a table service is scanning HDFS, or because 
> slow HDFS operations are hit during rollback and initInstant.
> 2. At this point ckp-meta/instantId.INFLIGHT has not been marked complete, 
> although the corresponding commit file has already been written. The 
> restarted subTask sends its bootstrap event.
> 3. Once the OperatorCoordinator finishes notifyCheckpointComplete, it creates 
> a new instant, and the subTask starts writing the corresponding parquet 
> files and other data for that instant.
> 4. The OperatorCoordinator then processes the bootstrap event, creates yet 
> another new instant, and rolls back the instant created in step 3. From this 
> point on the OperatorCoordinator and the operator are inconsistent.
> This is related to Hudi's three-stage commit sequence: snapshot the data, 
> write the commit file, then write the ckp_meta file.
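
The ordering in steps 1 to 4 can be replayed with a small toy model. This is only a sketch under stated assumptions: `ToyCoordinator`, `Outcome`, `simulate` and the `instant-N` names are hypothetical and are not Hudi or Flink APIs. It assumes that the coordinator handles events one at a time, that completing a checkpoint opens the next instant, and that a bootstrap event rolls back the open instant and starts a new one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the event ordering described in the issue. Every name here is
// hypothetical; none of this is Hudi or Flink code.
final class BootstrapRaceSketch {

    // Stand-in for the coordinator: events are queued and handled one at a time.
    static final class ToyCoordinator {
        private final Deque<Runnable> pending = new ArrayDeque<>();
        private final List<String> rolledBack = new ArrayList<>();
        private String currentInstant; // null until the first instant is started
        private int nextId = 1;

        private void startInstant() {
            currentInstant = "instant-" + nextId++;
        }

        // Assumed behaviour: finishing a checkpoint opens the next instant.
        void enqueueCheckpointComplete() {
            pending.add(this::startInstant);
        }

        // Assumed behaviour: a bootstrap event rolls back whatever instant is
        // open and then starts a fresh one.
        void enqueueBootstrap() {
            pending.add(() -> {
                if (currentInstant != null) {
                    rolledBack.add(currentInstant);
                }
                startInstant();
            });
        }

        // Handle the oldest queued event (FIFO order).
        void processNext() {
            pending.remove().run();
        }

        String currentInstant() {
            return currentInstant;
        }

        List<String> rolledBack() {
            return rolledBack;
        }
    }

    // What each side believes once all events have been handled.
    static final class Outcome {
        final String subtaskInstant;
        final String coordinatorInstant;
        final List<String> rolledBack;

        Outcome(String subtaskInstant, String coordinatorInstant, List<String> rolledBack) {
            this.subtaskInstant = subtaskInstant;
            this.coordinatorInstant = coordinatorInstant;
            this.rolledBack = rolledBack;
        }
    }

    static Outcome simulate() {
        ToyCoordinator coordinator = new ToyCoordinator();
        coordinator.enqueueCheckpointComplete(); // step 1: slow checkpoint-complete handling
        coordinator.enqueueBootstrap();          // step 2: restarted subTask sends bootstrap
        coordinator.processNext();               // step 3: checkpoint done, new instant opened
        String subtaskInstant = coordinator.currentInstant(); // subTask writes to this instant
        coordinator.processNext();               // step 4: bootstrap handled late
        return new Outcome(subtaskInstant, coordinator.currentInstant(), coordinator.rolledBack());
    }

    public static void main(String[] args) {
        Outcome outcome = simulate();
        System.out.println("subTask writes to:   " + outcome.subtaskInstant);
        System.out.println("coordinator expects: " + outcome.coordinatorInstant);
        System.out.println("rolled back:         " + outcome.rolledBack);
    }
}
```

In this model the subTask ends up writing to an instant that the coordinator has already rolled back, which is the inconsistency described in step 4. The sketch only illustrates the ordering problem. It does not represent the fix attached to this issue.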



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
