[jira] [Updated] (HUDI-7447) Fix not bootstrap when subTask restart when OPCoordinator handle CheckPointComplete not finished

2024-03-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7447:
-
Fix Version/s: 0.15.0
   1.0.0

> Fix not bootstrap when subTask restart when OPCoordinator handle 
> CheckPointComplete not finished
> 
>
> Key: HUDI-7447
> URL: https://issues.apache.org/jira/browse/HUDI-7447
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Affects Versions: 0.13.1, 0.14.1
>Reporter: Wenbing Shen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> 1. In Insert mode, a subTask restarts while the OperatorCoordinator is still 
> inside notifyCheckpointComplete for CheckpointId-100. That call can run for a 
> long time, for example because a tableService is scanning HDFS, or because 
> the HDFS operations performed during Rollback and initInstant are slow.
> 2. At this point ckp-meta/instantId.INFLIGHT has not yet been marked 
> complete, although the corresponding commit file has already been submitted. 
> The restarted subTask sends a bootstrap event.
> 3. Once the OperatorCoordinator finishes processing 
> notifyCheckpointComplete, it creates a new Instant, and the subTask creates 
> the corresponding parquet files, etc., based on that Instant.
> 4. The OperatorCoordinator then processes the bootstrap event, creates 
> another new Instant, and rolls back the Instant created in step 3. From this 
> point on, the OperatorCoordinator and the Operator are inconsistent.
> This is related to Hudi's three-stage submission: take the data snapshot, 
> submit the commit file, and submit the ckp_meta file.
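
The event ordering in steps 1-4 can be replayed with a small single-threaded sketch. This is only an illustration under stated assumptions: `ToyCoordinator`, `finishCheckpointComplete`, `handleBootstrap`, and the `instant-N` names are hypothetical stand-ins, not Hudi or Flink classes, and the queue merely models the coordinator handling its events one at a time in arrival order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical replay of the event order described in the issue; not Hudi code. */
public class BootstrapRaceSketch {

    /** Toy stand-in for the coordinator (assumption: events are handled strictly in order). */
    static final class ToyCoordinator {
        private int nextInstantId = 1;
        // Instant that was being written before checkpoint 100 completed.
        String currentInstant = newInstant();
        final List<String> rolledBack = new ArrayList<>();

        private String newInstant() {
            return "instant-" + nextInstantId++;
        }

        /** End of the slow checkpoint-complete handling: a new instant is started. */
        void finishCheckpointComplete() {
            currentInstant = newInstant();
        }

        /** Late bootstrap handling: roll back the current instant and start another. */
        void handleBootstrap() {
            rolledBack.add(currentInstant);
            currentInstant = newInstant();
        }
    }

    /** Returns {instant the subtask writes to, instant the coordinator ends up with}. */
    static String[] replay() {
        ToyCoordinator coordinator = new ToyCoordinator();
        Queue<Runnable> coordinatorEvents = new ArrayDeque<>();

        // Step 1: checkpoint-complete handling is still running (slow storage calls).
        coordinatorEvents.add(coordinator::finishCheckpointComplete);
        // Step 2: the subtask restarts; its bootstrap event queues up behind step 1.
        coordinatorEvents.add(coordinator::handleBootstrap);

        // Step 3: checkpoint-complete finishes and the subtask starts writing
        // files against the instant that was just created.
        coordinatorEvents.remove().run();
        String subtaskInstant = coordinator.currentInstant;

        // Step 4: the stale bootstrap event is processed; the instant from step 3
        // is rolled back and replaced while the subtask is still using it.
        coordinatorEvents.remove().run();

        return new String[] {subtaskInstant, coordinator.currentInstant};
    }

    public static void main(String[] args) {
        String[] result = replay();
        System.out.println("subtask writes to " + result[0]
            + ", coordinator expects " + result[1]);
    }
}
```

In this toy model the two values differ after step 4, which is the inconsistency the issue reports. How the actual fix avoids it is not described in this thread.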



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7447) Fix not bootstrap when subTask restart when OPCoordinator handle CheckPointComplete not finished

2024-02-27 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated HUDI-7447:
---
Affects Version/s: 0.14.1
   0.13.1

> Fix not bootstrap when subTask restart when OPCoordinator handle 
> CheckPointComplete not finished
> 
>
> Key: HUDI-7447
> URL: https://issues.apache.org/jira/browse/HUDI-7447
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Affects Versions: 0.13.1, 0.14.1
>Reporter: Wenbing Shen
>Priority: Major
>  Labels: pull-request-available
>





[jira] [Updated] (HUDI-7447) Fix not bootstrap when subTask restart when OPCoordinator handle CheckPointComplete not finished

2024-02-27 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated HUDI-7447:
---
Fix Version/s: (was: 0.13.1)
   (was: 0.14.2)

> Fix not bootstrap when subTask restart when OPCoordinator handle 
> CheckPointComplete not finished
> 
>
> Key: HUDI-7447
> URL: https://issues.apache.org/jira/browse/HUDI-7447
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Reporter: Wenbing Shen
>Priority: Major
>  Labels: pull-request-available
>





[jira] [Updated] (HUDI-7447) Fix not bootstrap when subTask restart when OPCoordinator handle CheckPointComplete not finished

2024-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7447:
-
Labels: pull-request-available  (was: )

> Fix not bootstrap when subTask restart when OPCoordinator handle 
> CheckPointComplete not finished
> 
>
> Key: HUDI-7447
> URL: https://issues.apache.org/jira/browse/HUDI-7447
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Reporter: Wenbing Shen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.1, 0.14.2
>
>





[jira] [Updated] (HUDI-7447) Fix not bootstrap when subTask restart when OPCoordinator handle CheckPointComplete not finished

2024-02-26 Thread Wenbing Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Shen updated HUDI-7447:
---
Fix Version/s: 0.14.2
   0.13.1

> Fix not bootstrap when subTask restart when OPCoordinator handle 
> CheckPointComplete not finished
> 
>
> Key: HUDI-7447
> URL: https://issues.apache.org/jira/browse/HUDI-7447
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Reporter: Wenbing Shen
>Priority: Major
> Fix For: 0.13.1, 0.14.2
>
>


