hudi-agent commented on code in PR #19023:
URL: https://github.com/apache/hudi/pull/19023#discussion_r3432549057


##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java:
##########
@@ -397,6 +397,20 @@ public void preTxn(WriteOperationType operationType, 
HoodieTableMetaClient metaC
     tableServiceClient.startAsyncArchiveService(this);
   }
 
+  /**
+   * Initializes the transaction state for recommitting an inflight instant 
during recovery.
+   * Sets lastCompletedTxnAndMetadata to the last completed instant whose 
requested time
+   * and completion time are both before the given instant's requested time, 
ensuring
+   * OCC conflict resolution only checks against genuinely concurrent commits.
+   */
+  public void preTxnForRecommit(WriteOperationType operationType, 
HoodieTableMetaClient metaClient, String currentInstantTime) {
+    if (txnManager.isLockRequired() && 
config.needResolveWriteConflict(operationType, metaClient.isMetadataTable(), 
config, metaClient.getTableConfig())) {
+      this.lastCompletedTxnAndMetadata = 
TransactionUtils.getLastCompletedTxnInstantAndMetadata(metaClient, 
currentInstantTime);

Review Comment:
   🤖 Could you trace through the multi-pending restart case (e.g. 
@HuangZhenQiu's 3-checkpoint-failure scenario where A and B are both inflight 
on restart)? In iter 2 for B, A was just recommitted in iter 1, but its 
completionTime is reassigned to wall-clock-now (post-restart, so > 
B.requestedTime). The `completionTime < currentInstantTime` filter then 
excludes A from the baseline, falling back to some earlier C. 
SimpleConcurrentFileWritesConflictResolutionStrategy's 
`findInstantsAfter(C.requestedTime)` then picks up A as a candidate, and 
`hasConflict(B, A)` returns true on the shared file groups typical of streaming 
upserts — re-triggering the same `HoodieWriteConflictException` this PR is 
fixing. The new `testCommittingMultipleInstantsWithOCC` exercises the regular 
`commitInstants` path (which uses the no-arg `preTxn`), so it wouldn't catch 
this. @danny0405 thoughts on whether `preTxnForRecommit` should use the no-arg 
`getLastCompletedTxnInstantAndMetadata` here so the just-re
 committed instant becomes the baseline?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to