yihua commented on code in PR #9473: URL: https://github.com/apache/hudi/pull/9473#discussion_r1320357906
########## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ########## @@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP } }); Review Comment: ```suggestion // When `beginInstantTime` is `DEFAULT_BEGIN_TIMESTAMP` (due to missing checkpoint), `previousInstantTime` is set to `DEFAULT_BEGIN_TIMESTAMP` as well. ``` ########## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ########## @@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP } }); - String previousInstantTime = beginInstantTime; + String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP; if (!beginInstantTime.equals(DEFAULT_BEGIN_TIMESTAMP)) { Review Comment: ```suggestion // When `beginInstantTime` is present, `previousInstantTime` is set to the completed commit before `beginInstantTime` if that exists. If there is no completed commit before `beginInstantTime`, e.g., `beginInstantTime` is the first commit in the active timeline, `previousInstantTime` is set to `DEFAULT_BEGIN_TIMESTAMP`. if (!beginInstantTime.equals(DEFAULT_BEGIN_TIMESTAMP)) { ``` ########## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ########## @@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP } }); - String previousInstantTime = beginInstantTime; + String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP; Review Comment: Have you also gone through the logic for `MissingCheckpointStrategy.READ_LATEST` and see if there's any gap? ########## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ########## @@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP } }); - String previousInstantTime = beginInstantTime; + String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP; Review Comment: Basically, the only change here is when `beginInstantTime === <first commit>`, the `previousInstantTime` is `DEFAULT_BEGIN_TIMESTAMP` now whereas `<first commit>` before this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org