[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r391352601 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java ## @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) { .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet()); // Determine the offset ranges to read from - if (lastCheckpointStr.isPresent()) { + if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) { Review comment: Thanks, I need to think about it, wait a moment : ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r391348916 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java ## @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) { .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet()); // Determine the offset ranges to read from - if (lastCheckpointStr.isPresent()) { + if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) { Review comment: Prefer to pushdown the control bebavior to datasource(e.g kafka / pulsar) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r391348111 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java ## @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) { .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet()); // Determine the offset ranges to read from - if (lastCheckpointStr.isPresent()) { + if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) { Review comment: hi @garyli1019, let's imagine a scenario that a topic with no data, the first commit will save empty checkpoint, then the second commit will always throw exception(even if we send msg to kafka). In that case, `EARLIEST` or `LATEST` will no longer has any effect. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r390695516 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java ## @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) { .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet()); // Determine the offset ranges to read from - if (lastCheckpointStr.isPresent()) { + if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) { Review comment: hi @bvaradar, add `!commitMetadata.getMetadata(CHECKPOINT_KEY).isEmpty()`. but if we do that, the application will always throw `HoodieDeltaStreamerException` ![image](https://user-images.githubusercontent.com/20113411/76372301-ee63d700-6377-11ea-863a-21a99028dc5d.png) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r388930286 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java ## @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) { .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet()); // Determine the offset ranges to read from - if (lastCheckpointStr.isPresent()) { + if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) { Review comment: hi, if a hidden bug happens, the `*.commit` file will not be created, the next run will still read from the last successful commit, so no data will be lossed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services