[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-601303031

> https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L331

This handles the case where the transformer filtered everything out, but we still want to do an empty commit to record that the offsets moved. The resets happening around Kafka should be left alone, IMO. They can happen for a variety of reasons, such as topic retention kicking in.

For this PR, I am not sure we want to special-case an empty string. Can we formulate the problem we are trying to solve again?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-599717588

I am a bit confused about this PR at this point. Can you summarize where we are at?
[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-596358157

>>Create a kafka topic(don't produce any data to kafka) which means a topic with no messages, then start the delta streamer, hudi will store the empty checkpoint.

I assume the partitions will have been created already. So instead of special-casing the empty checkpoint for a specific source (which seems higher maintenance), can we make the code store an actual checkpoint? Our contract can be that if a checkpoint is stored, then it is indeed non-empty.

I am with @garyli1019 that erroring out on empty checkpoints is better. I am not sure whether we should expect an empty checkpoint from other sources as well.
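The contract proposed in this comment (a stored checkpoint is never empty, and an empty one is an error) can be illustrated with a minimal sketch. The class `CheckpointContract`, the method `requireNonEmpty`, and the sample checkpoint string are hypothetical names chosen for illustration; they are not taken from the Hudi code base or from this PR.

```java
// Hypothetical helper: not part of Hudi. It only illustrates the proposed contract
// that a checkpoint, once stored, must be non-empty.
final class CheckpointContract {

  private CheckpointContract() {
  }

  // Returns the checkpoint unchanged if it carries content, otherwise fails fast
  // instead of silently persisting an empty value.
  static String requireNonEmpty(String checkpoint) {
    if (checkpoint == null || checkpoint.trim().isEmpty()) {
      throw new IllegalStateException("Refusing to store an empty checkpoint");
    }
    return checkpoint;
  }

  public static void main(String[] args) {
    // The checkpoint format used here is an assumption made for the example.
    System.out.println(requireNonEmpty("events,0:5"));
    try {
      requireNonEmpty("");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast keeps the rule source-agnostic: any source that hands back an empty checkpoint is surfaced immediately rather than handled by a per-source special case.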
[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-596357138

>>then start the delta streamer, hudi will store the empty checkpoint.

Re-reading this: is this the right behavior? I think a few of the cases now handled in delta-streamer have made life a bit complicated. The reason for writing such an empty checkpoint could be that we want to write checkpoints even for empty commits, since the source could have read data that the transformer then filtered out entirely. I think the right fix could be to checkpoint the actual fromOffsets instead of an empty checkpoint.

>>the second commit will use the last checkpoint {}, which means the fromoffset is 0. but the previous messages may be removed because of kafka retention mechanism.

I'd appreciate it if we considered how checkpoints are handled in a general, source-agnostic way while also fixing this issue.
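The suggestion to checkpoint the actual fromOffsets rather than an empty value can be sketched as follows. This is only an illustration under stated assumptions: the class `CheckpointSketch`, its methods, and the `topic,partition:offset` string format are hypothetical and are not claimed to match what DeltaSync or the Kafka source actually do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: not Hudi code. It shows one way to avoid writing an empty
// checkpoint when a batch produced no new offsets.
final class CheckpointSketch {

  private CheckpointSketch() {
  }

  // Assumed serialization for the example: "topic,partition:offset,partition:offset",
  // with partitions sorted so the output is deterministic.
  static String offsetsToStr(String topic, Map<Integer, Long> offsets) {
    String body = new TreeMap<Integer, Long>(offsets).entrySet().stream()
        .map(e -> e.getKey() + ":" + e.getValue())
        .collect(Collectors.joining(","));
    return topic + "," + body;
  }

  // If the batch yielded no end offsets (for example, the topic has no messages yet),
  // record the offsets the read started from instead of an empty checkpoint.
  static String checkpointToCommit(String topic,
                                   Map<Integer, Long> fromOffsets,
                                   Map<Integer, Long> toOffsets) {
    boolean noProgress = toOffsets == null || toOffsets.isEmpty();
    return offsetsToStr(topic, noProgress ? fromOffsets : toOffsets);
  }

  public static void main(String[] args) {
    Map<Integer, Long> from = new HashMap<>();
    from.put(0, 120L);
    from.put(1, 98L);
    // No end offsets were produced, so the starting offsets are checkpointed.
    System.out.println(checkpointToCommit("events", from, new HashMap<>()));
  }
}
```

With a real position stored on every commit, a later run resumes from it instead of from an implicit offset 0, which is the situation the quoted retention concern describes.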
[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-595907417

@garyli1019 can you drive this review? :) I can merge once I have a 👍 from you.