[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-19 Thread GitBox
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-601303031
 
 
   
>https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L331
   This handles the case where the transformer filtered everything out, but 
we still want to do an empty commit to record that the offsets moved.
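
   A minimal sketch of that decision, not the actual DeltaSync code. The 
class, enum, and method names are hypothetical, and checkpoints are treated 
as opaque strings:

```java
import java.util.Objects;

// Hypothetical sketch: decide what a sync round should do once the
// transformer has run. None of these names come from Hudi itself.
final class EmptyCommitSketch {
  enum Action { WRITE_RECORDS, EMPTY_COMMIT, NOTHING_TO_DO }

  static Action decide(long recordsAfterTransform, String previousCheckpoint, String newCheckpoint) {
    if (recordsAfterTransform > 0) {
      // Normal case: there is data to write, and the checkpoint goes with it.
      return Action.WRITE_RECORDS;
    }
    if (!Objects.equals(previousCheckpoint, newCheckpoint)) {
      // Data was read but fully filtered out: commit nothing except the
      // new checkpoint, so the next round does not re-read the same range.
      return Action.EMPTY_COMMIT;
    }
    // The source did not advance at all, so there is nothing to record.
    return Action.NOTHING_TO_DO;
  }
}
```

   For example, `decide(0, "t,0:5", "t,0:9")` yields `EMPTY_COMMIT`, because 
the offsets moved even though no records survived the transformer.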
   
   The resets happening around Kafka should be left alone, IMO. They can happen 
for a variety of reasons, like topic retention kicking in. 
   
   For this PR, I am not sure we want to special-case an empty string. 
Can we formulate the problem we are trying to solve again?
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-16 Thread GitBox
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-599717588
 
 
   I am a bit confused about where this PR stands. Can you summarize where we 
are at? 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-08 Thread GitBox
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-596358157
 
 
   >>Create a kafka topic(don't produce any data to kafka) which means a topic 
with no messages, then start the delta streamer, hudi will store the empty 
checkpoint.
   
   I assume there will be partitions created already. So instead of 
special-casing the empty checkpoint for a specific source (which seems higher 
maintenance), can we make the code store an actual checkpoint? Our contract 
can be that if a checkpoint is stored, then it is indeed non-empty. 
   
   I am with @garyli1019 that erroring out on empty checkpoints is better. 
Not sure if we expect an empty checkpoint in other sources as well. 
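
   A minimal sketch of that contract, kept source-agnostic. The class and 
method names are hypothetical; this is not how DeltaSync currently reads its 
checkpoint, only an illustration of "stored implies non-empty":

```java
import java.util.Optional;

// Hypothetical sketch of the proposed contract: a checkpoint is either
// absent (first run) or present and non-empty. An empty stored value is
// treated as a bug and rejected instead of being special-cased per source.
final class CheckpointContractSketch {
  static Optional<String> resumeCheckpoint(Optional<String> stored) {
    if (stored.isPresent() && stored.get().trim().isEmpty()) {
      throw new IllegalStateException(
          "Stored checkpoint is empty; a stored checkpoint must be non-empty");
    }
    // Absent means "no previous commit"; the source picks its own start.
    return stored;
  }
}
```

   With this shape, no source has to interpret an empty string; it only ever 
sees "no checkpoint yet" or a real one.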




[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-08 Thread GitBox
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-596357138
 
 
   >>then start the delta streamer, hudi will store the empty checkpoint.
   Re-reading this, is this the right behavior? I think there are a few 
cases now handled in delta-streamer that have made life a bit complicated. 
   
   The reason for writing such an empty checkpoint could be that we want to 
write checkpoints even for empty commits, since it could have read data but 
the transformer could have filtered all of that out. 
   
   I think the right fix could be to checkpoint the actual fromOffsets instead 
of an empty checkpoint. 
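
   A rough sketch of that idea, using plain maps rather than Kafka classes. 
The class and method names are hypothetical, and the 
`topic,partition:offset,...` string layout is an assumption made for 
illustration:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: always persist real per-partition offsets, even
// when a round read no messages, instead of an empty checkpoint string.
final class OffsetCheckpointSketch {
  // Serialize offsets as "topic,partition:offset,..." (assumed layout),
  // sorted by partition so the string is deterministic.
  static String toCheckpoint(String topic, Map<Integer, Long> offsets) {
    String parts = new TreeMap<>(offsets).entrySet().stream()
        .map(e -> e.getKey() + ":" + e.getValue())
        .collect(Collectors.joining(","));
    return topic + "," + parts;
  }

  static String checkpointForBatch(String topic,
                                   Map<Integer, Long> fromOffsets,
                                   Map<Integer, Long> untilOffsets,
                                   long messagesRead) {
    // Nothing read (e.g. a freshly created, empty topic): record where we
    // started from, so the next round resumes from known offsets.
    return toCheckpoint(topic, messagesRead == 0 ? fromOffsets : untilOffsets);
  }
}
```

   For an empty topic with two partitions, this stores something like 
`events,0:0,1:0` rather than an empty string.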
   
   >>the second commit will use the last checkpoint {}, which means the 
fromoffset is 0.
   but the previous messages may be removed because of kafka retention 
mechanism.
   
   I'd appreciate it if we considered how checkpoints are handled in a 
general, source-agnostic way while also fixing this issue. 
   




[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-06 Thread GitBox
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-595907417
 
 
   @garyli1019 can you drive this review? :) I can merge once I have a 👍 from 
you.

