nsivabalan edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-799571386


   Thanks for your contribution. This is going to be useful to the community. 
   A few high-level questions.
   1. Why don't we leverage DeltaStreamerConfig.checkpoint to pass in a timestamp 
for the Kafka source? Or do we expect the format of this config to be 
"topic_name,partition_num:offset,partition_num:offset,...." and hence need a 
new config for a timestamp-based checkpoint? 
   2. If yes to (1), did we consider parsing the checkpoint config, 
determining whether it is in the above format or a timestamp, and then 
proceeding from there? I am just trying to avoid introducing new configs if 
possible. 
   3. Checkpointing in deltastreamer in general is getting too complicated. I 
definitely see a benefit in this patch, but is there a way we can abstract it 
out based on the source? The new config introduced as part of this PR is 
very specific to Kafka, so I am trying to see if we can keep it abstracted out 
of deltastreamer if possible. 
   4. I see KafkaConsumer.offsetsForTimes() could return null for partitions 
with messages in the old format. What is the expected behavior for such 
partitions? Do we resume from the earliest offset? 
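
   To make questions 2 and 4 concrete, here is a rough sketch, not the PR's 
actual code. The class and method names are hypothetical. It assumes a 
timestamp checkpoint is a bare number (e.g. epoch millis), and it uses plain 
maps of partition number to offset as a stand-in for what 
KafkaConsumer.offsetsForTimes() returns. Falling back to the earliest offset 
is only the behavior floated in question 4, not a confirmed decision.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Hudi.
public class CheckpointSketch {

    /** The two checkpoint shapes discussed in questions 1 and 2. */
    enum CheckpointKind { KAFKA_OFFSETS, TIMESTAMP }

    // Matches "topic_name,partition_num:offset,partition_num:offset,..."
    private static final Pattern OFFSETS_FORMAT =
        Pattern.compile("[A-Za-z0-9._-]+(,\\d+:\\d+)+");

    // Assumption: a timestamp checkpoint is a bare non-negative number.
    private static final Pattern TIMESTAMP_FORMAT = Pattern.compile("\\d+");

    /** Decide which shape a checkpoint string has, so one config can carry both. */
    static CheckpointKind classify(String checkpoint) {
        String value = checkpoint.trim();
        if (OFFSETS_FORMAT.matcher(value).matches()) {
            return CheckpointKind.KAFKA_OFFSETS;
        }
        if (TIMESTAMP_FORMAT.matcher(value).matches()) {
            return CheckpointKind.TIMESTAMP;
        }
        throw new IllegalArgumentException("Unrecognized checkpoint: " + checkpoint);
    }

    /**
     * Pick a start offset per partition. A null value in timestampLookup
     * models a partition for which the timestamp lookup found nothing;
     * such partitions fall back to their earliest offset.
     */
    static Map<Integer, Long> resolveStartOffsets(Map<Integer, Long> timestampLookup,
                                                  Map<Integer, Long> earliestOffsets) {
        Map<Integer, Long> start = new HashMap<>();
        for (Map.Entry<Integer, Long> entry : timestampLookup.entrySet()) {
            Long found = entry.getValue();
            start.put(entry.getKey(),
                found != null ? found : earliestOffsets.get(entry.getKey()));
        }
        return start;
    }
}
```

   With something like this, the existing checkpoint config could accept either 
form, and the Kafka-specific handling would stay inside the Kafka source. 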
   
   @n3nash @vinothchandar : open to hearing your thoughts, if any. One of my 
suggestions above could potentially add APIs to Source, hence CCing you. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

