nsivabalan edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-799571386
Thanks for your contribution; this is going to be useful to the community. A few high-level questions:

1. Why not leverage DeltaStreamerConfig.checkpoint to pass in a timestamp for the Kafka source? Or do we expect the format of this config to be "topic_name,partition_num:offset,partition_num:offset,...", and hence need a new config for a timestamp-based checkpoint?
2. If yes to (1), did we think about parsing the checkpoint config, determining whether it is in the above format or a timestamp, and proceeding from there? Just trying to avoid introducing new configs if possible.
3. Checkpointing in deltastreamer in general is getting too complicated. I definitely see a benefit in this patch, but is there a way we can abstract it out based on the source? The new config introduced as part of this PR is very specific to Kafka, so I am trying to see if we can keep it abstracted out of deltastreamer if possible.
4. I see KafkaConsumer.offsetsForTimes() could return null for partitions with messages in the old format. What is the expected behavior for such partitions? Do we resume from the earliest offset?

@n3nash @vinothchandar: open to hearing your thoughts, if any. One of my suggestions above could potentially add APIs to Source, hence CCing you.
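To make question 2 concrete, here is a rough sketch of how the checkpoint string could be classified before deciding how to resume. This is only an illustration: `CheckpointFormat`, `Kind`, `classify`, and `parseOffsets` are hypothetical names, not existing Hudi code, and it assumes a timestamp checkpoint would be passed as a bare epoch-millis number, which the PR may or may not do.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hudi: guesses the format of a checkpoint string.
final class CheckpointFormat {
    enum Kind { OFFSETS, TIMESTAMP, UNKNOWN }

    // "topic_name,partition_num:offset[,partition_num:offset...]" as quoted in question 1.
    private static final Pattern OFFSETS = Pattern.compile("[^,:\\s]+(,\\d+:\\d+)+");

    // Assumption: a timestamp checkpoint is a bare, non-negative epoch-millis number.
    private static final Pattern TIMESTAMP = Pattern.compile("\\d+");

    static Kind classify(String checkpoint) {
        if (checkpoint == null) {
            return Kind.UNKNOWN;
        }
        String s = checkpoint.trim();
        // Check the offset format first; it needs at least one ",partition:offset"
        // pair, so a digits-only string can never match it.
        if (OFFSETS.matcher(s).matches()) {
            return Kind.OFFSETS;
        }
        if (TIMESTAMP.matcher(s).matches()) {
            return Kind.TIMESTAMP;
        }
        return Kind.UNKNOWN;
    }

    // Parses the partition -> offset pairs; only call this for Kind.OFFSETS input.
    static Map<Integer, Long> parseOffsets(String checkpoint) {
        String[] parts = checkpoint.trim().split(",");
        Map<Integer, Long> offsets = new LinkedHashMap<>();
        // parts[0] is the topic name, so start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String[] pair = parts[i].split(":");
            offsets.put(Integer.parseInt(pair[0]), Long.parseLong(pair[1]));
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(classify("my_topic,0:100,1:250"));
        System.out.println(classify("1615800000000"));
        System.out.println(classify("not-a-checkpoint"));
    }
}
```

With something along these lines, the Kafka source could branch on the result (offsets: resume as today; timestamp: look up offsets via KafkaConsumer.offsetsForTimes()) without a second config. Anything classified as UNKNOWN would still need a defined behavior, same as the null case in question 4.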