Hi All,

I've been working on a pull request [1] that allows Spark to read from Kinesis starting at a specific timestamp. I have iterated on the patch with the help of other contributors, and we think it is in a good state now.
This patch would save hours of crash recovery time for Spark when reading from Kinesis. Unlike Kafka, Kinesis suffers from throttling issues, and hence this patch would reduce the amount of data requested from Kinesis. I would love to hear thoughts from the committers and see whether I can work on any improvements.

[1] https://github.com/apache/spark/pull/18029

Best Regards,
Yash