GitHub user harishreedharan opened a pull request: https://github.com/apache/spark/pull/3655
[SPARK-4704][STREAMING] Reliable Kafka Receiver can lose data if the block generator fails to store data.

The Reliable Kafka Receiver commits offsets only after events have actually been stored, which ensures that after a restart it resumes where it left off. However, if the failure happens in the store() call and the block generator reports an error, the receiver does nothing and continues reading from the current offset rather than from the last committed offset. The messages between the last commit and the current offset are therefore lost. This PR retries the store call four times and then stops the receiver with an error message and the last exception received from the store.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/harishreedharan/spark kafka-failure-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3655

----
commit 5e2e7ad479d2739c4f1bd62fd1d48b216b2bdce0
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-12-10T01:44:39Z

    [SPARK-4704][STREAMING] Reliable Kafka Receiver can lose data if the block generator fails to store data.
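
The retry-then-stop behavior described in this PR can be sketched roughly as follows. This is an illustrative Python sketch, not the code in the patch: the names store_with_retry, store, commit_offsets and stop_receiver are hypothetical stand-ins, and it assumes "four times" means four attempts in total, which the description does not spell out. The point it shows is that offsets are committed only after a successful store, and that the receiver is stopped with the last exception once the attempts are exhausted.

```python
from typing import Callable, Optional, Sequence


def store_with_retry(
    store: Callable[[Sequence[bytes]], None],
    block: Sequence[bytes],
    commit_offsets: Callable[[], None],
    stop_receiver: Callable[[str, Optional[Exception]], None],
    max_attempts: int = 4,  # assumption: four attempts in total
) -> bool:
    """Try to store a block; commit offsets only if the store succeeds.

    All callables are hypothetical stand-ins for the receiver's internals.
    Returns True if the block was stored and offsets were committed,
    False if the receiver was asked to stop.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            store(block)
        except Exception as exc:
            # Remember the failure and try again; offsets stay uncommitted.
            last_error = exc
            continue
        # The store succeeded, so it is now safe to advance the offsets.
        commit_offsets()
        return True
    # Every attempt failed: stop rather than keep reading past uncommitted data.
    stop_receiver(
        f"Could not store block after {max_attempts} attempts", last_error
    )
    return False
```

Because the offsets are never committed on the failure path, a receiver restarted after the stop would resume from the last committed offset instead of skipping the unstored messages.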