GitHub user harishreedharan opened a pull request:

    https://github.com/apache/spark/pull/3655

    [SPARK-4704][STREAMING] Reliable Kafka Receiver can lose data if the block generator fails to store data.
    
    The Reliable Kafka Receiver commits offsets only after events are actually
    stored, which ensures that on restart it resumes where it left off. But if
    the store() call fails and the block generator reports an error, the
    receiver does nothing and keeps reading from the current offset rather than
    from the last committed offset. As a result, messages between the last
    commit and the current offset will be lost.
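To make the failure mode concrete, here is a minimal standalone Python model of the pre-fix behaviour. It is not the Spark/Scala receiver code: `run_receiver` and `make_flaky_store` are hypothetical names, offsets are modelled as list indices, and the offset is committed once per successfully stored block.

```python
from typing import Callable, List


def run_receiver(messages: List[str], block_size: int,
                 store: Callable[[List[str]], None]) -> int:
    """Hypothetical model of the pre-fix receiver.

    Reads messages in blocks, calls store() on each block, and commits the
    offset only if store() succeeded. A failed store() is silently ignored and
    reading continues from the current offset. Returns the last committed
    offset, i.e. where a restarted receiver would resume.
    """
    committed = 0
    current = 0
    while current < len(messages):
        block = messages[current:current + block_size]
        current += len(block)      # the consumer has already advanced past this block
        try:
            store(block)
            committed = current    # commit only after a successful store
        except Exception:
            pass                   # pre-fix: no retry, no rewind to `committed`
    return committed


def make_flaky_store(sink: List[str], fail_on_first: str) -> Callable[[List[str]], None]:
    """Hypothetical store() that fails once for the block starting with `fail_on_first`."""
    failed: List[bool] = []

    def store(block: List[str]) -> None:
        if block[0] == fail_on_first and not failed:
            failed.append(True)
            raise IOError("simulated block generator failure")
        sink.extend(block)

    return store


stored: List[str] = []
msgs = [f"m{i}" for i in range(8)]
committed = run_receiver(msgs, 2, make_flaky_store(stored, "m2"))
# committed == 8: a restart resumes after m7, so m2 and m3 are never re-read.
# stored == ["m0", "m1", "m4", "m5", "m6", "m7"]: m2 and m3 are lost.
```

The later successful commit moves the committed offset past the block that was never stored, which is exactly the gap described above.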
    
    This PR retries the store call four times; if it still fails, it stops the
    receiver with an error message and the last exception thrown by store().
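The retry-then-stop behaviour can be sketched as below. This is a standalone Python model, not the Scala code in the patch: `store_with_retries`, `store`, `commit`, and `stop` are hypothetical stand-ins for the receiver's store() call, its offset commit, and stopping the receiver with a message and an exception. The description says the store call is retried four times without stating whether that means four attempts in total or four retries after the first, so the attempt count is a parameter here (defaulting, as an assumption, to four attempts in total).

```python
from typing import Callable, List, Optional


def store_with_retries(block: List[str],
                       store: Callable[[List[str]], None],
                       commit: Callable[[], None],
                       stop: Callable[[str, Exception], None],
                       max_attempts: int = 4) -> bool:
    """Hypothetical sketch: try store() up to max_attempts times.

    On success the offset is committed (the block is safely stored). If every
    attempt fails, the receiver is stopped with the last exception, so a
    restart resumes from the last committed offset instead of skipping data.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            store(block)
        except Exception as e:  # remember the most recent failure and retry
            last_error = e
            continue
        commit()
        return True
    assert last_error is not None
    stop("failed to store block after retries", last_error)
    return False


commits: List[bool] = []
stops: list = []

# Scenario 1: store() always fails, so the receiver is stopped and nothing is committed.
attempts: List[List[str]] = []

def always_fails(block: List[str]) -> None:
    attempts.append(block)
    raise IOError(f"attempt {len(attempts)} failed")

ok = store_with_retries(["m2", "m3"], always_fails,
                        commit=lambda: commits.append(True),
                        stop=lambda msg, err: stops.append((msg, err)))

# Scenario 2: store() fails twice, then succeeds on the third attempt and commits.
calls: List[int] = []

def fails_twice(block: List[str]) -> None:
    calls.append(1)
    if len(calls) <= 2:
        raise IOError("transient failure")

ok2 = store_with_retries(["m4"], fails_twice,
                         commit=lambda: commits.append(True),
                         stop=lambda msg, err: stops.append((msg, err)))
```

Stopping rather than silently continuing trades availability for correctness: the uncommitted block is replayed from Kafka after the restart, so messages may be delivered more than once but are not skipped.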

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/harishreedharan/spark kafka-failure-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3655
    
----
commit 5e2e7ad479d2739c4f1bd62fd1d48b216b2bdce0
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-12-10T01:44:39Z

    [SPARK-4704][STREAMING] Reliable Kafka Receiver can lose data if the block generator fails to store data.

----