Hi Wei,

On Tue, Aug 19, 2014 at 10:18 AM, Wei Liu <wei....@stellarloyalty.com>
wrote:
>
> Since our application cannot tolerate losing customer data, I am wondering
> what is the best way for us to address this issue.
> 1) We are thinking of writing application-specific logic to address the data
> loss. To us, the problem seems to be caused by the Kinesis receivers
> advancing their checkpoint before we know for sure that the data is replicated.
> For example, we could keep another checkpoint ourselves that remembers the Kinesis
> sequence number of the data that has been processed by Spark Streaming. When the
> Kinesis receiver is restarted due to a worker failure, we would restart it from
> the checkpoint we tracked.
>

This sounds to me pretty much like the way Kafka does it. I am not
saying that the stock KafkaReceiver does what you want (it may or may not),
but it should be possible to update the "offset" (which corresponds to the
"sequence number") in ZooKeeper only after the data has been replicated
successfully. I guess "replace Kinesis with Kafka" is not an option for you,
but you might consider pulling the Kinesis data into Kafka before processing it
with Spark?
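Just to illustrate the idea, here is a minimal Python sketch of "advance the checkpoint only after the data is safe". It is not the KafkaReceiver or Kinesis receiver API. CheckpointStore is a hypothetical in-memory stand-in for ZooKeeper (or wherever you keep your own checkpoint), run_batch and process are made-up names, and sequence numbers are modeled as plain integers in ascending order:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-in for a durable checkpoint store such as ZooKeeper.
@dataclass
class CheckpointStore:
    _offsets: Dict[str, int] = field(default_factory=dict)

    def load(self, shard_id: str) -> int:
        # -1 means "nothing from this shard has been committed yet".
        return self._offsets.get(shard_id, -1)

    def commit(self, shard_id: str, seq: int) -> None:
        self._offsets[shard_id] = seq


def run_batch(
    shard_id: str,
    records: List[Tuple[int, str]],
    store: CheckpointStore,
    process: Callable[[List[str]], None],
) -> int:
    """Process records after the committed checkpoint; commit only on success."""
    start = store.load(shard_id)
    # Resume strictly after the last committed sequence number.
    pending = [(seq, data) for seq, data in records if seq > start]
    if not pending:
        return start
    # If process() raises (e.g. the worker dies before replication finishes),
    # we never reach commit(), so a restart re-reads these records.
    process([data for _, data in pending])
    last_seq = pending[-1][0]
    store.commit(shard_id, last_seq)
    return last_seq


if __name__ == "__main__":
    store = CheckpointStore()
    stream = [(1, "a"), (2, "b"), (3, "c")]
    sink: List[str] = []

    def failing(batch: List[str]) -> None:
        raise RuntimeError("worker died before replication")

    try:
        run_batch("shard-0", stream, store, failing)
    except RuntimeError:
        pass
    print(store.load("shard-0"))  # -1: checkpoint was not advanced

    run_batch("shard-0", stream, store, sink.extend)
    print(store.load("shard-0"), sink)  # 3 ['a', 'b', 'c']
```

Note that this gives you at-least-once rather than exactly-once: anything processed after the last commit is processed again on restart, so whatever is downstream should tolerate duplicates.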

Tobias
