[ 
https://issues.apache.org/jira/browse/SPARK-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695929#comment-14695929
 ] 

Cody Koeninger commented on SPARK-6249:
---------------------------------------

https://github.com/koeninger/kafka-exactly-once

If you've tried the code, read the blog post, and watched the presentation
linked from that repo, and still have specific questions, feel free to ask.



> Get Kafka offsets from consumer group in ZK when using direct stream
> --------------------------------------------------------------------
>
>                 Key: SPARK-6249
>                 URL: https://issues.apache.org/jira/browse/SPARK-6249
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Tathagata Das
>
> This is the proposal. 
> The simpler direct API (the one that does not take explicit offsets) can be 
> modified to also pick up the initial offset from ZK if group.id is specified. 
> This is very similar to how we find the latest or earliest offset in that 
> API, except that instead of the latest/earliest offset of the topic we want 
> to find the offset from the consumer group. The group offsets in ZK are not 
> used at all for any further processing or restarting, so the exactly-once 
> semantics are not broken. 
> The use case where this is useful is a simplified code upgrade. If the user 
> wants to upgrade the code, he/she can stop the context gracefully, which will 
> ensure the ZK consumer group offset is updated with the last offsets 
> processed. Then the new code, when started (not restarted from checkpoint), 
> can pick up the consumer group offset from ZK and continue where the previous 
> code left off. 
> Without the ability to pick up consumer group offsets at startup (that is, 
> currently), the only way to do this is for the users to save the offsets 
> somewhere (file, database, etc.) and manage the offsets themselves. I just 
> want to simplify this process. 
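
For illustration, the manual workaround described in the issue (save the offsets somewhere and manage them yourself) might look roughly like the sketch below. It is plain Python with no Spark or Kafka dependency; the function names, the JSON file layout, and the convention that the stored value is the next offset to read are all assumptions made for this example, not part of any Spark API. In a real job, the resolved mapping would be what gets passed as explicit starting offsets to the direct stream.

```python
import json
import os


def save_offsets(path, offsets):
    """Persist {(topic, partition): next_offset} to a JSON file.

    Writes to a temporary file first and then renames it, so a crash
    mid-write does not leave a truncated offsets file behind.
    """
    records = [
        {"topic": topic, "partition": partition, "offset": offset}
        for (topic, partition), offset in sorted(offsets.items())
    ]
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(records, f)
    os.replace(tmp_path, path)


def load_offsets(path):
    """Load previously saved offsets; return an empty dict if none exist."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {(r["topic"], r["partition"]): r["offset"] for r in json.load(f)}


def resolve_start_offsets(partitions, saved, default_offsets):
    """Prefer the saved offset for each partition, else fall back.

    default_offsets stands in for the earliest/latest offset the job
    would otherwise start from (hypothetical input for this sketch).
    """
    return {tp: saved.get(tp, default_offsets[tp]) for tp in partitions}
```

On a graceful stop the old code would call `save_offsets` with the last processed position of each partition; the upgraded code would call `load_offsets` and `resolve_start_offsets` before creating its stream. Partitions with no saved entry (for example, ones added since the last run) fall back to the default.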



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
