[ https://issues.apache.org/jira/browse/SPARK-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697188#comment-14697188 ]
Cody Koeninger commented on SPARK-6249:
---------------------------------------

If you want an API that has imprecise semantics and stores stuff in ZK, use the receiver-based stream.

This ticket has been closed for a while; I'd suggest further discussion would be better suited for the mailing list.

> Get Kafka offsets from consumer group in ZK when using direct stream
> --------------------------------------------------------------------
>
>                 Key: SPARK-6249
>                 URL: https://issues.apache.org/jira/browse/SPARK-6249
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Tathagata Das
>
> This is the proposal.
> The simpler direct API (the one that does not take explicit offsets) can be
> modified to also pick up the initial offset from ZK if group.id is specified.
> This is similar to how we find the latest or earliest offset in that API,
> except that instead of the latest/earliest offset of the topic we want to
> find the offset from the consumer group. The group offsets in ZK are not used
> at all for any further processing and restarting, so the exactly-once
> semantics are not broken.
> The use case where this is useful is simplified code upgrade. If the user
> wants to upgrade the code, he/she can stop the context gracefully, which will
> ensure the ZK consumer group offset is updated with the last offsets
> processed. Then the new code, when started (not restarted from checkpoint),
> can pick up the consumer group offset from ZK and continue where the previous
> code left off.
> Without the functionality of picking up consumer group offsets to start (that
> is, currently), the only way to do this is for users to save the offsets
> somewhere (file, database, etc.) and manage the offsets themselves. I just
> want to simplify this process.
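
For reference, the "save the offsets somewhere and manage them yourself" approach mentioned in the description could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names (save_offsets, load_offsets), the JSON layout, and the file path are hypothetical and are not part of Spark or Kafka. Offsets are modeled as a plain map from (topic, partition) to the next offset to read; wiring that map into the direct API variant that takes explicit offsets is left out.

```python
import json
import os
import tempfile


def save_offsets(path, offsets):
    """Persist {(topic, partition): offset} to a JSON file.

    Hypothetical helper: writes to a temporary file in the same directory
    and then renames it, so a crash mid-write does not leave a
    half-written offsets file behind.
    """
    records = [
        {"topic": topic, "partition": partition, "offset": offset}
        for (topic, partition), offset in sorted(offsets.items())
    ]
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(records, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_offsets(path):
    """Load previously saved offsets, or an empty map on first start."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {
            (r["topic"], r["partition"]): r["offset"]
            for r in json.load(f)
        }
```

On a code upgrade, the old job would call save_offsets after its last batch completes, and the new job would call load_offsets at startup and fall back to latest/earliest when the map is empty. Whether this gives exactly-once results still depends on storing the offsets consistently with the job's output, which this sketch does not attempt.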