[jira] [Commented] (SPARK-6249) Get Kafka offsets from consumer group in ZK when using direct stream

Cody Koeninger (JIRA) Tue, 10 Mar 2015 12:47:23 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355561#comment-14355561
 ]


Cody Koeninger commented on SPARK-6249:
---------------------------------------

First, I don't think setting group.id should have any effect on whether offsets 
are pulled from / saved to ZK.  They're two different things.

Second, I think this is another case where exposing an API to make it easy to 
get / set Kafka's managed consumer offsets would allow people to solve this 
problem for themselves, in the way that makes sense for them.

We already have a way to specify the beginning offsets of the stream.  We 
already have code to get / set consumer offsets from Kafka; it's just not 
exposed.  By the time 1.4 is ready to release, hopefully we'll know whether 
we're OK with exposing it.

I don't want to add more config options with confusing semantics around what is 
being used as the system of record for offsets.  I'd rather make it easy for 
people to explicitly do what they need.
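
To illustrate what "explicitly" managing offsets can look like, here is a 
minimal sketch that assumes the offsets are kept in a local JSON file.  The 
function names, the file layout and the (topic, partition) key type are 
hypothetical; none of this is Spark or Kafka API.

```python
import json
import os
import tempfile
from typing import Dict, Tuple

# Hypothetical key type: (topic, partition) -> next offset to read.
Offsets = Dict[Tuple[str, int], int]


def save_offsets(path: str, offsets: Offsets) -> None:
    """Write offsets to a temp file, then rename it over the target,
    so a reader never sees a partially written file."""
    payload = [
        {"topic": topic, "partition": partition, "offset": offset}
        for (topic, partition), offset in sorted(offsets.items())
    ]
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    os.replace(tmp_path, path)


def load_offsets(path: str) -> Offsets:
    """Return the saved offsets, or an empty dict if none were saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        entries = json.load(f)
    return {(e["topic"], e["partition"]): e["offset"] for e in entries}
```

On startup, the job would call load_offsets and hand the result to the stream 
variant that takes explicit beginning offsets; an empty result means there is 
nothing saved and the caller has to decide on a fallback.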

> Get Kafka offsets from consumer group in ZK when using direct stream
> --------------------------------------------------------------------
>
>                 Key: SPARK-6249
>                 URL: https://issues.apache.org/jira/browse/SPARK-6249
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Tathagata Das
>
> This is the proposal. 
> The simpler direct API (the one that does not take explicit offsets) can be 
> modified to also pick up the initial offset from ZK if group.id is specified. 
> This is very similar to how we find the latest or earliest offset in that 
> API, except that instead of the latest/earliest offset of the topic we want 
> to find the offset from the consumer group. The group offsets in ZK are not 
> used at all for any further processing and restarting, so the exactly-once 
> semantics are not broken. 
> The use case where this is useful is simplified code upgrade. If the user 
> wants to upgrade the code, he/she can stop the context gracefully, which will 
> ensure the ZK consumer group offset is updated with the last offsets 
> processed. Then the new code, when started (not restarted from checkpoint), 
> can pick up the consumer group offset from ZK and continue where the previous 
> code left off. 
> Without the ability to pick up consumer group offsets at startup (that is, 
> currently), the only way to do this is for users to save the offsets 
> somewhere (file, database, etc.) and manage the offsets themselves. I just 
> want to simplify this process. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
