[ 
https://issues.apache.org/jira/browse/SPARK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717342#comment-14717342
 ] 

Cody Koeninger commented on SPARK-10320:
----------------------------------------

As I said on the list, the best way to deal with this currently is to start a 
new app with your new code before stopping the old app.

In terms of a potential feature addition, I think there are a number of 
questions that would need to be cleared up... e.g.

- when would you change topics?  During a StreamingListener onBatchCompleted 
handler?  From a separate thread?

- when adding a topic, what would the expectation around the starting offset 
be?  As in the current API: provide explicit offsets per partition, start at 
the beginning, or start at the end?

- if you add partitions for topics that currently exist, and specify a starting 
offset that's different from where the job is currently, what would the 
expectation be?
- if you add, later remove, then later re-add a topic, what would the 
expectation regarding saved checkpoints be?
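The ambiguity behind the last two questions can be made concrete with a toy 
model (plain Python, not Spark or Kafka code; all names and the resume/start 
policies are hypothetical, purely for discussion):

```python
# Toy model of runtime topic subscription. It tracks, per partition, the
# next offset to read, and illustrates two open design questions:
#   - on add: explicit offset, start at beginning, or start at end?
#   - on re-add after a remove: resume from the saved position or not?

EARLIEST = "earliest"
LATEST = "latest"

class ToySubscriptions:
    def __init__(self, log_ends):
        # log_ends: partition -> current end offset of the (simulated) log
        self.log_ends = log_ends
        self.current = {}       # active partition -> next offset to read
        self.checkpointed = {}  # positions saved when a topic was removed

    def add(self, partition, start=LATEST, explicit=None, resume=True):
        """Add a partition; the caller must pick a starting-offset policy."""
        if resume and partition in self.checkpointed:
            # Re-added topic: resume where it left off before removal.
            self.current[partition] = self.checkpointed.pop(partition)
        elif explicit is not None:
            self.current[partition] = explicit
        elif start == EARLIEST:
            self.current[partition] = 0
        else:
            self.current[partition] = self.log_ends[partition]

    def remove(self, partition):
        # Save the position in case the topic is re-added later.
        self.checkpointed[partition] = self.current.pop(partition)

subs = ToySubscriptions({("t", 0): 500})
subs.add(("t", 0), start=EARLIEST)  # start at beginning -> offset 0
subs.remove(("t", 0))               # position 0 is checkpointed
subs.add(("t", 0), resume=False)    # re-add, ignore checkpoint -> end = 500
print(subs.current[("t", 0)])       # prints 500
```

The point of the sketch is that every branch in `add` is a user-visible 
semantic choice the real API would have to pin down, not an implementation 
detail.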

> Support new topic subscriptions without requiring restart of the streaming 
> context
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-10320
>                 URL: https://issues.apache.org/jira/browse/SPARK-10320
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Sudarshan Kadambi
>
> Spark Streaming lacks the ability to subscribe to new topics or unsubscribe 
> from current ones once the streaming context has been started. Restarting the 
> streaming context increases the latency of update handling.
> Consider a streaming application subscribed to n topics. Let's say 1 of the 
> topics is no longer needed in streaming analytics and hence should be 
> dropped. We could do this by stopping the streaming context, removing that 
> topic from the topic list and restarting the streaming context. Since with 
> some DStreams such as DirectKafkaStream, the per-partition offsets are 
> maintained by Spark, we should be able to resume uninterrupted (I think?) 
> from where we left off with a minor delay. However, when expensive state 
> initialization (from an external datastore) is needed for the datasets 
> published to every topic before streaming updates can be applied, it is more 
> convenient to subscribe and unsubscribe only for the incremental changes to 
> the topic list. Without such a feature, updates go unprocessed for longer 
> than they need to, affecting QoS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
