[ https://issues.apache.org/jira/browse/SPARK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724074#comment-14724074 ]

Cody Koeninger commented on SPARK-10320:
----------------------------------------

" If you restart the job and specify a new offset, that is where consumption 
should start, in effect overriding any saved offsets."

That's not the way checkpoints work. You're either restarting from a 
checkpoint or you're not; the decision is up to you. If you want to specify a 
new offset, start the job clean, without the old checkpoint.
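To make the distinction concrete, here is a minimal sketch of starting a job 
clean from explicitly specified offsets, using the Spark 1.x direct Kafka API. 
The topic name, partition offsets, and broker address are placeholders, and in 
practice the offsets would come from your own offset store, not from a Spark 
checkpoint:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Hypothetical starting positions per partition; these override nothing,
// because the job is started without restoring a checkpoint at all.
val fromOffsets = Map(
  TopicAndPartition("events", 0) -> 12345L,
  TopicAndPartition("events", 1) -> 67890L
)

val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder, (String, String)](
  ssc,                                                // a fresh StreamingContext
  Map("metadata.broker.list" -> "broker1:9092"),      // placeholder broker
  fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
)
```

If you instead construct the context via StreamingContext.getOrCreate with an 
existing checkpoint directory, the saved offsets win and fromOffsets is never 
consulted, which is why the two modes can't be mixed.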

"The topic changes happen in the same thread of execution where the initial 
list of topics was provided before starting the streaming context."

Can you say a little more about what you're actually doing here?  How do you 
know when topics need to be modified?  Typically streaming jobs just call 
ssc.awaitTermination in their main thread, which seems incompatible with what 
you're describing.

> Support new topic subscriptions without requiring restart of the streaming 
> context
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-10320
>                 URL: https://issues.apache.org/jira/browse/SPARK-10320
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Sudarshan Kadambi
>
> Spark Streaming lacks the ability to subscribe to new topics or unsubscribe 
> from current ones once the streaming context has been started. Restarting the 
> streaming context increases the latency of update handling.
> Consider a streaming application subscribed to n topics. Let's say 1 of the 
> topics is no longer needed in streaming analytics and hence should be 
> dropped. We could do this by stopping the streaming context, removing that 
> topic from the topic list, and restarting the streaming context. Since with 
> some DStreams, such as DirectKafkaStream, the per-partition offsets are 
> maintained by Spark, we should be able to resume uninterrupted (I think?) 
> from where we left off, with only a minor delay. However, when expensive 
> state initialization (from an external datastore) is needed for datasets 
> published to all topics before streaming updates can be applied, it is more 
> convenient to subscribe or unsubscribe only the incremental changes to the 
> topic list. Without such a feature, updates go unprocessed for longer than 
> they need to, affecting QoS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
