[ 
https://issues.apache.org/jira/browse/SPARK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790804#comment-14790804
 ] 

Sudarshan Kadambi commented on SPARK-10320:
-------------------------------------------

Sure, a function as proposed, one that allows the topic, partitions and offsets 
to be specified in a fine-grained manner, is needed to provide the full 
flexibility we desire (starting at an arbitrary offset within each topic 
partition). If separate DStreams are desired for each topic, do you intend for 
createDirectStream to be called multiple times (with a different subscription 
topic each time), both before and after the streaming context is started? 

Also, what kind of defaults did you have in mind? For example, I might require the 
ability to specify new topics after the streaming context is started but might 
not want the burden of being aware of the partitions within the topic or the 
offsets. I might simply want to default to either the start or the end of each 
partition that exists for that topic.
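
As a rough illustration of the defaulting behaviour asked about above, here is
a minimal sketch in plain Python. It is not Spark's or Kafka's API: the
function name resolve_start_offsets, its parameters, and the shape of
partition_ranges (partition -> (earliest, latest) available offsets) are all
assumptions made for this example. The caller names only a topic; explicit
offsets are optional, and every other partition falls back to the chosen
default.

```python
from typing import Dict, Optional, Tuple

TopicPartition = Tuple[str, int]   # (topic, partition)
OffsetRange = Tuple[int, int]      # (earliest, latest) available offsets


def resolve_start_offsets(
    topic: str,
    partition_ranges: Dict[int, OffsetRange],
    explicit: Optional[Dict[int, int]] = None,
    default: str = "latest",
) -> Dict[TopicPartition, int]:
    """Pick a starting offset for every known partition of `topic`.

    Hypothetical helper: partitions listed in `explicit` start at the given
    offset; all others start at the earliest or latest available offset.
    """
    if default not in ("earliest", "latest"):
        raise ValueError("default must be 'earliest' or 'latest'")
    explicit = explicit or {}

    # Reject offsets given for partitions the topic does not have.
    unknown = set(explicit) - set(partition_ranges)
    if unknown:
        raise ValueError(f"unknown partitions for {topic}: {sorted(unknown)}")

    starts: Dict[TopicPartition, int] = {}
    for partition, (earliest, latest) in partition_ranges.items():
        if partition in explicit:
            starts[(topic, partition)] = explicit[partition]
        else:
            starts[(topic, partition)] = earliest if default == "earliest" else latest
    return starts


# Example offset ranges (made up for illustration).
ranges = {0: (0, 120), 1: (5, 80)}
from_end = resolve_start_offsets("events", ranges)
mixed = resolve_start_offsets("events", ranges, explicit={1: 42}, default="earliest")
```

With something like this, a caller who does not care about partitions or
offsets passes only the topic and a default, while a caller who needs
fine-grained control supplies per-partition offsets.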

> Kafka Support new topic subscriptions without requiring restart of the 
> streaming context
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-10320
>                 URL: https://issues.apache.org/jira/browse/SPARK-10320
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Sudarshan Kadambi
>
> Spark Streaming lacks the ability to subscribe to new topics or unsubscribe 
> from current ones once the streaming context has been started. Restarting the 
> streaming context increases the latency of update handling.
> Consider a streaming application subscribed to n topics. Let's say 1 of the 
> topics is no longer needed in streaming analytics and hence should be 
> dropped. We could do this by stopping the streaming context, removing that 
> topic from the topic list and restarting the streaming context. Since with 
> some DStreams such as DirectKafkaStream, the per-partition offsets are 
> maintained by Spark, we should be able to resume uninterrupted (I think?) 
> from where we left off with a minor delay. However, in instances where 
> expensive state initialization (from an external datastore) may be needed for 
> datasets published to all topics, before streaming updates can be applied to 
> it, it is more convenient to only subscribe or unsubscribe to the incremental 
> changes to the topic list. Without such a feature, updates go unprocessed for 
> longer than they need to, thus affecting QoS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
