[ 
https://issues.apache.org/jira/browse/SPARK-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490731#comment-15490731
 ] 

Jeff Nadler commented on SPARK-17510:
-------------------------------------

I consider this to be a separate issue from the backpressure bug (I'll file 
that one now).   Here's a bit of background:

We have a topic of user session start/end events.   We use updateStateByKey to 
keep track of session statistics for each user.  The processing is very 
lightweight, and we'd probably be fine with no maxRate at all.

We have another topic with user activity events.   We join the two together, 
enriching the activity events with information about the user's session.   
There's heavy processing on this stream, and new users are adopting the system 
at an incredible clip, so to ensure stability we need to use a maxRate.

However, the overall rates of the two streams vary substantially - the maxRate 
I'd like to use for user activities would cause us to fall behind on user 
sessions.    

The solution I've implemented - and I consider it to be a workaround at best - 
is to use Receivers for one and Direct Stream for the other.   This is a 
cheating way to get independent maxRate values, since Direct Streams use a 
different config value (and set maxRate per partition... but that's another 
story).

We definitely need maxRate *and* backpressure, because we're getting up to 45k 
events per second (and doing a lot of processing as I mentioned) and when we 
have an outage or upgrade the Kafka topic backs up very fast.   Without a 
maxRate, when we restart the application will ingest millions of records and 
crash quickly.   

I understand that this is a major change, the kind that would go in a major 
release.   IMO it should have been this way in the first place.    It looks to 
me like use cases where a Streaming app consumes multiple topics in parallel 
were not really considered, which is understandable, but probably worth a fix 
to cover more types of usage.   In general it would be great to have streaming 
properties set per-stream not in SparkConf.   

If this existed, I could work-around the backpressure bug by turning off 
backpressure for user session events.  

> Set Streaming MaxRate Independently For Multiple Streams
> --------------------------------------------------------
>
>                 Key: SPARK-17510
>                 URL: https://issues.apache.org/jira/browse/SPARK-17510
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 2.0.0
>            Reporter: Jeff Nadler
>
> We use multiple DStreams coming from different Kafka topics in a Streaming 
> application.
> Some settings like maxrate and backpressure enabled/disabled would be better 
> passed as config to KafkaUtils.createStream and 
> KafkaUtils.createDirectStream, instead of setting them in SparkConf.
> Being able to set a different maxrate for different streams is an important 
> requirement for us; we currently work-around the problem by using one 
> receiver-based stream and one direct stream.   
> We would like to be able to turn on backpressure for only one of the streams 
> as well.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to