[ 
https://issues.apache.org/jira/browse/SPARK-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491165#comment-15491165
 ] 

Jeff Nadler commented on SPARK-17510:
-------------------------------------


Yes, you're right: it's partly about differing rates, but the big issue is that 
the compute time is far higher on one stream than on the other. That's really 
the only reason we need rate limiting at all. Overprovisioning would be 
expensive in this case.

We have run at a higher maxRate and let backpressure manage it. It works, but 
it's suboptimal. During surges of data, the rate limiter fluctuates widely up 
and down as scheduling delay builds up and is burned off. The algorithm never 
levels out, though that's a separate opportunity for improvement in the 
backpressure implementation. When these big fluctuations in inbound rate 
happen, the total throughput over a longer period (per hour) is far lower than 
it would be if we just calibrated a maxRate that reflects (or is close to) what 
our cluster is capable of handling.

Also, just to be clear, I understand that this is probably a long-haul change, 
and nothing that's going to solve our issues right now. It seems to me like 
this would be the right long-term direction, but those of you who are 
committers may not agree. For sure, I am starting to feel like I'm stacking up 
workarounds to make Spark Streaming viable for our particular use.

In my little fantasyland, there would be a separate "SparkConf" for global 
settings and a "StreamConf" that can be passed to the various KafkaUtils.* 
functions to set the spark.streaming.* settings independently for each stream.
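
To make the idea concrete, here is a rough sketch of how per-stream overrides 
could resolve against global settings. It is plain Python and does not touch 
Spark at all: "StreamConf" and its "set"/"resolve" methods are hypothetical 
(nothing like them exists in Spark today), and the global conf is modeled as a 
simple dict. The two setting names are the existing 
spark.streaming.receiver.maxRate and spark.streaming.backpressure.enabled keys; 
a direct stream would use spark.streaming.kafka.maxRatePerPartition instead.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Existing Spark Streaming setting names (global today, set on SparkConf).
MAX_RATE = "spark.streaming.receiver.maxRate"
BACKPRESSURE = "spark.streaming.backpressure.enabled"


@dataclass
class StreamConf:
    """HYPOTHETICAL per-stream overrides; no such class exists in Spark."""

    overrides: Dict[str, str] = field(default_factory=dict)

    def set(self, key: str, value: str) -> "StreamConf":
        # Only streaming settings make sense per stream in this sketch.
        if not key.startswith("spark.streaming."):
            raise ValueError(f"not a spark.streaming.* setting: {key}")
        self.overrides[key] = value
        return self

    def resolve(self, global_conf: Dict[str, str], key: str,
                default: Optional[str] = None) -> Optional[str]:
        # A per-stream value wins; otherwise fall back to the global conf.
        if key in self.overrides:
            return self.overrides[key]
        return global_conf.get(key, default)


# Stand-in for the global SparkConf: one maxRate for everything.
global_conf = {MAX_RATE: "1000", BACKPRESSURE: "false"}

# Cheap-to-process stream: raise the cap, inherit everything else.
cheap = StreamConf().set(MAX_RATE, "5000")

# Expensive stream: lower the cap and turn backpressure on just here.
expensive = StreamConf().set(MAX_RATE, "200").set(BACKPRESSURE, "true")

for name, conf in (("cheap", cheap), ("expensive", expensive)):
    print(name,
          conf.resolve(global_conf, MAX_RATE),
          conf.resolve(global_conf, BACKPRESSURE))
```

The rate values are made up for illustration. The point is only the lookup 
order: anything a stream does not override still comes from the global conf, 
so existing applications would behave the same as they do now.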

> Set Streaming MaxRate Independently For Multiple Streams
> --------------------------------------------------------
>
>                 Key: SPARK-17510
>                 URL: https://issues.apache.org/jira/browse/SPARK-17510
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 2.0.0
>            Reporter: Jeff Nadler
>
> We use multiple DStreams coming from different Kafka topics in a Streaming 
> application.
> Some settings, like maxRate and backpressure enabled/disabled, would be 
> better passed as config to KafkaUtils.createStream and 
> KafkaUtils.createDirectStream, instead of being set in SparkConf.
> Being able to set a different maxRate for different streams is an important 
> requirement for us; we currently work around the problem by using one 
> receiver-based stream and one direct stream.
> We would like to be able to turn on backpressure for only one of the streams 
> as well.


