[ https://issues.apache.org/jira/browse/SPARK-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491165#comment-15491165 ]
Jeff Nadler commented on SPARK-17510:
-------------------------------------

Yes you're right - it's partly about differing rates, but the big issue is
that the compute time is far higher on one stream vs the other. That's the
only reason we need rate limiting at all, really. Overprovisioning would be
expensive in this case.

We have run at a higher maxRate and let backpressure manage it. It works, but
it's suboptimal. During surges of data, the rate limiter will fluctuate widely
up/down as scheduling delay builds up and is burned off. The algorithm is such
that it does not ever level out, though that's a separate opportunity for
improvement in the backpressure implementation. When these big fluctuations in
inbound rate happen, the total throughput over a longer time period (per hour)
is far lower than it would be if we just calibrated a maxRate that's
reflective of (or close to) what our cluster is capable of handling.

Also, just to be clear, I understand that this is probably a long-haul change,
and nothing that's going to solve our issues at this time. It seems to me like
this would be the right long-term direction, but you all who are committers
may not agree. For sure I am starting to feel like I'm stacking up workarounds
to make Spark Streaming viable for our particular use.

In my little fantasyland, there would be a separate "SparkConf" for global
settings and a "StreamConf" that can be passed to the various KafkaUtils.*
functions to set the spark.streaming.* settings independently for each stream.

> Set Streaming MaxRate Independently For Multiple Streams
> --------------------------------------------------------
>
>                 Key: SPARK-17510
>                 URL: https://issues.apache.org/jira/browse/SPARK-17510
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 2.0.0
>            Reporter: Jeff Nadler
>
> We use multiple DStreams coming from different Kafka topics in a Streaming
> application.
> Some settings like maxrate and backpressure enabled/disabled would be better
> passed as config to KafkaUtils.createStream and
> KafkaUtils.createDirectStream, instead of setting them in SparkConf.
> Being able to set a different maxrate for different streams is an important
> requirement for us; we currently work-around the problem by using one
> receiver-based stream and one direct stream.
> We would like to be able to turn on backpressure for only one of the streams
> as well.
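
To make the "StreamConf" idea a bit more concrete, here is a rough sketch in plain Python of how per-stream overrides could layer on top of the global settings. This is only an illustration of the lookup order: `StreamConf` and `effective_setting` are hypothetical names that do not exist in Spark, and the numeric values are made up. The three `spark.streaming.*` keys are the existing global settings; as far as I understand it, the workaround described in the issue relies on `spark.streaming.receiver.maxRate` applying to receiver-based streams while `spark.streaming.kafka.maxRatePerPartition` applies to direct streams.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Existing Spark Streaming keys, set once for the whole application today.
# The values here are illustrative only.
GLOBAL_CONF: Dict[str, str] = {
    "spark.streaming.backpressure.enabled": "false",
    "spark.streaming.receiver.maxRate": "10000",
    "spark.streaming.kafka.maxRatePerPartition": "1000",
}


@dataclass
class StreamConf:
    """Hypothetical per-stream overrides (not a real Spark class)."""

    overrides: Dict[str, str] = field(default_factory=dict)

    def set(self, key: str, value: str) -> "StreamConf":
        # Only streaming settings make sense as per-stream overrides.
        if not key.startswith("spark.streaming."):
            raise ValueError(f"not a streaming setting: {key}")
        self.overrides[key] = value
        return self  # allow chaining, in the style of SparkConf.set


def effective_setting(
    key: str,
    global_conf: Dict[str, str],
    stream_conf: Optional[StreamConf] = None,
) -> Optional[str]:
    """Return the per-stream value if one is set, else the global value."""
    if stream_conf is not None and key in stream_conf.overrides:
        return stream_conf.overrides[key]
    return global_conf.get(key)


# The expensive stream gets a lower cap and backpressure; the cheap one
# simply inherits the global settings.
heavy_stream = (
    StreamConf()
    .set("spark.streaming.kafka.maxRatePerPartition", "200")
    .set("spark.streaming.backpressure.enabled", "true")
)
light_stream = StreamConf()

for name, conf in (("heavy", heavy_stream), ("light", light_stream)):
    rate = effective_setting(
        "spark.streaming.kafka.maxRatePerPartition", GLOBAL_CONF, conf
    )
    bp = effective_setting(
        "spark.streaming.backpressure.enabled", GLOBAL_CONF, conf
    )
    print(f"{name}: maxRatePerPartition={rate} backpressure={bp}")
```

In a real API the per-stream object would presumably be an extra argument to KafkaUtils.createStream / KafkaUtils.createDirectStream; the sketch only shows the fallback from stream-level to global settings.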