[ https://issues.apache.org/jira/browse/SPARK-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490731#comment-15490731 ]
Jeff Nadler commented on SPARK-17510: ------------------------------------- I consider this to be a separate issue from the backpressure bug (I'll file that one now). Here's a bit of background: We have a topic of user session start/end events. We use updateStateByKey to keep track of session statistics for each user. The processing is very lightweight, and we'd probably be fine with no maxRate at all. We have another topic with user activity events. We join the two together, enriching the activity events with information about the user's session. There's heavy processing on this stream, and new users are adopting the system at an incredible clip, so to ensure stability we need to use a maxRate. However, the overall rates of the two streams vary substantially - the maxRate I'd like to use for user activities would cause us to fall behind on user sessions. The solution I've implemented - and I consider it to be a workaround at best - is to use Receivers for one and Direct Stream for the other. This is a cheating way to get independent maxRate values, since Direct Streams use a different config value (and set maxRate per partition... but that's another story). We definitely need maxRate *and* backpressure, because we're getting up to 45k events per second (and doing a lot of processing as I mentioned) and when we have an outage or upgrade the Kafka topic backs up very fast. Without a maxRate, when we restart the application will ingest millions of records and crash quickly. I understand that this is a major change, the kind that would go in a major release. IMO it should have been this way in the first place. It looks to me like use cases where a Streaming app consumes multiple topics in parallel were not really considered, which is understandable, but probably worth a fix to cover more types of usage. In general it would be great to have streaming properties set per-stream not in SparkConf. If this existed, I could work-around the backpressure bug by turning off backpressure for user session events. > Set Streaming MaxRate Independently For Multiple Streams > -------------------------------------------------------- > > Key: SPARK-17510 > URL: https://issues.apache.org/jira/browse/SPARK-17510 > Project: Spark > Issue Type: Improvement > Components: Streaming > Affects Versions: 2.0.0 > Reporter: Jeff Nadler > > We use multiple DStreams coming from different Kafka topics in a Streaming > application. > Some settings like maxrate and backpressure enabled/disabled would be better > passed as config to KafkaUtils.createStream and > KafkaUtils.createDirectStream, instead of setting them in SparkConf. > Being able to set a different maxrate for different streams is an important > requirement for us; we currently work-around the problem by using one > receiver-based stream and one direct stream. > We would like to be able to turn on backpressure for only one of the streams > as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org