[jira] [Commented] (SPARK-17938) Backpressure rate not adjusting
[ https://issues.apache.org/jira/browse/SPARK-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629332#comment-15629332 ]

Cody Koeninger commented on SPARK-17938:
----------------------------------------

The direct stream isn't a receiver, so receiver settings don't apply to it.

> Backpressure rate not adjusting
> -------------------------------
>
>                 Key: SPARK-17938
>                 URL: https://issues.apache.org/jira/browse/SPARK-17938
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Samy Dindane
>
> spark-streaming is 2.0.1 and spark-streaming-kafka-0-10 is 2.0.1. Same behavior with 2.0.0, though.
> spark.streaming.kafka.consumer.poll.ms is set to 3
> spark.streaming.kafka.maxRatePerPartition is set to 10
> spark.streaming.backpressure.enabled is set to true
> `batchDuration` of the streaming context is set to 1 second.
> I consume a Kafka topic using KafkaUtils.createDirectStream().
> My system can handle batches of 100k records, but it takes more than 1 second to process them all. I'd thus expect backpressure to reduce the number of records fetched in the next batch, to keep the processing delay under 1 second.
> Only this does not happen: the backpressure rate stays the same, stuck at `100.0`, no matter how the other variables change (processing time, error, etc.).
> Here's a log showing how all these variables change while the chosen rate stays the same: https://gist.github.com/Dinduks/d9fa67fc8a036d3cad8e859c508acdba (I would have attached a file but I don't see how).
> Is this the expected behavior and I am missing something, or is this a bug? I'll gladly help by providing more information or writing code if necessary.
> Thank you.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
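To make the distinction concrete, here is a rough sketch of which rate settings apply to which stream type. The grouping is my reading of the Spark Streaming configuration docs, not something stated in this thread; values are left as placeholders:

```
# Receiver-based streams only (ignored by the Kafka direct stream):
spark.streaming.receiver.maxRate=...

# Kafka direct stream only (records/sec per partition):
spark.streaming.kafka.maxRatePerPartition=...

# Applies to both stream types (enables the rate estimator):
spark.streaming.backpressure.enabled=true
```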
[jira] [Commented] (SPARK-17938) Backpressure rate not adjusting
[ https://issues.apache.org/jira/browse/SPARK-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629049#comment-15629049 ]

Fabien LD commented on SPARK-17938:
-----------------------------------

For us, backpressure works fine with:
- spark 2.0.1
- kafka 0.10
- scala 2.11

Here are the settings we have:
```
spark.streaming.backpressure.initialRate=1000
spark.streaming.receiver.maxRate=1
spark.streaming.kafka.maxRatePerPartition=2000
```

The only problem we have is with `spark.streaming.receiver.maxRate`. Streaming starts at 2000 events per partition (processed in ~4-5s after a few initial batches) and then backpressure starts doing its job. But the actual average rate is then around 25000 records per second (a bit more than 5 records per 2s batch), which is higher than the configured maxRate. Later, because we have caught up with live record production in Kafka, the rate drops.

=> maxRate is never applied, and this is a problem: a working maxRate would let us control warm-up, so we could increase maxRatePerPartition (which we currently have to keep low for the initial few batches, before backpressure starts doing its job) and absorb spikes that happen later in some topics.
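The warm-up behavior described above follows from how the direct stream caps its first batches: until the rate estimator has feedback, each batch can pull up to maxRatePerPartition records per second per partition. A minimal arithmetic sketch of that upper bound (the function name and the partition count are hypothetical, for illustration only):

```python
def max_records_per_batch(max_rate_per_partition: int,
                          batch_seconds: int,
                          num_partitions: int) -> int:
    """Rough upper bound on records in one direct-stream batch,
    assuming spark.streaming.kafka.maxRatePerPartition is the only
    limit in effect (i.e. before backpressure has any feedback)."""
    return max_rate_per_partition * batch_seconds * num_partitions

# Hypothetical topic with 10 partitions, using the settings above:
# 2000 records/sec/partition and 2-second batches.
print(max_records_per_batch(2000, 2, 10))  # → 40000
```

This is why a high maxRatePerPartition hurts during warm-up: every batch scheduled before the first one finishes is sized at this full cap.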
[jira] [Commented] (SPARK-17938) Backpressure rate not adjusting
[ https://issues.apache.org/jira/browse/SPARK-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15578133#comment-15578133 ]

Cody Koeninger commented on SPARK-17938:
----------------------------------------

There was pretty extensive discussion of this on the mailing list; that should be linked or summarized here.

A couple of things:

100 is the default minimum rate for the PIDRateEstimator. If you're willing to write code, add more logging to determine why that rate isn't being configured, or hardcode it to a different number. I have successfully adjusted that rate using Spark configuration.

The other thing is that if your system takes far longer than 1 second to process 100k records, 100k obviously isn't a reasonable max. Many large batches will be queued while that first batch is running, before backpressure is involved at all. Try a lower max.
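For readers hitting the same floor: the 100-record default Cody mentions appears to correspond to the following setting. The property name is taken from the Spark 2.x PIDRateEstimator internals rather than the documented configuration list, so treat it as an assumption and verify against your Spark version:

```
# Minimum rate (records/sec) the PID estimator will ever emit; defaults to 100.
spark.streaming.backpressure.pid.minRate=100
```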