[jira] [Commented] (SPARK-17938) Backpressure rate not adjusting

2016-11-02 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629332#comment-15629332
 ] 

Cody Koeninger commented on SPARK-17938:


The direct stream isn't a receiver, so receiver settings don't apply to it.
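
For illustration, here is which settings each mode actually reads; the values are 
placeholders, not recommendations:

```scala
import org.apache.spark.SparkConf

// Illustrative values only; the point is which knobs apply where.
val conf = new SparkConf()
  // Read by the Kafka direct stream: per-partition, per-second cap on fetches.
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  // Read by both modes: enables the backpressure rate estimator.
  .set("spark.streaming.backpressure.enabled", "true")
  // Read only by receiver-based streams; ignored by KafkaUtils.createDirectStream.
  .set("spark.streaming.receiver.maxRate", "1000")
```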

> Backpressure rate not adjusting
> ---
>
> Key: SPARK-17938
> URL: https://issues.apache.org/jira/browse/SPARK-17938
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Samy Dindane
>
> spark-streaming and spark-streaming-kafka-0-10 are both at version 2.0.1. Same 
> behavior with 2.0.0 though.
> spark.streaming.kafka.consumer.poll.ms is set to 3
> spark.streaming.kafka.maxRatePerPartition is set to 10
> spark.streaming.backpressure.enabled is set to true
> `batchDuration` of the streaming context is set to 1 second.
> I consume a Kafka topic using KafkaUtils.createDirectStream().
> My system can handle 100k-record batches, but it takes more than 1 second 
> to process them all. I'd thus expect backpressure to reduce the number of 
> records fetched in the next batch, to keep the processing delay 
> under 1 second.
> However, this does not happen: the backpressure rate stays the same, 
> stuck at `100.0`, no matter how the other variables change (processing time, 
> error, etc.).
> Here's a log showing how all these variables change but the chosen rate stays 
> the same: https://gist.github.com/Dinduks/d9fa67fc8a036d3cad8e859c508acdba (I 
> would have attached a file but I don't see how).
> Is this the expected behavior and am I missing something, or is this a bug?
> I'll gladly help by providing more information or writing code if necessary.
> Thank you.
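
For reference, a rough sketch of the setup described above using the 0-10 direct 
stream API; the broker address, topic name, group id, and rate value are 
placeholders, not the reporter's actual settings:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf()
  .setAppName("backpressure-repro")
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.kafka.maxRatePerPartition", "1000") // placeholder value

val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batches, as above

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092", // placeholder
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "backpressure-repro",
  "auto.offset.reset"  -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Array("my-topic"), kafkaParams))

// Log the per-batch record count to watch the computed rate evolve.
stream.foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))

ssc.start()
ssc.awaitTermination()
```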



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17938) Backpressure rate not adjusting

2016-11-02 Thread Fabien LD (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629049#comment-15629049
 ] 

Fabien LD commented on SPARK-17938:
---

For us, backpressure works fine with:
- spark 2.0.1
- kafka 0.10
- scala 2.11

Here are the settings we have:
```
spark.streaming.backpressure.initialRate=1000
spark.streaming.receiver.maxRate=1
spark.streaming.kafka.maxRatePerPartition=2000
```
The only problem we have is with `spark.streaming.receiver.maxRate`.

Streaming starts at 2000 events per partition (processed in ~4-5s after a few 
initial batches) and then backpressure starts doing its job. But the actual 
average rate is then around 25000 records per second (a bit more than 50,000 
records per 2s batch), which is higher than the configured maxRate. Later, 
because we have caught up with live record production in Kafka, it drops to a 
lower rate.

=> maxRate is never applied. This is a problem, because honoring it would let us 
control warm-up: we could then raise maxRatePerPartition (which we currently have 
to keep low for the first few batches, before backpressure starts doing its job) 
and still absorb spikes happening later in some topics.
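
As a back-of-the-envelope check on those early batches (the partition count below 
is an assumption for illustration, not our actual topic layout):

```scala
// What the direct stream will fetch per batch before backpressure has any
// feedback to work with: rate cap per partition * partitions * batch length.
def maxRecordsPerBatch(ratePerPartition: Long, partitions: Int, batchSeconds: Long): Long =
  ratePerPartition * partitions * batchSeconds

// e.g. maxRatePerPartition=2000, a 2-second batch, and an assumed 25-partition
// topic: 2000 * 25 * 2 = 100000 records per batch until the estimator kicks in.
println(maxRecordsPerBatch(2000L, 25, 2L))
```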



[jira] [Commented] (SPARK-17938) Backpressure rate not adjusting

2016-10-15 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15578133#comment-15578133
 ] 

Cody Koeninger commented on SPARK-17938:


There was a pretty extensive discussion of this on the list; it should be linked 
to or summarized here.

Couple of things here:

100 is the default minimum rate for PIDRateEstimator. If you're willing to write 
code, add more logging to determine why that rate isn't being configured, or 
hardcode it to a different number. I have successfully adjusted that rate using 
Spark configuration.

The other thing is that if your system takes way longer than 1 second to 
process 100k records, 100k obviously isn't a reasonable max. Many large batches 
will already be defined while that first batch is running, before backpressure 
is involved at all. Try a lower max.
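
For example, something along these lines (the values are only illustrative):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  // Floor used by the PID estimator; it defaults to 100, which is the rate the
  // gist shows the job stuck at.
  .set("spark.streaming.backpressure.pid.minRate", "10")
  // Pick a max the job can actually clear within one batch interval, so the
  // batches queued before backpressure kicks in stay manageable.
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
```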
