If you want finer-grained control over the max rate, SPARK-17510 was merged a while ago. There's also SPARK-18580, which might help address the issue of the starting backpressure rate for the first batch.
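For reference, a minimal sketch of the settings involved here (property names per the Spark 2.x streaming docs; the specific rate values are illustrative only, and whether the direct Kafka stream honors initialRate depends on having the SPARK-18580 fix):

```
# spark-defaults.conf (sketch)

# Let the PID rate estimator adjust ingestion rates dynamically
spark.streaming.backpressure.enabled        true

# Cap the first batch, which backpressure cannot yet estimate
# (example value; respected by DirectKafkaInputDStream per SPARK-18580)
spark.streaming.backpressure.initialRate    1000

# Hard upper bound in records/second per Kafka partition; the effective
# batch cap is maxRatePerPartition * numPartitions * batchInterval
spark.streaming.kafka.maxRatePerPartition   2000
```

With both set, backpressure adapts the rate between batches while maxRatePerPartition acts as a ceiling it can never exceed.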
On Mon, Dec 5, 2016 at 4:18 PM, Liren Ding <sky.gonna.bri...@gmail.com> wrote:
> Hey all,
>
> Does backpressure actually work on Spark Kafka streaming? According to the
> latest Spark Streaming documentation:
> http://spark.apache.org/docs/latest/streaming-programming-guide.html
>
> "In Spark 1.5, we have introduced a feature called backpressure that
> eliminates the need to set this rate limit, as Spark Streaming automatically
> figures out the rate limits and dynamically adjusts them if the processing
> conditions change. This backpressure can be enabled by setting the
> configuration parameter spark.streaming.backpressure.enabled to true."
>
> But I also see a few open Spark JIRA tickets on this option:
> https://issues.apache.org/jira/browse/SPARK-7398
> https://issues.apache.org/jira/browse/SPARK-18371
>
> The case in the second ticket describes a similar issue to ours. We use
> Kafka to send large batches (10~100M) to Spark Streaming, and the streaming
> interval is set to 1~4 minutes. With backpressure set to true, the queued
> active batches still pile up whenever the average batch processing time
> exceeds the batch interval. After the Spark driver is restarted, all queued
> batches turn into one giant batch, which blocks subsequent batches and also
> has a good chance of eventually failing. The only config we found that
> might help is "spark.streaming.kafka.maxRatePerPartition". It does limit
> the incoming batch size, but it is not a perfect solution, since the
> effective cap depends on the number of partitions as well as the length of
> the batch interval. In our case, hundreds of partitions times minutes of
> interval still produce batches that are too large. So we still want to
> figure out how to make backpressure work with Spark Kafka streaming, if it
> is supposed to work there. Thanks.
>
> Liren