hehuiyuan commented on a change in pull request #23999: [docs]Add additional explanation for "Setting the max receiving rate" in streaming-programming-guide.md
URL: https://github.com/apache/spark/pull/23999#discussion_r264024237
##########
File path: docs/streaming-programming-guide.md
##########
@@ -2036,7 +2036,7 @@ To run a Spark Streaming applications, you need to have the following.
 `spark.streaming.receiver.maxRate` for receivers and `spark.streaming.kafka.maxRatePerPartition`
 for Direct Kafka approach. In Spark 1.5, we have introduced a feature called *backpressure* that
 eliminate the need to set this rate limit, as Spark Streaming automatically figures out the
-rate limits and dynamically adjusts them if the processing conditions change. This backpressure
+rate limits and dynamically adjusts them if the processing conditions change.If the first batch of data is very large which causes the first batch is processing all the time and the task can not work normally , using a maximum rate limit can solve the problem .This backpressure

Review comment:
First of all, thank you for your reply.

The original document implies that enabling backpressure removes the need to set this rate limit. However, in actual usage scenarios, such as Spark Streaming consuming from Kafka, the first batch of data is often very large. The first batch then keeps processing for a long time, which affects the normal operation of the job.

Even if the first batch eventually finishes, it takes much longer than the batch interval. Overall, this is less efficient than limiting the first batch to what can be processed within the batch interval and then continuing with subsequent batches.

In short, the statement that enabling backpressure removes the need to set a rate limit is not rigorous.
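To make the concern concrete, here is a rough sketch of the arithmetic, assuming the Direct Kafka approach, where `spark.streaming.kafka.maxRatePerPartition` is a per-partition limit in records per second. The partition count, batch interval, rate, and backlog size are made-up numbers for illustration, and `max_records_per_batch` is a hypothetical helper, not a Spark API.

```python
def max_records_per_batch(max_rate_per_partition, num_partitions, batch_interval_s):
    """Upper bound on the records one batch can pull when a per-partition
    rate limit (records per second per partition) is configured."""
    return max_rate_per_partition * num_partitions * batch_interval_s


# Hypothetical scenario: the job starts against a topic with a large backlog.
backlog = 50_000_000      # records already waiting in Kafka (assumed)
num_partitions = 10       # assumed
batch_interval_s = 5      # assumed batch interval in seconds
max_rate = 1_000          # assumed value for spark.streaming.kafka.maxRatePerPartition

# Without a limit, the first batch may try to read the whole backlog at once.
# With the limit, every batch (including the first) is capped at this size.
cap = max_records_per_batch(max_rate, num_partitions, batch_interval_s)

# Number of capped batches needed to drain the backlog (ceiling division),
# ignoring records that arrive in the meantime.
batches_to_drain = -(-backlog // cap)

print(f"records per batch (cap): {cap}")
print(f"batches to drain backlog: {batches_to_drain}")
```

With these assumed numbers, the first batch is bounded by the same cap as every later batch instead of covering the entire backlog, which is the situation the proposed sentence in the guide is meant to address.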