Github user akonopko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19431#discussion_r166606906

    --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala ---
    @@ -126,7 +129,10 @@ private[spark] class DirectKafkaInputDStream[K, V](
       protected[streaming] def maxMessagesPerPartition(
           offsets: Map[TopicPartition, Long]): Option[Map[TopicPartition, Long]] = {

    -    val estimatedRateLimit = rateController.map(_.getLatestRate())
    +    val estimatedRateLimit = rateController.map(x => {
    +      val lr = x.getLatestRate()
    +      if (lr > 0) lr else initialRate
    --- End diff --

    If the cluster somehow became so heavily loaded with other processes that Spark Streaming processed 0 events in a batch, we could be left with a huge backlog afterwards. Without this fix, the estimated rate would stay at 0 and the system would have a big chance of being overwhelmed.
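To make the guard in the diff concrete, here is a minimal, self-contained Scala sketch of the fallback behavior, not the actual Spark code: the `RateController` trait and `initialRate` value below are simplified stand-ins for Spark's real rate controller and the `spark.streaming.backpressure.initialRate` setting. The point it illustrates is that a non-positive rate estimate falls back to the configured initial rate instead of pinning the limit at zero.

    object RateFallbackSketch {

      // Simplified stand-in for Spark's RateController: returns the latest
      // estimated rate in records/sec, or 0 when no useful estimate exists
      // (e.g. the previous batch processed 0 events).
      trait RateController {
        def getLatestRate(): Long
      }

      // Assumed config value, analogous to
      // spark.streaming.backpressure.initialRate.
      val initialRate: Long = 1000L

      // Use the estimator's value only when it is positive; otherwise fall
      // back to initialRate so throughput can never collapse to zero.
      def effectiveRate(rateController: Option[RateController]): Option[Long] =
        rateController.map { rc =>
          val lr = rc.getLatestRate()
          if (lr > 0) lr else initialRate
        }

      def main(args: Array[String]): Unit = {
        val idleEstimator = new RateController { def getLatestRate(): Long = 0L }
        val busyEstimator = new RateController { def getLatestRate(): Long = 5000L }

        assert(effectiveRate(Some(idleEstimator)).contains(1000L)) // falls back
        assert(effectiveRate(Some(busyEstimator)).contains(5000L)) // uses estimate
        assert(effectiveRate(None).isEmpty)                        // no controller
        println("fallback behaves as expected")
      }
    }

Without the `if (lr > 0)` guard, an estimate of 0 would propagate into `maxMessagesPerPartition` and the stream would never recover from an idle or overloaded batch.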