[ https://issues.apache.org/jira/browse/SPARK-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487793#comment-14487793 ]
Tathagata Das commented on SPARK-6691:
--------------------------------------

Yes, I completely agree with the motivation. There is a need for backpressure. I have a design myself, but I had not written up a design doc. Looking through your design will help me get more concrete ideas, and together we can come up with a good design. Thanks :)

> Abstract and add a dynamic RateLimiter for Spark Streaming
> ----------------------------------------------------------
>
>                 Key: SPARK-6691
>                 URL: https://issues.apache.org/jira/browse/SPARK-6691
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.0
>            Reporter: Saisai Shao
>
> Flow control (or rate control) of input data is very important in a streaming
> system, especially for Spark Streaming to stay stable and keep up with the
> input. An unexpected flood of incoming data, or an ingestion rate beyond the
> computation power of the cluster, will make the system unstable and increase
> delay. With Spark Streaming's job generation and processing pattern, this
> delay accumulates and eventually leads to unacceptable behavior.
> ----
> Currently, in Spark Streaming's receiver-based input stream, there is a
> RateLimiter in BlockGenerator that controls the ingestion rate of input
> data, but the current implementation has several limitations:
> # The max ingestion rate is set by the user through configuration beforehand;
> users may lack the experience to pick an appropriate value before the
> application is running.
> # This configuration is fixed for the lifetime of the application, which
> means you need to configure for the worst-case scenario.
> # Input streams like DirectKafkaInputStream need to maintain a separate
> solution to achieve the same functionality.
> # The lack of slow-start control easily traps the whole system in large
> processing and scheduling delays at the very beginning.
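The limitations above all stem from a rate cap that is fixed at configuration time. As a rough, hypothetical sketch (the names below are illustrative only and do not correspond to Spark's internals or to the design doc), a rate limiter whose cap can be adjusted at runtime from processing-rate feedback, with a TCP-style slow-start ramp, might look like:

```python
import threading
import time


class DynamicRateLimiter:
    """Rate limiter whose maximum rate can be adjusted at runtime,
    e.g. from feedback about the last batch's processing rate.
    All names here are hypothetical sketches, not Spark APIs."""

    def __init__(self, initial_rate, min_rate=1.0, max_rate=float("inf")):
        self._lock = threading.Lock()
        self._rate = initial_rate      # records per second
        self._min_rate = min_rate
        self._max_rate = max_rate

    @property
    def rate(self):
        with self._lock:
            return self._rate

    def update(self, processing_rate):
        """Move the ingestion rate toward the observed processing
        rate, clamped to the configured bounds."""
        with self._lock:
            self._rate = max(self._min_rate,
                             min(self._max_rate, processing_rate))

    def slow_start(self, factor=2.0):
        """Multiplicatively ramp the rate up, slow-start style,
        until the configured ceiling is reached."""
        with self._lock:
            self._rate = min(self._max_rate, self._rate * factor)

    def wait_to_push(self):
        """Sleep long enough to keep pushes under the current rate."""
        time.sleep(1.0 / self.rate)
```

A feedback loop would then call `update()` after each finished job with that job's observed processing rate, so ingestion tracks what the cluster can actually sustain instead of a static worst-case guess.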
> ----
> So here we propose a new dynamic RateLimiter, as well as a new interface for
> the RateLimiter, to improve the whole system's stability. The goals are:
> * Dynamically adjust the ingestion rate according to the processing rate of
> previously finished jobs.
> * Offer a uniform solution not only for receiver-based input streams, but
> also for direct streams like DirectKafkaInputStream and future ones.
> * Slow-start the rate to control network congestion when a job is started.
> * A pluggable framework to make extensions easier to maintain.
> ----
> Here is the design doc
> (https://docs.google.com/document/d/1lqJDkOYDh_9hRLQRwqvBXcbLScWPmMa7MlG8J_TE93w/edit?usp=sharing)
> and working branch
> (https://github.com/jerryshao/apache-spark/tree/dynamic-rate-limiter).
> Any comment would be greatly appreciated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org