[ https://issues.apache.org/jira/browse/SPARK-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595374#comment-14595374 ]

Tathagata Das commented on SPARK-7398:
--------------------------------------

I took a look at the whole design doc. It's very well composed, but the actual 
details of the code changes are a little unclear. Now that you have a working 
branch, I strongly recommend writing an additional design doc that skips all 
the intro and background and focuses just on the code changes. 

Here is a design doc for inspiration - the original design doc for the Write 
Ahead Log: 
https://docs.google.com/document/d/1vTCB5qVfyxQPlHuv8rit9-zjdttlgaSrMgfCDQlCJIM/edit#heading=h.9xoxtbgz551y
See the architecture and proposed implementation sections. Accordingly, your 
doc should have the following two sections 

1. Use diagrams to explain the high-level control flow of the architecture, with 
the new classes in the picture and how they interoperate/interface with existing 
classes (BTW, high-level = not as detailed as the control flow that you have in 
the earlier design doc).
2. The details of every class and interface that needs to be introduced or 
modified. Especially focus on the interfaces for (1) the heuristic algorithm and 
(2) the congestion control (see the illustrative sketch below).
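
To make concrete the kind of interface detail I mean, here is a purely 
illustrative sketch. All names (RateEstimator, CongestionControl, 
SimpleRateEstimator) are placeholders I made up, not existing Spark classes; 
the real interfaces should come out of your branch.

trait RateEstimator extends Serializable {
  /**
   * Heuristic hook: given stats from the latest completed batch, return a new
   * rate limit in records per second, or None if no update is warranted.
   */
  def computeRate(
      batchCompletionTimeMs: Long,   // when the batch finished
      numRecords: Long,              // records processed in that batch
      processingDelayMs: Long,       // time spent processing the batch
      schedulingDelayMs: Long        // time the batch waited before processing
    ): Option[Double]
}

trait CongestionControl {
  /** Receiver-side hook: apply a new upper bound on ingestion, in records/sec. */
  def updateRateLimit(recordsPerSecond: Double): Unit
}

/** Toy heuristic: back off when a batch overruns the interval, grow slowly otherwise. */
class SimpleRateEstimator(batchIntervalMs: Long) extends RateEstimator {
  override def computeRate(
      batchCompletionTimeMs: Long,
      numRecords: Long,
      processingDelayMs: Long,
      schedulingDelayMs: Long): Option[Double] = {
    if (numRecords == 0 || processingDelayMs == 0) {
      None
    } else {
      val processedPerSec = numRecords.toDouble * 1000 / processingDelayMs
      val overloaded = processingDelayMs + schedulingDelayMs > batchIntervalMs
      Some(if (overloaded) processedPerSec * 0.8 else processedPerSec * 1.1)
    }
  }
}

The design doc should spell out the real equivalents of these two hooks and 
where they attach in the existing receiver/scheduler code paths.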

This will allow me and others to evaluate the architecture more critically. 

Then, if needed, we can break the task up into smaller sub-tasks (as done in 
the case of the WAL JIRA - 
https://issues.apache.org/jira/browse/SPARK-3129). 

> Add back-pressure to Spark Streaming
> ------------------------------------
>
>                 Key: SPARK-7398
>                 URL: https://issues.apache.org/jira/browse/SPARK-7398
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.1
>            Reporter: François Garillot
>              Labels: streams
>
> Spark Streaming has trouble dealing with situations where 
>  batch processing time > batch interval
> i.e. a throughput of input data that is high relative to Spark's ability to 
> remove data from the queue.
> If this throughput is sustained for long enough, it leads to an unstable 
> situation where the memory of the Receiver's Executor overflows.
> This JIRA aims at transmitting a back-pressure signal back to the data 
> ingestion, to help deal with that high throughput in a backwards-compatible way.
> The design doc can be found here:
> https://docs.google.com/document/d/1ZhiP_yBHcbjifz8nJEyPJpHqxB1FT6s8-Zk7sAfayQw/edit?usp=sharing


