[ 
https://issues.apache.org/jira/browse/SPARK-56664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-56664:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add a new shuffle implementation to support Real-time Mode (RTM)
> ----------------------------------------------------------------
>
>                 Key: SPARK-56664
>                 URL: https://issues.apache.org/jira/browse/SPARK-56664
>             Project: Spark
>          Issue Type: Epic
>          Components: Structured Streaming
>    Affects Versions: 4.2.0
>            Reporter: Boyang Jerry Peng
>            Priority: Major
>              Labels: pull-request-available
>
> The streaming shuffle is an alternative {{ShuffleManager}} implementation 
> designed for low-latency, continuously running queries (for example, 
> real-time mode in Structured Streaming). Unlike the default sort-based 
> shuffle, it does not materialize map outputs to disk and does not require map 
> tasks to finish before reduce tasks can start. Instead, each map task hosts a 
> network server that pushes records to reduce tasks as they are produced; 
> reduce tasks open clients to those servers and consume records as a stream.
>  
> Design doc:
>  
> https://github.com/jerrypeng/spark/blob/1fe0abd72a317e8f73df1406966f8f49b24e8fd1/docs/streaming-shuffle.md
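
The push-based flow in the description (a map task hosting a server, a reduce task connecting as a client and consuming records as they arrive) can be illustrated with a small, self-contained sketch. This is not the code from the pull request or the design doc. The names run_map_task, run_reduce_task, and demo are hypothetical, the newline-delimited record format is an assumption, and a finite list of records stands in for a continuously running query.

```python
import socket
import threading


def run_map_task(server_sock, records):
    """Hypothetical map side: accept one reducer and push records as they are produced."""
    conn, _ = server_sock.accept()
    with conn:
        for record in records:
            # Send each record immediately; nothing is materialized to disk.
            conn.sendall((record + "\n").encode("utf-8"))
    # In this sketch, closing the connection marks the end of the stream.


def run_reduce_task(host, port):
    """Hypothetical reduce side: connect to the map-side server and stream records."""
    with socket.create_connection((host, port)) as client:
        with client.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                yield line.rstrip("\n")


def demo():
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server_sock.listen(1)
    host, port = server_sock.getsockname()

    records = ["a,1", "b,2", "c,3"]  # assumed sample records
    mapper = threading.Thread(target=run_map_task, args=(server_sock, records))
    mapper.start()

    # The reducer starts consuming while the mapper is still running.
    received = list(run_reduce_task(host, port))
    mapper.join()
    server_sock.close()
    return received
```

Calling demo() returns the records in the order the map side sent them. Partitioning, serialization, backpressure, and failure handling are left out of this sketch.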



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
