[ 
https://issues.apache.org/jira/browse/SPARK-29351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-29351:
-----------------------------------

    Assignee: DB Tsai

> Avoid full synchronization in ShuffleMapStage
> ---------------------------------------------
>
>                 Key: SPARK-29351
>                 URL: https://issues.apache.org/jira/browse/SPARK-29351
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.4.4
>         Environment: # 
>            Reporter: DB Tsai
>            Assignee: DB Tsai
>            Priority: Major
>             Fix For: 3.0.0
>
>
> In one of our production streaming jobs, which has more than 1k executors 
> with 20 cores each, Spark spends a significant portion of time (30s) sending 
> out the `ShuffleStatus`. We found two issues:
> # In the driver's message loop, `serializedMapStatus` is called inside a 
> synchronized block. When the job scales up, this causes contention.
> # When the job is big, the `MapStatus` is huge as well, so serialization 
> and compression are slow.
> This work aims to address the first problem.
>  
>  
>  
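For illustration only, the first issue (every caller queuing behind one lock to fetch the serialized statuses) could be eased with a read-write lock, so that concurrent readers of an already-cached result do not block each other. The sketch below is not Spark code and not necessarily the change made for this ticket: `StatusCache`, its methods, and the toy "serialization" via `Arrays.toString` are all hypothetical stand-ins for `ShuffleStatus` and `serializedMapStatus`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a status holder with a cached serialized form.
final class StatusCache {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final int[] statuses;   // guarded by lock
    private byte[] cached;          // guarded by lock; null means "stale"
    private int serializations;     // guarded by lock; counts cache rebuilds

    StatusCache(int numPartitions) {
        this.statuses = new int[numPartitions];
    }

    // Writers take the exclusive lock and invalidate the cached bytes.
    void update(int partition, int value) {
        lock.writeLock().lock();
        try {
            statuses[partition] = value;
            cached = null;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock on the fast path, so many request
    // handlers can fetch the cached bytes at the same time.
    byte[] serialized() {
        lock.readLock().lock();
        try {
            if (cached != null) {
                return cached;
            }
        } finally {
            lock.readLock().unlock();
        }
        // Slow path: rebuild once under the write lock, re-checking the
        // cache because another thread may have rebuilt it in between.
        lock.writeLock().lock();
        try {
            if (cached == null) {
                // Toy serialization; the real payload would be far larger.
                cached = Arrays.toString(statuses).getBytes(StandardCharsets.UTF_8);
                serializations++;
            }
            return cached;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int serializationCount() {
        lock.readLock().lock();
        try {
            return serializations;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

In this sketch the rebuild still happens under the write lock, so it only removes contention between readers of an up-to-date cache; it does nothing for the second issue (the cost of serializing and compressing a large `MapStatus`).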



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
