[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161891#comment-14161891 ]

Thomas Graves commented on SPARK-3797:
--------------------------------------

Nice write-up, Sandy.

Just to point out on the rolling-upgrade part: it would interfere, but you
could make the shuffle more tolerant of it, like what was done for MR in
MAPREDUCE-5891.
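
As a rough illustration of what "more tolerant" could mean, here is a minimal
sketch that assumes the reduce-side fetcher simply retries a failed fetch with
capped exponential backoff for a bounded window while the NodeManager restarts.
The names `fetch_with_retry`, `fetch_block`, `FetchError` and the timing values
are hypothetical; this is not the actual MAPREDUCE-5891 change.

```python
import time


class FetchError(Exception):
    """Hypothetical error raised when the shuffle server is unreachable."""


def fetch_with_retry(fetch_block, max_wait_s=30.0, initial_delay_s=0.5,
                     max_delay_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_block() until it succeeds or max_wait_s would be exceeded.

    A NodeManager restart during a rolling upgrade shows up here as a
    temporary FetchError; retrying with capped exponential backoff rides
    it out instead of failing the fetch on the first error.
    """
    deadline = clock() + max_wait_s
    delay = initial_delay_s
    while True:
        try:
            return fetch_block()
        except FetchError:
            # Give up if the next wait would run past the deadline.
            if clock() + delay > deadline:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
```

The `sleep` and `clock` parameters are injectable so the retry behaviour can be
exercised without real waiting; a fetch that fails twice and then succeeds
returns its data after 0.5 s + 1.0 s of backoff.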

There is also a JIRA to eventually move the aux services out of the NodeManager
and have them started more dynamically as needed, but as far as I know no one
is actively working on that, as it's pretty complicated.

Personally, I see running a shuffle handler in another container alongside the
executor as much more wasteful of resources, and really hard right now (because
of the problems you state). You have to change it to make sure you get a
container on the same node (the scheduler logic is much more complex), and it
definitely could affect startup time. If you have a busy cluster and need 2
containers on the same node, you have to deal with questions like: what if
there is only room for 1? What if you have multiple executors on the same node?
Are you going to start one shuffle container for each executor, or figure out
how to share one?
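
To make the co-location problem concrete, here is a toy model (not YARN's
scheduler) in which every executor needs a shuffle container on the same node,
either one per executor or one shared per node. `Node`, `place_executor` and the
memory sizes are hypothetical; real YARN placement (locality requests, queues,
preemption) is far more involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_mb: int
    # Whether a shuffle container is already running on this node.
    has_shuffle: bool = False


def place_executor(nodes: List[Node], executor_mb: int, shuffle_mb: int,
                   share_shuffle: bool) -> Optional[str]:
    """Place one executor plus the shuffle container it depends on.

    Both must land on the same node. With share_shuffle=True, a node that
    already hosts a shuffle container only needs room for the executor.
    Returns the chosen node name, or None if no node has room for both.
    """
    for node in nodes:
        need_shuffle = not (share_shuffle and node.has_shuffle)
        need = executor_mb + (shuffle_mb if need_shuffle else 0)
        if node.free_mb >= need:
            node.free_mb -= need
            if need_shuffle:
                node.has_shuffle = True
            return node.name
    return None
```

With one 5000 MB node, 2000 MB executors and a 1000 MB shuffle container, the
per-executor layout fits only one executor while the shared layout fits two,
and a node with 2500 MB free could host an executor alone but not the pair.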

At least for now, it seems to me that the aux service, while still supporting
running the shuffle directly in the executor as we do now, would be the most
straightforward approach. If we do the aux service right, then hopefully
changing how it is deployed wouldn't be much work once we have more of the
YARN features Sandy mentioned.

> Run the shuffle service inside the YARN NodeManager as an AuxiliaryService
> --------------------------------------------------------------------------
>
>                 Key: SPARK-3797
>                 URL: https://issues.apache.org/jira/browse/SPARK-3797
>             Project: Spark
>          Issue Type: Sub-task
>          Components: YARN
>            Reporter: Patrick Wendell
>            Assignee: Andrew Or
>
> It's also worth considering running the shuffle service in a YARN container 
> beside the executor(s) on each node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
