[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161891#comment-14161891 ]
Thomas Graves commented on SPARK-3797:
--------------------------------------

Nice write-up, Sandy. One point on the rolling upgrade part: it would interfere, but you could make the shuffle more tolerant of it, like what was done for MR in MAPREDUCE-5891. There is also a JIRA to eventually move the aux services out of the NodeManager and have them started more dynamically as needed, but as far as I know no one is actively working on it, as it's pretty complicated.

Personally, I see running a shuffle handler in another container alongside the executor as much more wasteful of resources and really hard right now (because of the problems you state). You have to make sure you get a container on the same node, which makes the scheduler logic much more complex, and it could definitely affect startup time. On a busy cluster where you need 2 containers on the same node, you have to deal with things like: what if there is only room for 1? What if you have multiple executors on the same node: do you start one for each executor, or figure out how to share?

At least for now, it seems to me that the aux service, while still supporting running the shuffle directly in the executor as it does now, would be the most straightforward approach. If we do the aux service right, then hopefully changing how it is deployed wouldn't be much work once we have more of the YARN features Sandy mentioned.

> Run the shuffle service inside the YARN NodeManager as an AuxiliaryService
> --------------------------------------------------------------------------
>
>                 Key: SPARK-3797
>                 URL: https://issues.apache.org/jira/browse/SPARK-3797
>             Project: Spark
>          Issue Type: Sub-task
>          Components: YARN
>            Reporter: Patrick Wendell
>            Assignee: Andrew Or
>
> It's also worth considering running the shuffle service in a YARN container
> beside the executor(s) on each node.