[ 
https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162254#comment-14162254
 ] 

Andrew Or commented on SPARK-3797:
----------------------------------

Thanks for detailing the considerations, Sandy. I agree with every single one of 
the drawbacks you listed.

The alternative of launching the shuffle service inside containers has been 
given much thought. However, it would be overkill to allocate one such 
service for each executor, or even for each application. In general, these 
services are intended to be long-running local resource managers that are 
better suited to running per node. As you suggested, they tend to have low 
memory requirements, so inside a container they would be forced to take up 
more memory than they need.
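
To make the memory point concrete, here is a rough sketch. It assumes the 
scheduler rounds each container request up to a multiple of its minimum 
allocation (`yarn.scheduler.minimum-allocation-mb`, 1024 MB by default). The 
128 MB footprint and the function name are made up for illustration and are 
not measurements of the shuffle service.

```python
def granted_memory_mb(requested_mb, min_allocation_mb=1024):
    """Round a container request up to the next multiple of the minimum.

    Assumption: this mirrors how a YARN scheduler normalizes memory
    requests; the real behavior depends on the scheduler in use.
    """
    if requested_mb <= 0 or min_allocation_mb <= 0:
        raise ValueError("memory sizes must be positive")
    # Integer ceiling division, then scale back up to megabytes.
    multiples = (requested_mb + min_allocation_mb - 1) // min_allocation_mb
    return multiples * min_allocation_mb


# Hypothetical footprint of a lightweight per-node service.
requested = 128
granted = granted_memory_mb(requested)
unused = granted - requested  # memory reserved but not needed
print(f"requested={requested} MB, granted={granted} MB, unused={unused} MB")
```

Under these assumptions, most of the container's reservation goes unused, and 
the waste is multiplied if one such container is launched per executor or per 
application rather than once per node.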

For the rolling upgrades point, we can add some logic as in MR to handle short 
outages, as Tom suggested. The dependency and deployment stories are a little 
harder to work around. I think the point here is that either way we need to 
offer an alternative of running it independently of the NM in case the cluster 
has conflicting dependencies. Perhaps we'll need some 
`start-shuffle-service.sh` script to launch these containers on all nodes 
before running any actual Spark application. I should note that our shuffle 
service is intended to be fairly lightweight and will have very limited 
dependencies (e.g. we are considering building it with Java because we don't 
want to bundle Scala). Hopefully that mitigates the issue.

> Run the shuffle service inside the YARN NodeManager as an AuxiliaryService
> --------------------------------------------------------------------------
>
>                 Key: SPARK-3797
>                 URL: https://issues.apache.org/jira/browse/SPARK-3797
>             Project: Spark
>          Issue Type: Sub-task
>          Components: YARN
>            Reporter: Patrick Wendell
>            Assignee: Andrew Or
>
> It's also worth considering running the shuffle service in a YARN container 
> beside the executor(s) on each node.



