[ 
https://issues.apache.org/jira/browse/SPARK-24723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533205#comment-16533205
 ] 

Saisai Shao commented on SPARK-24723:
-------------------------------------

Hi [~mengxr], what is the goal of this ticket? The goal of the barrier
scheduler is to offer gang semantics at the task scheduling level, whereas
gang semantics at the YARN level concern resource allocation.

I discussed the feasibility of supporting gang semantics on YARN with
[~leftnoteasy]. YARN has a Reservation System that supports gang-like
semantics (it reserves the requested resources), but it is not designed for
gang scheduling. Here are some thoughts on supporting it on YARN:
[https://docs.google.com/document/d/1OA-iVwuHB8wlzwwlrEHOK6Q2SlKy3-5QEB5AmwMVEUU/edit?usp=sharing]
I'm not sure whether this aligns with your goal for this ticket.

> Discuss necessary info and access in barrier mode + YARN
> --------------------------------------------------------
>
>                 Key: SPARK-24723
>                 URL: https://issues.apache.org/jira/browse/SPARK-24723
>             Project: Spark
>          Issue Type: Story
>          Components: ML, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Priority: Major
>
> In barrier mode, to run hybrid distributed DL training jobs, we need to 
> provide users sufficient info and access so they can set up a hybrid 
> distributed training job, e.g., using MPI.
> This ticket limits the scope of discussion to Spark + YARN. There were some 
> past attempts from the Hadoop community. So we should find someone with good 
> knowledge to lead the discussion here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

