[ https://issues.apache.org/jira/browse/SPARK-24723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534529#comment-16534529 ]

Saisai Shao commented on SPARK-24723:
-------------------------------------

After discussing this with Xiangrui offline, we agreed that resource reservation
is not the key focus here. The main problem is how to give barrier tasks the
information they need to start an MPI job in a password-less manner.
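As a rough illustration of the "information" half of the problem, here is a minimal sketch that turns the executor addresses visible to a barrier stage into an OpenMPI-style hostfile (`host slots=N`, one slot per barrier task on that host). It assumes the addresses arrive as `host:port` strings, as returned by `BarrierTaskInfo.address`; the helper name `build_mpi_hostfile` is hypothetical, and the sketch does not address the harder "password-less access" half (how `mpirun` reaches the other hosts without SSH keys).

```python
from collections import Counter
from typing import Iterable


def build_mpi_hostfile(addresses: Iterable[str]) -> str:
    """Hypothetical helper: map 'host:port' strings to OpenMPI hostfile lines.

    Each address counts as one slot on its host. Hosts are listed in order
    of first appearance (Counter preserves insertion order).
    """
    slots: Counter = Counter()
    for addr in addresses:
        host, sep, _port = addr.rpartition(":")
        if not sep:  # no ':' present, so treat the whole string as the host
            host = addr
        slots[host] += 1
    return "\n".join(f"{host} slots={n}" for host, n in slots.items())


if __name__ == "__main__":
    # Two barrier tasks on 10.0.0.1 and one on 10.0.0.2.
    print(build_mpi_hostfile(["10.0.0.1:7077", "10.0.0.2:7077", "10.0.0.1:7078"]))
```

Inside a PySpark barrier stage (`rdd.barrier().mapPartitions(...)`), one task, e.g. the one with `BarrierTaskContext.get().partitionId() == 0`, could call this on `[info.address for info in ctx.getTaskInfos()]` before `ctx.barrier()`. Whether the launcher should instead receive this list from YARN is exactly the open question in this ticket.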

> Discuss necessary info and access in barrier mode + YARN
> --------------------------------------------------------
>
>                 Key: SPARK-24723
>                 URL: https://issues.apache.org/jira/browse/SPARK-24723
>             Project: Spark
>          Issue Type: Story
>          Components: ML, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Priority: Major
>
> In barrier mode, to run hybrid distributed DL training jobs, we need to 
> provide users sufficient info and access so they can set up a hybrid 
> distributed training job, e.g., using MPI.
> This ticket limits the scope of discussion to Spark + YARN. There have been
> some past attempts in the Hadoop community, so we should find someone with
> good knowledge of this area to lead the discussion here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
