[ 
https://issues.apache.org/jira/browse/SPARK-24581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554409#comment-16554409
 ] 

Jiang Xingbo commented on SPARK-24581:
--------------------------------------

Design doc: 
https://docs.google.com/document/d/1r07-vU5JTH6s1jJ6azkmK0K5it6jwpfO6b_K3mJmxR4/edit?usp=sharing

> Design: BarrierTaskContext.barrier()
> ------------------------------------
>
>                 Key: SPARK-24581
>                 URL: https://issues.apache.org/jira/browse/SPARK-24581
>             Project: Spark
>          Issue Type: Story
>          Components: ML, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Jiang Xingbo
>            Priority: Major
>
> We need to provide users with a communication barrier function to help 
> coordinate tasks within a barrier stage. This is very similar to the 
> MPI_Barrier function in MPI. This story covers its design.
>  
> Requirements:
>  * Low latency. Tasks should be unblocked soon after all tasks have 
> reached the barrier. Latency is more important than CPU cycles here.
>  * Support unlimited timeout with proper logging. DL tasks might take 
> very long to converge, so we should support an unlimited timeout, with 
> logging that tells users why a task is waiting.
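
For illustration only, the two requirements above could be sketched in a
single process as follows. This is not the design from the linked doc and
not Spark API: LoggingBarrier, log_interval_s, and run_demo are hypothetical
names, and threads stand in for tasks. A real barrier stage spans executors
and would need cross-process coordination, which this sketch does not
attempt.

```python
import logging
import threading
import time

logger = logging.getLogger("barrier_sketch")


class LoggingBarrier:
    """Hypothetical in-process barrier (an assumption, not Spark API).

    Waiters block with no overall timeout, but wake up every
    `log_interval_s` seconds to log why they are still waiting.
    """

    def __init__(self, num_tasks, log_interval_s=1.0):
        self._num_tasks = num_tasks
        self._log_interval_s = log_interval_s
        self._cond = threading.Condition()
        self._arrived = 0
        self._generation = 0  # incremented each time the barrier releases

    def wait(self, task_id):
        """Block until all tasks arrive; return how many reminders were logged."""
        with self._cond:
            generation = self._generation
            self._arrived += 1
            if self._arrived == self._num_tasks:
                # Last arrival: reset for reuse and wake every waiter at once,
                # so release latency does not depend on the logging interval.
                self._arrived = 0
                self._generation += 1
                self._cond.notify_all()
                return 0
            reminders = 0
            start = time.monotonic()
            while generation == self._generation:
                # Condition.wait returns False when the interval elapses
                # without a notification; that is only a cue to log.
                notified = self._cond.wait(timeout=self._log_interval_s)
                if not notified and generation == self._generation:
                    reminders += 1
                    logger.info(
                        "task %s has waited %.1fs at barrier: %d of %d tasks arrived",
                        task_id,
                        time.monotonic() - start,
                        self._arrived,
                        self._num_tasks,
                    )
            return reminders


def run_demo(num_tasks=3):
    """Run `num_tasks` threads through one barrier; return ids that passed it."""
    barrier = LoggingBarrier(num_tasks, log_interval_s=0.05)
    passed = []

    def task(i):
        time.sleep(0.02 * i)  # stagger arrivals
        barrier.wait(i)
        passed.append(i)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(passed)
```

The periodic wake-up exists only to emit a log line. Release is driven by
notify_all, so a short or long logging interval should not change how
quickly tasks are unblocked.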



