[ https://issues.apache.org/jira/browse/SPARK-24375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492745#comment-16492745 ]

Wenchen Fan commented on SPARK-24375:
-------------------------------------

For the PySpark side, we don't need to worry about the scheduler, because the 
PySpark driver connects to a JVM driver, and all the scheduling is done in 
the JVM driver.
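
To illustrate the point, here is a minimal Python sketch of that delegation. It 
assumes hypothetical `JvmDriverStub` and `PyRDD` classes standing in for the 
Py4J-backed JVM driver and a PySpark RDD handle; neither is Spark's actual API. 
The Python handle only carries a barrier flag, and every scheduling decision 
stays on the JVM side.

```python
class JvmDriverStub:
    """Hypothetical stand-in for the JVM driver reached over Py4J.

    All scheduling, including barrier (gang) scheduling, is decided here.
    """

    def __init__(self):
        self.jobs = []

    def run_job(self, rdd_id, barrier):
        # A real scheduler would wait until it can launch every task of a
        # barrier stage together; this stub only records the request.
        self.jobs.append({"rdd": rdd_id, "barrier": barrier})
        return len(self.jobs) - 1  # hypothetical job id


class PyRDD:
    """Hypothetical thin Python-side handle.

    It holds no scheduling logic: it only forwards the job, plus the
    barrier flag, to the JVM driver.
    """

    def __init__(self, jvm, rdd_id, barrier=False):
        self._jvm = jvm
        self._rdd_id = rdd_id
        self._barrier = barrier

    def barrier(self):
        # Returns a new handle marked for barrier execution.
        return PyRDD(self._jvm, self._rdd_id, barrier=True)

    def collect(self):
        # Submitting the job is a single call into the JVM driver.
        return self._jvm.run_job(self._rdd_id, self._barrier)
```

Because the Python side holds no scheduler state in this picture, barrier 
scheduling only has to be implemented once, in the JVM.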

For the task barrier, one problem is that we launch a Python worker per task, 
and the Python workers talk to the JVM executor via a socket. It's hard to change 
that protocol to let the Python worker send a signal to the JVM executor 
requesting a sync. Instead, we can set up a Py4J server per task, and the Python 
worker can send the barrier sync request via Py4J.
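
A rough sketch of that request flow in Python, under stated assumptions: a plain 
TCP socket stands in for Py4J, and a `threading.Barrier` stands in for the JVM 
executor's barrier coordination. `ExecutorBarrierStub`, `start_task_server`, 
`python_worker`, `run_barrier_stage` and the `BARRIER`/`OK` line protocol are 
all hypothetical, for illustration only. Each task gets its own server, and each 
worker's sync request blocks until every task in the stage has asked to sync.

```python
import socket
import threading


class ExecutorBarrierStub:
    """Hypothetical stand-in for the JVM executor's barrier coordination."""

    def __init__(self, num_tasks):
        # Every task in the stage must call sync() before any of them returns.
        self._barrier = threading.Barrier(num_tasks, timeout=10)

    def sync(self):
        self._barrier.wait()


def read_line(sock):
    """Read one newline-terminated message from a socket."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:  # peer closed the connection
            break
        buf += chunk
    return buf.decode().strip()


def start_task_server(executor):
    """Start a per-task server (standing in for a per-task Py4J server)
    and return the port it listens on."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # let the OS choose a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve():
        with srv:
            conn, _ = srv.accept()
            with conn:
                if read_line(conn) == "BARRIER":
                    executor.sync()  # block until all tasks requested a sync
                    conn.sendall(b"OK\n")
                else:
                    conn.sendall(b"ERR\n")

    threading.Thread(target=serve, daemon=True).start()
    return port


def python_worker(port, task_id, log):
    """A Python worker: records that it reached the barrier, then asks the
    (simulated) JVM side for a barrier sync over its own task server."""
    log.append(("before", task_id))
    with socket.create_connection(("127.0.0.1", port), timeout=10) as sock:
        sock.sendall(b"BARRIER\n")
        reply = read_line(sock)
    log.append(("after", task_id, reply))


def run_barrier_stage(num_tasks):
    """Run one simulated barrier stage and return the event log."""
    executor = ExecutorBarrierStub(num_tasks)
    log = []  # list.append is atomic in CPython, so threads can share it
    ports = [start_task_server(executor) for _ in range(num_tasks)]
    workers = [
        threading.Thread(target=python_worker, args=(port, i, log))
        for i, port in enumerate(ports)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return log
```

In this sketch the barrier request travels over a separate per-task channel, so 
the existing worker-to-executor data socket and its protocol stay unchanged. 
Running `run_barrier_stage(3)` logs every "before" event ahead of any "after" 
event, since no server replies until all three tasks have reached the barrier.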

> Design sketch: support barrier scheduling in Apache Spark
> ---------------------------------------------------------
>
>                 Key: SPARK-24375
>                 URL: https://issues.apache.org/jira/browse/SPARK-24375
>             Project: Spark
>          Issue Type: Story
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Jiang Xingbo
>            Priority: Major
>
> This task is to outline a design sketch for the barrier scheduling SPIP 
> discussion. It doesn't need to be a complete design before the vote. But it 
> should at least cover both Scala/Java and PySpark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
