[ https://issues.apache.org/jira/browse/SPARK-24375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492745#comment-16492745 ]
Wenchen Fan commented on SPARK-24375:
-------------------------------------

On the PySpark side, we don't need to worry about the scheduler, because the PySpark driver connects to a JVM driver and all scheduling is done in the JVM driver.

For the task barrier, one problem is that we launch one Python worker per task, and the Python workers talk to the JVM executor via a socket. It's hard to change that protocol so that the Python worker can signal the JVM executor to request a sync. Instead, we can set up a Py4J server per task, and the Python worker can send the barrier sync request via Py4J.

> Design sketch: support barrier scheduling in Apache Spark
> ---------------------------------------------------------
>
>                 Key: SPARK-24375
>                 URL: https://issues.apache.org/jira/browse/SPARK-24375
>             Project: Spark
>          Issue Type: Story
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Jiang Xingbo
>            Priority: Major
>
> This task is to outline a design sketch for the barrier scheduling SPIP
> discussion. It doesn't need to be a complete design before the vote. But it
> should at least cover both Scala/Java and PySpark.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
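A minimal, hypothetical sketch of the per-task sync channel proposed in the comment on SPARK-24375, under loud assumptions. Plain TCP sockets from the Python standard library stand in for Py4J. A `threading.Barrier` stands in for the JVM-side coordination across all tasks in a stage. The names `TaskSyncServer`, `python_worker_barrier`, `run_stage`, and the `BARRIER`/`OK` messages are illustrative only and are not Spark or Py4J APIs. The sketch requires Python 3.8+ for `socket.create_server`.

```python
import socket
import threading


class TaskSyncServer:
    """Hypothetical per-task sync server (stand-in for a Py4J server in the
    JVM executor). It accepts one connection from its Python worker, waits
    on a coordinator shared by the whole stage, then replies."""

    def __init__(self, coordinator):
        self.coordinator = coordinator  # shared by all tasks in the stage
        self.sock = socket.create_server(("127.0.0.1", 0))  # ephemeral port
        self.port = self.sock.getsockname()[1]
        threading.Thread(target=self._serve_one, daemon=True).start()

    def _serve_one(self):
        conn, _ = self.sock.accept()
        with conn, conn.makefile("rb") as reader:
            if reader.readline().strip() == b"BARRIER":
                # Block until every task in the stage has sent its request.
                self.coordinator.wait()
                conn.sendall(b"OK\n")
        self.sock.close()


def python_worker_barrier(port):
    """Worker side: send a sync request and block until the server replies."""
    with socket.create_connection(("127.0.0.1", port)) as s, \
            s.makefile("rb") as reader:
        s.sendall(b"BARRIER\n")
        return reader.readline().strip() == b"OK"


def run_stage(num_tasks):
    """Simulate one barrier stage: one thread per task, one server per task."""
    coordinator = threading.Barrier(num_tasks)
    servers = [TaskSyncServer(coordinator) for _ in range(num_tasks)]
    results = [None] * num_tasks

    def task(i):
        # Per-task work before the barrier would go here.
        results[i] = python_worker_barrier(servers[i].port)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_stage(4))  # [True, True, True, True]
```

The point this illustrates is the design choice in the comment: the existing worker-to-executor data socket is left untouched, and the sync request travels over a separate channel dedicated to each task, so no task's request returns until all tasks in the stage have asked.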