[ https://issues.apache.org/jira/browse/SPARK-24375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569363#comment-16569363 ]
Mridul Muralidharan edited comment on SPARK-24375 at 8/5/18 4:42 AM:
---------------------------------------------------------------------

{quote}
We've thought hard on the issue and don't feel we can make it unless we force users to explicitly set a number in a barrier() call (actually it's not a good idea because it brings more burden to manage the code).
{quote}

I am not sure where the additional burden exists. Make it an optional param to barrier.
* If not defined, it would be analogous to what exists right now.
* If specified, fail the stage if different tasks in the stage end up waiting on different barrier names (or some have a name and others don't).

In the example use cases I have seen, there is usually partition-specific code (if partition 0, do some initialization/teardown, etc.), which results in divergent code paths and so increases the potential for this issue.
It will be very difficult to reason about the state when this happens.
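To make the proposal concrete, here is a minimal sketch in plain Python of the check described in the two bullets. It is not Spark's API: `BarrierCoordinator`, `BarrierNameMismatch`, and `arrive` are hypothetical names invented for illustration, and the sketch assumes a single driver-side object that sees every task's barrier call in turn.

```python
from typing import Dict, Optional


class BarrierNameMismatch(Exception):
    """Raised when tasks in one stage wait on different barrier names."""


class BarrierCoordinator:
    """Hypothetical bookkeeping for one barrier sync in a stage (not Spark's API)."""

    def __init__(self, num_tasks: int) -> None:
        self.num_tasks = num_tasks
        # partition id -> barrier name that task is waiting on (None = unnamed)
        self.arrived: Dict[int, Optional[str]] = {}

    def arrive(self, partition_id: int, name: Optional[str] = None) -> bool:
        """Record one task reaching barrier(name).

        Returns True once every task has arrived with the same name.
        Raises BarrierNameMismatch as soon as two tasks disagree, which
        also covers the case where some tasks pass a name and others do not.
        """
        self.arrived[partition_id] = name
        if len(set(self.arrived.values())) > 1:
            raise BarrierNameMismatch(
                f"tasks are waiting on different barriers: {self.arrived}"
            )
        if len(self.arrived) == self.num_tasks:
            self.arrived.clear()  # reset for the next barrier in the stage
            return True
        return False
```

With no name given, every task records `None` and the coordinator behaves like an unnamed barrier. If partition 0 takes an initialization branch and calls `arrive(0, "init")` while partition 1 calls `arrive(1, "train")`, the second call raises `BarrierNameMismatch` instead of letting the two tasks silently pair up at unrelated barriers.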
> Design sketch: support barrier scheduling in Apache Spark
> ---------------------------------------------------------
>
>                 Key: SPARK-24375
>                 URL: https://issues.apache.org/jira/browse/SPARK-24375
>             Project: Spark
>          Issue Type: Story
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Jiang Xingbo
>            Priority: Major
>
> This task is to outline a design sketch for the barrier scheduling SPIP
> discussion. It doesn't need to be a complete design before the vote. But it
> should at least cover both Scala/Java and PySpark.