[ https://issues.apache.org/jira/browse/SPARK-13902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takuya Ueshin updated SPARK-13902:
----------------------------------
    Description:
{{DAGScheduler}} sometimes generates an incorrect stage graph.
Some stages are generated for the same shuffleId twice or more, and child stages end up referencing them, because the graph is built in the wrong order.

Here, we submit an RDD\[F\] having a lineage of RDDs as follows (please see this in {{monospaced}} font):

{noformat}
               <--------------------
             /                       \
[A] <--(1)-- [B] <--(2)-- [C] <--(3)-- [D] <--(4)-- [E] <--(5)-- [F]
                          \                       /
                            <--------------------
{noformat}

Note: \[\] means an RDD, () means a shuffle dependency.

{{DAGScheduler}} generates the following stages and their parents for each shuffle:

| | stage | parents |
| (1) | ShuffleMapStage 2 | List() |
| (2) | ShuffleMapStage 1 | List(ShuffleMapStage 0) |
| (3) | ShuffleMapStage 3 | List(ShuffleMapStage 1) |
| (4) | ShuffleMapStage 4 | List(ShuffleMapStage 2, ShuffleMapStage 3) |
| (5) | ShuffleMapStage 5 | List(ShuffleMapStage 1, ShuffleMapStage 4) |
| \- | ResultStage 6 | List(ShuffleMapStage 5) |

The stage for shuffle id {{0}} (shuffle (1) above) should be {{ShuffleMapStage 0}}, but the stage for shuffle id {{0}} is generated twice: {{ShuffleMapStage 2}} overwrites {{ShuffleMapStage 0}}, while {{ShuffleMapStage 1}} keeps referring to the _old_ stage {{ShuffleMapStage 0}}.

  was:
{{DAGScheduler}} sometimes generates an incorrect stage graph.
Some stages are generated for the same shuffleId twice or more, and child stages end up referencing them, because the graph is built in the wrong order.
Here, we submit an RDD\[F\] having a lineage of RDDs as follows (please see this in {{monospaced}} font):

{noformat}
               <--------------------
             /                       \
[A] <--(1)-- [B] <--(2)-- [C] <--(3)-- [D] <--(4)-- [E] <--(5)-- [F]
                          \                       /
                            <--------------------
{noformat}

{{DAGScheduler}} generates the following stages and their parents for each shuffle id:

| shuffle id | stage | parents |
| 0 | ShuffleMapStage 2 | List() |
| 1 | ShuffleMapStage 1 | List(ShuffleMapStage 0) |
| 2 | ShuffleMapStage 3 | List(ShuffleMapStage 1) |
| 3 | ShuffleMapStage 4 | List(ShuffleMapStage 2, ShuffleMapStage 3) |
| 4 | ShuffleMapStage 5 | List(ShuffleMapStage 1, ShuffleMapStage 4) |
| \- | ResultStage 6 | List(ShuffleMapStage 5) |

The stage for shuffle id {{0}} should be {{ShuffleMapStage 0}}, but the stage for shuffle id {{0}} is generated twice: {{ShuffleMapStage 2}} overwrites {{ShuffleMapStage 0}}, while {{ShuffleMapStage 1}} keeps referring to the _old_ stage {{ShuffleMapStage 0}}.


> Make DAGScheduler.getAncestorShuffleDependencies() return in topological order to ensure building ancestor stages first.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13902
>                 URL: https://issues.apache.org/jira/browse/SPARK-13902
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Takuya Ueshin
>
> {{DAGScheduler}} sometimes generates an incorrect stage graph.
> Some stages are generated for the same shuffleId twice or more, and child stages end up referencing them, because the graph is built in the wrong order.
> Here, we submit an RDD\[F\] having a lineage of RDDs as follows (please see this in {{monospaced}} font):
> {noformat}
>                <--------------------
>              /                       \
> [A] <--(1)-- [B] <--(2)-- [C] <--(3)-- [D] <--(4)-- [E] <--(5)-- [F]
>                           \                       /
>                             <--------------------
> {noformat}
> Note: \[\] means an RDD, () means a shuffle dependency.
> {{DAGScheduler}} generates the following stages and their parents for each shuffle:
> | | stage | parents |
> | (1) | ShuffleMapStage 2 | List() |
> | (2) | ShuffleMapStage 1 | List(ShuffleMapStage 0) |
> | (3) | ShuffleMapStage 3 | List(ShuffleMapStage 1) |
> | (4) | ShuffleMapStage 4 | List(ShuffleMapStage 2, ShuffleMapStage 3) |
> | (5) | ShuffleMapStage 5 | List(ShuffleMapStage 1, ShuffleMapStage 4) |
> | \- | ResultStage 6 | List(ShuffleMapStage 5) |
> The stage for shuffle id {{0}} (shuffle (1) above) should be {{ShuffleMapStage 0}}, but the stage for shuffle id {{0}} is generated twice: {{ShuffleMapStage 2}} overwrites {{ShuffleMapStage 0}}, while {{ShuffleMapStage 1}} keeps referring to the _old_ stage {{ShuffleMapStage 0}}.
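
The ordering requirement in the issue title can be illustrated with a small toy model. This is only a sketch in Python, not Spark's Scala code: the names {{PARENTS}}, {{ancestors_in_topological_order}} and {{build_stages}} are hypothetical, and the parent lists are an assumption read off the diagram and the "parents" column of the table (shuffle (4) reads from (1) and (3), shuffle (5) reads from (2) and (4)). It shows that when ancestor shuffles are returned ancestors-first, each shuffle can be registered exactly once before any child looks it up.

```python
# Toy model only: shuffle labels follow the diagram; none of these names are Spark APIs.
# PARENTS[s] lists the shuffles that the map-side RDD of shuffle s reads from
# (an assumption derived from the diagram and the table of stage parents).
PARENTS = {1: [], 2: [1], 3: [2], 4: [1, 3], 5: [2, 4]}


def ancestors_in_topological_order(shuffle, parents):
    """Return every ancestor of `shuffle`, each one after all of its own ancestors."""
    ordered, seen = [], set()

    def visit(s):
        for p in parents[s]:
            if p not in seen:
                seen.add(p)
                visit(p)
                ordered.append(p)  # post-order: p is emitted after its ancestors

    visit(shuffle)
    return ordered


def build_stages(order, parents):
    """Create one stage per shuffle in the given order, using a shared registry."""
    registry = {}  # shuffle label -> stage id
    for stage_id, s in enumerate(order):
        # Every parent must already be registered; otherwise a second stage
        # would have to be created for it, which is the symptom reported here.
        missing = [p for p in parents[s] if p not in registry]
        if missing:
            raise ValueError(f"shuffle {s} built before its parents {missing}")
        registry[s] = stage_id
    return registry


order = ancestors_in_topological_order(5, PARENTS) + [5]
stages = build_stages(order, PARENTS)
print(order)
print(stages)
```

In this model, handing {{build_stages}} an order in which a child precedes one of its parents raises an error instead of silently creating a second stage for the same shuffle, which is the situation the proposed topological ordering is meant to rule out.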