[ 
https://issues.apache.org/jira/browse/SPARK-20288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963884#comment-15963884
 ] 

Apache Spark commented on SPARK-20288:
--------------------------------------

User 'jinxing64' has created a pull request for this issue:
https://github.com/apache/spark/pull/17603

> Improve BasicSchedulerIntegrationSuite "multi-stage job"
> --------------------------------------------------------
>
>                 Key: SPARK-20288
>                 URL: https://issues.apache.org/jira/browse/SPARK-20288
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>            Priority: Minor
>
> The shuffleId is determined before the job is submitted, but it is hard to
> predict the stageId from the shuffleId.
> Stages are created in DAGScheduler
> (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L381),
> but the order in which they are created is not deterministic.
> I added a log statement
> (println(s"Creating ShufflMapStage-$id on shuffle-${shuffleDep.shuffleId}"))
> after
> (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L331)
> and ran BasicSchedulerIntegrationSuite "multi-stage job". It prints either:
> Creating ShufflMapStage-0 on shuffle-0
> Creating ShufflMapStage-1 on shuffle-2
> Creating ShufflMapStage-2 on shuffle-1
> Creating ShufflMapStage-3 on shuffle-3
> or
> Creating ShufflMapStage-0 on shuffle-1
> Creating ShufflMapStage-1 on shuffle-3
> Creating ShufflMapStage-2 on shuffle-0
> Creating ShufflMapStage-3 on shuffle-2
> So it might be better to avoid generating the MapStatus based on the stageId.
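
A minimal, Spark-free sketch of the point above, assuming nothing about the
real test suite: the two stageId/shuffleId pairings are copied from the logged
runs, while the object name, the helper names, and the per-shuffle reducer
counts are hypothetical and exist only for illustration. It shows that a value
looked up directly by stageId does not match the stage's actual shuffle, while
resolving the shuffleId first gives the right value in either run.

```scala
// Hypothetical illustration only: none of these names are Spark APIs.
object ShuffleKeyedLookup {
  // (stageId -> shuffleId) pairs copied from the two logged runs.
  val runA: Seq[(Int, Int)] = Seq(0 -> 0, 1 -> 2, 2 -> 1, 3 -> 3)
  val runB: Seq[(Int, Int)] = Seq(0 -> 1, 1 -> 3, 2 -> 0, 3 -> 2)

  // Made-up expectation per shuffle (e.g. a reducer count), keyed by shuffleId.
  val reducersByShuffle: Map[Int, Int] = Map(0 -> 10, 1 -> 5, 2 -> 7, 3 -> 2)

  // Fragile: silently assumes stageId == shuffleId.
  def reducersAssumingStageIsShuffle(stageId: Int): Int =
    reducersByShuffle(stageId)

  // Robust: resolve the stage's shuffleId for the observed run first.
  def reducersForStage(run: Seq[(Int, Int)], stageId: Int): Int =
    reducersByShuffle(run.toMap.apply(stageId))

  def main(args: Array[String]): Unit = {
    println(reducersForStage(runA, 1))           // stage 1 is shuffle 2 here: 7
    println(reducersForStage(runB, 1))           // stage 1 is shuffle 3 here: 2
    println(reducersAssumingStageIsShuffle(1))   // 5, wrong for both runs
  }
}
```

In a test, the same idea would mean choosing the expected map output from the
task's shuffle dependency rather than from its stage number.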



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
