[jira] [Updated] (SPARK-20288) Improve BasicSchedulerIntegrationSuite "multi-stage job"

jin xing (JIRA) Mon, 10 Apr 2017 23:20:07 -0700

     [ https://issues.apache.org/jira/browse/SPARK-20288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


jin xing updated SPARK-20288:
-----------------------------
    Description: 
ShuffleId is determined before the job is submitted, but it is hard to predict
the stageId from the shuffleId.
Stages are created in DAGScheduler
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L381),
but the order in which they are created, and therefore the stageId each shuffle receives, is not deterministic.
I added a log (println(s"Creating ShufflMapStage-$id on shuffle-${shuffleDep.shuffleId}"))
after
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L331)
and ran BasicSchedulerIntegrationSuite "multi-stage job". It prints either:
Creating ShufflMapStage-0 on shuffle-0
Creating ShufflMapStage-1 on shuffle-2
Creating ShufflMapStage-2 on shuffle-1
Creating ShufflMapStage-3 on shuffle-3
or
Creating ShufflMapStage-0 on shuffle-1
Creating ShufflMapStage-1 on shuffle-3
Creating ShufflMapStage-2 on shuffle-0
Creating ShufflMapStage-3 on shuffle-2
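
A small Scala sketch makes the mismatch concrete: it parses the two logged runs above into stageId -> shuffleId maps and prints which shuffle each stage was bound to in each run. The object name StageOrderDemo is hypothetical; the only data it uses is the log output from this report.

```scala
// Parses the "Creating ShufflMapStage-<stageId> on shuffle-<shuffleId>" lines
// from the two runs logged in this report (StageOrderDemo is a hypothetical name).
object StageOrderDemo {
  val logRunA: Seq[String] = Seq(
    "Creating ShufflMapStage-0 on shuffle-0",
    "Creating ShufflMapStage-1 on shuffle-2",
    "Creating ShufflMapStage-2 on shuffle-1",
    "Creating ShufflMapStage-3 on shuffle-3")
  val logRunB: Seq[String] = Seq(
    "Creating ShufflMapStage-0 on shuffle-1",
    "Creating ShufflMapStage-1 on shuffle-3",
    "Creating ShufflMapStage-2 on shuffle-0",
    "Creating ShufflMapStage-3 on shuffle-2")

  // Regex extractors match the whole line, capturing stageId and shuffleId.
  private val Line = """Creating ShufflMapStage-(\d+) on shuffle-(\d+)""".r

  /** Builds a stageId -> shuffleId map from the log lines. */
  def stageToShuffle(log: Seq[String]): Map[Int, Int] =
    log.collect { case Line(stage, shuffle) => stage.toInt -> shuffle.toInt }.toMap

  def main(args: Array[String]): Unit = {
    val a = stageToShuffle(logRunA)
    val b = stageToShuffle(logRunB)
    // The same stageId refers to a different shuffle in each run.
    for (stage <- a.keys.toSeq.sorted)
      println(s"stage $stage: shuffle ${a(stage)} in run A, shuffle ${b(stage)} in run B")
  }
}
```

Every stageId from 0 to 3 maps to a different shuffle in the two runs, so a stageId alone cannot tell a test which shuffle it is looking at.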

So it might be better to avoid generating the MapStatus based on the stageId.
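
As a hedged sketch of that direction, assuming a test backend that fabricates map output for each shuffle, the Scala below compares deriving the fake output from the stageId with deriving it from the shuffleId, using the two assignments from the logs. FakeMapStatus, fakeStatus, stageKeyed and shuffleKeyed are hypothetical names for illustration; they are not Spark's MapStatus API or the suite's actual helpers.

```scala
// Hypothetical fake map output; this is NOT Spark's MapStatus class.
final case class FakeMapStatus(host: String, blockSizes: Seq[Long])

object KeyByShuffleId {
  // stageId -> shuffleId assignments from the two runs logged in the report.
  val runA: Map[Int, Int] = Map(0 -> 0, 1 -> 2, 2 -> 1, 3 -> 3)
  val runB: Map[Int, Int] = Map(0 -> 1, 1 -> 3, 2 -> 0, 3 -> 2)

  // Assumed helper: builds deterministic fake output from an integer key.
  def fakeStatus(key: Int, numReducers: Int): FakeMapStatus =
    FakeMapStatus(s"host-$key", Seq.fill(numReducers)(key.toLong + 1))

  // Keyed by stageId: the output a shuffle gets depends on its (nondeterministic) stageId.
  def stageKeyed(run: Map[Int, Int]): Map[Int, FakeMapStatus] =
    run.map { case (stageId, shuffleId) => shuffleId -> fakeStatus(stageId, numReducers = 2) }

  // Keyed by shuffleId: the output depends only on the shuffleId, fixed before submission.
  def shuffleKeyed(run: Map[Int, Int]): Map[Int, FakeMapStatus] =
    run.values.map(shuffleId => shuffleId -> fakeStatus(shuffleId, numReducers = 2)).toMap

  def main(args: Array[String]): Unit = {
    println(stageKeyed(runA) == stageKeyed(runB))     // false
    println(shuffleKeyed(runA) == shuffleKeyed(runB)) // true
  }
}
```

Because the shuffleId is known before the job is submitted, shuffleKeyed gives each shuffle the same fake output in both runs, while stageKeyed gives shuffle 0 different output depending on which stageId it happened to receive.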

  was:
ShuffleId is determined before the job is submitted, but it is hard to predict
the stageId from the shuffleId.
Stages are created in DAGScheduler
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L381),
but the order in which they are created is not deterministic.
I added a log after
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L331)
and ran BasicSchedulerIntegrationSuite "multi-stage job". It prints either:
Creating ShufflMapStage-0 on shuffle-0
Creating ShufflMapStage-1 on shuffle-2
Creating ShufflMapStage-2 on shuffle-1
Creating ShufflMapStage-3 on shuffle-3
or
Creating ShufflMapStage-0 on shuffle-1
Creating ShufflMapStage-1 on shuffle-3
Creating ShufflMapStage-2 on shuffle-0
Creating ShufflMapStage-3 on shuffle-2

So it might be better to avoid generating the MapStatus based on the stageId.


> Improve BasicSchedulerIntegrationSuite "multi-stage job"
> --------------------------------------------------------
>
>                 Key: SPARK-20288
>                 URL: https://issues.apache.org/jira/browse/SPARK-20288
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

