GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/11720

    [SPARK-13902][SCHEDULER] Make DAGScheduler.getAncestorShuffleDependencies() 
return in topological order to ensure building ancestor stages first.

    ## What changes were proposed in this pull request?
    
    `DAGScheduler` sometimes generates an incorrect stage graph.
    Some stages are generated two or more times for the same shuffleId and are 
referenced by child stages, because the stages are not built in the correct 
order.
    
    This patch fixes the problem by making 
`DAGScheduler.getAncestorShuffleDependencies()` return the dependencies in 
topological order, so that ancestor stages are built before the stages that 
depend on them.
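    
    To illustrate, here is a minimal, hypothetical sketch of the kind of RDD 
graph involved (it is not the graph added to `DAGSchedulerSuite`, and all 
names are invented for illustration): one shuffle is an ancestor of the final 
stage along two paths, one of which passes through a second shuffle, so the 
scheduler must build the stage for the inner shuffle before the stage that 
depends on it.
    
    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    
    // Hypothetical, self-contained sketch (not the test added in this patch).
    object NestedShuffleExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("nested-shuffles").setMaster("local[2]"))
        try {
          val rddA = sc.parallelize(1 to 100).map(x => (x % 10, x))
          val rddB = rddA.reduceByKey(_ + _, 2)                  // shuffle 1
          val rddC = rddB.mapValues(_ * 2).reduceByKey(_ + _, 3) // shuffle 2, downstream of shuffle 1
          val rddD = rddB.join(rddC)                             // shuffle 1 is an ancestor along two paths
          println(rddD.count())
        } finally {
          sc.stop()
        }
      }
    }
    ```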
    
    ## How was this patch tested?
    
    I added a test with a sample RDD graph to `DAGSchedulerSuite` that 
reproduces the incorrect stage graph.
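    
    As a rough, hypothetical illustration of what such a check can look like 
from the outside (the actual test uses the scheduler's internal test harness 
in `DAGSchedulerSuite`, not this listener), a `SparkListener` can count the 
stages submitted for a job, so duplicated stages would show up as a higher 
count than expected:
    
    ```scala
    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted}
    
    // Hypothetical helper (not part of this patch): counts submitted stages so a
    // test can assert that no duplicated stages were created for a job.
    class StageCountingListener extends SparkListener {
      @volatile var submittedStages = 0
      override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit = {
        submittedStages += 1
      }
    }
    
    // Usage sketch, assuming `sc` and `rddD` from the example above; the expected
    // stage count is a placeholder to be filled in for the concrete graph.
    // val listener = new StageCountingListener
    // sc.addSparkListener(listener)
    // rddD.count()
    // assert(listener.submittedStages == expectedStageCount)
    ```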


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-13902

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11720.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11720
    
----
commit 9a1724de0287b5ca41e30f3d3401fd721a2e1520
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2016-03-15T02:21:09Z

    Add a test to check if the stage graph is properly built.

commit f8b7910ecb52a5954de091ed79d5de9c19ba2744
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2016-03-15T02:22:42Z

    Make DAGScheduler.getAncestorShuffleDependencies() return in topological 
order to ensure building ancestor stages first.

----


