[SPARK-6943] [SPARK-6944] DAG visualization on SparkUI

This patch adds the functionality to display the RDD DAG on the SparkUI.

This DAG describes the relationships between
- an RDD and its dependencies,
- an RDD and its operation scopes, and
- an RDD's operation scopes and the stage / job hierarchy

An operation scope here refers to the existing public API that created a
given RDD (e.g. `textFile`, `treeAggregate`). In the future, this can be
expanded to include higher-level operations such as SQL queries.
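
As a rough illustration of the idea (not the code in this patch), the toy sketch below tags each node with the name of the public method that created it, using a closure-based `withScope` helper. `ScopeSketch`, `Node`, `newNode` and the simplified `withScope` signature are all hypothetical stand-ins; the real logic lives in `RDDOperationScope.scala` and may handle nesting differently. This sketch simply records the innermost active scope.

```scala
// Toy model only: every name here is hypothetical, not Spark's actual API.
object ScopeSketch {
  // Stand-in for an RDD: remembers the scope it was created in and its parents.
  final case class Node(id: Int, scope: Option[String], parents: List[Node])

  // Names of the operations currently executing, innermost first.
  private var scopes: List[String] = Nil
  private var nextId = 0

  // Run `body` with `name` as the active scope, restoring the previous
  // scope afterwards even if `body` throws.
  def withScope[T](name: String)(body: => T): T = {
    scopes = name :: scopes
    try body finally scopes = scopes.tail
  }

  // Create a node tagged with whatever scope is active right now.
  def newNode(parents: Node*): Node = {
    val node = Node(nextId, scopes.headOption, parents.toList)
    nextId += 1
    node
  }

  // "Public API" methods wrap their bodies in a scope named after themselves.
  def textFile(path: String): Node = withScope("textFile") { newNode() }
  def map(parent: Node): Node = withScope("map") { newNode(parent) }

  def main(args: Array[String]): Unit = {
    val lines = textFile("README.md")
    val mapped = map(lines)
    println(s"node ${mapped.id} in scope ${mapped.scope} depends on ${mapped.parents.map(_.id)}")
  }
}
```

Grouping nodes by their `scope` value is what would let a UI draw one labelled box per operation around the RDDs it produced.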

*Note: This blatantly stole a few lines of HTML and JavaScript from #5547 
(thanks shroffpradyumn!)*

Here's what the job page looks like:
<img src="https://issues.apache.org/jira/secure/attachment/12730286/job-page.png" width="700px"/>
and the stage page:
<img src="https://issues.apache.org/jira/secure/attachment/12730287/stage-page.png" width="300px"/>

Author: Andrew Or <and...@databricks.com>

Closes #5729 from andrewor14/viz2 and squashes the following commits:

666c03b [Andrew Or] Round corners of RDD boxes on stage page (minor)
01ba336 [Andrew Or] Change RDD cache color to red (minor)
6f9574a [Andrew Or] Add tests for RDDOperationScope
1c310e4 [Andrew Or] Wrap a few more RDD functions in an operation scope
3ffe566 [Andrew Or] Restore "null" as default for RDD name
5fdd89d [Andrew Or] children -> child (minor)
0d07a84 [Andrew Or] Fix python style
afb98e2 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
0d7aa32 [Andrew Or] Fix python tests
3459ab2 [Andrew Or] Fix tests
832443c [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
429e9e1 [Andrew Or] Display cached RDDs on the viz
b1f0fd1 [Andrew Or] Rename OperatorScope -> RDDOperationScope
31aae06 [Andrew Or] Extract visualization logic from listener
83f9c58 [Andrew Or] Implement a programmatic representation of operator scopes
5a7faf4 [Andrew Or] Rename references to viz scopes to viz clusters
ee33d52 [Andrew Or] Separate HTML generating code from listener
f9830a2 [Andrew Or] Refactor + clean up + document JS visualization code
b80cc52 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
0706992 [Andrew Or] Add link from jobs to stages
deb48a0 [Andrew Or] Translate stage boxes taking into account the width
5c7ce16 [Andrew Or] Connect RDDs across stages + update style
ab91416 [Andrew Or] Introduce visualization to the Job Page
5f07e9c [Andrew Or] Remove more return statements from scopes
5e388ea [Andrew Or] Fix line too long
43de96e [Andrew Or] Add parent IDs to StageInfo
6e2cfea [Andrew Or] Remove all return statements in `withScope`
d19c4da [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
7ef957c [Andrew Or] Fix scala style
4310271 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
aa868a9 [Andrew Or] Ensure that HadoopRDD is actually serializable
c3bfcae [Andrew Or] Re-implement scopes using closures instead of annotations
52187fc [Andrew Or] Rat excludes
09d361e [Andrew Or] Add ID to node label (minor)
71281fa [Andrew Or] Embed the viz in the UI in a toggleable manner
8dd5af2 [Andrew Or] Fill in documentation + miscellaneous minor changes
fe7816f [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz
205f838 [Andrew Or] Reimplement rendering with dagre-d3 instead of viz.js
5e22946 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz
6a7cdca [Andrew Or] Move RDD scope util methods and logic to its own file
494d5c2 [Andrew Or] Revert a few unintended style changes
9fac6f3 [Andrew Or] Re-implement scopes through annotations instead
f22f337 [Andrew Or] First working implementation of visualization with vis.js
2184348 [Andrew Or] Translate RDD information to dot file
5143523 [Andrew Or] Expose the necessary information in RDDInfo
a9ed4f9 [Andrew Or] Add a few missing scopes to certain RDD methods
6b3403b [Andrew Or] Scope all RDD methods


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/863ec0cb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/863ec0cb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/863ec0cb

Branch: refs/heads/branch-1.4
Commit: 863ec0cb4de7dc77987117b35454cf79e240b1e7
Parents: 34edaa8
Author: Andrew Or <and...@databricks.com>
Authored: Mon May 4 16:21:36 2015 -0700
Committer: Andrew Or <and...@databricks.com>
Committed: Mon May 4 16:24:35 2015 -0700

----------------------------------------------------------------------
 .rat-excludes                                   |   3 +
 .../org/apache/spark/ui/static/d3.min.js        |   5 +
 .../org/apache/spark/ui/static/dagre-d3.min.js  |  29 ++
 .../apache/spark/ui/static/graphlib-dot.min.js  |   4 +
 .../org/apache/spark/ui/static/spark-dag-viz.js | 392 +++++++++++++++++++
 .../org/apache/spark/ui/static/webui.css        |   2 +-
 .../scala/org/apache/spark/SparkContext.scala   |  97 +++--
 .../org/apache/spark/rdd/AsyncRDDActions.scala  |  10 +-
 .../apache/spark/rdd/DoubleRDDFunctions.scala   |  38 +-
 .../scala/org/apache/spark/rdd/HadoopRDD.scala  |   6 +-
 .../apache/spark/rdd/OrderedRDDFunctions.scala  |   6 +-
 .../org/apache/spark/rdd/PairRDDFunctions.scala | 167 ++++----
 .../main/scala/org/apache/spark/rdd/RDD.scala   | 341 +++++++++-------
 .../apache/spark/rdd/RDDOperationScope.scala    | 137 +++++++
 .../spark/rdd/SequenceFileRDDFunctions.scala    |   4 +-
 .../org/apache/spark/scheduler/StageInfo.scala  |   2 +
 .../org/apache/spark/storage/RDDInfo.scala      |  11 +-
 .../scala/org/apache/spark/ui/SparkUI.scala     |  10 +-
 .../scala/org/apache/spark/ui/UIUtils.scala     |  55 ++-
 .../org/apache/spark/ui/jobs/AllJobsPage.scala  |   2 +-
 .../apache/spark/ui/jobs/AllStagesPage.scala    |  10 +-
 .../apache/spark/ui/jobs/ExecutorTable.scala    |   2 +-
 .../org/apache/spark/ui/jobs/JobPage.scala      |  14 +-
 .../spark/ui/jobs/JobProgressListener.scala     |  17 +-
 .../org/apache/spark/ui/jobs/JobsTab.scala      |   6 +-
 .../org/apache/spark/ui/jobs/PoolPage.scala     |   4 +-
 .../org/apache/spark/ui/jobs/PoolTable.scala    |   2 +-
 .../org/apache/spark/ui/jobs/StagePage.scala    |  17 +-
 .../org/apache/spark/ui/jobs/StagesTab.scala    |   7 +-
 .../spark/ui/scope/RDDOperationGraph.scala      | 205 ++++++++++
 .../ui/scope/RDDOperationGraphListener.scala    |  68 ++++
 .../org/apache/spark/util/JsonProtocol.scala    |  28 +-
 .../spark/ExecutorAllocationManagerSuite.scala  |   2 +-
 .../spark/rdd/RDDOperationScopeSuite.scala      | 133 +++++++
 .../org/apache/spark/storage/StorageSuite.scala |   4 +-
 .../ui/jobs/JobProgressListenerSuite.scala      |   6 +-
 .../spark/ui/storage/StorageTabSuite.scala      |  30 +-
 .../apache/spark/util/JsonProtocolSuite.scala   |  45 ++-
 38 files changed, 1584 insertions(+), 337 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/863ec0cb/.rat-excludes
----------------------------------------------------------------------
diff --git a/.rat-excludes b/.rat-excludes
index 2238a5b..dccf2db 100644
--- a/.rat-excludes
+++ b/.rat-excludes
@@ -30,6 +30,9 @@ spark-env.sh.template
 log4j-defaults.properties
 bootstrap-tooltip.js
 jquery-1.11.1.min.js
+d3.min.js
+dagre-d3.min.js
+graphlib-dot.min.js
 sorttable.js
 vis.min.js
 vis.min.css

