[GitHub] [spark] srowen commented on a change in pull request #25598: [SPARK-28542][DOCS][WebUI] Stages Tab

GitBox Fri, 30 Aug 2019 08:41:19 -0700

srowen commented on a change in pull request #25598: [SPARK-28542][DOCS][WebUI] Stages Tab
URL: https://github.com/apache/spark/pull/25598#discussion_r319568836


 ##########
 File path: docs/web-ui.md
 ##########
 @@ -94,9 +94,76 @@ This page displays the details of a specific job identified by its job ID.
 </p>
 
 ## Stages Tab
+
 The Stages tab displays a summary page that shows the current state of all stages of all jobs in
-the Spark application, and, when you click on a stage, a details page for that stage. The details
-page shows the event timeline, DAG visualization, and all tasks for the stage.
+the Spark application.
+
+At the beginning of the page is the summary with the count of all stages by status (active, pending, completed, sikipped, and failed)
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail1.png" title="Stages header" alt="Stages header" width="30%">
+</p>
+
+In [Fair scheduling mode](job-scheduling.html#scheduling-within-an-application) there is a table that displays [pools properties](job-scheduling.html#configuring-pool-properties)
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail2.png" title="Pool properties" alt="Pool properties">
+</p>
+
+After that are the details of stages per status (active, pending, completed, skipped, failed). In active stages, it's possible to kill the stage with the kill link. Only in failed stages, failure reason is shown. Task detail can be accessed by clicking on the description.
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail3.png" title="Stages detail" alt="Stages detail">
+</p>
+
+### Stage detail
+The stage detail page begins with information like total time across all tasks, [Locality level summary](tuning.html#data-locality), [Shuffle Read Size / Records](rdd-programming-guide.html#shuffle-operations) and Associated Job IDs.
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail4.png" title="Stage header" alt="Stage header" width="30%">
+</p>
+
+There is also a visual representation of the directed acyclic graph (DAG) of this stage, where vertices represent the RDDs or DataFrames and the edges represent an operation to be applied.
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail5.png" title="Stage DAG" alt="Stage DAG" width="50%">
+</p>
+
+Summary metrics for all task are represented in a table and in a timeline.
+* **[Tasks deserialization time](configuration.html#compression-and-serialization)**
+* **Duration of tasks**.
+* **GC time** is the total JVM garbage collection time.
+* **Result serialization time** is the time spent serializing the task result on a executor before sending it back to the driver.
+* **Getting result time** is the time that the driver spends fetching task results from workers.
+* **Scheduler delay** is the time the task waits to be scheduled for execution.
+* **Peak execution memory** it's the maximum memory used by the internal data structures created during shuffles, aggregations and joins.
 
 Review comment:
   it's -> is
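
For illustration, the summary-metrics bullets in the hunk above can be modeled in a few lines. This is a hypothetical sketch, not Spark code: `TaskMetrics`, `scheduler_delay` and `summarize` are made-up names, all times are assumed to be milliseconds, and scheduler delay is modeled as the part of a task's wall-clock duration not covered by run time, deserialization, result serialization and getting-result time (my understanding of how the UI derives it, not something this PR states). The min/quartile/max columns are picked by indexing into the sorted values, which may not match Spark's exact quantile method.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Summary-table columns; the quantile positions are an assumption of this sketch.
QUANTILES = (0.0, 0.25, 0.5, 0.75, 1.0)
LABELS = ("Min", "25th percentile", "Median", "75th percentile", "Max")


@dataclass
class TaskMetrics:
    """Hypothetical per-task record (not a Spark class); times in ms."""
    duration: int               # wall-clock time from launch to finish
    run_time: int               # time spent running the task on the executor
    deserialize_time: int       # task deserialization time
    result_serialize_time: int  # serializing the result on the executor
    getting_result_time: int    # driver fetching the result


def scheduler_delay(t: TaskMetrics) -> int:
    """Duration not explained by the other components, clamped at zero (assumed model)."""
    accounted = (t.run_time + t.deserialize_time
                 + t.result_serialize_time + t.getting_result_time)
    return max(0, t.duration - accounted)


def summarize(values: Sequence[int]) -> Dict[str, int]:
    """Min, quartiles and max, picked by index into the sorted values."""
    if not values:
        raise ValueError("need at least one task")
    ordered = sorted(values)
    last = len(ordered) - 1
    return {label: ordered[int(q * last)] for label, q in zip(LABELS, QUANTILES)}


if __name__ == "__main__":
    tasks: List[TaskMetrics] = [
        TaskMetrics(120, 100, 5, 2, 0),
        TaskMetrics(300, 250, 10, 3, 0),
        TaskMetrics(90, 70, 4, 1, 0),
        TaskMetrics(200, 180, 6, 2, 0),
        TaskMetrics(150, 110, 5, 2, 0),
    ]
    print(summarize([t.duration for t in tasks]))
    print(summarize([scheduler_delay(t) for t in tasks]))
```

With these five made-up tasks, durations summarize to Min 90, 25th percentile 120, Median 150, 75th percentile 200 and Max 300, and the modeled scheduler delays (13, 37, 15, 12, 33) to 12, 13, 15, 33 and 37.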

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
