zhengruifeng commented on a change in pull request #25349: 
[SPARK-28538][UI][WIP] Document SQL page
URL: https://github.com/apache/spark/pull/25349#discussion_r311332004
 
 

 ##########
 File path: docs/monitoring.md
 ##########
 @@ -40,6 +40,100 @@ To view the web UI after the fact, set 
`spark.eventLog.enabled` to true before starting the application. This configures Spark to log
Spark events that encode the information displayed in the UI to persisted storage.
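+
+As a minimal sketch (the application name and log directory below are placeholders, and the
+directory must already exist), event logging can be enabled when the session is built:
+{% highlight scala %}
+import org.apache.spark.sql.SparkSession
+
+// Enable event logging so the web UI can be reconstructed after the fact.
+// spark.eventLog.dir must point to an existing, writable location.
+val spark = SparkSession.builder()
+  .appName("event-log-example")
+  .config("spark.eventLog.enabled", "true")
+  .config("spark.eventLog.dir", "/tmp/spark-events")
+  .getOrCreate()
+{% endhighlight %}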
 
+## Web UI Tabs
+The web UI provides an overview of the Spark cluster and is composed of the following tabs:
+
+### Jobs Tab
+The Jobs tab displays a summary page of all jobs in the Spark application and 
a detailed page
+for each job. The summary page shows high-level information, such as the 
status, duration, and
+progress of all jobs and the overall event timeline. When you click on a job 
on the summary
+page, you see the detailed page for that job. The detailed page further shows 
the event timeline,
+DAG visualization, and all stages of the job.
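+
+As a minimal sketch (shell output trimmed, data arbitrary), each action below runs one job that
+then shows up on the summary page:
+{% highlight scala %}
+scala> val rdd = sc.parallelize(1 to 100, 4)
+
+scala> rdd.count          // action: runs the first job
+res0: Long = 100
+
+scala> rdd.map(_ * 2).sum // action: runs a second job
+res1: Double = 10100.0
+{% endhighlight %}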
+
+### Stages Tab
+The Stages tab displays a summary page that shows the current state of all 
stages of all jobs in
+the Spark application, and, when you click on a stage, a detailed page for 
that stage. The details
+page shows the event timeline, DAG visualization, and all tasks for the stage.
+
+### Storage Tab
+The Storage tab displays the persisted RDDs, if any, in the application. The summary page shows
+the storage levels, sizes and partitions of all RDDs, and the detailed page shows the sizes and
+the executors used for all partitions in an RDD.
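+
+As a minimal sketch (shell output trimmed), persisting a small RDD and running an action on it
+makes it appear on this tab:
+{% highlight scala %}
+scala> import org.apache.spark.storage.StorageLevel
+
+scala> val rdd = sc.parallelize(1 to 100, 4).setName("numbers")
+
+scala> rdd.persist(StorageLevel.MEMORY_ONLY)
+
+scala> rdd.count   // materializes the RDD so it is listed on the Storage tab
+res0: Long = 100
+{% endhighlight %}
+Cached DataFrames (for example via `df.cache()`) appear here in the same way once an action
+materializes them.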
+
+### Environment Tab
+The Environment tab displays the values for the different environment and 
configuration variables,
+including JVM, Spark, and system properties.
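+
+The Spark properties listed on this tab can also be read from code, as in this small sketch
+(output depends on your configuration):
+{% highlight scala %}
+scala> sc.getConf.getAll.filter(_._1.startsWith("spark.eventLog")).foreach(println)
+(spark.eventLog.enabled,true)
+(spark.eventLog.dir,/tmp/spark-events)
+{% endhighlight %}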
+
+### Executors Tab
+The Executors tab displays summary information about the executors that were 
created for the
+application, including memory and disk usage and task and shuffle information. 
The Storage Memory
+column shows the amount of memory used and reserved for caching data.
+
+### SQL Tab
+If the application executes Spark SQL queries, the SQL tab displays 
information, such as the duration,
+jobs, and physical and logical plans for the queries. Here we include a basic 
example to illustrate
+this tab:
+{% highlight scala %}
+scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", "name")
+df: org.apache.spark.sql.DataFrame = [count: int, name: string]
+
+scala> df.count
+res0: Long = 3
+
+scala> df.createGlobalTempView("df")
+
+scala> spark.sql("select name,sum(count) from global_temp.df group by name").show
++----+----------+
+|name|sum(count)|
++----+----------+
+|andy|         3|
+| bob|         2|
++----+----------+
+{% endhighlight %}
+
+<p style="text-align: center;">
+  <img src="img/webui-sql-tab.png"
+       title="SQL tab"
+       alt="SQL tab"
+       width="80%" />
+  <!-- Images are downsized intentionally to improve quality on retina 
displays -->
+</p>
+
+Now the above three DataFrame/SQL operators are shown in the list. If we click the
+'show at \<console\>: 24' link of the last query, we will see the DAG of the job.
+
+<p style="text-align: center;">
+  <img src="img/webui-sql-dag.png"
+       title="SQL DAG"
+       alt="SQL DAG"
+       width="50%" />
+  <!-- Images are downsized intentionally to improve quality on retina 
displays -->
+</p>
+
+We can see the detailed information of each stage. The first block 'WholeStageCodegen'
+compiles multiple operators ('LocalTableScan' and 'HashAggregate') together into a single Java
+function to improve performance, and metrics like the number of rows and spill size are listed in
+the block. The second block 'Exchange' shows the metrics on the shuffle exchange, including
+the number of written shuffle records, total data size, etc.
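+
+The same blocks can be related back to the query with `explain`, which prints the physical plan.
+Below is a sketch of the typical output (exact column IDs, shuffle partition count, and formatting
+vary across Spark versions); operators prefixed with `*` are the ones compiled together by
+whole-stage code generation:
+{% highlight scala %}
+scala> spark.sql("select name,sum(count) from global_temp.df group by name").explain
+== Physical Plan ==
+*(2) HashAggregate(keys=[name#8], functions=[sum(cast(count#7 as bigint))])
++- Exchange hashpartitioning(name#8, 200)
+   +- *(1) HashAggregate(keys=[name#8], functions=[partial_sum(cast(count#7 as bigint))])
+      +- LocalTableScan [count#7, name#8]
+{% endhighlight %}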
+
+
+<p style="text-align: center;">
+  <img src="img/webui-sql-plan.png"
 
 Review comment:
   Hi @dongjoon-hyun, I ran the example locally and made the screenshots as png files.
