This is an automated email from the ASF dual-hosted git repository.
yaooqinn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 897c2619d09f [SPARK-55846][DOCS] Update Web UI documentation for UI modernization
897c2619d09f is described below
commit 897c2619d09fab58747a2cb53f3b7cb62fc7e695
Author: Kent Yao <[email protected]>
AuthorDate: Sun May 24 22:22:38 2026 +0800
[SPARK-55846][DOCS] Update Web UI documentation for UI modernization
### What changes were proposed in this pull request?
This PR rewrites `docs/web-ui.md` for the modernized Spark Web UI delivered
under SPARK-55760.
Highlights:
- **Overview**: brief description of how to access the UI, port-walking
behavior, `spark.ui.port` / `spark.ui.enabled`, and the new tab navbar.
- **Jobs / Stages tabs**: simplified from 15+ inline screenshots to 4
representative ones (`AllJobsPage.png`, `JobPage.png`, `AllStagesPage.png`,
`StagePage.png`); removed the stale "Started At / User / Total uptime" prose,
since those values are now shown in the page footer.
- **Storage tab**: refreshed screenshots only.
- **Environment tab**: rewritten as an overview plus a list of the seven
new subtabs (Runtime Information, Spark Properties, Resource Profiles, Hadoop
Properties, System Properties, Metrics Properties, Classpath Entries).
- **Executors tab**: documents the new **Thread Dump**, **Heap Histogram**,
and **Flame Graph** side panel (drag-resizable from the left edge). Generalized
the **stderr** / **stdout** link description so it isn't standalone-only.
- **SQL tab**: new structure with `Query Listing`, `SQL Plan Visualization`,
`Execution Detail Page`, and the existing SQL metrics table.
Documents pan/zoom, in-graph metrics, node search, and the side-panel node
details.
- Refreshed all 10 corresponding screenshots and added 4 new ones for the
Jobs/Stages pages.
The Structured Streaming, Streaming (DStreams), and JDBC/ODBC Server
sections are left untouched.
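For readers unfamiliar with the port-walking behavior mentioned in the Overview
highlight, here is a minimal Python sketch of the idea. It is not Spark's
implementation: the function name `first_free_port`, the loopback host, and the
retry bound of 16 are illustrative assumptions. It simply probes ports upward
from a starting port until one can be bound.

```python
import socket


def first_free_port(start_port: int = 4040, max_retries: int = 16,
                    host: str = "127.0.0.1") -> int:
    """Return the first bindable port in [start_port, start_port + max_retries].

    Hypothetical helper for illustration only; the retry bound is an assumption.
    """
    for offset in range(max_retries + 1):
        candidate = start_port + offset
        if candidate > 65535:
            break  # ran past the valid TCP port range
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, candidate))
            except OSError:
                continue  # port is taken (or not permitted); try the next one
            return candidate
    raise RuntimeError(f"no free port found starting at {start_port}")
```

With one application already holding 4040, a second call would be expected to
land on 4041 or a later port, which mirrors what the rewritten Overview
describes.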
### Why are the changes needed?
The Web UI was substantially modernized for Spark 5.0 (Bootstrap 5,
DataTables-based listings, side panels, plan-viz pan/zoom, etc.), and
`docs/web-ui.md` still described the pre-modernization layout in many places.
This PR brings the user-facing documentation in line with the shipping UI.
### Does this PR introduce _any_ user-facing change?
Documentation only.
### How was this patch tested?
- Re-rendered the page locally and verified all referenced screenshots
resolve.
- All screenshots were captured against `master` running a small demo Spark
application.
- A reader-test pass was performed to find gaps for first-time users and
the resulting issues were addressed.
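The first check above (that every referenced screenshot resolves) could be
scripted roughly as follows. This is a hypothetical helper, not the procedure
actually used for this patch: the function name `missing_images` and the
regular expression are assumptions, and it only inspects HTML `<img src="...">`
tags of the kind `docs/web-ui.md` uses.

```python
import re
from pathlib import Path

# Matches the src attribute of an HTML <img> tag, including tags whose
# attributes are spread over several lines.
IMG_SRC = re.compile(r'<img\s[^>]*?src="([^"]+)"', re.IGNORECASE)


def missing_images(markdown_path):
    """Return the sorted local image paths referenced by the page that do not exist."""
    page = Path(markdown_path)
    text = page.read_text(encoding="utf-8")
    referenced = sorted(set(IMG_SRC.findall(text)))
    return [
        src for src in referenced
        # Remote images cannot be checked on disk, so skip them.
        if not src.startswith(("http://", "https://"))
        and not (page.parent / src).is_file()
    ]
```

Running `missing_images("docs/web-ui.md")` from the repository root would
return an empty list when every local image referenced by the page is present.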
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: GitHub Copilot CLI 1.0.51-2 with Claude Opus 4.7
Closes #56013 from yaooqinn/SPARK-55846.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
docs/img/AllJobsPage.png | Bin 0 -> 245946 bytes
docs/img/AllJobsPageDetail1.png | Bin 20567 -> 0 bytes
docs/img/AllJobsPageDetail2.png | Bin 70557 -> 0 bytes
docs/img/AllJobsPageDetail3.png | Bin 94804 -> 0 bytes
docs/img/AllStagesPage.png | Bin 0 -> 273153 bytes
docs/img/AllStagesPageDetail1.png | Bin 18689 -> 0 bytes
docs/img/AllStagesPageDetail2.png | Bin 21445 -> 0 bytes
docs/img/AllStagesPageDetail3.png | Bin 124154 -> 0 bytes
docs/img/AllStagesPageDetail4.png | Bin 18098 -> 0 bytes
docs/img/AllStagesPageDetail5.png | Bin 33047 -> 0 bytes
docs/img/AllStagesPageDetail6.png | Bin 163423 -> 0 bytes
docs/img/AllStagesPageDetail7.png | Bin 26097 -> 0 bytes
docs/img/AllStagesPageDetail8.png | Bin 10501 -> 0 bytes
docs/img/AllStagesPageDetail9.png | Bin 61336 -> 0 bytes
docs/img/JobPage.png | Bin 0 -> 85873 bytes
docs/img/JobPageDetail1.png | Bin 73262 -> 0 bytes
docs/img/JobPageDetail2.png | Bin 24088 -> 0 bytes
docs/img/JobPageDetail3.png | Bin 48152 -> 0 bytes
docs/img/StagePage.png | Bin 0 -> 153617 bytes
docs/img/webui-env-class.png | Bin 100687 -> 0 bytes
docs/img/webui-env-hadoop.png | Bin 107531 -> 0 bytes
docs/img/webui-env-sys.png | Bin 71739 -> 0 bytes
docs/img/webui-env-tab.png | Bin 250669 -> 65030 bytes
docs/img/webui-exe-err.png | Bin 718376 -> 0 bytes
docs/img/webui-exe-tab.png | Bin 554068 -> 119447 bytes
docs/img/webui-exe-thread.png | Bin 201409 -> 0 bytes
docs/img/webui-sql-dag.png | Bin 74288 -> 169600 bytes
docs/img/webui-sql-plan.png | Bin 458454 -> 0 bytes
docs/img/webui-sql-tab.png | Bin 372594 -> 154963 bytes
docs/img/webui-storage-detail.png | Bin 190970 -> 120052 bytes
docs/img/webui-storage-tab.png | Bin 78441 -> 62667 bytes
docs/web-ui.md | 333 ++++++++++++++++----------------------
32 files changed, 139 insertions(+), 194 deletions(-)
diff --git a/docs/img/AllJobsPage.png b/docs/img/AllJobsPage.png
new file mode 100644
index 000000000000..19e1acc8cecf
Binary files /dev/null and b/docs/img/AllJobsPage.png differ
diff --git a/docs/img/AllJobsPageDetail1.png b/docs/img/AllJobsPageDetail1.png
deleted file mode 100644
index de7e8c888332..000000000000
Binary files a/docs/img/AllJobsPageDetail1.png and /dev/null differ
diff --git a/docs/img/AllJobsPageDetail2.png b/docs/img/AllJobsPageDetail2.png
deleted file mode 100644
index b7203b2e6658..000000000000
Binary files a/docs/img/AllJobsPageDetail2.png and /dev/null differ
diff --git a/docs/img/AllJobsPageDetail3.png b/docs/img/AllJobsPageDetail3.png
deleted file mode 100644
index 75b7caec119b..000000000000
Binary files a/docs/img/AllJobsPageDetail3.png and /dev/null differ
diff --git a/docs/img/AllStagesPage.png b/docs/img/AllStagesPage.png
new file mode 100644
index 000000000000..52b2882f60ab
Binary files /dev/null and b/docs/img/AllStagesPage.png differ
diff --git a/docs/img/AllStagesPageDetail1.png b/docs/img/AllStagesPageDetail1.png
deleted file mode 100644
index ac3c48b5a9a1..000000000000
Binary files a/docs/img/AllStagesPageDetail1.png and /dev/null differ
diff --git a/docs/img/AllStagesPageDetail2.png b/docs/img/AllStagesPageDetail2.png
deleted file mode 100644
index 41d4165b9298..000000000000
Binary files a/docs/img/AllStagesPageDetail2.png and /dev/null differ
diff --git a/docs/img/AllStagesPageDetail3.png b/docs/img/AllStagesPageDetail3.png
deleted file mode 100644
index fd5267aa4a1c..000000000000
Binary files a/docs/img/AllStagesPageDetail3.png and /dev/null differ
diff --git a/docs/img/AllStagesPageDetail4.png b/docs/img/AllStagesPageDetail4.png
deleted file mode 100644
index 2f038b3d6196..000000000000
Binary files a/docs/img/AllStagesPageDetail4.png and /dev/null differ
diff --git a/docs/img/AllStagesPageDetail5.png b/docs/img/AllStagesPageDetail5.png
deleted file mode 100644
index 95d1f0e7f3be..000000000000
Binary files a/docs/img/AllStagesPageDetail5.png and /dev/null differ
diff --git a/docs/img/AllStagesPageDetail6.png b/docs/img/AllStagesPageDetail6.png
deleted file mode 100644
index 1c4ec1594e03..000000000000
Binary files a/docs/img/AllStagesPageDetail6.png and /dev/null differ
diff --git a/docs/img/AllStagesPageDetail7.png b/docs/img/AllStagesPageDetail7.png
deleted file mode 100644
index 6ab37481aa15..000000000000
Binary files a/docs/img/AllStagesPageDetail7.png and /dev/null differ
diff --git a/docs/img/AllStagesPageDetail8.png b/docs/img/AllStagesPageDetail8.png
deleted file mode 100644
index a60745c27b16..000000000000
Binary files a/docs/img/AllStagesPageDetail8.png and /dev/null differ
diff --git a/docs/img/AllStagesPageDetail9.png b/docs/img/AllStagesPageDetail9.png
deleted file mode 100644
index c471320cd9bb..000000000000
Binary files a/docs/img/AllStagesPageDetail9.png and /dev/null differ
diff --git a/docs/img/JobPage.png b/docs/img/JobPage.png
new file mode 100644
index 000000000000..594bdcd30d35
Binary files /dev/null and b/docs/img/JobPage.png differ
diff --git a/docs/img/JobPageDetail1.png b/docs/img/JobPageDetail1.png
deleted file mode 100644
index 1ee741d1f09d..000000000000
Binary files a/docs/img/JobPageDetail1.png and /dev/null differ
diff --git a/docs/img/JobPageDetail2.png b/docs/img/JobPageDetail2.png
deleted file mode 100644
index 5eb529eb7c27..000000000000
Binary files a/docs/img/JobPageDetail2.png and /dev/null differ
diff --git a/docs/img/JobPageDetail3.png b/docs/img/JobPageDetail3.png
deleted file mode 100644
index 9f691e4ed2b6..000000000000
Binary files a/docs/img/JobPageDetail3.png and /dev/null differ
diff --git a/docs/img/StagePage.png b/docs/img/StagePage.png
new file mode 100644
index 000000000000..9cbabd9eacf8
Binary files /dev/null and b/docs/img/StagePage.png differ
diff --git a/docs/img/webui-env-class.png b/docs/img/webui-env-class.png
deleted file mode 100644
index e57dada528d1..000000000000
Binary files a/docs/img/webui-env-class.png and /dev/null differ
diff --git a/docs/img/webui-env-hadoop.png b/docs/img/webui-env-hadoop.png
deleted file mode 100644
index e4ae232d1821..000000000000
Binary files a/docs/img/webui-env-hadoop.png and /dev/null differ
diff --git a/docs/img/webui-env-sys.png b/docs/img/webui-env-sys.png
deleted file mode 100644
index e7d57fe1a84d..000000000000
Binary files a/docs/img/webui-env-sys.png and /dev/null differ
diff --git a/docs/img/webui-env-tab.png b/docs/img/webui-env-tab.png
index d9bfc1d4adad..2ef145ca3bf8 100644
Binary files a/docs/img/webui-env-tab.png and b/docs/img/webui-env-tab.png differ
diff --git a/docs/img/webui-exe-err.png b/docs/img/webui-exe-err.png
deleted file mode 100644
index 2fb11638faf7..000000000000
Binary files a/docs/img/webui-exe-err.png and /dev/null differ
diff --git a/docs/img/webui-exe-tab.png b/docs/img/webui-exe-tab.png
index 8b835fd1f974..287ba238677d 100644
Binary files a/docs/img/webui-exe-tab.png and b/docs/img/webui-exe-tab.png differ
diff --git a/docs/img/webui-exe-thread.png b/docs/img/webui-exe-thread.png
deleted file mode 100644
index 136d274159e1..000000000000
Binary files a/docs/img/webui-exe-thread.png and /dev/null differ
diff --git a/docs/img/webui-sql-dag.png b/docs/img/webui-sql-dag.png
index 1c83c176da32..e20630bdddf5 100644
Binary files a/docs/img/webui-sql-dag.png and b/docs/img/webui-sql-dag.png differ
diff --git a/docs/img/webui-sql-plan.png b/docs/img/webui-sql-plan.png
deleted file mode 100644
index f88e0b24a541..000000000000
Binary files a/docs/img/webui-sql-plan.png and /dev/null differ
diff --git a/docs/img/webui-sql-tab.png b/docs/img/webui-sql-tab.png
index dca58e7d93a3..1d9660cc3df4 100644
Binary files a/docs/img/webui-sql-tab.png and b/docs/img/webui-sql-tab.png differ
diff --git a/docs/img/webui-storage-detail.png b/docs/img/webui-storage-detail.png
index 837b235be011..9cb448e1ca46 100644
Binary files a/docs/img/webui-storage-detail.png and b/docs/img/webui-storage-detail.png differ
diff --git a/docs/img/webui-storage-tab.png b/docs/img/webui-storage-tab.png
index 3a832981cb93..1f000bef95e2 100644
Binary files a/docs/img/webui-storage-tab.png and b/docs/img/webui-storage-tab.png differ
diff --git a/docs/web-ui.md b/docs/web-ui.md
index 3889b41f03a0..6ae0a363d187 100644
--- a/docs/web-ui.md
+++ b/docs/web-ui.md
@@ -28,6 +28,31 @@ to monitor the status and resource consumption of your Spark cluster.
* This will become a table of contents (this text will be scraped).
{:toc}
+## Overview
+
+The Web UI is built into every Spark application: while the application is
+running, it serves a set of web pages that let you inspect what is happening
+inside it. Typical uses include monitoring a running job, diagnosing a
+failure, analyzing the execution plan of a slow SQL query, and checking how
+memory and tasks are distributed across executors.
+
+By default the Web UI is available at `http://<driver-host>:4040`. When that
+port is already in use (for example, when several Spark applications run on
+the same host), Spark tries `4041`, `4042`, and so on until it finds a free
+port, and logs the chosen port at startup. You can override the default port
+with `spark.ui.port`, and tune other UI behavior through the `spark.ui.*`
+properties documented in the [Configuration](configuration.html#spark-ui)
+reference.
+
+The Web UI is tied to the lifetime of the application: once it exits, the UI
+is no longer reachable. To inspect an application after it has finished,
+enable event logging and run the Spark History Server, which reconstructs an
+equivalent UI from the persisted event log; see
+[Monitoring and Instrumentation](monitoring.html) for setup details.
+
+The remaining sections walk through each tab in the Web UI's top navigation
+bar.
+
## Jobs Tab
The Jobs tab displays a summary page of all jobs in the Spark application and
a details page
for each job. The summary page shows high-level information, such as the
status, duration, and
@@ -35,64 +60,33 @@ progress of all jobs and the overall event timeline. When you click on a job on
page, you see the details page for that job. The details page further shows
the event timeline,
DAG visualization, and all stages of the job.
-The information that is displayed in this section is
-* User: Current Spark user
-* Started At: The startup time of Spark application
-* Total uptime: Time since Spark application started
+The information displayed at the top of the page includes:
+
* Scheduling mode: See [job
scheduling](job-scheduling.html#configuring-pool-properties)
* Number of jobs per status: Active, Completed, Failed
-
-<p style="text-align: center;">
- <img src="img/AllJobsPageDetail1.png" title="Basic info" alt="Basic info"
width="20%"/>
-</p>
-
* Event timeline: Displays in chronological order the events related to the
executors (added, removed) and the jobs
-
-<p style="text-align: center;">
- <img src="img/AllJobsPageDetail2.png" title="Event timeline" alt="Event
timeline"/>
-</p>
-
* Details of jobs grouped by status: Displays detailed information of the jobs
including Job ID, description (with a link to detailed job page), submitted
time, duration, stages summary and tasks progress bar
+The current user, application start time, and total uptime are shown in the
footer at the
+bottom of every page.
+
<p style="text-align: center;">
- <img src="img/AllJobsPageDetail3.png" title="Details of jobs grouped by
status" alt="Details of jobs grouped by status"/>
+ <img src="img/AllJobsPage.png" title="All Jobs page" alt="All Jobs page" width="100%"/>
</p>
-
-When you click on a specific job, you can see the detailed information of this
job.
-
### Jobs detail
This page displays the details of a specific job identified by its job ID.
+
* Job Status: (running, succeeded, failed)
* Number of stages per status (active, pending, completed, skipped, failed)
-* Associated SQL Query: Link to the sql tab for this job
+* Associated SQL Query: Link to the SQL tab for this job
* Event timeline: Displays in chronological order the events related to the
executors (added, removed) and the stages of the job
+* DAG visualization: Visual representation of the directed acyclic graph of
this job where vertices represent the RDDs or DataFrames and the edges
represent an operation to be applied on RDD
+* List of stages (grouped by state active, pending, completed, skipped, and
failed), with columns including Stage ID, description, submitted timestamp,
duration, tasks progress bar, **Input** (bytes read from storage), **Output**
(bytes written to storage), **Shuffle read** (total shuffle bytes and records
read locally and from remote executors), and **Shuffle write** (bytes and
records written to disk for a future shuffle)
<p style="text-align: center;">
- <img src="img/JobPageDetail1.png" title="Event timeline" alt="Event
timeline"/>
-</p>
-
-* DAG visualization: Visual representation of the directed acyclic graph of
this job where vertices represent the RDDs or DataFrames and the edges
represent an operation to be applied on RDD.
-* An example of DAG visualization for `sc.parallelize(1 to 100).toDF.count()`
-
-<p style="text-align: center;">
- <img src="img/JobPageDetail2.png" title="DAG" alt="DAG" width="40%">
-</p>
-
-* List of stages (grouped by state active, pending, completed, skipped, and
failed)
- * Stage ID
- * Description of the stage
- * Submitted timestamp
- * Duration of the stage
- * Tasks progress bar
- * Input: Bytes read from storage in this stage
- * Output: Bytes written in storage in this stage
- * Shuffle read: Total shuffle bytes and records read, includes both data
read locally and data read from remote executors
- * Shuffle write: Bytes and records written to disk in order to be read by
a shuffle in a future stage
-
-<p style="text-align: center;">
- <img src="img/JobPageDetail3.png" title="DAG" alt="DAG">
+ <img src="img/JobPage.png" title="Job detail page" alt="Job detail page" width="100%"/>
</p>
## Stages Tab
@@ -100,41 +94,36 @@ This page displays the details of a specific job identified by its job ID.
The Stages tab displays a summary page that shows the current state of all
stages of all jobs in
the Spark application.
-At the beginning of the page is the summary with the count of all stages by
status (active, pending, completed, skipped, and failed)
+At the top of the page is a summary with the count of all stages by status
(active, pending,
+completed, skipped, and failed). In [Fair scheduling
mode](job-scheduling.html#scheduling-within-an-application)
+a table of [pool properties](job-scheduling.html#configuring-pool-properties)
is also shown.
-<p style="text-align: center;">
- <img src="img/AllStagesPageDetail1.png" title="Stages header" alt="Stages
header" width="30%">
-</p>
-
-In [Fair scheduling
mode](job-scheduling.html#scheduling-within-an-application) there is a table
that displays [pools
properties](job-scheduling.html#configuring-pool-properties)
+Below the summary are the stages, grouped by status (active, pending,
completed, skipped, failed).
+An active stage shows a small **(kill)** link next to its description;
clicking it asks Spark
+to cancel that stage. Only failed stages show the failure reason. Click a
stage's description
+to open its [Stage detail](#stage-detail) page.
<p style="text-align: center;">
- <img src="img/AllStagesPageDetail2.png" title="Pool properties" alt="Pool
properties">
-</p>
-
-After that are the details of stages per status (active, pending, completed,
skipped, failed). In active stages, it's possible to kill the stage with the
kill link. Only in failed stages, failure reason is shown. Task detail can be
accessed by clicking on the description.
-
-<p style="text-align: center;">
- <img src="img/AllStagesPageDetail3.png" title="Stages detail" alt="Stages
detail">
+ <img src="img/AllStagesPage.png" title="Stages tab" alt="Stages tab" width="100%">
</p>
### Stage detail
-The stage detail page begins with information like total time across all
tasks, [Locality level summary](tuning.html#data-locality), [Shuffle Read Size
/ Records](rdd-programming-guide.html#shuffle-operations) and Associated Job
IDs.
-<p style="text-align: center;">
- <img src="img/AllStagesPageDetail4.png" title="Stage header" alt="Stage
header" width="30%">
-</p>
+The stage detail page begins with information like total time across all tasks,
+[Locality level summary](tuning.html#data-locality),
+[Shuffle Read Size / Records](rdd-programming-guide.html#shuffle-operations)
and Associated Job IDs.
-There is also a visual representation of the directed acyclic graph (DAG) of
this stage, where vertices represent the RDDs or DataFrames and the edges
represent an operation to be applied.
-Nodes are grouped by operation scope in the DAG visualization and labelled
with the operation scope name (BatchScan, WholeStageCodegen, Exchange, etc).
-Notably, Whole Stage Code Generation operations are also annotated with the
code generation id. For stages belonging to Spark DataFrame or SQL execution,
this allows to cross-reference Stage execution details to the relevant details
in the Web-UI SQL Tab page where SQL plan graphs and execution plans are
reported.
+It also shows a visual representation of the directed acyclic graph (DAG) of
this stage,
+where vertices represent the RDDs or DataFrames and the edges represent an
operation to be
+applied. Nodes are grouped by operation scope in the DAG visualization and
labelled with the
+operation scope name (`BatchScan`, `WholeStageCodegen`, `Exchange`, etc).
+Notably, whole-stage code generation operations are also annotated with the
code generation id.
+For stages belonging to Spark DataFrame or SQL execution, this allows you to
cross-reference
+stage execution details to the relevant query in the [SQL Tab](#sql-tab).
-<p style="text-align: center;">
- <img src="img/AllStagesPageDetail5.png" title="Stage DAG" alt="Stage DAG"
width="50%">
-</p>
+Summary metrics for all tasks are represented in a table and in a timeline:
-Summary metrics for all task are represented in a table and in a timeline.
-* **[Tasks deserialization
time](configuration.html#compression-and-serialization)**
+* **Task deserialization time** is the time spent deserializing the task
closure on an executor before it can run.
* **Duration of tasks**.
* **GC time** is the total JVM garbage collection time.
* **Result serialization time** is the time spent serializing the task result
on an executor before sending it back to the driver.
@@ -148,26 +137,14 @@ Summary metrics for all task are represented in a table and in a timeline.
* **Shuffle spill (memory)** is the size of the deserialized form of the
shuffled data in memory.
* **Shuffle spill (disk)** is the size of the serialized form of the data on
disk.
-<p style="text-align: center;">
- <img src="img/AllStagesPageDetail6.png" title="Stages metrics" alt="Stages
metrics">
-</p>
-
-Aggregated metrics by executor show the same information aggregated by
executor.
-
-<p style="text-align: center;">
- <img src="img/AllStagesPageDetail7.png" title="Stages metrics per executor"
alt="Stages metrics per executors">
-</p>
-
-**[Accumulators](rdd-programming-guide.html#accumulators)** are a type of
shared variables. It provides a mutable variable that can be updated inside of
a variety of transformations. It is possible to create accumulators with and
without name, but only named accumulators are displayed.
+The same metrics are also shown aggregated by executor.
+**[Accumulators](rdd-programming-guide.html#accumulators)** are shared
variables that can be
+updated inside transformations; only named accumulators are displayed here.
Finally, a tasks
+table shows the same information broken down per task, with links to executor
logs and the task
+attempt number for failures.
<p style="text-align: center;">
- <img src="img/AllStagesPageDetail8.png" title="Stage accumulator" alt="Stage
accumulator">
-</p>
-
-Tasks details basically includes the same information as in the summary
section but detailed by task. It also includes links to review the logs and the
task attempt number if it fails for any reason. If there are named
accumulators, here it is possible to see the accumulator value at the end of
each task.
-
-<p style="text-align: center;">
- <img src="img/AllStagesPageDetail9.png" title="Tasks" alt="Tasks">
+ <img src="img/StagePage.png" title="Stage detail" alt="Stage detail" width="100%">
</p>
## Storage Tab
@@ -224,8 +201,11 @@ distribution on the cluster.
## Environment Tab
-The Environment tab displays the values for the different environment and
configuration variables,
-including JVM, Spark, and system properties.
+
+The Environment tab is the place to verify that your Spark application is
+running with the configuration you expect. It groups the environment and
+configuration information into a set of sub-tabs along the left side of the
+page; clicking one switches the panel on the right.
<p style="text-align: center;">
<img src="img/webui-env-tab.png"
@@ -235,47 +215,32 @@ including JVM, Spark, and system properties.
<!-- Images are downsized intentionally to improve quality on retina
displays -->
</p>
-This environment page has five parts. It is a useful place to check whether
your properties have
-been set correctly.
-The first part 'Runtime Information' simply contains the [runtime
properties](configuration.html#runtime-environment)
-like versions of Java and Scala.
-The second part 'Spark Properties' lists the [application
properties](configuration.html#application-properties) like
-['spark.app.name'](configuration.html#application-properties) and
'spark.driver.memory'.
-
-<p style="text-align: center;">
- <img src="img/webui-env-hadoop.png"
- title="Hadoop Properties"
- alt="Hadoop Properties"
- width="100%" />
- <!-- Images are downsized intentionally to improve quality on retina
displays -->
-</p>
-Clicking the 'Hadoop Properties' link displays properties relative to Hadoop
and YARN. Note that properties like
-['spark.hadoop.*'](configuration.html#execution-behavior) are shown not in
this part but in 'Spark Properties'.
-
-<p style="text-align: center;">
- <img src="img/webui-env-sys.png"
- title="System Properties"
- alt="System Properties"
- width="100%" />
- <!-- Images are downsized intentionally to improve quality on retina
displays -->
-</p>
-'System Properties' shows more details about the JVM.
-
-<p style="text-align: center;">
- <img src="img/webui-env-class.png"
- title="Classpath Entries"
- alt="Classpath Entries"
- width="100%" />
- <!-- Images are downsized intentionally to improve quality on retina
displays -->
-</p>
-
-The last part 'Classpath Entries' lists the classes loaded from different
sources, which is very useful
-to resolve class conflicts.
+The sub-tabs are:
+
+* **Runtime Information** — JVM, Scala, and other
+ [runtime properties](configuration.html#runtime-environment) of the driver.
+* **Spark Properties** — the effective
+ [application properties](configuration.html#application-properties)
+ (such as `spark.app.name` and `spark.driver.memory`). Note that
+ [`spark.hadoop.*`](configuration.html#execution-behavior) properties are
+ listed here, not under Hadoop Properties.
+* **Resource Profiles** — CPU, memory, and accelerator resource
+ requests for each [resource
profile](configuration.html#stage-level-scheduling-overview)
+ in use.
+* **Hadoop Properties** — values loaded from Hadoop and YARN
configuration
+ files.
+* **System Properties** — the underlying JVM system properties.
+* **Metrics Properties** — the configuration loaded for the
+ [metrics system](monitoring.html#metrics).
+* **Classpath Entries** — the classes loaded into the driver, broken
+ down by source. Handy when tracking down class conflicts.
## Executors Tab
-The Executors tab displays summary information about the executors that were
created for the
-application, including memory and disk usage and task and shuffle information.
The Storage Memory
-column shows the amount of memory used and reserved for caching data.
+The Executors tab lists every executor that has been allocated to the
+application, including the driver. Each row shows resource usage (memory,
+disk, cores), storage memory reserved for cached data, task counts, shuffle
+totals, and performance signals such as
+[GC time](tuning.html#garbage-collection-tuning).
<p style="text-align: center;">
<img src="img/webui-exe-tab.png"
@@ -285,51 +250,28 @@ column shows the amount of memory used and reserved for caching data.
<!-- Images are downsized intentionally to improve quality on retina
displays -->
</p>
-The Executors tab provides not only resource information (amount of memory,
disk, and cores used by each executor)
-but also performance information ([GC
time](tuning.html#garbage-collection-tuning) and shuffle information).
-
-<p style="text-align: center;">
- <img src="img/webui-exe-err.png"
- title="Stderr Log"
- alt="Stderr Log"
- width="80%" />
- <!-- Images are downsized intentionally to improve quality on retina
displays -->
-</p>
-
-Clicking the 'stderr' link of executor 0 displays detailed [standard error
log](spark-standalone.html#monitoring-and-logging)
-in its console.
-
-<p style="text-align: center;">
- <img src="img/webui-exe-thread.png"
- title="Thread Dump"
- alt="Thread Dump"
- width="80%" />
- <!-- Images are downsized intentionally to improve quality on retina
displays -->
-</p>
-
-Clicking the 'Thread Dump' link of executor 0 displays the thread dump of JVM
on executor 0, which is pretty useful
-for performance analysis.
+Each row carries a set of detail links — **Thread Dump**, **Heap
+Histogram**, and **Flame Graph** — that open the corresponding live
+data for that executor in a side panel without leaving the page. The panel
+can be resized by dragging its left edge. The **stderr** and **stdout**
+links open the executor's log files in a new view; the exact location of
+those logs depends on your cluster manager (see
+[Monitoring and Instrumentation](monitoring.html) for details).
## SQL Tab
-If the application executes Spark SQL queries, the SQL tab displays
information, such as the duration,
-jobs, and physical and logical plans for the queries. Here we include a basic
example to illustrate
-this tab:
-{% highlight scala %}
-scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", "name")
-df: org.apache.spark.sql.DataFrame = [count: int, name: string]
-scala> df.count
-res0: Long = 3
+### Query Listing
-scala> df.createGlobalTempView("df")
+The SQL tab lists all SQL and DataFrame queries submitted to the Spark
+application. Any DataFrame action that triggers execution (such as `count`,
+`show`, or `write`) shows up here, not only queries written as SQL strings.
+Here is a short example that produces a few entries:
-scala> spark.sql("select name,sum(count) from global_temp.df group by name").show
-+----+----------+
-|name|sum(count)|
-+----+----------+
-|andy| 3|
-| bob| 2|
-+----+----------+
+{% highlight python %}
+df = spark.createDataFrame([(1, "andy"), (2, "bob"), (2, "andy")], ["count", "name"])
+df.count()
+df.createOrReplaceTempView("df")
+spark.sql("SELECT name, SUM(count) FROM df GROUP BY name").show()
{% endhighlight %}
<p style="text-align: center;">
@@ -340,44 +282,47 @@ scala> spark.sql("select name,sum(count) from global_temp.df group by name").sho
<!-- Images are downsized intentionally to improve quality on retina
displays -->
</p>
-Now the above three dataframe/SQL operators are shown in the list. If we click
the
-'show at \<console\>: 24' link of the last query, we will see the DAG and
details of the query execution.
-
-<p style="text-align: center;">
- <img src="img/webui-sql-dag.png"
- title="SQL DAG"
- alt="SQL DAG"
- width="50%" />
- <!-- Images are downsized intentionally to improve quality on retina
displays -->
-</p>
+The listing supports sorting by column, searching, filtering by status,
+and pagination, which makes it easy to locate a specific query in
+long-running applications.
-The query details page displays information about the query execution time,
its duration,
-the list of associated jobs, and the query execution DAG.
-The first block 'WholeStageCodegen (1)' compiles multiple operators
('LocalTableScan' and 'HashAggregate') together into a single Java
-function to improve performance, and metrics like number of rows and spill
size are listed in the block.
-The annotation '(1)' in the block name is the code generation id.
-The second block 'Exchange' shows the metrics on the shuffle exchange,
including
-number of written shuffle records, total data size, etc.
+### SQL Plan Visualization
+Each query in the listing has a graph view of its operators. Every node
+shows the operator name together with its metrics inline, and the edges
+follow the data flow. You can pan and zoom the graph to navigate large
+plans, search for a node by name, and click any node to open a side panel
+with its full details.
<p style="text-align: center;">
- <img src="img/webui-sql-plan.png"
- title="logical plans and the physical plan"
- alt="logical plans and the physical plan"
+ <img src="img/webui-sql-dag.png"
+ title="SQL plan visualization"
+ alt="SQL plan visualization"
width="80%" />
<!-- Images are downsized intentionally to improve quality on retina
displays -->
</p>
-Clicking the 'Details' link on the bottom displays the logical plans and the
physical plan, which
-illustrate how Spark parses, analyzes, optimizes and performs the query.
-Steps in the physical plan subject to whole stage code generation
optimization, are prefixed by a star followed by
-the code generation id, for example: '*(1) LocalTableScan'
+
+### Execution Detail Page
+
+The execution detail page, opened by clicking the **ID** or **Description**
+link of any row in the query listing, gathers everything recorded for a
+single query. The header lists the
+query's submission time, duration, status, description, and the jobs and
+stages associated with it. The
+[SQL Plan Visualization](#sql-plan-visualization) shows the graph of
+operators. At the bottom of the page, a "Details" link expands the full
+text of the parsed, analyzed, and optimized logical plans together with
+the physical plan, useful when you want to see how Spark transformed your
+query during planning.
### SQL metrics
-The metrics of SQL operators are shown in the block of physical operators. The
SQL metrics can be useful
-when we want to dive into the execution details of each operator. For example,
"number of output rows"
-can answer how many rows are output after a Filter operator, "shuffle bytes
written total" in an Exchange
-operator shows the number of bytes written by a shuffle.
+Each node in the [SQL Plan Visualization](#sql-plan-visualization) carries
+its own metrics inline. These metrics are useful when you want to dive into
+the execution details of each operator. For example, `number of output rows`
+shows how many rows pass through a `Filter` operator, and
+`shuffle bytes written` in an `Exchange` shows how much data the
+shuffle wrote.
Here is the list of SQL metrics: