This is an automated email from the ASF dual-hosted git repository.
RocMarshal pushed a commit to branch release-2.3
in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/release-2.3 by this push:
new 84cd7548d1d [FLINK-38902][docs] Add user instructions and usage
documentation for FLIP-487 (#28075)
84cd7548d1d is described below
commit 84cd7548d1dddcd9dfecd4e49f287fc8554ae57c
Author: Yuepeng Pan <[email protected]>
AuthorDate: Fri May 1 06:17:30 2026 +0800
[FLINK-38902][docs] Add user instructions and usage documentation for
FLIP-487 (#28075)
(cherry picked from commit 21d40d42b8b09c217971439486c688d85516440d)
Co-authored-by: spuru9 <[email protected]>
Co-authored-by: XComp <[email protected]>
Co-authored-by: davidradl <[email protected]>
Co-authored-by: Sergey Nuyanzin <[email protected]>
Co-authored-by: och5351 <[email protected]>
---
docs/content.zh/docs/deployment/elastic_scaling.md | 78 ++++++++++++++++++++++
docs/content/docs/deployment/elastic_scaling.md | 77 +++++++++++++++++++++
.../generated/all_jobmanager_section.html | 2 +-
.../shortcodes/generated/web_configuration.html | 2 +-
.../org/apache/flink/configuration/WebOptions.java | 3 +
5 files changed, 160 insertions(+), 2 deletions(-)
diff --git a/docs/content.zh/docs/deployment/elastic_scaling.md
b/docs/content.zh/docs/deployment/elastic_scaling.md
index d155e70d90f..96d4bb5ffac 100644
--- a/docs/content.zh/docs/deployment/elastic_scaling.md
+++ b/docs/content.zh/docs/deployment/elastic_scaling.md
@@ -189,4 +189,82 @@ cp ./examples/streaming/TopSpeedWindowing.jar lib/
Only the following deployment modes are supported: [Standalone in Application Mode]({{< ref
"docs/deployment/resource-providers/standalone/overview"
>}}#application-mode) (see [above](#getting-started)), [Docker in Application
Mode]({{< ref "docs/deployment/resource-providers/standalone/docker"
>}}#application-mode-on-docker), and [Standalone Kubernetes Application
Cluster]({{< ref "docs/deployment/resource-providers/standalone/kubernetes"
>}}#deploy-application-cluster).
The [limitations of the Adaptive Scheduler](#limitations-1) also apply to Reactive Mode.
+
+## Rescale History
+
+Before Flink 2.3, users and developers were unable to inspect the internal
details of `AdaptiveScheduler` rescaling history,
+causing operational inconvenience.
+For instance, users need visibility into specific resource changes,
parallelism adjustments,
+and the time spent on each internal state transition during the rescaling
process.
+This information is crucial for tuning parameters to achieve lower latency and
higher stability in rescaling.
+
+Therefore, the Flink community introduced
[FLIP-495](https://cwiki.apache.org/confluence/x/TQr0Ew) to support recording
and storing rescaling history,
+and [FLIP-487](https://cwiki.apache.org/confluence/x/vZCMEw) to enable
querying via the REST API and displaying this history on the Web UI.
+
+You can enable rescale history for streaming jobs that use the `AdaptiveScheduler`
by setting the following configuration option to a positive integer.
+This value indicates the number of recent rescale records retained for the job.
+
+- [`web.adaptive-scheduler.rescale-history.size`]({{< ref
"docs/deployment/config" >}}#web-adaptive-scheduler-rescale-history-size): `4`
+
+The default value of the configuration option is `0`. When the configuration
value is less than or equal to `0`, this feature will be disabled.
+
+### Information and Style of the Rescale History
+
+Since Flink version 2.3, a page for displaying `Rescales` has been introduced
in the Web UI,
+positioned at the same hierarchical level as the `Checkpoints` page and
featuring a similar style.
+It mainly consists of the following sub-pages:
+
+- `Overview`
+ This sub-page displays recent rescale records across various rescale
terminal states,
+ along with fundamental job rescale statistics, such as the total number of
rescales since job startup and the counts of failures and successes.
+ Additionally, the page supports the display of detailed rescale information.
+
+- `History`
+ This sub-page displays abbreviated information for the most recent rescale
records (up to the configured
[`web.adaptive-scheduler.rescale-history.size`]({{< ref
"docs/deployment/config" >}}#web-adaptive-scheduler-rescale-history-size)
limit).
+ Additionally, the page supports the display of detailed rescale information
as outlined below:
+ - The basic information of a rescale
+ - <u>Rescale UUID</u>: The unique ID of a rescale, consisting of 32
hexadecimal characters (the UUIDs mentioned below use the same format).
+ - <u>Attempt ID</u>: The number of rescale attempts triggered on the
same job resource requirements.
+ - <u>Requirements ID</u>: The unique UUID of the resource requirements.
+ - <u>Trigger Cause</u>: The reason that triggered a rescale.
+ - <u>Terminal State</u>: The end state of a rescale.
+ - <u>Terminated Reason</u>: The reason why the rescale lifecycle
terminated.
+ - <u>Start Time</u>: The start time of a rescale.
+ - <u>Duration</u>: The time from the start of the rescale to its
completion, or until now if the rescale has not completed yet.
+ - <u>End Time</u>: The end time of a rescale if the rescale has
terminated, otherwise the current time.
+ - The basic attributes and rescale change per `Job Vertex`
+ - <u>ID</u>: The unique UUID of the target `Job Vertex`.
+ - <u>Name</u>: The short name of the target vertex.
+ - <u>Slot Sharing Group ID</u>: The unique UUID of the target `Slot Sharing
Group`.
+ - <u>Previous Parallelism</u>: The parallelism of the target vertex before
the current rescale.
+ - <u>Acquired Parallelism</u>: The parallelism of the target vertex after
the current rescale.
+ - <u>Sufficient Parallelism</u>: The minimal parallelism of the vertex
that would have allowed the rescale operation to go through even if the desired
parallelism could not be reached.
+ - <u>Desired Parallelism</u>: The desired parallelism of a `Job Vertex`
that was specified in the initial change request that triggered the rescale
operation.
+ - The basic attributes and rescale change per `Slot Sharing Group`
+ - <u>Slot Sharing Group ID</u>: The UUID of the `Slot Sharing Group` to
which the target slot belongs.
+ - <u>Slot Sharing Group Name</u>: The name of the `Slot Sharing Group`
to which the target slot belongs.
+ - <u>Previous Slot</u>: The number of slots before the rescale.
+ - <u>Acquired Slot</u>: The number of slots after the rescale.
+ - <u>Desired Slot</u>: The desired number of slots for the rescale.
+ - <u>Sufficient Slot</u>: The minimal number of slots needed to deploy tasks in
the rescale.
+ - <u>Request Profile</u>: The requested resource profile of the `Slot
Sharing Group` in the rescale.
+ - <u>Acquired Profile</u>: The acquired resource profile of the `Slot
Sharing Group` in the rescale.
+ - The internal `Scheduler State History` of `AdaptiveScheduler` within a
rescale (see [`AdaptiveScheduler states in
FLIP-160`](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=173083547#FLIP160:AdaptiveScheduler-Statemachineofthescheduler)
for further details)
+ - <u>State</u>: The scheduler state name.
+ - <u>Enter Time</u>: The time at which the scheduler entered the state.
+ - <u>Leave Time</u>: The time at which the scheduler left the state.
+ - <u>Duration</u>: Time spent in the state (Leave Time − Enter Time).
+ - <u>Exception</u>: Exception information about the current rescale
within the state.
+- `Summary`
+ This sub-page displays the total number of rescale events that have occurred
since the job was launched,
+ along with the respective counts of failures and successes.
+ Additionally, it provides statistical summaries of the rescale history,
+ such as rescale duration statistics categorized by rescale status
(for example `Min`, `Max`, `Avg`, and `P50`).
+- `Configuration`
+ This sub-page displays the relevant parameter values used by the
`AdaptiveScheduler` during rescaling operations for the current streaming job.
+
+### More details
+
+See [FLIP-495](https://cwiki.apache.org/confluence/x/TQr0Ew) and
[FLIP-487](https://cwiki.apache.org/confluence/x/vZCMEw) for more details.
+
{{< top >}}
diff --git a/docs/content/docs/deployment/elastic_scaling.md
b/docs/content/docs/deployment/elastic_scaling.md
index d0cdfdd06ca..cfe125067f9 100644
--- a/docs/content/docs/deployment/elastic_scaling.md
+++ b/docs/content/docs/deployment/elastic_scaling.md
@@ -198,4 +198,81 @@ Since Reactive Mode is a new, experimental feature, not
all features supported b
The [limitations of Adaptive Scheduler](#limitations-1) also apply to Reactive
Mode.
+## Rescale History
+
+Before Flink 2.3, users and developers were unable to inspect the internal
details of `AdaptiveScheduler` rescaling history,
+causing operational inconvenience.
+For instance, users need visibility into specific resource changes,
parallelism adjustments,
+and the time spent on each internal state transition during the rescaling
process.
+This information is crucial for tuning parameters to achieve lower latency and
higher stability in rescaling.
+
+Therefore, the Flink community introduced
[FLIP-495](https://cwiki.apache.org/confluence/x/TQr0Ew) to support recording
and storing rescaling history,
+and [FLIP-487](https://cwiki.apache.org/confluence/x/vZCMEw) to enable
querying via the REST API and displaying this history on the Web UI.
+
+You can enable rescale history for streaming jobs that use the `AdaptiveScheduler`
by setting the following configuration option to a positive integer.
+This value indicates the number of recent rescale records retained for the job.
+
+- [`web.adaptive-scheduler.rescale-history.size`]({{< ref
"docs/deployment/config" >}}#web-adaptive-scheduler-rescale-history-size): `4`
+
+The default value of the configuration option is `0`. When the configuration
value is less than or equal to `0`, this feature will be disabled.
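+
+A minimal sketch of such a configuration, assuming the job should run with the Adaptive Scheduler (selected here via the existing `jobmanager.scheduler` option) and that retaining the 4 most recent rescale records is acceptable; the value `4` is only illustrative:
+
+```yaml
+# Use the Adaptive Scheduler for streaming jobs (assumed setup for this example).
+jobmanager.scheduler: adaptive
+# Keep the 4 most recent rescale records per job; 0 (the default) disables the feature.
+web.adaptive-scheduler.rescale-history.size: 4
+```
+
+Keep the value moderate: as the option description notes, high numbers may cause memory issues on the JobManager side.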
+
+### Information and Style of the Rescale History
+
+Since Flink version 2.3, a page for displaying `Rescales` has been introduced
in the Web UI,
+positioned at the same hierarchical level as the `Checkpoints` page and
featuring a similar style.
+It mainly consists of the following sub-pages:
+
+- `Overview`
+ This sub-page displays recent rescale records across various rescale
terminal states,
+ along with fundamental job rescale statistics, such as the total number of
rescales since job startup and the counts of failures and successes.
+ Additionally, the page supports the display of detailed rescale information.
+
+- `History`
+ This sub-page displays abbreviated information for the most recent rescale
records (up to the configured
[`web.adaptive-scheduler.rescale-history.size`]({{< ref
"docs/deployment/config" >}}#web-adaptive-scheduler-rescale-history-size)
limit).
+ Additionally, the page supports the display of detailed rescale information
as outlined below:
+ - The basic information of a rescale
+ - <u>Rescale UUID</u>: The unique ID of a rescale, consisting of 32
hexadecimal characters (the UUIDs mentioned below use the same format).
+ - <u>Attempt ID</u>: The number of rescale attempts triggered on the
same job resource requirements.
+ - <u>Requirements ID</u>: The unique UUID of the resource requirements.
+ - <u>Trigger Cause</u>: The reason that triggered a rescale.
+ - <u>Terminal State</u>: The end state of a rescale.
+ - <u>Terminated Reason</u>: The reason why the rescale lifecycle
terminated.
+ - <u>Start Time</u>: The start time of a rescale.
+ - <u>Duration</u>: The time from the start of the rescale to its
completion, or until now if the rescale has not completed yet.
+ - <u>End Time</u>: The end time of a rescale if the rescale has
terminated, otherwise the current time.
+ - The basic attributes and rescale change per `Job Vertex`
+ - <u>ID</u>: The unique UUID of the target `Job Vertex`.
+ - <u>Name</u>: The short name of the target vertex.
+ - <u>Slot Sharing Group ID</u>: The unique UUID of the target `Slot Sharing
Group`.
+ - <u>Previous Parallelism</u>: The parallelism of the target vertex before
the current rescale.
+ - <u>Acquired Parallelism</u>: The parallelism of the target vertex after
the current rescale.
+ - <u>Sufficient Parallelism</u>: The minimal parallelism of the vertex
that would have allowed the rescale operation to go through even if the desired
parallelism could not be reached.
+ - <u>Desired Parallelism</u>: The desired parallelism of a `Job Vertex`
that was specified in the initial change request that triggered the rescale
operation.
+ - The basic attributes and rescale change per `Slot Sharing Group`
+ - <u>Slot Sharing Group ID</u>: The UUID of the `Slot Sharing Group` to
which the target slot belongs.
+ - <u>Slot Sharing Group Name</u>: The name of the `Slot Sharing Group`
to which the target slot belongs.
+ - <u>Previous Slot</u>: The number of slots before the rescale.
+ - <u>Acquired Slot</u>: The number of slots after the rescale.
+ - <u>Desired Slot</u>: The desired number of slots for the rescale.
+ - <u>Sufficient Slot</u>: The minimal number of slots needed to deploy tasks in
the rescale.
+ - <u>Request Profile</u>: The requested resource profile of the `Slot
Sharing Group` in the rescale.
+ - <u>Acquired Profile</u>: The acquired resource profile of the `Slot
Sharing Group` in the rescale.
+ - The internal `Scheduler State History` of `AdaptiveScheduler` within a
rescale (see [`AdaptiveScheduler states in
FLIP-160`](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=173083547#FLIP160:AdaptiveScheduler-Statemachineofthescheduler)
for further details)
+ - <u>State</u>: The scheduler state name.
+ - <u>Enter Time</u>: The time at which the scheduler entered the state.
+ - <u>Leave Time</u>: The time at which the scheduler left the state.
+ - <u>Duration</u>: Time spent in the state (Leave Time − Enter Time).
+ - <u>Exception</u>: Exception information about the current rescale
within the state.
+- `Summary`
+ This sub-page displays the total number of rescale events that have occurred
since the job was launched,
+ along with the respective counts of failures and successes.
+ Additionally, it provides statistical summaries of the rescale history,
+ such as rescale duration statistics categorized by rescale status
(for example `Min`, `Max`, `Avg`, and `P50`).
+- `Configuration`
+ This sub-page displays the relevant parameter values used by the
`AdaptiveScheduler` during rescaling operations for the current streaming job.
+
+### More details
+
+See [FLIP-495](https://cwiki.apache.org/confluence/x/TQr0Ew) and
[FLIP-487](https://cwiki.apache.org/confluence/x/vZCMEw) for more details.
+
{{< top >}}
diff --git a/docs/layouts/shortcodes/generated/all_jobmanager_section.html
b/docs/layouts/shortcodes/generated/all_jobmanager_section.html
index 87d8875a6df..49ccf35c8da 100644
--- a/docs/layouts/shortcodes/generated/all_jobmanager_section.html
+++ b/docs/layouts/shortcodes/generated/all_jobmanager_section.html
@@ -162,7 +162,7 @@
<td><h5>web.adaptive-scheduler.rescale-history.size</h5></td>
<td style="word-wrap: break-word;">0</td>
<td>Integer</td>
- <td>The maximum number of the rescale records per job whose
scheduler is <code class="highlighter-rouge">Adaptive</code>. The feature will
be disabled when the configuration value is smaller or equals to 0.</td>
+ <td>The maximum number of the rescale records per job whose
scheduler is <code class="highlighter-rouge">Adaptive</code>. The feature will
be disabled when the configuration value is smaller or equals to 0.<br />Note
that high numbers may cause memory issues on the JobManager side.</td>
</tr>
<tr>
<td><h5>web.exception-history-size</h5></td>
diff --git a/docs/layouts/shortcodes/generated/web_configuration.html
b/docs/layouts/shortcodes/generated/web_configuration.html
index 38cf3c34a80..9b9c065278d 100644
--- a/docs/layouts/shortcodes/generated/web_configuration.html
+++ b/docs/layouts/shortcodes/generated/web_configuration.html
@@ -18,7 +18,7 @@
<td><h5>web.adaptive-scheduler.rescale-history.size</h5></td>
<td style="word-wrap: break-word;">0</td>
<td>Integer</td>
- <td>The maximum number of the rescale records per job whose
scheduler is <code class="highlighter-rouge">Adaptive</code>. The feature will
be disabled when the configuration value is smaller or equals to 0.</td>
+ <td>The maximum number of the rescale records per job whose
scheduler is <code class="highlighter-rouge">Adaptive</code>. The feature will
be disabled when the configuration value is smaller or equals to 0.<br />Note
that high numbers may cause memory issues on the JobManager side.</td>
</tr>
<tr>
<td><h5>web.cancel.enable</h5></td>
diff --git
a/flink-core/src/main/java/org/apache/flink/configuration/WebOptions.java
b/flink-core/src/main/java/org/apache/flink/configuration/WebOptions.java
index ee5b7391ab3..73da95e19ec 100644
--- a/flink-core/src/main/java/org/apache/flink/configuration/WebOptions.java
+++ b/flink-core/src/main/java/org/apache/flink/configuration/WebOptions.java
@@ -153,6 +153,9 @@ public class WebOptions {
"The maximum number of the rescale
records per job whose scheduler is %s. "
+ "The feature will be
disabled when the configuration value is smaller or equals to 0.",
code(JobManagerOptions.SchedulerType.Adaptive.name()))
+ .linebreak()
+ .text(
+ "Note that high numbers may cause
memory issues on the JobManager side.")
.build());
/** Timeout for asynchronous operations by the web monitor in
milliseconds. */