[GitHub] [flink] zhuzhurk commented on a diff in pull request #21801: [FLINK-30838][doc] Update documentation about the AdaptiveBatchScheduler

via GitHub Mon, 06 Feb 2023 22:18:23 -0800


zhuzhurk commented on code in PR #21801:
URL: https://github.com/apache/flink/pull/21801#discussion_r1098203733



##########
docs/content.zh/docs/deployment/elastic_scaling.md:
##########
@@ -159,30 +159,27 @@ Adaptive Batch Scheduler 是一种可以自动推导每个算子并行度的批
 
 使用 Adaptive Batch Scheduler 自动推导算子的并行度，需要：
 - 启用 Adaptive Batch Scheduler
-- 配置算子的并行度为 `-1`
+- 不要明确指定算子的并行度
 
 #### 启用 Adaptive Batch Scheduler
-为了启用 Adaptive Batch Scheduler, 你需要：
-- 配置 `jobmanager.scheduler: AdaptiveBatch`
-- 由于 ["只支持所有数据交换都为 BLOCKING 模式的作业"](#局限性-2), 需要将 
[`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。
+当前 Adaptive Batch Scheduler 是 Flink 默认的批作业调度器，无需额外配置。除非用户显式的配置了使用其他调度器，例如 
`jobmanager.scheduler: default`。需要注意的是，由于 ["只支持所有数据交换都为 BLOCKING 
模式的作业"](#局限性-2), 需要将 [`execution.batch-shuffle-mode`]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) 配置为 
`ALL-EXCHANGES-BLOCKING`(默认值) 。
 
 除此之外，使用 Adaptive Batch Scheduler 时，以下相关配置也可以调整:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许自动设置的并行度最小值。
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许自动设置的并行度最大值。
-- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): 
期望每个任务平均处理的数据量大小。请注意，当出现数据倾斜，或者确定的并行度达到最大并行度（由于数据过多）时，一些任务实际处理的数据可能会远远超过这个值。
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
算子的默认并行度
-
-#### 配置算子的并行度为 `-1`
-Adaptive Batch Scheduler 只会为用户未指定并行度的算子（并行度为 `-1`）推导并行度。 
所以如果你想自动推导算子的并行度，需要进行以下配置：
+- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-min-parallelism): 允许自动设置的并行度最小值。
+- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-max-parallelism): 
允许自动设置的并行度最大值，如果该配置项没有配置将使用默认并行度作为允许自动设置的并行度最大值。
+- [`execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task`]({{< 
ref "docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-task): 
期望每个任务平均处理的数据量大小。请注意，当出现数据倾斜，或者确定的并行度达到最大并行度（由于数据过多）时，一些任务实际处理的数据可能会远远超过这个值。
+- [`execution.batch.adaptive.auto-parallelism.default-source-parallelism`]({{< 
ref "docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-default-source-parallelism): 
source 算子的默认并行度
+
+#### 不要明确指定算子的并行度

Review Comment:
   -> 不要指定算子的并行度



##########
docs/content.zh/docs/deployment/elastic_scaling.md:
##########
@@ -159,30 +159,27 @@ Adaptive Batch Scheduler 是一种可以自动推导每个算子并行度的批
 
 使用 Adaptive Batch Scheduler 自动推导算子的并行度，需要：
 - 启用 Adaptive Batch Scheduler
-- 配置算子的并行度为 `-1`
+- 不要明确指定算子的并行度
 
 #### 启用 Adaptive Batch Scheduler
-为了启用 Adaptive Batch Scheduler, 你需要：
-- 配置 `jobmanager.scheduler: AdaptiveBatch`
-- 由于 ["只支持所有数据交换都为 BLOCKING 模式的作业"](#局限性-2), 需要将 
[`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。
+当前 Adaptive Batch Scheduler 是 Flink 默认的批作业调度器，无需额外配置。除非用户显式的配置了使用其他调度器，例如 
`jobmanager.scheduler: default`。需要注意的是，由于 ["只支持所有数据交换都为 BLOCKING 
模式的作业"](#局限性-2), 需要将 [`execution.batch-shuffle-mode`]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) 配置为 
`ALL-EXCHANGES-BLOCKING`(默认值) 。
 
 除此之外，使用 Adaptive Batch Scheduler 时，以下相关配置也可以调整:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许自动设置的并行度最小值。
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许自动设置的并行度最大值。
-- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): 
期望每个任务平均处理的数据量大小。请注意，当出现数据倾斜，或者确定的并行度达到最大并行度（由于数据过多）时，一些任务实际处理的数据可能会远远超过这个值。
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
算子的默认并行度
-
-#### 配置算子的并行度为 `-1`
-Adaptive Batch Scheduler 只会为用户未指定并行度的算子（并行度为 `-1`）推导并行度。 
所以如果你想自动推导算子的并行度，需要进行以下配置：
+- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-min-parallelism): 允许自动设置的并行度最小值。
+- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-max-parallelism): 
允许自动设置的并行度最大值，如果该配置项没有配置将使用默认并行度作为允许自动设置的并行度最大值。
+- [`execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task`]({{< 
ref "docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-task): 
期望每个任务平均处理的数据量大小。请注意，当出现数据倾斜，或者确定的并行度达到最大并行度（由于数据过多）时，一些任务实际处理的数据可能会远远超过这个值。
+- [`execution.batch.adaptive.auto-parallelism.default-source-parallelism`]({{< 
ref "docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-default-source-parallelism): 
source 算子的默认并行度
+
+#### 不要明确指定算子的并行度
+Adaptive Batch Scheduler 只会为用户未指定并行度的算子推导并行度。 所以如果你想算子的并行度被自动推导，需要避免通过算子的 
`setParallelism()` 方法来为其指定并行度。
+除此之外，对于 DataSet 作业还需要进行以下配置：

Review Comment:
   IIUC, there should be an empty line above to separate the paragraphs.



##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide 
parallelisms of operators
 
 To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
 - Configure to use Adaptive Batch Scheduler.
-- Set the parallelism of operators to `-1`.
+- Do not specify the parallelism of operators.
   
 #### Configure to use Adaptive Batch Scheduler
-To use Adaptive Batch Scheduler, you need to:
-- Set `jobmanager.scheduler: AdaptiveBatch`.
-- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) unset or explicitly set it to 
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs 
only"](#limitations-2).
+At present, the Adaptive Batch Scheduler is the default scheduler for flink 
batch jobs, and no additional configuration is required unless you explicitly 
configured to use other schedulers, such as 'jobmanager. scheduler: default'. 
It should be noted that 
+leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) unset or explicitly set it to 
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs 
only"](#limitations-2).
 
 In addition, there are several related configuration options that may need 
adjustment when using Adaptive Batch Scheduler:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of 
allowed parallelism to set adaptively.
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of 
allowed parallelism to set adaptively.
-- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average 
size of data volume to expect each task instance to process. Note that when 
data skew occurs, or the decided parallelism reaches the max parallelism (due 
to too much data), the data actually processed by some tasks may far exceed 
this value.
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The 
default parallelism of data source.
-
-#### Set the parallelism of operators to `-1`
-Adaptive Batch Scheduler will only decide parallelism for operators whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of operators to be decided automatically, you should configure as 
follows:
-- Set `parallelism.default: -1`
-- Set `table.exec.resource.default-parallelism: -1` in SQL jobs.
-- Don't call `setParallelism()` for operators in DataStream/DataSet jobs.
-- Don't call `setParallelism()` on 
`StreamExecutionEnvironment/ExecutionEnvironment` in DataStream/DataSet jobs.
+- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-min-parallelism): The lower bound 
of allowed parallelism to set adaptively.
+- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-max-parallelism): The upper bound 
of allowed parallelism to set adaptively. Default parallelism will be used as 
upper bound of allowed parallelism if this configuration is not configured.
+- [`execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task`]({{< 
ref "docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-task): The 
average size of data volume to expect each task instance to process. Note that 
when data skew occurs, or the decided parallelism reaches the max parallelism 
(due to too much data), the data actually processed by some tasks may far 
exceed this value.
+- [`execution.batch.adaptive.auto-parallelism.default-source-parallelism`]({{< 
ref "docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-default-source-parallelism): The 
default parallelism of data source.
+
+#### Avoid specify the parallelism of operators
+The Adaptive Batch Scheduler only decides the parallelism for operators for 
which the user has not specified the parallelism. So if you want the 
parallelism of the operator to be automatically decided, you need to avoid 
specifying the parallelism for the operator through the 'setParallelism()' 
method.
+In addition, the following configurations are required for DataSet jobs:
+- Set `parallelism.default: -1`.
+- Don't call `setParallelism()` on `ExecutionEnvironment` in DataSet jobs.

Review Comment:
   remove "in DataSet jobs".



##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide 
parallelisms of operators
 
 To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
 - Configure to use Adaptive Batch Scheduler.
-- Set the parallelism of operators to `-1`.
+- Do not specify the parallelism of operators.

Review Comment:
   Do not specify -> Avoid setting



##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide 
parallelisms of operators
 
 To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
 - Configure to use Adaptive Batch Scheduler.
-- Set the parallelism of operators to `-1`.
+- Do not specify the parallelism of operators.
   
 #### Configure to use Adaptive Batch Scheduler
-To use Adaptive Batch Scheduler, you need to:
-- Set `jobmanager.scheduler: AdaptiveBatch`.
-- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) unset or explicitly set it to 
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs 
only"](#limitations-2).
+At present, the Adaptive Batch Scheduler is the default scheduler for flink 
batch jobs, and no additional configuration is required unless you explicitly 
configured to use other schedulers, such as 'jobmanager. scheduler: default'. 
It should be noted that 

Review Comment:
   flink -> Flink.
   , and no additional -> . No additional
   unless you explicitly configured to use other schedulers -> unless other schedulers are explicitly configured
   such as 'jobmanager. scheduler: default' -> e.g. \`jobmanager.scheduler: default\`
   It should be noted that -> Note that you need to
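
   For illustration only, a minimal `flink-conf.yaml` sketch of what the reworded paragraph describes. The key and both values come from the diff above. That omitting the line has the same effect for batch jobs is an assumption based on this PR's statement that the Adaptive Batch Scheduler is now the default:

   ```yaml
   # Explicit selection; per the text under review this should match the
   # default for batch jobs, so the line can presumably be omitted.
   jobmanager.scheduler: AdaptiveBatch

   # Explicitly configuring another scheduler opts out, e.g.:
   # jobmanager.scheduler: default
   ```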



##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide 
parallelisms of operators
 
 To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
 - Configure to use Adaptive Batch Scheduler.
-- Set the parallelism of operators to `-1`.
+- Do not specify the parallelism of operators.
   
 #### Configure to use Adaptive Batch Scheduler
-To use Adaptive Batch Scheduler, you need to:
-- Set `jobmanager.scheduler: AdaptiveBatch`.
-- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) unset or explicitly set it to 
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs 
only"](#limitations-2).
+At present, the Adaptive Batch Scheduler is the default scheduler for flink 
batch jobs, and no additional configuration is required unless you explicitly 
configured to use other schedulers, such as 'jobmanager. scheduler: default'. 
It should be noted that 
+leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) unset or explicitly set it to 
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs 
only"](#limitations-2).
 
 In addition, there are several related configuration options that may need 
adjustment when using Adaptive Batch Scheduler:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of 
allowed parallelism to set adaptively.
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of 
allowed parallelism to set adaptively.
-- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average 
size of data volume to expect each task instance to process. Note that when 
data skew occurs, or the decided parallelism reaches the max parallelism (due 
to too much data), the data actually processed by some tasks may far exceed 
this value.
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The 
default parallelism of data source.
-
-#### Set the parallelism of operators to `-1`
-Adaptive Batch Scheduler will only decide parallelism for operators whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of operators to be decided automatically, you should configure as 
follows:
-- Set `parallelism.default: -1`
-- Set `table.exec.resource.default-parallelism: -1` in SQL jobs.
-- Don't call `setParallelism()` for operators in DataStream/DataSet jobs.
-- Don't call `setParallelism()` on 
`StreamExecutionEnvironment/ExecutionEnvironment` in DataStream/DataSet jobs.
+- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-min-parallelism): The lower bound 
of allowed parallelism to set adaptively.
+- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-max-parallelism): The upper bound 
of allowed parallelism to set adaptively. Default parallelism will be used as 
upper bound of allowed parallelism if this configuration is not configured.

Review Comment:
   Default parallelism -> The default parallelism set via [\`parallelism.default\`]({{< ref "docs/deployment/config" >}}) or \`StreamExecutionEnvironment#setParallelism()\`
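
   A small configuration sketch of the fallback this sentence describes, in case it helps readers of the docs. The option keys are the ones in the diff above. The values are illustrative assumptions, not defaults taken from this PR:

   ```yaml
   # Illustrative values only.
   parallelism.default: 8
   execution.batch.adaptive.auto-parallelism.min-parallelism: 1
   # max-parallelism is deliberately left unset here. Per the sentence under
   # review, parallelism.default (8 in this sketch) would then serve as the
   # upper bound for automatically decided parallelism.
   # execution.batch.adaptive.auto-parallelism.max-parallelism: 64
   execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task: 16m
   ```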
   



##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide 
parallelisms of operators
 
 To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
 - Configure to use Adaptive Batch Scheduler.
-- Set the parallelism of operators to `-1`.
+- Do not specify the parallelism of operators.
   
 #### Configure to use Adaptive Batch Scheduler
-To use Adaptive Batch Scheduler, you need to:
-- Set `jobmanager.scheduler: AdaptiveBatch`.
-- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) unset or explicitly set it to 
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs 
only"](#limitations-2).
+At present, the Adaptive Batch Scheduler is the default scheduler for flink 
batch jobs, and no additional configuration is required unless you explicitly 
configured to use other schedulers, such as 'jobmanager. scheduler: default'. 
It should be noted that 
+leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) unset or explicitly set it to 
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs 
only"](#limitations-2).
 
 In addition, there are several related configuration options that may need 
adjustment when using Adaptive Batch Scheduler:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of 
allowed parallelism to set adaptively.
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of 
allowed parallelism to set adaptively.
-- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average 
size of data volume to expect each task instance to process. Note that when 
data skew occurs, or the decided parallelism reaches the max parallelism (due 
to too much data), the data actually processed by some tasks may far exceed 
this value.
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The 
default parallelism of data source.
-
-#### Set the parallelism of operators to `-1`
-Adaptive Batch Scheduler will only decide parallelism for operators whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of operators to be decided automatically, you should configure as 
follows:
-- Set `parallelism.default: -1`
-- Set `table.exec.resource.default-parallelism: -1` in SQL jobs.
-- Don't call `setParallelism()` for operators in DataStream/DataSet jobs.
-- Don't call `setParallelism()` on 
`StreamExecutionEnvironment/ExecutionEnvironment` in DataStream/DataSet jobs.
+- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-min-parallelism): The lower bound 
of allowed parallelism to set adaptively.
+- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-max-parallelism): The upper bound 
of allowed parallelism to set adaptively. Default parallelism will be used as 
upper bound of allowed parallelism if this configuration is not configured.
+- [`execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task`]({{< 
ref "docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-task): The 
average size of data volume to expect each task instance to process. Note that 
when data skew occurs, or the decided parallelism reaches the max parallelism 
(due to too much data), the data actually processed by some tasks may far 
exceed this value.
+- [`execution.batch.adaptive.auto-parallelism.default-source-parallelism`]({{< 
ref "docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-default-source-parallelism): The 
default parallelism of data source.
+
+#### Avoid specify the parallelism of operators

Review Comment:
   specify -> setting



##########
docs/content.zh/docs/deployment/speculative_execution.md:
##########
@@ -45,15 +45,15 @@ under the License.
 {{< /hint >}}
 
 ### 启用预测执行
-要启用预测执行，你需要设置以下配置项：
+要启用预测执行，你需要不显式设置配置项`jobmanager.scheduler`或者按如下方式配置：
 - `jobmanager.scheduler: AdaptiveBatch`
     - 因为当前只有 [Adaptive Batch Scheduler]({{< ref 
"docs/deployment/elastic_scaling" >}}#adaptive-batch-scheduler) 支持预测执行.
-- `jobmanager.adaptive-batch-scheduler.speculative.enabled: true`
+- `execution.batch.speculative.enabled: true`

Review Comment:
   I would rework the above part as below:
   ```
   你可以通过以下配置项启用预测执行：
   - `execution.batch.speculative.enabled: true`
   
   需要注意的是，当前只有 [Adaptive Batch Scheduler]({{< ref "docs/deployment/elastic_scaling" >}}#adaptive-batch-scheduler) 支持预测执行。不过 Flink 批作业会默认使用该调度器，除非显式配置了其他调度器。
   ```



##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide 
parallelisms of operators
 
 To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
 - Configure to use Adaptive Batch Scheduler.
-- Set the parallelism of operators to `-1`.
+- Do not specify the parallelism of operators.
   
 #### Configure to use Adaptive Batch Scheduler
-To use Adaptive Batch Scheduler, you need to:
-- Set `jobmanager.scheduler: AdaptiveBatch`.
-- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) unset or explicitly set it to 
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs 
only"](#limitations-2).
+At present, the Adaptive Batch Scheduler is the default scheduler for flink 
batch jobs, and no additional configuration is required unless you explicitly 
configured to use other schedulers, such as 'jobmanager. scheduler: default'. 
It should be noted that 
+leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) unset or explicitly set it to 
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs 
only"](#limitations-2).
 
 In addition, there are several related configuration options that may need 
adjustment when using Adaptive Batch Scheduler:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of 
allowed parallelism to set adaptively.
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of 
allowed parallelism to set adaptively.
-- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average 
size of data volume to expect each task instance to process. Note that when 
data skew occurs, or the decided parallelism reaches the max parallelism (due 
to too much data), the data actually processed by some tasks may far exceed 
this value.
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The 
default parallelism of data source.
-
-#### Set the parallelism of operators to `-1`
-Adaptive Batch Scheduler will only decide parallelism for operators whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of operators to be decided automatically, you should configure as 
follows:
-- Set `parallelism.default: -1`
-- Set `table.exec.resource.default-parallelism: -1` in SQL jobs.
-- Don't call `setParallelism()` for operators in DataStream/DataSet jobs.
-- Don't call `setParallelism()` on 
`StreamExecutionEnvironment/ExecutionEnvironment` in DataStream/DataSet jobs.
+- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-min-parallelism): The lower bound 
of allowed parallelism to set adaptively.
+- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-max-parallelism): The upper bound 
of allowed parallelism to set adaptively. Default parallelism will be used as 
upper bound of allowed parallelism if this configuration is not configured.
+- [`execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task`]({{< 
ref "docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-task): The 
average size of data volume to expect each task instance to process. Note that 
when data skew occurs, or the decided parallelism reaches the max parallelism 
(due to too much data), the data actually processed by some tasks may far 
exceed this value.
+- [`execution.batch.adaptive.auto-parallelism.default-source-parallelism`]({{< 
ref "docs/deployment/config" 
>}}#execution-batch-adaptive-auto-parallelism-default-source-parallelism): The 
default parallelism of data source.
+
+#### Avoid specify the parallelism of operators
+The Adaptive Batch Scheduler only decides the parallelism for operators for 
which the user has not specified the parallelism. So if you want the 
parallelism of the operator to be automatically decided, you need to avoid 
specifying the parallelism for the operator through the 'setParallelism()' 
method.

Review Comment:
   for which the user has not specified the parallelism -> which do not have a parallelism set
   the operator -> an operator
   avoid specifying -> avoid setting



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
