zhuzhurk commented on code in PR #21801: URL: https://github.com/apache/flink/pull/21801#discussion_r1098203733
########## docs/content.zh/docs/deployment/elastic_scaling.md: ########## @@ -159,30 +159,27 @@ Adaptive Batch Scheduler 是一种可以自动推导每个算子并行度的批 使用 Adaptive Batch Scheduler 自动推导算子的并行度,需要: - 启用 Adaptive Batch Scheduler -- 配置算子的并行度为 `-1` +- 不要明确指定算子的并行度 #### 启用 Adaptive Batch Scheduler -为了启用 Adaptive Batch Scheduler, 你需要: -- 配置 `jobmanager.scheduler: AdaptiveBatch` -- 由于 ["只支持所有数据交换都为 BLOCKING 模式的作业"](#局限性-2), 需要将 [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。 +当前 Adaptive Batch Scheduler 是 Flink 默认的批作业调度器,无需额外配置。除非用户显式的配置了使用其他调度器,例如 `jobmanager.scheduler: default`。需要注意的是,由于 ["只支持所有数据交换都为 BLOCKING 模式的作业"](#局限性-2), 需要将 [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。 除此之外,使用 Adaptive Batch Scheduler 时,以下相关配置也可以调整: -- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许自动设置的并行度最小值。 -- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许自动设置的并行度最大值。 -- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): 期望每个任务平均处理的数据量大小。请注意,当出现数据倾斜,或者确定的并行度达到最大并行度(由于数据过多)时,一些任务实际处理的数据可能会远远超过这个值。 -- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 算子的默认并行度 - -#### 配置算子的并行度为 `-1` -Adaptive Batch Scheduler 只会为用户未指定并行度的算子(并行度为 `-1`)推导并行度。 所以如果你想自动推导算子的并行度,需要进行以下配置: +- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-min-parallelism): 允许自动设置的并行度最小值。 +- 
[`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-max-parallelism): 允许自动设置的并行度最大值,如果该配置项没有配置将使用默认并行度作为允许自动设置的并行度最大值。 +- [`execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-task): 期望每个任务平均处理的数据量大小。请注意,当出现数据倾斜,或者确定的并行度达到最大并行度(由于数据过多)时,一些任务实际处理的数据可能会远远超过这个值。 +- [`execution.batch.adaptive.auto-parallelism.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-default-source-parallelism): source 算子的默认并行度 + +#### 不要明确指定算子的并行度 Review Comment: -> 不要指定算子的并行度 ########## docs/content.zh/docs/deployment/elastic_scaling.md: ########## @@ -159,30 +159,27 @@ Adaptive Batch Scheduler 是一种可以自动推导每个算子并行度的批 使用 Adaptive Batch Scheduler 自动推导算子的并行度,需要: - 启用 Adaptive Batch Scheduler -- 配置算子的并行度为 `-1` +- 不要明确指定算子的并行度 #### 启用 Adaptive Batch Scheduler -为了启用 Adaptive Batch Scheduler, 你需要: -- 配置 `jobmanager.scheduler: AdaptiveBatch` -- 由于 ["只支持所有数据交换都为 BLOCKING 模式的作业"](#局限性-2), 需要将 [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。 +当前 Adaptive Batch Scheduler 是 Flink 默认的批作业调度器,无需额外配置。除非用户显式的配置了使用其他调度器,例如 `jobmanager.scheduler: default`。需要注意的是,由于 ["只支持所有数据交换都为 BLOCKING 模式的作业"](#局限性-2), 需要将 [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。 除此之外,使用 Adaptive Batch Scheduler 时,以下相关配置也可以调整: -- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许自动设置的并行度最小值。 -- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许自动设置的并行度最大值。 -- 
[`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): 期望每个任务平均处理的数据量大小。请注意,当出现数据倾斜,或者确定的并行度达到最大并行度(由于数据过多)时,一些任务实际处理的数据可能会远远超过这个值。 -- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 算子的默认并行度 - -#### 配置算子的并行度为 `-1` -Adaptive Batch Scheduler 只会为用户未指定并行度的算子(并行度为 `-1`)推导并行度。 所以如果你想自动推导算子的并行度,需要进行以下配置: +- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-min-parallelism): 允许自动设置的并行度最小值。 +- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-max-parallelism): 允许自动设置的并行度最大值,如果该配置项没有配置将使用默认并行度作为允许自动设置的并行度最大值。 +- [`execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-task): 期望每个任务平均处理的数据量大小。请注意,当出现数据倾斜,或者确定的并行度达到最大并行度(由于数据过多)时,一些任务实际处理的数据可能会远远超过这个值。 +- [`execution.batch.adaptive.auto-parallelism.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-default-source-parallelism): source 算子的默认并行度 + +#### 不要明确指定算子的并行度 +Adaptive Batch Scheduler 只会为用户未指定并行度的算子推导并行度。 所以如果你想算子的并行度被自动推导,需要避免通过算子的 `setParallelism()` 方法来为其指定并行度。 +除此之外,对于 DataSet 作业还需要进行以下配置: Review Comment: IIUC, there should be an empty line above to separate the paragraphs. ########## docs/content/docs/deployment/elastic_scaling.md: ########## @@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide parallelisms of operators To automatically decide parallelisms for operators with Adaptive Batch Scheduler, you need to: - Configure to use Adaptive Batch Scheduler. -- Set the parallelism of operators to `-1`. 
+- Do not specify the parallelism of operators. #### Configure to use Adaptive Batch Scheduler -To use Adaptive Batch Scheduler, you need to: -- Set `jobmanager.scheduler: AdaptiveBatch`. -- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2). +At present, the Adaptive Batch Scheduler is the default scheduler for flink batch jobs, and no additional configuration is required unless you explicitly configured to use other schedulers, such as 'jobmanager. scheduler: default'. It should be noted that +leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2). In addition, there are several related configuration options that may need adjustment when using Adaptive Batch Scheduler: -- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of allowed parallelism to set adaptively. -- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of allowed parallelism to set adaptively. -- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average size of data volume to expect each task instance to process. Note that when data skew occurs, or the decided parallelism reaches the max parallelism (due to too much data), the data actually processed by some tasks may far exceed this value. 
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The default parallelism of data source. - -#### Set the parallelism of operators to `-1` -Adaptive Batch Scheduler will only decide parallelism for operators whose parallelism is not specified by users (parallelism is `-1`). So if you want the parallelism of operators to be decided automatically, you should configure as follows: -- Set `parallelism.default: -1` -- Set `table.exec.resource.default-parallelism: -1` in SQL jobs. -- Don't call `setParallelism()` for operators in DataStream/DataSet jobs. -- Don't call `setParallelism()` on `StreamExecutionEnvironment/ExecutionEnvironment` in DataStream/DataSet jobs. +- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-min-parallelism): The lower bound of allowed parallelism to set adaptively. +- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-max-parallelism): The upper bound of allowed parallelism to set adaptively. Default parallelism will be used as upper bound of allowed parallelism if this configuration is not configured. +- [`execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-task): The average size of data volume to expect each task instance to process. Note that when data skew occurs, or the decided parallelism reaches the max parallelism (due to too much data), the data actually processed by some tasks may far exceed this value. 
+- [`execution.batch.adaptive.auto-parallelism.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-default-source-parallelism): The default parallelism of data source. + +#### Avoid specify the parallelism of operators +The Adaptive Batch Scheduler only decides the parallelism for operators for which the user has not specified the parallelism. So if you want the parallelism of the operator to be automatically decided, you need to avoid specifying the parallelism for the operator through the 'setParallelism()' method. +In addition, the following configurations are required for DataSet jobs: +- Set `parallelism.default: -1`. +- Don't call `setParallelism()` on `ExecutionEnvironment` in DataSet jobs. Review Comment: remove "in DataSet jobs". ########## docs/content/docs/deployment/elastic_scaling.md: ########## @@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide parallelisms of operators To automatically decide parallelisms for operators with Adaptive Batch Scheduler, you need to: - Configure to use Adaptive Batch Scheduler. -- Set the parallelism of operators to `-1`. +- Do not specify the parallelism of operators. Review Comment: Do not specify -> Avoid setting ########## docs/content/docs/deployment/elastic_scaling.md: ########## @@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide parallelisms of operators To automatically decide parallelisms for operators with Adaptive Batch Scheduler, you need to: - Configure to use Adaptive Batch Scheduler. -- Set the parallelism of operators to `-1`. +- Do not specify the parallelism of operators. #### Configure to use Adaptive Batch Scheduler -To use Adaptive Batch Scheduler, you need to: -- Set `jobmanager.scheduler: AdaptiveBatch`. 
-- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2). +At present, the Adaptive Batch Scheduler is the default scheduler for flink batch jobs, and no additional configuration is required unless you explicitly configured to use other schedulers, such as 'jobmanager. scheduler: default'. It should be noted that

Review Comment:
flink -> Flink.
, and no additional -> . No additional
unless you explicitly configured to use other schedulers -> unless other schedulers are explicitly configured
such as 'jobmanager. scheduler: default' -> e.g. \`jobmanager.scheduler: default\`
It should be noted that -> Note that you need to

########## docs/content/docs/deployment/elastic_scaling.md: ########## @@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide parallelisms of operators To automatically decide parallelisms for operators with Adaptive Batch Scheduler, you need to: - Configure to use Adaptive Batch Scheduler. -- Set the parallelism of operators to `-1`. +- Do not specify the parallelism of operators. #### Configure to use Adaptive Batch Scheduler -To use Adaptive Batch Scheduler, you need to: -- Set `jobmanager.scheduler: AdaptiveBatch`. -- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2). +At present, the Adaptive Batch Scheduler is the default scheduler for flink batch jobs, and no additional configuration is required unless you explicitly configured to use other schedulers, such as 'jobmanager. scheduler: default'. 
It should be noted that +leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2). In addition, there are several related configuration options that may need adjustment when using Adaptive Batch Scheduler: -- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of allowed parallelism to set adaptively. -- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of allowed parallelism to set adaptively. -- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average size of data volume to expect each task instance to process. Note that when data skew occurs, or the decided parallelism reaches the max parallelism (due to too much data), the data actually processed by some tasks may far exceed this value. -- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The default parallelism of data source. - -#### Set the parallelism of operators to `-1` -Adaptive Batch Scheduler will only decide parallelism for operators whose parallelism is not specified by users (parallelism is `-1`). So if you want the parallelism of operators to be decided automatically, you should configure as follows: -- Set `parallelism.default: -1` -- Set `table.exec.resource.default-parallelism: -1` in SQL jobs. -- Don't call `setParallelism()` for operators in DataStream/DataSet jobs. 
-- Don't call `setParallelism()` on `StreamExecutionEnvironment/ExecutionEnvironment` in DataStream/DataSet jobs. +- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-min-parallelism): The lower bound of allowed parallelism to set adaptively. +- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-max-parallelism): The upper bound of allowed parallelism to set adaptively. Default parallelism will be used as upper bound of allowed parallelism if this configuration is not configured. Review Comment: Default parallelism -> The default parallelism set via [\`parallelism.default\`]({{< ref "docs/deployment/config" >}}) or \`StreamExecutionEnvironment#setParallelism()\` ########## docs/content/docs/deployment/elastic_scaling.md: ########## @@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide parallelisms of operators To automatically decide parallelisms for operators with Adaptive Batch Scheduler, you need to: - Configure to use Adaptive Batch Scheduler. -- Set the parallelism of operators to `-1`. +- Do not specify the parallelism of operators. #### Configure to use Adaptive Batch Scheduler -To use Adaptive Batch Scheduler, you need to: -- Set `jobmanager.scheduler: AdaptiveBatch`. -- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2). +At present, the Adaptive Batch Scheduler is the default scheduler for flink batch jobs, and no additional configuration is required unless you explicitly configured to use other schedulers, such as 'jobmanager. scheduler: default'. 
It should be noted that +leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2). In addition, there are several related configuration options that may need adjustment when using Adaptive Batch Scheduler: -- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of allowed parallelism to set adaptively. -- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of allowed parallelism to set adaptively. -- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average size of data volume to expect each task instance to process. Note that when data skew occurs, or the decided parallelism reaches the max parallelism (due to too much data), the data actually processed by some tasks may far exceed this value. -- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The default parallelism of data source. - -#### Set the parallelism of operators to `-1` -Adaptive Batch Scheduler will only decide parallelism for operators whose parallelism is not specified by users (parallelism is `-1`). So if you want the parallelism of operators to be decided automatically, you should configure as follows: -- Set `parallelism.default: -1` -- Set `table.exec.resource.default-parallelism: -1` in SQL jobs. -- Don't call `setParallelism()` for operators in DataStream/DataSet jobs. 
-- Don't call `setParallelism()` on `StreamExecutionEnvironment/ExecutionEnvironment` in DataStream/DataSet jobs. +- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-min-parallelism): The lower bound of allowed parallelism to set adaptively. +- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-max-parallelism): The upper bound of allowed parallelism to set adaptively. Default parallelism will be used as upper bound of allowed parallelism if this configuration is not configured. +- [`execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-task): The average size of data volume to expect each task instance to process. Note that when data skew occurs, or the decided parallelism reaches the max parallelism (due to too much data), the data actually processed by some tasks may far exceed this value. +- [`execution.batch.adaptive.auto-parallelism.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-default-source-parallelism): The default parallelism of data source. + +#### Avoid specify the parallelism of operators Review Comment: specify -> setting ########## docs/content.zh/docs/deployment/speculative_execution.md: ########## @@ -45,15 +45,15 @@ under the License. {{< /hint >}} ### 启用预测执行 -要启用预测执行,你需要设置以下配置项: +要启用预测执行,你需要不显式设置配置项`jobmanager.scheduler`或者按如下方式配置: - `jobmanager.scheduler: AdaptiveBatch` - 因为当前只有 [Adaptive Batch Scheduler]({{< ref "docs/deployment/elastic_scaling" >}}#adaptive-batch-scheduler) 支持预测执行. 
-- `jobmanager.adaptive-batch-scheduler.speculative.enabled: true` +- `execution.batch.speculative.enabled: true`

Review Comment: I would rework the above part as below:
```
你可以通过以下配置项启用预测执行:
- `execution.batch.speculative.enabled: true`

需要注意的是,当前只有 [Adaptive Batch Scheduler]({{< ref "docs/deployment/elastic_scaling" >}}#adaptive-batch-scheduler) 支持预测执行。不过 Flink 批作业会默认使用该调度器,除非显式配置了其他调度器。
```

########## docs/content/docs/deployment/elastic_scaling.md: ########## @@ -161,30 +161,28 @@ The Adaptive Batch Scheduler can automatically decide parallelisms of operators To automatically decide parallelisms for operators with Adaptive Batch Scheduler, you need to: - Configure to use Adaptive Batch Scheduler. -- Set the parallelism of operators to `-1`. +- Do not specify the parallelism of operators. #### Configure to use Adaptive Batch Scheduler -To use Adaptive Batch Scheduler, you need to: -- Set `jobmanager.scheduler: AdaptiveBatch`. -- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2). +At present, the Adaptive Batch Scheduler is the default scheduler for flink batch jobs, and no additional configuration is required unless you explicitly configured to use other schedulers, such as 'jobmanager. scheduler: default'. It should be noted that +leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2). 
In addition, there are several related configuration options that may need adjustment when using Adaptive Batch Scheduler: -- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of allowed parallelism to set adaptively. -- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of allowed parallelism to set adaptively. -- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average size of data volume to expect each task instance to process. Note that when data skew occurs, or the decided parallelism reaches the max parallelism (due to too much data), the data actually processed by some tasks may far exceed this value. -- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The default parallelism of data source. - -#### Set the parallelism of operators to `-1` -Adaptive Batch Scheduler will only decide parallelism for operators whose parallelism is not specified by users (parallelism is `-1`). So if you want the parallelism of operators to be decided automatically, you should configure as follows: -- Set `parallelism.default: -1` -- Set `table.exec.resource.default-parallelism: -1` in SQL jobs. -- Don't call `setParallelism()` for operators in DataStream/DataSet jobs. -- Don't call `setParallelism()` on `StreamExecutionEnvironment/ExecutionEnvironment` in DataStream/DataSet jobs. +- [`execution.batch.adaptive.auto-parallelism.min-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-min-parallelism): The lower bound of allowed parallelism to set adaptively. 
+- [`execution.batch.adaptive.auto-parallelism.max-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-max-parallelism): The upper bound of allowed parallelism to set adaptively. Default parallelism will be used as upper bound of allowed parallelism if this configuration is not configured. +- [`execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-task): The average size of data volume to expect each task instance to process. Note that when data skew occurs, or the decided parallelism reaches the max parallelism (due to too much data), the data actually processed by some tasks may far exceed this value. +- [`execution.batch.adaptive.auto-parallelism.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-default-source-parallelism): The default parallelism of data source. + +#### Avoid specify the parallelism of operators +The Adaptive Batch Scheduler only decides the parallelism for operators for which the user has not specified the parallelism. So if you want the parallelism of the operator to be automatically decided, you need to avoid specifying the parallelism for the operator through the 'setParallelism()' method.

Review Comment:
for which the user has not specified the parallelism -> which do not have a parallelism set
the operator -> an operator
avoid specifying -> avoid setting
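
For reference, the settings discussed in this review could be combined roughly as in the sketch below. The option keys are the ones named in the diff; every value is an illustrative assumption, not a recommendation or a confirmed default.

```yaml
# Sketch only: keys are taken from the reviewed docs, values are assumed.

# jobmanager.scheduler is left unset, since the revised docs describe the
# Adaptive Batch Scheduler as the default for batch jobs.
# execution.batch-shuffle-mode is also left unset, as the docs suggest.

# Bounds and target data volume for automatically decided parallelism.
execution.batch.adaptive.auto-parallelism.min-parallelism: 1
execution.batch.adaptive.auto-parallelism.max-parallelism: 128
execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task: 16m
execution.batch.adaptive.auto-parallelism.default-source-parallelism: 4

# Speculative execution, per the speculative_execution.md change.
execution.batch.speculative.enabled: true

# Only for DataSet jobs, per the revised "avoid setting parallelism" section.
parallelism.default: -1
```

Operators that call `setParallelism()` would still keep their fixed parallelism under this configuration, which is the point of the "avoid setting" wording suggested above.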