[GitHub] [flink] zhuzhurk commented on a change in pull request #18757: [FLINK-25226][doc] Add documentation about the AdaptiveBatchScheduler

GitBox Mon, 14 Feb 2022 20:52:38 -0800


zhuzhurk commented on a change in pull request #18757:
URL: https://github.com/apache/flink/pull/18757#discussion_r806435641




##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Adaptive Batch Scheduler
+
+Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+- 批作业用户可以从并行度调优中解脱出来
+- 根据数据量自动推导并行度可以更好地适应每天变化的数据量
+- SQL作业中的节点也可以分配不同的并行性
+
+### Usage
+
+使用 Adaptive Batch Scheduler 自动推导作业节点的并行度，需要：
+- 启用 Adaptive Batch Scheduler
+- 配置节点的并行度为 `-1`
+
+#### 启用 Adaptive Batch Scheduler
+为了启用 Adaptive Batch Scheduler, 你需要将 [`jobmanager.scheduler`]({{< ref 
"docs/deployment/config" >}}#jobmanager-scheduler) 配置为 `AdaptiveBatch`。除此之外，使用 
Adaptive Batch Scheduler 时，以下配置也可以选择性配置:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): 允许设置的并行度最小值
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): 允许设置的并行度最大值

Review comment:
       设置 -> 自动设置 (i.e. "set" -> "automatically set")

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：

Review comment:
       消耗 -> 消费 (i.e. "expends" -> "consumes", the usual term for consuming data)

##########
File path: docs/content/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of job 
vertices for batch jobs. If a job vertex is not set with a parallelism, the 
scheduler will decide parallelism for the job vertex according to the size of 
its consumed datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can be vertex level and can better fit 
consumed datasets which have a varying volume size every day
+- Vertices from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for job vertices through Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of job vertices to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to set the 
[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`. In addition, there are several 
optional config options that might need adjustment when using Adaptive Batch 
Scheduler:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): The 
default parallelism of source vertices
+
+#### Set the parallelism of job vertices to `-1`
+Adaptive Batch Scheduler will only decide parallelism for job vertices whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of vertices can be decided automatically, you should configure as 
follows:
+- Set `parallelism.default` to `-1`
+- Set `table.exec.resource.default-parallelism` to `-1` in SQL jobs.
+- Don't call `setParallelism()` for operators in datastream jobs.
+
+### Performance tuning
+
+1. It's recommended to use `Sort Shuffle` and set 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to 
`0`. This can decouple the network memory consumption from parallelism, so for 
large scale jobs, the possibility of "Insufficient number of network buffers" 
error can be decreased.

Review comment:
       so for -> so that for

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：

Review comment:
       ...的调度器 -> ...的批处理作业调度器 (i.e. "...scheduler" -> "...scheduler for batch jobs")

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+- 批作业用户可以从并行度调优中解脱出来
+- 根据数据量自动推导并行度可以更好地适应每天变化的数据量
+- SQL作业中的节点也可以分配不同的并行性
+
+### Usage
+
+使用 Adaptive Batch Scheduler 自动推导作业节点的并行度，需要：
+- 启用 Adaptive Batch Scheduler
+- 配置节点的并行度为 `-1`
+
+#### 启用 Adaptive Batch Scheduler
+为了启用 Adaptive Batch Scheduler, 你需要将 [`jobmanager.scheduler`]({{< ref 
"docs/deployment/config" >}}#jobmanager-scheduler) 配置为 `AdaptiveBatch`。除此之外，使用 
Adaptive Batch Scheduler 时，以下配置也可以选择性配置:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): 允许设置的并行度最小值
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): 允许设置的并行度最大值
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): 期望每个任务处理的数据量大小
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): source 
节点的默认并行度
+
+#### 配置节点的并行度为 `-1`
+Adaptive Batch Scheduler 只会为用户未指定并行度的作业节点（并行度为 `-1`）推导并行度。 
所以如果你想自动推导节点的并行度，需要进行以下配置：
+- 配置 `parallelism.default` 为 `-1`
+- 对于 SQL 作业，需要配置 `table.exec.resource.default-parallelism` 为 `-1`
+- 对于 DataStream 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
+
+### 性能调优
+
+1. 建议使用 `Sort Shuffle` 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这降低了遇到 "Insufficient number of network buffers" 
错误的可能性。
+2. 不建议为 [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism) 
配置太大的值，否则会影响性能。因为这个选项会影响上游任务产出的 subpartition 的数量，过多的 subpartition 可能会影响 hash 
shuffle 的性能，或者由于小包影响网络传输的性能。
+
+### 限制
+
+- **ALL-EDGES-BLOCKING batch jobs only**: 目前 Adaptive Batch Scheduler 只支持 
ALL-EDGES-BLOCKING 的批作业。
+- **Inconsistent broadcast results metrics on WebUI**: 在使用 Adaptive Batch 
Scheduler 时，对于 broadcast 边，上游节点发送的数据量和下游节点接收的数据量可能会不相等，这在显示上会困扰用户。细节详见 
[FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler)

Review comment:
       这在显示上会困扰用户 -> 这在 Web UI 的显示上可能会困扰用户 (i.e. "this confuses users in the display" -> "this may confuse users in the Web UI display")

##########
File path: docs/content/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of job 
vertices for batch jobs. If a job vertex is not set with a parallelism, the 
scheduler will decide parallelism for the job vertex according to the size of 
its consumed datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can be vertex level and can better fit 
consumed datasets which have a varying volume size every day
+- Vertices from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for job vertices through Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of job vertices to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to set the 
[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`. In addition, there are several 
optional config options that might need adjustment when using Adaptive Batch 
Scheduler:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): The 
default parallelism of source vertices
+
+#### Set the parallelism of job vertices to `-1`
+Adaptive Batch Scheduler will only decide parallelism for job vertices whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of vertices can be decided automatically, you should configure as 
follows:
+- Set `parallelism.default` to `-1`
+- Set `table.exec.resource.default-parallelism` to `-1` in SQL jobs.
+- Don't call `setParallelism()` for operators in datastream jobs.

Review comment:
       datastream/dataset
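
A minimal sketch of the usage steps in the hunk above, written as a `flink-conf.yaml` fragment. The option keys come from the documentation under review, using the spellings `AdaptiveBatch` and `parallelism.default`. Putting all three options in `flink-conf.yaml` is an assumption made for illustration; in practice the table option is often set through the table environment or the SQL client instead.

```yaml
# Select the Adaptive Batch Scheduler.
jobmanager.scheduler: AdaptiveBatch

# Leave the default parallelism unset (-1) so the scheduler can decide it per vertex.
parallelism.default: -1

# SQL jobs only: leave the planner's default parallelism unset as well.
table.exec.resource.default-parallelism: -1
```

Per the quoted text, the scheduler only decides parallelism for vertices whose parallelism is not specified, so operators that call `setParallelism()` explicitly would not be tuned.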

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+- 批作业用户可以从并行度调优中解脱出来
+- 根据数据量自动推导并行度可以更好地适应每天变化的数据量
+- SQL作业中的节点也可以分配不同的并行性
+
+### Usage
+
+使用 Adaptive Batch Scheduler 自动推导作业节点的并行度，需要：
+- 启用 Adaptive Batch Scheduler
+- 配置节点的并行度为 `-1`
+
+#### 启用 Adaptive Batch Scheduler
+为了启用 Adaptive Batch Scheduler, 你需要将 [`jobmanager.scheduler`]({{< ref 
"docs/deployment/config" >}}#jobmanager-scheduler) 配置为 `AdaptiveBatch`。除此之外，使用 
Adaptive Batch Scheduler 时，以下配置也可以选择性配置:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): 允许设置的并行度最小值

Review comment:
       设置 -> 自动设置 (i.e. "set" -> "automatically set")

##########
File path: docs/content/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of job 
vertices for batch jobs. If a job vertex is not set with a parallelism, the 
scheduler will decide parallelism for the job vertex according to the size of 
its consumed datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can be vertex level and can better fit 
consumed datasets which have a varying volume size every day
+- Vertices from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for job vertices through Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of job vertices to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to set the 
[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`. In addition, there are several 
optional config options that might need adjustment when using Adaptive Batch 
Scheduler:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): The 
default parallelism of source vertices
+
+#### Set the parallelism of job vertices to `-1`
+Adaptive Batch Scheduler will only decide parallelism for job vertices whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of vertices can be decided automatically, you should configure as 
follows:

Review comment:
       can -> to

##########
File path: docs/content/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of job 
vertices for batch jobs. If a job vertex is not set with a parallelism, the 
scheduler will decide parallelism for the job vertex according to the size of 
its consumed datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can be vertex level and can better fit 
consumed datasets which have a varying volume size every day
+- Vertices from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for job vertices through Adaptive Batch 
Scheduler, you need to:

Review comment:
       through -> with

##########
File path: docs/content/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of job 
vertices for batch jobs. If a job vertex is not set with a parallelism, the 
scheduler will decide parallelism for the job vertex according to the size of 
its consumed datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can be vertex level and can better fit 
consumed datasets which have a varying volume size every day
+- Vertices from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for job vertices through Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of job vertices to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to set the 
[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`. In addition, there are several 
optional config options that might need adjustment when using Adaptive Batch 
Scheduler:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): The 
default parallelism of source vertices
+
+#### Set the parallelism of job vertices to `-1`
+Adaptive Batch Scheduler will only decide parallelism for job vertices whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of vertices can be decided automatically, you should configure as 
follows:
+- Set `parallelism.default` to `-1`
+- Set `table.exec.resource.default-parallelism` to `-1` in SQL jobs.
+- Don't call `setParallelism()` for operators in datastream jobs.
+
+### Performance tuning
+
+1. It's recommended to use `Sort Shuffle` and set 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to 
`0`. This can decouple the network memory consumption from parallelism, so for 
large scale jobs, the possibility of "Insufficient number of network buffers" 
error can be decreased.
+2. It's not recommended to configure an excessive value for 
[`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism), otherwise it will 
affect the performance. Because this option can affect the number of 
subpartitions produced by upstream tasks, excessive number of subpartitions may 
degrade the performance of hash shuffle and the performance of network 
transmission due to small packets.
+
+### Limitations
+
+- **ALL-EDGES-BLOCKING batch jobs only**: The first version of Adaptive Batch 
Scheduler supports ALL-EDGES-BLOCKING batch jobs only.

Review comment:
       ALL-EDGES-BLOCKING -> ALL-EXCHANGES-BLOCKING
   
   And maybe add a link to the config option "execution.batch-shuffle-mode" for 
reference?
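
For reference, a hedged `flink-conf.yaml` sketch of the settings this hunk touches. `execution.batch-shuffle-mode` is the option named in the comment above and `taskmanager.network.memory.buffers-per-channel` is from the performance tuning list. The value `ALL_EXCHANGES_BLOCKING` and the choice to set both options cluster-wide are assumptions made for illustration.

```yaml
# Make every data exchange blocking, matching the limitation described in the doc.
execution.batch-shuffle-mode: ALL_EXCHANGES_BLOCKING

# Performance tuning item 1: decouple the required network memory from parallelism.
# The doc recommends combining this with Sort Shuffle.
taskmanager.network.memory.buffers-per-channel: 0
```

The sketch leaves `jobmanager.scheduler.adaptive-batch.max-parallelism` at its default, since the quoted text advises against an excessive value.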

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+- 批作业用户可以从并行度调优中解脱出来
+- 根据数据量自动推导并行度可以更好地适应每天变化的数据量
+- SQL作业中的节点也可以分配不同的并行性
+
+### Usage

Review comment:
       Usage -> 用法 (the Chinese word for "Usage")

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs

Review comment:
       Maybe add this doc page to `Operations` -> `Batch`, similar to 
"https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/blocking_shuffle/"?

##########
File path: docs/content/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of job 
vertices for batch jobs. If a job vertex is not set with a parallelism, the 
scheduler will decide parallelism for the job vertex according to the size of 
its consumed datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can be vertex level and can better fit 
consumed datasets which have a varying volume size every day
+- Vertices from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for job vertices through Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of job vertices to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to set the 
[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`. In addition, there are several 
optional config options that might need adjustment when using Adaptive Batch 
Scheduler:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): The 
default parallelism of source vertices
+
+#### Set the parallelism of job vertices to `-1`
+Adaptive Batch Scheduler will only decide parallelism for job vertices whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of vertices can be decided automatically, you should configure as 
follows:
+- Set `parallelism.default` to `-1`
+- Set `table.exec.resource.default-parallelism` to `-1` in SQL jobs.
+- Don't call `setParallelism()` for operators in datastream jobs.
+
+### Performance tuning
+
+1. It's recommended to use `Sort Shuffle` and set 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to 
`0`. This can decouple the network memory consumption from parallelism, so for 
large scale jobs, the possibility of "Insufficient number of network buffers" 
error can be decreased.

Review comment:
       network memory consumption -> required network memory
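
For readers following the hunk above, here is a rough sketch of how the options `min-parallelism`, `max-parallelism` and `data-volume-per-task` could combine into a parallelism for a single vertex. It is a simplified illustration under stated assumptions, not Flink's actual implementation: the function name `decide_parallelism` and the plain ceiling-division rule are hypothetical, and the real scheduler may round or normalize the result differently.

```python
def decide_parallelism(consumed_bytes: int,
                       data_volume_per_task: int,
                       min_parallelism: int,
                       max_parallelism: int) -> int:
    """Illustrative only: pick enough tasks so that each one processes
    roughly data_volume_per_task bytes, clamped to the configured bounds."""
    if data_volume_per_task <= 0:
        raise ValueError("data_volume_per_task must be positive")
    if min_parallelism > max_parallelism:
        raise ValueError("min_parallelism must not exceed max_parallelism")
    # Integer ceiling division: tasks needed at data_volume_per_task bytes each.
    wanted = (consumed_bytes + data_volume_per_task - 1) // data_volume_per_task
    # Keep the result inside [min_parallelism, max_parallelism].
    return max(min_parallelism, min(max_parallelism, wanted))


GIB = 1024 ** 3
# Hypothetical numbers: 10 GiB consumed, 1 GiB per task, bounds [1, 128].
print(decide_parallelism(10 * GIB, 1 * GIB, 1, 128))
```

Under this model a vertex that consumes very little data falls back to the lower bound, and a very large input is capped at the upper bound.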

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+- 批作业用户可以从并行度调优中解脱出来
+- 根据数据量自动推导并行度可以更好地适应每天变化的数据量
+- SQL作业中的节点也可以分配不同的并行性
+
+### Usage
+
+使用 Adaptive Batch Scheduler 自动推导作业节点的并行度，需要：
+- 启用 Adaptive Batch Scheduler
+- 配置节点的并行度为 `-1`
+
+#### 启用 Adaptive Batch Scheduler
+为了启用 Adaptive Batch Scheduler, 你需要将 [`jobmanager.scheduler`]({{< ref 
"docs/deployment/config" >}}#jobmanager-scheduler) 配置为 `AdaptiveBatch`。除此之外，使用 
Adaptive Batch Scheduler 时，以下配置也可以选择性配置:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): 允许设置的并行度最小值
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): 允许设置的并行度最大值
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): 期望每个任务处理的数据量大小
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): source 
节点的默认并行度
+
+#### 配置节点的并行度为 `-1`
+Adaptive Batch Scheduler 只会为用户未指定并行度的作业节点（并行度为 `-1`）推导并行度。 
所以如果你想自动推导节点的并行度，需要进行以下配置：
+- 配置 `parallelism.default` 为 `-1`
+- 对于 SQL 作业，需要配置 `table.exec.resource.default-parallelism` 为 `-1`
+- 对于 DataStream 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度

Review comment:
       DataStream -> DataSet/DataStream

##########
File path: docs/content/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide the parallelism of job 
vertices for batch jobs. If no parallelism is set for a job vertex, the 
scheduler decides it according to the size of the datasets the vertex 
consumes. This brings several benefits:
+- Batch job users are relieved from parallelism tuning
+- Parallelism is tuned automatically per vertex, so it can better fit 
consumed datasets whose volume varies from day to day
+- Vertices of SQL batch jobs can be assigned different parallelisms, 
which are tuned automatically
+
+### Usage
+
+To automatically decide parallelisms for job vertices through Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of job vertices to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to set the 
[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`. In addition, there are several 
optional config options that might need adjustment when using Adaptive Batch 
Scheduler:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): The 
default parallelism of source vertices
+
+#### Set the parallelism of job vertices to `-1`
+Adaptive Batch Scheduler will only decide parallelism for job vertices whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of vertices to be decided automatically, you should configure as 
follows:
+- Set `parallelism.default` to `-1`.
+- Set `table.exec.resource.default-parallelism` to `-1` in SQL jobs.
+- Don't call `setParallelism()` for operators in DataStream jobs.
+
+### Performance tuning
+
+1. It's recommended to use `Sort Shuffle` and set 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to 
`0`. This can decouple the network memory consumption from parallelism, so for 
large scale jobs, the possibility of "Insufficient number of network buffers" 
error can be decreased.
+2. It's not recommended to configure an excessive value for 
[`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism), as doing so can 
hurt performance. This option can affect the number of subpartitions produced 
by upstream tasks, and an excessive number of subpartitions may degrade the 
performance of hash shuffle and of network transmission due to small packets.
+
+### Limitations
+
+- **ALL-EDGES-BLOCKING batch jobs only**: The first version of Adaptive Batch 
Scheduler only supports ALL-EDGES-BLOCKING batch jobs only.

Review comment:
       There are two `only`s in this sentence; one of them should be removed.
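
As background for the second performance tuning item quoted in the hunk above, the following sketch shows why a very large `max-parallelism` can lead to small packets. It assumes, purely for illustration, that each upstream task splits its output evenly into as many subpartitions as the configured `max-parallelism`; the helper name `avg_subpartition_bytes` and the 64 MiB figure are hypothetical.

```python
def avg_subpartition_bytes(task_output_bytes: int, max_parallelism: int) -> float:
    """Average size of one subpartition if a task's output is spread
    evenly over max_parallelism subpartitions (simplifying assumption)."""
    if max_parallelism <= 0:
        raise ValueError("max_parallelism must be positive")
    return task_output_bytes / max_parallelism


MIB = 1024 ** 2
# The same 64 MiB of task output gets chopped into ever smaller pieces.
for max_p in (128, 1024, 32768):
    print(max_p, avg_subpartition_bytes(64 * MIB, max_p))
```

With these made-up numbers the average subpartition shrinks from 512 KiB to 2 KiB as the upper bound grows, which is the small-packet effect the documentation warns about.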

##########
File path: docs/content/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide the parallelism of job 
vertices for batch jobs. If no parallelism is set for a job vertex, the 
scheduler decides it according to the size of the datasets the vertex 
consumes. This brings several benefits:
+- Batch job users are relieved from parallelism tuning
+- Parallelism is tuned automatically per vertex, so it can better fit 
consumed datasets whose volume varies from day to day
+- Vertices of SQL batch jobs can be assigned different parallelisms, 
which are tuned automatically
+
+### Usage
+
+To automatically decide parallelisms for job vertices through Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of job vertices to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to set the 
[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`. In addition, there are several 
optional config options that might need adjustment when using Adaptive Batch 
Scheduler:

Review comment:
       > there are several optional config options that might need adjustment
   
   -> there are several related configuration options that may need adjustment

##########
File path: docs/content/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide the parallelism of job 
vertices for batch jobs. If no parallelism is set for a job vertex, the 
scheduler decides it according to the size of the datasets the vertex 
consumes. This brings several benefits:
+- Batch job users are relieved from parallelism tuning
+- Parallelism is tuned automatically per vertex, so it can better fit 
consumed datasets whose volume varies from day to day
+- Vertices of SQL batch jobs can be assigned different parallelisms, 
which are tuned automatically
+
+### Usage
+
+To automatically decide parallelisms for job vertices through Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of job vertices to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to set the 
[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`. In addition, there are several 
optional config options that might need adjustment when using Adaptive Batch 
Scheduler:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): The 
default parallelism of source vertices
+
+#### Set the parallelism of job vertices to `-1`
+Adaptive Batch Scheduler will only decide parallelism for job vertices whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of vertices to be decided automatically, you should configure as 
follows:
+- Set `parallelism.default` to `-1`.
+- Set `table.exec.resource.default-parallelism` to `-1` in SQL jobs.
+- Don't call `setParallelism()` for operators in DataStream jobs.
+
+### Performance tuning
+
+1. It's recommended to use `Sort Shuffle` and set 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to 
`0`. This can decouple the network memory consumption from parallelism, so for 
large scale jobs, the possibility of "Insufficient number of network buffers" 
error can be decreased.
+2. It's not recommended to configure an excessive value for 
[`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism), as doing so can 
hurt performance. This option can affect the number of subpartitions produced 
by upstream tasks, and an excessive number of subpartitions may degrade the 
performance of hash shuffle and of network transmission due to small packets.
+
+### Limitations
+
+- **ALL-EDGES-BLOCKING batch jobs only**: The first version of Adaptive Batch 
Scheduler only supports ALL-EDGES-BLOCKING batch jobs only.

Review comment:
       The first version of -> At the moment,

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+- 批作业用户可以从并行度调优中解脱出来
+- 根据数据量自动推导并行度可以更好地适应每天变化的数据量
+- SQL作业中的节点也可以分配不同的并行性
+
+### Usage
+
+使用 Adaptive Batch Scheduler 自动推导作业节点的并行度，需要：
+- 启用 Adaptive Batch Scheduler
+- 配置节点的并行度为 `-1`
+
+#### 启用 Adaptive Batch Scheduler
+为了启用 Adaptive Batch Scheduler, 你需要将 [`jobmanager.scheduler`]({{< ref 
"docs/deployment/config" >}}#jobmanager-scheduler) 配置为 `AdaptiveBatch`。除此之外，使用 
Adaptive Batch Scheduler 时，以下配置也可以选择性配置:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): 允许设置的并行度最小值
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): 允许设置的并行度最大值
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): 期望每个任务处理的数据量大小
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): source 
节点的默认并行度
+
+#### 配置节点的并行度为 `-1`
+Adaptive Batch Scheduler 只会为用户未指定并行度的作业节点（并行度为 `-1`）推导并行度。 
所以如果你想自动推导节点的并行度，需要进行以下配置：
+- 配置 `parallelism.default` 为 `-1`
+- 对于 SQL 作业，需要配置 `table.exec.resource.default-parallelism` 为 `-1`
+- 对于 DataStream 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
+
+### 性能调优
+
+1. 建议使用 `Sort Shuffle` 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这降低了遇到 "Insufficient number of network buffers" 
错误的可能性。

Review comment:
       这降低了 -> 这样可以降低 (that is, "this reduced" -> "this can reduce")

##########
File path: docs/content/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -0,0 +1,63 @@
+---
+title: Adaptive Batch Scheduler
+weight: 5
+type: docs
+
+---
+
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide the parallelism of job 
vertices for batch jobs. If no parallelism is set for a job vertex, the 
scheduler decides it according to the size of the datasets the vertex 
consumes. This brings several benefits:
+- Batch job users are relieved from parallelism tuning
+- Parallelism is tuned automatically per vertex, so it can better fit 
consumed datasets whose volume varies from day to day
+- Vertices of SQL batch jobs can be assigned different parallelisms, 
which are tuned automatically
+
+### Usage
+
+To automatically decide parallelisms for job vertices through Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of job vertices to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to set the 
[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`. In addition, there are several 
optional config options that might need adjustment when using Adaptive Batch 
Scheduler:
+- [`jobmanager.scheduler.adaptive-batch.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.scheduler.adaptive-batch.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.scheduler.adaptive-batch.source-parallelism.default`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-scheduler-adaptive-batch-source-parallelism-default): The 
default parallelism of source vertices
+
+#### Set the parallelism of job vertices to `-1`
+Adaptive Batch Scheduler will only decide parallelism for job vertices whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of vertices to be decided automatically, you should configure as 
follows:
+- Set `parallelism.default` to `-1`.
+- Set `table.exec.resource.default-parallelism` to `-1` in SQL jobs.
+- Don't call `setParallelism()` for operators in DataStream jobs.
+
+### Performance tuning
+
+1. It's recommended to use `Sort Shuffle` and set 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to 
`0`. This can decouple the network memory consumption from parallelism, so for 
large scale jobs, the possibility of "Insufficient number of network buffers" 
error can be decreased.

Review comment:
       > the possibility of "Insufficient number of network buffers" error can be decreased
   
   -> "Insufficient number of network buffers" errors are less likely to happen

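To illustrate why setting `taskmanager.network.memory.buffers-per-channel` to `0` can decouple required network memory from parallelism, here is a small sketch. It rests on a simplified model that is an assumption here, not Flink's exact accounting: an input gate needs its exclusive buffers per channel plus a fixed pool of floating buffers. The function name `input_gate_buffers` and the values `2` and `8` are chosen only for illustration.

```python
def input_gate_buffers(num_channels: int,
                       buffers_per_channel: int,
                       floating_buffers_per_gate: int) -> int:
    """Simplified model (an assumption): exclusive buffers scale with the
    number of input channels, floating buffers are a fixed amount per gate."""
    return num_channels * buffers_per_channel + floating_buffers_per_gate


# Compare a small and a large number of upstream channels.
for channels in (100, 10_000):
    with_exclusive = input_gate_buffers(channels, 2, 8)
    without_exclusive = input_gate_buffers(channels, 0, 8)
    print(channels, with_exclusive, without_exclusive)
```

In this model the requirement grows linearly with the channel count when exclusive buffers are used, and stays constant once they are set to `0`.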



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
