PatrickRen commented on a change in pull request #15767:
URL: https://github.com/apache/flink/pull/15767#discussion_r627879221



##########
File path: docs/content.zh/docs/dev/datastream/execution_mode.md
##########
@@ -24,125 +24,70 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Execution Mode (Batch/Streaming)
+# 执行模式(流/批)
+DataStream API 支持不同的运行时执行模式,你可以根据你的用例需要和作业特点进行选择。
 
-The DataStream API supports different runtime execution modes from which you
-can choose depending on the requirements of your use case and the
-characteristics of your job.
+DataStream API 有一种“经典”的执行行为,我们称之为`流(STREAMING)`执行模式。这种模式适用于需要连续增量处理,而且预计无限期保持在线的无边界作业。
 
-There is the "classic" execution behavior of the DataStream API, which we call
-`STREAMING` execution mode. This should be used for unbounded jobs that require
-continuous incremental processing and are expected to stay online indefinitely.
+此外,还有一种批式执行模式,我们称之为`批(BATCH)`执行模式。这种执行作业的方式更容易让人联想到批处理框架,比如 MapReduce。这种执行模式适用于有一个已知的固定输入,而且不会连续运行的有边界作业。
 
-Additionally, there is a batch-style execution mode that we call `BATCH`
-execution mode. This executes jobs in a way that is more reminiscent of batch
-processing frameworks such as MapReduce. This should be used for bounded jobs
-for which you have a known fixed input and which do not run continuously.
+Apache Flink 对流处理和批处理采取统一的处理方式,这意味着无论配置何种执行模式,在有边界输入上执行的 DataStream 应用都会产生相同的*最终*结果。重要的是要注意*最终*在这里意味着什么:一个在`流`模式下执行的作业可能会产生增量更新(想想数据库中的更新插入(upsert)操作),而`批`作业只在最后产生一个最终结果。尽管计算方式不同,只要解读得当,最终结果会是相同的。
 
-Apache Flink's unified approach to stream and batch processing means that a
-DataStream application executed over bounded input will produce the same
-*final* results regardless of the configured execution mode. It is important to
-note what *final* means here: a job executing in `STREAMING` mode might produce
-incremental updates (think upserts in a database) while a `BATCH` job would
-only produce one final result at the end. The final result will be the same if
-interpreted correctly but the way to get there can be different.
+通过启用`批`执行模式,我们允许 Flink 应用一些只有在确知输入有边界时才能使用的额外优化。例如,可以采用不同的关联(join)/聚合(aggregation)策略,以及一种可以实现更高效的任务调度和故障恢复行为的不同 shuffle 实现。下面我们将介绍执行行为的一些细节。
 
-By enabling `BATCH` execution, we allow Flink to apply additional optimizations
-that we can only do when we know that our input is bounded. For example,
-different join/aggregation strategies can be used, in addition to a different
-shuffle implementation that allows more efficient task scheduling and failure
-recovery behavior. We will go into some of the details of the execution
-behavior below.
+## 什么时候可以/应该使用批执行模式?
 
-## When can/should I use BATCH execution mode?
+`批`执行模式只能用于 _有边界_ 的作业/Flink 程序。有界性是数据源的一个属性,它告诉我们该数据源的所有输入在执行前是否已经全部已知,还是会有新的数据(可能是无限地)不断出现。而对一个作业来说,如果它的所有源都是有边界的,则它就是有边界的,否则就是无边界的。
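
To make boundedness concrete, here is a small illustrative sketch (not from the original doc; the hostname and port are made up) of a bounded source versus an unbounded one:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Bounded source: all input is known before execution starts.
DataStream<Long> bounded = env.fromSequence(1, 1000);

// Unbounded source: new records may keep arriving indefinitely.
DataStream<String> unbounded = env.socketTextStream("localhost", 9999);

// A job is bounded only if *all* of its sources are bounded.
```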
 
-The `BATCH` execution mode can only be used for Jobs/Flink Programs that are
-_bounded_. Boundedness is a property of a data source that tells us whether all
-the input coming from that source is known before execution or whether new data
-will show up, potentially indefinitely. A job, in turn, is bounded if all its
-sources are bounded, and unbounded otherwise.
+而`流`执行模式,既可用于有边界作业,也可用于无边界作业。
 
-`STREAMING` execution mode, on the other hand, can be used for both bounded and
-unbounded jobs.
+一般来说,在你的程序是有边界的时候,你应该使用`批`执行模式,因为这样做会更高效。当你的程序是无边界的时候,你必须使用`流`执行模式,因为只有这种模式足够通用,能够处理连续的数据流。
 
-As a rule of thumb, you should be using `BATCH` execution mode when your 
program
-is bounded because this will be more efficient. You have to use `STREAMING`
-execution mode when your program is unbounded because only this mode is general
-enough to be able to deal with continuous data streams.
+一个明显的例外是,当你想用一个有边界作业来引导(bootstrap)某些作业状态,然后将该状态用于之后的无边界作业的时候。例如,用`流`模式运行一个有边界作业,取一个 savepoint,然后在一个无边界作业上恢复这个 savepoint。这是一个非常特殊的用例,当我们允许将 savepoint 作为`批`执行作业的附加输出时,这个用例可能很快就会过时。
 
-One obvious outlier is when you want to use a bounded job to bootstrap some job
-state that you then want to use in an unbounded job. For example, by running a
-bounded job using `STREAMING` mode, taking a savepoint, and then restoring that
-savepoint on an unbounded job. This is a very specific use case and one that
-might soon become obsolete when we allow producing a savepoint as additional
-output of a `BATCH` execution job.
+另一种可能用`流`模式运行有边界作业的情况是,为最终将运行在无边界数据源上的代码编写测试。对于测试来说,在这些情况下使用有边界数据源可能更自然。
 
-Another case where you might run a bounded job using `STREAMING` mode is when
-writing tests for code that will eventually run with unbounded sources. For
-testing it can be more natural to use a bounded source in those cases.
+## 配置批执行模式
 
-## Configuring BATCH execution mode
+执行模式可以通过 `execute.runtim-mode` 设置来配置。有三种可选的值:

Review comment:
       execute.runtim"e" -mode

##########
File path: docs/content.zh/docs/dev/datastream/execution_mode.md
##########
@@ -161,235 +106,125 @@ source.name("source")
        .sinkTo(...).name("sink");
 ```
 
-Operations that imply a 1-to-1 connection pattern between operations, such as
-`map()`, `flatMap()`, or `filter()` can just forward data straight to the next
-operation, which allows these operations to be chained together. This means
-that Flink would not normally insert a network shuffle between them.
+包含 1-to-1 连接模式的操作,比如 `map()`、`flatMap()` 或 `filter()`,可以直接将数据转发到下一个操作,这使得这些操作可以被链接在一起。这意味着 Flink 通常不会在它们之间插入网络 shuffle。
 
-Operation such as `keyBy()` or `rebalance()` on the other hand require data to
-be shuffled between different parallel instances of tasks. This induces a
-network shuffle.
+而像 `keyBy()` 或者 `rebalance()` 这样需要在不同的任务并行实例之间进行数据 shuffle 的操作,就会引起网络 shuffle。
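
To make the chaining/shuffle distinction concrete, a small hypothetical pipeline (my own sketch, not part of the original doc):

```java
env.fromSequence(1, 100)
        .map(n -> n * 2)           // 1-to-1: chained with the source
        .filter(n -> n % 3 == 0)   // 1-to-1: still no network shuffle
        .keyBy(n -> n % 10)        // a network shuffle is required here
        .reduce(Long::sum)         // runs in a separate task after the shuffle
        .print();
```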
 
-For the above example Flink would group operations together as tasks like this:
+对于上面的例子,Flink 会将操作分组为这些任务:
 
-- Task1: `source`, `map1`, and `map2`
-- Task2: `map3`, `map4`
-- Task3: `map5`, `map6`, and `sink`
+- 任务1: `source`、`map1` 和 `map2`
+- 任务2: `map3` 和 `map4`
+- 任务3: `map5`、`map6` 和 `sink`
 
-And we have a network shuffle between Tasks 1 and 2, and also Tasks 2 and 3.
-This is a visual representation of that job:
+我们在任务1到任务2、任务2到任务3之间各有一次网络 shuffle。这是该作业的可视化表示:
 
 {{< img src="/fig/datastream-example-job-graph.svg" alt="Example Job Graph" >}}
 
-#### STREAMING Execution Mode
-
-In `STREAMING` execution mode, all tasks need to be online/running all the
-time.  This allows Flink to immediately process new records through the whole
-pipeline, which we need for continuous and low-latency stream processing. This
-also means that the TaskManagers that are allotted to a job need to have enough
-resources to run all the tasks at the same time.
-
-Network shuffles are _pipelined_, meaning that records are immediately sent to
-downstream tasks, with some buffering on the network layer. Again, this is
-required because when processing a continuous stream of data there are no
-natural points (in time) where data could be materialized between tasks (or
-pipelines of tasks). This contrasts with `BATCH` execution mode where
-intermediate results can be materialized, as explained below.
-
-#### BATCH Execution Mode
-
-In `BATCH` execution mode, the tasks of a job can be separated into stages that
-can be executed one after another. We can do this because the input is bounded
-and Flink can therefore fully process one stage of the pipeline before moving
-on to the next. In the above example the job would have three stages that
-correspond to the three tasks that are separated by the shuffle barriers.
-
-Instead of sending records immediately to downstream tasks, as explained above
-for `STREAMING` mode, processing in stages requires Flink to materialize
-intermediate results of tasks to some non-ephemeral storage which allows
-downstream tasks to read them after upstream tasks have already gone off line.
-This will increase the latency of processing but comes with other interesting
-properties. For one, this allows Flink to backtrack to the latest available
-results when a failure happens instead of restarting the whole job. Another
-side effect is that `BATCH` jobs can execute on fewer resources (in terms of
-available slots at TaskManagers) because the system can execute tasks
-sequentially one after the other.
-
-TaskManagers will keep intermediate results at least as long as downstream
-tasks have not consumed them. (Technically, they will be kept until the
-consuming *pipelined regions* have produced their output.) After
-that, they will be kept for as long as space allows in order to allow the
-aforementioned backtracking to earlier results in case of a failure.
+#### 流执行模式
+
+在`流`执行模式下,所有任务需要一直在线/运行。这使得 Flink 可以通过整个管道立即处理新的记录,从而实现我们需要的连续、低延迟的流处理。这同样意味着分配给某个作业的 TaskManagers 需要有足够的资源来同时运行所有的任务。
+
+网络 shuffle 是 _流水线式_ 的,这意味着记录会立即发送给下游任务,并在网络层上做一些缓冲。同样,这也是必需的,因为在处理连续的数据流时,任务(或任务管道)之间不存在可以将数据实体化的自然(时间)点。这与`批`执行模式形成了对比,在`批`执行模式下,中间结果可以被实体化,如下所述。
+
+#### 批执行模式
+
+在`批`执行模式下,一个作业的任务可以被分成若干个可以一个接一个执行的阶段。我们之所以能做到这一点,是因为输入是有边界的,因此 Flink 可以在进入下一个阶段之前完全处理完管道中的一个阶段。在上面的例子中,作业会有三个阶段,对应着被 shuffle 界线分开的三个任务。
+
+不同于上文所述`流`模式下立即向下游任务发送记录,分阶段处理要求 Flink 将任务的中间结果实体化到某种非临时存储中,让下游任务在上游任务已经下线后再读取。这将增加处理的延迟,但也会带来其他有趣的特性。其一,这允许 Flink 在故障发生时回溯到最近的可用结果,而不是重新启动整个作业。其二,`批`作业可以在更少的资源上执行(就 TaskManagers 的可用槽而言),因为系统可以一个接一个地顺序执行任务。
+
+只要下游任务还没有消费中间结果,TaskManagers 就会一直保留这些结果(从技术上讲,它们会被保留到消费它们的*流水线区域(pipelined regions)*产生了自己的输出为止)。在这之后,只要空间允许,它们就会被继续保留,以便在发生故障时可以回溯到前面提到的早期结果。
 
 ### State Backends / State
 
-In `STREAMING` mode, Flink uses a [StateBackend]({{< ref 
"docs/dev/datastream/fault-tolerance/state_backends" >}}) to control how state 
is stored and how
-checkpointing works.
+在`流`模式下,Flink 使用 [StateBackend]({{< ref "docs/dev/datastream/fault-tolerance/state_backends" >}}) 来控制状态的存储方式和检查点的工作方式。
 
-In `BATCH` mode, the configured state backend is ignored. Instead, the input of
-a keyed operation is grouped by key (using sorting) and then we process all
-records of a key in turn. This allows keeping only the state of only one key at
-the same time. State for a given key will be discarded when moving on to the
-next key.
+在`批`模式下,配置的 state backend 会被忽略。取而代之的是,keyed 操作的输入会按键分组(使用排序),然后我们依次处理一个键的所有记录。这样就可以在同一时间只保留一个键的状态。当处理进入下一个键时,前一个键的状态会被丢弃。
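
A sketch of what this means for user code (a hypothetical counting function; the names are mine): the same `ValueState` code runs in both modes, but in `BATCH` mode only one key's state is ever alive:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class CountPerKey extends KeyedProcessFunction<String, String, Long> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Types.LONG));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        long updated = current == null ? 1L : current + 1L;
        count.update(updated);
        // In BATCH mode the input is grouped by key (via sorting), so this
        // state only ever holds the current key's counter; it is discarded
        // when processing moves on to the next key.
        out.collect(updated);
    }
}
```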
 
-See [FLIP-140](https://cwiki.apache.org/confluence/x/kDh4CQ) for background
-information on this.
+关于这方面的背景信息,请参见 [FLIP-140](https://cwiki.apache.org/confluence/x/kDh4CQ)。
 
-### Order of Processing
+### 处理顺序
 
-The order in which records are processed in operators or user-defined 
functions (UDFs) can differ between `BATCH` and `STREAMING` execution.
+在`批`执行和`流`执行中,算子或用户自定义函数(UDFs)处理记录的顺序可能不同。
 
-In `STREAMING` mode, user-defined functions should not make any assumptions 
about incoming records' order.
-Data is processed as soon as it arrives.
+在`流`模式下,用户自定义函数不应该对传入记录的顺序做任何假设。数据一到达就被处理。
 
-In `BATCH` execution mode, there are some operations where Flink guarantees 
order. 
-The ordering can be a side effect of the particular task scheduling,
-network shuffle, and state backend (see above), or a conscious choice by the 
system.
+在`批`执行模式下,有一些操作 Flink 会保证顺序。这种顺序可能是特定的任务调度、网络 shuffle 和 state backend(见上文)带来的副作用,也可能是系统有意识的选择。
 
-There are three general types of input that we can differentiate:
+我们可以将常见输入类型分为三类:
 
-- _broadcast input_: input from a broadcast stream (see also [Broadcast
-  State]({{< ref "docs/dev/datastream/fault-tolerance/broadcast_state" >}}))
-- _regular input_: input that is neither broadcast nor keyed
-- _keyed input_: input from a `KeyedStream`
 
-Functions, or Operators, that consume multiple input types will process them 
in the following order:
+- _广播输入(broadcast input)_: 来自广播流的输入(参见[广播状态(Broadcast State)]({{< ref "docs/dev/datastream/fault-tolerance/broadcast_state" >}}))
+- _常规输入(regular input)_: 既不是广播也不是 keyed 的输入
+- _keyed 输入(keyed input)_: 来自 `KeyedStream` 的输入
 
-- broadcast inputs are processed first
-- regular inputs are processed second
-- keyed inputs are processed last
+消费多种类型输入的函数或算子将按以下顺序处理它们:
 
-For functions that consume from multiple regular or broadcast inputs &mdash; 
such as a `CoProcessFunction` &mdash; Flink has the right to process data from 
any input of that type in any order.
+- 最先处理广播输入
+- 其次处理常规输入
+- 最后处理 keyed 输入
 
-For functions that consume from multiple keyed inputs &mdash; such as a 
`KeyedCoProcessFunction` &mdash; Flink processes all records for a single key 
from all keyed inputs before moving on to the next. 
+对于从多个常规或广播输入进行消费的函数 &mdash; 比如 `CoProcessFunction` &mdash; Flink 有权从任一输入以任意顺序处理数据。
 
+对于从多个 keyed 输入进行消费的函数 &mdash; 比如 `KeyedCoProcessFunction` &mdash; Flink 会先处理完一个键在所有 keyed 输入中的全部记录,然后再处理下一个键。
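
A brief sketch (hypothetical types and names) of a two-input keyed function, to illustrate the guarantee:

```java
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class PerKeyMerge extends KeyedCoProcessFunction<String, String, Integer, String> {

    @Override
    public void processElement1(String left, Context ctx, Collector<String> out) {
        // In BATCH mode, every record for ctx.getCurrentKey() from both
        // inputs is processed before any record of the next key.
        out.collect(ctx.getCurrentKey() + " <- left: " + left);
    }

    @Override
    public void processElement2(Integer right, Context ctx, Collector<String> out) {
        out.collect(ctx.getCurrentKey() + " <- right: " + right);
    }
}
```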
 
-### Event Time / Watermarks
 
-When it comes to supporting [event time]({{< ref 
"docs/dev/datastream/event-time/generating_watermarks" >}}), Flink’s
-streaming runtime builds on the pessimistic assumption that events may come
-out-of-order, _i.e._ an event with timestamp `t` may come after an event with
-timestamp `t+1`. Because of this, the system can never be sure that no more
-elements with timestamp `t < T` for a given timestamp `T` can come in the
-future. To amortise the impact of this out-of-orderness on the final result
-while making the system practical, in `STREAMING` mode, Flink uses a heuristic
-called [Watermarks]({{< ref "docs/concepts/time" 
>}}#event-time-and-watermarks).
-A watermark with timestamp `T` signals that no element with timestamp `t < T` 
will follow.
+### 事件时间/水印
 
-In `BATCH` mode, where the input dataset is known in advance, there is no need
-for such a heuristic as, at the very least, elements can be sorted by timestamp
-so that they are processed in temporal order. For readers familiar with
-streaming, in `BATCH` we can assume “perfect watermarks”.
+在支持[事件时间]({{< ref "docs/dev/datastream/event-time/generating_watermarks" >}})方面,Flink 的流运行时建立在一个悲观假设上,即事件可能是乱序到来的,也就是说,一个时间戳为 `t` 的事件可能在一个时间戳为 `t+1` 的事件之后出现。正因如此,对于给定的时间戳 `T`,系统永远无法确定未来不会再有时间戳 `t < T` 的元素到来。为了摊销这种乱序性对最终结果的影响,同时保持系统的实用性,在`流`模式下,Flink 使用了一种名为 [Watermarks]({{< ref "docs/concepts/time" >}}#event-time-and-watermarks) 的启发式方法。一个带有时间戳 `T` 的水印表示之后不会再有时间戳 `t < T` 的元素到来。
 
-Given the above, in `BATCH` mode, we only need a `MAX_WATERMARK` at the end of
-the input associated with each key, or at the end of input if the input stream
-is not keyed. Based on this scheme, all registered timers will fire at the *end
-of time* and user-defined `WatermarkAssigners` or `WatermarkGenerators` are
-ignored. Specifying a `WatermarkStrategy` is still important, though, because
-its `TimestampAssigner` will still be used to assign timestamps to records.
+在`批`模式下,输入的数据集是事先已知的,不需要这样的启发式方法,因为至少可以按照时间戳对元素进行排序,从而按照时间顺序进行处理。对于熟悉流处理的读者来说,在`批`模式中,我们可以假设“完美的 Watermark”。
 
-### Processing Time
+综上所述,在`批`模式下,我们只需要在与每个键关联的输入末尾有一个 `MAX_WATERMARK`;如果输入流没有按键分区,则只需在输入的末尾有一个 `MAX_WATERMARK`。基于这个方案,所有注册的定时器都会在*时间结束*时触发,而用户自定义的 `WatermarkAssigners` 或 `WatermarkGenerators` 会被忽略。不过,指定 `WatermarkStrategy` 仍然是重要的,因为它的 `TimestampAssigner` 仍会被用来给记录分配时间戳。
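
A minimal `WatermarkStrategy` sketch (`MyEvent` and its getter are hypothetical); in `BATCH` mode the out-of-orderness heuristic is ignored, but the timestamp assigner still runs:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

WatermarkStrategy<MyEvent> strategy =
        WatermarkStrategy
                .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                // Still used in BATCH mode to attach timestamps to records.
                .withTimestampAssigner((event, previous) -> event.getTimestampMillis());

stream.assignTimestampsAndWatermarks(strategy);
```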
 
-Processing Time is the wall-clock time on the machine that a record is
-processed, at the specific instance that the record is being processed. Based
-on this definition, we see that the results of a computation that is based on
-processing time are not reproducible. This is because the same record processed
-twice will have two different timestamps.
+### 处理时间
 
-Despite the above, using processing time in `STREAMING` mode can be useful. The
-reason has to do with the fact that streaming pipelines often ingest their
-unbounded input in *real time* so there is a correlation between event time and
-processing time. In addition, because of the above, in `STREAMING` mode `1h` in
-event time can often be almost `1h` in processing time, or wall-clock time. So
-using processing time can be used for early (incomplete) firings that give
-hints about the expected results.
+处理时间是指记录被处理的那一刻,处理该记录的机器上的挂钟时间。根据这个定义,我们可以看出基于处理时间的计算结果是不可重复的,因为同一条记录被处理两次,会有两个不同的时间戳。
 
-This correlation does not exist in the batch world where the input dataset is
-static and known in advance.  Given this, in `BATCH` mode we allow users to
-request the current processing time and register processing time timers, but,
-as in the case of Event Time, all the timers are going to fire at the end of
-the input.
+尽管如此,在`流`模式下使用处理时间还是很有用的。原因在于,流式管道通常是*实时地*摄取其无边界输入的,所以事件时间和处理时间之间存在相关性。此外,正因如此,在`流`模式下,事件时间中的 `1小时` 往往几乎就是处理时间(即挂钟时间)中的 `1小时`。所以,处理时间可以用于提前(不完全)触发,给出预期结果的提示。
 
-Conceptually, we can imagine that processing time does not advance during the
-execution of a job and we fast-forward to the *end of time* when the whole
-input is processed.
+这种相关性在批处理世界中并不存在,因为那里的输入数据集是静态的、预先已知的。鉴于此,在`批`模式下,我们允许用户请求当前的处理时间并注册处理时间定时器,但与事件时间的情况一样,所有的定时器都会在输入结束时触发。
 
-### Failure Recovery
+在概念上,我们可以想象,在作业执行过程中,处理时间不会推进,而当整个输入处理完毕后,我们会快进到*时间结束*。
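
To make the timer semantics concrete, a sketch of my own (names are made up) of registering a processing-time timer:

```java
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimerExample extends KeyedProcessFunction<String, String, String> {

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) {
        long now = ctx.timerService().currentProcessingTime();
        // STREAMING: fires roughly 10 s of wall-clock time later.
        // BATCH: processing time does not advance; the timer fires at the
        // "end of time", once the input for this key is exhausted.
        ctx.timerService().registerProcessingTimeTimer(now + 10_000);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        out.collect("timer fired for key " + ctx.getCurrentKey());
    }
}
```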
 
-In `STREAMING` execution mode, Flink uses checkpoints for failure recovery.
-Take a look at the [checkpointing documentation]({{< ref 
"docs/dev/datastream/fault-tolerance/checkpointing" >}}) for hands-on 
documentation about this and
-how to configure it. There is also a more introductory section about [fault
-tolerance via state snapshots]({{< ref "docs/learn-flink/fault_tolerance" >}}) 
that
-explains the concepts at a higher level.
+<a name="failure-recovery"></a> 
+### 故障恢复
 
-One of the characteristics of checkpointing for failure recovery is that Flink
-will restart all the running tasks from a checkpoint in case of a failure. This
-can be more costly than what we have to do in `BATCH` mode (as explained
-below), which is one of the reasons that you should use `BATCH` execution mode
-if your job allows it.
+在`流`执行模式下,Flink 使用 checkpoints 进行故障恢复。请参阅 [checkpointing 文档]({{< ref "docs/dev/datastream/fault-tolerance/checkpointing" >}}),了解如何上手使用和配置它。还有一个关于[通过状态快照实现容错]({{< ref "docs/learn-flink/fault_tolerance" >}})的入门章节,从更高的层面解释了这些概念。
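
As a quick illustration (the interval value is an arbitrary example), checkpointing for `STREAMING` execution is enabled on the environment:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Take a checkpoint every 10 seconds; only meaningful in STREAMING execution.
env.enableCheckpointing(10_000);
```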
 
-In `BATCH` execution mode, Flink will try and backtrack to previous processing
-stages for which intermediate results are still available. Potentially, only
-the tasks that failed (or their predecessors in the graph) will have to be
-restarted, which can improve processing efficiency and overall processing time
-of the job compared to restarting all tasks from a checkpoint.
+Checkpointing 用于故障恢复的特点之一是,在发生故障时,Flink 会从 checkpoint 重新启动所有正在运行的任务。这可能比我们在`批`模式下所要做的代价更高(如下文所述),这也是如果你的作业允许的话就应该使用`批`执行模式的原因之一。
 
-## Important Considerations
+在`批`执行模式下,Flink 会尝试回溯到中间结果仍然可用的先前处理阶段。这样,可能只有失败的任务(或它们在作业图中的前驱)需要重新启动,与从 checkpoint 重新启动所有任务相比,这可以提高作业的处理效率,缩短整体处理时间。
 
-Compared to classic `STREAMING` execution mode, in `BATCH` mode some things
-might not work as expected. Some features will work slightly differently while
-others are not supported.
+## 重要的考虑因素
 
-Behavior Change in BATCH mode:
+与经典的`流`执行模式相比,在`批`模式下,有些东西可能无法按预期工作。一些功能的工作方式会略有不同,而另一些则不被支持。
 
-* "Rolling" operations such as [reduce()]({{< ref 
"docs/dev/datastream/operators/overview" >}}#reduce) 
-  or [sum()]({{< ref "docs/dev/datastream/operators/overview" >}}#aggregations)
-  emit an incremental update for every new record that arrives in `STREAMING`
-  mode. In `BATCH` mode, these operations are not "rolling". They emit only the
-  final result.
+`批`模式下的行为变化:
 
+* “滚动”操作,如 [reduce()]({{< ref "docs/dev/datastream/operators/overview" >}}#reduce) 或 [sum()]({{< ref "docs/dev/datastream/operators/overview" >}}#aggregations),在`流`模式下会为每条新到达的记录发出增量更新。而在`批`模式下,这些操作不是“滚动”的,它们只发出最终结果,如下面的示例所示。
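
A sketch of the difference, assuming `input` is a `DataStream<Tuple2<String, Integer>>` containing ("a",1), ("a",2), ("a",3):

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;

DataStream<Tuple2<String, Integer>> sums =
        input.keyBy(value -> value.f0)
             .sum(1);

// STREAMING mode emits the rolling updates: (a,1), (a,3), (a,6)
// BATCH mode emits only the final result:   (a,6)
sums.print();
```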
 
-Unsupported in BATCH mode:
+`批`模式下不支持的:
 
-* [Checkpointing]({{< ref "docs/concepts/stateful-stream-processing" 
>}}#stateful-stream-processing) 
-  and any operations that depend on checkpointing do not work.
-* [Iterations]({{< ref "docs/dev/datastream/operators/overview" >}}#iterate)
+* [Checkpointing]({{< ref "docs/concepts/stateful-stream-processing" >}}#stateful-stream-processing) 和任何依赖于 checkpointing 的操作都不支持。
+* [迭代(Iterations)]({{< ref "docs/dev/datastream/operators/overview" >}}#iterate)
 
-Custom operators should be implemented with care, otherwise they might behave
-improperly. See also additional explanations below for more details.
+自定义算子应谨慎实现,否则可能会有不恰当的行为。更多细节请参见下面的补充说明。
 
 ### Checkpointing
 
-As explained [above](#failure-recovery), failure recovery for batch programs
-does not use checkpointing.
+如[上文所述](#failure-recovery),批处理程序的故障恢复不使用 checkpoint。
 
-It is important to remember that because there are no checkpoints, certain
-features such as {{< javadoc 
file="org/apache/flink/api/common/state/CheckpointListener.html" 
name="CheckpointListener">}}
-and, as a result,  Kafka's [EXACTLY_ONCE]({{< ref 
"docs/connectors/datastream/kafka" >}}#kafka-producers-and-fault-tolerance) 
mode or `StreamingFileSink`'s
-[OnCheckpointRollingPolicy]({{< ref 
"docs/connectors/datastream/streamfile_sink" >}}#rolling-policy)
-won't work. If you need a transactional sink that works in
-`BATCH` mode make sure it uses the Unified Sink API as proposed in
-[FLIP-143](https://cwiki.apache.org/confluence/x/KEJ4CQ).
+重要的是要记住,因为没有 checkpoints,某些功能,如 {{< javadoc file="org/apache/flink/api/common/state/CheckpointListener.html" name="CheckpointListener">}},以及依赖于它的功能,例如 Kafka 的 [精确一次(EXACTLY_ONCE)]({{< ref "docs/connectors/datastream/kafka" >}}#kafka-producers-and-fault-tolerance) 模式或 `StreamingFileSink` 的 [OnCheckpointRollingPolicy]({{< ref "docs/connectors/datastream/streamfile_sink" >}}#rolling-policy),都将无法工作。
+如果你需要一个在`批`模式下工作的事务型 sink,请确保它使用 [FLIP-143](https://cwiki.apache.org/confluence/x/KEJ4CQ) 中提出的统一 Sink API。
 
-You can still use all the [state primitives]({{< ref 
"docs/dev/datastream/fault-tolerance/state" >}}#working-with-state),
-it's just that the mechanism used for failure recovery will be different.
+你仍然可以使用所有的[状态原语(state primitives)]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}#working-with-state),只是用于故障恢复的机制会有所不同。
 
-### Writing Custom Operators
+### 编写自定义算子
 
 {{< hint info >}}
-**Note:** Custom operators are an advanced usage pattern of Apache Flink. For 
most
-use-cases, consider using a (keyed-)process function instead.
-{{< /hint >}}
+**注意:** 自定义算子是 Apache Flink 的一种高级使用模式。对于大多数的使用情况,可以考虑使用(keyed-)过程函数来代替。

Review comment:
       Suggestion: use “处理函数” (process function) instead of “过程函数”: “可以考虑使用(keyed-)处理函数来代替”.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

