[GitHub] [flink] klion26 commented on a change in pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese

GitBox Mon, 22 Jun 2020 23:05:03 -0700


klion26 commented on a change in pull request #12727:
URL: https://github.com/apache/flink/pull/12727#discussion_r443971994




##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。

Review comment:
       `并且 keyed state 的每一项的工作副本` 这个感觉有点奇怪，能否再优化下呢

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。

Review comment:
       这里能够按照中文的描述优化下吗，比如说 `Flink 有两种 statebackend，一种是，一种是“ 等

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。

Review comment:
       `Flink 可以恢复应用程序的完整状态并恢复处理` 这句话感觉读起来有一点拗口。能否再优化下呢？

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。

Review comment:
       `Operator state 对于需要它的机器节点来说也是本地的` 这个改成 `Operator State 也存在机器节点本地` 
会更好一些吗？
   
   `连续快照`  -> `快照` 就行了。

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。

Review comment:
       建议这行和上面一行合并，否则 “堆内存中。”和 “这种基于”之间会有空格。
   你可以执行 `./docs/docker/run.sh` 然后`./build.sh -p` 之后打开 `localhost:4000` 
找到翻译的页面查看效果

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。

Review comment:
       这里不是说 基于堆的 state backend 有两种方式，而是说两种 `state backend` （也就是后面说的）会使用基于堆的 
state

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">名称</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">状态备份</th>
+      <th class="text-left">快照</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">本地磁盘（tmp dir）</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量 / 增量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>支持大于内存大小的状态</li>
+          <li>经验法则：比基于堆的后端慢10倍</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>快速，需要大的堆内存</li>
+          <li>受限制于 GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state 
(locally)</li>
+          <li>适用于小状态（本地）的测试和实验</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
 
-When working with state kept in a heap-based state backend, accesses and 
updates involve reading and
-writing objects on the heap. But for objects kept in the 
`RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But 
the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note 
also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a 
significant benefit for
-applications with large amounts of slowly changing state.
+当使用基于堆的 state backend 保存状态时，访问和更新涉及在堆上读写对象。
+但是对于保存在 `RocksDBStateBackend` 中的对象，访问和更新涉及序列化和反序列化，所以会有更加巨大的开销。

Review comment:
       `更加巨大` -> `更大` 会好一些吗

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">名称</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">状态备份</th>
+      <th class="text-left">快照</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">本地磁盘（tmp dir）</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量 / 增量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>支持大于内存大小的状态</li>
+          <li>经验法则：比基于堆的后端慢10倍</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>快速，需要大的堆内存</li>
+          <li>受限制于 GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state 
(locally)</li>
+          <li>适用于小状态（本地）的测试和实验</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
 
-When working with state kept in a heap-based state backend, accesses and 
updates involve reading and
-writing objects on the heap. But for objects kept in the 
`RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But 
the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note 
also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a 
significant benefit for
-applications with large amounts of slowly changing state.
+当使用基于堆的 state backend 保存状态时，访问和更新涉及在堆上读写对象。
+但是对于保存在 `RocksDBStateBackend` 中的对象，访问和更新涉及序列化和反序列化，所以会有更加巨大的开销。
+但 RocksDB 的状态量仅受本地磁盘大小的限制。
+还要注意，只有 `RocksDBStateBackend` 能够进行增量快照，这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。
 
-All of these state backends are able to do asynchronous snapshotting, meaning 
that they can take a
-snapshot without impeding the ongoing stream processing.
+所有这些 state backends 都能够异步执行快照，这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。
 
 {% top %}
 
-## State Snapshots
+## 状态快照
 
-### Definitions
+### 定义
 
-* _Snapshot_ -- a generic term referring to a global, consistent image of the 
state of a Flink job.
-  A snapshot includes a pointer into each of the data sources (e.g., an offset 
into a file or Kafka
-  partition), as well as a copy of the state from each of the job's stateful 
operators that resulted
-  from having processed all of the events up to those positions in the sources.
-* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of 
being able to recover
-  from faults. Checkpoints can be incremental, and are optimized for being 
restored quickly.
-* _Externalized Checkpoint_ -- normally checkpoints are not intended to be 
manipulated by users.
-  Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) 
while a job is
-  running, and deletes them when a job is cancelled. But you can configure 
them to be retained
-  instead, in which case you can manually resume from them.
-* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for 
some operational
-  purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints 
are always complete,
-  and are optimized for operational flexibility.
+* _快照_ -- 通用术语，指的是 Flink 作业状态的全局一致镜像。
+快照包括指向每个数据源的指针（例如，到文件或 Kafka 分区的偏移量）以及每个作业的有状态运算符的状态副本，该状态副本包含了处理所有事件直至 
sources 中的那些位置。
+  
+* _Checkpoint_ -- 一种由 Flink 自动执行的快照，其目的是能够从故障中恢复。
+Checkpoints 可以是增量的，并为快速恢复进行了优化。
 
-### How does State Snapshotting Work?
+* _外部化的 Checkpoint_ -- 通常 checkpoints 不会被用户操纵。
+Flink 只保留作业运行时的最近的 _n_ 个 checkpoints（_n_ 可配置），并在作业取消时删除它们。
+但你可以将它们配置为保留，在这种情况下，你可以手动从中恢复。
 
-Flink uses a variant of the [Chandy-Lamport
-algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) known as 
_asynchronous barrier
-snapshotting_.
+* _Savepoint_ -- 用户出于某种操作目的（例如有状态的重新部署/升级/缩放操作）手动（或 API 调用）触发的快照。Savepoints 
始终是完整的，并且已针对操作灵活性进行了优化。
 
-When a task manager is instructed by the checkpoint coordinator (part of the 
job manager) to begin a
-checkpoint, it has all of the sources record their offsets and insert numbered 
_checkpoint barriers_
-into their streams. These barriers flow through the job graph, indicating the 
part of the stream
-before and after each checkpoint. 
+### 状态快照如何工作？
+
+Flink 使用 [Chandy-Lamport 
algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) 
算法的一种变体，称为异步屏障快照（_asynchronous barrier snapshotting_）。
+
+当 checkpoint coordinator（job manager 的一部分）指示 task manager 开始 checkpoint 
时，它会让所有 sources 记录它们的偏移量，并将编号的 _checkpoint barriers_ 
插入到它们的流中。这些屏障（barriers）流经作业图（job graph），标注每个 checkpoint 前后的流部分。
 
 <img src="{{ site.baseurl }}/fig/stream_barriers.svg" alt="Checkpoint barriers 
are inserted into the streams" class="center" width="80%" />

Review comment:
       这里在中文的翻译中，把链接改为 `{% link fig/stream_barriers.svg %}` 
吧，现在[社区](http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Reminder-Prefer-link-tag-in-documentation-td42362.html)
 建议把链接的形式进行修改。
   其他地方也修改一下

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">名称</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">状态备份</th>
+      <th class="text-left">快照</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">本地磁盘（tmp dir）</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量 / 增量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>支持大于内存大小的状态</li>
+          <li>经验法则：比基于堆的后端慢10倍</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>快速，需要大的堆内存</li>
+          <li>受限制于 GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state 
(locally)</li>
+          <li>适用于小状态（本地）的测试和实验</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
 
-When working with state kept in a heap-based state backend, accesses and 
updates involve reading and
-writing objects on the heap. But for objects kept in the 
`RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But 
the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note 
also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a 
significant benefit for
-applications with large amounts of slowly changing state.
+当使用基于堆的 state backend 保存状态时，访问和更新涉及在堆上读写对象。
+但是对于保存在 `RocksDBStateBackend` 中的对象，访问和更新涉及序列化和反序列化，所以会有更加巨大的开销。
+但 RocksDB 的状态量仅受本地磁盘大小的限制。
+还要注意，只有 `RocksDBStateBackend` 能够进行增量快照，这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。
 
-All of these state backends are able to do asynchronous snapshotting, meaning 
that they can take a
-snapshot without impeding the ongoing stream processing.
+所有这些 state backends 都能够异步执行快照，这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。
 
 {% top %}
 
-## State Snapshots
+## 状态快照
 
-### Definitions
+### 定义
 
-* _Snapshot_ -- a generic term referring to a global, consistent image of the 
state of a Flink job.
-  A snapshot includes a pointer into each of the data sources (e.g., an offset 
into a file or Kafka
-  partition), as well as a copy of the state from each of the job's stateful 
operators that resulted
-  from having processed all of the events up to those positions in the sources.
-* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of 
being able to recover
-  from faults. Checkpoints can be incremental, and are optimized for being 
restored quickly.
-* _Externalized Checkpoint_ -- normally checkpoints are not intended to be 
manipulated by users.
-  Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) 
while a job is
-  running, and deletes them when a job is cancelled. But you can configure 
them to be retained
-  instead, in which case you can manually resume from them.
-* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for 
some operational
-  purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints 
are always complete,
-  and are optimized for operational flexibility.
+* _快照_ -- 通用术语，指的是 Flink 作业状态的全局一致镜像。
+快照包括指向每个数据源的指针（例如，到文件或 Kafka 分区的偏移量）以及每个作业的有状态运算符的状态副本，该状态副本包含了处理所有事件直至 
sources 中的那些位置。

Review comment:
       `该状态副本包含了处理所有事件直至 sources 中的那些位置。` 这里能否按照中文的描述进行些优化呢？读起来顺序有点像英文的语句

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">名称</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">状态备份</th>
+      <th class="text-left">快照</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">本地磁盘（tmp dir）</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量 / 增量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>支持大于内存大小的状态</li>
+          <li>经验法则：比基于堆的后端慢10倍</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>快速，需要大的堆内存</li>
+          <li>受限制于 GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state 
(locally)</li>
+          <li>适用于小状态（本地）的测试和实验</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
 
-When working with state kept in a heap-based state backend, accesses and 
updates involve reading and
-writing objects on the heap. But for objects kept in the 
`RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But 
the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note 
also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a 
significant benefit for
-applications with large amounts of slowly changing state.
+当使用基于堆的 state backend 保存状态时，访问和更新涉及在堆上读写对象。
+但是对于保存在 `RocksDBStateBackend` 中的对象，访问和更新涉及序列化和反序列化，所以会有更加巨大的开销。
+但 RocksDB 的状态量仅受本地磁盘大小的限制。
+还要注意，只有 `RocksDBStateBackend` 能够进行增量快照，这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。
 
-All of these state backends are able to do asynchronous snapshotting, meaning 
that they can take a
-snapshot without impeding the ongoing stream processing.
+所有这些 state backends 都能够异步执行快照，这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。
 
 {% top %}
 
-## State Snapshots
+## 状态快照
 
-### Definitions
+### 定义
 
-* _Snapshot_ -- a generic term referring to a global, consistent image of the 
state of a Flink job.
-  A snapshot includes a pointer into each of the data sources (e.g., an offset 
into a file or Kafka
-  partition), as well as a copy of the state from each of the job's stateful 
operators that resulted
-  from having processed all of the events up to those positions in the sources.
-* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of 
being able to recover
-  from faults. Checkpoints can be incremental, and are optimized for being 
restored quickly.
-* _Externalized Checkpoint_ -- normally checkpoints are not intended to be 
manipulated by users.
-  Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) 
while a job is
-  running, and deletes them when a job is cancelled. But you can configure 
them to be retained
-  instead, in which case you can manually resume from them.
-* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for 
some operational
-  purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints 
are always complete,
-  and are optimized for operational flexibility.
+* _快照_ -- 通用术语，指的是 Flink 作业状态的全局一致镜像。
+快照包括指向每个数据源的指针（例如，到文件或 Kafka 分区的偏移量）以及每个作业的有状态运算符的状态副本，该状态副本包含了处理所有事件直至 
sources 中的那些位置。
+  
+* _Checkpoint_ -- 一种由 Flink 自动执行的快照，其目的是能够从故障中恢复。
+Checkpoints 可以是增量的，并为快速恢复进行了优化。
 
-### How does State Snapshotting Work?
+* _外部化的 Checkpoint_ -- 通常 checkpoints 不会被用户操纵。
+Flink 只保留作业运行时的最近的 _n_ 个 checkpoints（_n_ 可配置），并在作业取消时删除它们。
+但你可以将它们配置为保留，在这种情况下，你可以手动从中恢复。
 
-Flink uses a variant of the [Chandy-Lamport
-algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) known as 
_asynchronous barrier
-snapshotting_.
+* _Savepoint_ -- 用户出于某种操作目的（例如有状态的重新部署/升级/缩放操作）手动（或 API 调用）触发的快照。Savepoints 
始终是完整的，并且已针对操作灵活性进行了优化。
 
-When a task manager is instructed by the checkpoint coordinator (part of the 
job manager) to begin a
-checkpoint, it has all of the sources record their offsets and insert numbered 
_checkpoint barriers_
-into their streams. These barriers flow through the job graph, indicating the 
part of the stream
-before and after each checkpoint. 
+### 状态快照如何工作？
+
+Flink 使用 [Chandy-Lamport 
algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) 
算法的一种变体，称为异步屏障快照（_asynchronous barrier snapshotting_）。
+
+当 checkpoint coordinator（job manager 的一部分）指示 task manager 开始 checkpoint 
时，它会让所有 sources 记录它们的偏移量，并将编号的 _checkpoint barriers_ 
插入到它们的流中。这些屏障（barriers）流经作业图（job graph），标注每个 checkpoint 前后的流部分。
 
 <img src="{{ site.baseurl }}/fig/stream_barriers.svg" alt="Checkpoint barriers 
are inserted into the streams" class="center" width="80%" />
 
-Checkpoint _n_ will contain the state of each operator that resulted from 
having consumed **every
-event before checkpoint barrier _n_, and none of the events after it**.
+Checkpoint _n_ 将包含每个 operator 的 state，这些 state 产生于消费 **在 checkpoint barrier 
_n_ 之前的所有事件，并且不会包含任何在此之后的事件**。

Review comment:
       `这些 state 产生于消费 **在 checkpoint barrier _n_ 之前的所有事件，并且不会包含任何在此之后的事件**。` 
这句话读起来感觉有点拗口，能否再优化下

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">名称</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">状态备份</th>
+      <th class="text-left">快照</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">本地磁盘（tmp dir）</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量 / 增量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>支持大于内存大小的状态</li>
+          <li>经验法则：比基于堆的后端慢10倍</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>快速，需要大的堆内存</li>
+          <li>受限制于 GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state 
(locally)</li>
+          <li>适用于小状态（本地）的测试和实验</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
 
-When working with state kept in a heap-based state backend, accesses and 
updates involve reading and
-writing objects on the heap. But for objects kept in the 
`RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But 
the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note 
also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a 
significant benefit for
-applications with large amounts of slowly changing state.
+当使用基于堆的 state backend 保存状态时，访问和更新涉及在堆上读写对象。
+但是对于保存在 `RocksDBStateBackend` 中的对象，访问和更新涉及序列化和反序列化，所以会有更加巨大的开销。
+但 RocksDB 的状态量仅受本地磁盘大小的限制。
+还要注意，只有 `RocksDBStateBackend` 能够进行增量快照，这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。
 
-All of these state backends are able to do asynchronous snapshotting, meaning 
that they can take a
-snapshot without impeding the ongoing stream processing.
+所有这些 state backends 都能够异步执行快照，这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。
 
 {% top %}
 
-## State Snapshots
+## 状态快照
 
-### Definitions
+### 定义
 
-* _Snapshot_ -- a generic term referring to a global, consistent image of the 
state of a Flink job.
-  A snapshot includes a pointer into each of the data sources (e.g., an offset 
into a file or Kafka
-  partition), as well as a copy of the state from each of the job's stateful 
operators that resulted
-  from having processed all of the events up to those positions in the sources.
-* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of 
being able to recover
-  from faults. Checkpoints can be incremental, and are optimized for being 
restored quickly.
-* _Externalized Checkpoint_ -- normally checkpoints are not intended to be 
manipulated by users.
-  Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) 
while a job is
-  running, and deletes them when a job is cancelled. But you can configure 
them to be retained
-  instead, in which case you can manually resume from them.
-* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for 
some operational
-  purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints 
are always complete,
-  and are optimized for operational flexibility.
+* _快照_ -- 通用术语，指的是 Flink 作业状态的全局一致镜像。
+快照包括指向每个数据源的指针（例如，到文件或 Kafka 分区的偏移量）以及每个作业的有状态运算符的状态副本，该状态副本包含了处理所有事件直至 
sources 中的那些位置。
+  
+* _Checkpoint_ -- 一种由 Flink 自动执行的快照，其目的是能够从故障中恢复。
+Checkpoints 可以是增量的，并为快速恢复进行了优化。
 
-### How does State Snapshotting Work?
+* _外部化的 Checkpoint_ -- 通常 checkpoints 不会被用户操纵。
+Flink 只保留作业运行时的最近的 _n_ 个 checkpoints（_n_ 可配置），并在作业取消时删除它们。
+但你可以将它们配置为保留，在这种情况下，你可以手动从中恢复。
 
-Flink uses a variant of the [Chandy-Lamport
-algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) known as 
_asynchronous barrier
-snapshotting_.
+* _Savepoint_ -- 用户出于某种操作目的（例如有状态的重新部署/升级/缩放操作）手动（或 API 调用）触发的快照。Savepoints 
始终是完整的，并且已针对操作灵活性进行了优化。
 
-When a task manager is instructed by the checkpoint coordinator (part of the 
job manager) to begin a
-checkpoint, it has all of the sources record their offsets and insert numbered 
_checkpoint barriers_
-into their streams. These barriers flow through the job graph, indicating the 
part of the stream
-before and after each checkpoint. 
+### 状态快照如何工作？
+
+Flink 使用 [Chandy-Lamport 
algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) 
算法的一种变体，称为异步屏障快照（_asynchronous barrier snapshotting_）。
+
+当 checkpoint coordinator（job manager 的一部分）指示 task manager 开始 checkpoint 
时，它会让所有 sources 记录它们的偏移量，并将编号的 _checkpoint barriers_ 
插入到它们的流中。这些屏障（barriers）流经作业图（job graph），标注每个 checkpoint 前后的流部分。
 
 <img src="{{ site.baseurl }}/fig/stream_barriers.svg" alt="Checkpoint barriers 
are inserted into the streams" class="center" width="80%" />
 
-Checkpoint _n_ will contain the state of each operator that resulted from 
having consumed **every
-event before checkpoint barrier _n_, and none of the events after it**.
+Checkpoint _n_ 将包含每个 operator 的 state，这些 state 产生于消费 **在 checkpoint barrier 
_n_ 之前的所有事件，并且不会包含任何在此之后的事件**。
 
-As each operator in the job graph receives one of these barriers, it records 
its state. Operators
-with two input streams (such as a `CoProcessFunction`) perform _barrier 
alignment_ so that the
-snapshot will reflect the state resulting from consuming events from both 
input streams up to (but
-not past) both barriers.
+当 job graph 中的每个 operator 接收到这些 barriers 其中之一时，它就会记录下其状态。
+拥有两个输入流的 Operators（例如 `CoProcessFunction`）会执行 _barrier 对齐（barrier alignment）_ 
以便快照将反应由于消费所有输入流 events 直到（但不超过）所有 barriers 而产生的状态。
 
 <img src="{{ site.baseurl }}/fig/stream_aligning.svg" alt="Barrier alignment" 
class="center" width="100%" />
 
-Flink's state backends use a copy-on-write mechanism to allow stream 
processing to continue
-unimpeded while older versions of the state are being asynchronously 
snapshotted. Only when the
-snapshots have been durably persisted will these older versions of the state 
be garbage collected.
+Flink 的 state backends 使用写时复制（copy-on-write）机制允许流处理在异步快照状态的旧版本时不受阻碍地继续。
+只有当快照被持久保存时，这些旧版本的状态才会被垃圾回收。
 
-### Exactly Once Guarantees
+### 精确一次保证
 
-When things go wrong in a stream processing application, it is possible to 
have either lost, or
-duplicated results. With Flink, depending on the choices you make for your 
application and the
-cluster you run it on, any of these outcomes is possible:
+当流处理应用程序发生错误的时候，可能产生缺失，或者重复冗余的结果。
+使用 Flink，根据你为应用程序和运行的集群所做的选择，可能会出现以下任何结果：
 
-- Flink makes no effort to recover from failures (_at most once_)
-- Nothing is lost, but you may experience duplicated results (_at least once_)
-- Nothing is lost or duplicated (_exactly once_)
+- Flink 不会努力从故障中恢复（_at most once_）
+- 没有任何丢失，但是你可能会得到重复冗余的结果（_at least once_）
+- 没有丢失或冗余重复（_exactly once_）
 
-Given that Flink recovers from faults by rewinding and replaying the source 
data streams, when the
-ideal situation is described as **exactly once** this does *not* mean that 
every event will be
-processed exactly once. Instead, it means that _every event will affect the 
state being managed by
-Flink exactly once_. 
+假设 Flink 通过倒回和重新发送 source 数据流从故障中恢复，当理想情况被描述为 **精确一次（exactly once）** 
时，这并*不*意味着每个事件都将被精确一次处理。

Review comment:
       `倒回` 改成 `回退` 会好一些吗？
   这个地方为什么要加”假设“呢？

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">名称</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">状态备份</th>
+      <th class="text-left">快照</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">本地磁盘（tmp dir）</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量 / 增量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>支持大于内存大小的状态</li>
+          <li>经验法则：比基于堆的后端慢10倍</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>快速，需要大的堆内存</li>
+          <li>受限制于 GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state 
(locally)</li>
+          <li>适用于小状态（本地）的测试和实验</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
 
-When working with state kept in a heap-based state backend, accesses and 
updates involve reading and
-writing objects on the heap. But for objects kept in the 
`RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But 
the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note 
also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a 
significant benefit for
-applications with large amounts of slowly changing state.
+当使用基于堆的 state backend 保存状态时，访问和更新涉及在堆上读写对象。
+但是对于保存在 `RocksDBStateBackend` 中的对象，访问和更新涉及序列化和反序列化，所以会有更加巨大的开销。
+但 RocksDB 的状态量仅受本地磁盘大小的限制。
+还要注意，只有 `RocksDBStateBackend` 能够进行增量快照，这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。
 
-All of these state backends are able to do asynchronous snapshotting, meaning 
that they can take a
-snapshot without impeding the ongoing stream processing.
+所有这些 state backends 都能够异步执行快照，这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。
 
 {% top %}
 
-## State Snapshots
+## 状态快照
 
-### Definitions
+### 定义
 
-* _Snapshot_ -- a generic term referring to a global, consistent image of the 
state of a Flink job.
-  A snapshot includes a pointer into each of the data sources (e.g., an offset 
into a file or Kafka
-  partition), as well as a copy of the state from each of the job's stateful 
operators that resulted
-  from having processed all of the events up to those positions in the sources.
-* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of 
being able to recover
-  from faults. Checkpoints can be incremental, and are optimized for being 
restored quickly.
-* _Externalized Checkpoint_ -- normally checkpoints are not intended to be 
manipulated by users.
-  Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) 
while a job is
-  running, and deletes them when a job is cancelled. But you can configure 
them to be retained
-  instead, in which case you can manually resume from them.
-* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for 
some operational
-  purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints 
are always complete,
-  and are optimized for operational flexibility.
+* _快照_ -- 通用术语，指的是 Flink 作业状态的全局一致镜像。
+快照包括指向每个数据源的指针（例如，到文件或 Kafka 分区的偏移量）以及每个作业的有状态运算符的状态副本，该状态副本包含了处理所有事件直至 
sources 中的那些位置。
+  
+* _Checkpoint_ -- 一种由 Flink 自动执行的快照，其目的是能够从故障中恢复。
+Checkpoints 可以是增量的，并为快速恢复进行了优化。
 
-### How does State Snapshotting Work?
+* _外部化的 Checkpoint_ -- 通常 checkpoints 不会被用户操纵。
+Flink 只保留作业运行时的最近的 _n_ 个 checkpoints（_n_ 可配置），并在作业取消时删除它们。
+但你可以将它们配置为保留，在这种情况下，你可以手动从中恢复。
 
-Flink uses a variant of the [Chandy-Lamport
-algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) known as 
_asynchronous barrier
-snapshotting_.
+* _Savepoint_ -- 用户出于某种操作目的（例如有状态的重新部署/升级/缩放操作）手动（或 API 调用）触发的快照。Savepoints 
始终是完整的，并且已针对操作灵活性进行了优化。
 
-When a task manager is instructed by the checkpoint coordinator (part of the 
job manager) to begin a
-checkpoint, it has all of the sources record their offsets and insert numbered 
_checkpoint barriers_
-into their streams. These barriers flow through the job graph, indicating the 
part of the stream
-before and after each checkpoint. 
+### 状态快照如何工作？
+
+Flink 使用 [Chandy-Lamport 
algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) 
算法的一种变体，称为异步屏障快照（_asynchronous barrier snapshotting_）。
+
+当 checkpoint coordinator（job manager 的一部分）指示 task manager 开始 checkpoint 
时，它会让所有 sources 记录它们的偏移量，并将编号的 _checkpoint barriers_ 
插入到它们的流中。这些屏障（barriers）流经作业图（job graph），标注每个 checkpoint 前后的流部分。
 
 <img src="{{ site.baseurl }}/fig/stream_barriers.svg" alt="Checkpoint barriers 
are inserted into the streams" class="center" width="80%" />
 
-Checkpoint _n_ will contain the state of each operator that resulted from 
having consumed **every
-event before checkpoint barrier _n_, and none of the events after it**.
+Checkpoint _n_ 将包含每个 operator 的 state，这些 state 产生于消费 **在 checkpoint barrier 
_n_ 之前的所有事件，并且不会包含任何在此之后的事件**。
 
-As each operator in the job graph receives one of these barriers, it records 
its state. Operators
-with two input streams (such as a `CoProcessFunction`) perform _barrier 
alignment_ so that the
-snapshot will reflect the state resulting from consuming events from both 
input streams up to (but
-not past) both barriers.
+当 job graph 中的每个 operator 接收到这些 barriers 其中之一时，它就会记录下其状态。
+拥有两个输入流的 Operators（例如 `CoProcessFunction`）会执行 _barrier 对齐（barrier alignment）_ 
以便快照将反应由于消费所有输入流 events 直到（但不超过）所有 barriers 而产生的状态。
 
 <img src="{{ site.baseurl }}/fig/stream_aligning.svg" alt="Barrier alignment" 
class="center" width="100%" />
 
-Flink's state backends use a copy-on-write mechanism to allow stream 
processing to continue
-unimpeded while older versions of the state are being asynchronously 
snapshotted. Only when the
-snapshots have been durably persisted will these older versions of the state 
be garbage collected.
+Flink 的 state backends 使用写时复制（copy-on-write）机制允许流处理在异步快照状态的旧版本时不受阻碍地继续。
+只有当快照被持久保存时，这些旧版本的状态才会被垃圾回收。
 
-### Exactly Once Guarantees
+### 精确一次保证
 
-When things go wrong in a stream processing application, it is possible to 
have either lost, or
-duplicated results. With Flink, depending on the choices you make for your 
application and the
-cluster you run it on, any of these outcomes is possible:
+当流处理应用程序发生错误的时候，可能产生缺失，或者重复冗余的结果。
+使用 Flink，根据你为应用程序和运行的集群所做的选择，可能会出现以下任何结果：
 
-- Flink makes no effort to recover from failures (_at most once_)
-- Nothing is lost, but you may experience duplicated results (_at least once_)
-- Nothing is lost or duplicated (_exactly once_)
+- Flink 不会努力从故障中恢复（_at most once_）
+- 没有任何丢失，但是你可能会得到重复冗余的结果（_at least once_）
+- 没有丢失或冗余重复（_exactly once_）
 
-Given that Flink recovers from faults by rewinding and replaying the source 
data streams, when the
-ideal situation is described as **exactly once** this does *not* mean that 
every event will be
-processed exactly once. Instead, it means that _every event will affect the 
state being managed by
-Flink exactly once_. 
+假设 Flink 通过倒回和重新发送 source 数据流从故障中恢复，当理想情况被描述为 **精确一次（exactly once）** 
时，这并*不*意味着每个事件都将被精确一次处理。
+相反，这意味着 _每一个事件都会影响 Flink 管理的状态精确一次_。

Review comment:
       建议这行和上面的进行合并，否则这两行的文字中间会多一个空格

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">名称</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">状态备份</th>
+      <th class="text-left">快照</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">本地磁盘（tmp dir）</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量 / 增量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>支持大于内存大小的状态</li>
+          <li>经验法则：比基于堆的后端慢10倍</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>快速，需要大的堆内存</li>
+          <li>受限制于 GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state 
(locally)</li>
+          <li>适用于小状态（本地）的测试和实验</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
 
-When working with state kept in a heap-based state backend, accesses and 
updates involve reading and
-writing objects on the heap. But for objects kept in the 
`RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But 
the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note 
also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a 
significant benefit for
-applications with large amounts of slowly changing state.
+当使用基于堆的 state backend 保存状态时，访问和更新涉及在堆上读写对象。
+但是对于保存在 `RocksDBStateBackend` 中的对象，访问和更新涉及序列化和反序列化，所以会有更加巨大的开销。
+但 RocksDB 的状态量仅受本地磁盘大小的限制。
+还要注意，只有 `RocksDBStateBackend` 能够进行增量快照，这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。
 
-All of these state backends are able to do asynchronous snapshotting, meaning 
that they can take a
-snapshot without impeding the ongoing stream processing.
+所有这些 state backends 都能够异步执行快照，这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。
 
 {% top %}
 
-## State Snapshots
+## 状态快照
 
-### Definitions
+### 定义
 
-* _Snapshot_ -- a generic term referring to a global, consistent image of the 
state of a Flink job.
-  A snapshot includes a pointer into each of the data sources (e.g., an offset 
into a file or Kafka
-  partition), as well as a copy of the state from each of the job's stateful 
operators that resulted
-  from having processed all of the events up to those positions in the sources.
-* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of 
being able to recover
-  from faults. Checkpoints can be incremental, and are optimized for being 
restored quickly.
-* _Externalized Checkpoint_ -- normally checkpoints are not intended to be 
manipulated by users.
-  Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) 
while a job is
-  running, and deletes them when a job is cancelled. But you can configure 
them to be retained
-  instead, in which case you can manually resume from them.
-* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for 
some operational
-  purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints 
are always complete,
-  and are optimized for operational flexibility.
+* _快照_ -- 通用术语，指的是 Flink 作业状态的全局一致镜像。
+快照包括指向每个数据源的指针（例如，到文件或 Kafka 分区的偏移量）以及每个作业的有状态运算符的状态副本，该状态副本包含了处理所有事件直至 
sources 中的那些位置。
+  
+* _Checkpoint_ -- 一种由 Flink 自动执行的快照，其目的是能够从故障中恢复。
+Checkpoints 可以是增量的，并为快速恢复进行了优化。
 
-### How does State Snapshotting Work?
+* _外部化的 Checkpoint_ -- 通常 checkpoints 不会被用户操纵。
+Flink 只保留作业运行时的最近的 _n_ 个 checkpoints（_n_ 可配置），并在作业取消时删除它们。
+但你可以将它们配置为保留，在这种情况下，你可以手动从中恢复。
 
-Flink uses a variant of the [Chandy-Lamport
-algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) known as 
_asynchronous barrier
-snapshotting_.
+* _Savepoint_ -- 用户出于某种操作目的（例如有状态的重新部署/升级/缩放操作）手动（或 API 调用）触发的快照。Savepoints 
始终是完整的，并且已针对操作灵活性进行了优化。
 
-When a task manager is instructed by the checkpoint coordinator (part of the 
job manager) to begin a
-checkpoint, it has all of the sources record their offsets and insert numbered 
_checkpoint barriers_
-into their streams. These barriers flow through the job graph, indicating the 
part of the stream
-before and after each checkpoint. 
+### 状态快照如何工作？
+
+Flink 使用 [Chandy-Lamport 
algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) 
算法的一种变体，称为异步屏障快照（_asynchronous barrier snapshotting_）。
+
+当 checkpoint coordinator（job manager 的一部分）指示 task manager 开始 checkpoint 
时，它会让所有 sources 记录它们的偏移量，并将编号的 _checkpoint barriers_ 
插入到它们的流中。这些屏障（barriers）流经作业图（job graph），标注每个 checkpoint 前后的流部分。
 
 <img src="{{ site.baseurl }}/fig/stream_barriers.svg" alt="Checkpoint barriers 
are inserted into the streams" class="center" width="80%" />
 
-Checkpoint _n_ will contain the state of each operator that resulted from 
having consumed **every
-event before checkpoint barrier _n_, and none of the events after it**.
+Checkpoint _n_ 将包含每个 operator 的 state，这些 state 产生于消费 **在 checkpoint barrier 
_n_ 之前的所有事件，并且不会包含任何在此之后的事件**。
 
-As each operator in the job graph receives one of these barriers, it records 
its state. Operators
-with two input streams (such as a `CoProcessFunction`) perform _barrier 
alignment_ so that the
-snapshot will reflect the state resulting from consuming events from both 
input streams up to (but
-not past) both barriers.
+当 job graph 中的每个 operator 接收到这些 barriers 其中之一时，它就会记录下其状态。

Review comment:
       `接收到这些 barriers 其中之一时` -> `接收到 barriers 时`。
   现在的翻译是 `barrier 其中之一` 意思就是有很多，但实际上这里每个 barrier 都会触发一次 checkpoint 的 
snapshot，只要 TM 接受到 barrier 后就会进行 snapshot。

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">名称</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">状态备份</th>
+      <th class="text-left">快照</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">本地磁盘（tmp dir）</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量 / 增量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>支持大于内存大小的状态</li>
+          <li>经验法则：比基于堆的后端慢10倍</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>快速，需要大的堆内存</li>
+          <li>受限制于 GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state 
(locally)</li>
+          <li>适用于小状态（本地）的测试和实验</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
 
-When working with state kept in a heap-based state backend, accesses and 
updates involve reading and
-writing objects on the heap. But for objects kept in the 
`RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But 
the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note 
also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a 
significant benefit for
-applications with large amounts of slowly changing state.
+当使用基于堆的 state backend 保存状态时，访问和更新涉及在堆上读写对象。
+但是对于保存在 `RocksDBStateBackend` 中的对象，访问和更新涉及序列化和反序列化，所以会有更加巨大的开销。
+但 RocksDB 的状态量仅受本地磁盘大小的限制。
+还要注意，只有 `RocksDBStateBackend` 能够进行增量快照，这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。
 
-All of these state backends are able to do asynchronous snapshotting, meaning 
that they can take a
-snapshot without impeding the ongoing stream processing.
+所有这些 state backends 都能够异步执行快照，这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。
 
 {% top %}
 
-## State Snapshots
+## 状态快照
 
-### Definitions
+### 定义
 
-* _Snapshot_ -- a generic term referring to a global, consistent image of the 
state of a Flink job.
-  A snapshot includes a pointer into each of the data sources (e.g., an offset 
into a file or Kafka
-  partition), as well as a copy of the state from each of the job's stateful 
operators that resulted
-  from having processed all of the events up to those positions in the sources.
-* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of 
being able to recover
-  from faults. Checkpoints can be incremental, and are optimized for being 
restored quickly.
-* _Externalized Checkpoint_ -- normally checkpoints are not intended to be 
manipulated by users.
-  Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) 
while a job is
-  running, and deletes them when a job is cancelled. But you can configure 
them to be retained
-  instead, in which case you can manually resume from them.
-* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for 
some operational
-  purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints 
are always complete,
-  and are optimized for operational flexibility.
+* _快照_ -- 通用术语，指的是 Flink 作业状态的全局一致镜像。
+快照包括指向每个数据源的指针（例如，到文件或 Kafka 分区的偏移量）以及每个作业的有状态运算符的状态副本，该状态副本包含了处理所有事件直至 
sources 中的那些位置。
+  
+* _Checkpoint_ -- 一种由 Flink 自动执行的快照，其目的是能够从故障中恢复。
+Checkpoints 可以是增量的，并为快速恢复进行了优化。
 
-### How does State Snapshotting Work?
+* _外部化的 Checkpoint_ -- 通常 checkpoints 不会被用户操纵。
+Flink 只保留作业运行时的最近的 _n_ 个 checkpoints（_n_ 可配置），并在作业取消时删除它们。
+但你可以将它们配置为保留，在这种情况下，你可以手动从中恢复。
 
-Flink uses a variant of the [Chandy-Lamport
-algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) known as 
_asynchronous barrier
-snapshotting_.
+* _Savepoint_ -- 用户出于某种操作目的（例如有状态的重新部署/升级/缩放操作）手动（或 API 调用）触发的快照。Savepoints 
始终是完整的，并且已针对操作灵活性进行了优化。
 
-When a task manager is instructed by the checkpoint coordinator (part of the 
job manager) to begin a
-checkpoint, it has all of the sources record their offsets and insert numbered 
_checkpoint barriers_
-into their streams. These barriers flow through the job graph, indicating the 
part of the stream
-before and after each checkpoint. 
+### 状态快照如何工作？
+
+Flink 使用 [Chandy-Lamport 
algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) 
算法的一种变体，称为异步屏障快照（_asynchronous barrier snapshotting_）。
+
+当 checkpoint coordinator（job manager 的一部分）指示 task manager 开始 checkpoint 
时，它会让所有 sources 记录它们的偏移量，并将编号的 _checkpoint barriers_ 
插入到它们的流中。这些屏障（barriers）流经作业图（job graph），标注每个 checkpoint 前后的流部分。
 
 <img src="{{ site.baseurl }}/fig/stream_barriers.svg" alt="Checkpoint barriers 
are inserted into the streams" class="center" width="80%" />
 
-Checkpoint _n_ will contain the state of each operator that resulted from 
having consumed **every
-event before checkpoint barrier _n_, and none of the events after it**.
+Checkpoint _n_ 将包含每个 operator 的 state，这些 state 产生于消费 **在 checkpoint barrier 
_n_ 之前的所有事件，并且不会包含任何在此之后的事件**。
 
-As each operator in the job graph receives one of these barriers, it records 
its state. Operators
-with two input streams (such as a `CoProcessFunction`) perform _barrier 
alignment_ so that the
-snapshot will reflect the state resulting from consuming events from both 
input streams up to (but
-not past) both barriers.
+当 job graph 中的每个 operator 接收到这些 barriers 其中之一时，它就会记录下其状态。
+拥有两个输入流的 Operators（例如 `CoProcessFunction`）会执行 _barrier 对齐（barrier alignment）_ 
以便快照将反应由于消费所有输入流 events 直到（但不超过）所有 barriers 而产生的状态。
 
 <img src="{{ site.baseurl }}/fig/stream_aligning.svg" alt="Barrier alignment" 
class="center" width="100%" />
 
-Flink's state backends use a copy-on-write mechanism to allow stream 
processing to continue
-unimpeded while older versions of the state are being asynchronously 
snapshotted. Only when the
-snapshots have been durably persisted will these older versions of the state 
be garbage collected.
+Flink 的 state backends 使用写时复制（copy-on-write）机制允许流处理在异步快照状态的旧版本时不受阻碍地继续。
+只有当快照被持久保存时，这些旧版本的状态才会被垃圾回收。
 
-### Exactly Once Guarantees
+### 精确一次保证
 
-When things go wrong in a stream processing application, it is possible to 
have either lost, or
-duplicated results. With Flink, depending on the choices you make for your 
application and the
-cluster you run it on, any of these outcomes is possible:
+当流处理应用程序发生错误的时候，可能产生缺失，或者重复冗余的结果。
+使用 Flink，根据你为应用程序和运行的集群所做的选择，可能会出现以下任何结果：
 
-- Flink makes no effort to recover from failures (_at most once_)
-- Nothing is lost, but you may experience duplicated results (_at least once_)
-- Nothing is lost or duplicated (_exactly once_)
+- Flink 不会努力从故障中恢复（_at most once_）
+- 没有任何丢失，但是你可能会得到重复冗余的结果（_at least once_）
+- 没有丢失或冗余重复（_exactly once_）
 
-Given that Flink recovers from faults by rewinding and replaying the source 
data streams, when the
-ideal situation is described as **exactly once** this does *not* mean that 
every event will be
-processed exactly once. Instead, it means that _every event will affect the 
state being managed by
-Flink exactly once_. 
+假设 Flink 通过倒回和重新发送 source 数据流从故障中恢复，当理想情况被描述为 **精确一次（exactly once）** 
时，这并*不*意味着每个事件都将被精确一次处理。
+相反，这意味着 _每一个事件都会影响 Flink 管理的状态精确一次_。
 
-Barrier alignment is only needed for providing exactly once guarantees. If you 
don't need this, you
-can gain some performance by configuring Flink to use 
`CheckpointingMode.AT_LEAST_ONCE`, which has
-the effect of disabling barrier alignment.
+Barrier 只需要在提供精确一次的语义保证时需要对齐（Barrier alignment）。
+如果不需要这种语义，可以通过将 Flink 配置为使用 `CheckpointingMode.AT_LEAST_ONCE` 
来获得一些性能，这具有关闭屏障对齐（Barrier alignment）的效果。
 
-### Exactly Once End-to-end
+### 端到端精确一次
 
-To achieve exactly once end-to-end, so that every event from the sources 
affects the sinks exactly
-once, the following must be true:
+为了实现端到端的精确一次，以便 sources 中的每个事件都仅精确一次对 sinks 生效，必须满足以下条件：
 
-1. your sources must be replayable, and
-2. your sinks must be transactional (or idempotent)
+1. 你的 sources 必须是可重播的，并且

Review comment:
       `重播` -> `重放` 会好一些吗

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">名称</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">状态备份</th>
+      <th class="text-left">快照</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">本地磁盘（tmp dir）</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量 / 增量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>支持大于内存大小的状态</li>
+          <li>经验法则：比基于堆的后端慢10倍</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>快速，需要大的堆内存</li>
+          <li>受限制于 GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state 
(locally)</li>
+          <li>适用于小状态（本地）的测试和实验</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
 
-When working with state kept in a heap-based state backend, accesses and 
updates involve reading and
-writing objects on the heap. But for objects kept in the 
`RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But 
the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note 
also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a 
significant benefit for
-applications with large amounts of slowly changing state.
+当使用基于堆的 state backend 保存状态时，访问和更新涉及在堆上读写对象。
+但是对于保存在 `RocksDBStateBackend` 中的对象，访问和更新涉及序列化和反序列化，所以会有更加巨大的开销。
+但 RocksDB 的状态量仅受本地磁盘大小的限制。
+还要注意，只有 `RocksDBStateBackend` 能够进行增量快照，这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。
 
-All of these state backends are able to do asynchronous snapshotting, meaning 
that they can take a
-snapshot without impeding the ongoing stream processing.
+所有这些 state backends 都能够异步执行快照，这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。
 
 {% top %}
 
-## State Snapshots
+## 状态快照
 
-### Definitions
+### 定义
 
-* _Snapshot_ -- a generic term referring to a global, consistent image of the 
state of a Flink job.
-  A snapshot includes a pointer into each of the data sources (e.g., an offset 
into a file or Kafka
-  partition), as well as a copy of the state from each of the job's stateful 
operators that resulted
-  from having processed all of the events up to those positions in the sources.
-* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of 
being able to recover
-  from faults. Checkpoints can be incremental, and are optimized for being 
restored quickly.
-* _Externalized Checkpoint_ -- normally checkpoints are not intended to be 
manipulated by users.
-  Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) 
while a job is
-  running, and deletes them when a job is cancelled. But you can configure 
them to be retained
-  instead, in which case you can manually resume from them.
-* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for 
some operational
-  purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints 
are always complete,
-  and are optimized for operational flexibility.
+* _快照_ -- 通用术语，指的是 Flink 作业状态的全局一致镜像。
+快照包括指向每个数据源的指针（例如，到文件或 Kafka 分区的偏移量）以及每个作业的有状态运算符的状态副本，该状态副本包含了处理所有事件直至 
sources 中的那些位置。
+  
+* _Checkpoint_ -- 一种由 Flink 自动执行的快照，其目的是能够从故障中恢复。
+Checkpoints 可以是增量的，并为快速恢复进行了优化。
 
-### How does State Snapshotting Work?
+* _外部化的 Checkpoint_ -- 通常 checkpoints 不会被用户操纵。
+Flink 只保留作业运行时的最近的 _n_ 个 checkpoints（_n_ 可配置），并在作业取消时删除它们。
+但你可以将它们配置为保留，在这种情况下，你可以手动从中恢复。
 
-Flink uses a variant of the [Chandy-Lamport
-algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) known as 
_asynchronous barrier
-snapshotting_.
+* _Savepoint_ -- 用户出于某种操作目的（例如有状态的重新部署/升级/缩放操作）手动（或 API 调用）触发的快照。Savepoints 
始终是完整的，并且已针对操作灵活性进行了优化。
 
-When a task manager is instructed by the checkpoint coordinator (part of the 
job manager) to begin a
-checkpoint, it has all of the sources record their offsets and insert numbered 
_checkpoint barriers_
-into their streams. These barriers flow through the job graph, indicating the 
part of the stream
-before and after each checkpoint. 
+### 状态快照如何工作？
+
+Flink 使用 [Chandy-Lamport 
algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) 
算法的一种变体，称为异步屏障快照（_asynchronous barrier snapshotting_）。
+
+当 checkpoint coordinator（job manager 的一部分）指示 task manager 开始 checkpoint 
时，它会让所有 sources 记录它们的偏移量，并将编号的 _checkpoint barriers_ 
插入到它们的流中。这些屏障（barriers）流经作业图（job graph），标注每个 checkpoint 前后的流部分。
 
 <img src="{{ site.baseurl }}/fig/stream_barriers.svg" alt="Checkpoint barriers 
are inserted into the streams" class="center" width="80%" />
 
-Checkpoint _n_ will contain the state of each operator that resulted from 
having consumed **every
-event before checkpoint barrier _n_, and none of the events after it**.
+Checkpoint _n_ 将包含每个 operator 的 state，这些 state 产生于消费 **在 checkpoint barrier 
_n_ 之前的所有事件，并且不会包含任何在此之后的事件**。
 
-As each operator in the job graph receives one of these barriers, it records 
its state. Operators
-with two input streams (such as a `CoProcessFunction`) perform _barrier 
alignment_ so that the
-snapshot will reflect the state resulting from consuming events from both 
input streams up to (but
-not past) both barriers.
+当 job graph 中的每个 operator 接收到这些 barriers 其中之一时，它就会记录下其状态。
+拥有两个输入流的 Operators（例如 `CoProcessFunction`）会执行 _barrier 对齐（barrier alignment）_ 
以便快照将反应由于消费所有输入流 events 直到（但不超过）所有 barriers 而产生的状态。

Review comment:
       `以便快照将反应由于消费所有输入流 events 直到（但不超过）所有 barriers 而产生的状态` 这句话能否再优化下。这里的意思是 
barrier 对齐后，快照就能够包括两个输入流的 barrier 之前的所有数据了。

##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and 
the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for 
that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically 
takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a 
distributed file
-system.
+由 Flink 管理的 keyed state 是一种分片的键/值存储，并且 keyed state 的每一项的工作副本都保存在负责该键的 
taskmanager 本地的某个地方。
+Operator state 对于需要它的机器节点来说也是本地的。Flink 定期获取所有状态的连续快照，并将这些快照复制到持久化的地方，例如分布式文件系统。
 
-In the event of the failure, Flink can restore the complete state of your 
application and resume
-processing as though nothing had gone wrong.
+如果发生故障，Flink 可以恢复应用程序的完整状态并恢复处理，就如同没有出现过异常一样。
 
-This state that Flink manages is stored in a _state backend_. Two 
implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps 
its working state on
-disk, and another heap-based state backend that keeps its working state in 
memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that 
persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses 
the JobManager's heap.
+Flink 管理的状态存储在 _state backend_。
+state backends 的两种实现 -- 一种是基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的，另一种基于堆的 
state backend，将其工作状态保存在 Java 的 堆内存中。
+这种基于堆的 state backend 有两种方式：保存其状态快照到分布式文件系统的 FsStateBackend，以及使用 JobManager 堆的 
MemoryStateBackend。
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">名称</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">状态备份</th>
+      <th class="text-left">快照</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">本地磁盘（tmp dir）</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量 / 增量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>支持大于内存大小的状态</li>
+          <li>经验法则：比基于堆的后端慢10倍</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">分布式文件系统</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>快速，需要大的堆内存</li>
+          <li>受限制于 GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">全量</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state 
(locally)</li>
+          <li>适用于小状态（本地）的测试和实验</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
 
-When working with state kept in a heap-based state backend, accesses and 
updates involve reading and
-writing objects on the heap. But for objects kept in the 
`RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But 
the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note 
also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a 
significant benefit for
-applications with large amounts of slowly changing state.
+当使用基于堆的 state backend 保存状态时，访问和更新涉及在堆上读写对象。
+但是对于保存在 `RocksDBStateBackend` 中的对象，访问和更新涉及序列化和反序列化，所以会有更加巨大的开销。
+但 RocksDB 的状态量仅受本地磁盘大小的限制。
+还要注意，只有 `RocksDBStateBackend` 能够进行增量快照，这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。
 
-All of these state backends are able to do asynchronous snapshotting, meaning 
that they can take a
-snapshot without impeding the ongoing stream processing.
+所有这些 state backends 都能够异步执行快照，这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。
 
 {% top %}
 
-## State Snapshots
+## 状态快照
 
-### Definitions
+### 定义
 
-* _Snapshot_ -- a generic term referring to a global, consistent image of the 
state of a Flink job.
-  A snapshot includes a pointer into each of the data sources (e.g., an offset 
into a file or Kafka
-  partition), as well as a copy of the state from each of the job's stateful 
operators that resulted
-  from having processed all of the events up to those positions in the sources.
-* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of 
being able to recover
-  from faults. Checkpoints can be incremental, and are optimized for being 
restored quickly.
-* _Externalized Checkpoint_ -- normally checkpoints are not intended to be 
manipulated by users.
-  Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) 
while a job is
-  running, and deletes them when a job is cancelled. But you can configure 
them to be retained
-  instead, in which case you can manually resume from them.
-* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for 
some operational
-  purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints 
are always complete,
-  and are optimized for operational flexibility.
+* _快照_ -- 通用术语，指的是 Flink 作业状态的全局一致镜像。

Review comment:
       这句话读起来更像是英文的顺序，能否改成 ”_快照_ -- 是 XXX 的通用术语“ 这样呢？




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] klion26 commented on a change in pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese

Reply via email to