klion26 commented on a change in pull request #12139: URL: https://github.com/apache/flink/pull/12139#discussion_r426997228
########## File path: docs/dev/stream/state/queryable_state.zh.md ########## @@ -27,75 +27,54 @@ under the License. {:toc} <div class="alert alert-warning"> - <strong>Note:</strong> The client APIs for queryable state are currently in an evolving state and - there are <strong>no guarantees</strong> made about stability of the provided interfaces. It is - likely that there will be breaking API changes on the client side in the upcoming Flink versions. + <strong>注意:</strong> 目前 querable state 的客户端 API 还在不断演进,<strong>不保证</strong>现有接口的稳定性。在后续的 Flink 版本中有可能发生 API 变化。 </div> -In a nutshell, this feature exposes Flink's managed keyed (partitioned) state -(see [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)) to the outside world and -allows the user to query a job's state from outside Flink. For some scenarios, queryable state -eliminates the need for distributed operations/transactions with external systems such as key-value -stores which are often the bottleneck in practice. In addition, this feature may be particularly -useful for debugging purposes. +简而言之, 这个特性将 Flink 的 managed keyed (partitioned) state +(参考 [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)) 暴露给外部,从而用户可以在 Flink 外部查询作业 state。 +在某些场景中,Queryable State 消除了对外部系统的分布式操作以及事务的需求,比如 KV 存储系统,而这些外部系统往往会成为瓶颈。除此之外,这个特性对于调试作业非常有用。 <div class="alert alert-warning"> - <strong>Attention:</strong> When querying a state object, that object is accessed from a concurrent - thread without any synchronization or copying. This is a design choice, as any of the above would lead - to increased job latency, which we wanted to avoid. Since any state backend using Java heap space, - <i>e.g.</i> <code>MemoryStateBackend</code> or <code>FsStateBackend</code>, does not work - with copies when retrieving values but instead directly references the stored values, read-modify-write - patterns are unsafe and may cause the queryable state server to fail due to concurrent modifications. 
- The <code>RocksDBStateBackend</code> is safe from these issues. + <strong>注意:</strong> 进行查询时,state 会在并发线程中被访问,但 state 不会进行同步和拷贝。这种设计是为了避免同步和拷贝带来的作业延时。对于使用 Java 堆内存的 state backend, + <i>比如</i> <code>MemoryStateBackend</code> 或者 <code>FsStateBackend</code>,它们获取状态时不会进行拷贝,而是直接引用状态对象,所以对状态的 read-modify-write 是不安全的,并且 + 可能会因为并发修改导致查询失败。但 <code>RocksDBStateBackend</code> 是安全的,不会遇到上述问题。 </div> -## Architecture +## 架构 -Before showing how to use the Queryable State, it is useful to briefly describe the entities that compose it. -The Queryable State feature consists of three main entities: +在展示如何使用 Queryable State 之前,先简单描述一下该特性的组成部分,主要包括以下三部分: - 1. the `QueryableStateClient`, which (potentially) runs outside the Flink cluster and submits the user queries, - 2. the `QueryableStateClientProxy`, which runs on each `TaskManager` (*i.e.* inside the Flink cluster) and is responsible - for receiving the client's queries, fetching the requested state from the responsible Task Manager on his behalf, and - returning it to the client, and - 3. the `QueryableStateServer` which runs on each `TaskManager` and is responsible for serving the locally stored state. + 1. `QueryableStateClient`,默认运行在 Flink 集群外部,负责提交用户的查询请求; + 2. `QueryableStateClientProxy`,运行在每个 `TaskManager` 上(*即* Flink 集群内部),负责接收客户端的查询请求,从所负责的 Task Manager 获取请求的 state,并返回给客户端; + 3. `QueryableStateServer`, 运行在 `TaskManager` 上,负责服务本地存储的 state。 -The client connects to one of the proxies and sends a request for the state associated with a specific -key, `k`. As stated in [Working with State]({{ site.baseurl }}/dev/stream/state/state.html), keyed state is organized in -*Key Groups*, and each `TaskManager` is assigned a number of these key groups. To discover which `TaskManager` is -responsible for the key group holding `k`, the proxy will ask the `JobManager`. 
Based on the answer, the proxy will -then query the `QueryableStateServer` running on that `TaskManager` for the state associated with `k`, and forward the -response back to the client. +客户端连接到一个代理,并发送请求获取特定 `k` 对应的 state。 如 [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)所述,keyed state 按照 +*Key Groups* 进行划分,每个 `TaskManager` 会分配其中的一些 key groups。代理会询问 `JobManager` 以找到 `k` 所属 key group 的 TaskManager。根据返回的结果, 代理 +将会向运行在 `TaskManager` 上的 `QueryableStateServer` 查询 `k` 对应的 state, 并将结果返回给客户端。 -## Activating Queryable State +## 激活 Queryable State -To enable queryable state on your Flink cluster, you need to do the following: +为了在 Flink 集群上使用 queryable state,需要进行以下操作: - 1. copy the `flink-queryable-state-runtime{{ site.scala_version_suffix }}-{{site.version }}.jar` -from the `opt/` folder of your [Flink distribution](https://flink.apache.org/downloads.html "Apache Flink: Downloads"), -to the `lib/` folder. - 2. set the property `queryable-state.enable` to `true`. See the [Configuration]({{ site.baseurl }}/ops/config.html#queryable-state) documentation for details and additional parameters. + 1. 将 `flink-queryable-state-runtime{{ site.scala_version_suffix }}-{{site.version }}.jar` +从 [Flink distribution](https://flink.apache.org/downloads.html "Apache Flink: Downloads") 的 `opt/` 目录拷贝到 `lib/` 目录; + 2. 将参数 `queryable-state.enable` 设置为 `true`。详细信息以及其它配置可参考文档 [Configuration]({{ site.baseurl }}/ops/config.html#queryable-state)。 -To verify that your cluster is running with queryable state enabled, check the logs of any -task manager for the line: `"Started the Queryable State Proxy Server @ ..."`. 
+为了验证集群的 queryable stat 已经被激活,可以检查任意 task manager 的日志中是否包含 "Started the Queryable State Proxy Server @ ..."。

Review comment:

```suggestion
为了验证集群的 queryable state 已经被激活,可以检查任意 task manager 的日志中是否包含 "Started the Queryable State Proxy Server @ ..."。
```

########## File path: docs/dev/stream/state/queryable_state.zh.md ##########
@@ -180,18 +154,16 @@ jar which must be explicitly included as a dependency in the `pom.xml` of your p {% endhighlight %} </div> -For more on this, you can check how to [set up a Flink program]({{ site.baseurl }}/dev/projectsetup/dependencies.html). +关于依赖的更多信息, 可以参考如何[配置Flink项目]({{ site.baseurl }}/zh/dev/projectsetup/dependencies.html).

Review comment:

```suggestion
关于依赖的更多信息, 可以参考如何[配置 Flink 项目]({{ site.baseurl }}/zh/dev/projectsetup/dependencies.html).
```

########## File path: docs/dev/stream/state/queryable_state.zh.md ##########
@@ -27,75 +27,54 @@ under the License. {:toc} <div class="alert alert-warning"> - <strong>Note:</strong> The client APIs for queryable state are currently in an evolving state and - there are <strong>no guarantees</strong> made about stability of the provided interfaces. It is - likely that there will be breaking API changes on the client side in the upcoming Flink versions. + <strong>注意:</strong> 目前 querable state 的客户端 API 还在不断演进,<strong>不保证</strong>现有接口的稳定性。在后续的 Flink 版本中有可能发生 API 变化。 </div> -In a nutshell, this feature exposes Flink's managed keyed (partitioned) state -(see [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)) to the outside world and -allows the user to query a job's state from outside Flink. For some scenarios, queryable state -eliminates the need for distributed operations/transactions with external systems such as key-value -stores which are often the bottleneck in practice. In addition, this feature may be particularly -useful for debugging purposes.
+简而言之, 这个特性将 Flink 的 managed keyed (partitioned) state +(参考 [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)) 暴露给外部,从而用户可以在 Flink 外部查询作业 state。 +在某些场景中,Queryable State 消除了对外部系统的分布式操作以及事务的需求,比如 KV 存储系统,而这些外部系统往往会成为瓶颈。除此之外,这个特性对于调试作业非常有用。 <div class="alert alert-warning"> - <strong>Attention:</strong> When querying a state object, that object is accessed from a concurrent - thread without any synchronization or copying. This is a design choice, as any of the above would lead - to increased job latency, which we wanted to avoid. Since any state backend using Java heap space, - <i>e.g.</i> <code>MemoryStateBackend</code> or <code>FsStateBackend</code>, does not work - with copies when retrieving values but instead directly references the stored values, read-modify-write - patterns are unsafe and may cause the queryable state server to fail due to concurrent modifications. - The <code>RocksDBStateBackend</code> is safe from these issues. + <strong>注意:</strong> 进行查询时,state 会在并发线程中被访问,但 state 不会进行同步和拷贝。这种设计是为了避免同步和拷贝带来的作业延时。对于使用 Java 堆内存的 state backend, + <i>比如</i> <code>MemoryStateBackend</code> 或者 <code>FsStateBackend</code>,它们获取状态时不会进行拷贝,而是直接引用状态对象,所以对状态的 read-modify-write 是不安全的,并且 + 可能会因为并发修改导致查询失败。但 <code>RocksDBStateBackend</code> 是安全的,不会遇到上述问题。

Review comment: No blank line is needed here; the blank line adds an extra space.

########## File path: docs/dev/stream/state/queryable_state.zh.md ##########
@@ -27,75 +27,54 @@ under the License. {:toc} <div class="alert alert-warning"> - <strong>Note:</strong> The client APIs for queryable state are currently in an evolving state and - there are <strong>no guarantees</strong> made about stability of the provided interfaces. It is - likely that there will be breaking API changes on the client side in the upcoming Flink versions.
+ <strong>注意:</strong> 目前 querable state 的客户端 API 还在不断演进,<strong>不保证</strong>现有接口的稳定性。在后续的 Flink 版本中有可能发生 API 变化。 </div> -In a nutshell, this feature exposes Flink's managed keyed (partitioned) state -(see [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)) to the outside world and -allows the user to query a job's state from outside Flink. For some scenarios, queryable state -eliminates the need for distributed operations/transactions with external systems such as key-value -stores which are often the bottleneck in practice. In addition, this feature may be particularly -useful for debugging purposes. +简而言之, 这个特性将 Flink 的 managed keyed (partitioned) state +(参考 [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)) 暴露给外部,从而用户可以在 Flink 外部查询作业 state。 +在某些场景中,Queryable State 消除了对外部系统的分布式操作以及事务的需求,比如 KV 存储系统,而这些外部系统往往会成为瓶颈。除此之外,这个特性对于调试作业非常有用。 <div class="alert alert-warning"> - <strong>Attention:</strong> When querying a state object, that object is accessed from a concurrent - thread without any synchronization or copying. This is a design choice, as any of the above would lead - to increased job latency, which we wanted to avoid. Since any state backend using Java heap space, - <i>e.g.</i> <code>MemoryStateBackend</code> or <code>FsStateBackend</code>, does not work - with copies when retrieving values but instead directly references the stored values, read-modify-write - patterns are unsafe and may cause the queryable state server to fail due to concurrent modifications. - The <code>RocksDBStateBackend</code> is safe from these issues. 
+ <strong>注意:</strong> 进行查询时,state 会在并发线程中被访问,但 state 不会进行同步和拷贝。这种设计是为了避免同步和拷贝带来的作业延时。对于使用 Java 堆内存的 state backend, + <i>比如</i> <code>MemoryStateBackend</code> 或者 <code>FsStateBackend</code>,它们获取状态时不会进行拷贝,而是直接引用状态对象,所以对状态的 read-modify-write 是不安全的,并且 + 可能会因为并发修改导致查询失败。但 <code>RocksDBStateBackend</code> 是安全的,不会遇到上述问题。 </div> -## Architecture +## 架构 -Before showing how to use the Queryable State, it is useful to briefly describe the entities that compose it. -The Queryable State feature consists of three main entities: +在展示如何使用 Queryable State 之前,先简单描述一下该特性的组成部分,主要包括以下三部分: - 1. the `QueryableStateClient`, which (potentially) runs outside the Flink cluster and submits the user queries, - 2. the `QueryableStateClientProxy`, which runs on each `TaskManager` (*i.e.* inside the Flink cluster) and is responsible - for receiving the client's queries, fetching the requested state from the responsible Task Manager on his behalf, and - returning it to the client, and - 3. the `QueryableStateServer` which runs on each `TaskManager` and is responsible for serving the locally stored state. + 1. `QueryableStateClient`,默认运行在 Flink 集群外部,负责提交用户的查询请求; + 2. `QueryableStateClientProxy`,运行在每个 `TaskManager` 上(*即* Flink 集群内部),负责接收客户端的查询请求,从所负责的 Task Manager 获取请求的 state,并返回给客户端; + 3. `QueryableStateServer`, 运行在 `TaskManager` 上,负责服务本地存储的 state。 -The client connects to one of the proxies and sends a request for the state associated with a specific -key, `k`. As stated in [Working with State]({{ site.baseurl }}/dev/stream/state/state.html), keyed state is organized in -*Key Groups*, and each `TaskManager` is assigned a number of these key groups. To discover which `TaskManager` is -responsible for the key group holding `k`, the proxy will ask the `JobManager`. Based on the answer, the proxy will -then query the `QueryableStateServer` running on that `TaskManager` for the state associated with `k`, and forward the -response back to the client. 
+客户端连接到一个代理,并发送请求获取特定 `k` 对应的 state。 如 [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)所述,keyed state 按照

Review comment:

```suggestion
客户端连接到一个代理,并发送请求获取特定 `k` 对应的 state。 如 [Working with State]({{ site.baseurl }}/zh/dev/stream/state/state.html) 所述,keyed state 按照
```

########## File path: docs/dev/stream/state/queryable_state.zh.md ##########
@@ -119,28 +98,25 @@ QueryableStateStream asQueryableState( <div class="alert alert-info"> - <strong>Note:</strong> There is no queryable <code>ListState</code> sink as it would result in an ever-growing - list which may not be cleaned up and thus will eventually consume too much memory. + <strong>注意:</strong> 没有可查询的 <code>ListState</code> sink,因为这种情况下 list 会不断增长,并且可能不会被清理,最终会消耗大量的内存。 </div> -The returned `QueryableStateStream` can be seen as a sink and **cannot** be further transformed. Internally, a -`QueryableStateStream` gets translated to an operator which uses all incoming records to update the queryable state -instance. The updating logic is implied by the type of the `StateDescriptor` provided in the `asQueryableState` call. -In a program like the following, all records of the keyed stream will be used to update the state instance via the -`ValueState.update(value)`: +返回的 `QueryableStateStream` 可以被视作一个sink,而且**不能再**被进一步转换。在内部实现上,一个 `QueryableStateStream` 被转换成一个 operator, +使用输入的数据来更新 queryable state。state 如何更新是由 `asQueryableState` 提供的 `StateDescriptor` 来决定的。在下面的代码中, keyed stream 的所有数据 +将会通过 `ValueState.update(value)` 来更新状态:

Review comment: No line break is needed here.

########## File path: docs/dev/stream/state/queryable_state.zh.md ##########
@@ -150,20 +126,18 @@ descriptor.setQueryable("query-name"); // queryable state name {% endhighlight %} <div class="alert alert-info"> - <strong>Note:</strong> The <code>queryableStateName</code> parameter may be chosen arbitrarily and is only - used for queries. It does not have to be identical to the state's own name.
+ <strong>注意:</strong> 参数 <code>queryableStateName</code> 可以任意选取,并且只被用来进行查询,它可以和 state 的名称不同。 </div> -This variant has no limitations as to which type of state can be made queryable. This means that this can be used for -any `ValueState`, `ReduceState`, `ListState`, `MapState`, `AggregatingState`, and the currently deprecated `FoldingState`. +这种方式不会限制 state 类型,即任意的 `ValueState`、`ReduceState`、`ListState`、`MapState`、`AggregatingState` 以及已弃用的 `FoldingState` +均可作为 queryable state。 -## Querying State +## 查询 state -So far, you have set up your cluster to run with queryable state and you have declared (some of) your state as -queryable. Now it is time to see how to query this state. +目前为止,你已经激活了集群的 queryable state 功能,并且将一些 state 设置成了可查询的,接下来将会展示如何进行查询。 -For this you can use the `QueryableStateClient` helper class. This is available in the `flink-queryable-state-client` -jar which must be explicitly included as a dependency in the `pom.xml` of your project along with `flink-core`, as shown below: +为了进行查询,可以使用辅助类 `QueryableStateClient`,这个类位于 `flink-queryable-state-client` 的jar中,在项目的 `pom.xml` 需要显示添加

Review comment:

```suggestion
为了进行查询,可以使用辅助类 `QueryableStateClient`,这个类位于 `flink-queryable-state-client` 的 jar 中,在项目的 `pom.xml` 需要显示添加
```

########## File path: docs/dev/stream/state/queryable_state.zh.md ##########
@@ -27,75 +27,54 @@ under the License. {:toc} <div class="alert alert-warning"> - <strong>Note:</strong> The client APIs for queryable state are currently in an evolving state and - there are <strong>no guarantees</strong> made about stability of the provided interfaces. It is - likely that there will be breaking API changes on the client side in the upcoming Flink versions.
+ <strong>注意:</strong> 目前 querable state 的客户端 API 还在不断演进,<strong>不保证</strong>现有接口的稳定性。在后续的 Flink 版本中有可能发生 API 变化。 </div> -In a nutshell, this feature exposes Flink's managed keyed (partitioned) state -(see [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)) to the outside world and -allows the user to query a job's state from outside Flink. For some scenarios, queryable state -eliminates the need for distributed operations/transactions with external systems such as key-value -stores which are often the bottleneck in practice. In addition, this feature may be particularly -useful for debugging purposes. +简而言之, 这个特性将 Flink 的 managed keyed (partitioned) state +(参考 [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)) 暴露给外部,从而用户可以在 Flink 外部查询作业 state。 +在某些场景中,Queryable State 消除了对外部系统的分布式操作以及事务的需求,比如 KV 存储系统,而这些外部系统往往会成为瓶颈。除此之外,这个特性对于调试作业非常有用。 <div class="alert alert-warning"> - <strong>Attention:</strong> When querying a state object, that object is accessed from a concurrent - thread without any synchronization or copying. This is a design choice, as any of the above would lead - to increased job latency, which we wanted to avoid. Since any state backend using Java heap space, - <i>e.g.</i> <code>MemoryStateBackend</code> or <code>FsStateBackend</code>, does not work - with copies when retrieving values but instead directly references the stored values, read-modify-write - patterns are unsafe and may cause the queryable state server to fail due to concurrent modifications. - The <code>RocksDBStateBackend</code> is safe from these issues. 
+ <strong>注意:</strong> 进行查询时,state 会在并发线程中被访问,但 state 不会进行同步和拷贝。这种设计是为了避免同步和拷贝带来的作业延时。对于使用 Java 堆内存的 state backend, + <i>比如</i> <code>MemoryStateBackend</code> 或者 <code>FsStateBackend</code>,它们获取状态时不会进行拷贝,而是直接引用状态对象,所以对状态的 read-modify-write 是不安全的,并且 + 可能会因为并发修改导致查询失败。但 <code>RocksDBStateBackend</code> 是安全的,不会遇到上述问题。 </div> -## Architecture +## 架构 -Before showing how to use the Queryable State, it is useful to briefly describe the entities that compose it. -The Queryable State feature consists of three main entities: +在展示如何使用 Queryable State 之前,先简单描述一下该特性的组成部分,主要包括以下三部分: - 1. the `QueryableStateClient`, which (potentially) runs outside the Flink cluster and submits the user queries, - 2. the `QueryableStateClientProxy`, which runs on each `TaskManager` (*i.e.* inside the Flink cluster) and is responsible - for receiving the client's queries, fetching the requested state from the responsible Task Manager on his behalf, and - returning it to the client, and - 3. the `QueryableStateServer` which runs on each `TaskManager` and is responsible for serving the locally stored state. + 1. `QueryableStateClient`,默认运行在 Flink 集群外部,负责提交用户的查询请求; + 2. `QueryableStateClientProxy`,运行在每个 `TaskManager` 上(*即* Flink 集群内部),负责接收客户端的查询请求,从所负责的 Task Manager 获取请求的 state,并返回给客户端; + 3. `QueryableStateServer`, 运行在 `TaskManager` 上,负责服务本地存储的 state。 -The client connects to one of the proxies and sends a request for the state associated with a specific -key, `k`. As stated in [Working with State]({{ site.baseurl }}/dev/stream/state/state.html), keyed state is organized in -*Key Groups*, and each `TaskManager` is assigned a number of these key groups. To discover which `TaskManager` is -responsible for the key group holding `k`, the proxy will ask the `JobManager`. Based on the answer, the proxy will -then query the `QueryableStateServer` running on that `TaskManager` for the state associated with `k`, and forward the -response back to the client. 
+客户端连接到一个代理,并发送请求获取特定 `k` 对应的 state。 如 [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)所述,keyed state 按照 +*Key Groups* 进行划分,每个 `TaskManager` 会分配其中的一些 key groups。代理会询问 `JobManager` 以找到 `k` 所属 key group 的 TaskManager。根据返回的结果, 代理 +将会向运行在 `TaskManager` 上的 `QueryableStateServer` 查询 `k` 对应的 state, 并将结果返回给客户端。

Review comment: No line break is needed here.