GitHub user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4966#discussion_r149337715
  
    --- Diff: docs/dev/stream/state/queryable_state.md ---
    @@ -32,38 +32,67 @@ under the License.
       likely that there will be breaking API changes on the client side in the 
upcoming Flink versions.
     </div>
     
    -In a nutshell, this feature allows users to query Flink's managed 
partitioned state
    -(see [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)) 
from outside of
    -Flink. For some scenarios, queryable state thus eliminates the need for 
distributed
    -operations/transactions with external systems such as key-value stores 
which are often the
    -bottleneck in practice.
    +In a nutshell, this feature exposes Flink's managed keyed (partitioned) 
state
    +(see [Working with State]({{ site.baseurl }}/dev/stream/state/state.html)) 
to the outside world and 
    +allows the user to query a job's state from outside Flink. For some 
scenarios, queryable state 
    +eliminates the need for distributed operations/transactions with external 
systems such as key-value 
    +stores which are often the bottleneck in practice. In addition, this 
feature may be particularly 
    +useful for debugging purposes.
     
     <div class="alert alert-warning">
    -  <strong>Attention:</strong> Queryable state accesses keyed state from a 
concurrent thread rather
    -  than synchronizing with the operator and potentially blocking its 
operation. Since any state
    -  backend using Java heap space, e.g. MemoryStateBackend or
    -  FsStateBackend, does not work with copies when retrieving values but 
instead directly
    -  references the stored values, read-modify-write patterns are unsafe and 
may cause the
    -  queryable state server to fail due to concurrent modifications.
    -  The RocksDBStateBackend is safe from these issues.
    +  <strong>Attention:</strong> When querying a state object, that object is 
accessed from a concurrent 
    +  thread without any synchronization or copying. This is a design choice, 
as any of the above would lead
    +  to increased job latency, which we wanted to avoid. Since any state 
backend using Java heap space, 
    +  <i>e.g.</i> <code>MemoryStateBackend</code> or 
<code>FsStateBackend</code>, does not work 
    +  with copies when retrieving values but instead directly references the 
stored values, read-modify-write 
    +  patterns are unsafe and may cause the queryable state server to fail due 
to concurrent modifications.
    +  The <code>RocksDBStateBackend</code> is safe from these issues.
     </div>
     
    +## Architecture
    +
    +Before showing how to use the Queryable State, it is useful to briefly 
describe the entities that compose it.
    +The Queryable State consists of three main entities:
    +
    + 1. the `QueryableStateClient`, which (potentially) runs outside the Flink 
cluster and submits the user queries, 
    + 2. the `QueryableStateClientProxy`, which runs on each `TaskManager` 
(*i.e.* inside the Flink cluster) and is responsible 
    + for receiving the client's queries, fetching the requested state on his 
behalf, and returning it to the client, and 
    + 3. the `QueryableStateServer` which runs on each `TaskManager` and is 
responsible for serving the locally stored state.
    + 
    +In a nutshell, the client will connect to one of the proxies and send a 
request for the state associated with a specific 
    +key, `k`. As stated in [Working with State]({{ site.baseurl 
}}/dev/stream/state/state.html), keyed state is organized in 
    +*Key Groups*, and each `TaskManager` is assigned a number of these key 
groups. To discover which `TaskManager` is 
    +responsible for the key group holding `k`, the proxy will ask the 
`JobManager`. Based on the answer, the proxy will 
    +then query the `QueryableStateServer` running on that `TaskManager` for 
the state associated with `k`, and forward the
    +response back to the client.
    +
    +## Activating Queryable State
    +
    +To enable queryable state on your Flink cluster, you just have to copy the 
    +`flink-queryable-state-runtime{{ site.scala_version_suffix 
}}-{{site.version }}.jar` 
    +from the `opt/` folder of your [Flink 
distribution](https://flink.apache.org/downloads.html "Apache Flink: 
Downloads"), 
    +to the `lib/` folder. In other case, the queryable state feature is not 
enabled. 
    --- End diff --
    
    nit: "Otherwise, ..." instead of "In other case, ..."

