Re: [Proposal] Support State Batching and Prefetching over FnApi

Luke Cwik Mon, 14 Jun 2021 15:21:17 -0700

Enhancements to the SDK which allow for greater declaration of intent would
be useful but the overall issue is that the SDK can send multiple read
requests through the use of readLater() and then block on read() without
the runner being aware that the SDK is blocked.


The runner could implement these strategies today:
* Have X pending state lookups, gather all new incoming state requests into
one batch, and issue the new batch as soon as one of the pending lookups
finishes. This will likely increase load on the state lookup system, since two
back-to-back readLater() calls per element won't get batched together.
* Gather incoming state requests until there are X requests or Y time has
passed. This is strictly slower, since we will fairly regularly have to wait
until the full Y time has passed.
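
As a rough illustration of the second strategy, here is a minimal sketch. The
class name, method names, and the use of plain strings for requests are all
hypothetical; none of this is Beam or runner API, and time is passed in
explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Beam code): buffers incoming state requests and
 * releases them as one batch once there are X requests or Y time has passed.
 */
final class SizeOrTimeBatcher {
  private final int maxRequests;    // "X" in the description above
  private final long maxWaitMillis; // "Y" in the description above
  private final List<String> pending = new ArrayList<>();
  private long oldestArrivalMillis;

  SizeOrTimeBatcher(int maxRequests, long maxWaitMillis) {
    this.maxRequests = maxRequests;
    this.maxWaitMillis = maxWaitMillis;
  }

  /** Buffers a request; returns the batch to issue if X was reached, else an empty list. */
  List<String> add(String request, long nowMillis) {
    if (pending.isEmpty()) {
      oldestArrivalMillis = nowMillis; // start the Y timer at the first buffered request
    }
    pending.add(request);
    return pending.size() >= maxRequests ? drain() : new ArrayList<>();
  }

  /** Called periodically; returns the batch if the oldest request has waited Y, else an empty list. */
  List<String> poll(long nowMillis) {
    if (!pending.isEmpty() && nowMillis - oldestArrivalMillis >= maxWaitMillis) {
      return drain();
    }
    return new ArrayList<>();
  }

  private List<String> drain() {
    List<String> batch = new ArrayList<>(pending);
    pending.clear();
    return batch;
  }
}
```

With X = 3, an SDK that issues two readLater() calls and then blocks in read()
never reaches the size limit, so the batch is only released by poll() once Y
has elapsed. The runner has no signal that the SDK is already blocked, which
is the wait described above.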

On Mon, Jun 14, 2021 at 3:05 PM Kenneth Knowles <k...@apache.org> wrote:

> I didn't see user API or SDK changes that I would expect in this proposal.
> Maybe I missed it? The main big win for state batching in the runners core
> trigger & window implementation is batching requests across a whole bundle.
> Certainly across elements. This probably requires something like either:
>
>  - a user API change (to have something like this:
> https://github.com/apache/beam/blob/f47a9424723e20abf807098dd6e9eef6e74c16cc/runners/core-java/src/main/java/org/apache/beam/runners/core/triggers/AfterAllStateMachine.java#L52
> )
>  - the SDK to use metadata/annotations (like this:
> https://github.com/apache/beam/blob/f47a9424723e20abf807098dd6e9eef6e74c16cc/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L420)
> instead
>
> That will allow the hints to be gathered across multiple elements. And it
> would seem the Fn API streaming protocol might mean the implementation is
> different than it is in ReduceFnRunner.
>
> Kenn
>
> On Mon, Jun 14, 2021 at 2:46 PM Luke Cwik <lc...@google.com> wrote:
>
>> The third approach prevents you from batching across state keys which
>> would be the most common type of batching.
>>
>> On Thu, May 6, 2021 at 3:13 PM Rui Wang <ruw...@google.com> wrote:
>>
>>> At this moment, the third approach in the doc is preferred. To recap,
>>> the third approach is the one that only changes FnApi by adding a repeated
>>> field in the state request to support batching over FnApi.
>>>
>>> This approach has the following benefits:
>>> 1. It avoids the double-request problem introduced by prefetching
>>> (prefetching needs two requests, one for the prefetch and one for the
>>> blocking fetch).
>>> 2. It does not conflict with prefetching, so there is no backward
>>> compatibility issue even if we later want to add prefetching to FnApi. So
>>> this approach can be a good starting point.
>>>
>>> The caveat, though, is that this approach does not support smart
>>> prefetching (which needs runner support). However, we can add that in the
>>> future if necessary, and it won't conflict with the existing design.
>>>
>>> Please let us know if you have any objections before we start the
>>> implementation.
>>>
>>>
>>> -Rui
>>>
>>> On Mon, Mar 22, 2021 at 12:27 PM Rui Wang <amaliu...@apache.org> wrote:
>>>
>>>> Hi Community,
>>>>
>>>> Andrew Crites and I drafted a document to discuss how to support state
>>>> prefetching and batching over FnApi, which seems to be missing
>>>> functionality in FnApi. This will help us support the Java state
>>>> readLater() API over FnApi.
>>>>
>>>> Please see:
>>>> https://docs.google.com/document/d/1Z3a5YOZyYsN8MeS6hRhCXX31m9bKCXSOtjKSl7wX40c/edit?usp=sharing&resourcekey=0-eiNl525kmb3Av2bqgCsZUA
>>>>
>>>>
>>>> -Rui
>>>>
>>>
