Re: Write-through-cache in State logic

Thomas Weise Tue, 16 Jul 2019 09:43:25 -0700

Thanks for the pointer. For streaming, it will be important to support
caching across bundles. It appears that even the Java SDK doesn't support
that yet?


https://github.com/apache/beam/blob/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java#L221

Regarding clear/append: It would be nice if both could be sent in a
single Fn API round trip when the state is persisted.
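
To make the clear-then-append idea from the quoted reply concrete, here is a
minimal sketch of a per-bundle, SDK-side cache for a bag. It is not the Beam
implementation: FakeStateClient, CachedBagState and all of their methods are
hypothetical names made up for illustration, and the client only stands in
for the read/append/clear requests a runner would serve.

```python
from typing import Any, List, Optional


class FakeStateClient:
    """Hypothetical stand-in for the runner-side state service."""

    def __init__(self) -> None:
        self.stored: List[Any] = []
        self.calls: List[str] = []  # records every simulated network call

    def read(self) -> List[Any]:
        self.calls.append("read")
        return list(self.stored)

    def append(self, items: List[Any]) -> None:
        self.calls.append("append")
        self.stored.extend(items)

    def clear(self) -> None:
        self.calls.append("clear")
        self.stored = []


class CachedBagState:
    """Hypothetical per-bundle cache built only on read, append and clear."""

    def __init__(self, client: FakeStateClient) -> None:
        self._client = client
        self._persisted: Optional[List[Any]] = None  # None: not fetched yet
        self._pending: List[Any] = []  # appends not yet sent to the runner
        self._cleared = False

    def read(self) -> List[Any]:
        # Fetch the persisted contents at most once per bundle.
        if self._persisted is None:
            self._persisted = self._client.read()
        return self._persisted + self._pending

    def add(self, item: Any) -> None:
        # Blind append: no read is needed to buffer a new element.
        self._pending.append(item)

    def clear(self) -> None:
        # After a clear the persisted contents are known to be empty.
        self._cleared = True
        self._persisted = []
        self._pending = []

    def commit(self) -> None:
        # At the end of the bundle: clear first, then append the new data.
        if self._cleared:
            self._client.clear()
        if self._pending:
            self._client.append(self._pending)
        # Drop the cache; this sketch does not cache across bundles.
        self._persisted = None
        self._pending = []
        self._cleared = False
```

In this sketch the clear and the append are still two separate calls at
commit time; folding them into one round trip would need support in the
protocol itself.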

Thanks,
Thomas



On Tue, Jul 16, 2019 at 6:58 AM Lukasz Cwik <[email protected]> wrote:

> User state is built on top of read, append and clear rather than on a read
> and write paradigm, to allow for blind appends.
>
> The optimization you speak of can be done completely inside the SDK
> without any additional protocol being required as long as you clear the
> state first and then append all your new data. The Beam Java SDK does this
> for all runners when executed portably[1]. You could port the same logic to
> the Beam Python SDK as well.
>
> 1:
> https://github.com/apache/beam/blob/41478d00d34598e56471d99d0845ac16efa5b8ef/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BagUserState.java#L84
>
> On Tue, Jul 16, 2019 at 5:54 AM Robert Bradshaw <[email protected]>
> wrote:
>
>> Python workers also have a per-bundle SDK-side cache. A protocol has
>> been proposed, but hasn't yet been implemented in any SDKs or runners.
>>
>> On Tue, Jul 16, 2019 at 6:02 AM Reuven Lax <[email protected]> wrote:
>> >
>> > It's runner dependent. Some runners (e.g. the Dataflow runner) do have
>> such a cache, though I think it currently has a cap for large bags.
>> >
>> > Reuven
>> >
>> > On Mon, Jul 15, 2019 at 8:48 PM Rakesh Kumar <[email protected]>
>> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have been using the Python SDK for our application, with BagState in
>> production. I was wondering whether the state logic has a
>> write-through cache implemented. If every read and write request goes
>> over the network, that comes with a performance cost. With a
>> write-through cache we could avoid the network call for a read operation.
>> >> I have looked superficially into the implementation and didn't see
>> any cache implementation.
>> >>
>> >> Is it possible to have this cache? Would it cause any issues if we had
>> the caching layer?
>> >>
>>
>
