[ https://issues.apache.org/jira/browse/BEAM-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Beam JIRA Bot updated BEAM-7745:
--------------------------------
Priority: P3 (was: P2)
> StreamingSideInputDoFnRunner/StreamingSideInputFetcher have suboptimal state
> access pattern during normal operation
> -------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-7745
> URL: https://issues.apache.org/jira/browse/BEAM-7745
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Steve Niemitz
> Priority: P3
> Labels: stale-P2
>
> I spent some time tracking down sources of uncached state fetches in my job,
> and one large category was the interaction of StreamingSideInputDoFnRunner +
> StreamingSideInputFetcher.
> Basically, during standard operations, when the main input is NOT blocked by
> the side input, the side input fetcher will perform an uncached state read
> for every input element. Changing it to cache the blockedMap state gave me a
> ~30-40% increase in throughput in my job.
> The interaction is a little complicated, and there are a couple of
> optimizations I can see here.
>
> The primary one: the blockedMap is only persisted if it is non-empty. Because
> the WindmillStateCache won't cache a null value, the "nothing is blocked"
> signal is never actually cached, so a state read is issued to Windmill for
> every input element. The fix seems to be to persist an empty map rather than
> null when there are no blocked elements.
>
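The effect of persisting null versus an empty map can be illustrated with a toy model. This is a hedged sketch, not Beam code: Backend, NullRejectingCache, and process are hypothetical stand-ins for Windmill, WindmillStateCache, and the fetcher's per-element blockedMap lookup, and the cache simply assumes the "never cache null" behavior described in the issue.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class BlockedMapCacheSketch {

  /** Hypothetical backing store standing in for Windmill; counts every read it serves. */
  static final class Backend {
    final Map<String, Map<String, Long>> stored = new HashMap<>();
    int reads = 0;

    Map<String, Long> read(String key) {
      reads++;
      return stored.get(key); // null when nothing has been persisted
    }

    void write(String key, Map<String, Long> value) {
      if (value == null) {
        stored.remove(key); // "only persisted if non-empty": null clears the state
      } else {
        stored.put(key, value);
      }
    }
  }

  /** Hypothetical read-through cache that, like the behavior in the issue, never caches null. */
  static final class NullRejectingCache {
    final Backend backend;
    final Map<String, Map<String, Long>> cache = new HashMap<>();

    NullRejectingCache(Backend backend) {
      this.backend = backend;
    }

    Map<String, Long> get(String key) {
      Map<String, Long> cached = cache.get(key);
      if (cached != null) {
        return cached; // cache hit: no backend read
      }
      Map<String, Long> value = backend.read(key);
      if (value != null) {
        cache.put(key, value); // a null result is not cached, so the next get reads again
      }
      return value;
    }

    void put(String key, Map<String, Long> value) {
      backend.write(key, value);
      if (value == null) {
        cache.remove(key);
      } else {
        cache.put(key, value);
      }
    }
  }

  /** Processes n elements that are never blocked; returns the number of backend reads. */
  static int process(int n, boolean persistEmptyMap) {
    Backend backend = new Backend();
    NullRejectingCache cache = new NullRejectingCache(backend);
    String key = "blockedMap";
    for (int i = 0; i < n; i++) {
      Map<String, Long> blocked = cache.get(key);
      if (blocked == null || blocked.isEmpty()) {
        // Nothing is blocked: write back either null or an empty map.
        cache.put(key, persistEmptyMap ? Collections.<String, Long>emptyMap() : null);
      }
    }
    return backend.reads;
  }

  public static void main(String[] args) {
    System.out.println("persist null:      " + process(1000, false) + " reads");
    System.out.println("persist empty map: " + process(1000, true) + " reads");
  }
}
```

In this model, persisting null costs one backend read per element (1000 reads), while persisting an empty map costs a single read: only the first lookup misses, and every later lookup is served from the cache.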
--
This message was sent by Atlassian Jira
(v8.3.4#803005)