[ https://issues.apache.org/jira/browse/BEAM-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Beam JIRA Bot updated BEAM-7745:
--------------------------------
    Labels: stale-P2  (was: )

> StreamingSideInputDoFnRunner/StreamingSideInputFetcher have suboptimal state access pattern during normal operation
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7745
>                 URL: https://issues.apache.org/jira/browse/BEAM-7745
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Steve Niemitz
>            Priority: P2
>              Labels: stale-P2
>
> I spent some time tracking down sources of uncached state fetches in my job,
> and one large category was the interaction of StreamingSideInputDoFnRunner +
> StreamingSideInputFetcher.
>
> Basically, during standard operations, when the main input is NOT blocked by
> the side input, the side input fetcher will perform an uncached state read
> for every input element. Changing it to cache the blockedMap state gave me a
> ~30-40% increase in throughput in my job.
>
> The interaction is a little complicated, and there's a couple optimizations
> here I can see.
>
> Primarily, the blockedMap is only persisted if it is non-empty. Because the
> WindmillStateCache won't cache a null value, this means that the "nothing is
> blocked" signal is never actually cached, and will issue a state read to
> windmill for each input element. The solution here seems like it is to
> persist an empty map rather than a null when there are no blocked elements.
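
The caching behaviour described in the report can be illustrated with a small, self-contained sketch. This is a toy model, not Beam code: `NullSkippingCache`, `readsFor`, and the `"blockedMap"` key are hypothetical names introduced here, and the only assumption taken from the report is that the cache never stores a null value. It counts how many reads reach the backend when the same state is looked up once per input element.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of the issue; none of these names are Beam APIs. */
public class BlockedMapCacheSketch {

    /** Hypothetical cache that remembers a loaded value only if it is non-null. */
    static final class NullSkippingCache<V> {
        private final Map<String, V> cached = new HashMap<>();
        private int backendReads = 0;

        V get(String key, Supplier<V> backend) {
            V hit = cached.get(key);
            if (hit != null) {
                return hit; // served from cache, no backend read
            }
            backendReads++; // stands in for a state read to the backend
            V loaded = backend.get();
            if (loaded != null) {
                cached.put(key, loaded); // null results are never cached
            }
            return loaded;
        }

        int backendReads() {
            return backendReads;
        }
    }

    /** Reads the same key once per element and returns the number of backend reads. */
    static int readsFor(int elements, Supplier<Map<String, String>> persistedState) {
        NullSkippingCache<Map<String, String>> cache = new NullSkippingCache<>();
        for (int i = 0; i < elements; i++) {
            cache.get("blockedMap", persistedState);
        }
        return cache.backendReads();
    }

    public static void main(String[] args) {
        // Nothing persisted (null): every element triggers a backend read.
        System.out.println("null persisted:  " + readsFor(1000, () -> null));
        // Empty map persisted: only the first element triggers a backend read.
        System.out.println("empty persisted: " + readsFor(1000, Collections::emptyMap));
    }
}
```

In this model, 1000 lookups against a null state cause 1000 backend reads, while persisting an empty map reduces that to a single read. That is the effect the proposed fix relies on, although the real throughput gain depends on the runner and the job.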