[jira] [Updated] (BEAM-7745) StreamingSideInputDoFnRunner/StreamingSideInputFetcher have suboptimal state access pattern during normal operation
[ https://issues.apache.org/jira/browse/BEAM-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beam JIRA Bot updated BEAM-7745:
    Labels: stale-P2  (was: )

> StreamingSideInputDoFnRunner/StreamingSideInputFetcher have suboptimal state access pattern during normal operation
> ---
>
>          Key: BEAM-7745
>          URL: https://issues.apache.org/jira/browse/BEAM-7745
>      Project: Beam
>   Issue Type: Improvement
>   Components: runner-dataflow
>     Reporter: Steve Niemitz
>     Priority: P2
>       Labels: stale-P2
>
> I spent some time tracking down sources of uncached state fetches in my job,
> and one large category was the interaction of StreamingSideInputDoFnRunner +
> StreamingSideInputFetcher.
>
> Basically, during standard operation, when the main input is NOT blocked by
> the side input, the side input fetcher performs an uncached state read for
> every input element. Changing it to cache the blockedMap state gave me a
> ~30-40% increase in throughput in my job.
>
> The interaction is a little complicated, and there are a couple of
> optimizations I can see here.
>
> Primarily, the blockedMap is only persisted if it is non-empty. Because the
> WindmillStateCache won't cache a null value, the "nothing is blocked" signal
> is never actually cached, and a state read is issued to Windmill for each
> input element. The solution seems to be to persist an empty map rather than
> a null when there are no blocked elements.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
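The caching behaviour described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Beam or Dataflow runner code: `CountingBackend`, `NullSkippingCache` and `countBackendReads` are hypothetical names invented here, and the only property carried over from the issue is that the cache does not retain a null value. It compares persisting null against persisting an empty map when nothing is blocked.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class BlockedMapCacheSketch {

  /** Hypothetical stand-in for a remote state backend; counts every read it serves. */
  static final class CountingBackend {
    private final Map<String, Map<String, Long>> stored = new HashMap<>();
    int reads = 0;

    Map<String, Long> read(String tag) {
      reads++;
      return stored.get(tag); // null when nothing was persisted under this tag
    }

    void write(String tag, Map<String, Long> value) {
      if (value == null) {
        stored.remove(tag);
      } else {
        stored.put(tag, value);
      }
    }
  }

  /** Hypothetical read-through cache that, as described in the issue, never caches null. */
  static final class NullSkippingCache {
    private final Map<String, Map<String, Long>> cached = new HashMap<>();
    private final CountingBackend backend;

    NullSkippingCache(CountingBackend backend) {
      this.backend = backend;
    }

    Map<String, Long> get(String tag) {
      Map<String, Long> hit = cached.get(tag);
      if (hit != null) {
        return hit; // served from cache, no backend read
      }
      Map<String, Long> fetched = backend.read(tag);
      if (fetched != null) {
        cached.put(tag, fetched); // a null result is deliberately not remembered
      }
      return fetched;
    }

    void put(String tag, Map<String, Long> value) {
      backend.write(tag, value);
      if (value != null) {
        cached.put(tag, value);
      } else {
        cached.remove(tag);
      }
    }
  }

  /** Processes {@code elements} unblocked inputs and returns how many backend reads occurred. */
  static int countBackendReads(boolean persistEmptyMap, int elements) {
    CountingBackend backend = new CountingBackend();
    NullSkippingCache cache = new NullSkippingCache(backend);
    for (int i = 0; i < elements; i++) {
      Map<String, Long> blocked = cache.get("blockedMap");
      boolean nothingBlocked = blocked == null || blocked.isEmpty();
      if (nothingBlocked) {
        // Normal operation: persist either null (old behaviour) or an empty map (proposal).
        cache.put("blockedMap",
            persistEmptyMap ? Collections.<String, Long>emptyMap() : null);
      }
    }
    return backend.reads;
  }

  public static void main(String[] args) {
    System.out.println("persist null:      " + countBackendReads(false, 100) + " backend reads");
    System.out.println("persist empty map: " + countBackendReads(true, 100) + " backend reads");
  }
}
```

In this toy model, persisting null costs one backend read per element (100 reads for 100 elements), while persisting an empty map costs a single read, after which the "nothing is blocked" state is served from the cache. The real runner persists state per work item rather than per element, so the actual savings depend on the job.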
[jira] [Updated] (BEAM-7745) StreamingSideInputDoFnRunner/StreamingSideInputFetcher have suboptimal state access pattern during normal operation
[ https://issues.apache.org/jira/browse/BEAM-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-7745:
    Status: Open  (was: Triage Needed)

> StreamingSideInputDoFnRunner/StreamingSideInputFetcher have suboptimal state access pattern during normal operation
> ---
>
>          Key: BEAM-7745
>          URL: https://issues.apache.org/jira/browse/BEAM-7745
>      Project: Beam
>   Issue Type: Improvement
>   Components: runner-dataflow
>     Reporter: Steve Niemitz
>     Priority: Major