On Thu, Feb 22, 2024 at 10:16 AM Robert Bradshaw <rober...@google.com> wrote:
>
> On Thu, Feb 22, 2024 at 9:37 AM Reuven Lax via dev <dev@beam.apache.org>
> wrote:
> >
> > On Thu, Feb 22, 2024 at 9:26 AM Kenneth Knowles <k...@apache.org> wrote:
> >>
> >> Wow I love your input Reuven. Of course "the source" that you are applying
> >> backpressure to is often a runner's shuffle so it may be state anyhow, but
> >> it is good to give the runner the choice of how to figure that out and
> >> maybe chain backpressure further.
> >
> >
> > Sort of - however most (streaming) runners apply backpressure through
> > shuffle as well. This means that while some amount of data will accumulate
> > in shuffle, eventually the backpressure will push back to the source.
> > Caveat of course is that this is mostly true for streaming runners, not
> > batch runners.
>
> For batch it's still preferable to keep the data upstream in shuffle
> (which has less size limitations) than state (which must reside in
> worker memory, though only one key at a time).
And for drain (or even cancel), it's preferable to keep as much data as possible upstream in the source rather than sitting in state.