I have some knowledge and opinions about StepContext and *Internals. - StepContext is an interface that basically just grew as "whatever we need right now, add it". It is also mostly left over from before Beam. It is not really the result of conscious holistic design. So no one should feel attached to the details.
- StepContext is *almost* designed to be per-key. It is not a real requirement and not all StepContext instances are for any well-defined key). But since there is just one StateInternals and one TimerInternals, which are implicitly per key (or some other kind of scoping) it basically means StepContext can only have one key (at a time). Yet every other method has nothing to do with the key. So that is crufty. Whenever you delete a *Context bucket-of-whatever class, it brings a smile. So if there were a coherent design here, I'd be excited. But thinking more small-scale, it seems OK for StepContext to have a StateInternalsFactory and TimerInternalsFactory instead, but I don't think that helps you so much. But other than that, I would say that the cleanup timer and cleanup are per key because state and timers are per key. To execute a stateful DoFn you need to linearize execution for each key anyhow, so aren't all your elements for a key? If not, then does that mean you have linearized all inputs across a key range? (which is fine, just less parallel) Did I understand the proposal? I might not have. Perhaps this is a discussion best over code, if you have 15-minutes to sketch. Kenn On Thu, Apr 6, 2017 at 8:57 AM, Thomas Weise <[email protected]> wrote: > In the example, why not have StateInternalsStateCleaner and > TimeInternalsCleanupTimer > depend on the context to get the current internals instead? > > > On Thu, Apr 6, 2017 at 8:37 AM, JingsongLee <[email protected]> > wrote: > > > There is no suitable way to get the CurrentKey. > > I think using StepContext.timerInternals() and > StepContext.stateInternals() > > is better. > > best, > > JingsongLee > > ------------------------------------------------------------ > ------From:Thomas > > Weise <[email protected]>Time:2017 Apr 6 (Thu) 12:45To:dev < > > [email protected]>Subject:StatefulDoFnRunner > > Hi, > > > > While working on the support for splittable DoFn, I see a few cases where > > the DoFn runner classes slightly complicate reuse across elements (or > make > > it a bit awkward to implement for the runner). > > > > StateInternalsStateCleaner and TimeInternalsCleanupTimer take > xxxInternals > > instances. But since those are key specific, the runner writer has to > > perform acrobatics to flip the key on these internal instances on a per > > element basis (to avoid having to recreate the other objects that refer > to > > them). > > > > Would it be possible to instead use the factory and retrieve the > internals > > by key? The runner then has the choice to optimize as > > needed. In general, I > > think it would be nice if the processing context related classes are > > designed so that they promote reuse of object instances > > across elements and > > bundles and minimize object creation on a per key basis? > > > > Thanks, > > Thomas > > > > >
