Re: StatefulDoFnRunner

Kenneth Knowles Thu, 06 Apr 2017 13:27:35 -0700

I have some knowledge and opinions about StepContext and *Internals.

 - StepContext is an interface that basically just grew as "whatever we
need right now, add it". It is also mostly left over from before Beam. It
is not really the result of conscious holistic design. So no one should
feel attached to the details.


 - StepContext is *almost* designed to be per-key. It is not a real
requirement and not all StepContext instances are for any well-defined
key). But since there is just one StateInternals and one TimerInternals,
which are implicitly per key (or some other kind of scoping) it basically
means StepContext can only have one key (at a time). Yet every other method
has nothing to do with the key. So that is crufty.

Whenever you delete a *Context bucket-of-whatever class, it brings a smile.
So if there were a coherent design here, I'd be excited. But thinking more
small-scale, it seems OK for StepContext to have a StateInternalsFactory
and TimerInternalsFactory instead, but I don't think that helps you so much.

But other than that, I would say that the cleanup timer and cleanup are per
key because state and timers are per key. To execute a stateful DoFn you
need to linearize execution for each key anyhow, so aren't all your
elements for a key? If not, then does that mean you have linearized all
inputs across a key range? (which is fine, just less parallel)

Did I understand the proposal? I might not have. Perhaps this is a
discussion best over code, if you have 15-minutes to sketch.

Kenn

On Thu, Apr 6, 2017 at 8:57 AM, Thomas Weise <[email protected]> wrote:

> In the example, why not have StateInternalsStateCleaner and
> TimeInternalsCleanupTimer
> depend on the context to get the current internals instead?
>
>
> On Thu, Apr 6, 2017 at 8:37 AM, JingsongLee <[email protected]>
> wrote:
>
> > There is no suitable way to get the CurrentKey.
> >  I think using StepContext.timerInternals() and
> StepContext.stateInternals()
> > is better.
> > best,
> > JingsongLee
> > ------------------------------------------------------------
> ------From:Thomas
> > Weise <[email protected]>Time:2017 Apr 6 (Thu) 12:45To:dev <
> > [email protected]>Subject:StatefulDoFnRunner
> > Hi,
> >
> > While working on the support for splittable DoFn, I see a few cases where
> > the DoFn runner classes slightly complicate reuse across elements (or
> make
> > it a bit awkward to implement for the runner).
> >
> > StateInternalsStateCleaner and TimeInternalsCleanupTimer take
> xxxInternals
> > instances. But since those are key specific, the runner writer has to
> > perform acrobatics to flip the key on these internal instances on a per
> > element basis (to avoid having to recreate the other objects that refer
> to
> > them).
> >
> > Would it be possible to instead use the factory and retrieve the
> internals
> > by key? The runner then has the choice to optimize as
> > needed. In general, I
> > think it would be nice if the processing context related classes are
> > designed so that they promote reuse of object instances
> > across elements and
> > bundles and minimize object creation on a per key basis?
> >
> > Thanks,
> > Thomas
> >
> >
>

Re: StatefulDoFnRunner

Reply via email to