On Thu, Feb 23, 2017 at 9:02 PM, Shen Li <[email protected]> wrote:
> Thanks a lot for explaining. As the SimpleDoFnRunner only takes one
> StepContext object in its arguments, do you mean the runner should create a
> new SimpleDoFnRunner for each key?
>

Yes, that is right. You have some flexibility in how you manage this. You can
see that the DirectRunner creates a new DoFnRunner for each bundle, and
separately keeps track of whether a bundle has one consistent key:
https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoEvaluator.java#L76

Kenn

> Best,
>
> Shen
>
> On Thu, Feb 23, 2017 at 10:16 PM, Kenneth Knowles <[email protected]>
> wrote:
>
> > Hi Shen,
> >
> > The way that this is done is that the StepContext.stateInternals() is
> > specialized to be per-key by the runner, before you create the
> > SimpleDoFnRunner. Does this help?
> >
> > Kenn
> >
> > On Thu, Feb 23, 2017 at 3:03 PM, Shen Li <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am trying to understand how a runner can support the Stateful ParDo.
> > > Currently our runner relies on the SimpleDoFnRunner (Beam-0.5.0). But it
> > > cannot pass ParDoTest#testValueStateSideOutput (
> > > https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1607
> > > )
> > >
> > > The reason is that it maintains a global state instead of a *per-key*
> > > state as explained in the latest Beam blog. I scanned through the
> > > implementation of the SimpleDoFnRunner, the implementation does not seem
> > > to consider the *per-key* state either (
> > > https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L626
> > > ).
> > >
> > > May I ask what is the recommended solution to handle stateful ParDo?
> > >
> > > Thanks,
> > >
> > > Shen
> > >
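
To illustrate the point made in the thread above (the state handed to the DoFn has to be selected per key before the element is processed), here is a minimal Java sketch. It deliberately uses no Beam classes: `Cell`, `KeyedStateTable`, and `processElement` are hypothetical stand-ins invented for this example, and nothing here is meant to reflect how SimpleDoFnRunner, StepContext, or the DirectRunner are actually implemented. It only contrasts one shared state cell with one cell per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Beam API): global state versus per-key state. */
public class PerKeyStateSketch {

    /** Stand-in for a single value-state cell holding an integer counter. */
    public static final class Cell {
        private int value;
        int read() { return value; }
        void write(int v) { value = v; }
    }

    /** Stand-in for runner-side storage: one Cell per key, created lazily. */
    public static final class KeyedStateTable<K> {
        private final Map<K, Cell> cells = new HashMap<>();
        Cell forKey(K key) { return cells.computeIfAbsent(key, k -> new Cell()); }
    }

    /** Stand-in for a stateful processing step: increments and returns the counter. */
    static int processElement(Cell state) {
        int next = state.read() + 1;
        state.write(next);
        return next;
    }

    /** Every element shares one cell, so counts leak across keys. */
    public static List<Integer> runWithGlobalState(List<String> keys) {
        Cell global = new Cell();
        List<Integer> out = new ArrayList<>();
        for (String key : keys) {
            out.add(processElement(global)); // key is ignored: this is the bug
        }
        return out;
    }

    /** The cell is chosen by the element's key before processing. */
    public static List<Integer> runWithPerKeyState(List<String> keys) {
        KeyedStateTable<String> table = new KeyedStateTable<>();
        List<Integer> out = new ArrayList<>();
        for (String key : keys) {
            out.add(processElement(table.forKey(key)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "a");
        System.out.println("global:  " + runWithGlobalState(keys));
        System.out.println("per-key: " + runWithPerKeyState(keys));
    }
}
```

With the keys a, b, a, the global variant yields the counts 1, 2, 3, while the per-key variant yields 1, 1, 2, since key b gets its own counter. In a real runner the per-key selection would happen in whatever state object the runner supplies to the DoFnRunner, as described in the replies above.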
