I think WindowMappingFn (https://issues.apache.org/jira/browse/BEAM-260 /
https://s.apache.org/beam-windowmappingfn-1-pager) is a good fit for this.
There are details to shake out.

One big thing it does not address well (because it is focused only on GC
thresholds) is specifically which windows need their state accessible from
which others, and hence how much parallelism is available and how much
communication there is between windows. Today it is somewhat moot because
we don't use that parallelism.
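
To make the parallelism point concrete, here is a rough sketch in plain
Python. It is not Beam API: fixed_window, map_to_state_window and
group_by_state_window are hypothetical names, and the "hourly main windows,
daily state windows" setup is only an assumed example. It shows how a
mapping from main-input windows to state windows determines which windows
share state and which can run independently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A window is a half-open interval [start, end) in seconds.
Window = Tuple[int, int]


def fixed_window(timestamp: int, size: int) -> Window:
    """Return the fixed window of the given size that contains timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def map_to_state_window(main_window: Window, state_size: int) -> Window:
    """Hypothetical mapping: state lives in the fixed window of state_size
    that contains the start of the main-input window."""
    return fixed_window(main_window[0], state_size)


def group_by_state_window(
    main_windows: List[Window], state_size: int
) -> Dict[Window, List[Window]]:
    """Group main windows by the state window they map to.

    Windows in the same group share state and must coordinate;
    different groups are independent of each other.
    """
    groups: Dict[Window, List[Window]] = defaultdict(list)
    for window in main_windows:
        groups[map_to_state_window(window, state_size)].append(window)
    return dict(groups)


HOUR = 3600
DAY = 24 * HOUR

# Two days of hourly main-input windows, with state scoped to daily windows.
hourly = [fixed_window(h * HOUR, HOUR) for h in range(48)]
groups = group_by_state_window(hourly, DAY)
```

With this assumed mapping the 48 hourly windows collapse into two groups,
so at most two units of work are independent per key. With the identity
mapping (state window equal to main window) all 48 would be.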

On Wed, Jan 11, 2017 at 10:03 AM, Lukasz Cwik <[email protected]>
wrote:

> Bundle processing order is indeterminate; wouldn't accessing user state of
> a different window lead to indeterminate state information? This seems to
> be even weaker than what you get from side inputs that are triggered
> multiple times.
>
> On Wed, Jan 11, 2017 at 10:01 AM, Tyler Akidau <[email protected]> wrote:
>
> > On Wed, Jan 11, 2017 at 9:43 AM Robert Bradshaw
> > <[email protected]>
> > wrote:
> >
> > > On Wed, Jan 11, 2017 at 8:59 AM, Lukasz Cwik <[email protected]>
> > > wrote:
> > > > I was under the impression that user state was scoped to a ParDo and was
> > > > not shareable across multiple ParDos. Wouldn't rewindowing require the
> > > > usage of multiple ParDos and hence not allow for state to be shared?
> > >
> > > No, you'd do something like
> > >
> > > pc.apply(WindowInto(grouping_windowing))
> > >   .apply(GroupByKey())
> > >   .apply(WindowInto(state_windowing))
> > >   .apply(ParDo(state_using_dofn))
> > >
> > > You could reify the window after GroupByKey if you need to inspect it.
> > >
> > > However, I'm liking the idea of being able to associate different
> > > WindowFns with particular state tags similar to side inputs (though
> > > the default would be the windowing of the main input).
> > >
> >
> > Can you expand upon what you mean by this? I'm not sure I understand what
> > you're getting at yet.
> >
> > -Tyler
> >
> >
> > >
> > > > On Tue, Jan 10, 2017 at 10:51 PM, Robert Bradshaw <
> > > > [email protected]> wrote:
> > > >
> > > > >> Possibly this could be handled by rewindowing and the current semantics. If
> > > > >> not, maybe treat state like a side input with its own windowing and window
> > > > >> mapping fn.
> > > >>
> > > > >> On Jan 10, 2017 3:14 PM, "Ben Chambers (JIRA)" <[email protected]> wrote:
> > > >>
> > > >> > Ben Chambers created BEAM-1261:
> > > >> > ----------------------------------
> > > >> >
> > > > >> >              Summary: State API should allow state to be managed in
> > > > >> > different windows
> > > > >> >                  Key: BEAM-1261
> > > > >> >                  URL: https://issues.apache.org/jira/browse/BEAM-1261
> > > >> >              Project: Beam
> > > >> >           Issue Type: Bug
> > > >> >           Components: beam-model, sdk-java-core
> > > >> >             Reporter: Ben Chambers
> > > >> >             Assignee: Kenneth Knowles
> > > >> >
> > > >> >
> > > > >> > For example, even if the elements are being processed in fixed windows of
> > > > >> > an hour, it may be desirable for the state to "roll over" between windows
> > > > >> > (or be available to all windows).
> > > > >> >
> > > > >> > It will also be necessary to figure out when this state should be deleted
> > > > >> > (TTL? maximum retention?)
> > > > >> >
> > > > >> > Another problem is how to deal with out of order data. If data comes in
> > > > >> > from the 10:00 AM window, should its state changes be visible to the data
> > > > >> > in the 9:00 AM window?
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > This message was sent by Atlassian JIRA
> > > >> > (v6.3.4#6332)
> > > >> >
> > > >>
> > >
> >
>
