+Sam Rohde <sro...@google.com> is working on streaming support for
Interactive Beam.

The high-level idea is to capture a bounded segment of the unbounded data
source for replayability and determinism, and then to replay that data with
TestStream, which can control the pipeline's clock, so that streaming
semantics that depend on processing time (e.g. processing-time triggers)
are deterministic.
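
To make the replay idea concrete, here is a rough plain-Python sketch. It
does not use Beam or TestStream at all; the function name, the event format
(recorded arrival time, value) and the fixed-interval trigger are
assumptions made up for illustration. The point is only that once arrival
times come from the recording rather than the wall clock, a
processing-time trigger fires at the same places on every run.

```python
from typing import Any, List, Tuple


def replay_with_processing_time_trigger(
    recorded: List[Tuple[float, Any]], interval: float
) -> List[List[Any]]:
    """Replay recorded (arrival_time, value) events on a simulated clock.

    A pane is emitted each time the simulated processing time crosses a
    multiple of `interval`, provided something has been buffered.
    """
    panes: List[List[Any]] = []
    buffer: List[Any] = []
    next_fire = interval
    for arrival, value in recorded:
        # Advance the simulated clock up to this element's recorded arrival
        # time, firing every trigger deadline passed along the way.
        while next_fire <= arrival:
            if buffer:
                panes.append(list(buffer))
                buffer.clear()
            next_fire += interval
        buffer.append(value)
    # Flush whatever is left when the captured segment ends.
    if buffer:
        panes.append(list(buffer))
    return panes


# A captured, bounded segment of an (imagined) unbounded source.
recorded = [(0.5, "a"), (1.2, "b"), (1.9, "c"), (4.1, "d")]

first = replay_with_processing_time_trigger(recorded, interval=2.0)
second = replay_with_processing_time_trigger(recorded, interval=2.0)
print(first)
print(first == second)
```

Both replays group the elements identically because nothing in the loop
reads the real clock.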

David

On Thu, Oct 24, 2019 at 4:39 PM Robert Bradshaw <rober...@google.com> wrote:

> Yes, there are plans to support streaming for interactive beam. David
> Yan (cc'd) is leading this effort.
>
> On Thu, Oct 24, 2019 at 1:50 PM Harsh Vardhan <anan...@google.com> wrote:
> >
> > Thanks, +1 to adding support for streaming on Interactive Beam
> > (+David Yan)
> >
> >
> > On Thu, Oct 24, 2019 at 1:45 PM Hai Lu <lhai...@apache.org> wrote:
> >>
> >> Hi Robert,
> >>
> >> We're trying out iBeam at LinkedIn for Python. As Igor mentioned,
> >> there seems to be some inconsistency in the behavior of interactive
> >> beam. We can suggest some fixes from our end but we would need some
> >> support from the community.
> >>
> >> Also, is there a plan to support iBeam for streaming mode? We're
> >> interested in that use case as well.
> >>
> >> Thanks,
> >> Hai
> >>
> >> On Mon, Oct 21, 2019 at 4:45 PM Robert Bradshaw <rober...@google.com> wrote:
> >>>
> >>> Thanks for trying this out. Yes, this is definitely something that
> >>> should be supported (and tested).
> >>>
> >>> On Mon, Oct 21, 2019 at 3:40 PM Igor Durovic <id.te...@gmail.com> wrote:
> >>> >
> >>> > Hi everyone,
> >>> >
> >>> > The interactive beam example using the DirectRunner fails after
> >>> > execution of the last cell. The recursion limit is exceeded during
> >>> > the calculation of the cache label because of a circular reference
> >>> > in the PipelineInfo object.
> >>> >
> >>> > The constructor for the PipelineInfo class creates a mapping from
> >>> > each pcollection to the transforms that produce and consume it. The
> >>> > issue arises when there exists a transform that is both a producer
> >>> > and a consumer for the same pcollection. This occurs when a
> >>> > transform's expand method returns the same pcoll object that's
> >>> > passed into it. The specific transform causing the failure of the
> >>> > example is MaybeReshuffle, which is used in the Create transform.
> >>> > Replacing "return pcoll" with "return pcoll | Map(lambda x: x)"
> >>> > seems to fix the problem.
> >>> >
> >>> > A workaround for this issue on the interactive beam side would be
> >>> > fairly simple, but it seems to me that there should be more
> >>> > validation of pipelines to prevent the use of transforms that return
> >>> > the same pcoll that's passed in, or at least a mention of this in
> >>> > the transform style guide. My understanding is that pcollections are
> >>> > produced by a single transform (they even have a field called
> >>> > "producer" that references only one transform). If that's the case
> >>> > then that property of pcollections should be enforced.
> >>> >
> >>> > I made ticket BEAM-8451 to track this issue.
> >>> >
> >>> > I'm still new to beam so I apologize if I'm fundamentally
> >>> > misunderstanding something. I'm not exactly sure what the next step
> >>> > should be and would appreciate some recommendations. I can submit a
> >>> > PR to solve the immediate problem of the failing example but the
> >>> > underlying problem should also be addressed at some point. I also
> >>> > apologize if people are already aware of this problem.
> >>> >
> >>> > Thank You!
> >>> > Igor Durovic
>
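
For anyone reading the thread later, the producer/consumer problem Igor
describes in the quoted message can be modelled in a few lines of plain
Python. This is only a sketch: the tuple format, the helper names and the
"pc0"/"pc1" ids are invented for illustration and are not the PipelineInfo
implementation or any Beam API. A transform whose output is the very
object it received ends up registered as both producer and consumer of the
same pcollection, and a simple validation pass can flag that.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each transform is modelled as (name, input pcollection ids, output ids).
Transform = Tuple[str, List[str], List[str]]


def build_pcoll_index(transforms: List[Transform]):
    """Map each pcollection id to the transforms producing and consuming it."""
    producers: Dict[str, List[str]] = defaultdict(list)
    consumers: Dict[str, List[str]] = defaultdict(list)
    for name, inputs, outputs in transforms:
        for pcoll in inputs:
            consumers[pcoll].append(name)
        for pcoll in outputs:
            producers[pcoll].append(name)
    return producers, consumers


def find_self_loops(transforms: List[Transform]) -> List[Tuple[str, str]]:
    """Return (transform, pcoll) pairs where the output is also the input."""
    return [
        (name, pcoll)
        for name, inputs, outputs in transforms
        for pcoll in inputs
        if pcoll in outputs
    ]


# Pass-through expand: the transform hands back the pcollection it received.
passthrough = [("Impulse", [], ["pc0"]), ("MaybeReshuffle", ["pc0"], ["pc0"])]
# After wrapping the result in an identity Map, a new pcollection is produced.
fixed = [("Impulse", [], ["pc0"]), ("MaybeReshuffle", ["pc0"], ["pc1"])]

producers, consumers = build_pcoll_index(passthrough)
print(producers["pc0"])              # two producers for one pcollection
print(find_self_loops(passthrough))  # the offending pair
print(find_self_loops(fixed))        # nothing to report
```

In the pass-through case "pc0" has two producers, which contradicts the
single-producer assumption, and any code that walks from a pcollection to
its producer and on to that producer's inputs can loop indefinitely.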
