Thanks, +1 to adding support for streaming on Interactive Beam (+David Yan
<david...@google.com>)


On Thu, Oct 24, 2019 at 1:45 PM Hai Lu <lhai...@apache.org> wrote:

> Hi Robert,
>
> We're trying out iBeam at LinkedIn for Python. As Igor mentioned, there
> seems to be some inconsistency in the behavior of interactive beam. We can
> suggest some fixes from our end but we would need some support from the
> community.
>
> Also, is there a plan to support iBeam for streaming mode? We're
> interested in that use case as well.
>
> Thanks,
> Hai
>
> On Mon, Oct 21, 2019 at 4:45 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> Thanks for trying this out. Yes, this is definitely something that
>> should be supported (and tested).
>>
>> On Mon, Oct 21, 2019 at 3:40 PM Igor Durovic <id.te...@gmail.com> wrote:
>> >
>> > Hi everyone,
>> >
>> > The interactive beam example using the DirectRunner fails after
>> execution of the last cell. The recursion limit is exceeded during the
>> calculation of the cache label because of a circular reference in the
>> PipelineInfo object.
>> >
>> > The constructor for the PipelineInfo class creates a mapping from each
>> pcollection to the transforms that produce and consume it. The issue arises
>> when there exists a transform that is both a producer and a consumer for
>> the same pcollection. This occurs when a transform's expand method returns
>> the same pcoll object that's passed into it. The specific transform causing
>> the failure of the example is MaybeReshuffle, which is used in the Create
>> transform. Replacing "return pcoll" with "return pcoll | Map(lambda x: x)"
>> seems to fix the problem.
>> >
>> > A workaround for this issue on the interactive beam side would be
>> fairly simple, but it seems to me that there should be more validation of
>> pipelines to prevent the use of transforms that return the same pcoll
>> that's passed in, or at least a mention of this in the transform style
>> guide. My understanding is that pcollections are produced by a single
>> transform (they even have a field called "producer" that references only
>> one transform). If that's the case then that property of pcollections
>> should be enforced.
>> >
>> > I made ticket BEAM-8451 to track this issue.
>> >
>> > I'm still new to beam so I apologize if I'm fundamentally
>> misunderstanding something. I'm not exactly sure what the next step should
>> be and would appreciate some recommendations. I can submit a PR to solve
>> the immediate problem of the failing example but the underlying problem
>> should also be addressed at some point. I also apologize if people are
>> already aware of this problem.
>> >
>> > Thank You!
>> > Igor Durovic
>>
>

Reply via email to