Yes, there are plans to support streaming in Interactive Beam. David
Yan (cc'd) is leading this effort.

On Thu, Oct 24, 2019 at 1:50 PM Harsh Vardhan <anan...@google.com> wrote:
>
> Thanks, +1 to adding support for streaming on Interactive Beam (+David Yan)
>
>
> On Thu, Oct 24, 2019 at 1:45 PM Hai Lu <lhai...@apache.org> wrote:
>>
>> Hi Robert,
>>
>> We're trying out iBeam at LinkedIn for Python. As Igor mentioned, there
>> seems to be some inconsistency in the behavior of Interactive Beam. We can
>> suggest some fixes from our end, but we would need some support from the
>> community.
>>
>> Also, is there a plan to support iBeam for streaming mode? We're interested 
>> in that use case as well.
>>
>> Thanks,
>> Hai
>>
>> On Mon, Oct 21, 2019 at 4:45 PM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>> Thanks for trying this out. Yes, this is definitely something that
>>> should be supported (and tested).
>>>
>>> On Mon, Oct 21, 2019 at 3:40 PM Igor Durovic <id.te...@gmail.com> wrote:
>>> >
>>> > Hi everyone,
>>> >
>>> > The Interactive Beam example using the DirectRunner fails after the last
>>> > cell is executed. The recursion limit is exceeded while the cache label
>>> > is being calculated, because of a circular reference in the PipelineInfo
>>> > object.
>>> >
>>> > The constructor for the PipelineInfo class creates a mapping from each
>>> > PCollection to the transforms that produce and consume it. The issue
>>> > arises when a transform is both a producer and a consumer of the same
>>> > PCollection. This occurs when a transform's expand method returns the
>>> > same pcoll object that was passed into it. The specific transform
>>> > causing the example to fail is MaybeReshuffle, which is used in the
>>> > Create transform. Replacing "return pcoll" with "return pcoll
>>> > | Map(lambda x: x)" seems to fix the problem.
>>> >
>>> > A workaround for this issue on the Interactive Beam side would be fairly
>>> > simple, but it seems to me that pipelines should be validated more
>>> > strictly to prevent the use of transforms that return the same pcoll
>>> > that was passed in, or that the transform style guide should at least
>>> > mention this. My understanding is that each PCollection is produced by a
>>> > single transform (it even has a field called "producer" that references
>>> > only one transform). If that's the case, then that property of
>>> > PCollections should be enforced.
>>> >
>>> > I filed ticket BEAM-8451 to track this issue.
>>> >
>>> > I'm still new to Beam, so I apologize if I'm fundamentally
>>> > misunderstanding something. I'm not exactly sure what the next step
>>> > should be and would appreciate some recommendations. I can submit a PR to
>>> > solve the immediate problem of the failing example, but the underlying
>>> > problem should also be addressed at some point. I also apologize if
>>> > people are already aware of this problem.
>>> >
>>> > Thank you!
>>> > Igor Durovic
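
For readers following the thread, the producer/consumer cycle Igor describes can be sketched in plain Python. This is only a toy model under stated assumptions: it does not use Beam, it is not the actual PipelineInfo code, and every name in it (ToyTransform, build_maps, cache_label, passthrough_transforms) is a hypothetical stand-in. It shows how a transform that returns its own input becomes that PCollection's producer, how a label computed by walking back through producers then never terminates (here it raises instead of hitting the recursion limit), and what a simple validation check for such transforms might look like.

```python
# Toy model only: hypothetical stand-ins, not Beam's real classes or API.
from collections import defaultdict


class ToyTransform:
    def __init__(self, name, inputs, outputs):
        self.name = name
        self.inputs = inputs    # names of the pcollections it consumes
        self.outputs = outputs  # names of the pcollections it produces


def build_maps(transforms):
    """Map each pcollection name to its producer and to its consumers."""
    producer, consumers = {}, defaultdict(list)
    for t in transforms:
        for pc in t.outputs:
            producer[pc] = t  # a later transform overwrites an earlier one
        for pc in t.inputs:
            consumers[pc].append(t)
    return producer, consumers


def cache_label(pc, producer, _path=frozenset()):
    """Build a label by walking back through producers; raise on a cycle."""
    if pc in _path:
        raise ValueError("cycle: %r is an input of its own producer" % pc)
    t = producer.get(pc)
    if t is None:
        return pc
    path = _path | {pc}
    args = ",".join(cache_label(i, producer, path) for i in t.inputs)
    return "%s(%s)" % (t.name, args)


def passthrough_transforms(transforms):
    """Validation sketch: transforms whose output is one of their inputs."""
    return [t for t in transforms if set(t.inputs) & set(t.outputs)]


# A well-formed chain: every pcollection has a distinct producer.
good = [ToyTransform("Create", [], ["a"]), ToyTransform("Map", ["a"], ["b"])]

# The problematic shape: the second transform returns the pcoll it was given.
bad = [ToyTransform("Create", [], ["a"]),
       ToyTransform("MaybeReshuffle", ["a"], ["a"])]
```

In this model, cache_label("b", build_maps(good)[0]) terminates, while cache_label("a", build_maps(bad)[0]) raises because "a" is both the input and the output of the same transform. passthrough_transforms(bad) flags that transform before any label is computed, which is roughly the kind of check the message above asks for.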
