Thanks, +1 to adding support for streaming on Interactive Beam (+David Yan <david...@google.com>)

On Thu, Oct 24, 2019 at 1:45 PM Hai Lu <lhai...@apache.org> wrote:
> Hi Robert,
>
> We're trying out iBeam at LinkedIn for Python. As Igor mentioned, there
> seems to be some inconsistency in the behavior of interactive beam. We can
> suggest some fixes from our end but we would need some support from the
> community.
>
> Also, is there a plan to support iBeam for streaming mode? We're
> interested in that use case as well.
>
> Thanks,
> Hai
>
> On Mon, Oct 21, 2019 at 4:45 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> Thanks for trying this out. Yes, this is definitely something that
>> should be supported (and tested).
>>
>> On Mon, Oct 21, 2019 at 3:40 PM Igor Durovic <id.te...@gmail.com> wrote:
>> >
>> > Hi everyone,
>> >
>> > The interactive beam example using the DirectRunner fails after
>> > execution of the last cell. The recursion limit is exceeded during the
>> > calculation of the cache label because of a circular reference in the
>> > PipelineInfo object.
>> >
>> > The constructor for the PipelineInfo class creates a mapping from each
>> > pcollection to the transforms that produce and consume it. The issue
>> > arises when there exists a transform that is both a producer and a
>> > consumer for the same pcollection. This occurs when a transform's expand
>> > method returns the same pcoll object that's passed into it. The specific
>> > transform causing the failure of the example is MaybeReshuffle, which is
>> > used in the Create transform. Replacing "return pcoll" with
>> > "return pcoll | Map(lambda x: x)" seems to fix the problem.
>> >
>> > A workaround for this issue on the interactive beam side would be
>> > fairly simple, but it seems to me that there should be more validation
>> > of pipelines to prevent the use of transforms that return the same pcoll
>> > that's passed in, or at least a mention of this in the transform style
>> > guide. My understanding is that pcollections are produced by a single
>> > transform (they even have a field called "producer" that references only
>> > one transform). If that's the case then that property of pcollections
>> > should be enforced.
>> >
>> > I made ticket BEAM-8451 to track this issue.
>> >
>> > I'm still new to beam so I apologize if I'm fundamentally
>> > misunderstanding something. I'm not exactly sure what the next step
>> > should be and would appreciate some recommendations. I can submit a PR
>> > to solve the immediate problem of the failing example but the underlying
>> > problem should also be addressed at some point. I also apologize if
>> > people are already aware of this problem.
>> >
>> > Thank You!
>> > Igor Durovic
>> >
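To make the failure mode Igor describes concrete, here is a minimal pure-Python sketch. It does not use Beam or its actual PipelineInfo class: `Transform`, `build_producer_map`, `cache_label`, and `validate` are hypothetical names, and the sketch assumes the producer map keeps a single producer per PCollection, with later transforms overwriting earlier ones. Under that assumption, a composite that returns its own input (as MaybeReshuffle did) gets recorded as the producer of that input, so a recursive cache-label walk never reaches a root. The identity-Map workaround is modeled simply as the composite getting a fresh output id.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transform:
    """Hypothetical stand-in for an applied transform (not Beam's class)."""
    label: str
    inputs: List[str] = field(default_factory=list)   # consumed PCollection ids
    outputs: List[str] = field(default_factory=list)  # produced PCollection ids


def build_producer_map(transforms: List[Transform]) -> Dict[str, Transform]:
    """Map each PCollection id to one producer; later transforms overwrite earlier ones."""
    producer: Dict[str, Transform] = {}
    for t in transforms:
        for pc in t.outputs:
            producer[pc] = t
    return producer


def cache_label(pc: str, producer: Dict[str, Transform]) -> str:
    """Naive recursive label built from the producer chain.

    If a transform is recorded as the producer of its own input, the
    recursion never reaches a root and Python raises RecursionError.
    """
    t = producer.get(pc)
    if t is None:
        return pc
    upstream = ",".join(cache_label(i, producer) for i in t.inputs)
    return f"{t.label}({upstream})"


def validate(transforms: List[Transform]) -> List[str]:
    """Report transforms that return their own input and PCollections with several producers."""
    problems: List[str] = []
    first_producer: Dict[str, str] = {}
    for t in transforms:
        for pc in t.outputs:
            if pc in t.inputs:
                problems.append(f"{t.label} returns its input {pc}")
            elif pc in first_producer:
                problems.append(f"{pc} produced by both {first_producer[pc]} and {t.label}")
            first_producer.setdefault(pc, t.label)
    return problems


# The composite returns its input unchanged, like "return pcoll".
broken = [
    Transform("Impulse", [], ["pc0"]),
    Transform("FlatMap", ["pc0"], ["pc1"]),
    Transform("MaybeReshuffle", ["pc1"], ["pc1"]),
]
# Workaround modeled as a fresh output, like "return pcoll | Map(lambda x: x)".
fixed = [
    Transform("Impulse", [], ["pc0"]),
    Transform("FlatMap", ["pc0"], ["pc1"]),
    Transform("MaybeReshuffle", ["pc1"], ["pc2"]),
]

print(validate(broken))  # ['MaybeReshuffle returns its input pc1']
print(validate(fixed))   # []
print(cache_label("pc2", build_producer_map(fixed)))  # MaybeReshuffle(FlatMap(Impulse()))
```

Calling `cache_label("pc1", build_producer_map(broken))` raises RecursionError, which mirrors the reported symptom. The `validate` function is one possible shape for the enforcement Igor suggests (no transform outputs its own input, and each PCollection has exactly one producer); whether Beam itself should reject such pipelines at construction time is the open question in this thread.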