Yes, there are plans to support streaming for interactive beam. David Yan (cc'd) is leading this effort.
On Thu, Oct 24, 2019 at 1:50 PM Harsh Vardhan <anan...@google.com> wrote: > > Thanks, +1 to adding support for streaming on Interactive Beam (+David Yan) > > > On Thu, Oct 24, 2019 at 1:45 PM Hai Lu <lhai...@apache.org> wrote: >> >> Hi Robert, >> >> We're trying out iBeam at LinkedIn for Python. As Igor mentioned, there >> seems to be some inconsistency in the behavior of interactive beam. We can >> suggest some fixes from our end but we would need some support from the >> community. >> >> Also, is there a plan to support iBeam for streaming mode? We're interested >> in that use case as well. >> >> Thanks, >> Hai >> >> On Mon, Oct 21, 2019 at 4:45 PM Robert Bradshaw <rober...@google.com> wrote: >>> >>> Thanks for trying this out. Yes, this is definitely something that >>> should be supported (and tested). >>> >>> On Mon, Oct 21, 2019 at 3:40 PM Igor Durovic <id.te...@gmail.com> wrote: >>> > >>> > Hi everyone, >>> > >>> > The interactive beam example using the DirectRunner fails after execution >>> > of the last cell. The recursion limit is exceeded during the calculation >>> > of the cache label because of a circular reference in the PipelineInfo >>> > object. >>> > >>> > The constructor for the PipelineInfo class creates a mapping from each >>> > pcollection to the transforms that produce and consume it. The issue >>> > arises when there exists a transform that is both a producer and a >>> > consumer for the same pcollection. This occurs when a transform's expand >>> > method returns the same pcoll object that's passed into it. The specific >>> > transform causing the failure of the example is MaybeReshuffle, which is >>> > used in the Create transform. Replacing "return pcoll" with "return pcoll >>> > | Map(lambda x: x)" seems to fix the problem. >>> > >>> > A workaround for this issue on the interactive beam side would be fairly >>> > simple, but it seems to me that there should be more validation of >>> > pipelines to prevent the use of transforms that return the same pcoll >>> > that's passed in, or at least a mention of this in the transform style >>> > guide. My understanding is that pcollections are produced by a single >>> > transform (they even have a field called "producer" that references only >>> > one transform). If that's the case then that property of pcollections >>> > should be enforced. >>> > >>> > I made ticket BEAM-8451 to track this issue. >>> > >>> > I'm still new to beam so I apologize if I'm fundamentally >>> > misunderstanding something. I'm not exactly sure what the next step >>> > should be and would appreciate some recommendations. I can submit a PR to >>> > solve the immediate problem of the failing example but the underlying >>> > problem should also be addressed at some point. I also apologize if >>> > people are already aware of this problem. >>> > >>> > Thank You! >>> > Igor Durovic