Thanks for trying this out. Yes, this is definitely something that should be supported (and tested).

On Mon, Oct 21, 2019 at 3:40 PM Igor Durovic <id.te...@gmail.com> wrote:
>
> Hi everyone,
>
> The interactive beam example using the DirectRunner fails after execution of
> the last cell. The recursion limit is exceeded during the calculation of the
> cache label because of a circular reference in the PipelineInfo object.
>
> The constructor for the PipelineInfo class creates a mapping from each
> pcollection to the transforms that produce and consume it. The issue arises
> when there exists a transform that is both a producer and a consumer for the
> same pcollection. This occurs when a transform's expand method returns the
> same pcoll object that's passed into it. The specific transform causing the
> failure of the example is MaybeReshuffle, which is used in the Create
> transform. Replacing "return pcoll" with "return pcoll | Map(lambda x: x)"
> seems to fix the problem.
>
> A workaround for this issue on the interactive beam side would be fairly
> simple, but it seems to me that there should be more validation of pipelines
> to prevent the use of transforms that return the same pcoll that's passed in,
> or at least a mention of this in the transform style guide. My understanding
> is that pcollections are produced by a single transform (they even have a
> field called "producer" that references only one transform). If that's the
> case then that property of pcollections should be enforced.
>
> I made ticket BEAM-8451 to track this issue.
>
> I'm still new to beam so I apologize if I'm fundamentally misunderstanding
> something. I'm not exactly sure what the next step should be and would
> appreciate some recommendations. I can submit a PR to solve the immediate
> problem of the failing example but the underlying problem should also be
> addressed at some point. I also apologize if people are already aware of this
> problem.
>
> Thank You!
> Igor Durovic
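
For anyone following along, the failure mode described in the quoted message can be sketched without Beam at all. The snippet below is only a toy model under stated assumptions: PColl, Transform, build_producers, cache_label and find_self_loops are hypothetical names invented for illustration, not the real PipelineInfo code or any Beam API, and it assumes that the last transform listing a pcollection as an output is recorded as its producer. It shows how a transform that returns its own input ends up as both consumer and producer of the same pcollection, how a naive recursive label computation then never terminates, and the kind of validation check that could flag the problem.

```python
# Toy model only: these classes and functions are hypothetical and are
# NOT Beam's API or the actual PipelineInfo implementation.


class PColl:
    def __init__(self, name):
        self.name = name


class Transform:
    def __init__(self, name, inputs, outputs):
        self.name = name
        self.inputs = inputs
        self.outputs = outputs


def build_producers(transforms):
    """Map each pcollection name to the transform that lists it as an output.

    Assumption: a later transform overwrites an earlier one.
    """
    producers = {}
    for t in transforms:
        for pcoll in t.outputs:
            producers[pcoll.name] = t
    return producers


def cache_label(pcoll, producers):
    """Naive recursive label: producer name plus the labels of its inputs."""
    producer = producers[pcoll.name]
    upstream = [cache_label(p, producers) for p in producer.inputs]
    return producer.name + "(" + ", ".join(upstream) + ")"


def find_self_loops(transforms):
    """Return (transform, pcollection) pairs where an output is also an input."""
    return [
        (t.name, out.name)
        for t in transforms
        for out in t.outputs
        if any(out is inp for inp in t.inputs)
    ]


# A transform whose expand() does "return pcoll": input and output are
# the same object.
pcoll = PColl("pcoll")
source = Transform("Source", [], [pcoll])
passthrough = Transform("PassThrough", [pcoll], [pcoll])
bad = [source, passthrough]

# The suggested fix: an identity step that yields a new pcollection.
out = PColl("out")
identity = Transform("IdentityMap", [pcoll], [out])
good = [source, identity]

print(find_self_loops(bad))
print(find_self_loops(good))
print(cache_label(out, build_producers(good)))

try:
    cache_label(pcoll, build_producers(bad))
except RecursionError:
    print("recursion limit exceeded for the pass-through pipeline")
```

A check along the lines of find_self_loops is one possible shape for the pipeline validation suggested above. Whether it belongs in pipeline construction or only on the interactive side is a separate design question.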