Thanks for trying this out. Yes, this is definitely something that
should be supported (and tested).

On Mon, Oct 21, 2019 at 3:40 PM Igor Durovic <id.te...@gmail.com> wrote:
>
> Hi everyone,
>
> The Interactive Beam example using the DirectRunner fails when its last cell
> is executed. The recursion limit is exceeded while computing the cache label,
> because of a circular reference in the PipelineInfo object.
>
> The constructor of the PipelineInfo class builds a mapping from each
> PCollection to the transforms that produce and consume it. The problem arises
> when a single transform is both a producer and a consumer of the same
> PCollection, which happens when a transform's expand method returns the same
> pcoll object that was passed into it. The specific transform that makes the
> example fail is MaybeReshuffle, which is used in the Create transform.
> Replacing "return pcoll" with "return pcoll | Map(lambda x: x)" seems to fix
> the problem.
>
> A workaround for this issue on the Interactive Beam side would be fairly
> simple, but it seems to me that pipelines should be validated more strictly
> to prevent transforms from returning the same pcoll that was passed in, or at
> least the transform style guide should mention this. My understanding is that
> each PCollection is produced by a single transform (it even has a field called
> "producer" that references only one transform). If that's the case, that
> property of PCollections should be enforced.
>
> I made ticket BEAM-8451 to track this issue.
>
> I'm still new to Beam, so I apologize if I'm fundamentally misunderstanding
> something. I'm not exactly sure what the next step should be and would 
> appreciate some recommendations. I can submit a PR to solve the immediate 
> problem of the failing example but the underlying problem should also be 
> addressed at some point. I also apologize if people are already aware of this 
> problem.
>
> Thank You!
> Igor Durovic
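
A minimal sketch of the failure mode described above. It does not use the real PipelineInfo class or any Beam API; it assumes a much-simplified graph model in which each transform only records the ids of the PCollections it reads and writes. The names `Transform`, `build_producers`, `cache_label`, and `validate_no_passthrough` are hypothetical, chosen for illustration. When a transform lists the same PCollection as both an input and an output (the "return pcoll" case), a recursive label computation that walks back through producers never reaches a root and hits the recursion limit. A simple check that each transform's outputs are disjoint from its inputs would reject such a pipeline up front.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transform:
    # Hypothetical stand-in for an applied transform: a name plus the ids
    # of the PCollections it consumes (inputs) and produces (outputs).
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def build_producers(transforms: List[Transform]) -> Dict[str, Transform]:
    """Map each PCollection id to the (last) transform that produces it."""
    producers: Dict[str, Transform] = {}
    for t in transforms:
        for pc in t.outputs:
            producers[pc] = t
    return producers


def cache_label(pc: str, producers: Dict[str, Transform]) -> str:
    """Recursively derive a label for `pc` from its upstream lineage."""
    producer = producers.get(pc)
    if producer is None:
        return pc  # root PCollection with no known producer
    upstream = ",".join(cache_label(i, producers) for i in producer.inputs)
    return f"{producer.name}({upstream})"


def validate_no_passthrough(transforms: List[Transform]) -> None:
    """Reject any transform that returns one of its own inputs as an output."""
    for t in transforms:
        overlap = set(t.inputs) & set(t.outputs)
        if overlap:
            raise ValueError(
                f"{t.name} returns its input PCollection(s) {sorted(overlap)}"
            )


if __name__ == "__main__":
    # A pass-through transform that produces a new PCollection is fine.
    good = [Transform("Read", [], ["pc0"]), Transform("Reshuffle", ["pc0"], ["pc1"])]
    print(cache_label("pc1", build_producers(good)))

    # A transform that returns its own input creates a self-loop.
    bad = [Transform("Read", [], ["pc0"]), Transform("MaybeReshuffle", ["pc0"], ["pc0"])]
    try:
        cache_label("pc0", build_producers(bad))
    except RecursionError:
        print("RecursionError while computing the label")
    try:
        validate_no_passthrough(bad)
    except ValueError as e:
        print(e)
```

In this model, the "return pcoll | Map(lambda x: x)" workaround corresponds to the `good` case: the identity step gives the transform a fresh output PCollection, so the producer chain terminates.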
