[ https://issues.apache.org/jira/browse/BEAM-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047920#comment-17047920 ]
Rui Wang commented on BEAM-9322:
--------------------------------

[~rohdesam] Per some discussion in the PR (or somewhere else), I moved this Jira to 2.21.0. Please let me know if you don't agree.

> Python SDK ignores manually set PCollection tags
> ------------------------------------------------
>
>                 Key: BEAM-9322
>                 URL: https://issues.apache.org/jira/browse/BEAM-9322
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Sam Rohde
>            Assignee: Sam Rohde
>            Priority: Critical
>             Fix For: 2.21.0
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The Python SDK currently ignores any tags set manually on PCollections
> when, while applying a PTransform, it adds the PCollection to the PTransform
> [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]].
> In the
> [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]]
> method, the tag is set to None for all PValues, meaning the output tags are
> set to an enumeration index over the PCollection outputs. The tags are not
> propagated correctly, which is a problem for anyone relying on the output
> PCollection tags matching the user-set values.
> The fix is to correct BEAM-1833 and always pass in the tags. However, that
> doesn't fix the problem for nested PCollections. If you have a dict of lists
> of PCollections, what should their tags be? To fix this, first propagate the
> correct tags, then talk with the community about the best auto-generated tags.
> Some users may rely on the old implementation, so a flag,
> "force_generated_pcollection_output_ids", will be created and set to False by
> default. If True, it falls back to the old implementation and generates tags
> for PCollections.
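
For illustration, the difference between the old and the proposed tagging behavior described in the issue can be sketched with a toy model in plain Python. This is not Beam's code: `assign_output_tags`, the `(user_tag, pcollection_name)` pair representation, the string form of the generated index, and the fallback to an index when no user tag exists are all assumptions made for the example. Only the flag name `force_generated_pcollection_output_ids` and its default of False come from the issue text.

```python
from typing import Dict, List, Optional, Tuple


def assign_output_tags(
    outputs: List[Tuple[Optional[str], str]],
    force_generated_pcollection_output_ids: bool = False,
) -> Dict[str, str]:
    """Toy model of output tagging (hypothetical helper, not Beam's API).

    `outputs` is a list of (user_tag, pcollection_name) pairs.
    """
    tagged: Dict[str, str] = {}
    for index, (user_tag, pcoll_name) in enumerate(outputs):
        if force_generated_pcollection_output_ids or user_tag is None:
            # Old behavior: the user tag is ignored and the tag is the
            # enumeration index over the outputs (assumed string form).
            tag = str(index)
        else:
            # Proposed behavior: the manually set tag is propagated.
            tag = user_tag
        tagged[tag] = pcoll_name
    return tagged


outputs = [("evens", "pcoll_a"), ("odds", "pcoll_b")]
print(assign_output_tags(outputs))
print(assign_output_tags(outputs, force_generated_pcollection_output_ids=True))
```

With the flag left at False the user tags survive as keys. With the flag set to True the keys revert to generated indices, which is the behavior some existing users may depend on. The open question for nested structures (a dict of lists of PCollections) is not modeled here.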