[ 
https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=414398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414398
 ]

ASF GitHub Bot logged work on BEAM-9322:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Apr/20 01:01
            Start Date: 02/Apr/20 01:01
    Worklog Time Spent: 10m 
      Work Description: robertwb commented on pull request #11283: [BEAM-9322] [BEAM-1833] Better naming for composite transform output tags.
URL: https://github.com/apache/beam/pull/11283#discussion_r401995268
 
 

 ##########
 File path: sdks/python/apache_beam/pipeline.py
 ##########
 @@ -671,7 +671,11 @@ def apply(
         # If the user wants the old implementation of always generated
         # PCollection output ids, then set the tag to None first, then count up
         # from 1.
-        tag = len(current.outputs) if None in current.outputs else None
+        base = tag
+        counter = 0
+        while tag in current.outputs:
+          counter += 1
+          tag = '%s_%d' % (base, counter)
         current.add_output(result, tag)
 
 Review comment:
   I am relatively confident in this change, as it preserves the essential 
characteristic (that output names are unique) and defaults to the same thing 
for all single-output transforms. However, I have added the opt-out you had 
originally with a note just in case. 
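
For readers following the diff outside of pipeline.py, here is a minimal standalone sketch of the de-duplication loop. It assumes the outputs are held in a plain dict keyed by tag; `unique_tag` is a hypothetical helper name used only for illustration and is not part of the Beam API.

```python
def unique_tag(tag, existing):
    """Return `tag` unchanged if it is unused, else `tag` plus a numeric suffix.

    `existing` is any container supporting `in` (a dict of outputs here).
    This mirrors the loop in the diff; it is an illustration, not Beam code.
    """
    base = tag
    counter = 0
    while tag in existing:
        counter += 1
        tag = '%s_%d' % (base, counter)
    return tag


# Simulate adding four outputs, two untagged (None) and two tagged 'main'.
outputs = {}
for requested in [None, None, 'main', 'main']:
    outputs[unique_tag(requested, outputs)] = object()

print(list(outputs))  # [None, 'None_1', 'main', 'main_1']
```

A transform with a single output keeps its requested tag (including None), which matches the claim that single-output transforms default to the same thing as before. Only colliding tags receive a suffix.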
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 414398)
    Time Spent: 3h  (was: 2h 50m)

> Python SDK ignores manually set PCollection tags
> ------------------------------------------------
>
>                 Key: BEAM-9322
>                 URL: https://issues.apache.org/jira/browse/BEAM-9322
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Sam Rohde
>            Assignee: Sam Rohde
>            Priority: Critical
>             Fix For: 2.21.0
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> When applying PTransforms, the Python SDK currently ignores any tags set 
> manually on PCollections when adding the PCollection to the PTransform 
> [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]].
>  In the 
> [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]]
>  method, the tag is set to None for all PValues, meaning the output tags are 
> set to an enumeration index over the PCollection outputs. The tags are not 
> propagated correctly, which is a problem when relying on the output 
> PCollection tags to match the user-set values.
> The fix is to correct BEAM-1833, and always pass in the tags. However, that 
> doesn't fix the problem for nested PCollections. If you have a dict of lists 
> of PCollections, what should their tags be? To fix this, first propagate 
> the correct tag, then discuss the best auto-generated tags with the 
> community.
> Some users may rely on the old implementation, so a flag, 
> "force_generated_pcollection_output_ids", will be created and set to False 
> by default. If True, it restores the old implementation and generates tags 
> for PCollections.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
