[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=438031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438031 ]
ASF GitHub Bot logged work on BEAM-9322: ---------------------------------------- Author: ASF GitHub Bot Created on: 27/May/20 21:42 Start Date: 27/May/20 21:42 Worklog Time Spent: 10m Work Description: rohdesamuel opened a new pull request #11838: URL: https://github.com/apache/beam/pull/11838 Change-Id: I6a8eba4e323bf0fff318a56e44e512916c06266f https://github.com/apache/beam/pull/11765 removes the ability to set the output id on TestStreams with single outputs. This PR circumvents this by always adding a dummy output to the TestStream so that it will always output a dict, so that we can control the output ids. ------------------------ Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) ------------------------------------------------------------------------------------------------ Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | --- | [](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) XLang | --- | --- | --- | [](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | --- | [](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) Pre-Commit Tests Status (on master branch) ------------------------------------------------------------------------------------------------ --- |Java | Python | Go | Website --- | --- | --- | --- | --- Non-portable | [](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) Portable | --- | [](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | --- See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 438031) Time Spent: 4h 40m (was: 4.5h) > Python SDK ignores manually set PCollection tags > ------------------------------------------------ > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core > Reporter: Sam Rohde > Assignee: Sam Rohde > Priority: P1 > Time Spent: 4h 40m > Remaining Estimate: 0h > > The Python SDK currently ignores any tags set on PCollections manually when > applying PTransforms when adding the PCollection to the PTransform > [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]]. > In the > [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]] > method, the tag is set to None for all PValues, meaning the output tags are > set to an enumeration index over the PCollection outputs. The tags are not > propagated to correctly which can be a problem on relying on the output > PCollection tags to match the user set values. > The fix is to correct BEAM-1833, and always pass in the tags. However, that > doesn't fix the problem for nested PCollections. If you have a dict of lists > of PCollections, what should their tags be correctly set to? In order to fix > this, first propagate the correct tag then talk with the community about the > best auto-generated tags. > Some users may rely on the old implementation, so a flag will be created: > "force_generated_pcollection_output_ids" and be default set to False. If > True, this will go to the old implementation and generate tags for > PCollections. -- This message was sent by Atlassian Jira (v8.3.4#803005)