[ https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=388971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-388971 ]
ASF GitHub Bot logged work on BEAM-1833:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Feb/20 18:21
            Start Date: 18/Feb/20 18:21
    Worklog Time Spent: 10m
      Work Description: rohdesamuel commented on pull request #10860: [BEAM-1833] Fixes BEAM-1833
URL: https://github.com/apache/beam/pull/10860#discussion_r380851306

 ##########
 File path: CHANGES.md
 ##########
 @@ -36,6 +36,7 @@
  * ReadFromPubSub(topic=<topic>) in Python previously created a subscription under the same project as the topic. Now it will create the subscription under the project specified in pipeline_options. If the project is not specified in pipeline_options, then it will create the subscription under the same project as the topic. ([BEAM-3453](https://issues.apache.org/jira/browse/BEAM-3453)).
  * SpannerAccessor in Java is now package-private to reduce API surface. `SpannerConfig.connectToSpanner` has been moved to `SpannerAccessor.create`. ([BEAM-9310](https://issues.apache.org/jira/browse/BEAM-9310)).
 +* PCollections will now have their tags correctly propagated through the Pipeline. Users may expect the old implementation which gave PCollection output ids a monotonically increasing id. To go back to the old implementation, use the "force_generated_pcollection_output_ids" experiment. The default is the new implementation (force_generated_pcollection_output_ids=False).

 Review comment:
 I looked into your question a bit more about determinism in the generated ids, and I found that since it traverses dicts, the order is non-deterministic. DoOutputsTuple manually add themselves correctly to their producer. For Tuples, I'm keeping the old implementation. For PValues, I fixed the bug to now correctly propagate the tag.

 Unless I'm mistaken, for the user to want to use the old implementation they are then either:
 - Relying on a bug (PValue)
 - Relying on non-deterministic behavior for generated tags (dicts)
 - Or using Tuples, which I didn't change.
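 To illustrate the distinction drawn in the list above, here is a minimal toy sketch. It is not Beam's implementation: `generated_output_ids` and `tagged_output_ids` are hypothetical helpers, and the strings stand in for PCollections. It only shows why ids generated from traversal order change when the visit order changes, while ids taken from the tags do not.

```python
# Toy model only: these helpers are hypothetical and are not part of Apache Beam.

def generated_output_ids(outputs, traversal_order):
    """Old-style scheme (sketch): ids are counters assigned in visit order."""
    return {str(i): outputs[tag] for i, tag in enumerate(traversal_order)}


def tagged_output_ids(outputs):
    """New-style scheme (sketch): each output keeps its own tag as its id."""
    return {str(tag): pcoll for tag, pcoll in outputs.items()}


# Placeholder strings stand in for real PCollections.
outputs = {"even": "pcoll_even", "odd": "pcoll_odd"}

# The same outputs visited in two different orders.
first = generated_output_ids(outputs, ["even", "odd"])
second = generated_output_ids(outputs, ["odd", "even"])

print(first)                       # generated ids follow the visit order
print(second)                      # same outputs, different id assignment
print(tagged_output_ids(outputs))  # ids follow the tags, not the visit order
```

 In this sketch, code that looked up output "0" would get a different PCollection depending on the traversal, which is the kind of reliance on ordering the comment describes.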
 Is this okay to leave the default as-is?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 388971)
    Time Spent: 40m  (was: 0.5h)

> Restructure Python pipeline construction to better follow the Runner API
> ------------------------------------------------------------------------
>
>                 Key: BEAM-1833
>                 URL: https://issues.apache.org/jira/browse/BEAM-1833
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Robert Bradshaw
>            Assignee: Sam Rohde
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The most important part is removing the runner.apply overrides, but there are
> also various other improvements (e.g. all inputs and outputs should be named).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)