[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API

ASF GitHub Bot (Jira) Tue, 18 Feb 2020 10:22:24 -0800


     [ 
https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=388971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-388971
 ]


ASF GitHub Bot logged work on BEAM-1833:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Feb/20 18:21
            Start Date: 18/Feb/20 18:21
    Worklog Time Spent: 10m 
      Work Description: rohdesamuel commented on pull request #10860: 
[BEAM-1833] Fixes BEAM-1833
URL: https://github.com/apache/beam/pull/10860#discussion_r380851306
 
 

 ##########
 File path: CHANGES.md
 ##########
 @@ -36,6 +36,7 @@
 
 * ReadFromPubSub(topic=<topic>) in Python previously created a subscription 
under the same project as the topic. Now it will create the subscription under 
the project specified in pipeline_options. If the project is not specified in 
pipeline_options, then it will create the subscription under the same project 
as the topic. ([BEAM-3453](https://issues.apache.org/jira/browse/BEAM-3453)).
 * SpannerAccessor in Java is now package-private to reduce API surface. 
`SpannerConfig.connectToSpanner` has been moved to `SpannerAccessor.create`. 
([BEAM-9310](https://issues.apache.org/jira/browse/BEAM-9310)).
+* PCollections will now have their tags correctly propagated through the 
Pipeline. Users may expect the old implementation which gave PCollection output 
ids a monotonically increasing id. To go back to the old implementation, use 
the "force_generated_pcollection_output_ids" experiment. The default is the new 
implementation (force_generated_pcollection_output_ids=False).
 
 Review comment:
   I looked further into your question about determinism in the generated 
ids. Because the id generation traverses dicts, the order is non-deterministic. 
DoOutputsTuples manually add themselves to their producer correctly. For 
Tuples, I'm keeping the old implementation. For PValues, I fixed the bug so 
that the tag is now propagated correctly. 
   
   Unless I'm mistaken, a user who wants the old implementation is doing one 
of the following:
   
   - Relying on a bug (PValue)
   - Relying on non-deterministic behavior for generated tags (dicts)
   - Or using Tuples, which I didn't change.
   
   Is it okay to leave the default as-is?
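   
   To make the difference concrete, here is a toy sketch in plain Python. It is 
not Beam's actual code: `generated_output_ids` and `propagated_output_ids` are 
hypothetical helpers, and the dicts stand in for a transform's tagged outputs. 
It only illustrates why counter-based ids depend on traversal order while 
propagated tags do not.

```python
from itertools import count


def generated_output_ids(outputs):
    """Hypothetical old-style naming: ignore tags, number outputs in traversal order."""
    counter = count()
    return {str(next(counter)): value for value in outputs.values()}


def propagated_output_ids(outputs):
    """Hypothetical new-style naming: keep the tag each output was registered under."""
    return {str(tag): value for tag, value in outputs.items()}


# The same two tagged outputs, visited in two different orders.
outputs = {"even": "pcoll_even", "odd": "pcoll_odd"}
reordered = {"odd": "pcoll_odd", "even": "pcoll_even"}

# Counter-based ids change with traversal order ...
print(generated_output_ids(outputs) == generated_output_ids(reordered))

# ... while propagated tags give the same mapping either way.
print(propagated_output_ids(outputs) == propagated_output_ids(reordered))
```

   In this sketch, any code that looked up an output by its generated id 
would see a different PCollection depending on the order of traversal, which 
is the non-determinism described above.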
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 388971)
    Time Spent: 40m  (was: 0.5h)

> Restructure Python pipeline construction to better follow the Runner API
> ------------------------------------------------------------------------
>
>                 Key: BEAM-1833
>                 URL: https://issues.apache.org/jira/browse/BEAM-1833
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Robert Bradshaw
>            Assignee: Sam Rohde
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The most important part is removing the runner.apply overrides, but there are 
> also various other improvements (e.g. all inputs and outputs should be named).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
