[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=455029&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455029 ]

ASF GitHub Bot logged work on BEAM-9322:
----------------------------------------

    Author: ASF GitHub Bot
    Created on: 06/Jul/20 18:58
    Start Date: 06/Jul/20 18:58
    Worklog Time Spent: 10m
    Work Description: pabloem merged pull request #11765:
    URL: https://github.com/apache/beam/pull/11765

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 455029)
    Time Spent: 10h 40m (was: 10.5h)

> Python SDK ignores manually set PCollection tags
> ------------------------------------------------
>
> Key: BEAM-9322
> URL: https://issues.apache.org/jira/browse/BEAM-9322
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Sam Rohde
> Priority: P1
> Time Spent: 10h 40m
> Remaining Estimate: 0h
>
> The Python SDK currently ignores any tags set manually on PCollections
> when applying PTransforms: the tags are dropped when the PCollection is
> added to the PTransform
> [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]].
> In the
> [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]]
> method, the tag is set to None for all PValues, so the output tags become
> an enumeration index over the PCollection outputs. The tags are not
> propagated correctly, which is a problem for anyone relying on the output
> PCollection tags to match the user-set values.
>
> The fix is to correct BEAM-1833, and always pass in the tags. However, that
> doesn't fix the problem for nested PCollections. If you have a dict of lists
> of PCollections, what should their tags be set to? To fix this, first
> propagate the correct tag, then talk with the community about the best
> auto-generated tags.
>
> Some users may rely on the old implementation, so a flag will be created:
> "force_generated_pcollection_output_ids", set to False by default. If
> True, this will fall back to the old implementation and generate tags for
> PCollections.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
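The behaviour described in the issue above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `FakePCollection` and `name_outputs` are hypothetical names invented here and are not part of Beam, and falling back to the index for untagged outputs is an assumption as well. The only name taken from the issue is the `force_generated_pcollection_output_ids` flag, whose real plumbing in the SDK may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakePCollection:
    """Hypothetical stand-in for a PCollection; only the tag matters here."""
    tag: Optional[str] = None


def name_outputs(
    outputs: List[FakePCollection],
    force_generated_pcollection_output_ids: bool = False,
) -> Dict[str, FakePCollection]:
    """Map each output to the key it would be registered under.

    With the flag set to True, every tag is ignored and the key is the
    enumeration index (the old behaviour described in the issue). With the
    flag left at False, a manually set tag is used, and the index is only
    a fallback for untagged outputs (an assumption of this sketch).
    """
    named = {}
    for index, pcoll in enumerate(outputs):
        if force_generated_pcollection_output_ids or pcoll.tag is None:
            key = str(index)
        else:
            key = pcoll.tag
        named[key] = pcoll
    return named


outputs = [
    FakePCollection(tag="evens"),
    FakePCollection(tag="odds"),
    FakePCollection(),
]

# Old behaviour: user tags are lost and replaced by indices.
old_keys = list(name_outputs(outputs, force_generated_pcollection_output_ids=True))
# Behaviour after the fix: user tags survive.
new_keys = list(name_outputs(outputs))
print(old_keys)
print(new_keys)
```

In this model the flag simply switches between the two naming schemes, which is why code that looks outputs up by index would keep working only with the flag set to True.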
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=455027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455027 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 06/Jul/20 18:57
    Start Date: 06/Jul/20 18:57
    Worklog Time Spent: 10m
    Work Description: pabloem commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-654409876

    thanks. Sam has confirmed the import works.

Issue Time Tracking

    Worklog Id: (was: 455027)
    Time Spent: 10h 20m (was: 10h 10m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=455028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455028 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 06/Jul/20 18:57
    Start Date: 06/Jul/20 18:57
    Worklog Time Spent: 10m
    Work Description: rohdesamuel commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-654409890

    > just to confirm: have you verified that this imports (and tests) well into google repository?
    > @rohdesamuel

    Yes, the last test successfully imported (6/24).

Issue Time Tracking

    Worklog Id: (was: 455028)
    Time Spent: 10.5h (was: 10h 20m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=453220&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453220 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 01/Jul/20 04:21
    Start Date: 01/Jul/20 04:21
    Worklog Time Spent: 10m
    Work Description: pabloem commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-652180770

    just to confirm: have you verified that this imports (and tests) well into google repository?
    @rohdesamuel

Issue Time Tracking

    Worklog Id: (was: 453220)
    Time Spent: 10h 10m (was: 10h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=453139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453139 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 30/Jun/20 22:06
    Start Date: 30/Jun/20 22:06
    Worklog Time Spent: 10m
    Work Description: rohdesamuel commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-652071096

    Run Python PreCommit

Issue Time Tracking

    Worklog Id: (was: 453139)
    Time Spent: 10h (was: 9h 50m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=453138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453138 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 30/Jun/20 22:04
    Start Date: 30/Jun/20 22:04
    Worklog Time Spent: 10m
    Work Description: rohdesamuel commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-652070641

    Run Python PreCommit

Issue Time Tracking

    Worklog Id: (was: 453138)
    Time Spent: 9h 50m (was: 9h 40m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=453099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453099 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 30/Jun/20 20:07
    Start Date: 30/Jun/20 20:07
    Worklog Time Spent: 10m
    Work Description: rohdesamuel commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-652015699

    Run Python PreCommit

Issue Time Tracking

    Worklog Id: (was: 453099)
    Time Spent: 9h 40m (was: 9.5h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=452666&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452666 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 29/Jun/20 22:11
    Start Date: 29/Jun/20 22:11
    Worklog Time Spent: 10m
    Work Description: pabloem commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-651396398

    can you rebase this?

Issue Time Tracking

    Worklog Id: (was: 452666)
    Time Spent: 9.5h (was: 9h 20m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=451688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451688 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 26/Jun/20 18:41
    Start Date: 26/Jun/20 18:41
    Worklog Time Spent: 10m
    Work Description: pabloem commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-650335332

Issue Time Tracking

    Worklog Id: (was: 451688)
    Time Spent: 9h 20m (was: 9h 10m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=451284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451284 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 25/Jun/20 20:28
    Start Date: 25/Jun/20 20:28
    Worklog Time Spent: 10m
    Work Description: pabloem commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-649800364

    hm I don't know why this triggered all tests...

Issue Time Tracking

    Worklog Id: (was: 451284)
    Time Spent: 9h 10m (was: 9h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=451283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451283 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 25/Jun/20 20:27
    Start Date: 25/Jun/20 20:27
    Worklog Time Spent: 10m
    Work Description: pabloem commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-649800068

    Run Python 3.7 PostCommit

Issue Time Tracking

    Worklog Id: (was: 451283)
    Time Spent: 9h (was: 8h 50m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=451224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451224 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 25/Jun/20 18:04
    Start Date: 25/Jun/20 18:04
    Worklog Time Spent: 10m
    Work Description: rohdesamuel commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-649734817

    R: @pabloem

Issue Time Tracking

    Worklog Id: (was: 451224)
    Time Spent: 8h 50m (was: 8h 40m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=447546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447546 ]

ASF GitHub Bot logged work on BEAM-9322:

    Author: ASF GitHub Bot
    Created on: 17/Jun/20 21:58
    Start Date: 17/Jun/20 21:58
    Worklog Time Spent: 10m
    Work Description: rohdesamuel commented on pull request #11765:
    URL: https://github.com/apache/beam/pull/11765#issuecomment-645647208

    Had to force-push to rebase to master.

Issue Time Tracking

    Worklog Id: (was: 447546)
    Time Spent: 8h 40m (was: 8.5h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=447545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447545 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 17/Jun/20 21:57
Start Date: 17/Jun/20 21:57
Worklog Time Spent: 10m
Work Description: rohdesamuel commented on a change in pull request #11765:
URL: https://github.com/apache/beam/pull/11765#discussion_r441856904

## File path: sdks/python/apache_beam/transforms/ptransform.py ##
@@ -270,11 +256,19 @@ def get_named_nested_pvalues(pvalueish):
     tagged_values = pvalueish.items()
   else:
     if isinstance(pvalueish, (pvalue.PValue, pvalue.DoOutputsTuple)):
-      yield None, pvalueish
+      # For transforms that only have a tagged PCollection as an output,
+      # propagate that tag forward.
+      if first_iteration and isinstance(pvalueish, pvalue.PValue):
+        yield pvalueish.tag, pvalueish

Review comment:
Yep, removed.

Issue Time Tracking
---
Worklog Id: (was: 447545)
Time Spent: 8.5h (was: 8h 20m)
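The issue asks what tags a dict of lists of PCollections should receive. One possible scheme, shown here purely as a sketch, is to join the container keys and list indices with dots. `Leaf` and `named_nested` are hypothetical names introduced for illustration; this is not the `get_named_nested_pvalues` implementation under review, and the issue leaves the final naming to community discussion.

```python
class Leaf:
    """Hypothetical stand-in for a PValue; not a Beam class."""


def named_nested(value):
    # Yield (tag, leaf) pairs, building dotted tags from container keys.
    if isinstance(value, dict):
        tagged = value.items()
    elif isinstance(value, (list, tuple)):
        tagged = enumerate(value)
    else:
        if isinstance(value, Leaf):
            # A bare leaf has no container-derived tag.
            yield None, value
        return
    for tag, sub in tagged:
        for subtag, leaf in named_nested(sub):
            if subtag is None:
                yield str(tag), leaf
            else:
                yield "%s.%s" % (tag, subtag), leaf


x, y, z = Leaf(), Leaf(), Leaf()
print([tag for tag, _ in named_nested({"a": [x, y], "b": z})])
# ['a.0', 'a.1', 'b']
```

Under this assumed scheme every leaf gets a tag that is unique within its container, at the cost of exposing the nesting structure in the tag names.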
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=446821&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446821 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 16/Jun/20 18:44
Start Date: 16/Jun/20 18:44
Worklog Time Spent: 10m
Work Description: pabloem merged pull request #11838:
URL: https://github.com/apache/beam/pull/11838

Issue Time Tracking
---
Worklog Id: (was: 446821)
Time Spent: 8h 10m (was: 8h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=446822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446822 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 16/Jun/20 18:44
Start Date: 16/Jun/20 18:44
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-644946194

we only had to believe in ourselves : D

Issue Time Tracking
---
Worklog Id: (was: 446822)
Time Spent: 8h 20m (was: 8h 10m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=446794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446794 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 16/Jun/20 17:54
Start Date: 16/Jun/20 17:54
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-644917017

Run Python2_PVR_Flink PreCommit

Issue Time Tracking
---
Worklog Id: (was: 446794)
Time Spent: 8h (was: 7h 50m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=446607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446607 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 16/Jun/20 16:07
Start Date: 16/Jun/20 16:07
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-644860181

Run Python2_PVR_Flink PreCommit

Issue Time Tracking
---
Worklog Id: (was: 446607)
Time Spent: 7h 50m (was: 7h 40m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=446181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446181 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 16/Jun/20 00:12
Start Date: 16/Jun/20 00:12
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-644455059

Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 446181)
Time Spent: 7h 40m (was: 7.5h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=445151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-445151 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 12/Jun/20 20:32
Start Date: 12/Jun/20 20:32
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-643471534

Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 445151)
Time Spent: 7.5h (was: 7h 20m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=444718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-444718 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 12/Jun/20 04:27
Start Date: 12/Jun/20 04:27
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-643057803

Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 444718)
Time Spent: 7h 20m (was: 7h 10m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=444628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-444628 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 11/Jun/20 23:49
Start Date: 11/Jun/20 23:49
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-642985766

Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 444628)
Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=444627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-444627 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 11/Jun/20 23:49
Start Date: 11/Jun/20 23:49
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-642985718

Run Python2_PVR_Flink PreCommit

Issue Time Tracking
---
Worklog Id: (was: 444627)
Time Spent: 7h (was: 6h 50m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=443879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443879 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 10/Jun/20 17:46
Start Date: 10/Jun/20 17:46
Worklog Time Spent: 10m
Work Description: robertwb commented on a change in pull request #11765:
URL: https://github.com/apache/beam/pull/11765#discussion_r438299971

## File path: sdks/python/apache_beam/transforms/ptransform.py ##
@@ -270,11 +256,19 @@ def get_named_nested_pvalues(pvalueish):
     tagged_values = pvalueish.items()
   else:
     if isinstance(pvalueish, (pvalue.PValue, pvalue.DoOutputsTuple)):
-      yield None, pvalueish
+      # For transforms that only have a tagged PCollection as an output,
+      # propagate that tag forward.
+      if first_iteration and isinstance(pvalueish, pvalue.PValue):
+        yield pvalueish.tag, pvalueish

Review comment:
We can remove this now that the TestStream change is going in, right?

Issue Time Tracking
---
Worklog Id: (was: 443879)
Time Spent: 6h 50m (was: 6h 40m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=443874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443874 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 10/Jun/20 17:36
Start Date: 10/Jun/20 17:36
Worklog Time Spent: 10m
Work Description: robertwb commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-642155990

Issue Time Tracking
---
Worklog Id: (was: 443874)
Time Spent: 6h 40m (was: 6.5h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=443482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443482 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 10/Jun/20 00:04 Start Date: 10/Jun/20 00:04 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r437787906 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -291,10 +291,10 @@ def expand(self, pbegin): assert isinstance(pbegin, pvalue.PBegin) self.pipeline = pbegin.pipeline if not self.output_tags: - self.output_tags = set([None]) + self.output_tags = {None} Review comment: OK, in that case I'm fine with this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 443482) Time Spent: 6.5h (was: 6h 20m) > Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: P1 > Labels: stale-assigned > Time Spent: 6.5h > Remaining Estimate: 0h > > The Python SDK currently ignores any tags set on PCollections manually when > applying PTransforms when adding the PCollection to the PTransform > [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]]. > In the > [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]] > method, the tag is set to None for all PValues, meaning the output tags are > set to an enumeration index over the PCollection outputs. 
The tags are not > propagated correctly, which can be a problem when relying on the output > PCollection tags to match the user-set values. > The fix is to correct BEAM-1833 and always pass in the tags. However, that > doesn't fix the problem for nested PCollections. If you have a dict of lists > of PCollections, what should their tags be set to? In order to fix > this, first propagate the correct tag, then talk with the community about the > best auto-generated tags. > Some users may rely on the old implementation, so a flag will be created: > "force_generated_pcollection_output_ids", set to False by default. If > True, this will revert to the old implementation and generate tags for > PCollections. -- This message was sent by Atlassian Jira (v8.3.4#803005)
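The behaviour described in the issue can be illustrated with a small, self-contained sketch. It does not use Beam: `FakePCollection`, `generated_output_tags` and `propagated_output_tags` are hypothetical stand-ins, and falling back to the enumeration index for untagged outputs is an assumption made for illustration only. The keyword argument borrows the flag name from the description.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional


@dataclass
class FakePCollection:
    """Hypothetical stand-in for a PCollection; only the tag matters here."""
    name: str
    tag: Optional[str] = None


def generated_output_tags(
    outputs: List[FakePCollection]) -> Dict[Hashable, FakePCollection]:
    # Behaviour described in the issue: any user-set tag is ignored and each
    # output is keyed by its position in the output list.
    return dict(enumerate(outputs))


def propagated_output_tags(
    outputs: List[FakePCollection],
    force_generated_pcollection_output_ids: bool = False,
) -> Dict[Hashable, FakePCollection]:
    # Direction proposed in the issue: keep a user-set tag when there is one.
    # Using the index for untagged outputs is an assumption of this sketch.
    if force_generated_pcollection_output_ids:
        return generated_output_tags(outputs)
    return {
        (pcoll.tag if pcoll.tag is not None else index): pcoll
        for index, pcoll in enumerate(outputs)
    }


outputs = [FakePCollection("a", tag="evens"), FakePCollection("b")]
print(list(generated_output_tags(outputs)))
print(list(propagated_output_tags(outputs)))
```

With the flag left at its default, the user-set tag `evens` survives; with the flag set to True, the sketch falls back to the index-only keys.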
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=443477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443477 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 09/Jun/20 23:54 Start Date: 09/Jun/20 23:54 Worklog Time Spent: 10m Work Description: rohdesamuel commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r437784983 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -291,10 +291,10 @@ def expand(self, pbegin): assert isinstance(pbegin, pvalue.PBegin) self.pipeline = pbegin.pipeline if not self.output_tags: - self.output_tags = set([None]) + self.output_tags = {None} Review comment: This is a little harder to implement, mainly because the TestStream retrieves its output_tags from the keys of the PTransform payload holding it. This means that output_tags = None and output_tags = {None} look the same to the PTransform payload outputs as a map with a single key being None. When a TestStream is reconstructed, even if the original output_tags was unset, it will be constructed with output_tags = {None}. I think the best we can do is to treat {None} and None the same way.
Issue Time Tracking --- Worklog Id: (was: 443477) Time Spent: 6h 20m (was: 6h 10m)
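The round-trip problem raised in the review comment above can be sketched without Beam. Everything here is hypothetical: `to_payload_outputs` and `from_payload_outputs` only mimic the idea of a payload whose output map is keyed by tag, and the `pcoll_N` ids are made up. The point is that an unset `output_tags` and an explicit `{None}` become indistinguishable once only the keys of the map are read back.

```python
def to_payload_outputs(output_tags):
    """Build a toy tag -> PCollection-id map, as a payload might hold it."""
    # An unset (None or empty) tag set is stored as a single None key.
    tags = output_tags if output_tags else {None}
    # key=str gives a stable order even when None is mixed with strings.
    return {tag: f"pcoll_{i}" for i, tag in enumerate(sorted(tags, key=str))}


def from_payload_outputs(payload_outputs):
    """Recover the tag set from the keys of the payload map."""
    return set(payload_outputs.keys())


def is_single_untagged_output(output_tags):
    """Treat None, an empty set and {None} as the same single-output case."""
    return not output_tags or set(output_tags) == {None}


unset = from_payload_outputs(to_payload_outputs(None))
explicit = from_payload_outputs(to_payload_outputs({None}))
print(unset == explicit)
```

After the round trip both cases come back as `{None}`, which is why the comment proposes treating the two values alike.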
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=443384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443384 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 09/Jun/20 20:37 Start Date: 09/Jun/20 20:37 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r437702523 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -291,10 +291,10 @@ def expand(self, pbegin): assert isinstance(pbegin, pvalue.PBegin) self.pipeline = pbegin.pipeline if not self.output_tags: - self.output_tags = set([None]) + self.output_tags = {None} Review comment: If the user explicitly sets the output tags to {None}, they might be expecting a dict. (Specifically, they might get a set from elsewhere and set the output tags from that set, and it would be awkward to have to check that set to determine how to interpret the result.) So in this case I would do
```
if not self.output_tags:
  return pvalue.PCollection(self.pipeline, is_bounded=False)
else:
  return { ... for tag in self.output_tags}
```
Issue Time Tracking --- Worklog Id: (was: 443384) Time Spent: 6h (was: 5h 50m)
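The branching suggested in the review comment above can be tried out in isolation. This is only a sketch under stated assumptions: `FakePCollection` and `expand_outputs` are hypothetical stand-ins, not Beam classes, and the dict comprehension fills in the elided `...` with one placeholder object per tag purely for illustration.

```python
class FakePCollection:
    """Hypothetical stand-in for pvalue.PCollection."""

    def __init__(self, tag=None):
        self.tag = tag


def expand_outputs(output_tags):
    # Unset or empty tags: return one bare collection, as in the suggestion.
    if not output_tags:
        return FakePCollection()
    # Any non-empty tag set, including an explicit {None}, yields a dict.
    return {tag: FakePCollection(tag) for tag in output_tags}


print(type(expand_outputs(None)).__name__)
print(type(expand_outputs({None})).__name__)
print(sorted(expand_outputs({"a", "b"})))
```

Because `{None}` is a non-empty set, it is truthy and takes the dict branch, so a caller who passes a set explicitly always gets a dict back regardless of its contents.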
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=443385&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443385 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 09/Jun/20 20:37 Start Date: 09/Jun/20 20:37 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11838: URL: https://github.com/apache/beam/pull/11838#issuecomment-641555415 R: @robertwb
Issue Time Tracking --- Worklog Id: (was: 443385) Time Spent: 6h 10m (was: 6h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=438383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438383 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 28/May/20 18:37 Start Date: 28/May/20 18:37 Worklog Time Spent: 10m Work Description: rohdesamuel commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r432041629 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -315,6 +327,7 @@ def read_multiple(self, labels): StreamingCacheSource(self._cache_dir, l, self._is_cache_complete).read(tail=True) for l in labels +if not [sub_l for sub_l in l if self.sentinel_label() in sub_l] Review comment: Sorry, I changed the PR and it looks like your comment is out of date. Can you PTAL?
Issue Time Tracking --- Worklog Id: (was: 438383) Time Spent: 5h 50m (was: 5h 40m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=438066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438066 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/May/20 22:33 Start Date: 27/May/20 22:33 Worklog Time Spent: 10m Work Description: KevinGG commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r431480466 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -315,6 +327,7 @@ def read_multiple(self, labels): StreamingCacheSource(self._cache_dir, l, self._is_cache_complete).read(tail=True) for l in labels +if not [sub_l for sub_l in l if self.sentinel_label() in sub_l] Review comment: Or if a label `l` is a `list` of `str` and `*labels` is a `list` of `list` of `str`, then this makes sense.
Issue Time Tracking --- Worklog Id: (was: 438066) Time Spent: 5h 40m (was: 5.5h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=438061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438061 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/May/20 22:25 Start Date: 27/May/20 22:25 Worklog Time Spent: 10m Work Description: KevinGG commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r431469495 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -315,6 +327,7 @@ def read_multiple(self, labels): StreamingCacheSource(self._cache_dir, l, self._is_cache_complete).read(tail=True) for l in labels +if not [sub_l for sub_l in l if self.sentinel_label() in sub_l] Review comment: This is a little hard to read. Isn't a label `l` a `str`, so a `sub_l` is a character of that `str`? I suppose `if not [sub_l for ...]` evaluates to `True` when the `[sub_l for ...]` is empty. And the emptiness of `[sub_l for ...]` is based on whether the `sentinel_label` exists in the `sub_l`? This is where I get confused.
Issue Time Tracking --- Worklog Id: (was: 438061) Time Spent: 5.5h (was: 5h 20m)
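The filter condition questioned in the review comment above can be checked in plain Python. In this sketch the value of `SENTINEL` is made up and `keep_label` only mirrors the shape of the reviewed expression; it is not the code from the pull request. It shows how the result depends on whether a label is a list of strings or a single string.

```python
SENTINEL = "sentinel"  # hypothetical value, chosen only for this example


def keep_label(label, sentinel=SENTINEL):
    # Same shape as the reviewed condition: keep the label when no element
    # of it contains the sentinel, i.e. when the filtered list is empty.
    return not [sub_l for sub_l in label if sentinel in sub_l]


def keep_label_any(label, sentinel=SENTINEL):
    # Equivalent check written with any(), which reads more directly.
    return not any(sentinel in sub_l for sub_l in label)


# A label that is a list of strings: each sub_l is a whole string.
print(keep_label(["pcoll", "1"]))
print(keep_label(["pcoll", "sentinel"]))

# A label that is a plain string: each sub_l is a single character, so a
# multi-character sentinel can never be "in" it and the label is always kept.
print(keep_label("sentinel"))
```

The last call shows why the condition only filters anything when each label is itself a sequence of strings, which matches the follow-up remark that the expression makes sense for a list of lists of `str`.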
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=438051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438051 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/May/20 22:10 Start Date: 27/May/20 22:10 Worklog Time Spent: 10m Work Description: rohdesamuel commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r431471609 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -304,6 +304,18 @@ def read(self, *labels): return iter([]), -1 return StreamingCache.Reader([header], [reader]).read(), 1 + @staticmethod + def sentinel_label(): Review comment: Yeah that can work, I like that because it keeps the same semantics. I'll go with the {None} alternative because the output_tags are always manually specified in the from_runner_api_parameter method.
Issue Time Tracking --- Worklog Id: (was: 438051) Time Spent: 5h 20m (was: 5h 10m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=438040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438040 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/May/20 21:57 Start Date: 27/May/20 21:57 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11838: URL: https://github.com/apache/beam/pull/11838#issuecomment-634964145 retest this please
Issue Time Tracking --- Worklog Id: (was: 438040) Time Spent: 5h 10m (was: 5h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=438039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438039 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/May/20 21:56 Start Date: 27/May/20 21:56 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r431465968 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -304,6 +304,18 @@ def read(self, *labels): return iter([]), -1 return StreamingCache.Reader([header], [reader]).read(), 1 + @staticmethod + def sentinel_label(): Review comment: Rather than introduce a sentinel label, how about returning a dict from expand iff output_tags was manually specified (or, alternatively, something other than `{None}`)?
Issue Time Tracking --- Worklog Id: (was: 438039) Time Spent: 5h (was: 4h 50m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=438035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438035 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/May/20 21:51 Start Date: 27/May/20 21:51 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11838: URL: https://github.com/apache/beam/pull/11838#issuecomment-634961730 R: @KevinGG
Issue Time Tracking --- Worklog Id: (was: 438035) Time Spent: 4h 50m (was: 4h 40m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=438031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438031 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/May/20 21:42 Start Date: 27/May/20 21:42 Worklog Time Spent: 10m Work Description: rohdesamuel opened a new pull request #11838: URL: https://github.com/apache/beam/pull/11838 Change-Id: I6a8eba4e323bf0fff318a56e44e512916c06266f https://github.com/apache/beam/pull/11765 removes the ability to set the output id on TestStreams with single outputs. This PR circumvents this by always adding a dummy output to the TestStream so that it will always output a dict, so that we can control the output ids.
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=436260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436260 ]

ASF GitHub Bot logged work on BEAM-9322:
Author: ASF GitHub Bot
Created on: 21/May/20 23:04
Start Date: 21/May/20 23:04
Worklog Time Spent: 10m
Work Description: robertwb commented on a change in pull request #11765:
URL: https://github.com/apache/beam/pull/11765#discussion_r428960780

## File path: sdks/python/apache_beam/transforms/ptransform.py ##
@@ -270,11 +256,19 @@ def get_named_nested_pvalues(pvalueish):
     tagged_values = pvalueish.items()
   else:
     if isinstance(pvalueish, (pvalue.PValue, pvalue.DoOutputsTuple)):
-      yield None, pvalueish
+      # For transforms that only have a tagged PCollection as an output,
+      # propagate that tag forward.
+      if first_iteration and isinstance(pvalueish, pvalue.PValue):
+        yield pvalueish.tag, pvalueish

Review comment:
I think this may break some google3 runners. Can you ensure that this imports correctly? (Could you also explain why this is needed?)

Issue Time Tracking
Worklog Id: (was: 436260)
Time Spent: 4.5h (was: 4h 20m)

> Python SDK ignores manually set PCollection tags
>
> Key: BEAM-9322
> URL: https://issues.apache.org/jira/browse/BEAM-9322
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Sam Rohde
> Assignee: Sam Rohde
> Priority: P1
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> When applying PTransforms, the Python SDK currently ignores any manually set PCollection tags as it adds each PCollection to the PTransform [outputs|https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595].
> In the [add_output|https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872] method, the tag is set to None for all PValues, so the output tags become an enumeration index over the PCollection outputs. The tags are not propagated correctly, which is a problem for anything that relies on the output PCollection tags matching the user-set values.
> The fix is to correct BEAM-1833 and always pass in the tags. However, that doesn't fix the problem for nested PCollections. If you have a dict of lists of PCollections, what should their tags be set to? To fix this, first propagate the correct tag, then talk with the community about the best auto-generated tags.
> Some users may rely on the old implementation, so a flag will be created, "force_generated_pcollection_output_ids", set to False by default. If True, it reverts to the old implementation and generates tags for PCollections.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
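The behaviour described in the issue can be reproduced without Beam. The following is a minimal sketch, not Beam code: `FakePCollection`, `old_output_tags` and `propagated_output_tags` are hypothetical names, a plain dict stands in for the transform's outputs, and the old tagging rule is copied from the line removed in the pull request #11283 review hunk quoted elsewhere in this thread.

```python
class FakePCollection:
    """Hypothetical stand-in for a PCollection that carries a user-set tag."""

    def __init__(self, tag):
        self.tag = tag


def old_output_tags(pcolls):
    """Old rule: ignore pc.tag; use None first, then an enumeration index."""
    outputs = {}  # assumption: a dict stands in for the transform's outputs
    for pc in pcolls:
        tag = len(outputs) if None in outputs else None
        outputs[tag] = pc
    return list(outputs)


def propagated_output_tags(pcolls):
    """Assumed fixed rule: pass the user-set tag through unchanged."""
    outputs = {}
    for pc in pcolls:
        outputs[pc.tag] = pc
    return list(outputs)


pcolls = [FakePCollection(t) for t in ('even', 'odd', 'rest')]
print(old_output_tags(pcolls))         # [None, 1, 2]
print(propagated_output_tags(pcolls))  # ['even', 'odd', 'rest']
```

Under the old rule the user-set tags never reach the output keys, which is the mismatch the issue reports.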
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=436162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-436162 ]

ASF GitHub Bot logged work on BEAM-9322:
Author: ASF GitHub Bot
Created on: 21/May/20 19:29
Start Date: 21/May/20 19:29
Worklog Time Spent: 10m
Work Description: rohdesamuel commented on pull request #11765:
URL: https://github.com/apache/beam/pull/11765#issuecomment-632299883

R: @robertwb

Issue Time Tracking
Worklog Id: (was: 436162)
Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=435718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-435718 ]

ASF GitHub Bot logged work on BEAM-9322:
Author: ASF GitHub Bot
Created on: 20/May/20 21:56
Start Date: 20/May/20 21:56
Worklog Time Spent: 10m
Work Description: rohdesamuel opened a new pull request #11765:
URL: https://github.com/apache/beam/pull/11765

Change-Id: I8c2d660b175442d1917fe2b1ae166c0f4a1caaca

This turns "passthrough_pcollection_output_ids" and "force_generated_pcollection_output_ids" to True by default.
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=418922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418922 ]

ASF GitHub Bot logged work on BEAM-9322:
Author: ASF GitHub Bot
Created on: 08/Apr/20 21:45
Start Date: 08/Apr/20 21:45
Worklog Time Spent: 10m
Work Description: robertwb commented on pull request #11283: [BEAM-9322] [BEAM-1833] Better naming for composite transform output tags.
URL: https://github.com/apache/beam/pull/11283

Issue Time Tracking
Worklog Id: (was: 418922)
Time Spent: 4h (was: 3h 50m)

> Python SDK ignores manually set PCollection tags
>
> Key: BEAM-9322
> URL: https://issues.apache.org/jira/browse/BEAM-9322
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Sam Rohde
> Assignee: Sam Rohde
> Priority: Critical
> Fix For: 2.21.0
> Time Spent: 4h
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=418916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418916 ]

ASF GitHub Bot logged work on BEAM-9322:
Author: ASF GitHub Bot
Created on: 08/Apr/20 21:36
Start Date: 08/Apr/20 21:36
Worklog Time Spent: 10m
Work Description: rohdesamuel commented on issue #11283: [BEAM-9322] [BEAM-1833] Better naming for composite transform output tags.
URL: https://github.com/apache/beam/pull/11283#issuecomment-611207545

> @rohdesamuel is this good to go?

LGTM, sorry I thought you had the ball.

Issue Time Tracking
Worklog Id: (was: 418916)
Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=415779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415779 ]

ASF GitHub Bot logged work on BEAM-9322:
Author: ASF GitHub Bot
Created on: 03/Apr/20 22:15
Start Date: 03/Apr/20 22:15
Worklog Time Spent: 10m
Work Description: robertwb commented on issue #11283: [BEAM-9322] [BEAM-1833] Better naming for composite transform output tags.
URL: https://github.com/apache/beam/pull/11283#issuecomment-608712134

Run Python PreCommit

Issue Time Tracking
Worklog Id: (was: 415779)
Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=415780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415780 ]

ASF GitHub Bot logged work on BEAM-9322:
Author: ASF GitHub Bot
Created on: 03/Apr/20 22:15
Start Date: 03/Apr/20 22:15
Worklog Time Spent: 10m
Work Description: robertwb commented on issue #11283: [BEAM-9322] [BEAM-1833] Better naming for composite transform output tags.
URL: https://github.com/apache/beam/pull/11283#issuecomment-608712255

@rohdesamuel is this good to go?

Issue Time Tracking
Worklog Id: (was: 415780)
Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=414833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414833 ]

ASF GitHub Bot logged work on BEAM-9322:
Author: ASF GitHub Bot
Created on: 02/Apr/20 16:44
Start Date: 02/Apr/20 16:44
Worklog Time Spent: 10m
Work Description: robertwb commented on issue #11283: [BEAM-9322] [BEAM-1833] Better naming for composite transform output tags.
URL: https://github.com/apache/beam/pull/11283#issuecomment-607960904

Run Python PreCommit

Issue Time Tracking
Worklog Id: (was: 414833)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=414832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414832 ]

ASF GitHub Bot logged work on BEAM-9322:
Author: ASF GitHub Bot
Created on: 02/Apr/20 16:44
Start Date: 02/Apr/20 16:44
Worklog Time Spent: 10m
Work Description: robertwb commented on issue #11283: [BEAM-9322] [BEAM-1833] Better naming for composite transform output tags.
URL: https://github.com/apache/beam/pull/11283#issuecomment-607960807

Run Portable_Python PreCommit

Issue Time Tracking
Worklog Id: (was: 414832)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=414398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414398 ]

ASF GitHub Bot logged work on BEAM-9322:
Author: ASF GitHub Bot
Created on: 02/Apr/20 01:01
Start Date: 02/Apr/20 01:01
Worklog Time Spent: 10m
Work Description: robertwb commented on pull request #11283: [BEAM-9322] [BEAM-1833] Better naming for composite transform output tags.
URL: https://github.com/apache/beam/pull/11283#discussion_r401995268

## File path: sdks/python/apache_beam/pipeline.py ##
@@ -671,7 +671,11 @@ def apply(
       # If the user wants the old implementation of always generated
       # PCollection output ids, then set the tag to None first, then count up
       # from 1.
-      tag = len(current.outputs) if None in current.outputs else None
+      base = tag
+      counter = 0
+      while tag in current.outputs:
+        counter += 1
+        tag = '%s_%d' % (base, counter)
       current.add_output(result, tag)

Review comment:
I am relatively confident in this change, as it preserves the essential characteristic (that output names are unique) and defaults to the same thing for all single-output transforms. However, I have added the opt-out you had originally with a note just in case.

Issue Time Tracking
Worklog Id: (was: 414398)
Time Spent: 3h (was: 2h 50m)
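The de-duplication loop from the review hunk in this message can be exercised on its own. This is a minimal sketch under stated assumptions: `unique_tag` is a hypothetical helper name (not a Beam function), and a plain dict stands in for `current.outputs`.

```python
def unique_tag(tag, outputs):
    """Return `tag`, or `tag` plus a numeric suffix if it is already taken.

    Hypothetical helper: `outputs` is a plain dict standing in for
    `current.outputs`; the loop body mirrors the lines added in the hunk.
    """
    base = tag
    counter = 0
    while tag in outputs:
        counter += 1
        tag = '%s_%d' % (base, counter)
    return tag


outputs = {}
for requested in [None, None, 'main', 'main', 'main']:
    outputs[unique_tag(requested, outputs)] = object()

print(list(outputs))  # [None, 'None_1', 'main', 'main_1', 'main_2']
```

A transform with a single untagged output still ends up with the tag `None`, which matches the reviewer's remark that the default is unchanged for single-output transforms, while repeated tags stay unique.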
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=414379&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414379 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 01/Apr/20 23:26 Start Date: 01/Apr/20 23:26 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11283: [BEAM-9322] [BEAM-1833] Better naming for composite transform output tags. URL: https://github.com/apache/beam/pull/11283#discussion_r401967131
## File path: sdks/python/apache_beam/pipeline.py ##
@@ -671,7 +671,11 @@ def apply(
 # If the user wants the old implementation of always generated
 # PCollection output ids, then set the tag to None first, then count up
 # from 1.
-tag = len(current.outputs) if None in current.outputs else None
+base = tag
+counter = 0
+while tag in current.outputs:
+  counter += 1
+  tag = '%s_%d' % (base, counter)
 current.add_output(result, tag)
Review comment: Should this be under the experiment flag instead, so that we don't inadvertently break anyone?
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking --- Worklog Id: (was: 414379) Time Spent: 2h 50m (was: 2h 40m)
> Python SDK ignores manually set PCollection tags
>
> Key: BEAM-9322
> URL: https://issues.apache.org/jira/browse/BEAM-9322
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Sam Rohde
> Assignee: Sam Rohde
> Priority: Critical
> Fix For: 2.21.0
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> The Python SDK currently ignores any tags set manually on PCollections when it applies PTransforms and adds the PCollection to the PTransform [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]]. In the [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]] method, the tag is set to None for all PValues, meaning the output tags are set to an enumeration index over the PCollection outputs. The tags are not propagated correctly, which can be a problem when relying on the output PCollection tags to match the user-set values.
> The fix is to correct BEAM-1833, and always pass in the tags. However, that doesn't fix the problem for nested PCollections. If you have a dict of lists of PCollections, what should their tags be set to? To fix this, first propagate the correct tag, then talk with the community about the best auto-generated tags.
> Some users may rely on the old implementation, so a flag will be created, "force_generated_pcollection_output_ids", set to False by default. If True, the SDK falls back to the old implementation and generates tags for PCollections.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
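The tag de-duplication loop from the review diff in this message can be tried in isolation. The following is a minimal sketch, not the Beam code itself: `unique_tag` is a hypothetical helper name, and a plain dict stands in for `current.outputs`.

```python
def unique_tag(tag, existing):
    """Return `tag` if unused, else the first free `tag_1`, `tag_2`, ...

    Hypothetical helper mirroring the loop in the diff; `existing` stands in
    for `current.outputs`.
    """
    base = tag
    counter = 0
    while tag in existing:
        counter += 1
        tag = '%s_%d' % (base, counter)
    return tag


outputs = {}
for requested in ['a', 'a', None, None, 'a']:
    # Placeholder objects stand in for the output PCollections.
    outputs[unique_tag(requested, outputs)] = object()

print(list(outputs))  # ['a', 'a_1', None, 'None_1', 'a_2']
```

Note that with this loop a repeated `None` tag becomes the string `'None_1'`, because the base is formatted with `%s`.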
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=414346&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414346 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 01/Apr/20 22:34 Start Date: 01/Apr/20 22:34 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11283: [BEAM-9322] [BEAM-1833] Better naming for composite transform output tags. URL: https://github.com/apache/beam/pull/11283 This gives names like `'a', 'b.0', 'b.1'` for results like `{'a': ..., 'b': (..., ...)}`.
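The naming scheme described in pull request #11283 (`'a', 'b.0', 'b.1'` for a dict containing a tuple) can be illustrated with a short sketch. This is an assumption about how such names could be derived, not the code from the pull request: `flatten_tags` is a hypothetical helper, and plain strings stand in for PCollections.

```python
def flatten_tags(result, prefix=None):
    """Yield (tag, leaf) pairs for a nested dict/list/tuple of outputs.

    Hypothetical helper: dict keys and sequence indices are joined with '.'.
    """
    if isinstance(result, dict):
        items = result.items()
    elif isinstance(result, (list, tuple)):
        items = enumerate(result)
    else:
        # A leaf (here a placeholder string) keeps the tag built so far.
        yield prefix, result
        return
    for key, value in items:
        tag = str(key) if prefix is None else '%s.%s' % (prefix, key)
        yield from flatten_tags(value, tag)


named = dict(flatten_tags({'a': 'pc_a', 'b': ('pc_b0', 'pc_b1')}))
print(named)  # {'a': 'pc_a', 'b.0': 'pc_b0', 'b.1': 'pc_b1'}
```

In this sketch a bare, un-nested output gets the tag `None`; how the real SDK treats that case is not stated in the message.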
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393919 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/Feb/20 02:30 Start Date: 27/Feb/20 02:30 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#issuecomment-591747015
> I see this error in the logs:
>
> 17:01:54 > assert event_tags.issubset(self.output_tags)
> 17:01:54 E AssertionError: assert False
> 17:01:54 E + where False = (set([None, '1']))
> 17:01:54 E + where = set(['a', 'b']).issubset
> 17:01:54 E + and set([None, '1']) = .output_tags
>
> @rohdesamuel could you take a look?
Yep, taking a look
Issue Time Tracking --- Worklog Id: (was: 393919) Time Spent: 2.5h (was: 2h 20m)
> Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2.5h > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393918 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/Feb/20 02:30 Start Date: 27/Feb/20 02:30 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#issuecomment-591746886
I see this error in the logs:
17:01:54 > assert event_tags.issubset(self.output_tags)
17:01:54 E AssertionError: assert False
17:01:54 E + where False = (set([None, '1']))
17:01:54 E + where = set(['a', 'b']).issubset
17:01:54 E + and set([None, '1']) = .output_tags
@rohdesamuel could you take a look?
Issue Time Tracking --- Worklog Id: (was: 393918) Time Spent: 2h 20m (was: 2h 10m)
> Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h
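The assertion failure quoted in this message can be reproduced with plain sets. The sketch below assumes the legacy tagging rule shown in the later review diff (`None` for the first output, then the current output count); `legacy_tags` is a hypothetical helper, and converting the generated tags to strings is an assumption made to match the `'1'` seen in the log.

```python
def legacy_tags(results):
    """Assign tags the old way: None first, then the running output count."""
    outputs = {}
    for result in results:
        tag = len(outputs) if None in outputs else None
        outputs[tag] = result
    return outputs


# Two outputs that the user tagged 'a' and 'b' (placeholder values).
generated = {None if tag is None else str(tag)
             for tag in legacy_tags(['pc_a', 'pc_b'])}
event_tags = {'a', 'b'}

print(generated == {None, '1'})        # True
print(event_tags.issubset(generated))  # False, as in the failing assert
```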
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393914 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/Feb/20 02:21 Start Date: 27/Feb/20 02:21 Worklog Time Spent: 10m Work Description: udim commented on issue #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#issuecomment-591741341 This change may have broken precommits: https://builds.apache.org/job/beam_PreCommit_Python_Cron/2443/ edit: actually, not quite sure Issue Time Tracking --- Worklog Id: (was: 393914) Time Spent: 2h 10m (was: 2h) > Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393909 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/Feb/20 02:10 Start Date: 27/Feb/20 02:10 Worklog Time Spent: 10m Work Description: udim commented on issue #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#issuecomment-591741341 I believe this change may have broken precommits: https://builds.apache.org/job/beam_PreCommit_Python_Cron/2443/ Issue Time Tracking --- Worklog Id: (was: 393909) Time Spent: 1h 50m (was: 1h 40m) > Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393910 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 27/Feb/20 02:10 Start Date: 27/Feb/20 02:10 Worklog Time Spent: 10m Work Description: udim commented on issue #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#issuecomment-591741341 This change may have broken precommits: https://builds.apache.org/job/beam_PreCommit_Python_Cron/2443/ Issue Time Tracking --- Worklog Id: (was: 393910) Time Spent: 2h (was: 1h 50m) > Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393813 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 26/Feb/20 22:36 Start Date: 26/Feb/20 22:36 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934 Issue Time Tracking --- Worklog Id: (was: 393813) Time Spent: 1h 40m (was: 1.5h) > Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393811 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 26/Feb/20 22:34 Start Date: 26/Feb/20 22:34 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#discussion_r384809280
## File path: sdks/python/apache_beam/pipeline.py ##
@@ -620,23 +620,25 @@ def apply(self, transform, pvalueish=None, label=None):
 current.add_output(result, result._main_tag)
 continue
+ # TODO(BEAM-9322): Remove the experiment check and have this conditional
+ # be the default.
+ if self._options.view_as(DebugOptions).lookup_experiment(
+ 'passthrough_pcollection_output_ids', default=False):
+# Otherwise default to the new implementation which only auto-generates
Review comment: I think this comment could be improved, because this is not the otherwise case anymore. Follow up PR would be fine.
Issue Time Tracking --- Worklog Id: (was: 393811) Time Spent: 1.5h (was: 1h 20m)
> Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1.5h > Remaining Estimate: 0h
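The experiment gate in the review diff of this message can be sketched without the Beam SDK. Everything below is a simplified stand-in under stated assumptions: `experiment_enabled` and `choose_tag` are hypothetical names, the experiment list is a plain Python list rather than pipeline options, and "passing the user tag through" is a simplification of what the gated branch does.

```python
EXPERIMENT = 'passthrough_pcollection_output_ids'


def experiment_enabled(experiments, key, default=False):
    """Hypothetical stand-in for an experiment lookup on pipeline options."""
    return True if key in (experiments or []) else default


def choose_tag(user_tag, outputs, experiments):
    """Pick the output tag, gated on the experiment (simplified assumption)."""
    if experiment_enabled(experiments, EXPERIMENT):
        # Experiment on: keep the tag the user set.
        return user_tag
    # Experiment off (the default): legacy generated ids, None then a count.
    return len(outputs) if None in outputs else None


print(choose_tag('a', {}, [EXPERIMENT]))   # a
print(choose_tag('a', {}, []))             # None
print(choose_tag('b', {None: 'pc'}, None)) # 1
```

Defaulting the gate to off matches the title of pull request #10934, which disables the experiment by default so that existing users are not affected.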
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393805 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 26/Feb/20 22:29 Start Date: 26/Feb/20 22:29 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#issuecomment-591677820
> @aaltay Even though I prefer a proper fix, being pragmatic is important. This still needs to be reviewed, though, and I am currently juggling an ill child.
Ack. I will do the review. I wanted to understand your position. Take care.
Issue Time Tracking --- Worklog Id: (was: 393805) Time Spent: 1h 20m (was: 1h 10m)
> Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393754 ] ASF GitHub Bot logged work on BEAM-9322: Author: ASF GitHub Bot Created on: 26/Feb/20 21:18 Start Date: 26/Feb/20 21:18 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled URL: https://github.com/apache/beam/pull/10934#issuecomment-591649117 @aaltay Even though I prefer a proper fix, being pragmatic is important. This still needs to be reviewed, though, and I am currently juggling an ill child. Issue Time Tracking --- Worklog Id: (was: 393754) Time Spent: 1h 10m (was: 1h) > Python SDK ignores manually set PCollection tags > > > Key: BEAM-9322 > URL: https://issues.apache.org/jira/browse/BEAM-9322 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393726 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 26/Feb/20 19:45
Start Date: 26/Feb/20 19:45
Worklog Time Spent: 10m
Work Description: aaltay commented on issue #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled
URL: https://github.com/apache/beam/pull/10934#issuecomment-591608825

My suggestion would be to merge this and not change the behavior for 2.20. The reason is that, judging from Google's internal users, this might impact about 1% of users. I do not believe we have time to improve this before the release branch cut happening today, and I would err on the side of not breaking any users.

The counterpoint is that 1% is not very large, and we could force a small group of users to set a flag to use a newer version. (I still think it is better to make this decision without the time pressure of an imminent release cut.)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 393726)
Time Spent: 1h (was: 50m)

> Python SDK ignores manually set PCollection tags
>
> Key: BEAM-9322
> URL: https://issues.apache.org/jira/browse/BEAM-9322
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Sam Rohde
> Assignee: Sam Rohde
> Priority: Major
> Fix For: 2.20.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> When applying a PTransform, the Python SDK currently ignores any tags set
> manually on PCollections as it adds each PCollection to the PTransform
> [outputs|https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595].
> In the
> [add_output|https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]
> method, the tag is set to None for all PValues, meaning the output tags are
> set to an enumeration index over the PCollection outputs. The tags are not
> propagated correctly, which can be a problem when relying on the output
> PCollection tags to match the user-set values.
> The fix is to correct BEAM-1833 and always pass in the tags. However, that
> doesn't fix the problem for nested PCollections. If you have a dict of lists
> of PCollections, what should their tags be set to? In order to fix this,
> first propagate the correct tag, then talk with the community about the
> best auto-generated tags.
> Some users may rely on the old implementation, so a flag will be created:
> "force_generated_pcollection_output_ids", set to False by default. If
> True, this will fall back to the old implementation and generate tags for
> PCollections.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
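To make the quoted description concrete, here is a minimal sketch of the two tagging behaviors it contrasts. This is a simplified model, not Beam's code: `FakePCollection` and `assign_output_tags` are hypothetical names, the `force_generated_ids` parameter only mirrors the intent of the "force_generated_pcollection_output_ids" flag, and falling back to the index for untagged outputs is an assumption.

```python
# Simplified model of the behavior described in BEAM-9322.
# FakePCollection and assign_output_tags are hypothetical, not Beam APIs.


class FakePCollection:
    """Stand-in for a PCollection that carries only a name and an optional tag."""

    def __init__(self, name, tag=None):
        self.name = name
        self.tag = tag


def assign_output_tags(outputs, force_generated_ids=False):
    """Return a dict mapping each output's tag to its name.

    With force_generated_ids=True the user-set tag is ignored and the
    enumeration index is used, as in the old behavior the issue describes.
    """
    tagged = {}
    for index, pcoll in enumerate(outputs):
        if force_generated_ids or pcoll.tag is None:
            # Assumption: untagged outputs also fall back to the index.
            key = str(index)
        else:
            # Behavior the issue asks for: propagate the user-set tag.
            key = pcoll.tag
        tagged[key] = pcoll.name
    return tagged


outputs = [FakePCollection("pc_a", tag="evens"), FakePCollection("pc_b", tag="odds")]
print(assign_output_tags(outputs))
print(assign_output_tags(outputs, force_generated_ids=True))
```

In this model, code that looks up an output by "evens" works only when the tag is propagated; with generated ids the same output is reachable only by its index, which is the mismatch the issue reports.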
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393153 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 26/Feb/20 03:14
Start Date: 26/Feb/20 03:14
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10971: [BEAM-9322] Fix tag output names within Dataflow to be consistent with values used in proto.
URL: https://github.com/apache/beam/pull/10971

Issue Time Tracking
-------------------
Worklog Id: (was: 393153)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393102 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 26/Feb/20 01:10
Start Date: 26/Feb/20 01:10
Worklog Time Spent: 10m
Work Description: ananvay commented on issue #10971: [BEAM-9322] Fix tag output names within Dataflow to be consistent with values used in proto.
URL: https://github.com/apache/beam/pull/10971#issuecomment-591181974

Thanks a lot Luke! Overall LGTM, assuming the tests pass.

Issue Time Tracking
-------------------
Worklog Id: (was: 393102)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393077 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 26/Feb/20 00:24
Start Date: 26/Feb/20 00:24
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #10971: [BEAM-9322] Fix tag output names within Dataflow to be consistent with values used in proto.
URL: https://github.com/apache/beam/pull/10971#issuecomment-591162130

R: @ananvay @rohdesamuel
CC: @robertwb @aaltay

Issue Time Tracking
-------------------
Worklog Id: (was: 393077)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=393076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393076 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 26/Feb/20 00:24
Start Date: 26/Feb/20 00:24
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10971: [BEAM-9322] Fix tag output names within Dataflow to be consistent with values used in proto.
URL: https://github.com/apache/beam/pull/10971

This updates the Python SDK to drop the 'out'_ and 'out' mangling in favor of using 'None' as the main output when more than one output is defined; otherwise, the only output becomes the main output.
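The naming rule in the pull request description above can be sketched roughly as follows. This is an illustrative reading, not the code from the pull request: both function names are hypothetical, and treating 'out'_ as an `out_<tag>` prefix for tagged outputs is an assumption about the older scheme.

```python
# Hypothetical helpers illustrating one reading of the PR description.
# Neither function is part of Beam.


def legacy_output_name(tag):
    """Assumed older mangling: 'out' for the main output, 'out_<tag>' otherwise."""
    return "out" if tag is None else "out_%s" % tag


def main_output_name(tags):
    """Pick the main output name from a non-empty list of output tags.

    A single output is the main output whatever its tag; with several
    outputs, the main output is the one named 'None'.
    """
    if len(tags) == 1:
        return str(tags[0])
    return str(None)


print(legacy_output_name("odds"))
print(main_output_name(["evens"]))
print(main_output_name([None, "odds"]))
```

Under this reading, the names no longer need an extra mangling step, so the same tag strings can be used wherever outputs are referenced.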
[jira] [Work logged] (BEAM-9322) Python SDK ignores manually set PCollection tags
[ https://issues.apache.org/jira/browse/BEAM-9322?focusedWorklogId=392977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392977 ]

ASF GitHub Bot logged work on BEAM-9322:

Author: ASF GitHub Bot
Created on: 25/Feb/20 21:35
Start Date: 25/Feb/20 21:35
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #10934: [BEAM-9322] [BEAM-1833] Broke some people, setting the default to have the experiment be disabled
URL: https://github.com/apache/beam/pull/10934#issuecomment-591081377

I would prefer a fix that moves us closer to solving the output naming issues within the Python SDK.

Issue Time Tracking
-------------------
Worklog Id: (was: 392977)
Remaining Estimate: 0h
Time Spent: 10m