[jira] [Work logged] (BEAM-9321) BigQuery avro write logical type support
[ https://issues.apache.org/jira/browse/BEAM-9321?focusedWorklogId=389828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389828 ] ASF GitHub Bot logged work on BEAM-9321: Author: ASF GitHub Bot Created on: 20/Feb/20 07:17 Start Date: 20/Feb/20 07:17 Worklog Time Spent: 10m Work Description: alexvanboxel commented on issue #10869: [BEAM-9321] Add BigQuery Avro logical type support on write URL: https://github.com/apache/beam/pull/10869#issuecomment-588657689 I'm not qualified to give my ok, but I have a question: is there a reason why it's an option? Shouldn't it always handle the logical type? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389828) Time Spent: 50m (was: 40m) > BigQuery avro write logical type support > > > Key: BEAM-9321 > URL: https://issues.apache.org/jira/browse/BEAM-9321 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Filipe Regadas >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing > does not respect Avro <-> BigQuery data type conversion > ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]) > we need to set the useAvroLogicalTypes option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
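For readers unfamiliar with the option under discussion, here is a rough Python sketch of what honoring or ignoring an Avro logical type means for the resulting BigQuery column type. The mapping tables are a simplified reading of the BigQuery documentation linked in the issue, and the function name `bigquery_type` and its `use_avro_logical_types` parameter are hypothetical, chosen only for illustration. This is not the Beam or BigQuery implementation.

```python
# Simplified, assumed mapping from Avro logical types to BigQuery types,
# following the BigQuery Avro loading documentation linked in the issue.
LOGICAL_TO_BIGQUERY = {
    "date": "DATE",
    "time-millis": "TIME",
    "time-micros": "TIME",
    "timestamp-millis": "TIMESTAMP",
    "timestamp-micros": "TIMESTAMP",
    "decimal": "NUMERIC",
}

# Types used when the logicalType annotation is ignored.
PRIMITIVE_TO_BIGQUERY = {
    "int": "INTEGER",
    "long": "INTEGER",
    "bytes": "BYTES",
    "string": "STRING",
}


def bigquery_type(avro_type, use_avro_logical_types):
    """Hypothetical helper: pick a BigQuery type for one Avro field type."""
    logical = avro_type.get("logicalType")
    if use_avro_logical_types and logical in LOGICAL_TO_BIGQUERY:
        return LOGICAL_TO_BIGQUERY[logical]
    # Without the option, only the underlying primitive type is considered.
    return PRIMITIVE_TO_BIGQUERY[avro_type["type"]]


timestamp_field = {"type": "long", "logicalType": "timestamp-micros"}
without_option = bigquery_type(timestamp_field, use_avro_logical_types=False)
with_option = bigquery_type(timestamp_field, use_avro_logical_types=True)
```

Under these assumptions the same `long` field lands as an INTEGER column when the annotation is ignored and as a TIMESTAMP column when it is honored, which is the difference the issue describes.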
[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support
[ https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=389826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389826 ] ASF GitHub Bot logged work on BEAM-7274: Author: ASF GitHub Bot Created on: 20/Feb/20 07:12 Start Date: 20/Feb/20 07:12 Worklog Time Spent: 10m Work Description: alexvanboxel commented on issue #10502: [BEAM-7274] Add DynamicMessage Schema support URL: https://github.com/apache/beam/pull/10502#issuecomment-588652660 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389826) Time Spent: 24h 20m (was: 24h 10m) > Protobuf Beam Schema support > > > Key: BEAM-7274 > URL: https://issues.apache.org/jira/browse/BEAM-7274 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 24h 20m > Remaining Estimate: 0h > > Add support for the new Beam Schema to the Protobuf extension. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=389780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389780 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 20/Feb/20 03:55 Start Date: 20/Feb/20 03:55 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10731: [BEAM-7926] Data-centric Interactive Part3 URL: https://github.com/apache/beam/pull/10731#issuecomment-588596733 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389780) Time Spent: 47h 40m (was: 47.5h) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 47h 40m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline is defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The user can call a single function and get auto-magical charting of the data. > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built to include only > transforms necessary to produce the desired pcolls (pcoll and pcoll2) and > execute that fragment. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
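The "pipeline fragment" idea in the issue description can be pictured as a backward walk over the pipeline graph from the requested PCollections. The sketch below is plain Python and does not use Beam. The `producers` table, the transform labels `Transform2` and `Transform3`, and the function name `fragment_for` are all hypothetical, since the issue leaves `pcoll2` and `pcoll3` unspecified. It only illustrates the pruning idea and is not how Interactive Beam actually builds fragments.

```python
def fragment_for(targets, producers):
    """Return the labels of the transforms needed to produce `targets`.

    `producers` maps a PCollection name to a pair
    (label of the producing transform, list of input PCollection names).
    """
    needed = set()
    seen = set()
    stack = list(targets)
    while stack:
        pcoll = stack.pop()
        if pcoll in seen or pcoll not in producers:
            continue
        seen.add(pcoll)
        label, inputs = producers[pcoll]
        needed.add(label)
        # Everything upstream of a needed PCollection is needed as well.
        stack.extend(inputs)
    return needed


# Hypothetical linear pipeline: pcoll -> pcoll2 -> pcoll3.
producers = {
    "pcoll": ("Transform", []),
    "pcoll2": ("Transform2", ["pcoll"]),
    "pcoll3": ("Transform3", ["pcoll2"]),
}

fragment = fragment_for(["pcoll", "pcoll2"], producers)
```

With this assumed graph, asking for `pcoll` and `pcoll2` leaves `Transform3` out of the fragment, so the work that only `pcoll3` depends on is never executed.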
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=389782=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389782 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 20/Feb/20 03:55 Start Date: 20/Feb/20 03:55 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10731: [BEAM-7926] Data-centric Interactive Part3 URL: https://github.com/apache/beam/pull/10731#issuecomment-588596837 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389782) Time Spent: 48h (was: 47h 50m) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 48h > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline is defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The user can call a single function and get auto-magical charting of the data. > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built to include only > transforms necessary to produce the desired pcolls (pcoll and pcoll2) and > execute that fragment. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=389781=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389781 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 20/Feb/20 03:55 Start Date: 20/Feb/20 03:55 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10731: [BEAM-7926] Data-centric Interactive Part3 URL: https://github.com/apache/beam/pull/10731#issuecomment-588596780 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389781) Time Spent: 47h 50m (was: 47h 40m) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 47h 50m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline is defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The user can call a single function and get auto-magical charting of the data. > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built to include only > transforms necessary to produce the desired pcolls (pcoll and pcoll2) and > execute that fragment. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389779 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Feb/20 03:54 Start Date: 20/Feb/20 03:54 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381723978 ## File path: sdks/python/apache_beam/runners/interactive/interactive_environment.py ## @@ -130,6 +139,17 @@ def __init__(self, cache_manager=None): 'You have limited Interactive Beam features since your ' 'ipython kernel is not connected any notebook frontend.') + @property + def options(self): +"""A reference to the global interactive options. + +Provided to avoid import loop or excessive dynamic import. All internal +Interactive Beam modules should access interactive_beam.options through +this property. +""" +from apache_beam.runners.interactive.interactive_beam import options +return options Review comment: If all the getters/setters are available in `interactive_beam`, why does the user need to use this setter here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389779) Time Spent: 66h 10m (was: 66h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 66h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
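The `options` property quoted in the review above defers its import to call time, which is a common way to sidestep an import loop. The following self-contained sketch shows only that mechanism. The module name `demo_interactive_beam` and the class name `InteractiveEnvironmentSketch` are hypothetical stand-ins, and the fake module is registered in `sys.modules` purely so the example runs without Beam installed.

```python
import sys
import types


class InteractiveEnvironmentSketch(object):
    """Hypothetical stand-in for the environment class in the diff."""

    @property
    def options(self):
        # The import runs on each property access, not when this class is
        # defined, so the target module only has to exist by then.
        from demo_interactive_beam import options
        return options


# The stand-in module is registered only after the class above is defined,
# which would fail with a top-level import of demo_interactive_beam.
demo_module = types.ModuleType("demo_interactive_beam")
demo_module.options = {"enable_capture_replay": True}
sys.modules["demo_interactive_beam"] = demo_module

resolved = InteractiveEnvironmentSketch().options
```

The property hands back the very same object the module holds, so internal callers and users of the module-level name would see one shared set of options.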
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389778 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Feb/20 03:52 Start Date: 20/Feb/20 03:52 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381723166 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -132,7 +256,22 @@ def is_source_to_cache_changed(user_pipeline): is_changed = not current_signature.issubset(recorded_signature) # The computation of extract_unbounded_source_signature is expensive, track on # change by default. - if is_changed: + if is_changed and update_cached_source_signature: +if ie.current_env().options.enable_capture_replay: + if not recorded_signature: +_LOGGER.info( +'Interactive Beam has detected you have unbounded sources ' +'in your pipeline. In order to have a deterministic replay ' +'of your pipeline: {}'.format( Review comment: I think both cases should result in a complete sentence. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389778) Time Spent: 66h (was: 65h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 66h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
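The change detection visible in the diff above comes down to a subset test: `is_changed = not current_signature.issubset(recorded_signature)`. A minimal sketch of that check follows, with string labels standing in for whatever the real source signatures contain. The function name `is_source_changed` and the sample labels are assumptions made for illustration.

```python
def is_source_changed(current_signature, recorded_signature):
    """True when the pipeline has a source that was not recorded before."""
    return not set(current_signature).issubset(recorded_signature)


recorded = {"source_a", "source_b"}

same_or_fewer = is_source_changed({"source_a"}, recorded)
new_source = is_source_changed({"source_a", "source_c"}, recorded)
first_run = is_source_changed({"source_a"}, set())
```

Because this is a subset test rather than an equality test, dropping a source from the pipeline does not count as a change. Only a source absent from the recorded signature does, which includes the first run, when nothing has been recorded yet.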
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389776=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389776 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Feb/20 03:52 Start Date: 20/Feb/20 03:52 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381722820 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -19,29 +19,113 @@ For internal use only; no backwards-compatibility guarantees. -A background caching job is a job that caches events for all unbounded sources -of a given pipeline. With Interactive Beam, one such job is started when a -pipeline run happens (which produces a main job in contrast to the background +A background caching job is a job that captures events for all capturable +sources of a given pipeline. With Interactive Beam, one such job is started when +a pipeline run happens (which produces a main job in contrast to the background caching job) and meets the following conditions: - #. The pipeline contains unbounded sources. + #. The pipeline contains capturable sources, configured through + interactive_beam.options.capturable_sources. #. No such background job is running. #. No such background job has completed successfully and the cached events are - still valid (invalidated when unbounded sources change in the pipeline). + still valid (invalidated when capturable sources change in the pipeline). Once started, the background caching job runs asynchronously until it hits some -cache size limit. Meanwhile, the main job and future main jobs from the pipeline -will run using the deterministic replay-able cached events until they are -invalidated. +capture limit configured in interactive_beam.options. 
Meanwhile, the main job +and future main jobs from the pipeline will run using the deterministic +replayable captured events until they are invalidated. """ # pytype: skip-file from __future__ import absolute_import +import logging +import threading +import time + import apache_beam as beam -from apache_beam import runners from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.caching import streaming_cache +from apache_beam.runners.runner import PipelineState + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.INFO) + + +class BackgroundCachingJob(object): + """A simple abstraction that controls necessary components of a timed and + [disk] space limited background caching job. + + A background caching job successfully terminates in 2 conditions: + +#. The job is finite and runs into DONE state; Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389776) Time Spent: 65h 40m (was: 65.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 65h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
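The three start conditions listed in the module docstring above can be read as a single predicate. The sketch below encodes them with plain booleans. The function and parameter names are hypothetical, and the real module derives these facts from the pipeline and the interactive environment rather than taking them as arguments.

```python
def should_start_background_caching_job(
    has_capturable_sources, job_is_running, job_is_done, cache_is_valid):
    """Hypothetical predicate mirroring the docstring's three conditions."""
    if not has_capturable_sources:
        return False  # Condition 1: nothing to capture.
    if job_is_running:
        return False  # Condition 2: a background job is already running.
    if job_is_done and cache_is_valid:
        return False  # Condition 3: a finished job's cache is still valid.
    return True


fresh_pipeline = should_start_background_caching_job(True, False, False, False)
already_running = should_start_background_caching_job(True, True, False, False)
valid_cache = should_start_background_caching_job(True, False, True, True)
invalidated = should_start_background_caching_job(True, False, True, False)
```

The last case is the one the docstring calls out: a completed job whose captured events were invalidated by a change in the capturable sources should trigger a new background job.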
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389777 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Feb/20 03:52 Start Date: 20/Feb/20 03:52 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381722850 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -19,29 +19,113 @@ For internal use only; no backwards-compatibility guarantees. -A background caching job is a job that caches events for all unbounded sources -of a given pipeline. With Interactive Beam, one such job is started when a -pipeline run happens (which produces a main job in contrast to the background +A background caching job is a job that captures events for all capturable +sources of a given pipeline. With Interactive Beam, one such job is started when +a pipeline run happens (which produces a main job in contrast to the background caching job) and meets the following conditions: - #. The pipeline contains unbounded sources. + #. The pipeline contains capturable sources, configured through + interactive_beam.options.capturable_sources. #. No such background job is running. #. No such background job has completed successfully and the cached events are - still valid (invalidated when unbounded sources change in the pipeline). + still valid (invalidated when capturable sources change in the pipeline). Once started, the background caching job runs asynchronously until it hits some -cache size limit. Meanwhile, the main job and future main jobs from the pipeline -will run using the deterministic replay-able cached events until they are -invalidated. +capture limit configured in interactive_beam.options. 
Meanwhile, the main job +and future main jobs from the pipeline will run using the deterministic +replayable captured events until they are invalidated. """ # pytype: skip-file from __future__ import absolute_import +import logging +import threading +import time + import apache_beam as beam -from apache_beam import runners from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.caching import streaming_cache +from apache_beam.runners.runner import PipelineState + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.INFO) + + +class BackgroundCachingJob(object): + """A simple abstraction that controls necessary components of a timed and + [disk] space limited background caching job. Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389777) Time Spent: 65h 50m (was: 65h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 65h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389775=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389775 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Feb/20 03:51 Start Date: 20/Feb/20 03:51 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381722658 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -19,29 +19,113 @@ For internal use only; no backwards-compatibility guarantees. -A background caching job is a job that caches events for all unbounded sources -of a given pipeline. With Interactive Beam, one such job is started when a -pipeline run happens (which produces a main job in contrast to the background +A background caching job is a job that captures events for all capturable +sources of a given pipeline. With Interactive Beam, one such job is started when +a pipeline run happens (which produces a main job in contrast to the background caching job) and meets the following conditions: - #. The pipeline contains unbounded sources. + #. The pipeline contains capturable sources, configured through + interactive_beam.options.capturable_sources. #. No such background job is running. #. No such background job has completed successfully and the cached events are - still valid (invalidated when unbounded sources change in the pipeline). + still valid (invalidated when capturable sources change in the pipeline). Once started, the background caching job runs asynchronously until it hits some -cache size limit. Meanwhile, the main job and future main jobs from the pipeline -will run using the deterministic replay-able cached events until they are -invalidated. +capture limit configured in interactive_beam.options. 
Meanwhile, the main job +and future main jobs from the pipeline will run using the deterministic +replayable captured events until they are invalidated. """ # pytype: skip-file from __future__ import absolute_import +import logging +import threading +import time + import apache_beam as beam -from apache_beam import runners from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.caching import streaming_cache +from apache_beam.runners.runner import PipelineState + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.INFO) Review comment: It has limited effect but it still overrides user choice. Presumably we can have a pipeline option that can set the logging level per module. Also, is not the default logging level info anyway? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389775) Time Spent: 65.5h (was: 65h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 65.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
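The reviewer's point about `_LOGGER.setLevel(logging.INFO)` overriding user choice can be checked with the standard `logging` module alone. In the sketch below the logger names `sketch.inherits` and `sketch.pinned` are made up for the demonstration, and the root level is set explicitly to WARNING, which is Python's documented default for the root logger, so the result does not depend on earlier configuration.

```python
import logging

root = logging.getLogger()
original_root_level = root.level
root.setLevel(logging.WARNING)  # Python's default root level, set explicitly.

inherits = logging.getLogger("sketch.inherits")  # no level of its own
pinned = logging.getLogger("sketch.pinned")
pinned.setLevel(logging.INFO)  # what the module in the diff does


def effective(logger):
    """Name of the level the logger actually filters at."""
    return logging.getLevelName(logger.getEffectiveLevel())


before = (effective(inherits), effective(pinned))
root.setLevel(logging.ERROR)  # the user quiets logging globally
after = (effective(inherits), effective(pinned))

root.setLevel(original_root_level)  # restore the previous configuration
```

The logger without its own level follows the root logger from WARNING to ERROR, while the pinned logger stays at INFO whatever the user configures globally.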
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389774=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389774 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Feb/20 03:49 Start Date: 20/Feb/20 03:49 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381721648 ## File path: sdks/python/apache_beam/runners/interactive/options/capture_control.py ## @@ -15,6 +15,16 @@ # limitations under the License. # +"""Module to control how Interactive Beam captures data from sources for +deterministic replayable PCollection evaluation and pipeline runs. + +For internal use only; no backwards-compatibility guarantees. +""" + +# pytype: skip-file Review comment: Yes, ack. Confused this with mypy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389774) Time Spent: 65h 20m (was: 65h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 65h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9335) update hard-coded coder id when translating Java external transforms
[ https://issues.apache.org/jira/browse/BEAM-9335?focusedWorklogId=389773=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389773 ] ASF GitHub Bot logged work on BEAM-9335: Author: ASF GitHub Bot Created on: 20/Feb/20 03:47 Start Date: 20/Feb/20 03:47 Worklog Time Spent: 10m Work Description: ihji commented on issue #10900: [BEAM-9335] update hard-coded coder id when translating Java external transforms URL: https://github.com/apache/beam/pull/10900#issuecomment-588595193 R: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389773) Time Spent: 20m (was: 10m) > update hard-coded coder id when translating Java external transforms > > > Key: BEAM-9335 > URL: https://issues.apache.org/jira/browse/BEAM-9335 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > hard-coded coder id needs to be updated when translating Java external > transforms. Otherwise pipeline will fail if coder id is reused. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure
[ https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389772=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389772 ] ASF GitHub Bot logged work on BEAM-9341: Author: ASF GitHub Bot Created on: 20/Feb/20 03:45 Start Date: 20/Feb/20 03:45 Worklog Time Spent: 10m Work Description: ihji commented on issue #10912: [BEAM-9341] postcommit xvr flink fix URL: https://github.com/apache/beam/pull/10912#issuecomment-588594812 Ready to merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389772) Time Spent: 1h (was: 50m) > postcommit xvr flink, spark failure > --- > > Key: BEAM-9341 > URL: https://issues.apache.org/jira/browse/BEAM-9341 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9321) BigQuery avro write logical type support
[ https://issues.apache.org/jira/browse/BEAM-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Filipe Regadas updated BEAM-9321: - Priority: Major (was: Minor) > BigQuery avro write logical type support > > > Key: BEAM-9321 > URL: https://issues.apache.org/jira/browse/BEAM-9321 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Filipe Regadas >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing > does not respect Avro <-> BigQuery data type conversion > ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]) > we need to set the useAvroLogicalTypes option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9321) BigQuery avro write logical type support
[ https://issues.apache.org/jira/browse/BEAM-9321?focusedWorklogId=389767=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389767 ] ASF GitHub Bot logged work on BEAM-9321: Author: ASF GitHub Bot Created on: 20/Feb/20 03:24 Start Date: 20/Feb/20 03:24 Worklog Time Spent: 10m Work Description: regadas commented on issue #10869: [BEAM-9321] Add BigQuery Avro logical type support on write URL: https://github.com/apache/beam/pull/10869#issuecomment-588590233 R: @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389767) Time Spent: 40m (was: 0.5h) > BigQuery avro write logical type support > > > Key: BEAM-9321 > URL: https://issues.apache.org/jira/browse/BEAM-9321 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Filipe Regadas >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing > does not respect Avro <-> BigQuery data type conversion > ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]) > we need to set the useAvroLogicalTypes option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments
[ https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389762 ] ASF GitHub Bot logged work on BEAM-9339: Author: ASF GitHub Bot Created on: 20/Feb/20 02:47 Start Date: 20/Feb/20 02:47 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10908: [BEAM-9339] Declare capabilities for Python SDK. URL: https://github.com/apache/beam/pull/10908#discussion_r381688154 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1209,6 +1209,30 @@ message ExternalPayload { map params = 2; // Arbitrary extra parameters to pass } +// These URNs are used to indicate capabilities of environments that cannot +// simply be expressed as a component (such as a Coder or PTransform) that this +// environment understands. +message StandardProtocols { + enum Enum { +// Indicates suport for progress reporting via the legacy metric APIs. +LEGACY_PROGRESS_REPORTING = 0 [(beam_urn) = "beam:protocol:progress_reporting:v0"]; + +// Indicates suport for progress reporting via the new metric APIs. Review comment: nit: should we add a reference to clarify what old/new metrics are ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389762) Time Spent: 1h (was: 50m) > Declare capabilities in SDK environments > > > Key: BEAM-9339 > URL: https://issues.apache.org/jira/browse/BEAM-9339 > Project: Beam > Issue Type: New Feature > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments
[ https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389763 ] ASF GitHub Bot logged work on BEAM-9339: Author: ASF GitHub Bot Created on: 20/Feb/20 02:47 Start Date: 20/Feb/20 02:47 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10908: [BEAM-9339] Declare capabilities for Python SDK. URL: https://github.com/apache/beam/pull/10908#discussion_r381690351 ## File path: sdks/python/apache_beam/transforms/environments_test.py ## @@ -53,10 +63,14 @@ def test_environment_encoding(self): state_cache_size=0, data_buffer_time_limit_ms=0), SubprocessSDKEnvironment(command_string=u'foö')): context = pipeline_context.PipelineContext() - self.assertEqual( - environment, - Environment.from_runner_api( - environment.to_runner_api(context), context)) + proto = environment.to_runner_api(context) + reconstructed = Environment.from_runner_api(proto, context) + self.assertEqual(environment, reconstructed) + self.assertEqual(proto, reconstructed.to_runner_api(context)) + + def test_sdk_capabilities(self): +sdk_capabilities = environments.python_sdk_capabilities() +self.assertIn(common_urns.coders.LENGTH_PREFIX.urn, sdk_capabilities) Review comment: Probably test for every expected capability here ? And also force failure if the test has not been updated to reflect a new capability ? The test can be updated as the list of capabilities are updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389763) Time Spent: 1h (was: 50m) > Declare capabilities in SDK environments > > > Key: BEAM-9339 > URL: https://issues.apache.org/jira/browse/BEAM-9339 > Project: Beam > Issue Type: New Feature > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9338) add postcommit XVR spark badge
[ https://issues.apache.org/jira/browse/BEAM-9338?focusedWorklogId=389760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389760 ] ASF GitHub Bot logged work on BEAM-9338: Author: ASF GitHub Bot Created on: 20/Feb/20 02:37 Start Date: 20/Feb/20 02:37 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10907: [BEAM-9338] add postcommit XVR spark badges URL: https://github.com/apache/beam/pull/10907 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389760) Time Spent: 0.5h (was: 20m) > add postcommit XVR spark badge > -- > > Key: BEAM-9338 > URL: https://issues.apache.org/jira/browse/BEAM-9338 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > add postcommit xvr spark badges -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure
[ https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389755 ] ASF GitHub Bot logged work on BEAM-9341: Author: ASF GitHub Bot Created on: 20/Feb/20 02:24 Start Date: 20/Feb/20 02:24 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10912: [BEAM-9341] postcommit xvr flink fix URL: https://github.com/apache/beam/pull/10912#issuecomment-588576387 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389755) Time Spent: 40m (was: 0.5h) > postcommit xvr flink, spark failure > --- > > Key: BEAM-9341 > URL: https://issues.apache.org/jira/browse/BEAM-9341 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure
[ https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389756 ] ASF GitHub Bot logged work on BEAM-9341: Author: ASF GitHub Bot Created on: 20/Feb/20 02:24 Start Date: 20/Feb/20 02:24 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10912: [BEAM-9341] postcommit xvr flink fix URL: https://github.com/apache/beam/pull/10912#issuecomment-588576428 Run XVR_Spark PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389756) Time Spent: 50m (was: 40m) > postcommit xvr flink, spark failure > --- > > Key: BEAM-9341 > URL: https://issues.apache.org/jira/browse/BEAM-9341 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure
[ https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389749 ] ASF GitHub Bot logged work on BEAM-9341: Author: ASF GitHub Bot Created on: 20/Feb/20 02:10 Start Date: 20/Feb/20 02:10 Worklog Time Spent: 10m Work Description: ihji commented on issue #10912: [BEAM-9341] postcommit xvr flink fix URL: https://github.com/apache/beam/pull/10912#issuecomment-588573076 please run `Run XVR_Flink PostCommit` and `Run XVR_Spark PostCommit` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389749) Time Spent: 0.5h (was: 20m) > postcommit xvr flink, spark failure > --- > > Key: BEAM-9341 > URL: https://issues.apache.org/jira/browse/BEAM-9341 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure
[ https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389748=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389748 ] ASF GitHub Bot logged work on BEAM-9341: Author: ASF GitHub Bot Created on: 20/Feb/20 02:09 Start Date: 20/Feb/20 02:09 Worklog Time Spent: 10m Work Description: ihji commented on issue #10912: [BEAM-9341] postcommit xvr flink fix URL: https://github.com/apache/beam/pull/10912#issuecomment-588572710 R: @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389748) Time Spent: 20m (was: 10m) > postcommit xvr flink, spark failure > --- > > Key: BEAM-9341 > URL: https://issues.apache.org/jira/browse/BEAM-9341 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure
[ https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389747=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389747 ] ASF GitHub Bot logged work on BEAM-9341: Author: ASF GitHub Bot Created on: 20/Feb/20 02:08 Start Date: 20/Feb/20 02:08 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10912: [BEAM-9341] postcommit xvr flink fix URL: https://github.com/apache/beam/pull/10912 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040565#comment-17040565 ] Chamikara Madhusanka Jayalath commented on BEAM-3788: - Added a comment to the email thread. I think this is a Flink specific issue that should go away after SDF. > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments
[ https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389745 ] ASF GitHub Bot logged work on BEAM-9339: Author: ASF GitHub Bot Created on: 20/Feb/20 01:56 Start Date: 20/Feb/20 01:56 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #10911: [BEAM-9339] Declare capabilities for Go SDK. URL: https://github.com/apache/beam/pull/10911#discussion_r381659754 ## File path: sdks/go/pkg/beam/runners/universal/universal.go ## @@ -78,6 +78,20 @@ func Execute(ctx context.Context, p *beam.Pipeline) error { return err } +const ( + urnLegacyProgressReporting = "beam:protocol:progress_reporting:v0" + urnMultiCore = "beam:protocol:multi_core_bundle_processing:v1" +) + +func goCapabilities() []string { + capabilities := []string{ + urnLegacyProgressReporting, + urnLegacyProgressReporting, + } + capabilities = append(capabilities, graphx.KnownStandardCoders()...) + return capabilities +} + Review comment: You can return straight from the append and save a line. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389745) > Declare capabilities in SDK environments > > > Key: BEAM-9339 > URL: https://issues.apache.org/jira/browse/BEAM-9339 > Project: Beam > Issue Type: New Feature > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
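The review suggestion above (return straight from the append) could look roughly like the following. This is only a sketch: `knownStandardCoders` is a hypothetical stand-in for `graphx.KnownStandardCoders()` so the snippet compiles on its own, the two coder URNs it returns are placeholder values, and the second list entry uses `urnMultiCore` on the assumption that the duplicated constant in the diff was unintended, as the other review comment in this thread suggests.

```go
package main

import "fmt"

const (
	urnLegacyProgressReporting = "beam:protocol:progress_reporting:v0"
	urnMultiCore               = "beam:protocol:multi_core_bundle_processing:v1"
)

// knownStandardCoders is a hypothetical stand-in for graphx.KnownStandardCoders();
// the values are placeholders used only to keep this sketch self-contained.
func knownStandardCoders() []string {
	return []string{"beam:coder:bytes:v1", "beam:coder:varint:v1"}
}

// goCapabilities builds the protocol URNs and appends the coder URNs in a
// single expression, so no intermediate variable is needed.
func goCapabilities() []string {
	return append([]string{
		urnLegacyProgressReporting,
		urnMultiCore, // assumption: the diff repeated urnLegacyProgressReporting by accident
	}, knownStandardCoders()...)
}

func main() {
	for _, c := range goCapabilities() {
		fmt.Println(c)
	}
}
```

Appending to a freshly built slice literal is safe here because nothing else holds a reference to it.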
[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments
[ https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389744 ] ASF GitHub Bot logged work on BEAM-9339: Author: ASF GitHub Bot Created on: 20/Feb/20 01:56 Start Date: 20/Feb/20 01:56 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #10911: [BEAM-9339] Declare capabilities for Go SDK. URL: https://github.com/apache/beam/pull/10911#discussion_r381658325 ## File path: sdks/go/pkg/beam/runners/universal/universal.go ## @@ -78,6 +78,20 @@ func Execute(ctx context.Context, p *beam.Pipeline) error { return err } +const ( + urnLegacyProgressReporting = "beam:protocol:progress_reporting:v0" + urnMultiCore = "beam:protocol:multi_core_bundle_processing:v1" +) + +func goCapabilities() []string { + capabilities := []string{ + urnLegacyProgressReporting, + urnLegacyProgressReporting, Review comment: Accidentally duped. urnMultiCore? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389744) Time Spent: 50m (was: 40m) > Declare capabilities in SDK environments > > > Key: BEAM-9339 > URL: https://issues.apache.org/jira/browse/BEAM-9339 > Project: Beam > Issue Type: New Feature > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
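A duplicated entry like the one flagged above is easy to miss by eye. One way to catch it mechanically, sketched here with a hypothetical helper named `duplicates` that is not part of the pull request, is to count occurrences of each URN and report any that repeat.

```go
package main

import "fmt"

// duplicates returns each URN that occurs more than once in caps.
// Each repeated URN is reported once, at the point of its second occurrence.
// This helper is hypothetical and only illustrates the check.
func duplicates(caps []string) []string {
	seen := make(map[string]int)
	var dups []string
	for _, c := range caps {
		seen[c]++
		if seen[c] == 2 {
			dups = append(dups, c)
		}
	}
	return dups
}

func main() {
	// The first two entries mirror the list from the quoted diff.
	caps := []string{
		"beam:protocol:progress_reporting:v0",
		"beam:protocol:progress_reporting:v0",
	}
	fmt.Println(duplicates(caps))
}
```

A unit test could assert that this helper returns an empty result for the SDK's capability list.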
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040558#comment-17040558 ] Chad Dombrova commented on BEAM-3788: - Unless something has changed recently, https://issues.apache.org/jira/browse/BEAM-7870 is still a blocker for using KafkaIO in python out of the box. As the title suggests, it's also blocking PubSubIO in python and conceptually any external transform with a non-trivial coder. [~mxm], [~bhulette] has anything changed on that issue lately? > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389742 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 20/Feb/20 01:37 Start Date: 20/Feb/20 01:37 Worklog Time Spent: 10m Work Description: yifanzou commented on issue #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#issuecomment-588565094 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389742) Time Spent: 3h (was: 2h 50m) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 3h > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389740 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 20/Feb/20 01:36 Start Date: 20/Feb/20 01:36 Worklog Time Spent: 10m Work Description: yifanzou commented on issue #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#issuecomment-588564792 retest this please. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389740) Time Spent: 2h 50m (was: 2h 40m) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389739 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 20/Feb/20 01:35 Start Date: 20/Feb/20 01:35 Worklog Time Spent: 10m Work Description: yifanzou commented on issue #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#issuecomment-588564714 test this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389739) Time Spent: 2h 40m (was: 2.5h) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9341) postcommit xvr flink, spark failure
[ https://issues.apache.org/jira/browse/BEAM-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-9341: -- Description: started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/] > postcommit xvr flink, spark failure > --- > > Key: BEAM-9341 > URL: https://issues.apache.org/jira/browse/BEAM-9341 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9341) postcommit xvr flink, spark failure
[ https://issues.apache.org/jira/browse/BEAM-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-9341: -- Status: Open (was: Triage Needed) > postcommit xvr flink, spark failure > --- > > Key: BEAM-9341 > URL: https://issues.apache.org/jira/browse/BEAM-9341 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9341) postcommit xvr flink, spark failure
Heejong Lee created BEAM-9341: - Summary: postcommit xvr flink, spark failure Key: BEAM-9341 URL: https://issues.apache.org/jira/browse/BEAM-9341 Project: Beam Issue Type: Bug Components: java-fn-execution Reporter: Heejong Lee Assignee: Heejong Lee -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments
[ https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389733=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389733 ] ASF GitHub Bot logged work on BEAM-9339: Author: ASF GitHub Bot Created on: 20/Feb/20 01:22 Start Date: 20/Feb/20 01:22 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10911: [BEAM-9339] Declare capabilities for Go SDK. URL: https://github.com/apache/beam/pull/10911#issuecomment-588561432 R: @lostluck This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389733) Time Spent: 40m (was: 0.5h) > Declare capabilities in SDK environments > > > Key: BEAM-9339 > URL: https://issues.apache.org/jira/browse/BEAM-9339 > Project: Beam > Issue Type: New Feature > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments
[ https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389731=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389731 ] ASF GitHub Bot logged work on BEAM-9339: Author: ASF GitHub Bot Created on: 20/Feb/20 01:21 Start Date: 20/Feb/20 01:21 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10911: [BEAM-9339] Declare capabilities for Go SDK. URL: https://github.com/apache/beam/pull/10911 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-3545) Fn API metrics in Go SDK harness
[ https://issues.apache.org/jira/browse/BEAM-3545?focusedWorklogId=389728=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389728 ] ASF GitHub Bot logged work on BEAM-3545: Author: ASF GitHub Bot Created on: 20/Feb/20 01:01 Start Date: 20/Feb/20 01:01 Worklog Time Spent: 10m Work Description: youngoli commented on issue #10906: [BEAM-3545] Fix race condition w/plan metrics. URL: https://github.com/apache/beam/pull/10906#issuecomment-588556050 That makes sense. I do kinda like moving the metrics store outside of the plan and into the harness/direct runner. It doesn't seem like the plan actually needs the store, it's just being used as a container for the harness/direct runner to access and pass it to monitoring. But it's such a small difference that I have no objections to leaving it as-is. I was mainly concerned whether there was another race condition, not so much about style. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389728) Time Spent: 17h 10m (was: 17h) > Fn API metrics in Go SDK harness > > > Key: BEAM-3545 > URL: https://issues.apache.org/jira/browse/BEAM-3545 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Kenneth Knowles >Assignee: Robert Burke >Priority: Major > Labels: portability > Time Spent: 17h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389720 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 20/Feb/20 00:53 Start Date: 20/Feb/20 00:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381610998 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -478,6 +544,19 @@ def __init__(self, options): get_credentials=(not self.google_cloud_options.no_auth), http=http_client, response_encoding=get_response_encoding()) +self._sdk_image_overrides = self._get_sdk_image_overrides(options) + + def _get_sdk_image_overrides(self, pipeline_options): +worker_options = pipeline_options.view_as(WorkerOptions) +sdk_overrides = worker_options.sdk_harness_container_image_overrides +overrides_dict = dict() Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389720) Time Spent: 4h 50m (was: 4h 40m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389714 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 20/Feb/20 00:53 Start Date: 20/Feb/20 00:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381609441 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -301,6 +351,22 @@ def __init__(self, packages, options, environment_version, pipeline_url): dataflow.Environment.SdkPipelineOptionsValue.AdditionalProperty( key='display_data', value=to_json_value(items))) + def _get_environments_from_tranforms(self): Review comment: I thought about this and I think it's safer to collect the exact set of environments that we need to execute the pipeline instead of depending on all the environments available in the proto. There's nothing that prevents an external transform (or any other place where we update the proto) from including additional environments that are not needed to execute the pipeline. For example, the default environment set by the Java SDK does not work without replacing the container image (we happen to use the same environment_id for the replacement, but it could have been otherwise). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389714) Time Spent: 4h 10m (was: 4h) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
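The review comment above argues for collecting only the environments that transforms actually reference. A minimal sketch of that idea follows. It assumes plain dataclasses (`Transform`, `Components`) as hypothetical stand-ins for the pipeline proto, and the helper name `environments_in_use` is illustrative, not the PR's code.

```python
from dataclasses import dataclass, field


@dataclass
class Transform:
    # Hypothetical stand-in for a transform proto; only the field used here.
    environment_id: str = ''


@dataclass
class Components:
    # Hypothetical stand-in for pipeline components.
    transforms: dict = field(default_factory=dict)
    environments: dict = field(default_factory=dict)


def environments_in_use(components):
    """Return the environments referenced by at least one transform.

    Environments present in the components but not referenced by any
    transform are skipped, as are transforms with an empty environment_id.
    """
    seen = []
    for transform in components.transforms.values():
        env_id = transform.environment_id
        if env_id and env_id not in seen:
            seen.append(env_id)
    return [components.environments[env_id] for env_id in seen]
```

With this shape, an extra environment added to the components by an external transform is ignored unless some transform points at it.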
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389719=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389719 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 20/Feb/20 00:53 Start Date: 20/Feb/20 00:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381611280 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -576,10 +655,25 @@ def create_job(self, job): 'A template was just created at location %s', template_location) return None + def _apply_sdk_environment_overrides(self, proto_pipeline): +# Update environments based on user provided overrides +sdk_overrides = self._sdk_image_overrides +if sdk_overrides: + for environment in proto_pipeline.components.environments.values(): +docker_payload = proto_utils.parse_Bytes( +environment.payload, beam_runner_api_pb2.DockerPayload) +for pattern in sdk_overrides: Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389719) Time Spent: 4h 40m (was: 4.5h) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389715 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 20/Feb/20 00:53 Start Date: 20/Feb/20 00:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381607910 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -748,6 +748,17 @@ def _add_argparse_args(cls, parser): 'worker harness. Default is the container for the version of the ' 'SDK. Note: currently, only approved Google Cloud Dataflow ' 'container images may be used here.')) +parser.add_argument( +'--sdk_harness_container_image_overrides', Review comment: Currently this has to be provided for cross-language transforms to work (for Dataflow), since the container URL that we get from Beam does not work for Dataflow. I hope to introduce logic to derive the Java container URL from Python in a follow-up PR. After that, this option will be primarily for testing or for users to override the container (similar to the worker_harness_container_image option today). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389715) Time Spent: 4h 10m (was: 4h) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
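To illustrate how an option like `--sdk_harness_container_image_overrides` could be consumed, here is a small standalone sketch. Only the flag name comes from the diff above. The repeatable `action='append'` form, the `"<regex>,<image>"` value format, the help text, and the helper names `build_parser` and `parse_overrides` are assumptions made for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Assumed shape: the flag may be repeated, each value "<regex>,<image>".
    parser.add_argument(
        '--sdk_harness_container_image_overrides',
        action='append',
        default=None,
        help='Override of the form "<regex>,<replacement image>".')
    return parser


def parse_overrides(values):
    """Turn a list of "<regex>,<image>" strings into a pattern -> image dict."""
    overrides = {}
    for value in values or []:
        # Split on the first comma only, so image names may contain commas.
        pattern, image = value.split(',', 1)
        overrides[pattern] = image
    return overrides
```

For example, passing `--sdk_harness_container_image_overrides '.*java.*,gcr.io/example/java:latest'` (a made-up image name) would yield a one-entry dict mapping the pattern to the replacement image.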
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389718=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389718 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 20/Feb/20 00:53 Start Date: 20/Feb/20 00:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381626779 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py ## @@ -329,7 +402,7 @@ def test_harness_override_default_in_released_sdks(self): env = apiclient.Environment([], #packages pipeline_options, '2.0.0', #any environment version -FAKE_PIPELINE_URL) +FAKE_PIPELINE_URL, None, None) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389718) Time Spent: 4h 40m (was: 4.5h) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389722=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389722 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 20/Feb/20 00:53 Start Date: 20/Feb/20 00:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381615673 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -576,10 +655,25 @@ def create_job(self, job): 'A template was just created at location %s', template_location) return None + def _apply_sdk_environment_overrides(self, proto_pipeline): +# Update environments based on user provided overrides +sdk_overrides = self._sdk_image_overrides +if sdk_overrides: + for environment in proto_pipeline.components.environments.values(): +docker_payload = proto_utils.parse_Bytes( +environment.payload, beam_runner_api_pb2.DockerPayload) +for pattern in sdk_overrides: + override = sdk_overrides[pattern] + if re.match(pattern, docker_payload.container_image): +new_payload = beam_runner_api_pb2.DockerPayload( Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389722) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389721 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 20/Feb/20 00:53 Start Date: 20/Feb/20 00:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381619622 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -576,10 +655,25 @@ def create_job(self, job): 'A template was just created at location %s', template_location) return None + def _apply_sdk_environment_overrides(self, proto_pipeline): +# Update environments based on user provided overrides +sdk_overrides = self._sdk_image_overrides +if sdk_overrides: + for environment in proto_pipeline.components.environments.values(): +docker_payload = proto_utils.parse_Bytes( +environment.payload, beam_runner_api_pb2.DockerPayload) +for pattern in sdk_overrides: + override = sdk_overrides[pattern] + if re.match(pattern, docker_payload.container_image): +new_payload = beam_runner_api_pb2.DockerPayload( +container_image=override) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389721) Time Spent: 5h (was: 4h 50m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
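The override loop in the diff above matches each environment's container image against user-supplied patterns with `re.match`. The sketch below shows the same matching idea on plain dicts instead of `DockerPayload` protos. The function name `apply_image_overrides`, the dict-based inputs, the example image names, and the choice to stop at the first matching pattern are all assumptions of this sketch.

```python
import re


def apply_image_overrides(images, overrides):
    """Return a copy of images with matching container images replaced.

    images: environment id -> container image string.
    overrides: regex pattern -> replacement image string.
    """
    result = {}
    for env_id, image in images.items():
        for pattern, replacement in overrides.items():
            # re.match anchors at the start of the string, so a pattern
            # meant to match anywhere needs a leading ".*".
            if re.match(pattern, image):
                image = replacement
                break  # Sketch choice: first matching pattern wins.
        result[env_id] = image
    return result
```

Note the anchoring behaviour: a pattern such as `java` would not match `apache/beam_java_sdk:2.20.0`, while `.*java.*` would.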
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389716=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389716 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 20/Feb/20 00:53 Start Date: 20/Feb/20 00:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381608521 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -153,6 +158,9 @@ def __init__(self, packages, options, environment_version, pipeline_url): # User agent information. self.proto.userAgent = dataflow.Environment.UserAgentValue() self.local = 'localhost' in self.google_cloud_options.dataflow_endpoint +self._proto_pipeline = proto_pipeline +self._sdk_image_overrides = ( Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389716) Time Spent: 4h 20m (was: 4h 10m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389717 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 20/Feb/20 00:53 Start Date: 20/Feb/20 00:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381610512 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -301,6 +351,22 @@ def __init__(self, packages, options, environment_version, pipeline_url): dataflow.Environment.SdkPipelineOptionsValue.AdditionalProperty( key='display_data', value=to_json_value(items))) + def _get_environments_from_tranforms(self): +if not self._proto_pipeline: + return [] +environment_ids = [] +for transform in self._proto_pipeline.components.transforms.values(): + if transform.environment_id not in environment_ids: +environment_ids.append(transform.environment_id) +environments = [] +for environment_id in environment_ids: + if not environment_id: Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389717) Time Spent: 4.5h (was: 4h 20m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3545) Fn API metrics in Go SDK harness
[ https://issues.apache.org/jira/browse/BEAM-3545?focusedWorklogId=389708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389708 ] ASF GitHub Bot logged work on BEAM-3545: Author: ASF GitHub Bot Created on: 20/Feb/20 00:40 Start Date: 20/Feb/20 00:40 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #10906: [BEAM-3545] Fix race condition w/plan metrics. URL: https://github.com/apache/beam/pull/10906 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389708) Time Spent: 17h (was: 16h 50m) > Fn API metrics in Go SDK harness > > > Key: BEAM-3545 > URL: https://issues.apache.org/jira/browse/BEAM-3545 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Kenneth Knowles >Assignee: Robert Burke >Priority: Major > Labels: portability > Time Spent: 17h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3545) Fn API metrics in Go SDK harness
[ https://issues.apache.org/jira/browse/BEAM-3545?focusedWorklogId=389707=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389707 ] ASF GitHub Bot logged work on BEAM-3545: Author: ASF GitHub Bot Created on: 20/Feb/20 00:40 Start Date: 20/Feb/20 00:40 Worklog Time Spent: 10m Work Description: lostluck commented on issue #10906: [BEAM-3545] Fix race condition w/plan metrics. URL: https://github.com/apache/beam/pull/10906#issuecomment-588550039 It's more that Down shouldn't be called while Execute/Process is running. It's part of the contract for Units (see exec/unit.go) that they aren't required to be concurrency safe. The metrics do mess up this formulation though, since they *must* be accessible outside of the plan execution. I did consider instead simply having the store created outside of the plan, which would mean the harness would need to handle the asynchronous storage/access for it (probably as part of one of the maps in harness). That would probably save a lock. What do you think? There are several more changes coming here for PCollection and PTransform metrics, and while I don't like churn, I'll be following up in another PR anyway, so it can be cleaned up then. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389707) Time Spent: 16h 50m (was: 16h 40m) > Fn API metrics in Go SDK harness > > > Key: BEAM-3545 > URL: https://issues.apache.org/jira/browse/BEAM-3545 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Kenneth Knowles >Assignee: Robert Burke >Priority: Major > Labels: portability > Time Spent: 16h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
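The alternative discussed above, where the harness owns the metrics store and the plan only writes to it, can be sketched as follows. The actual change is in the Go SDK; this is a Python illustration of the locking idea only, and the class names `MetricsStore` and `Harness` and the method names are hypothetical.

```python
import threading


class MetricsStore:
    """Counter store that is safe to read while another thread updates it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def inc(self, name, delta=1):
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + delta

    def snapshot(self):
        # Return a copy so callers never observe a dict being mutated.
        with self._lock:
            return dict(self._counters)


class Harness:
    """Owns one store per instruction id, outside of any plan object."""

    def __init__(self):
        self._stores_lock = threading.Lock()
        self._stores = {}

    def store_for(self, instruction_id):
        with self._stores_lock:
            return self._stores.setdefault(instruction_id, MetricsStore())
```

In this arrangement the executing plan calls `inc` on the store it was handed, and a monitoring request calls `snapshot` through the harness without touching the plan itself.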
[jira] [Work logged] (BEAM-9340) Properly populate pipeline proto requirements.
[ https://issues.apache.org/jira/browse/BEAM-9340?focusedWorklogId=389705=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389705 ] ASF GitHub Bot logged work on BEAM-9340: Author: ASF GitHub Bot Created on: 20/Feb/20 00:33 Start Date: 20/Feb/20 00:33 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10909: [BEAM-9340] Populate requirements for Python DoFn properties. URL: https://github.com/apache/beam/pull/10909 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Created] (BEAM-9340) Properly populate pipeline proto requirements.
Robert Bradshaw created BEAM-9340: - Summary: Properly populate pipeline proto requirements. Key: BEAM-9340 URL: https://issues.apache.org/jira/browse/BEAM-9340 Project: Beam Issue Type: New Feature Components: beam-model, sdk-go, sdk-java-core, sdk-py-core Reporter: Robert Bradshaw -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389703=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389703 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 20/Feb/20 00:25 Start Date: 20/Feb/20 00:25 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#issuecomment-588545387 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389703) Time Spent: 2.5h (was: 2h 20m) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based on Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389702 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 20/Feb/20 00:24 Start Date: 20/Feb/20 00:24 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#issuecomment-588545387 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389702) Time Spent: 2h 20m (was: 2h 10m) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based on Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments
[ https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389700=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389700 ] ASF GitHub Bot logged work on BEAM-9339: Author: ASF GitHub Bot Created on: 20/Feb/20 00:23 Start Date: 20/Feb/20 00:23 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10908: [BEAM-9339] Declare capabilities for Python SDK. URL: https://github.com/apache/beam/pull/10908#issuecomment-588544811 R: @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389700) Time Spent: 20m (was: 10m) > Declare capabilities in SDK environments > > > Key: BEAM-9339 > URL: https://issues.apache.org/jira/browse/BEAM-9339 > Project: Beam > Issue Type: New Feature > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments
[ https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389698=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389698 ] ASF GitHub Bot logged work on BEAM-9339: Author: ASF GitHub Bot Created on: 20/Feb/20 00:22 Start Date: 20/Feb/20 00:22 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10908: [BEAM-9339] Declare capabilities for Python SDK. URL: https://github.com/apache/beam/pull/10908 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389695=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389695 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Feb/20 00:14 Start Date: 20/Feb/20 00:14 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381621290 ## File path: sdks/python/apache_beam/runners/interactive/options/capture_control.py ## @@ -0,0 +1,80 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Module to control how Interactive Beam captures data from sources for +deterministic replayable PCollection evaluation and pipeline runs. + +For internal use only; no backwards-compatibility guarantees. 
+""" + +# pytype: skip-file + +from __future__ import absolute_import + +import logging +from datetime import timedelta + +from apache_beam.io.gcp.pubsub import ReadFromPubSub +from apache_beam.runners.interactive import background_caching_job as bcj +from apache_beam.runners.interactive import interactive_environment as ie + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.INFO) Review comment: We intend to log at info level for this module. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389695) Time Spent: 65h 10m (was: 65h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 65h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=389691=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389691 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 20/Feb/20 00:01 Start Date: 20/Feb/20 00:01 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r381617505 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1158,6 +1158,90 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for locally-accessible artifact files. +// payload: ArtifactFilePayload +FILE = 0 [(beam_urn) = "beam:artifact:type:file:v1"]; + +// A URN for artifacts described by URLs. +// payload: ArtifactUrlPayload +URL = 1 [(beam_urn) = "beam:artifact:type:url:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"]; + +// A URN for Python artifacts hosted on PYPI. +// payload: PypiPayload +PYPI = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"]; + +// A URN for Java artifacts hosted on a Maven repository. +// payload: MavenPayload +MAVEN= 4 [(beam_urn) = "beam:artifact:type:maven:v1"]; + } + enum Roles { +// A URN for staging-to role. +// payload: ArtifactStagingToRolePayload +STAGING_TO = 0 [(beam_urn) = "beam:artifact:role:staging_to:v1"]; + } +} + +message ArtifactFilePayload { + // a string for an artifact path e.g. "/tmp/foo.jar" + string path = 1; + + // The hex-encoded sha256 checksum of the artifact. + string sha256 = 2; +} + +message ArtifactUrlPayload { + // a string for an artifact URL e.g. "https://.../foo.jar" or "gs://tmp/foo.jar" + string path = 1; +} + +message EmbeddedFilePayload { + // raw data bytes for an embedded artifact + bytes data = 1; Review comment: Done. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389691) Time Spent: 7h 10m (was: 7h) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
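The quoted `ArtifactFilePayload` carries a path plus a hex-encoded sha256 checksum. As a rough illustration of how such a checksum could be produced, here is a sketch using the standard `hashlib` module; `file_sha256_hex` and `make_file_payload` are hypothetical names, and the plain dict only stands in for the proto message, it is not Beam's generated class:

```python
import hashlib


def file_sha256_hex(path, chunk_size=1 << 20):
    """Returns the hex-encoded SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read in chunks so a large artifact is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def make_file_payload(path):
    """Hypothetical stand-in for ArtifactFilePayload, as a plain dict."""
    return {'path': path, 'sha256': file_sha256_hex(path)}
```

A hex digest of SHA-256 is always 64 characters long, which makes a malformed `sha256` field easy to reject before any file is read.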
[jira] [Created] (BEAM-9339) Declare capabilities in SDK environments
Robert Bradshaw created BEAM-9339: - Summary: Declare capabilities in SDK environments Key: BEAM-9339 URL: https://issues.apache.org/jira/browse/BEAM-9339 Project: Beam Issue Type: New Feature Components: sdk-go, sdk-java-harness, sdk-py-harness Reporter: Robert Bradshaw -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9338) add postcommit XVR spark badge
[ https://issues.apache.org/jira/browse/BEAM-9338?focusedWorklogId=389685=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389685 ] ASF GitHub Bot logged work on BEAM-9338: Author: ASF GitHub Bot Created on: 19/Feb/20 23:53 Start Date: 19/Feb/20 23:53 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10907: [BEAM-9338] add postcommit XVR spark badges URL: https://github.com/apache/beam/pull/10907
[jira] [Work logged] (BEAM-9338) add postcommit XVR spark badge
[ https://issues.apache.org/jira/browse/BEAM-9338?focusedWorklogId=389686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389686 ] ASF GitHub Bot logged work on BEAM-9338: Author: ASF GitHub Bot Created on: 19/Feb/20 23:53 Start Date: 19/Feb/20 23:53 Worklog Time Spent: 10m Work Description: ihji commented on issue #10907: [BEAM-9338] add postcommit XVR spark badges URL: https://github.com/apache/beam/pull/10907#issuecomment-588535760 R: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389686) Time Spent: 20m (was: 10m) > add postcommit XVR spark badge > -- > > Key: BEAM-9338 > URL: https://issues.apache.org/jira/browse/BEAM-9338 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > add postcommit xvr spark badges -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=389683=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389683 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 19/Feb/20 23:52 Start Date: 19/Feb/20 23:52 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r381614435 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1158,6 +1158,90 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for locally-accessible artifact files. +// payload: ArtifactFilePayload +FILE = 0 [(beam_urn) = "beam:artifact:type:file:v1"]; + +// A URN for artifacts described by URLs. +// payload: ArtifactUrlPayload +URL = 1 [(beam_urn) = "beam:artifact:type:url:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"]; + +// A URN for Python artifacts hosted on PYPI. +// payload: PypiPayload +PYPI = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"]; + +// A URN for Java artifacts hosted on a Maven repository. +// payload: MavenPayload +MAVEN= 4 [(beam_urn) = "beam:artifact:type:maven:v1"]; + } + enum Roles { +// A URN for staging-to role. +// payload: ArtifactStagingToRolePayload +STAGING_TO = 0 [(beam_urn) = "beam:artifact:role:staging_to:v1"]; + } +} + +message ArtifactFilePayload { + // a string for an artifact path e.g. "/tmp/foo.jar" + string path = 1; + + // The hex-encoded sha256 checksum of the artifact. + string sha256 = 2; +} + +message ArtifactUrlPayload { + // a string for an artifact URL e.g. 
"https://.../foo.jar" or "gs://tmp/foo.jar" + string path = 1; +} + +message EmbeddedFilePayload { + // raw data bytes for an embedded artifact + bytes data = 1; Review comment: We could consider just letting the payload be the raw bytes (and similarly for other one-field payloads). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389683) Time Spent: 7h (was: 6h 50m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9338) add postcommit XVR spark badge
[ https://issues.apache.org/jira/browse/BEAM-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-9338: -- Summary: add postcommit XVR spark badge (was: add postcommit XVR spark tickers) > add postcommit XVR spark badge > -- > > Key: BEAM-9338 > URL: https://issues.apache.org/jira/browse/BEAM-9338 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > add postcommit xvr spark tickers -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9338) add postcommit XVR spark badge
[ https://issues.apache.org/jira/browse/BEAM-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-9338: -- Description: add postcommit xvr spark badges (was: add postcommit xvr spark tickers) > add postcommit XVR spark badge > -- > > Key: BEAM-9338 > URL: https://issues.apache.org/jira/browse/BEAM-9338 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > add postcommit xvr spark badges -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9338) add postcommit XVR spark tickers
Heejong Lee created BEAM-9338: - Summary: add postcommit XVR spark tickers Key: BEAM-9338 URL: https://issues.apache.org/jira/browse/BEAM-9338 Project: Beam Issue Type: Bug Components: website Reporter: Heejong Lee Assignee: Heejong Lee add postcommit xvr spark tickers -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9338) add postcommit XVR spark tickers
[ https://issues.apache.org/jira/browse/BEAM-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-9338: -- Status: Open (was: Triage Needed) > add postcommit XVR spark tickers > > > Key: BEAM-9338 > URL: https://issues.apache.org/jira/browse/BEAM-9338 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > add postcommit xvr spark tickers -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9338) add postcommit XVR spark tickers
[ https://issues.apache.org/jira/browse/BEAM-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-9338: -- Issue Type: Improvement (was: Bug) > add postcommit XVR spark tickers > > > Key: BEAM-9338 > URL: https://issues.apache.org/jira/browse/BEAM-9338 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > add postcommit xvr spark tickers -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3545) Fn API metrics in Go SDK harness
[ https://issues.apache.org/jira/browse/BEAM-3545?focusedWorklogId=389678=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389678 ] ASF GitHub Bot logged work on BEAM-3545: Author: ASF GitHub Bot Created on: 19/Feb/20 23:34 Start Date: 19/Feb/20 23:34 Worklog Time Spent: 10m Work Description: lostluck commented on issue #10906: [BEAM-3545] Fix race condition w/plan metrics. URL: https://github.com/apache/beam/pull/10906#issuecomment-588529911 R: @youngoli This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389678) Time Spent: 16h 40m (was: 16.5h) > Fn API metrics in Go SDK harness > > > Key: BEAM-3545 > URL: https://issues.apache.org/jira/browse/BEAM-3545 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Kenneth Knowles >Assignee: Robert Burke >Priority: Major > Labels: portability > Time Spent: 16h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3545) Fn API metrics in Go SDK harness
[ https://issues.apache.org/jira/browse/BEAM-3545?focusedWorklogId=389677=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389677 ] ASF GitHub Bot logged work on BEAM-3545: Author: ASF GitHub Bot Created on: 19/Feb/20 23:34 Start Date: 19/Feb/20 23:34 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #10906: [BEAM-3545] Fix race condition w/plan metrics. URL: https://github.com/apache/beam/pull/10906 The last change introduced a race condition when initializing a plan. There's a small, but still open window where the progress reporting goroutine might try to read the value while it's being written to. A mutex clears this up.
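The PR description above says a mutex closes the window in which the progress-reporting goroutine could read a value while it is being written. The actual change is in the Go SDK and is not quoted here; the following is only a sketch of the same idea in Python, with `threading.Lock` standing in for a Go mutex. `PlanMetrics`, `set_store`, and `snapshot` are hypothetical names, not the Go SDK's types:

```python
import threading


class PlanMetrics(object):
    """Hypothetical holder for a value shared between two threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._store = None  # Assigned later, during plan initialization.

    def set_store(self, store):
        # Writer side: hold the lock while the shared value is replaced.
        with self._lock:
            self._store = store

    def snapshot(self):
        # Reader side: the progress reporter takes the same lock, so it
        # cannot observe the value in the middle of a write.
        with self._lock:
            return dict(self._store) if self._store is not None else {}
```

In CPython a single attribute assignment is already atomic, so the lock here mainly illustrates the pattern; in Go, unsynchronized concurrent reads and writes of the same variable are a data race, which is why the original fix needs the mutex.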
[jira] [Resolved] (BEAM-3306) Consider: Go coder registry
[ https://issues.apache.org/jira/browse/BEAM-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-3306. Fix Version/s: Not applicable Resolution: Fixed Go supports a coder registry via beam.RegisterCoder. Remaining work might be to optionally support "direct" access to an io.Reader or io.Writer interface, which could yield efficiency gains in some situations for user types. > Consider: Go coder registry > --- > > Key: BEAM-3306 > URL: https://issues.apache.org/jira/browse/BEAM-3306 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Robert Burke >Priority: Minor > Fix For: Not applicable > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Add coder registry to allow easier overwrite of default coders. We may also > allow otherwise un-encodable types, but that would require that function > analysis depends on it. > If we're hardcoding support for proto/avro, then there may be little need for > such a feature. Conversely, this may be how we implement such support. > > Proposal Doc: > [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9287) Python Validates runner tests for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9287?focusedWorklogId=389672=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389672 ] ASF GitHub Bot logged work on BEAM-9287: Author: ASF GitHub Bot Created on: 19/Feb/20 22:58 Start Date: 19/Feb/20 22:58 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10863: [BEAM-9287] Add Python streaming Validates runner tests for Unified Worker URL: https://github.com/apache/beam/pull/10863 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389672) Time Spent: 1h 10m (was: 1h) > Python Validates runner tests for Unified Worker > > > Key: BEAM-9287 > URL: https://issues.apache.org/jira/browse/BEAM-9287 > Project: Beam > Issue Type: Test > Components: runner-dataflow, testing >Reporter: Ankur Goenka >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=389671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389671 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 19/Feb/20 22:55 Start Date: 19/Feb/20 22:55 Worklog Time Spent: 10m Work Description: ihji commented on issue #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#issuecomment-588516398 @robertwb @lukecwik PTAL. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389671) Time Spent: 6h 50m (was: 6h 40m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389670=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389670 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 22:55 Start Date: 19/Feb/20 22:55 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381594919 ## File path: sdks/python/apache_beam/runners/interactive/options/capture_control.py ## @@ -0,0 +1,80 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Module to control how Interactive Beam captures data from sources for +deterministic replayable PCollection evaluation and pipeline runs. + +For internal use only; no backwards-compatibility guarantees. 
+""" + +# pytype: skip-file + +from __future__ import absolute_import + +import logging +from datetime import timedelta + +from apache_beam.io.gcp.pubsub import ReadFromPubSub +from apache_beam.runners.interactive import background_caching_job as bcj +from apache_beam.runners.interactive import interactive_environment as ie + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.INFO) + + +class CaptureControl(object): + """Options and their utilities that controls how Interactive Beam captures + deterministic replayable data from sources.""" + def __init__(self): +self._enable_capture_replay = True +self._capturable_sources = { +ReadFromPubSub, +} # yapf: disable Review comment: Removing the disable statement. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389670) Time Spent: 65h (was: 64h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 65h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-4032) Support staging binary distributions of dependency packages.
[ https://issues.apache.org/jira/browse/BEAM-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040489#comment-17040489 ] Ankur Goenka commented on BEAM-4032: This will become less of a concern when we start supporting custom containers as the binaries will be preinstalled on it. > Support staging binary distributions of dependency packages. > > > Key: BEAM-4032 > URL: https://issues.apache.org/jira/browse/BEAM-4032 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Major > > requirements.txt only supports source-distribution dependencies [1]. > --extra_packages does not officially support wheel files [2]. > It is possible to expand this to support binary distributions as long as we > have the knowledge of the target platform. > We should take into consideration the mechanisms of staging dependencies > through portability framework, and perhaps consolidate some of the existing > options. > [https://github.com/apache/beam/blob/a79d1b4fc27eb81db0d9a773047820a206f3d238/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L260] > [https://github.com/apache/beam/blob/a79d1b4fc27eb81db0d9a773047820a206f3d238/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L188] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389667=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389667 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 22:49 Start Date: 19/Feb/20 22:49 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381592605 ## File path: sdks/python/apache_beam/runners/interactive/interactive_environment.py ## @@ -130,6 +139,17 @@ def __init__(self, cache_manager=None): 'You have limited Interactive Beam features since your ' 'ipython kernel is not connected any notebook frontend.') + @property + def options(self): +"""A reference to the global interactive options. + +Provided to avoid import loop or excessive dynamic import. All internal +Interactive Beam modules should access interactive_beam.options through +this property. +""" +from apache_beam.runners.interactive.interactive_beam import options +return options Review comment: The `options` instantiated in `interactive_beam` is to expose the `getters` and `setters` of configurable fields such as `enable_capture_replay`, `capture_duration` and `capturable_sources` to the `interactive beam` user. The user has all necessary `getters`, `setters` and docstrings by looking at the `interactive_beam` module without the complexity of their underlying implementation details (such as `__repr__`, and how options and their utilities are grouped together). Then inside all internal `interactive beam` modules, to access the fields configured by the user, since we don't want any module to depend on `interactive_beam` module to avoid import loops, we can only do dynamic importing. It's going to be messy if we just dynamic import `interactive_beam` (the module that is supposed to be used by end user) everywhere. 
So this property (no `setter` given) does the dynamic import once in this single place and all internal modules will depend on `interactive_environment` module to access whatever configuration the user might have set. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389667) Time Spent: 64h 50m (was: 64h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 64h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
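The lazy-import property described in the review comment above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: `EnvironmentSketch`, `options_module_name` and the stand-in module `fake_interactive_beam` are hypothetical names, and `importlib.import_module` is used in place of the `from ... import options` statement shown in the diff so that the sketch runs without Beam installed.

```python
import importlib
import sys
import types


class EnvironmentSketch(object):
  """Hypothetical stand-in for an internal environment object."""
  def __init__(self, options_module_name):
    # Only the module name is stored; nothing is imported at construction.
    self._options_module_name = options_module_name

  @property
  def options(self):
    """Read-only access to the user-facing options object."""
    # Deferred import: resolved on access rather than at module load time,
    # which is what lets internal modules avoid importing the user-facing
    # module at the top of the file.
    module = importlib.import_module(self._options_module_name)
    return module.options


# Hypothetical user-facing module, registered by hand for the demonstration.
fake_interactive_beam = types.ModuleType('fake_interactive_beam')
fake_interactive_beam.options = {'enable_capture_replay': True}
sys.modules['fake_interactive_beam'] = fake_interactive_beam

env = EnvironmentSketch('fake_interactive_beam')
```

Because the property has no setter, internal callers can read `env.options` but an assignment to it raises `AttributeError`; configuration stays in the user-facing module.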
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389666=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389666 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 22:40 Start Date: 19/Feb/20 22:40 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381588614 ## File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py ## @@ -34,6 +34,58 @@ from __future__ import absolute_import from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.options import interactive_options + + +class Options(interactive_options.InteractiveOptions): + """Options that guide how Interactive Beam works.""" + @property + def enable_capture_replay(self): +"""Whether replayable source data capture should be replayed for multiple +PCollection evaluations and pipeline runs as long as the data captured is +still valid.""" +return self.capture_control._enable_capture_replay + + @enable_capture_replay.setter + def enable_capture_replay(self, value): +"""Sets whether source data capture should be replayed. 
True - Enables +capture of replayable source data so that following PCollection evaluations +and pipeline runs always use the same data captured; False - Disables +capture of replayable source data so that following PCollection evaluation +and pipeline runs always use new data from sources.""" +self.capture_control._enable_capture_replay = value + + @property + def capturable_sources(self): +"""Interactive Beam automatically captures data from sources in this set.""" +return self.capture_control._capturable_sources + + @property + def capture_duration(self): +"""The data capture of sources ends as soon as the background caching job +has run for this long.""" +return self.capture_control._capture_duration + + @capture_duration.setter + def capture_duration(self, value): +"""Sets the capture duration as a timedelta. + +Example:: + + # Sets the capture duration limit to 10 seconds. + interactive_beam.options.capture_duration = timedelta(seconds=10) + # Evicts all captured data if there is any. + interactive_beam.evict_captured_data() + # The next PCollection evaluation will capture fresh data from sources, + # and the data captured will be replayed until another eviction. +""" +self.capture_control._capture_duration = value + + # TODO(BEAM-8335): add capture_size options when they are supported. + + +# Users can set options to guide how Interactive Beam works. +options = Options() Review comment: Will add ``` Example: from datetime import timedelta from apache_beam.runners.interactive import interactive_beam as ib ib.options.enable_capture_replay = False/True ib.options.capture_duration = timedelta(seconds=60) ib.options.capturable_sources.add(SourceClass) ``` to the comments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389666) Time Spent: 64h 40m (was: 64.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 64h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389665 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 22:36 Start Date: 19/Feb/20 22:36 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381586709 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -132,7 +256,22 @@ def is_source_to_cache_changed(user_pipeline): is_changed = not current_signature.issubset(recorded_signature) # The computation of extract_unbounded_source_signature is expensive, track on # change by default. - if is_changed: + if is_changed and update_cached_source_signature: +if ie.current_env().options.enable_capture_replay: + if not recorded_signature: +_LOGGER.info( +'Interactive Beam has detected you have unbounded sources ' +'in your pipeline. In order to have a deterministic replay ' +'of your pipeline: {}'.format( Review comment: `ie.current_env().options.capture_control` is formatted the same way as in the other case. Do you think we should make its first letter lower case for this scenario? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389665) Time Spent: 64.5h (was: 64h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 64.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
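The change detection quoted in the diff above rests on a set subset test. A minimal sketch of that check is shown below; the function name `is_source_changed` and the string signatures are hypothetical placeholders, since the real signatures come from `extract_unbounded_source_signature`, whose output format is not shown in this message.

```python
def is_source_changed(current_signature, recorded_signature):
  """Returns True when some current source is missing from the record."""
  # Same expression as in the quoted diff: the pipeline counts as changed
  # only if it contains a source that was not recorded before.
  return not current_signature.issubset(recorded_signature)


# Hypothetical signatures; real ones are derived from the pipeline sources.
recorded = {'pubsub:topic_a', 'pubsub:topic_b'}
fewer_sources = {'pubsub:topic_a'}
new_source = {'pubsub:topic_a', 'pubsub:topic_c'}
```

One consequence of the subset test is that removing a source from the pipeline does not count as a change, while adding a previously unseen source does.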
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389664=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389664 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 22:32 Start Date: 19/Feb/20 22:32 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381585204 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -19,29 +19,113 @@ For internal use only; no backwards-compatibility guarantees. -A background caching job is a job that caches events for all unbounded sources -of a given pipeline. With Interactive Beam, one such job is started when a -pipeline run happens (which produces a main job in contrast to the background +A background caching job is a job that captures events for all capturable +sources of a given pipeline. With Interactive Beam, one such job is started when +a pipeline run happens (which produces a main job in contrast to the background caching job) and meets the following conditions: - #. The pipeline contains unbounded sources. + #. The pipeline contains capturable sources, configured through + interactive_beam.options.capturable_sources. #. No such background job is running. #. No such background job has completed successfully and the cached events are - still valid (invalidated when unbounded sources change in the pipeline). + still valid (invalidated when capturable sources change in the pipeline). Once started, the background caching job runs asynchronously until it hits some -cache size limit. Meanwhile, the main job and future main jobs from the pipeline -will run using the deterministic replay-able cached events until they are -invalidated. +capture limit configured in interactive_beam.options. 
Meanwhile, the main job +and future main jobs from the pipeline will run using the deterministic +replayable captured events until they are invalidated. """ # pytype: skip-file from __future__ import absolute_import +import logging +import threading +import time + import apache_beam as beam -from apache_beam import runners from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.caching import streaming_cache +from apache_beam.runners.runner import PipelineState + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.INFO) + + +class BackgroundCachingJob(object): + """A simple abstraction that controls necessary components of a timed and + [disk] space limited background caching job. + + A background caching job successfully terminates in 2 conditions: + +#. The job is finite and runs into DONE state; +#. The job is infinite but hits an interactive_beam.options configured limit + and gets cancelled into CANCELLED/CANCELLING state. + + In both situations, the background caching job should be treated as done + successfully. + """ + def __init__(self, pipeline_result, start_limit_checkers=True): +self._pipeline_result = pipeline_result +self._timer = threading.Timer( Review comment: Thanks! Yes, we should in case the notebook is shut down abruptly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389664) Time Spent: 64h 20m (was: 64h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 64h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable
[ https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=389662=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389662 ] ASF GitHub Bot logged work on BEAM-8280: Author: ASF GitHub Bot Created on: 19/Feb/20 22:27 Start Date: 19/Feb/20 22:27 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #10894: [BEAM-8280] Enable and improve IOTypeHints debug_str traceback URL: https://github.com/apache/beam/pull/10894#discussion_r381580646 ## File path: sdks/python/apache_beam/transforms/ptransform.py ## @@ -849,8 +854,9 @@ def element_type(side_input): if not typehints.is_consistent_with(bindings.get(arg, typehints.Any), hint): raise TypeCheckError( - 'Type hint violation for \'%s\': requires %s but got %s for %s' % - (self.label, hint, bindings[arg], arg)) + 'Type hint violation for \'%s\': requires %s but got %s for %s\n' + 'Full type hint:\n%s' % + (self.label, hint, bindings[arg], arg, type_hints.debug_str())) Review comment: nit: with this many args, either using `.format` with named params or `%(name)d` with an arg dict would be more readable This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389662) Time Spent: 5h 20m (was: 5h 10m) > re-enable IOTypeHints.from_callable > --- > > Key: BEAM-8280 > URL: https://issues.apache.org/jira/browse/BEAM-8280 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > See https://issues.apache.org/jira/browse/BEAM-8279 -- This message was sent by Atlassian Jira (v8.3.4#803005)
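The readability nit in the review comment above can be illustrated with a small comparison. This is only a sketch: the three helper functions and their parameter names are hypothetical and are not part of `ptransform.py`; the message text is taken from the quoted diff.

```python
def violation_message_positional(label, hint, actual, arg, full_hint):
  """Positional %-formatting, as in the diff under review."""
  return (
      'Type hint violation for \'%s\': requires %s but got %s for %s\n'
      'Full type hint:\n%s' % (label, hint, actual, arg, full_hint))


def violation_message_named(label, hint, actual, arg, full_hint):
  """str.format with named fields, the first option the reviewer suggests."""
  return (
      "Type hint violation for '{label}': requires {hint} but got {actual} "
      "for {arg}\nFull type hint:\n{full_hint}".format(
          label=label, hint=hint, actual=actual, arg=arg,
          full_hint=full_hint))


def violation_message_dict(label, hint, actual, arg, full_hint):
  """%-formatting with a mapping, the second option the reviewer suggests."""
  return (
      "Type hint violation for '%(label)s': requires %(hint)s but got "
      "%(actual)s for %(arg)s\nFull type hint:\n%(full_hint)s" % {
          'label': label, 'hint': hint, 'actual': actual, 'arg': arg,
          'full_hint': full_hint})
```

All three produce the same string for string arguments; the named variants make it harder to pass the five values in the wrong order.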
[jira] [Work logged] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input
[ https://issues.apache.org/jira/browse/BEAM-9326?focusedWorklogId=389661=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389661 ] ASF GitHub Bot logged work on BEAM-9326: Author: ASF GitHub Bot Created on: 19/Feb/20 22:25 Start Date: 19/Feb/20 22:25 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10879: [BEAM-9326] Make JsonToRow transform input instead of URL: https://github.com/apache/beam/pull/10879#issuecomment-588504481 Yes it is already reverted, for the `PipelineTest` class, for `JsonToRow` we are good to go. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389661) Time Spent: 1h 10m (was: 1h) > JsonToRow transform should not use bounded Wildcards for its input > -- > > Key: BEAM-9326 > URL: https://issues.apache.org/jira/browse/BEAM-9326 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The JsonToRow PTransform input is a String (a final class in Java) so no > reason > to define a bounded wildcard as its argument. > We should use in Beam's codebase only when required by Java > Generics constraints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input
[ https://issues.apache.org/jira/browse/BEAM-9326?focusedWorklogId=389658=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389658 ] ASF GitHub Bot logged work on BEAM-9326: Author: ASF GitHub Bot Created on: 19/Feb/20 22:14 Start Date: 19/Feb/20 22:14 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10879: [BEAM-9326] Make JsonToRow transform input instead of URL: https://github.com/apache/beam/pull/10879#issuecomment-588499498 Yikes! If removing `` makes it impossible to treat this class as a `PTransform` then I think this change should be reverted. Indeed I wondered if there was something funky going to happen in Java's type checking. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389658) Time Spent: 1h (was: 50m) > JsonToRow transform should not use bounded Wildcards for its input > -- > > Key: BEAM-9326 > URL: https://issues.apache.org/jira/browse/BEAM-9326 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The JsonToRow PTransform input is a String (a final class in Java) so no > reason > to define a bounded wildcard as its argument. > We should use in Beam's codebase only when required by Java > Generics constraints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9337) DataflowPipelineJob.waitUntilFinish() crashes when it has created a template.
Kenneth Knowles created BEAM-9337: - Summary: DataflowPipelineJob.waitUntilFinish() crashes when it has created a template. Key: BEAM-9337 URL: https://issues.apache.org/jira/browse/BEAM-9337 Project: Beam Issue Type: Bug Components: runner-dataflow Reporter: Kenneth Knowles Assignee: Yunqing Zhou {code:java} INFO: Template successfully created. Exception in thread "main" java.lang.UnsupportedOperationException: The result of template creation should not be used. at org.apache.beam.runners.dataflow.util.DataflowTemplateJob.getJobId(DataflowTemplateJob.java:37) at org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:524) at org.apache.beam.runners.dataflow.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:506) at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:295) at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:224) at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:183) at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:176) {code} This is a real error. If a template was created, the job is complete. Instead of crashing by trying to access the job id, as though {{DataflowPipelineJob}} doesn't know it made a template, it should instead return successfully. Or perhaps there is another design choice. But just crashing does not make sense. Probably {{DataflowRunner}} should not return a {{DataflowPipelineJob}} at all in this way. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=389654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389654 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 19/Feb/20 21:55 Start Date: 19/Feb/20 21:55 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847#discussion_r381568266 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -994,7 +1069,13 @@ def input_for(transform_id, input_id): # The worker will be waiting on these inputs as well. for other_input in data_input: if other_input not in deferred_inputs: -deferred_inputs[other_input] = _ListBuffer([]) +outputs = process_bundle_descriptor.transforms[ + other_input].outputs.values() +coder_id = process_bundle_descriptor.pcollections[ + only_element(outputs)].coder_id +coder = context.coders[coder_id] +deferred_inputs[other_input] = _ListBuffer( +coder_impl=coder.get_impl()) Review comment: As commented at L1082 (of the PR branch), deferred inputs cannot be parallel processed for now. Is it better to set coder_impl to None to reduce unnecessary processes for now and add it back later when parallel processing is supported for deferred_inputs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389654) Time Spent: 2h 20m (was: 2h 10m) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > A user reported following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (ie counting number of items of > all files) to scale linearly with respect to the number of cores on my system. > I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00*' > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread 
nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the multithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18. > I also tried briefly SparkRunner with version 2.16 but was not able to achieve > any throughput. > What is the recommended way to achieve what I am trying to do? How can I > troubleshoot? > -- > This is caused by [this > PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60]. > A
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389653 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:53 Start Date: 19/Feb/20 21:53 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381567204 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -19,29 +19,113 @@ For internal use only; no backwards-compatibility guarantees. -A background caching job is a job that caches events for all unbounded sources -of a given pipeline. With Interactive Beam, one such job is started when a -pipeline run happens (which produces a main job in contrast to the background +A background caching job is a job that captures events for all capturable +sources of a given pipeline. With Interactive Beam, one such job is started when +a pipeline run happens (which produces a main job in contrast to the background caching job) and meets the following conditions: - #. The pipeline contains unbounded sources. + #. The pipeline contains capturable sources, configured through + interactive_beam.options.capturable_sources. #. No such background job is running. #. No such background job has completed successfully and the cached events are - still valid (invalidated when unbounded sources change in the pipeline). + still valid (invalidated when capturable sources change in the pipeline). Once started, the background caching job runs asynchronously until it hits some -cache size limit. Meanwhile, the main job and future main jobs from the pipeline -will run using the deterministic replay-able cached events until they are -invalidated. +capture limit configured in interactive_beam.options. 
Meanwhile, the main job +and future main jobs from the pipeline will run using the deterministic +replayable captured events until they are invalidated. """ # pytype: skip-file from __future__ import absolute_import +import logging +import threading +import time + import apache_beam as beam -from apache_beam import runners from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.caching import streaming_cache +from apache_beam.runners.runner import PipelineState + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.INFO) + + +class BackgroundCachingJob(object): + """A simple abstraction that controls necessary components of a timed and + [disk] space limited background caching job. + + A background caching job successfully terminates in 2 conditions: + +#. The job is finite and runs into DONE state; Review comment: Those are **not** considered `successfully` terminated. Let me reword it to `A background caching job successfully completes source data capture in 2 conditions`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389653) Time Spent: 64h 10m (was: 64h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 64h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
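The start conditions listed in the docstring quoted in the message above can be read as a single predicate. What follows is a minimal sketch under stated assumptions, not the code from pull request #10899: `CaptureState` and `should_start_background_caching_job` are hypothetical names introduced only for illustration, and the real module may track this state differently.

```python
from dataclasses import dataclass


@dataclass
class CaptureState:
    """Hypothetical snapshot of what is known about a pipeline's capture job."""
    has_capturable_sources: bool      # condition 1 in the docstring
    job_is_running: bool              # condition 2 in the docstring
    job_completed_successfully: bool  # condition 3, first half
    captured_events_valid: bool       # condition 3, second half


def should_start_background_caching_job(state: CaptureState) -> bool:
    """Returns True only when all three documented conditions hold."""
    if not state.has_capturable_sources:
        return False
    if state.job_is_running:
        return False
    # A job that already completed and whose events are still valid is reused.
    if state.job_completed_successfully and state.captured_events_valid:
        return False
    return True
```

Under this reading, changing the capturable sources invalidates the captured events, so a previously completed job no longer blocks a new one.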
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=389652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389652 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 19/Feb/20 21:49 Start Date: 19/Feb/20 21:49 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#issuecomment-588489213 Kindly pinging : ) If everything looks fine, I'll squash them into one commit before merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389652) Time Spent: 16h 50m (was: 16h 40m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 16h 50m > Remaining Estimate: 0h > > This is a follow-up to the in-progress PR: > https://github.com/apache/beam/pull/9794. > The current implementation in PR 9794 provides a default implementation of > WatermarkEstimator. As further work, we want WatermarkEstimator to be > a pure interface. We'll provide a WatermarkEstimatorProvider that can > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restrictions for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
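To make the "pure interface plus provider" split described in the issue above concrete, here is a small sketch. It is an illustration only: the method names (`observe_timestamp`, `current_watermark`, `create_watermark_estimator`) and the `Monotonic*` classes are assumptions made for this example, not the API defined by the PR.

```python
import abc


class WatermarkEstimator(abc.ABC):
    """Hypothetical pure interface: declares behavior, implements none."""

    @abc.abstractmethod
    def observe_timestamp(self, timestamp):
        """Records the timestamp of an emitted element."""

    @abc.abstractmethod
    def current_watermark(self):
        """Returns the current watermark estimate."""


class WatermarkEstimatorProvider(abc.ABC):
    """Hypothetical factory, analogous in role to a restriction provider."""

    @abc.abstractmethod
    def create_watermark_estimator(self):
        """Returns a fresh estimator, e.g. one per windowed value."""


class MonotonicEstimator(WatermarkEstimator):
    """Example estimator: the watermark is the largest timestamp seen."""

    def __init__(self):
        self._watermark = None

    def observe_timestamp(self, timestamp):
        if self._watermark is None or timestamp > self._watermark:
            self._watermark = timestamp

    def current_watermark(self):
        return self._watermark


class MonotonicEstimatorProvider(WatermarkEstimatorProvider):
    def create_watermark_estimator(self):
        return MonotonicEstimator()
```

Because the provider hands out a new estimator on every call, state observed for one windowed value does not leak into another.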
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389651=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389651 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:48 Start Date: 19/Feb/20 21:48 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381564546 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -19,29 +19,113 @@ For internal use only; no backwards-compatibility guarantees. -A background caching job is a job that caches events for all unbounded sources -of a given pipeline. With Interactive Beam, one such job is started when a -pipeline run happens (which produces a main job in contrast to the background +A background caching job is a job that captures events for all capturable +sources of a given pipeline. With Interactive Beam, one such job is started when +a pipeline run happens (which produces a main job in contrast to the background caching job) and meets the following conditions: - #. The pipeline contains unbounded sources. + #. The pipeline contains capturable sources, configured through + interactive_beam.options.capturable_sources. #. No such background job is running. #. No such background job has completed successfully and the cached events are - still valid (invalidated when unbounded sources change in the pipeline). + still valid (invalidated when capturable sources change in the pipeline). Once started, the background caching job runs asynchronously until it hits some -cache size limit. Meanwhile, the main job and future main jobs from the pipeline -will run using the deterministic replay-able cached events until they are -invalidated. +capture limit configured in interactive_beam.options. 
Meanwhile, the main job +and future main jobs from the pipeline will run using the deterministic +replayable captured events until they are invalidated. """ # pytype: skip-file from __future__ import absolute_import +import logging +import threading +import time + import apache_beam as beam -from apache_beam import runners from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.caching import streaming_cache +from apache_beam.runners.runner import PipelineState + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.INFO) + + +class BackgroundCachingJob(object): + """A simple abstraction that controls necessary components of a timed and + [disk] space limited background caching job. Review comment: This is to indicate the captured data could be on-disk, or in other mediums such as in-memory (some testing cache manager implementation). Let me just remove the `[disk]` to avoid the confusion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389651) Time Spent: 64h (was: 63h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 64h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389650 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:45 Start Date: 19/Feb/20 21:45 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381563024 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -19,29 +19,113 @@ For internal use only; no backwards-compatibility guarantees. -A background caching job is a job that caches events for all unbounded sources -of a given pipeline. With Interactive Beam, one such job is started when a -pipeline run happens (which produces a main job in contrast to the background +A background caching job is a job that captures events for all capturable +sources of a given pipeline. With Interactive Beam, one such job is started when +a pipeline run happens (which produces a main job in contrast to the background caching job) and meets the following conditions: - #. The pipeline contains unbounded sources. + #. The pipeline contains capturable sources, configured through + interactive_beam.options.capturable_sources. #. No such background job is running. #. No such background job has completed successfully and the cached events are - still valid (invalidated when unbounded sources change in the pipeline). + still valid (invalidated when capturable sources change in the pipeline). Once started, the background caching job runs asynchronously until it hits some -cache size limit. Meanwhile, the main job and future main jobs from the pipeline -will run using the deterministic replay-able cached events until they are -invalidated. +capture limit configured in interactive_beam.options. 
Meanwhile, the main job +and future main jobs from the pipeline will run using the deterministic +replayable captured events until they are invalidated. """ # pytype: skip-file from __future__ import absolute_import +import logging +import threading +import time + import apache_beam as beam -from apache_beam import runners from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.caching import streaming_cache +from apache_beam.runners.runner import PipelineState + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.INFO) Review comment: We plan to log at INFO level for this module. Since each module in Beam runners has its own logger (since [PR](https://github.com/apache/beam/commit/49d6efdd2a59652462228e3e2b353bcc4173554b)), this overriding is intended and has limited effect. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389650) Time Spent: 63h 50m (was: 63h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 63h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
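The review comment above says that overriding the level on a per-module logger has limited effect. A short standard-library sketch of why: `setLevel` on one named logger does not change sibling loggers, which keep inheriting their effective level from their ancestors. The logger names below are made up for this example.

```python
import logging

# A module-level logger with an explicit level, as in the quoted diff.
_LOGGER = logging.getLogger('example.background_caching_job')
_LOGGER.setLevel(logging.INFO)

# A sibling module's logger is untouched by the call above.
_OTHER = logging.getLogger('example.other_module')

# The explicit level applies only to the logger it was set on.
this_module_level = _LOGGER.getEffectiveLevel()
# The sibling has no level of its own and defers to its ancestors.
other_module_own_level = _OTHER.level
other_module_effective_level = _OTHER.getEffectiveLevel()
```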
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389648 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:42 Start Date: 19/Feb/20 21:42 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381561361 ## File path: sdks/python/apache_beam/runners/interactive/options/capture_control.py ## @@ -15,6 +15,16 @@ # limitations under the License. # +"""Module to control how Interactive Beam captures data from sources for +deterministic replayable PCollection evaluation and pipeline runs. + +For internal use only; no backwards-compatibility guarantees. +""" + +# pytype: skip-file Review comment: According to Boyuan and the [PR](https://github.com/apache/beam/commit/7547ac6b273e6e2ffe7d69775606e14c0fd455b2), type checking is still in development, is not yet stable, and may cause surprising failures, so `# pytype: skip-file` is added to all .py files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389648) Time Spent: 63h 40m (was: 63.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 63h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389641=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389641 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:38 Start Date: 19/Feb/20 21:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#discussion_r381549837 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -171,13 +180,14 @@ class TestStream(PTransform): time. After all of the specified elements are emitted, ceases to produce output. """ - def __init__(self, coder=coders.FastPrimitivesCoder(), events=None): + def __init__( + self, coder=coders.FastPrimitivesCoder(), events=None, output_tags=None): super(TestStream, self).__init__() assert coder is not None self.coder = coder self.watermarks = {None: timestamp.MIN_TIMESTAMP} self._events = [] if events is None else list(events) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389641) Time Spent: 63h 20m (was: 63h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 63h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389642=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389642 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:38 Start Date: 19/Feb/20 21:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#discussion_r381552740 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -276,8 +295,11 @@ def to_runner_api_parameter(self, context): @PTransform.register_urn( common_urns.primitives.TEST_STREAM.urn, beam_runner_api_pb2.TestStreamPayload) - def from_runner_api_parameter(payload, context): + def from_runner_api_parameter(ptransform, payload, context): coder = context.coders.get_by_id(payload.coder_id) +output_tags = set( +None if k == 'None' else k for k in ptransform.outputs.keys()) Review comment: This behavior isn't unique to TestStream; it is consistent with how the rest of the Python SDK handles PCollection tags that are None. As for what happens, I don't know; there are probably many subtle things that may go wrong. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389642) Time Spent: 63.5h (was: 63h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 63.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
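The diff quoted above maps the string key 'None' back to a `None` tag when reading the transform's outputs. A minimal sketch of that convention and of one of the subtle problems the comment alludes to: a tag that is literally the string 'None' cannot be told apart from a missing tag. `encode_tag` and `decode_tag` are hypothetical helper names for this example, not functions from the SDK.

```python
def encode_tag(tag):
    """Maps a possibly-None tag to a string key (hypothetical helper)."""
    return 'None' if tag is None else tag


def decode_tag(key):
    """Inverse of encode_tag, mirroring the expression in the quoted diff."""
    return None if key == 'None' else key


# Ordinary tags and the None tag survive a round trip.
roundtrip_a = decode_tag(encode_tag('a'))
roundtrip_none = decode_tag(encode_tag(None))

# A tag literally named 'None' collides with the sentinel and comes back
# as None instead of the original string.
roundtrip_literal = decode_tag(encode_tag('None'))
```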
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389637 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:38 Start Date: 19/Feb/20 21:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#discussion_r381538464 ## File path: sdks/python/apache_beam/runners/direct/direct_runner.py ## @@ -73,60 +73,12 @@ class SwitchingDirectRunner(PipelineRunner): def is_fnapi_compatible(self): return BundleBasedDirectRunner.is_fnapi_compatible() - def apply_TestStream(self, transform, pbegin, options): Review comment: thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389637) Time Spent: 62h 50m (was: 62h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 62h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389644 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:38 Start Date: 19/Feb/20 21:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#discussion_r381551312 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -188,7 +198,16 @@ def _infer_output_coder(self, input_type=None, input_coder=None): def expand(self, pbegin): assert isinstance(pbegin, pvalue.PBegin) self.pipeline = pbegin.pipeline -return pvalue.PCollection(self.pipeline, is_bounded=False) +if len(self.output_tags) == 0: Review comment: Gotcha, changed to ```if not self.output_tags``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389644) Time Spent: 63.5h (was: 63h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 63.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
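The change from `len(self.output_tags) == 0` to `not self.output_tags` in the message above relies on Python's truthiness rules for containers. A brief sketch of how the two checks compare; the function names are made up for this example.

```python
def is_empty_by_len(output_tags):
    """Explicit length check, as in the original line of the diff."""
    return len(output_tags) == 0


def is_empty_by_truthiness(output_tags):
    """Idiomatic check, as in the revised line."""
    return not output_tags


# Both checks agree for ordinary containers.
agree_on_empty_set = is_empty_by_len(set()) == is_empty_by_truthiness(set())
agree_on_filled_set = is_empty_by_len({'a'}) == is_empty_by_truthiness({'a'})

# They differ for None: the truthiness check treats it as empty, while
# len(None) raises TypeError.
none_is_empty = is_empty_by_truthiness(None)
```

Whether treating `None` as empty is desirable depends on the caller; in the quoted constructor `output_tags` defaults to `None`, so the behavior matters if the attribute is not normalized to a set first.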
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389640 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:38 Start Date: 19/Feb/20 21:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#discussion_r381543027 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -133,15 +140,17 @@ def __eq__(self, other): return self.new_watermark == other.new_watermark and self.tag == other.tag def __hash__(self): -return hash(self.new_watermark) +return hash(str(self.new_watermark) + str(self.tag)) def __lt__(self, other): return self.new_watermark < other.new_watermark def to_runner_api(self, unused_element_coder): +tag = 'None' if self.tag is None else self.tag Review comment: Looking through the codebase, it seems that 'None' is the special keyword used in the Python SDK to represent a tag that is specifically None. This keeps with the current style of the rest of the Python SDK. @lukecwik is this true? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389640) Time Spent: 63h 10m (was: 63h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 63h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
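The `__hash__` change in the diff above keeps hashing in step with `__eq__`, which compares both the watermark and the tag. The sketch below shows the same idea with a tuple hash, a common alternative to concatenating strings. `WatermarkEventSketch` is a simplified stand-in written for this example; it is not the class from test_stream.py, and the tuple hash is a swapped-in technique rather than what the PR does.

```python
class WatermarkEventSketch(object):
    """Simplified stand-in with the two fields compared in the diff."""

    def __init__(self, new_watermark, tag=None):
        self.new_watermark = new_watermark
        self.tag = tag

    def __eq__(self, other):
        return (
            self.new_watermark == other.new_watermark and
            self.tag == other.tag)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.new_watermark, self.tag))


a = WatermarkEventSketch(5, 'a')
b = WatermarkEventSketch(5, 'a')
c = WatermarkEventSketch(5, None)
distinct_events = {a, b, c}

# String concatenation can map different field pairs to the same text,
# which costs extra hash collisions but does not break correctness.
concatenation_collides = str(1) + str('2a') == str(12) + str('a')
```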
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389638=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389638 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:38 Start Date: 19/Feb/20 21:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#discussion_r381538392 ## File path: sdks/python/apache_beam/runners/direct/test_stream_impl.py ## @@ -45,11 +46,69 @@ class _WatermarkController(PTransform): - If the instance receives an ElementEvent, it emits all specified elements to the Global Window with the event time set to the element's timestamp. """ + def __init__(self, output_tag): +self.output_tag = output_tag + def get_windowing(self, _): return core.Windowing(window.GlobalWindows()) def expand(self, pcoll): -return pvalue.PCollection.from_(pcoll) +ret = pvalue.PCollection.from_(pcoll) +ret.tag = self.output_tag +return ret + + +class _ExpandableTestStream(PTransform): + def __init__(self, test_stream): +self.test_stream = test_stream + + def expand(self, pbegin): +"""Expands the TestStream into the DirectRunner implementation. + + +Takes the TestStream transform and creates a _TestStream -> multiplexer -> +_WatermarkController. +""" + +assert isinstance(pbegin, pvalue.PBegin) + +# If there is only one tag there is no need to add the multiplexer. +if len(self.test_stream.output_tags) == 1: + return ( + pbegin + | _TestStream( + self.test_stream.output_tags, + events=self.test_stream._events, + coder=self.test_stream.coder) + | _WatermarkController(list(self.test_stream.output_tags)[0])) + +# This multiplexing the multiple output PCollections. Review comment: Done. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389638) Time Spent: 62h 50m (was: 62h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 62h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389645=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389645 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:38 Start Date: 19/Feb/20 21:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#discussion_r381553044 ## File path: sdks/python/apache_beam/testing/test_stream_test.py ## @@ -528,6 +529,56 @@ def process( p.run() + def test_roundtrip_proto(self): +test_stream = (TestStream() + .advance_processing_time(1) + .advance_watermark_to(2) + .add_elements([1, 2, 3])) # yapf: disable + +p = TestPipeline(options=StandardOptions(streaming=True)) +p | test_stream + +pipeline_proto, context = p.to_runner_api(return_context=True) + +for t in pipeline_proto.components.transforms.values(): + if t.spec.urn == common_urns.primitives.TEST_STREAM.urn: +test_stream_proto = t + +self.assertTrue(test_stream_proto) +roundtrip_test_stream = TestStream().from_runner_api( +test_stream_proto, context) + +self.assertListEqual(test_stream._events, roundtrip_test_stream._events) +self.assertSetEqual( +test_stream.output_tags, roundtrip_test_stream.output_tags) +self.assertEqual(test_stream.coder, roundtrip_test_stream.coder) + + def test_roundtrip_proto_multi(self): +test_stream = (TestStream(output_tags=['a', 'b']) Review comment: Gotcha, removed the output_tags This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389645) Time Spent: 63.5h (was: 63h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 63.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389639 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:38 Start Date: 19/Feb/20 21:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#discussion_r381545149 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -171,13 +180,14 @@ class TestStream(PTransform): time. After all of the specified elements are emitted, ceases to produce output. """ Review comment: > Are you trying to allow for outputs that have no events, otherwise shouldn't the tags come from the list of events? Yep! The TestStreamService will allow users to define a TestStream with the output_tags specified at creation time and the events supplied at runtime. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389639) Time Spent: 63h (was: 62h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 63h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389643 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:38 Start Date: 19/Feb/20 21:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#discussion_r381557950 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -133,15 +140,17 @@ def __eq__(self, other): return self.new_watermark == other.new_watermark and self.tag == other.tag def __hash__(self): -return hash(self.new_watermark) +return hash(str(self.new_watermark) + str(self.tag)) def __lt__(self, other): return self.new_watermark < other.new_watermark def to_runner_api(self, unused_element_coder): +tag = 'None' if self.tag is None else self.tag return beam_runner_api_pb2.TestStreamPayload.Event( watermark_event=beam_runner_api_pb2.TestStreamPayload.Event. -AdvanceWatermark(new_watermark=self.new_watermark.micros // 1000)) +AdvanceWatermark( +new_watermark=self.new_watermark.micros // 1000, tag=tag)) Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389643) Time Spent: 63.5h (was: 63h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 63.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
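Note on the `__hash__` change quoted in the review comment above: the diff makes the hash depend on both `new_watermark` and `tag`, the same two fields `__eq__` compares. A minimal standalone sketch of that invariant follows. It assumes a hypothetical `WatermarkEventSketch` class (not the Beam class) and uses a tuple hash in place of the string concatenation shown in the diff.

```python
class WatermarkEventSketch(object):
  """Hypothetical stand-in for a tagged watermark event; not the Beam class."""

  def __init__(self, new_watermark, tag=None):
    self.new_watermark = new_watermark
    self.tag = tag

  def __eq__(self, other):
    # Two events are equal only if both the watermark and the tag match.
    return (self.new_watermark, self.tag) == (other.new_watermark, other.tag)

  def __hash__(self):
    # Hash exactly the fields that __eq__ compares, so equal events always
    # share a hash value.
    return hash((self.new_watermark, self.tag))


a = WatermarkEventSketch(10, tag='a')
b = WatermarkEventSketch(10, tag='b')
# The third event equals `a`, so the set keeps only two members.
distinct = {a, b, WatermarkEventSketch(10, tag='a')}
```

Hashing a tuple avoids one property of concatenated strings: different field pairs such as `('1', '23')` and `('12', '3')` concatenate to the same text. That would only cause extra hash collisions, not incorrect equality.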
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389646 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:38 Start Date: 19/Feb/20 21:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892#discussion_r381551914 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -188,7 +198,16 @@ def _infer_output_coder(self, input_type=None, input_coder=None): def expand(self, pbegin): assert isinstance(pbegin, pvalue.PBegin) self.pipeline = pbegin.pipeline -return pvalue.PCollection(self.pipeline, is_bounded=False) +if len(self.output_tags) == 0: + self.output_tags = set([None]) + +if len(self.output_tags) == 1: Review comment: For backwards compatibility, existing code already relies on a single PCollection being returned. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389646) Time Spent: 63.5h (was: 63h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 63.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
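Note on the backwards-compatibility point in the review comment above: with zero or one output tag, `expand` keeps returning a single PCollection. The sketch below shows that branching without depending on Beam. The helper name `expand_outputs_sketch`, the `make_pcollection` callback, and the dict returned for multiple tags are all assumptions for illustration; the quoted diff is cut off before the multi-tag branch.

```python
def expand_outputs_sketch(output_tags, make_pcollection):
  """Returns one output for zero or one tag, otherwise a dict keyed by tag.

  Hypothetical helper; `make_pcollection` stands in for constructing a
  PCollection for the given tag.
  """
  # No tags means a single untagged output, as in the quoted diff.
  tags = set(output_tags) or {None}
  if len(tags) == 1:
    # Existing callers expect a bare output object here, not a container.
    return make_pcollection(next(iter(tags)))
  # Assumption: multiple tags map to one output each.
  return {tag: make_pcollection(tag) for tag in tags}


single = expand_outputs_sketch([], lambda tag: ('pcoll', tag))
multi = expand_outputs_sketch(['a', 'b'], lambda tag: ('pcoll', tag))
```

Here `single` is one object and `multi` is a mapping, so code written against the old single-output behaviour keeps working unchanged.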
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389634=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389634 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 19/Feb/20 21:31 Start Date: 19/Feb/20 21:31 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389634) Time Spent: 54h 50m (was: 54h 40m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 54h 50m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9287) Python Validates runner tests for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9287?focusedWorklogId=389633=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389633 ] ASF GitHub Bot logged work on BEAM-9287: Author: ASF GitHub Bot Created on: 19/Feb/20 21:15 Start Date: 19/Feb/20 21:15 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10863: [BEAM-9287] Add Python streaming Validates runner tests for Unified Worker URL: https://github.com/apache/beam/pull/10863#issuecomment-588474305 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389633) Time Spent: 1h (was: 50m) > Python Validates runner tests for Unified Worker > > > Key: BEAM-9287 > URL: https://issues.apache.org/jira/browse/BEAM-9287 > Project: Beam > Issue Type: Test > Components: runner-dataflow, testing >Reporter: Ankur Goenka >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389632=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389632 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 19/Feb/20 21:14 Start Date: 19/Feb/20 21:14 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-588473555 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389632) Time Spent: 54h 40m (was: 54.5h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 54h 40m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389626 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:12 Start Date: 19/Feb/20 21:12 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381546032 ## File path: sdks/python/apache_beam/runners/interactive/interactive_environment.py ## @@ -130,6 +139,17 @@ def __init__(self, cache_manager=None): 'You have limited Interactive Beam features since your ' 'ipython kernel is not connected any notebook frontend.') + @property + def options(self): +"""A reference to the global interactive options. + +Provided to avoid import loop or excessive dynamic import. All internal +Interactive Beam modules should access interactive_beam.options through +this property. +""" +from apache_beam.runners.interactive.interactive_beam import options +return options Review comment: Why are the options defined globally in a different file, but the setter is here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389626) Time Spent: 62h 20m (was: 62h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 62h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389625 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 21:12 Start Date: 19/Feb/20 21:12 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#discussion_r381537314 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -19,29 +19,113 @@ For internal use only; no backwards-compatibility guarantees. -A background caching job is a job that caches events for all unbounded sources -of a given pipeline. With Interactive Beam, one such job is started when a -pipeline run happens (which produces a main job in contrast to the background +A background caching job is a job that captures events for all capturable +sources of a given pipeline. With Interactive Beam, one such job is started when +a pipeline run happens (which produces a main job in contrast to the background caching job) and meets the following conditions: - #. The pipeline contains unbounded sources. + #. The pipeline contains capturable sources, configured through + interactive_beam.options.capturable_sources. #. No such background job is running. #. No such background job has completed successfully and the cached events are - still valid (invalidated when unbounded sources change in the pipeline). + still valid (invalidated when capturable sources change in the pipeline). Once started, the background caching job runs asynchronously until it hits some -cache size limit. Meanwhile, the main job and future main jobs from the pipeline -will run using the deterministic replay-able cached events until they are -invalidated. +capture limit configured in interactive_beam.options. 
Meanwhile, the main job +and future main jobs from the pipeline will run using the deterministic +replayable captured events until they are invalidated. """ # pytype: skip-file from __future__ import absolute_import +import logging +import threading +import time + import apache_beam as beam -from apache_beam import runners from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive.caching import streaming_cache +from apache_beam.runners.runner import PipelineState + +_LOGGER = logging.getLogger(__name__) +_LOGGER.setLevel(logging.INFO) Review comment: Is this needed? This will override other things. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389625) Time Spent: 62h 20m (was: 62h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 62h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
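Note on the `_LOGGER.setLevel(logging.INFO)` question in the review comment above: one reading of the concern is that a level set at module scope takes precedence over whatever level the logger would otherwise inherit from the user's logging configuration. The sketch below illustrates that behaviour with the standard `logging` module only. The logger names under `sketch` are hypothetical.

```python
import logging

# A parent logger standing in for the user's own configuration.
parent = logging.getLogger('sketch')
parent.setLevel(logging.WARNING)

# This child sets no level of its own, so it follows the parent.
inherits = logging.getLogger('sketch.inherits')

# This child pins its own level, as the module-level setLevel call in the
# diff does, and no longer follows the parent.
pinned = logging.getLogger('sketch.pinned')
pinned.setLevel(logging.INFO)

inherits_level = inherits.getEffectiveLevel()
pinned_level = pinned.getEffectiveLevel()
```

Here `inherits_level` follows the parent's `WARNING`, while `pinned_level` stays at `INFO` regardless of how the parent is configured.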