[jira] [Work logged] (BEAM-9321) BigQuery avro write logical type support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9321?focusedWorklogId=389828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389828
 ]

ASF GitHub Bot logged work on BEAM-9321:


Author: ASF GitHub Bot
Created on: 20/Feb/20 07:17
Start Date: 20/Feb/20 07:17
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #10869: [BEAM-9321] Add 
BigQuery Avro logical type support on write
URL: https://github.com/apache/beam/pull/10869#issuecomment-588657689
 
 
   I'm not qualified to give my ok, but I have a question: it there a reason 
why it's a an option, shouldn't it always handle the logical type?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389828)
Time Spent: 50m  (was: 40m)

> BigQuery avro write logical type support
> 
>
> Key: BEAM-9321
> URL: https://issues.apache.org/jira/browse/BEAM-9321
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Filipe Regadas
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing 
> does not respect Avro <-> BigQuery data type conversion 
> ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types])
>  we need to set the useAvroLogicalTypes option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=389826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389826
 ]

ASF GitHub Bot logged work on BEAM-7274:


Author: ASF GitHub Bot
Created on: 20/Feb/20 07:12
Start Date: 20/Feb/20 07:12
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #10502: [BEAM-7274] Add 
DynamicMessage Schema support
URL: https://github.com/apache/beam/pull/10502#issuecomment-588652660
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389826)
Time Spent: 24h 20m  (was: 24h 10m)

> Protobuf Beam Schema support
> 
>
> Key: BEAM-7274
> URL: https://issues.apache.org/jira/browse/BEAM-7274
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 24h 20m
>  Remaining Estimate: 0h
>
> Add support for the new Beam Schema to the Protobuf extension.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=389780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389780
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:55
Start Date: 20/Feb/20 03:55
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10731: [BEAM-7926] 
Data-centric Interactive Part3
URL: https://github.com/apache/beam/pull/10731#issuecomment-588596733
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389780)
Time Spent: 47h 40m  (was: 47.5h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 47h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=389782=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389782
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:55
Start Date: 20/Feb/20 03:55
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10731: [BEAM-7926] 
Data-centric Interactive Part3
URL: https://github.com/apache/beam/pull/10731#issuecomment-588596837
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389782)
Time Spent: 48h  (was: 47h 50m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 48h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=389781=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389781
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:55
Start Date: 20/Feb/20 03:55
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10731: [BEAM-7926] 
Data-centric Interactive Part3
URL: https://github.com/apache/beam/pull/10731#issuecomment-588596780
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389781)
Time Spent: 47h 50m  (was: 47h 40m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 47h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389779
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:54
Start Date: 20/Feb/20 03:54
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381723978
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -130,6 +139,17 @@ def __init__(self, cache_manager=None):
   'You have limited Interactive Beam features since your '
   'ipython kernel is not connected any notebook frontend.')
 
+  @property
+  def options(self):
+"""A reference to the global interactive options.
+
+Provided to avoid import loop or excessive dynamic import. All internal
+Interactive Beam modules should access interactive_beam.options through
+this property.
+"""
+from apache_beam.runners.interactive.interactive_beam import options
+return options
 
 Review comment:
   If all the getters/setters are available in `interactive_beam`, why do the 
user need to use this setter here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389779)
Time Spent: 66h 10m  (was: 66h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 66h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389778
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:52
Start Date: 20/Feb/20 03:52
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381723166
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -132,7 +256,22 @@ def is_source_to_cache_changed(user_pipeline):
   is_changed = not current_signature.issubset(recorded_signature)
   # The computation of extract_unbounded_source_signature is expensive, track 
on
   # change by default.
-  if is_changed:
+  if is_changed and update_cached_source_signature:
+if ie.current_env().options.enable_capture_replay:
+  if not recorded_signature:
+_LOGGER.info(
+'Interactive Beam has detected you have unbounded sources '
+'in your pipeline. In order to have a deterministic replay '
+'of your pipeline: {}'.format(
 
 Review comment:
   I think both cases should result in a complete sentence.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389778)
Time Spent: 66h  (was: 65h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 66h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389776=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389776
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:52
Start Date: 20/Feb/20 03:52
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381722820
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -19,29 +19,113 @@
 
 For internal use only; no backwards-compatibility guarantees.
 
-A background caching job is a job that caches events for all unbounded sources
-of a given pipeline. With Interactive Beam, one such job is started when a
-pipeline run happens (which produces a main job in contrast to the background
+A background caching job is a job that captures events for all capturable
+sources of a given pipeline. With Interactive Beam, one such job is started 
when
+a pipeline run happens (which produces a main job in contrast to the background
 caching job) and meets the following conditions:
 
-  #. The pipeline contains unbounded sources.
+  #. The pipeline contains capturable sources, configured through
+ interactive_beam.options.capturable_sources.
   #. No such background job is running.
   #. No such background job has completed successfully and the cached events 
are
- still valid (invalidated when unbounded sources change in the pipeline).
+ still valid (invalidated when capturable sources change in the pipeline).
 
 Once started, the background caching job runs asynchronously until it hits some
-cache size limit. Meanwhile, the main job and future main jobs from the 
pipeline
-will run using the deterministic replay-able cached events until they are
-invalidated.
+capture limit configured in interactive_beam.options. Meanwhile, the main job
+and future main jobs from the pipeline will run using the deterministic
+replayable captured events until they are invalidated.
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
+import logging
+import threading
+import time
+
 import apache_beam as beam
-from apache_beam import runners
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.caching import streaming_cache
+from apache_beam.runners.runner import PipelineState
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
+
+
+class BackgroundCachingJob(object):
+  """A simple abstraction that controls necessary components of a timed and
+  [disk] space limited background caching job.
+
+  A background caching job successfully terminates in 2 conditions:
+
+#. The job is finite and runs into DONE state;
 
 Review comment:
   OK
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389776)
Time Spent: 65h 40m  (was: 65.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 65h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389777
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:52
Start Date: 20/Feb/20 03:52
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381722850
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -19,29 +19,113 @@
 
 For internal use only; no backwards-compatibility guarantees.
 
-A background caching job is a job that caches events for all unbounded sources
-of a given pipeline. With Interactive Beam, one such job is started when a
-pipeline run happens (which produces a main job in contrast to the background
+A background caching job is a job that captures events for all capturable
+sources of a given pipeline. With Interactive Beam, one such job is started 
when
+a pipeline run happens (which produces a main job in contrast to the background
 caching job) and meets the following conditions:
 
-  #. The pipeline contains unbounded sources.
+  #. The pipeline contains capturable sources, configured through
+ interactive_beam.options.capturable_sources.
   #. No such background job is running.
   #. No such background job has completed successfully and the cached events 
are
- still valid (invalidated when unbounded sources change in the pipeline).
+ still valid (invalidated when capturable sources change in the pipeline).
 
 Once started, the background caching job runs asynchronously until it hits some
-cache size limit. Meanwhile, the main job and future main jobs from the 
pipeline
-will run using the deterministic replay-able cached events until they are
-invalidated.
+capture limit configured in interactive_beam.options. Meanwhile, the main job
+and future main jobs from the pipeline will run using the deterministic
+replayable captured events until they are invalidated.
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
+import logging
+import threading
+import time
+
 import apache_beam as beam
-from apache_beam import runners
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.caching import streaming_cache
+from apache_beam.runners.runner import PipelineState
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
+
+
+class BackgroundCachingJob(object):
+  """A simple abstraction that controls necessary components of a timed and
+  [disk] space limited background caching job.
 
 Review comment:
   OK
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389777)
Time Spent: 65h 50m  (was: 65h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 65h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389775=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389775
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:51
Start Date: 20/Feb/20 03:51
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381722658
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -19,29 +19,113 @@
 
 For internal use only; no backwards-compatibility guarantees.
 
-A background caching job is a job that caches events for all unbounded sources
-of a given pipeline. With Interactive Beam, one such job is started when a
-pipeline run happens (which produces a main job in contrast to the background
+A background caching job is a job that captures events for all capturable
+sources of a given pipeline. With Interactive Beam, one such job is started 
when
+a pipeline run happens (which produces a main job in contrast to the background
 caching job) and meets the following conditions:
 
-  #. The pipeline contains unbounded sources.
+  #. The pipeline contains capturable sources, configured through
+ interactive_beam.options.capturable_sources.
   #. No such background job is running.
   #. No such background job has completed successfully and the cached events 
are
- still valid (invalidated when unbounded sources change in the pipeline).
+ still valid (invalidated when capturable sources change in the pipeline).
 
 Once started, the background caching job runs asynchronously until it hits some
-cache size limit. Meanwhile, the main job and future main jobs from the 
pipeline
-will run using the deterministic replay-able cached events until they are
-invalidated.
+capture limit configured in interactive_beam.options. Meanwhile, the main job
+and future main jobs from the pipeline will run using the deterministic
+replayable captured events until they are invalidated.
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
+import logging
+import threading
+import time
+
 import apache_beam as beam
-from apache_beam import runners
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.caching import streaming_cache
+from apache_beam.runners.runner import PipelineState
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
 
 Review comment:
   It has limited effect but it still overrides user choice. Presumably we can 
have a pipeline option that can set logging level per module level.
   
   Also, is not the default logging level info anway?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389775)
Time Spent: 65.5h  (was: 65h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 65.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389774=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389774
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:49
Start Date: 20/Feb/20 03:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381721648
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/options/capture_control.py
 ##
 @@ -15,6 +15,16 @@
 # limitations under the License.
 #
 
+"""Module to control how Interactive Beam captures data from sources for
+deterministic replayable PCollection evaluation and pipeline runs.
+
+For internal use only; no backwards-compatibility guarantees.
+"""
+
+# pytype: skip-file
 
 Review comment:
   Yes, ack. Confused this with mypy.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389774)
Time Spent: 65h 20m  (was: 65h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 65h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9335) update hard-coded coder id when translating Java external transforms

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9335?focusedWorklogId=389773=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389773
 ]

ASF GitHub Bot logged work on BEAM-9335:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:47
Start Date: 20/Feb/20 03:47
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10900: [BEAM-9335] update 
hard-coded coder id when translating Java external transforms
URL: https://github.com/apache/beam/pull/10900#issuecomment-588595193
 
 
   R: @robertwb 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389773)
Time Spent: 20m  (was: 10m)

> update hard-coded coder id when translating Java external transforms
> 
>
> Key: BEAM-9335
> URL: https://issues.apache.org/jira/browse/BEAM-9335
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> hard-coded coder id needs to be updated when translating Java external 
> transforms. Otherwise pipeline will fail if coder id is reused.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389772=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389772
 ]

ASF GitHub Bot logged work on BEAM-9341:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:45
Start Date: 20/Feb/20 03:45
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10912: [BEAM-9341] postcommit 
xvr flink fix
URL: https://github.com/apache/beam/pull/10912#issuecomment-588594812
 
 
   Ready to merge.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389772)
Time Spent: 1h  (was: 50m)

> postcommit xvr flink, spark failure
> ---
>
> Key: BEAM-9341
> URL: https://issues.apache.org/jira/browse/BEAM-9341
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9321) BigQuery avro write logical type support

2020-02-19 Thread Filipe Regadas (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filipe Regadas updated BEAM-9321:
-
Priority: Major  (was: Minor)

> BigQuery avro write logical type support
> 
>
> Key: BEAM-9321
> URL: https://issues.apache.org/jira/browse/BEAM-9321
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Filipe Regadas
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing 
> does not respect Avro <-> BigQuery data type conversion 
> ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types])
>  we need to set the useAvroLogicalTypes option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9321) BigQuery avro write logical type support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9321?focusedWorklogId=389767=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389767
 ]

ASF GitHub Bot logged work on BEAM-9321:


Author: ASF GitHub Bot
Created on: 20/Feb/20 03:24
Start Date: 20/Feb/20 03:24
Worklog Time Spent: 10m 
  Work Description: regadas commented on issue #10869: [BEAM-9321] Add 
BigQuery Avro logical type support on write
URL: https://github.com/apache/beam/pull/10869#issuecomment-588590233
 
 
   R: @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389767)
Time Spent: 40m  (was: 0.5h)

> BigQuery avro write logical type support
> 
>
> Key: BEAM-9321
> URL: https://issues.apache.org/jira/browse/BEAM-9321
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Filipe Regadas
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing 
> does not respect Avro <-> BigQuery data type conversion 
> ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types])
>  we need to set the useAvroLogicalTypes option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389762
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 20/Feb/20 02:47
Start Date: 20/Feb/20 02:47
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10908: 
[BEAM-9339] Declare capabilities for Python SDK.
URL: https://github.com/apache/beam/pull/10908#discussion_r381688154
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1209,6 +1209,30 @@ message ExternalPayload {
   map params = 2;  // Arbitrary extra parameters to pass
 }
 
+// These URNs are used to indicate capabilities of environments that cannot
+// simply be expressed as a component (such as a Coder or PTransform) that this
+// environment understands.
+message StandardProtocols {
+  enum Enum {
+// Indicates suport for progress reporting via the legacy metric APIs.
+LEGACY_PROGRESS_REPORTING = 0 [(beam_urn) = 
"beam:protocol:progress_reporting:v0"];
+
+// Indicates suport for progress reporting via the new metric APIs.
 
 Review comment:
   nit: should we add a reference to clarify what old/new metrics are ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389762)
Time Spent: 1h  (was: 50m)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389763
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 20/Feb/20 02:47
Start Date: 20/Feb/20 02:47
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10908: 
[BEAM-9339] Declare capabilities for Python SDK.
URL: https://github.com/apache/beam/pull/10908#discussion_r381690351
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments_test.py
 ##
 @@ -53,10 +63,14 @@ def test_environment_encoding(self):
 state_cache_size=0, data_buffer_time_limit_ms=0),
 SubprocessSDKEnvironment(command_string=u'foö')):
   context = pipeline_context.PipelineContext()
-  self.assertEqual(
-  environment,
-  Environment.from_runner_api(
-  environment.to_runner_api(context), context))
+  proto = environment.to_runner_api(context)
+  reconstructed = Environment.from_runner_api(proto, context)
+  self.assertEqual(environment, reconstructed)
+  self.assertEqual(proto, reconstructed.to_runner_api(context))
+
+  def test_sdk_capabilities(self):
+sdk_capabilities = environments.python_sdk_capabilities()
+self.assertIn(common_urns.coders.LENGTH_PREFIX.urn, sdk_capabilities)
 
 Review comment:
   Probably test for every expected capability here ? And also force failure if 
the test has not been updated to reflect a new capability ?
   
   The test can be updated as the list of capabilities are updated.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389763)
Time Spent: 1h  (was: 50m)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9338) add postcommit XVR spark badge

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9338?focusedWorklogId=389760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389760
 ]

ASF GitHub Bot logged work on BEAM-9338:


Author: ASF GitHub Bot
Created on: 20/Feb/20 02:37
Start Date: 20/Feb/20 02:37
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10907: [BEAM-9338] 
add postcommit XVR spark badges
URL: https://github.com/apache/beam/pull/10907
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389760)
Time Spent: 0.5h  (was: 20m)

> add postcommit XVR spark badge
> --
>
> Key: BEAM-9338
> URL: https://issues.apache.org/jira/browse/BEAM-9338
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> add postcommit xvr spark badges



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389755
 ]

ASF GitHub Bot logged work on BEAM-9341:


Author: ASF GitHub Bot
Created on: 20/Feb/20 02:24
Start Date: 20/Feb/20 02:24
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10912: [BEAM-9341] 
postcommit xvr flink fix
URL: https://github.com/apache/beam/pull/10912#issuecomment-588576387
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389755)
Time Spent: 40m  (was: 0.5h)

> postcommit xvr flink, spark failure
> ---
>
> Key: BEAM-9341
> URL: https://issues.apache.org/jira/browse/BEAM-9341
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389756
 ]

ASF GitHub Bot logged work on BEAM-9341:


Author: ASF GitHub Bot
Created on: 20/Feb/20 02:24
Start Date: 20/Feb/20 02:24
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10912: [BEAM-9341] 
postcommit xvr flink fix
URL: https://github.com/apache/beam/pull/10912#issuecomment-588576428
 
 
   Run XVR_Spark PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389756)
Time Spent: 50m  (was: 40m)

> postcommit xvr flink, spark failure
> ---
>
> Key: BEAM-9341
> URL: https://issues.apache.org/jira/browse/BEAM-9341
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389749
 ]

ASF GitHub Bot logged work on BEAM-9341:


Author: ASF GitHub Bot
Created on: 20/Feb/20 02:10
Start Date: 20/Feb/20 02:10
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10912: [BEAM-9341] postcommit 
xvr flink fix
URL: https://github.com/apache/beam/pull/10912#issuecomment-588573076
 
 
   please run `Run XVR_Flink PostCommit` and `Run XVR_Spark PostCommit`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389749)
Time Spent: 0.5h  (was: 20m)

> postcommit xvr flink, spark failure
> ---
>
> Key: BEAM-9341
> URL: https://issues.apache.org/jira/browse/BEAM-9341
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389748=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389748
 ]

ASF GitHub Bot logged work on BEAM-9341:


Author: ASF GitHub Bot
Created on: 20/Feb/20 02:09
Start Date: 20/Feb/20 02:09
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10912: [BEAM-9341] postcommit 
xvr flink fix
URL: https://github.com/apache/beam/pull/10912#issuecomment-588572710
 
 
   R: @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389748)
Time Spent: 20m  (was: 10m)

> postcommit xvr flink, spark failure
> ---
>
> Key: BEAM-9341
> URL: https://issues.apache.org/jira/browse/BEAM-9341
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9341) postcommit xvr flink, spark failure

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9341?focusedWorklogId=389747=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389747
 ]

ASF GitHub Bot logged work on BEAM-9341:


Author: ASF GitHub Bot
Created on: 20/Feb/20 02:08
Start Date: 20/Feb/20 02:08
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10912: [BEAM-9341] 
postcommit xvr flink fix
URL: https://github.com/apache/beam/pull/10912
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK

2020-02-19 Thread Chamikara Madhusanka Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040565#comment-17040565
 ] 

Chamikara Madhusanka Jayalath commented on BEAM-3788:
-

Added a comment to the email thread. I think this is a Flink specific issue 
that should go away after SDF.

> Implement a Kafka IO for Python SDK
> ---
>
> Key: BEAM-3788
> URL: https://issues.apache.org/jira/browse/BEAM-3788
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> Java KafkaIO will be made available to Python users as a cross-language 
> transform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389745
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 20/Feb/20 01:56
Start Date: 20/Feb/20 01:56
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #10911: [BEAM-9339] 
Declare capabilities for Go SDK.
URL: https://github.com/apache/beam/pull/10911#discussion_r381659754
 
 

 ##
 File path: sdks/go/pkg/beam/runners/universal/universal.go
 ##
 @@ -78,6 +78,20 @@ func Execute(ctx context.Context, p *beam.Pipeline) error {
return err
 }
 
+const (
+   urnLegacyProgressReporting = "beam:protocol:progress_reporting:v0"
+   urnMultiCore   = 
"beam:protocol:multi_core_bundle_processing:v1"
+)
+
+func goCapabilities() []string {
+   capabilities := []string{
+   urnLegacyProgressReporting,
+   urnLegacyProgressReporting,
+   }
+   capabilities = append(capabilities, graphx.KnownStandardCoders()...)
+   return capabilities
+}
+
 
 Review comment:
   You can return straight from the append and save a line.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389745)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389744
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 20/Feb/20 01:56
Start Date: 20/Feb/20 01:56
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #10911: [BEAM-9339] 
Declare capabilities for Go SDK.
URL: https://github.com/apache/beam/pull/10911#discussion_r381658325
 
 

 ##
 File path: sdks/go/pkg/beam/runners/universal/universal.go
 ##
 @@ -78,6 +78,20 @@ func Execute(ctx context.Context, p *beam.Pipeline) error {
return err
 }
 
+const (
+   urnLegacyProgressReporting = "beam:protocol:progress_reporting:v0"
+   urnMultiCore   = 
"beam:protocol:multi_core_bundle_processing:v1"
+)
+
+func goCapabilities() []string {
+   capabilities := []string{
+   urnLegacyProgressReporting,
+   urnLegacyProgressReporting,
 
 Review comment:
   Accidentally duped. 
   urnMultiCore?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389744)
Time Spent: 50m  (was: 40m)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK

2020-02-19 Thread Chad Dombrova (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040558#comment-17040558
 ] 

Chad Dombrova commented on BEAM-3788:
-

Unless something has changed recently, 
https://issues.apache.org/jira/browse/BEAM-7870 is still a blocker for using 
KafkaIO in python out of the box.  As the title suggests, it's also blocking 
PubSubIO in python and conceptually any external transform with a non-trivial 
coder. 

[~mxm], [~bhulette] has anything changed on that issue lately?

 

> Implement a Kafka IO for Python SDK
> ---
>
> Key: BEAM-3788
> URL: https://issues.apache.org/jira/browse/BEAM-3788
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> Java KafkaIO will be made available to Python users as a cross-language 
> transform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389742
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 20/Feb/20 01:37
Start Date: 20/Feb/20 01:37
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #10823: [BEAM-9286] Create 
validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#issuecomment-588565094
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389742)
Time Spent: 3h  (was: 2h 50m)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based Monitoring 
> Info). 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389740
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 20/Feb/20 01:36
Start Date: 20/Feb/20 01:36
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #10823: [BEAM-9286] Create 
validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#issuecomment-588564792
 
 
   retest this please.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389740)
Time Spent: 2h 50m  (was: 2h 40m)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based Monitoring 
> Info). 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389739
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 20/Feb/20 01:35
Start Date: 20/Feb/20 01:35
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #10823: [BEAM-9286] Create 
validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#issuecomment-588564714
 
 
   test this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389739)
Time Spent: 2h 40m  (was: 2.5h)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based Monitoring 
> Info). 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9341) postcommit xvr flink, spark failure

2020-02-19 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9341:
--
Description: started from 
[https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/]

> postcommit xvr flink, spark failure
> ---
>
> Key: BEAM-9341
> URL: https://issues.apache.org/jira/browse/BEAM-9341
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> started from [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1738/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9341) postcommit xvr flink, spark failure

2020-02-19 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9341:
--
Status: Open  (was: Triage Needed)

> postcommit xvr flink, spark failure
> ---
>
> Key: BEAM-9341
> URL: https://issues.apache.org/jira/browse/BEAM-9341
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9341) postcommit xvr flink, spark failure

2020-02-19 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-9341:
-

 Summary: postcommit xvr flink, spark failure
 Key: BEAM-9341
 URL: https://issues.apache.org/jira/browse/BEAM-9341
 Project: Beam
  Issue Type: Bug
  Components: java-fn-execution
Reporter: Heejong Lee
Assignee: Heejong Lee






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389733=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389733
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 20/Feb/20 01:22
Start Date: 20/Feb/20 01:22
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10911: [BEAM-9339] Declare 
capabilities for Go SDK.
URL: https://github.com/apache/beam/pull/10911#issuecomment-588561432
 
 
   R: @lostluck 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389733)
Time Spent: 40m  (was: 0.5h)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389731=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389731
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 20/Feb/20 01:21
Start Date: 20/Feb/20 01:21
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10911: [BEAM-9339] 
Declare capabilities for Go SDK.
URL: https://github.com/apache/beam/pull/10911
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-3545) Fn API metrics in Go SDK harness

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3545?focusedWorklogId=389728=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389728
 ]

ASF GitHub Bot logged work on BEAM-3545:


Author: ASF GitHub Bot
Created on: 20/Feb/20 01:01
Start Date: 20/Feb/20 01:01
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #10906: [BEAM-3545] Fix 
race condition w/plan metrics.
URL: https://github.com/apache/beam/pull/10906#issuecomment-588556050
 
 
   That makes sense.
   
   I do kinda like moving the metrics store outside of the plan and into the 
harness/direct runner. It doesn't seem like the plan actually needs the store, 
it's just being used as a container for the harness/direct runner to access and 
pass it to monitoring. But it's such a small difference that I have no 
objections to leaving it as-is. I was mainly concerned whether there was 
another race condition, not so much about style.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389728)
Time Spent: 17h 10m  (was: 17h)

> Fn API metrics in Go SDK harness
> 
>
> Key: BEAM-3545
> URL: https://issues.apache.org/jira/browse/BEAM-3545
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Kenneth Knowles
>Assignee: Robert Burke
>Priority: Major
>  Labels: portability
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389720
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:53
Start Date: 20/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10886: 
[BEAM-8019] Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381610998
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -478,6 +544,19 @@ def __init__(self, options):
 get_credentials=(not self.google_cloud_options.no_auth),
 http=http_client,
 response_encoding=get_response_encoding())
+self._sdk_image_overrides = self._get_sdk_image_overrides(options)
+
+  def _get_sdk_image_overrides(self, pipeline_options):
+worker_options = pipeline_options.view_as(WorkerOptions)
+sdk_overrides = worker_options.sdk_harness_container_image_overrides
+overrides_dict = dict()
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389720)
Time Spent: 4h 50m  (was: 4h 40m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389714
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:53
Start Date: 20/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10886: 
[BEAM-8019] Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381609441
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -301,6 +351,22 @@ def __init__(self, packages, options, 
environment_version, pipeline_url):
   dataflow.Environment.SdkPipelineOptionsValue.AdditionalProperty(
   key='display_data', value=to_json_value(items)))
 
+  def _get_environments_from_tranforms(self):
 
 Review comment:
   I thought about this and I think it's safer to collect the exact set of 
environments that we need to execute the pipeline instead of depending on all 
the environments available in the proto. There's nothing that prevents an 
external transform (or anywhere else that we update the proto) from including 
additional environments that are not needed to execute the pipeline.
   
   For example, default environment set by Java SDK does not work without 
replacing the container image (we happen to use the same environment_id for the 
replacement but could have been otherwise).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389714)
Time Spent: 4h 10m  (was: 4h)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389719=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389719
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:53
Start Date: 20/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10886: 
[BEAM-8019] Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381611280
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -576,10 +655,25 @@ def create_job(self, job):
 'A template was just created at location %s', template_location)
 return None
 
+  def _apply_sdk_environment_overrides(self, proto_pipeline):
+# Update environments based on user provided overrides
+sdk_overrides = self._sdk_image_overrides
+if sdk_overrides:
+  for environment in proto_pipeline.components.environments.values():
+docker_payload = proto_utils.parse_Bytes(
+environment.payload, beam_runner_api_pb2.DockerPayload)
+for pattern in sdk_overrides:
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389719)
Time Spent: 4h 40m  (was: 4.5h)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389715
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:53
Start Date: 20/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10886: 
[BEAM-8019] Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381607910
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -748,6 +748,17 @@ def _add_argparse_args(cls, parser):
 'worker harness. Default is the container for the version of the '
 'SDK. Note: currently, only approved Google Cloud Dataflow '
 'container images may be used here.'))
+parser.add_argument(
+'--sdk_harness_container_image_overrides',
 
 Review comment:
   Currently this has to be provided for cross-language transforms to work (for 
Dataflow) since the container URL that we get from Beam does not work for 
Dataflow. I hope to introduce logic to derive Java container URL from Python in 
a follow up PR.
   
   After this this will be primary for testing or for users to override the 
container (similar to worker_harness_container_image option today).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389715)
Time Spent: 4h 10m  (was: 4h)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389718=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389718
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:53
Start Date: 20/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10886: 
[BEAM-8019] Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381626779
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py
 ##
 @@ -329,7 +402,7 @@ def test_harness_override_default_in_released_sdks(self):
 env = apiclient.Environment([], #packages
 pipeline_options,
 '2.0.0', #any environment version
-FAKE_PIPELINE_URL)
+FAKE_PIPELINE_URL, None, None)
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389718)
Time Spent: 4h 40m  (was: 4.5h)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389722=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389722
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:53
Start Date: 20/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10886: 
[BEAM-8019] Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381615673
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -576,10 +655,25 @@ def create_job(self, job):
 'A template was just created at location %s', template_location)
 return None
 
+  def _apply_sdk_environment_overrides(self, proto_pipeline):
+# Update environments based on user provided overrides
+sdk_overrides = self._sdk_image_overrides
+if sdk_overrides:
+  for environment in proto_pipeline.components.environments.values():
+docker_payload = proto_utils.parse_Bytes(
+environment.payload, beam_runner_api_pb2.DockerPayload)
+for pattern in sdk_overrides:
+  override = sdk_overrides[pattern]
+  if re.match(pattern, docker_payload.container_image):
+new_payload = beam_runner_api_pb2.DockerPayload(
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389722)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389721
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:53
Start Date: 20/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10886: 
[BEAM-8019] Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381619622
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -576,10 +655,25 @@ def create_job(self, job):
 'A template was just created at location %s', template_location)
 return None
 
+  def _apply_sdk_environment_overrides(self, proto_pipeline):
+# Update environments based on user provided overrides
+sdk_overrides = self._sdk_image_overrides
+if sdk_overrides:
+  for environment in proto_pipeline.components.environments.values():
+docker_payload = proto_utils.parse_Bytes(
+environment.payload, beam_runner_api_pb2.DockerPayload)
+for pattern in sdk_overrides:
+  override = sdk_overrides[pattern]
+  if re.match(pattern, docker_payload.container_image):
+new_payload = beam_runner_api_pb2.DockerPayload(
+container_image=override)
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389721)
Time Spent: 5h  (was: 4h 50m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389716=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389716
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:53
Start Date: 20/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10886: 
[BEAM-8019] Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381608521
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -153,6 +158,9 @@ def __init__(self, packages, options, environment_version, 
pipeline_url):
 # User agent information.
 self.proto.userAgent = dataflow.Environment.UserAgentValue()
 self.local = 'localhost' in self.google_cloud_options.dataflow_endpoint
+self._proto_pipeline = proto_pipeline
+self._sdk_image_overrides = (
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389716)
Time Spent: 4h 20m  (was: 4h 10m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389717
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:53
Start Date: 20/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10886: 
[BEAM-8019] Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381610512
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -301,6 +351,22 @@ def __init__(self, packages, options, 
environment_version, pipeline_url):
   dataflow.Environment.SdkPipelineOptionsValue.AdditionalProperty(
   key='display_data', value=to_json_value(items)))
 
+  def _get_environments_from_tranforms(self):
+if not self._proto_pipeline:
+  return []
+environment_ids = []
+for transform in self._proto_pipeline.components.transforms.values():
+  if transform.environment_id not in environment_ids:
+environment_ids.append(transform.environment_id)
+environments = []
+for environment_id in environment_ids:
+  if not environment_id:
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389717)
Time Spent: 4.5h  (was: 4h 20m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3545) Fn API metrics in Go SDK harness

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3545?focusedWorklogId=389708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389708
 ]

ASF GitHub Bot logged work on BEAM-3545:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:40
Start Date: 20/Feb/20 00:40
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #10906: [BEAM-3545] 
Fix race condition w/plan metrics.
URL: https://github.com/apache/beam/pull/10906
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389708)
Time Spent: 17h  (was: 16h 50m)

> Fn API metrics in Go SDK harness
> 
>
> Key: BEAM-3545
> URL: https://issues.apache.org/jira/browse/BEAM-3545
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Kenneth Knowles
>Assignee: Robert Burke
>Priority: Major
>  Labels: portability
>  Time Spent: 17h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3545) Fn API metrics in Go SDK harness

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3545?focusedWorklogId=389707=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389707
 ]

ASF GitHub Bot logged work on BEAM-3545:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:40
Start Date: 20/Feb/20 00:40
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #10906: [BEAM-3545] Fix 
race condition w/plan metrics.
URL: https://github.com/apache/beam/pull/10906#issuecomment-588550039
 
 
   It's more that Down shouldn't be called while Execute/Process is running. 
It's part of the contract for Units (see exec/unit.go) that they aren't 
required to be concurrency safe. 
   
   The metrics do mess up this formulation though, since they *must* be 
accessible outside of the plan execution. I did consider instead simply having 
the store created outside of the plan, which would mean the harness would need 
to handle the asynchronous storage/access for it (probably as part of one of 
the maps in harness).  That would probably save a lock. What do you think?
   
   There are several more changes coming here for PCollection and PTransform 
metrics,  and while I don't like churn, I'll be following up in another PR 
anyway, so it can be cleaned up then.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389707)
Time Spent: 16h 50m  (was: 16h 40m)

> Fn API metrics in Go SDK harness
> 
>
> Key: BEAM-3545
> URL: https://issues.apache.org/jira/browse/BEAM-3545
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Kenneth Knowles
>Assignee: Robert Burke
>Priority: Major
>  Labels: portability
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9340) Properly populate pipeline proto requirements.

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9340?focusedWorklogId=389705=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389705
 ]

ASF GitHub Bot logged work on BEAM-9340:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:33
Start Date: 20/Feb/20 00:33
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10909: [BEAM-9340] 
Populate requirements for Python DoFn properties.
URL: https://github.com/apache/beam/pull/10909
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Created] (BEAM-9340) Properly populate pipeline proto requirements.

2020-02-19 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-9340:
-

 Summary: Properly populate pipeline proto requirements.
 Key: BEAM-9340
 URL: https://issues.apache.org/jira/browse/BEAM-9340
 Project: Beam
  Issue Type: New Feature
  Components: beam-model, sdk-go, sdk-java-core, sdk-py-core
Reporter: Robert Bradshaw






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389703=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389703
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:25
Start Date: 20/Feb/20 00:25
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create 
validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#issuecomment-588545387
 
 
   Run PythonFormatter PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389703)
Time Spent: 2.5h  (was: 2h 20m)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based Monitoring 
> Info). 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389702
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:24
Start Date: 20/Feb/20 00:24
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create 
validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#issuecomment-588545387
 
 
   Run PythonFormatter PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389702)
Time Spent: 2h 20m  (was: 2h 10m)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based Monitoring 
> Info). 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389700=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389700
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:23
Start Date: 20/Feb/20 00:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10908: [BEAM-9339] Declare 
capabilities for Python SDK.
URL: https://github.com/apache/beam/pull/10908#issuecomment-588544811
 
 
   R: @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389700)
Time Spent: 20m  (was: 10m)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=389698=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389698
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:22
Start Date: 20/Feb/20 00:22
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10908: [BEAM-9339] 
Declare capabilities for Python SDK.
URL: https://github.com/apache/beam/pull/10908
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389695=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389695
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:14
Start Date: 20/Feb/20 00:14
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381621290
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/options/capture_control.py
 ##
 @@ -0,0 +1,80 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module to control how Interactive Beam captures data from sources for
+deterministic replayable PCollection evaluation and pipeline runs.
+
+For internal use only; no backwards-compatibility guarantees.
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+from datetime import timedelta
+
+from apache_beam.io.gcp.pubsub import ReadFromPubSub
+from apache_beam.runners.interactive import background_caching_job as bcj
+from apache_beam.runners.interactive import interactive_environment as ie
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
 
 Review comment:
   We intend to log at info level for this module.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389695)
Time Spent: 65h 10m  (was: 65h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 65h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=389691=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389691
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 20/Feb/20 00:01
Start Date: 20/Feb/20 00:01
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r381617505
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1158,6 +1158,90 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for locally-accessible artifact files.
+// payload: ArtifactFilePayload
+FILE = 0 [(beam_urn) = "beam:artifact:type:file:v1"];
+
+// A URN for artifacts described by URLs.
+// payload: ArtifactUrlPayload
+URL  = 1 [(beam_urn) = "beam:artifact:type:url:v1"];
+
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: EmbeddedFilePayload.
+EMBEDDED = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"];
+
+// A URN for Python artifacts hosted on PYPI.
+// payload: PypiPayload
+PYPI = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"];
+
+// A URN for Java artifacts hosted on a Maven repository.
+// payload: MavenPayload
+MAVEN= 4 [(beam_urn) = "beam:artifact:type:maven:v1"];
+  }
+  enum Roles {
+// A URN for staging-to role.
+// payload: ArtifactStagingToRolePayload
+STAGING_TO  = 0 [(beam_urn) = "beam:artifact:role:staging_to:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // a string for an artifact path e.g. "/tmp/foo.jar"
+  string path = 1;
+
+  // The hex-encoded sha256 checksum of the artifact.
+  string sha256 = 2;
+}
+
+message ArtifactUrlPayload {
+  // a string for an artifact URL e.g. "https://.../foo.jar; or 
"gs://tmp/foo.jar"
+  string path = 1;
+}
+
+message EmbeddedFilePayload {
+  // raw data bytes for an embedded artifact
+  bytes data = 1;
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389691)
Time Spent: 7h 10m  (was: 7h)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9339) Declare capabilities in SDK environments

2020-02-19 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-9339:
-

 Summary: Declare capabilities in SDK environments
 Key: BEAM-9339
 URL: https://issues.apache.org/jira/browse/BEAM-9339
 Project: Beam
  Issue Type: New Feature
  Components: sdk-go, sdk-java-harness, sdk-py-harness
Reporter: Robert Bradshaw






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9338) add postcommit XVR spark badge

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9338?focusedWorklogId=389685=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389685
 ]

ASF GitHub Bot logged work on BEAM-9338:


Author: ASF GitHub Bot
Created on: 19/Feb/20 23:53
Start Date: 19/Feb/20 23:53
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10907: [BEAM-9338] add 
postcommit XVR spark badges
URL: https://github.com/apache/beam/pull/10907
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-9338) add postcommit XVR spark badge

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9338?focusedWorklogId=389686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389686
 ]

ASF GitHub Bot logged work on BEAM-9338:


Author: ASF GitHub Bot
Created on: 19/Feb/20 23:53
Start Date: 19/Feb/20 23:53
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10907: [BEAM-9338] add 
postcommit XVR spark badges
URL: https://github.com/apache/beam/pull/10907#issuecomment-588535760
 
 
   R: @aaltay 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389686)
Time Spent: 20m  (was: 10m)

> add postcommit XVR spark badge
> --
>
> Key: BEAM-9338
> URL: https://issues.apache.org/jira/browse/BEAM-9338
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> add postcommit xvr spark badges



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=389683=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389683
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 19/Feb/20 23:52
Start Date: 19/Feb/20 23:52
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r381614435
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1158,6 +1158,90 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for locally-accessible artifact files.
+// payload: ArtifactFilePayload
+FILE = 0 [(beam_urn) = "beam:artifact:type:file:v1"];
+
+// A URN for artifacts described by URLs.
+// payload: ArtifactUrlPayload
+URL  = 1 [(beam_urn) = "beam:artifact:type:url:v1"];
+
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: EmbeddedFilePayload.
+EMBEDDED = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"];
+
+// A URN for Python artifacts hosted on PYPI.
+// payload: PypiPayload
+PYPI = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"];
+
+// A URN for Java artifacts hosted on a Maven repository.
+// payload: MavenPayload
+MAVEN= 4 [(beam_urn) = "beam:artifact:type:maven:v1"];
+  }
+  enum Roles {
+// A URN for staging-to role.
+// payload: ArtifactStagingToRolePayload
+STAGING_TO  = 0 [(beam_urn) = "beam:artifact:role:staging_to:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // a string for an artifact path e.g. "/tmp/foo.jar"
+  string path = 1;
+
+  // The hex-encoded sha256 checksum of the artifact.
+  string sha256 = 2;
+}
+
+message ArtifactUrlPayload {
+  // a string for an artifact URL e.g. "https://.../foo.jar; or 
"gs://tmp/foo.jar"
+  string path = 1;
+}
+
+message EmbeddedFilePayload {
+  // raw data bytes for an embedded artifact
+  bytes data = 1;
 
 Review comment:
   We could consider just letting the payload be the raw bytes (and similarly 
for other one-field payloads). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389683)
Time Spent: 7h  (was: 6h 50m)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9338) add postcommit XVR spark badge

2020-02-19 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9338:
--
Summary: add postcommit XVR spark badge  (was: add postcommit XVR spark 
tickers)

> add postcommit XVR spark badge
> --
>
> Key: BEAM-9338
> URL: https://issues.apache.org/jira/browse/BEAM-9338
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> add postcommit xvr spark tickers



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9338) add postcommit XVR spark badge

2020-02-19 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9338:
--
Description: add postcommit xvr spark badges  (was: add postcommit xvr 
spark tickers)

> add postcommit XVR spark badge
> --
>
> Key: BEAM-9338
> URL: https://issues.apache.org/jira/browse/BEAM-9338
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> add postcommit xvr spark badges



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9338) add postcommit XVR spark tickers

2020-02-19 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-9338:
-

 Summary: add postcommit XVR spark tickers
 Key: BEAM-9338
 URL: https://issues.apache.org/jira/browse/BEAM-9338
 Project: Beam
  Issue Type: Bug
  Components: website
Reporter: Heejong Lee
Assignee: Heejong Lee


add postcommit xvr spark tickers



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9338) add postcommit XVR spark tickers

2020-02-19 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9338:
--
Status: Open  (was: Triage Needed)

> add postcommit XVR spark tickers
> 
>
> Key: BEAM-9338
> URL: https://issues.apache.org/jira/browse/BEAM-9338
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> add postcommit xvr spark tickers



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9338) add postcommit XVR spark tickers

2020-02-19 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9338:
--
Issue Type: Improvement  (was: Bug)

> add postcommit XVR spark tickers
> 
>
> Key: BEAM-9338
> URL: https://issues.apache.org/jira/browse/BEAM-9338
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> add postcommit xvr spark tickers



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3545) Fn API metrics in Go SDK harness

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3545?focusedWorklogId=389678=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389678
 ]

ASF GitHub Bot logged work on BEAM-3545:


Author: ASF GitHub Bot
Created on: 19/Feb/20 23:34
Start Date: 19/Feb/20 23:34
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #10906: [BEAM-3545] Fix 
race condition w/plan metrics.
URL: https://github.com/apache/beam/pull/10906#issuecomment-588529911
 
 
   R: @youngoli 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389678)
Time Spent: 16h 40m  (was: 16.5h)

> Fn API metrics in Go SDK harness
> 
>
> Key: BEAM-3545
> URL: https://issues.apache.org/jira/browse/BEAM-3545
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Kenneth Knowles
>Assignee: Robert Burke
>Priority: Major
>  Labels: portability
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3545) Fn API metrics in Go SDK harness

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3545?focusedWorklogId=389677=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389677
 ]

ASF GitHub Bot logged work on BEAM-3545:


Author: ASF GitHub Bot
Created on: 19/Feb/20 23:34
Start Date: 19/Feb/20 23:34
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #10906: [BEAM-3545] 
Fix race condition w/plan metrics.
URL: https://github.com/apache/beam/pull/10906
 
 
   The last change introduced a race condition when initializing a plan. 
There's a small, but still open window where the progress reporting goroutine 
might try to read the value while it's being written to. A mutex clears this up.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 

[jira] [Resolved] (BEAM-3306) Consider: Go coder registry

2020-02-19 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-3306.

Fix Version/s: Not applicable
   Resolution: Fixed

Go supports a coder registry w/beam.RegisterCoder

Remaining work might be to optionally support "direct" access to an io.Reader 
or io.Writer interface which could yield efficiency gains in some situations 
for user types.

> Consider: Go coder registry
> ---
>
> Key: BEAM-3306
> URL: https://issues.apache.org/jira/browse/BEAM-3306
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Robert Burke
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Add coder registry to allow easier overwrite of default coders. We may also 
> allow otherwise un-encodable types, but that would require that function 
> analysis depends on it.
> If we're hardcoding support for proto/avro, then there may be little need for 
> such a feature. Conversely, this may be how we implement such support.
>  
> Proposal Doc: 
> [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9287) Python Validates runner tests for Unified Worker

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9287?focusedWorklogId=389672=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389672
 ]

ASF GitHub Bot logged work on BEAM-9287:


Author: ASF GitHub Bot
Created on: 19/Feb/20 22:58
Start Date: 19/Feb/20 22:58
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #10863: [BEAM-9287] 
Add Python streaming Validates runner tests for Unified Worker
URL: https://github.com/apache/beam/pull/10863
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389672)
Time Spent: 1h 10m  (was: 1h)

> Python Validates runner tests for Unified Worker
> 
>
> Key: BEAM-9287
> URL: https://issues.apache.org/jira/browse/BEAM-9287
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=389671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389671
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 19/Feb/20 22:55
Start Date: 19/Feb/20 22:55
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10733: [BEAM-9229] Adding 
dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#issuecomment-588516398
 
 
   @robertwb @lukecwik PTAL. Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389671)
Time Spent: 6h 50m  (was: 6h 40m)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389670=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389670
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 22:55
Start Date: 19/Feb/20 22:55
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381594919
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/options/capture_control.py
 ##
 @@ -0,0 +1,80 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module to control how Interactive Beam captures data from sources for
+deterministic replayable PCollection evaluation and pipeline runs.
+
+For internal use only; no backwards-compatibility guarantees.
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+from datetime import timedelta
+
+from apache_beam.io.gcp.pubsub import ReadFromPubSub
+from apache_beam.runners.interactive import background_caching_job as bcj
+from apache_beam.runners.interactive import interactive_environment as ie
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
+
+
+class CaptureControl(object):
+  """Options and their utilities that controls how Interactive Beam captures
+  deterministic replayable data from sources."""
+  def __init__(self):
+self._enable_capture_replay = True
+self._capturable_sources = {
+ReadFromPubSub,
+}  # yapf: disable
 
 Review comment:
   Removing the disable statement.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389670)
Time Spent: 65h  (was: 64h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 65h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-4032) Support staging binary distributions of dependency packages.

2020-02-19 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040489#comment-17040489
 ] 

Ankur Goenka commented on BEAM-4032:


This will become less of a concern when we start supporting custom containers 
as the binaries will be preinstalled on it.

> Support staging binary distributions of dependency packages.
> 
>
> Key: BEAM-4032
> URL: https://issues.apache.org/jira/browse/BEAM-4032
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> requirements.txt only supports source-distribution dependencies [1].
> --extra_packages does not officially support wheel files [2].
> It is possible to expand this to support binary distributions as long as we 
> have the knowledge of the target platform.
> We should take into consideration the mechanisms of staging dependencies 
> through portability framework, and perhaps consolidate some of the existing 
> options.
> [https://github.com/apache/beam/blob/a79d1b4fc27eb81db0d9a773047820a206f3d238/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L260]
> [https://github.com/apache/beam/blob/a79d1b4fc27eb81db0d9a773047820a206f3d238/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L188]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389667=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389667
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 22:49
Start Date: 19/Feb/20 22:49
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381592605
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -130,6 +139,17 @@ def __init__(self, cache_manager=None):
   'You have limited Interactive Beam features since your '
   'ipython kernel is not connected any notebook frontend.')
 
+  @property
+  def options(self):
+"""A reference to the global interactive options.
+
+Provided to avoid import loop or excessive dynamic import. All internal
+Interactive Beam modules should access interactive_beam.options through
+this property.
+"""
+from apache_beam.runners.interactive.interactive_beam import options
+return options
 
 Review comment:
   The `options` instantiated in `interactive_beam` is to expose the `getters` 
and `setters` of configurable fields such as `enable_capture_replay`, 
`capture_duration` and `capturable_sources` to the `interactive beam` user.
   The user has all necessary `getters`, `setters` and docstrings by looking at 
the `interactive_beam` module without the complexity of their underlying 
implementation details (such as `__repr__`, and how options and their utilities 
are grouped together).
   
   Then inside all internal `interactive beam` modules, to access the fields 
configured by the user, since we don't want any module to depend on 
`interactive_beam` module to avoid import loops, we can only do dynamic 
importing. It's going to be messy if we just dynamic import `interactive_beam` 
(the module that is supposed to be used by end user) everywhere. So this 
property (no `setter` given) does the dynamic import once in this single place 
and all internal modules will depend on `interactive_environment` module to 
access whatever configuration the user might have set.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389667)
Time Spent: 64h 50m  (was: 64h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 64h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389666=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389666
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 22:40
Start Date: 19/Feb/20 22:40
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381588614
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py
 ##
 @@ -34,6 +34,58 @@
 from __future__ import absolute_import
 
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.options import interactive_options
+
+
+class Options(interactive_options.InteractiveOptions):
+  """Options that guide how Interactive Beam works."""
+  @property
+  def enable_capture_replay(self):
+"""Whether replayable source data capture should be replayed for multiple
+PCollection evaluations and pipeline runs as long as the data captured is
+still valid."""
+return self.capture_control._enable_capture_replay
+
+  @enable_capture_replay.setter
+  def enable_capture_replay(self, value):
+"""Sets whether source data capture should be replayed. True - Enables
+capture of replayable source data so that following PCollection evaluations
+and pipeline runs always use the same data captured; False - Disables
+capture of replayable source data so that following PCollection evaluation
+and pipeline runs always use new data from sources."""
+self.capture_control._enable_capture_replay = value
+
+  @property
+  def capturable_sources(self):
+"""Interactive Beam automatically captures data from sources in this 
set."""
+return self.capture_control._capturable_sources
+
+  @property
+  def capture_duration(self):
+"""The data capture of sources ends as soon as the background caching job
+has run for this long."""
+return self.capture_control._capture_duration
+
+  @capture_duration.setter
+  def capture_duration(self, value):
+"""Sets the capture duration as a timedelta.
+
+Example::
+
+  # Sets the capture duration limit to 10 seconds.
+  interactive_beam.options.capture_duration = timedelta(seconds=10)
+  # Evicts all captured data if there is any.
+  interactive_beam.evict_captured_data()
+  # The next PCollection evaluation will capture fresh data from sources,
+  # and the data captured will be replayed until another eviction.
+"""
+self.capture_control._capture_duration = value
+
+  # TODO(BEAM-8335): add capture_size options when they are supported.
+
+
+# Users can set options to guide how Interactive Beam works.
+options = Options()
 
 Review comment:
   Will add
   ```
   Example:
   from datetime import timedelta
   from apache_beam.runners.interactive import interactive_beam as ib
   ib.options.enable_capture_replay = False/True
   ib.options.capture_duration = timedelta(seconds=60)
   ib.options.capturable_sources.add(SourceClass)
   ```
   to the comments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389666)
Time Spent: 64h 40m  (was: 64.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 64h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389665
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 22:36
Start Date: 19/Feb/20 22:36
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381586709
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -132,7 +256,22 @@ def is_source_to_cache_changed(user_pipeline):
   is_changed = not current_signature.issubset(recorded_signature)
   # The computation of extract_unbounded_source_signature is expensive, track 
on
   # change by default.
-  if is_changed:
+  if is_changed and update_cached_source_signature:
+if ie.current_env().options.enable_capture_replay:
+  if not recorded_signature:
+_LOGGER.info(
+'Interactive Beam has detected you have unbounded sources '
+'in your pipeline. In order to have a deterministic replay '
+'of your pipeline: {}'.format(
 
 Review comment:
   `ie.current_env().options.capture_control` is formatted the same to the 
other case.
   Do you think we should make its first letter lower case for this scenario?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389665)
Time Spent: 64.5h  (was: 64h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 64.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389664=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389664
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 22:32
Start Date: 19/Feb/20 22:32
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381585204
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -19,29 +19,113 @@
 
 For internal use only; no backwards-compatibility guarantees.
 
-A background caching job is a job that caches events for all unbounded sources
-of a given pipeline. With Interactive Beam, one such job is started when a
-pipeline run happens (which produces a main job in contrast to the background
+A background caching job is a job that captures events for all capturable
+sources of a given pipeline. With Interactive Beam, one such job is started 
when
+a pipeline run happens (which produces a main job in contrast to the background
 caching job) and meets the following conditions:
 
-  #. The pipeline contains unbounded sources.
+  #. The pipeline contains capturable sources, configured through
+ interactive_beam.options.capturable_sources.
   #. No such background job is running.
   #. No such background job has completed successfully and the cached events 
are
- still valid (invalidated when unbounded sources change in the pipeline).
+ still valid (invalidated when capturable sources change in the pipeline).
 
 Once started, the background caching job runs asynchronously until it hits some
-cache size limit. Meanwhile, the main job and future main jobs from the 
pipeline
-will run using the deterministic replay-able cached events until they are
-invalidated.
+capture limit configured in interactive_beam.options. Meanwhile, the main job
+and future main jobs from the pipeline will run using the deterministic
+replayable captured events until they are invalidated.
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
+import logging
+import threading
+import time
+
 import apache_beam as beam
-from apache_beam import runners
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.caching import streaming_cache
+from apache_beam.runners.runner import PipelineState
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
+
+
+class BackgroundCachingJob(object):
+  """A simple abstraction that controls necessary components of a timed and
+  [disk] space limited background caching job.
+
+  A background caching job successfully terminates in 2 conditions:
+
+#. The job is finite and runs into DONE state;
+#. The job is infinite but hits an interactive_beam.options configured 
limit
+   and gets cancelled into CANCELLED/CANCELLING state.
+
+  In both situations, the background caching job should be treated as done
+  successfully.
+  """
+  def __init__(self, pipeline_result, start_limit_checkers=True):
+self._pipeline_result = pipeline_result
+self._timer = threading.Timer(
 
 Review comment:
   Thanks! Yes, we should in case the notebook is shutdown abruptly.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389664)
Time Spent: 64h 20m  (was: 64h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 64h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=389662=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389662
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 19/Feb/20 22:27
Start Date: 19/Feb/20 22:27
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10894: 
[BEAM-8280] Enable and improve IOTypeHints debug_str traceback
URL: https://github.com/apache/beam/pull/10894#discussion_r381580646
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform.py
 ##
 @@ -849,8 +854,9 @@ def element_type(side_input):
 if not typehints.is_consistent_with(bindings.get(arg, typehints.Any),
 hint):
   raise TypeCheckError(
-  'Type hint violation for \'%s\': requires %s but got %s for %s' %
-  (self.label, hint, bindings[arg], arg))
+  'Type hint violation for \'%s\': requires %s but got %s for %s\n'
+  'Full type hint:\n%s' %
+  (self.label, hint, bindings[arg], arg, type_hints.debug_str()))
 
 Review comment:
   nit: with this many args, either using `.format` with named params or 
`%(name)d` with an arg dict would be more readable
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389662)
Time Spent: 5h 20m  (was: 5h 10m)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9326?focusedWorklogId=389661=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389661
 ]

ASF GitHub Bot logged work on BEAM-9326:


Author: ASF GitHub Bot
Created on: 19/Feb/20 22:25
Start Date: 19/Feb/20 22:25
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10879: [BEAM-9326] Make 
JsonToRow transform input  instead of 
URL: https://github.com/apache/beam/pull/10879#issuecomment-588504481
 
 
   Yes it is already reverted, for the `PipelineTest` class, for `JsonToRow` we 
are good to go. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389661)
Time Spent: 1h 10m  (was: 1h)

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java) so no 
> reason
> to define a bounded wildcard as its argument.
> We should use  in Beam's codebase only when required by Java
> Generics constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9326?focusedWorklogId=389658=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389658
 ]

ASF GitHub Bot logged work on BEAM-9326:


Author: ASF GitHub Bot
Created on: 19/Feb/20 22:14
Start Date: 19/Feb/20 22:14
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10879: [BEAM-9326] Make 
JsonToRow transform input  instead of 
URL: https://github.com/apache/beam/pull/10879#issuecomment-588499498
 
 
   Yikes! If removing `` makes it impossible to treat this 
class as a `PTransform` then I think this change should be reverted. Indeed I 
wondered if there was something funky going to happy in Java's type checking.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389658)
Time Spent: 1h  (was: 50m)

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java) so no 
> reason
> to define a bounded wildcard as its argument.
> We should use  in Beam's codebase only when required by Java
> Generics constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9337) DataflowPipelineJob.waitUntilFinish() crashes when it has created a template.

2020-02-19 Thread Kenneth Knowles (Jira)
Kenneth Knowles created BEAM-9337:
-

 Summary: DataflowPipelineJob.waitUntilFinish() crashes when it has 
created a template.
 Key: BEAM-9337
 URL: https://issues.apache.org/jira/browse/BEAM-9337
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Kenneth Knowles
Assignee: Yunqing Zhou


{code:java}
INFO: Template successfully created.

Exception in thread "main" java.lang.UnsupportedOperationException: The result 
of template creation should not be used.
at 
org.apache.beam.runners.dataflow.util.DataflowTemplateJob.getJobId(DataflowTemplateJob.java:37)
at 
org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:524)
at 
org.apache.beam.runners.dataflow.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:506)
at 
org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:295)
at 
org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:224)
at 
org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:183)
at 
org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:176)
{code}

This is a real error. If a template was created, the job is complete. Instead 
of crashing by tried to access the job id, as though {{DataflowPipelineJob}} 
doesn't know it made a template, it should instead return successfully. Or 
perhaps there is another design choice. But just crashes does not make sense. 
Probably {{DataflowRunner}} should not return a {{DataflowPipelineJob}} at all 
in this way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=389654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389654
 ]

ASF GitHub Bot logged work on BEAM-9228:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:55
Start Date: 19/Feb/20 21:55
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #10847: 
[BEAM-9228] Support further partition for FnApi ListBuffer
URL: https://github.com/apache/beam/pull/10847#discussion_r381568266
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -994,7 +1069,13 @@ def input_for(transform_id, input_id):
 # The worker will be waiting on these inputs as well.
 for other_input in data_input:
   if other_input not in deferred_inputs:
-deferred_inputs[other_input] = _ListBuffer([])
+outputs = process_bundle_descriptor.transforms[
+  other_input].outputs.values()
+coder_id = process_bundle_descriptor.pcollections[
+  only_element(outputs)].coder_id
+coder = context.coders[coder_id]
+deferred_inputs[other_input] = _ListBuffer(
+coder_impl=coder.get_impl())
 
 Review comment:
   As commented at L1082 (of the PR branch), deferred inputs cannot be parallel 
processed for now. Is it better to set coder_impl to None to reduce unnecessary 
processes for now and add it back later when parallel processing is supported 
for deferred_inputs?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389654)
Time Spent: 2h 20m  (was: 2h 10m)

> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0, 2.19.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> A user reported following issue.
> -
> I have a set of tfrecord files, obtained by converting parquet files with 
> Spark. Each file is roughly 1GB and I have 11 of those.
> I would expect simple statistics gathering (ie counting number of items of 
> all files) to scale linearly with respect to the number of cores on my system.
> I am able to reproduce the issue with the minimal snippet below
> {code:java}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.runners.portability import fn_api_runner
> from apache_beam.portability.api import beam_runner_api_pb2
> from apache_beam.portability import python_urns
> import sys
> pipeline_options = PipelineOptions(['--direct_num_workers', '4'])
> file_pattern = 'part-r-00*
> runner=fn_api_runner.FnApiRunner(
>   default_environment=beam_runner_api_pb2.Environment(
>   urn=python_urns.SUBPROCESS_SDK,
>   payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
> % sys.executable.encode('ascii')))
> p = beam.Pipeline(runner=runner, options=pipeline_options)
> lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern)
>  | beam.combiners.Count.Globally()
>  | beam.io.WriteToText('/tmp/output'))
> p.run()
> {code}
> Only one combination of apache_beam revision / worker type seems to work (I 
> refer to https://beam.apache.org/documentation/runners/direct/ for the worker 
> types)
> * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on 
> multiple cores
> * beam 2.17: able to achieve high cpu usage on all 4 cores
> * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails 
> when trying to serialize the Environment instance most likely because of a 
> change from 2.17 to 2.18.
> I also tried briefly SparkRunner with version 2.16 but was no able to achieve 
> any throughput.
> What is the recommnended way to achieve what I am trying to ? How can I 
> troubleshoot ?
> --
> This is caused by [this 
> PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60].
> A 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389653
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:53
Start Date: 19/Feb/20 21:53
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381567204
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -19,29 +19,113 @@
 
 For internal use only; no backwards-compatibility guarantees.
 
-A background caching job is a job that caches events for all unbounded sources
-of a given pipeline. With Interactive Beam, one such job is started when a
-pipeline run happens (which produces a main job in contrast to the background
+A background caching job is a job that captures events for all capturable
+sources of a given pipeline. With Interactive Beam, one such job is started 
when
+a pipeline run happens (which produces a main job in contrast to the background
 caching job) and meets the following conditions:
 
-  #. The pipeline contains unbounded sources.
+  #. The pipeline contains capturable sources, configured through
+ interactive_beam.options.capturable_sources.
   #. No such background job is running.
   #. No such background job has completed successfully and the cached events 
are
- still valid (invalidated when unbounded sources change in the pipeline).
+ still valid (invalidated when capturable sources change in the pipeline).
 
 Once started, the background caching job runs asynchronously until it hits some
-cache size limit. Meanwhile, the main job and future main jobs from the 
pipeline
-will run using the deterministic replay-able cached events until they are
-invalidated.
+capture limit configured in interactive_beam.options. Meanwhile, the main job
+and future main jobs from the pipeline will run using the deterministic
+replayable captured events until they are invalidated.
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
+import logging
+import threading
+import time
+
 import apache_beam as beam
-from apache_beam import runners
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.caching import streaming_cache
+from apache_beam.runners.runner import PipelineState
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
+
+
+class BackgroundCachingJob(object):
+  """A simple abstraction that controls necessary components of a timed and
+  [disk] space limited background caching job.
+
+  A background caching job successfully terminates in 2 conditions:
+
+#. The job is finite and runs into DONE state;
 
 Review comment:
   Those are considered as **not** `successfully` terminates.
   Let me reword it into `A background caching job successfully complete source 
data capture in 2 conditions`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389653)
Time Spent: 64h 10m  (was: 64h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 64h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=389652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389652
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:49
Start Date: 19/Feb/20 21:49
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #10375: [BEAM-8537] Provide 
WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#issuecomment-588489213
 
 
   Kindly pinging : )
   If everything looks fine, I'll squash them into one commit before merging.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389652)
Time Spent: 16h 50m  (was: 16h 40m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389651=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389651
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:48
Start Date: 19/Feb/20 21:48
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381564546
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -19,29 +19,113 @@
 
 For internal use only; no backwards-compatibility guarantees.
 
-A background caching job is a job that caches events for all unbounded sources
-of a given pipeline. With Interactive Beam, one such job is started when a
-pipeline run happens (which produces a main job in contrast to the background
+A background caching job is a job that captures events for all capturable
+sources of a given pipeline. With Interactive Beam, one such job is started 
when
+a pipeline run happens (which produces a main job in contrast to the background
 caching job) and meets the following conditions:
 
-  #. The pipeline contains unbounded sources.
+  #. The pipeline contains capturable sources, configured through
+ interactive_beam.options.capturable_sources.
   #. No such background job is running.
   #. No such background job has completed successfully and the cached events 
are
- still valid (invalidated when unbounded sources change in the pipeline).
+ still valid (invalidated when capturable sources change in the pipeline).
 
 Once started, the background caching job runs asynchronously until it hits some
-cache size limit. Meanwhile, the main job and future main jobs from the 
pipeline
-will run using the deterministic replay-able cached events until they are
-invalidated.
+capture limit configured in interactive_beam.options. Meanwhile, the main job
+and future main jobs from the pipeline will run using the deterministic
+replayable captured events until they are invalidated.
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
+import logging
+import threading
+import time
+
 import apache_beam as beam
-from apache_beam import runners
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.caching import streaming_cache
+from apache_beam.runners.runner import PipelineState
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
+
+
+class BackgroundCachingJob(object):
+  """A simple abstraction that controls necessary components of a timed and
+  [disk] space limited background caching job.
 
 Review comment:
   This is to indicate the captured data could be on-disk, or in other mediums 
such as in-memory (some testing cache manager implementation).
   Let me just remove the `[disk]` to avoid the confusion.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389651)
Time Spent: 64h  (was: 63h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 64h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389650
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:45
Start Date: 19/Feb/20 21:45
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381563024
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -19,29 +19,113 @@
 
 For internal use only; no backwards-compatibility guarantees.
 
-A background caching job is a job that caches events for all unbounded sources
-of a given pipeline. With Interactive Beam, one such job is started when a
-pipeline run happens (which produces a main job in contrast to the background
+A background caching job is a job that captures events for all capturable
+sources of a given pipeline. With Interactive Beam, one such job is started 
when
+a pipeline run happens (which produces a main job in contrast to the background
 caching job) and meets the following conditions:
 
-  #. The pipeline contains unbounded sources.
+  #. The pipeline contains capturable sources, configured through
+ interactive_beam.options.capturable_sources.
   #. No such background job is running.
   #. No such background job has completed successfully and the cached events 
are
- still valid (invalidated when unbounded sources change in the pipeline).
+ still valid (invalidated when capturable sources change in the pipeline).
 
 Once started, the background caching job runs asynchronously until it hits some
-cache size limit. Meanwhile, the main job and future main jobs from the 
pipeline
-will run using the deterministic replay-able cached events until they are
-invalidated.
+capture limit configured in interactive_beam.options. Meanwhile, the main job
+and future main jobs from the pipeline will run using the deterministic
+replayable captured events until they are invalidated.
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
+import logging
+import threading
+import time
+
 import apache_beam as beam
-from apache_beam import runners
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.caching import streaming_cache
+from apache_beam.runners.runner import PipelineState
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
 
 Review comment:
   We plan to log at INFO level for this module.
   Since each module in Beam runners has its own logger (since 
[PR](https://github.com/apache/beam/commit/49d6efdd2a59652462228e3e2b353bcc4173554b)),
 this overriding is intended and has limited effect.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389650)
Time Spent: 63h 50m  (was: 63h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 63h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389648
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:42
Start Date: 19/Feb/20 21:42
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381561361
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/options/capture_control.py
 ##
 @@ -15,6 +15,16 @@
 # limitations under the License.
 #
 
+"""Module to control how Interactive Beam captures data from sources for
+deterministic replayable PCollection evaluation and pipeline runs.
+
+For internal use only; no backwards-compatibility guarantees.
+"""
+
+# pytype: skip-file
 
 Review comment:
   According to Boyuan and the 
[PR](https://github.com/apache/beam/commit/7547ac6b273e6e2ffe7d69775606e14c0fd455b2),
 type checking is still in development, not stable and may cause surprised 
failures.
   So it's added to all py files.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389648)
Time Spent: 63h 40m  (was: 63.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 63h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389641=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389641
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:38
Start Date: 19/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381549837
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -171,13 +180,14 @@ class TestStream(PTransform):
   time. After all of the specified elements are emitted, ceases to produce
   output.
   """
-  def __init__(self, coder=coders.FastPrimitivesCoder(), events=None):
+  def __init__(
+  self, coder=coders.FastPrimitivesCoder(), events=None, output_tags=None):
 super(TestStream, self).__init__()
 assert coder is not None
 self.coder = coder
 self.watermarks = {None: timestamp.MIN_TIMESTAMP}
 self._events = [] if events is None else list(events)
 
 Review comment:
   Done. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389641)
Time Spent: 63h 20m  (was: 63h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 63h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389642=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389642
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:38
Start Date: 19/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381552740
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -276,8 +295,11 @@ def to_runner_api_parameter(self, context):
   @PTransform.register_urn(
   common_urns.primitives.TEST_STREAM.urn,
   beam_runner_api_pb2.TestStreamPayload)
-  def from_runner_api_parameter(payload, context):
+  def from_runner_api_parameter(ptransform, payload, context):
 coder = context.coders.get_by_id(payload.coder_id)
+output_tags = set(
+None if k == 'None' else k for k in ptransform.outputs.keys())
 
 Review comment:
   This behavior isn't unique in the TestStream. This is consistent with the 
rest of the Python SDK with handling PCollection tags being None. As for what 
happens, I don't know, there are probably many subtle things that may go wrong.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389642)
Time Spent: 63.5h  (was: 63h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 63.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389637
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:38
Start Date: 19/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381538464
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/direct_runner.py
 ##
 @@ -73,60 +73,12 @@ class SwitchingDirectRunner(PipelineRunner):
   def is_fnapi_compatible(self):
 return BundleBasedDirectRunner.is_fnapi_compatible()
 
-  def apply_TestStream(self, transform, pbegin, options):
 
 Review comment:
   thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389637)
Time Spent: 62h 50m  (was: 62h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 62h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389644
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:38
Start Date: 19/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381551312
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -188,7 +198,16 @@ def _infer_output_coder(self, input_type=None, 
input_coder=None):
   def expand(self, pbegin):
 assert isinstance(pbegin, pvalue.PBegin)
 self.pipeline = pbegin.pipeline
-return pvalue.PCollection(self.pipeline, is_bounded=False)
+if len(self.output_tags) == 0:
 
 Review comment:
   Gotcha, changed to ```if not self.output_tags```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389644)
Time Spent: 63.5h  (was: 63h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 63.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389640
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:38
Start Date: 19/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381543027
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -133,15 +140,17 @@ def __eq__(self, other):
 return self.new_watermark == other.new_watermark and self.tag == other.tag
 
   def __hash__(self):
-return hash(self.new_watermark)
+return hash(str(self.new_watermark) + str(self.tag))
 
   def __lt__(self, other):
 return self.new_watermark < other.new_watermark
 
   def to_runner_api(self, unused_element_coder):
+tag = 'None' if self.tag is None else self.tag
 
 Review comment:
   Looking through the codebase, it seems that 'None' is the special keyword 
used in the Python SDK to represent a tag that is specifically None. This keeps 
with the current style of the rest of the Python SDK. @lukecwik is this true?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389640)
Time Spent: 63h 10m  (was: 63h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 63h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389638=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389638
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:38
Start Date: 19/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381538392
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/test_stream_impl.py
 ##
 @@ -45,11 +46,69 @@ class _WatermarkController(PTransform):
- If the instance receives an ElementEvent, it emits all specified elements
  to the Global Window with the event time set to the element's timestamp.
   """
+  def __init__(self, output_tag):
+self.output_tag = output_tag
+
   def get_windowing(self, _):
 return core.Windowing(window.GlobalWindows())
 
   def expand(self, pcoll):
-return pvalue.PCollection.from_(pcoll)
+ret = pvalue.PCollection.from_(pcoll)
+ret.tag = self.output_tag
+return ret
+
+
+class _ExpandableTestStream(PTransform):
+  def __init__(self, test_stream):
+self.test_stream = test_stream
+
+  def expand(self, pbegin):
+"""Expands the TestStream into the DirectRunner implementation.
+
+
+Takes the TestStream transform and creates a _TestStream -> multiplexer ->
+_WatermarkController.
+"""
+
+assert isinstance(pbegin, pvalue.PBegin)
+
+# If there is only one tag there is no need to add the multiplexer.
+if len(self.test_stream.output_tags) == 1:
+  return (
+  pbegin
+  | _TestStream(
+  self.test_stream.output_tags,
+  events=self.test_stream._events,
+  coder=self.test_stream.coder)
+  | _WatermarkController(list(self.test_stream.output_tags)[0]))
+
+# This multiplexing the  multiple output PCollections.
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389638)
Time Spent: 62h 50m  (was: 62h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 62h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389645=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389645
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:38
Start Date: 19/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381553044
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream_test.py
 ##
 @@ -528,6 +529,56 @@ def process(
 
 p.run()
 
+  def test_roundtrip_proto(self):
+test_stream = (TestStream()
+   .advance_processing_time(1)
+   .advance_watermark_to(2)
+   .add_elements([1, 2, 3])) # yapf: disable
+
+p = TestPipeline(options=StandardOptions(streaming=True))
+p | test_stream
+
+pipeline_proto, context = p.to_runner_api(return_context=True)
+
+for t in pipeline_proto.components.transforms.values():
+  if t.spec.urn == common_urns.primitives.TEST_STREAM.urn:
+test_stream_proto = t
+
+self.assertTrue(test_stream_proto)
+roundtrip_test_stream = TestStream().from_runner_api(
+test_stream_proto, context)
+
+self.assertListEqual(test_stream._events, roundtrip_test_stream._events)
+self.assertSetEqual(
+test_stream.output_tags, roundtrip_test_stream.output_tags)
+self.assertEqual(test_stream.coder, roundtrip_test_stream.coder)
+
+  def test_roundtrip_proto_multi(self):
+test_stream = (TestStream(output_tags=['a', 'b'])
 
 Review comment:
   Gotcha, removed the output_tags
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389645)
Time Spent: 63.5h  (was: 63h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 63.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389639
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:38
Start Date: 19/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381545149
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -171,13 +180,14 @@ class TestStream(PTransform):
   time. After all of the specified elements are emitted, ceases to produce
   output.
   """
 
 Review comment:
   > Are you trying to allow for outputs that have no events, otherwise 
shouldn't the tags come from the list of events?
   
   Yep! The TestStreamService will allow users to define a TestStream with the 
output_tags specified at creation time and the events supplied at runtime.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389639)
Time Spent: 63h  (was: 62h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 63h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389643
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:38
Start Date: 19/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381557950
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -133,15 +140,17 @@ def __eq__(self, other):
 return self.new_watermark == other.new_watermark and self.tag == other.tag
 
   def __hash__(self):
-return hash(self.new_watermark)
+return hash(str(self.new_watermark) + str(self.tag))
 
   def __lt__(self, other):
 return self.new_watermark < other.new_watermark
 
   def to_runner_api(self, unused_element_coder):
+tag = 'None' if self.tag is None else self.tag
 return beam_runner_api_pb2.TestStreamPayload.Event(
 watermark_event=beam_runner_api_pb2.TestStreamPayload.Event.
-AdvanceWatermark(new_watermark=self.new_watermark.micros // 1000))
+AdvanceWatermark(
+new_watermark=self.new_watermark.micros // 1000, tag=tag))
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389643)
Time Spent: 63.5h  (was: 63h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 63.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389646
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:38
Start Date: 19/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381551914
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -188,7 +198,16 @@ def _infer_output_coder(self, input_type=None, 
input_coder=None):
   def expand(self, pbegin):
 assert isinstance(pbegin, pvalue.PBegin)
 self.pipeline = pbegin.pipeline
-return pvalue.PCollection(self.pipeline, is_bounded=False)
+if len(self.output_tags) == 0:
+  self.output_tags = set([None])
+
+if len(self.output_tags) == 1:
 
 Review comment:
   For backwards compatibility, existing code already relies on a single 
PCollection being returned.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389646)
Time Spent: 63.5h  (was: 63h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 63.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389634=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389634
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:31
Start Date: 19/Feb/20 21:31
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389634)
Time Spent: 54h 50m  (was: 54h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 54h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9287) Python Validates runner tests for Unified Worker

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9287?focusedWorklogId=389633=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389633
 ]

ASF GitHub Bot logged work on BEAM-9287:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:15
Start Date: 19/Feb/20 21:15
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #10863: [BEAM-9287] Add 
Python streaming Validates runner tests for Unified Worker
URL: https://github.com/apache/beam/pull/10863#issuecomment-588474305
 
 
   Run Python Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389633)
Time Spent: 1h  (was: 50m)

> Python Validates runner tests for Unified Worker
> 
>
> Key: BEAM-9287
> URL: https://issues.apache.org/jira/browse/BEAM-9287
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389632=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389632
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:14
Start Date: 19/Feb/20 21:14
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #10835: [BEAM-8575] Removed 
MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-588473555
 
 
   Run PythonLint PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389632)
Time Spent: 54h 40m  (was: 54.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 54h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389626
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:12
Start Date: 19/Feb/20 21:12
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381546032
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -130,6 +139,17 @@ def __init__(self, cache_manager=None):
   'You have limited Interactive Beam features since your '
   'ipython kernel is not connected any notebook frontend.')
 
+  @property
+  def options(self):
+"""A reference to the global interactive options.
+
+Provided to avoid import loop or excessive dynamic import. All internal
+Interactive Beam modules should access interactive_beam.options through
+this property.
+"""
+from apache_beam.runners.interactive.interactive_beam import options
+return options
 
 Review comment:
   Why is the options defined globally in a different file, but setter is here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389626)
Time Spent: 62h 20m  (was: 62h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 62h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389625
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:12
Start Date: 19/Feb/20 21:12
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381537314
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -19,29 +19,113 @@
 
 For internal use only; no backwards-compatibility guarantees.
 
-A background caching job is a job that caches events for all unbounded sources
-of a given pipeline. With Interactive Beam, one such job is started when a
-pipeline run happens (which produces a main job in contrast to the background
+A background caching job is a job that captures events for all capturable
+sources of a given pipeline. With Interactive Beam, one such job is started 
when
+a pipeline run happens (which produces a main job in contrast to the background
 caching job) and meets the following conditions:
 
-  #. The pipeline contains unbounded sources.
+  #. The pipeline contains capturable sources, configured through
+ interactive_beam.options.capturable_sources.
   #. No such background job is running.
   #. No such background job has completed successfully and the cached events 
are
- still valid (invalidated when unbounded sources change in the pipeline).
+ still valid (invalidated when capturable sources change in the pipeline).
 
 Once started, the background caching job runs asynchronously until it hits some
-cache size limit. Meanwhile, the main job and future main jobs from the 
pipeline
-will run using the deterministic replay-able cached events until they are
-invalidated.
+capture limit configured in interactive_beam.options. Meanwhile, the main job
+and future main jobs from the pipeline will run using the deterministic
+replayable captured events until they are invalidated.
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
+import logging
+import threading
+import time
+
 import apache_beam as beam
-from apache_beam import runners
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.caching import streaming_cache
+from apache_beam.runners.runner import PipelineState
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
 
 Review comment:
   Is this needed? This will override otherthings.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389625)
Time Spent: 62h 20m  (was: 62h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 62h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   >