[jira] [Updated] (BEAM-8420) Beam Model sources (classifier) jar don't contain sources

2019-10-22 Thread Romain Manni-Bucau (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain Manni-Bucau updated BEAM-8420:
-
Description: These packages are generated from proto files. The jars only 
contain META-INF/MANIFEST.MF. They should either not be distributed or should 
contain the generated java source files and optionally the proto files.  (was: 
These packages are generated from proto files. The jars only contain 
META-INF/MANIFEST.MF. They should either not be distributed or should contain 
the source proto files.)

> Beam Model sources (classifier) jar don't contain sources
> -
>
> Key: BEAM-8420
> URL: https://issues.apache.org/jira/browse/BEAM-8420
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.16.0
>Reporter: Romain Manni-Bucau
>Priority: Major
>
> These packages are generated from proto files. The jars only contain 
> META-INF/MANIFEST.MF. They should either not be distributed or should contain 
> the generated java source files and optionally the proto files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8374) PublishResult returned by SnsIO is missing sdkResponseMetadata and sdkHttpMetadata

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8374?focusedWorklogId=332400=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332400
 ]

ASF GitHub Bot logged work on BEAM-8374:


Author: ASF GitHub Bot
Created on: 23/Oct/19 04:24
Start Date: 23/Oct/19 04:24
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on pull request #9758: 
[BEAM-8374] Fixes bug in SnsIO PublishResultCoder
URL: https://github.com/apache/beam/pull/9758#discussion_r337844279
 
 

 ##
 File path: 
sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sns/PublishResultBuilder.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.aws.sns;
+
+import com.amazonaws.ResponseMetadata;
+import com.amazonaws.http.HttpResponse;
+import com.amazonaws.http.SdkHttpMetadata;
+import com.amazonaws.services.sns.model.PublishResult;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+public class PublishResultBuilder {
 
 Review comment:
   Shouldn't we build this class using auto service (@AutoValue), so it can be 
consistent with Beam model? @iemejia and @lukecwik what do you think?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332400)
Time Spent: 2.5h  (was: 2h 20m)

> PublishResult returned by SnsIO is missing sdkResponseMetadata and 
> sdkHttpMetadata
> --
>
> Key: BEAM-8374
> URL: https://issues.apache.org/jira/browse/BEAM-8374
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently the PublishResultCoder in SnsIO only serializes the messageId field 
> so the PublishResult returned by Beam returns null for 
> getSdkResponseMetadata() and getSdkHttpMetadata(). This makes it impossible 
> to check the HTTP status for errors, which is necessary since this is not 
> handled in SnsIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8374) PublishResult returned by SnsIO is missing sdkResponseMetadata and sdkHttpMetadata

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8374?focusedWorklogId=332399=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332399
 ]

ASF GitHub Bot logged work on BEAM-8374:


Author: ASF GitHub Bot
Created on: 23/Oct/19 04:23
Start Date: 23/Oct/19 04:23
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on pull request #9758: 
[BEAM-8374] Fixes bug in SnsIO PublishResultCoder
URL: https://github.com/apache/beam/pull/9758#discussion_r337844279
 
 

 ##
 File path: 
sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sns/PublishResultBuilder.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.aws.sns;
+
+import com.amazonaws.ResponseMetadata;
+import com.amazonaws.http.HttpResponse;
+import com.amazonaws.http.SdkHttpMetadata;
+import com.amazonaws.services.sns.model.PublishResult;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+public class PublishResultBuilder {
 
 Review comment:
   Shouldn't build this class using auto service (@AutoValue), so it can be 
consistent with Beam model? @iemejia and @lukecwik what do you think?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332399)
Time Spent: 2h 20m  (was: 2h 10m)

> PublishResult returned by SnsIO is missing sdkResponseMetadata and 
> sdkHttpMetadata
> --
>
> Key: BEAM-8374
> URL: https://issues.apache.org/jira/browse/BEAM-8374
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently the PublishResultCoder in SnsIO only serializes the messageId field 
> so the PublishResult returned by Beam returns null for 
> getSdkResponseMetadata() and getSdkHttpMetadata(). This makes it impossible 
> to check the HTTP status for errors, which is necessary since this is not 
> handled in SnsIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8374) PublishResult returned by SnsIO is missing sdkResponseMetadata and sdkHttpMetadata

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8374?focusedWorklogId=332372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332372
 ]

ASF GitHub Bot logged work on BEAM-8374:


Author: ASF GitHub Bot
Created on: 23/Oct/19 01:52
Start Date: 23/Oct/19 01:52
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9758: [BEAM-8374] Fixes bug 
in SnsIO PublishResultCoder
URL: https://github.com/apache/beam/pull/9758#issuecomment-545227828
 
 
   @iemejia @lukecwik How does this look? If you're happy with it I'll port it 
over to `amazon-web-services2`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332372)
Time Spent: 2h 10m  (was: 2h)

> PublishResult returned by SnsIO is missing sdkResponseMetadata and 
> sdkHttpMetadata
> --
>
> Key: BEAM-8374
> URL: https://issues.apache.org/jira/browse/BEAM-8374
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently the PublishResultCoder in SnsIO only serializes the messageId field 
> so the PublishResult returned by Beam returns null for 
> getSdkResponseMetadata() and getSdkHttpMetadata(). This makes it impossible 
> to check the HTTP status for errors, which is necessary since this is not 
> handled in SnsIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-8457:

Priority: Blocker  (was: Major)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-8457:

Fix Version/s: 2.17.0

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332355
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 23/Oct/19 01:15
Start Date: 23/Oct/19 01:15
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337813433
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +405,46 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
+  def run(self, test_runner_api=True, runner=None, options=None):
+"""Runs the pipeline. Returns whatever our runner returns after running.
 
+If another runner instance and options are provided, that runner will
+execute the pipeline with the given options. If either of them is not set,
+the default runner will run the pipeline with the original options
+assigned to the pipeline. The usage is similar to directly invoking
+`runner.run_pipeline(pipeline, options)`.
+"""
+runner_in_use = self.runner
+options_in_use = self._options
+if runner and options:
+  runner_in_use = runner
+  options_in_use = options
+elif not runner and options:
+  raise ValueError('Parameter runner is not given when parameter options '
+   'is given.')
+elif not options and runner:
+  raise ValueError('Parameter options is not given when parameter runner '
+   'is given.')
 # When possible, invoke a round trip through the runner API.
 if test_runner_api and self._verify_runner_api_compatible():
   return Pipeline.from_runner_api(
   self.to_runner_api(use_fake_coders=True),
-  self.runner,
-  self._options).run(False)
+  runner_in_use,
+  options_in_use,
+  interactive=self.interactive).run(False)
 
 Review comment:
   Did you find that this was necessary? I don't think we should change the 
signature of the `from_runner_api` call. The pipeline protobuf should contain 
all the necessary information... Though I'd defer to @robertwb on this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332355)
Time Spent: 1h 10m  (was: 1h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=332346=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332346
 ]

ASF GitHub Bot logged work on BEAM-876:
---

Author: ASF GitHub Bot
Created on: 23/Oct/19 00:39
Start Date: 23/Oct/19 00:39
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9524: [BEAM-876] Support 
schemaUpdateOption in BigQueryIO
URL: https://github.com/apache/beam/pull/9524#issuecomment-545213360
 
 
   thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332346)
Time Spent: 2h  (was: 1h 50m)

> Support schemaUpdateOption in BigQueryIO
> 
>
> Key: BEAM-876
> URL: https://issues.apache.org/jira/browse/BEAM-876
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Eugene Kirpichov
>Assignee: canaan silberberg
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> BigQuery recently added support for updating the schema as a side effect of 
> the load job.
> Here is the relevant API method in JobConfigurationLoad: 
> https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List)
> BigQueryIO should support this too. See user request for this: 
> http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332345
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 23/Oct/19 00:38
Start Date: 23/Oct/19 00:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-545213181
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332345)
Time Spent: 1h 50m  (was: 1h 40m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=332342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332342
 ]

ASF GitHub Bot logged work on BEAM-876:
---

Author: ASF GitHub Bot
Created on: 23/Oct/19 00:34
Start Date: 23/Oct/19 00:34
Worklog Time Spent: 10m 
  Work Description: ziel commented on issue #9524: [BEAM-876] Support 
schemaUpdateOption in BigQueryIO
URL: https://github.com/apache/beam/pull/9524#issuecomment-545212481
 
 
   I haven't had a chance to write the integration test yet... but was hoping 
to take a shot at it in the next week or so.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332342)
Time Spent: 1h 50m  (was: 1h 40m)

> Support schemaUpdateOption in BigQueryIO
> 
>
> Key: BEAM-876
> URL: https://issues.apache.org/jira/browse/BEAM-876
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Eugene Kirpichov
>Assignee: canaan silberberg
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> BigQuery recently added support for updating the schema as a side effect of 
> the load job.
> Here is the relevant API method in JobConfigurationLoad: 
> https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List)
> BigQueryIO should support this too. See user request for this: 
> http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=332340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332340
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 23/Oct/19 00:33
Start Date: 23/Oct/19 00:33
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-545212207
 
 
   > LMK if you'd like me to take another look.
   
   oh, yeah please do.  I don't have much more from my end other than renaming 
the class above.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332340)
Time Spent: 1.5h  (was: 1h 20m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON. Which is a costly solution compared to an Avro alternative 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=332337=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332337
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 23/Oct/19 00:31
Start Date: 23/Oct/19 00:31
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9665: [BEAM-2879] Support 
writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-545211844
 
 
   LMK if you'd like me to take another look.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332337)
Time Spent: 1h 20m  (was: 1h 10m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON. Which is a costly solution compared to an Avro alternative 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=332336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332336
 ]

ASF GitHub Bot logged work on BEAM-876:
---

Author: ASF GitHub Bot
Created on: 23/Oct/19 00:29
Start Date: 23/Oct/19 00:29
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9524: [BEAM-876] Support 
schemaUpdateOption in BigQueryIO
URL: https://github.com/apache/beam/pull/9524#issuecomment-545211563
 
 
   Should I review once more?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332336)
Time Spent: 1h 40m  (was: 1.5h)

> Support schemaUpdateOption in BigQueryIO
> 
>
> Key: BEAM-876
> URL: https://issues.apache.org/jira/browse/BEAM-876
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Eugene Kirpichov
>Assignee: canaan silberberg
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> BigQuery recently added support for updating the schema as a side effect of 
> the load job.
> Here is the relevant API method in JobConfigurationLoad: 
> https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List)
> BigQueryIO should support this too. See user request for this: 
> http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=332335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332335
 ]

ASF GitHub Bot logged work on BEAM-7765:


Author: ASF GitHub Bot
Created on: 23/Oct/19 00:29
Start Date: 23/Oct/19 00:29
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9685: [BEAM-7765] - Add 
test for snippet accessing_valueprovider_info_after_run
URL: https://github.com/apache/beam/pull/9685#issuecomment-545211504
 
 
   Thanks for adding the snippets. There's only one more issue, with lints:
   ```
   01:02:12 > Task :sdks:python:test-suites:tox:py2:lintPy27
   01:02:12 * Module apache_beam.examples.snippets.snippets_test
   01:02:12 C:1281, 0: Line too long (97/80) (line-too-long)
   01:02:12 C:1306, 0: Wrong hanging indentation (add 7 spaces).
   01:02:12 LogValueProvidersFn(my_options.string_value)))
   01:02:12 ^  | (bad-continuation)
   01:02:12 C:1315, 0: Line too long (87/80) (line-too-long)
   01:02:12 C:1317, 0: Line too long (82/80) (line-too-long)
   01:02:12 C:1318, 0: Trailing whitespace (trailing-whitespace)
   01:02:12 
   01:02:12 
   01:02:12 Your code has been rated at 10.00/10 (previous run: 10.00/10, -0.00)
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332335)
Time Spent: 2h 10m  (was: 2h)

> Add test for snippet accessing_valueprovider_info_after_run
> ---
>
> Key: BEAM-7765
> URL: https://issues.apache.org/jira/browse/BEAM-7765
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: John Patoch
>Priority: Major
>  Labels: easy
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This snippet needs a unit test.
> It has bugs. For example:
> - apache_beam.utils.value_provider doesn't exist
> - beam.combiners.Sum doesn't exist
> - unused import of: WriteToText
> cc: [~pabloem]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332333=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332333
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 23/Oct/19 00:26
Start Date: 23/Oct/19 00:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-545211008
 
 
   The build passed, but then timed out: 
https://builds.apache.org/job/beam_PostCommit_Python37_PR/44/console
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332333)
Time Spent: 1h 40m  (was: 1.5h)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332328
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 23/Oct/19 00:16
Start Date: 23/Oct/19 00:16
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-545208657
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332328)
Time Spent: 1.5h  (was: 1h 20m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=332302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332302
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 22/Oct/19 23:59
Start Date: 22/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9268: [BEAM-7738] Add 
external transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-545204985
 
 
   D'oh my bad, we _do_ have control over `PubsubMessage` :man_facepalming: I 
assumed it was part of the pubsub client library. Yeah I vote we use 
`DefaultSchema` with either `JavaBeanSchema` or `JavaFieldSchema`, whichever 
works with fewer changes.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332302)
Time Spent: 7h 40m  (was: 7.5h)

> Support PubSubIO to be configured externally for use with other SDKs
> 
>
> Key: BEAM-7738
> URL: https://issues.apache.org/jira/browse/BEAM-7738
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp, runner-flink, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Now that KafkaIO is supported via the external transform API (BEAM-7029) we 
> should add support for PubSub.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-22 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev closed BEAM-8418.
-
Fix Version/s: 2.17.0
 Assignee: Valentyn Tymofieiev  (was: Robert Bradshaw)
   Resolution: Fixed

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Following pipeline fails on Dataflow runner unless we use beam_fn_api 
> experiment.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
> return element
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode Impluse payload using url-escaping in [1], while 
> Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF 
> runner expects URL escaping.
> We should fix or reconcile the encoding in non-FnAPI path, and add a 
> ValidatesRunner test that catches this error.   
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-22 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957438#comment-16957438
 ] 

Valentyn Tymofieiev commented on BEAM-8418:
---

This issue was affecting non-Fn codepath only in Dataflow runner, and this 
codepath was fixed in PR #9822, I'll close this for now. 
Side note - if we are not running ValidatesRunner tests for Dataflow under 
FnAPI, we should add them.

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Following pipeline fails on Dataflow runner unless we use beam_fn_api 
> experiment.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
> return element
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode Impluse payload using url-escaping in [1], while 
> Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF 
> runner expects URL escaping.
> We should fix or reconcile the encoding in non-FnAPI path, and add a 
> ValidatesRunner test that catches this error.   
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8196) Python 3.{5,7} post commit timed out at 100 minutes

2019-10-22 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-8196:

Summary: Python 3.{5,7} post commit timed out at 100 minutes  (was: Python 
3.5 post commit timed out at 100 minutes)

> Python 3.{5,7} post commit timed out at 100 minutes
> ---
>
> Key: BEAM-8196
> URL: https://issues.apache.org/jira/browse/BEAM-8196
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_Python35/435/
> This post commit took 100 minutes and timedout. Should we increase the 
> timeout? We can also look into why this postcommit was slow. A later post 
> commit (https://builds.apache.org/job/beam_PostCommit_Python35/437/) 
> completed in 66 minutes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8196) Python 3.5 post commit timed out at 100 minutes

2019-10-22 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957432#comment-16957432
 ] 

Udi Meiri commented on BEAM-8196:
-

This is still happening, very frequently now for 3.7 postcommits.
I investigated 6 semmingly long-running jobs on the apache-beam-testing 
project, they all were running "Apache Beam Python 3.7 SDK 2.17.0.dev" and all 
were showing the
"ModuleNotFoundError: No module named 'endpoints_pb2'".
One of these is still running after 16 hours. The rest failed after 1-2 hours. 
Perhaps the 16 hour one did not abort because it is a streaming job?


> Python 3.5 post commit timed out at 100 minutes
> ---
>
> Key: BEAM-8196
> URL: https://issues.apache.org/jira/browse/BEAM-8196
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_Python35/435/
> This post commit took 100 minutes and timedout. Should we increase the 
> timeout? We can also look into why this postcommit was slow. A later post 
> commit (https://builds.apache.org/job/beam_PostCommit_Python35/437/) 
> completed in 66 minutes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-3922) beam_PostCommit_Python_Verify is broken

2019-10-22 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957426#comment-16957426
 ] 

Udi Meiri commented on BEAM-3922:
-

Should this issue be closed?

> beam_PostCommit_Python_Verify is broken
> ---
>
> Key: BEAM-3922
> URL: https://issues.apache.org/jira/browse/BEAM-3922
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Priority: Major
>
> Jenkins 
>  
> [beam_PostCommit_Python_Verify|https://builds.apache.org/job/beam_PostCommit_Python_Verify/]
>  is broken since Mar 21. 
> From the [console 
> log|https://builds.apache.org/job/beam_PostCommit_Python_Verify/4490/consoleFull]:
> {code}
> ==
> ERROR: test_wordcount_it (apache_beam.examples.wordcount_it_test.WordCountIT)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 812, in run
> test(orig)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 45, in __call__
> return self.run(*arg, **kwarg)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 133, in run
> self.runTest(result)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 151, in runTest
> test(result)
>   File "/usr/lib/python2.7/unittest/case.py", line 395, in __call__
> return self.run(*args, **kwds)
>   File "/usr/lib/python2.7/unittest/case.py", line 331, in run
> testMethod()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 66, in test_wordcount_it
> wordcount.run(test_pipeline.get_full_options_as_args(**extra_opts))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount.py",
>  line 115, in run
> result = p.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 389, in run
> self.to_runner_api(), self.runner, self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 402, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 57, in run_pipeline
> self.result.wait_until_finish()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1071, in wait_until_finish
> time.sleep(5.0)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 276, in signalhandler
> raise TimedOutException()
> TimedOutException: 'test_wordcount_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)'
> {code}
> Looks like wordcount pipeline didn't finish after 900s (set from command 
> --process-timeout=900) and test failed in timeout. Generally this test should 
> finish in 10min, so probably something wrong in the pipeline.
> One failure pipeline link (found in console log):
> https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-23_08_28_06-8460792149394878073?project=apache-beam-testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=332291=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332291
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Oct/19 23:34
Start Date: 22/Oct/19 23:34
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9056: [BEAM-7746] Add 
python type hints
URL: https://github.com/apache/beam/pull/9056#issuecomment-545199834
 
 
   Run PythonLint PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332291)
Time Spent: 9h  (was: 8h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332279=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332279
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 23:05
Start Date: 22/Oct/19 23:05
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337787464
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,15 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-notebook if pipeline is initiated from interactive runner.
+from apache_beam.runners.interactive import interactive_runner
+if isinstance(pipeline.runner, interactive_runner.InteractiveRunner):
 
 Review comment:
   I've missed the path where a new Pipeline is created and `run()` is invoked 
again.
   Yes, all of these would be possible.
   I've added an `interactive` parameter at the constructor level for 
`Pipeline` using default value `None`. `run()` and `from_runner_api()` will 
pass the `None` or `bool` value down no matter how the user chains the runners. 
I'm not very confident with the naming but the change should be backward 
compatible for Beam.
   
   Currently, I'm running into a problem when testing. Once I set `labels`, 
Dataflow job will fail immediately and throw `Error processing pipeline.` 
error. There will be no job graph, no worker started, no logs. Looks like when 
there is user label in the job request, Dataflow cannot convert the work item 
into internal representation.
   
   I'll do some investigation and figure out why.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332279)
Time Spent: 1h  (was: 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332274
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 22:58
Start Date: 22/Oct/19 22:58
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337785938
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +396,40 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
+  def run(self, test_runner_api=True, runner=None, options=None):
+"""Runs the pipeline. Returns whatever our runner returns after running.
+
+If another runner instance and options are provided, that runner will
+execute the pipeline with the given options. If either of them is not set,
+the default runner will run the pipeline with the original options
+assigned to the pipeline. The usage is similar to directly invoking
+`runner.run_pipeline(pipeline, options)`.
+"""
+runner_in_use = self.runner
+options_in_use = self._options
+if runner and options:
 
 Review comment:
   You're right! This will surprise the user. I've changed it to throw error if 
either is not provided instead of ignoring the input by default.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332274)
Time Spent: 50m  (was: 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=332271=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332271
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 22/Oct/19 22:52
Start Date: 22/Oct/19 22:52
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-545189390
 
 
   R: @aaltay 
   The tests have passed and Sam has completed his review. Do you have any 
other comments for this PR?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332271)
Time Spent: 8h 50m  (was: 8h 40m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The use can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332269
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 22/Oct/19 22:38
Start Date: 22/Oct/19 22:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-545170744
 
 
   First instance of job running:
   https://builds.apache.org/job/beam_PostCommit_Python37_PR/42/
   Next:
   https://builds.apache.org/job/beam_PostCommit_Python37_PR/43/
   Next:
   https://builds.apache.org/job/beam_PostCommit_Python37_PR/44/
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332269)
Time Spent: 1h 20m  (was: 1h 10m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332268=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332268
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 22/Oct/19 22:38
Start Date: 22/Oct/19 22:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-545185940
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332268)
Time Spent: 1h 10m  (was: 1h)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8456) BigQuery to Beam SQL timestamp has the wrong default: truncation makes the most sense

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8456?focusedWorklogId=332264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332264
 ]

ASF GitHub Bot logged work on BEAM-8456:


Author: ASF GitHub Bot
Created on: 22/Oct/19 22:27
Start Date: 22/Oct/19 22:27
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9849: [BEAM-8456] Add 
pipeline option to have Data Catalog truncate sub-millisecond precision
URL: https://github.com/apache/beam/pull/9849#issuecomment-545182789
 
 
   OK PTAL as-is. I experimented with inlining and I liked the result less.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332264)
Time Spent: 1h  (was: 50m)

> BigQuery to Beam SQL timestamp has the wrong default: truncation makes the 
> most sense
> -
>
> Key: BEAM-8456
> URL: https://issues.apache.org/jira/browse/BEAM-8456
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Most of the time, a user reading a timestamp from BigQuery with 
> higher-than-millisecond precision timestamps may not even realize that the 
> data source created these high precision timestamps. They are probably 
> timestamps on log entries generated by a system with higher precision.
> If they are using it with Beam SQL, which only supports millisecond 
> precision, it makes sense to "just work" by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=332255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332255
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:54
Start Date: 22/Oct/19 21:54
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9268: [BEAM-7738] Add 
external transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-545172864
 
 
   >  If it were a class that we had control over we could use the 
DefaultSchema annotation, as long as one of the included SchemaProvider 
implementations would work (I think JavaBeanSchema is the closest but wouldn't 
work because PubsubMessage doesn't have setters).
   
   It may reduce our overall technical debt if we just implement the full set 
of setters and getters on `PubsubMessage` and use `DefaultSchema`.  It doesn't 
seem like it would be a bad thing.  Who would have an opinion on that?
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332255)
Time Spent: 7.5h  (was: 7h 20m)

> Support PubSubIO to be configured externally for use with other SDKs
> 
>
> Key: BEAM-7738
> URL: https://issues.apache.org/jira/browse/BEAM-7738
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp, runner-flink, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Now that KafkaIO is supported via the external transform API (BEAM-7029) we 
> should add support for PubSub.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8420) Beam Model sources (classifier) jar don't contain sources

2019-10-22 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8420:
--
Description: These packages are generated from proto files. The jars only 
contain META-INF/MANIFEST.MF. They should either not be distributed or should 
contain the source proto files.

> Beam Model sources (classifier) jar don't contain sources
> -
>
> Key: BEAM-8420
> URL: https://issues.apache.org/jira/browse/BEAM-8420
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.16.0
>Reporter: Romain Manni-Bucau
>Priority: Major
>
> These packages are generated from proto files. The jars only contain 
> META-INF/MANIFEST.MF. They should either not be distributed or should contain 
> the source proto files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8420) Beam Model sources (classifier) jar don't contain sources

2019-10-22 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8420:
--
Status: Open  (was: Triage Needed)

> Beam Model sources (classifier) jar don't contain sources
> -
>
> Key: BEAM-8420
> URL: https://issues.apache.org/jira/browse/BEAM-8420
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.16.0
>Reporter: Romain Manni-Bucau
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332254
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:48
Start Date: 22/Oct/19 21:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-545170780
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332254)
Time Spent: 1h  (was: 50m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332253=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332253
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:48
Start Date: 22/Oct/19 21:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-545170744
 
 
   First instance of job running:
   https://builds.apache.org/job/beam_PostCommit_Python37_PR/42/
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332253)
Time Spent: 50m  (was: 40m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=332249=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332249
 ]

ASF GitHub Bot logged work on BEAM-8402:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:42
Start Date: 22/Oct/19 21:42
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9811: [BEAM-8402] Create a 
class hierarchy to represent environments
URL: https://github.com/apache/beam/pull/9811#issuecomment-545168831
 
 
   > I think Robert's idea is to specify a set of required runtime 
dependencies, in the form of "Java with Beam 2.16.0 with libraries XY"; and 
then let Beam create an environment that fits these requirements. 
   
   That's how I was reading it as well.  While that's potentially more user 
friendly, it also seems quite challenging.  If I request "Java with Beam 
2.16.0", and the runner happens to be using Java, it seems unsafe to assume 
that an embedded process is a valid replacement for docker, not in the least 
because I may have my own custom docker container for Beam-Java.   That said, 
it's still a very interesting idea.  Is there a ticket for this idea yet?
   
   Since that idea seems like it could take awhile to be designed and executed, 
I'm hoping that we can move forward with this PR in the meantime.   I think 
that the progress that we make on this and the subsequent work on the semantics 
of assigning environments to transforms (possibly including 
[BEAM-7850](https://issues.apache.org/jira/browse/BEAM-7850)), would be useful 
in any case.  Any hey, we can always mark it as experimental :)
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332249)
Time Spent: 1h 10m  (was: 1h)

> Create a class hierarchy to represent environments
> --
>
> Key: BEAM-8402
> URL: https://issues.apache.org/jira/browse/BEAM-8402
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> As a first step towards making it possible to assign different environments 
> to sections of a pipeline, we first need to expose environment classes to the 
> pipeline API.  Unlike PTransforms, PCollections, Coders, and Windowings,  
> environments exists solely in the portability framework as protobuf objects.  
>  By creating a hierarchy of "native" classes that represent the various 
> environment types -- external, docker, process, etc -- users will be able to 
> instantiate these and assign them to parts of the pipeline.  The assignment 
> portion will be covered in a follow-up issue/PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332245
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:41
Start Date: 22/Oct/19 21:41
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337763913
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING. If the stream is paused, this
+  // will unpause the stream at the specified playback speed.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING then pauses the stream when the
+  // offset is reached.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then pauses the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+// The state of the InteractiveService. The default state is STOPPED.
+enum State {
+  // The InteractiveService is not replaying. Goes to RUNNING with a
+  // StartRequest.
+  STOPPED = 0;
+
+  // The InteractiveService is replaying events. Goes to PAUSED with a
+  // PauseRequest. Goes to STOPPED with a StopRequest.
+  RUNNING = 1;
+
+  // The InteractiveService is paused from replaying events. Goes to RUNNING
+  // with either a StartRequest or a StepRequest. Goes to STOPPED with a
 
 Review comment:
   Gotcha, I rewrote the state comments to be very explicit about the state 
machine and what happens at each state.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332245)
Time Spent: 10.5h  (was: 10h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332247=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332247
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:41
Start Date: 22/Oct/19 21:41
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337764017
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING. If the stream is paused, this
+  // will unpause the stream at the specified playback speed.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING then pauses the stream when the
+  // offset is reached.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then pauses the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+// The state of the InteractiveService. The default state is STOPPED.
+enum State {
+  // The InteractiveService is not replaying. Goes to RUNNING with a
+  // StartRequest.
+  STOPPED = 0;
+
+  // The InteractiveService is replaying events. Goes to PAUSED with a
+  // PauseRequest. Goes to STOPPED with a StopRequest.
+  RUNNING = 1;
+
+  // The InteractiveService is paused from replaying events. Goes to RUNNING
+  // with either a StartRequest or a StepRequest. Goes to STOPPED with a
+  // StopRequest.
+  PAUSED = 2;
+
+  // The InteractiveService is stepping through a single event. Will move to
+  // PAUSED after quiescence.
+  STEPPING = 3;
+
+
+  // The InteractiveService is advancing until a specified duration is reached.
+  // Will move to PAUSED after the stream advances sufficiently.
+  ADVANCING = 4;
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
+}
+message StartResponse { }
+
+message AdvanceRequest {
+  // (Required) Will advance the stream by replaying events as quickly as
+  // possible until the stream timestamp has advanced by the specified amount.
+  google.protobuf.Duration advance_by = 1;
+}
+message AdvanceResponse {}
+
+message StopRequest { }
+message StopResponse { }
+
+message PauseRequest { }
+message PauseResponse {
+  // The current timestamp of the replay stream.
+  google.protobuf.Timestamp stream_time = 1;
+
+  

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332246
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:41
Start Date: 22/Oct/19 21:41
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337763937
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING. If the stream is paused, this
+  // will unpause the stream at the specified playback speed.
+  rpc Start (StartRequest) returns (StartResponse) {}
 
 Review comment:
   done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332246)
Time Spent: 10h 40m  (was: 10.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=332238=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332238
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:31
Start Date: 22/Oct/19 21:31
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9771: [BEAM-7926] Update 
dependencies in Java Katas
URL: https://github.com/apache/beam/pull/9771#issuecomment-545164821
 
 
   Hi @leonardoam ! Thanks for the PR. Have to tested that this combination of 
dependencies works?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332238)
Time Spent: 8h 40m  (was: 8.5h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The use can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=332234=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332234
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:30
Start Date: 22/Oct/19 21:30
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-545164472
 
 
   taking a look
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332234)
Time Spent: 4h 10m  (was: 4h)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=332232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332232
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:28
Start Date: 22/Oct/19 21:28
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9056: [BEAM-7746] Add 
python type hints
URL: https://github.com/apache/beam/pull/9056#issuecomment-545163564
 
 
   run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332232)
Time Spent: 8h 50m  (was: 8h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8438) Update Python/Streaming IO Documentation

2019-10-22 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8438:
--
Status: Open  (was: Triage Needed)

> Update Python/Streaming IO Documentation
> 
>
> Key: BEAM-8438
> URL: https://issues.apache.org/jira/browse/BEAM-8438
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Brian Hulette
>Priority: Major
>
> Built-in IO documentation states that Python/Streaming only supports pubsub 
> and BQ, which is out of date.
> https://beam.apache.org/documentation/io/built-in/
> This came up on 
> [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p157141041000]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8447) Python: Expand Datastore IT

2019-10-22 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8447:
--
Status: Open  (was: Triage Needed)

> Python: Expand Datastore IT
> ---
>
> Key: BEAM-8447
> URL: https://issues.apache.org/jira/browse/BEAM-8447
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Udi Meiri
>Priority: Major
>
> The current datastore_write_it_test only counts entities without verifying 
> their expected contents. Like for PubSub and BigQuery, there should be a 
> datastore_matcher.py (with test) that lets us test things like embedded keys 
> and entities (https://github.com/apache/beam/pull/9805).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8457:
--
Status: Open  (was: Triage Needed)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332231
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:23
Start Date: 22/Oct/19 21:23
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337757120
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,15 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-notebook if pipeline is initiated from interactive runner.
+from apache_beam.runners.interactive import interactive_runner
+if isinstance(pipeline.runner, interactive_runner.InteractiveRunner):
 
 Review comment:
   This seems fine - but what if we go with the `runner_in_use` codepath? Would 
users do: `p.run(runner=InteractiveRunner(DataflowRunner()), options=...)`? Or 
would users create a pipeline with InteractiveRunner and then do 
`p.run(runner=DataflowRunner()...`? Is it poissible for users to do `p = 
beam.Pipeline()`, and then do 
`InteractiveRunner().run_pipeline(p)`/`InteractiveRunner(DataflowRunner()).run_pipeline(p)`?
   
   IIUC users would have to pass the interactive runner in `p = 
beam.Pipeline()` to activate the interactive mode, right? InteractiveRunner is 
not automatically selected?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332231)
Time Spent: 40m  (was: 0.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332230
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:23
Start Date: 22/Oct/19 21:23
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337751395
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +396,40 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
+  def run(self, test_runner_api=True, runner=None, options=None):
+"""Runs the pipeline. Returns whatever our runner returns after running.
+
+If another runner instance and options are provided, that runner will
+execute the pipeline with the given options. If either of them is not set,
+the default runner will run the pipeline with the original options
+assigned to the pipeline. The usage is similar to directly invoking
+`runner.run_pipeline(pipeline, options)`.
+"""
+runner_in_use = self.runner
+options_in_use = self._options
+if runner and options:
 
 Review comment:
   What if either runner or options are not provided? Should that throw an 
error? Currently, if only one is provided, it'll be ignored - and that would be 
quite surprising for users.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332230)
Time Spent: 0.5h  (was: 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332226=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332226
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:18
Start Date: 22/Oct/19 21:18
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337753713
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING. If the stream is paused, this
+  // will unpause the stream at the specified playback speed.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING then pauses the stream when the
+  // offset is reached.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then pauses the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+// The state of the InteractiveService. The default state is STOPPED.
+enum State {
+  // The InteractiveService is not replaying. Goes to RUNNING with a
+  // StartRequest.
+  STOPPED = 0;
+
+  // The InteractiveService is replaying events. Goes to PAUSED with a
+  // PauseRequest. Goes to STOPPED with a StopRequest.
+  RUNNING = 1;
+
+  // The InteractiveService is paused from replaying events. Goes to RUNNING
+  // with either a StartRequest or a StepRequest. Goes to STOPPED with a
 
 Review comment:
   Does it go to STEPPING (instead of RUNNING) with a StepRequest?
   Also, does it go to ADVANCING with an AdvanceRequest?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332226)
Time Spent: 10h  (was: 9h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332227
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:18
Start Date: 22/Oct/19 21:18
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337754373
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING. If the stream is paused, this
+  // will unpause the stream at the specified playback speed.
+  rpc Start (StartRequest) returns (StartResponse) {}
 
 Review comment:
   nit: remove space between method name and open parenthesis for all the rpc 
declarations?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332227)
Time Spent: 10h 10m  (was: 10h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332228=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332228
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:18
Start Date: 22/Oct/19 21:18
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337754649
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING. If the stream is paused, this
+  // will unpause the stream at the specified playback speed.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING then pauses the stream when the
+  // offset is reached.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then pauses the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+// The state of the InteractiveService. The default state is STOPPED.
+enum State {
+  // The InteractiveService is not replaying. Goes to RUNNING with a
+  // StartRequest.
+  STOPPED = 0;
+
+  // The InteractiveService is replaying events. Goes to PAUSED with a
+  // PauseRequest. Goes to STOPPED with a StopRequest.
+  RUNNING = 1;
+
+  // The InteractiveService is paused from replaying events. Goes to RUNNING
+  // with either a StartRequest or a StepRequest. Goes to STOPPED with a
+  // StopRequest.
+  PAUSED = 2;
+
+  // The InteractiveService is stepping through a single event. Will move to
+  // PAUSED after quiescence.
+  STEPPING = 3;
+
+
+  // The InteractiveService is advancing until a specified duration is reached.
+  // Will move to PAUSED after the stream advances sufficiently.
+  ADVANCING = 4;
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
+}
+message StartResponse { }
+
+message AdvanceRequest {
+  // (Required) Will advance the stream by replaying events as quickly as
+  // possible until the stream timestamp has advanced by the specified amount.
+  google.protobuf.Duration advance_by = 1;
+}
+message AdvanceResponse {}
+
+message StopRequest { }
+message StopResponse { }
+
+message PauseRequest { }
+message PauseResponse {
+  // The current timestamp of the replay stream.
+  google.protobuf.Timestamp stream_time = 1;
+
+  // 

[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332225=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332225
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:10
Start Date: 22/Oct/19 21:10
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-545157196
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332225)
Time Spent: 40m  (was: 0.5h)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332224=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332224
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:09
Start Date: 22/Oct/19 21:09
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337751597
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,136 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import 
InteractiveServiceServicer
+
+STRING_TO_API_STATE = {
+'STOPPED': beam_interactive_api_pb2.StatusResponse.STOPPED,
+'PAUSED': beam_interactive_api_pb2.StatusResponse.PAUSED,
+'RUNNING': beam_interactive_api_pb2.StatusResponse.RUNNING,
+}
 
 Review comment:
   Thanks, added a comment to the proto saying the initial state is STOPPED.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332224)
Time Spent: 9h 50m  (was: 9h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332217=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332217
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:52
Start Date: 22/Oct/19 20:52
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-545149669
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332217)
Time Spent: 0.5h  (was: 20m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332216
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:48
Start Date: 22/Oct/19 20:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying 
BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-545147710
 
 
   Run Python PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332216)
Time Spent: 20m  (was: 10m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332215=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332215
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:47
Start Date: 22/Oct/19 20:47
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9855: [BEAM-8446] 
Retrying BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332213=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332213
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:45
Start Date: 22/Oct/19 20:45
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854
 
 
   1. Changed the pipeline.run() API to allow a runner and an option
   parameter so that a pipeline initially bundled w/ an interactive runner
   can be directly run by other runners from notebook.
   2. Implicitly added the necessary source information through user labels
   when the user does p.run(runner=DataflowRunner(), options=options) or
   DataflowRunner().run_pipeline(p, options).
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 

[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332214=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332214
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:45
Start Date: 22/Oct/19 20:45
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9854: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#issuecomment-545146836
 
 
   R: @pabloem 
   PTAL, thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332214)
Time Spent: 20m  (was: 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332194=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332194
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:37
Start Date: 22/Oct/19 20:37
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337737427
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING.
 
 Review comment:
   Ack, I changed the Start comment to reflect that it can also unpause.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332194)
Time Spent: 9h 40m  (was: 9.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332193
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:37
Start Date: 22/Oct/19 20:37
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337737375
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,136 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import 
InteractiveServiceServicer
+
+STRING_TO_API_STATE = {
+'STOPPED': beam_interactive_api_pb2.StatusResponse.STOPPED,
+'PAUSED': beam_interactive_api_pb2.StatusResponse.PAUSED,
+'RUNNING': beam_interactive_api_pb2.StatusResponse.RUNNING,
+}
 
 Review comment:
   What is the initial state? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332193)
Time Spent: 9.5h  (was: 9h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332192=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332192
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:36
Start Date: 22/Oct/19 20:36
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337737214
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING then pauses the stream when the
+  // offset is reached.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then closes the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
+}
+message StartResponse { }
+
+message AdvanceRequest {
+  // (Required) Will advance the stream by replaying events as quickly as
+  // possible until the stream timestamp has advanced by the specified amount.
+  google.protobuf.Duration advance_by = 1;
+}
+message AdvanceResponse {}
+
+message StopRequest { }
+message StopResponse { }
+
+message PauseRequest { }
+message PauseResponse { }
+
+message StatusRequest { }
+message StatusResponse {
+  // The current timestamp of the replay stream. Is MIN_TIMESTAMP when state
+  // is STOPPED.
+  google.protobuf.Timestamp stream_time = 1;
+
+  // The minimum watermark across all of the faked replayable unbounded 
sources.
+  // Is MIN_TIMESTAMP when state is STOPPED.
+  google.protobuf.Timestamp watermark = 2;
+
+  // The latest timestamp of the recording stream. Is MIN_TIMESTAMP if there is
+  // no recording.
+  google.protobuf.Timestamp recording_time = 3;
+
+  // The set playback_speed from the StartRequest. Playback speed is set by
+  // StartRequest, or if the stream_time is the current time and the recording
+  // is still happening, the playback speed is 1, else 0.
+  double playback_speed = 4;
+
+  enum State {
+// The InteractiveService is not replaying. Goes to RUNNING with a
+// StartRequest.
+STOPPED = 0;
+
+// The InteractiveService is replaying events. Goes to PAUSED with a
+// PauseRequest. Goes to STOPPED with 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332191
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:35
Start Date: 22/Oct/19 20:35
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337736773
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING then pauses the stream when the
+  // offset is reached.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then closes the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
+}
+message StartResponse { }
+
+message AdvanceRequest {
+  // (Required) Will advance the stream by replaying events as quickly as
+  // possible until the stream timestamp has advanced by the specified amount.
+  google.protobuf.Duration advance_by = 1;
+}
+message AdvanceResponse {}
+
+message StopRequest { }
+message StopResponse { }
+
+message PauseRequest { }
+message PauseResponse { }
 
 Review comment:
   That makes sense, I also added the watermark to the response.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332191)
Time Spent: 9h 10m  (was: 9h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=332190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332190
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:34
Start Date: 22/Oct/19 20:34
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-545142275
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332190)
Time Spent: 8.5h  (was: 8h 20m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The use can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332187
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:31
Start Date: 22/Oct/19 20:31
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337733049
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then closes the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
 
 Review comment:
   Ack.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332187)
Time Spent: 9h  (was: 8h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332186
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:31
Start Date: 22/Oct/19 20:31
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337731954
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING.
 
 Review comment:
   Please also add unpause.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332186)
Time Spent: 8h 50m  (was: 8h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332184
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:31
Start Date: 22/Oct/19 20:31
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337732796
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING then pauses the stream when the
+  // offset is reached.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then closes the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
+}
+message StartResponse { }
+
+message AdvanceRequest {
+  // (Required) Will advance the stream by replaying events as quickly as
+  // possible until the stream timestamp has advanced by the specified amount.
+  google.protobuf.Duration advance_by = 1;
+}
+message AdvanceResponse {}
+
+message StopRequest { }
+message StopResponse { }
+
+message PauseRequest { }
+message PauseResponse { }
 
 Review comment:
   Does it make sense to include the stream timestamp in PauseResponse?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332184)
Time Spent: 8h 40m  (was: 8.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332183
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:31
Start Date: 22/Oct/19 20:31
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337734636
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest. It is also allowed for
+  // setting the playback_speed while RUNNING.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING then pauses the stream when the
+  // offset is reached.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then closes the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
+}
+message StartResponse { }
+
+message AdvanceRequest {
+  // (Required) Will advance the stream by replaying events as quickly as
+  // possible until the stream timestamp has advanced by the specified amount.
+  google.protobuf.Duration advance_by = 1;
+}
+message AdvanceResponse {}
+
+message StopRequest { }
+message StopResponse { }
+
+message PauseRequest { }
+message PauseResponse { }
+
+message StatusRequest { }
+message StatusResponse {
+  // The current timestamp of the replay stream. Is MIN_TIMESTAMP when state
+  // is STOPPED.
+  google.protobuf.Timestamp stream_time = 1;
+
+  // The minimum watermark across all of the faked replayable unbounded 
sources.
+  // Is MIN_TIMESTAMP when state is STOPPED.
+  google.protobuf.Timestamp watermark = 2;
+
+  // The latest timestamp of the recording stream. Is MIN_TIMESTAMP if there is
+  // no recording.
+  google.protobuf.Timestamp recording_time = 3;
+
+  // The set playback_speed from the StartRequest. Playback speed is set by
+  // StartRequest, or if the stream_time is the current time and the recording
+  // is still happening, the playback speed is 1, else 0.
+  double playback_speed = 4;
+
+  enum State {
+// The InteractiveService is not replaying. Goes to RUNNING with a
+// StartRequest.
+STOPPED = 0;
+
+// The InteractiveService is replaying events. Goes to PAUSED with a
+// PauseRequest. Goes to STOPPED with 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332185
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:31
Start Date: 22/Oct/19 20:31
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337733498
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
 
 Review comment:
   Thanks. I think we need to make the doc here better since Advance doesn't 
really "un-pause" since after Advance, the replay is still paused, but at a 
different timestamp.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332185)
Time Spent: 8h 50m  (was: 8h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=332181=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332181
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:19
Start Date: 22/Oct/19 20:19
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9268: [BEAM-7738] Add 
external transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-545135957
 
 
   > Right now I'm mostly thinking about the latter
   
   Agreed, that's what I'm thinking about too.
   
   > Maybe I'm thinking about this wrong, but I think the PubsubMessage is 
structured:
   
   Ah ok, fair. I was referring specifically to the structure (or lack thereof) 
of the byte array payload, but you're right the (Python SDK) user can handle 
creating a byte array themselves, and the row coder can just encode `{byte[] 
payload, Map attributes, String messageId}`
   
   > What are the requirements for registering a Row converter?
   
   ### Java
   There are a variety of ways to do it. If it were a class that we had control 
over we could use the 
[`DefaultSchema`](https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/sdk/schemas/DefaultSchema.html)
 annotation, as long as one of the included SchemaProvider implementations 
would work (I think JavaBeanSchema is the closest but wouldn't work because 
PubsubMessage doesn't have setters). I think what we'd want to do here is just 
implement a 
[`SchemaProvider`](https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/sdk/schemas/SchemaProvider.html)
 and a 
[`SchemaProviderRegistrar`](https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/sdk/schemas/SchemaProviderRegistrar.html)
 for `PubsubMessage` and include it in Beam.
   
   @reuvenlax may have a better suggestion.
   
   ### Python
   With my PR I think it could look like:
   ```python
   # this is py3 syntax for clarity, but we'd probably
   # need to use the TypedDict('PubsubMessage', ...) version
   class PubsubMessage(TypedDict):
  message: ByteString
  attributes: Mapping[unicode, unicode]
  messageId: unicode
   
   coders.registry.register_coder(PubsubMessage, coders.RowCoder)
   
   pcoll
 | 'make some messages' >> 
beam.Map(makeMessage).with_output_types(PubsubMessage)
 | 'write to pubsub' >> beam.io.WriteToPubsub(project, topic) # or something
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332181)
Time Spent: 7h 20m  (was: 7h 10m)

> Support PubSubIO to be configured externally for use with other SDKs
> 
>
> Key: BEAM-7738
> URL: https://issues.apache.org/jira/browse/BEAM-7738
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp, runner-flink, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Now that KafkaIO is supported via the external transform API (BEAM-7029) we 
> should add support for PubSub.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=332179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332179
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:14
Start Date: 22/Oct/19 20:14
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9790: [BEAM-7389] 
Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332179)
Time Spent: 69h 50m  (was: 69h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 69h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=332177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332177
 ]

ASF GitHub Bot logged work on BEAM-8402:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:05
Start Date: 22/Oct/19 20:05
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9811: [BEAM-8402] Create a 
class hierarchy to represent environments
URL: https://github.com/apache/beam/pull/9811#issuecomment-545130446
 
 
   I think Robert's idea is to specify a set of required runtime dependencies, 
in the form of "Java with Beam 2.16.0 with libraries XY"; and then let Beam 
create an environment that fits these requirements. The question is whether we 
should really "bake in" the existing environments into the model itself. It 
does not have to be a contradiction because we could support "legacy" 
environments  (like the existing) and eventually replace them by the "smart" 
dynamic environments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332177)
Time Spent: 1h  (was: 50m)

> Create a class hierarchy to represent environments
> --
>
> Key: BEAM-8402
> URL: https://issues.apache.org/jira/browse/BEAM-8402
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As a first step towards making it possible to assign different environments 
> to sections of a pipeline, we first need to expose environment classes to the 
> pipeline API.  Unlike PTransforms, PCollections, Coders, and Windowings,  
> environments exists solely in the portability framework as protobuf objects.  
>  By creating a hierarchy of "native" classes that represent the various 
> environment types -- external, docker, process, etc -- users will be able to 
> instantiate these and assign them to parts of the pipeline.  The assignment 
> portion will be covered in a follow-up issue/PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to be able to run queries

2019-10-22 Thread Israel Herraiz (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Israel Herraiz updated BEAM-8458:
-
Summary: BigQueryIO.Read needs permissions to create datasets to be able to 
run queries  (was: BigQueryIO.Read needs permissions to create datasets to run 
queries)

> BigQueryIO.Read needs permissions to create datasets to be able to run queries
> --
>
> Key: BEAM-8458
> URL: https://issues.apache.org/jira/browse/BEAM-8458
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Israel Herraiz
>Assignee: Israel Herraiz
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using {{fromQuery}}, BigQueryIO creates a temp dataset to store the 
> results of the query.
> Therefore, Beam requires permissions to create datasets just to be able to 
> run a query. In practice, this means that Beam requires the role 
> bigQuery.User just to run queries, whereas if you use {{from}} (to read from 
> a table), the role bigQuery.jobUser suffices.
> BigQueryIO.Read should have an option to set an existing dataset  to write 
> the temp results of
>  a query, so it would be enough with having the role bigQuery.jobUser.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to run queries

2019-10-22 Thread Israel Herraiz (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Israel Herraiz updated BEAM-8458:
-
Description: 
When using {{fromQuery}}, BigQueryIO creates a temp dataset to store the 
results of the query.

Therefore, Beam requires permissions to create datasets just to be able to run 
a query. In practice, this means that Beam requires the role bigQuery.User just 
to run queries, whereas if you use {{from}} (to read from a table), the role 
bigQuery.jobUser suffices.

BigQueryIO.Read should have an option to set an existing dataset  to write the 
temp results of
 a query, so it would be enough with having the role bigQuery.jobUser.

  was:
When using `fromQuery`, BigQueryIO creates a temp dataset to store the results 
of the query.

Therefore, Beam requires permissions to create datasets just to be able to run 
a query. In practice, this means that Beam requires the role bigQuery.User just 
to run queries, whereas if you use `from` (to read from a table), the role 
bigQuery.jobUser suffices.

BigQueryIO.Read should have an option to set an existing dataset  to write the 
temp results of
a query, so it would be enough with having the role bigQuery.jobUser.


> BigQueryIO.Read needs permissions to create datasets to run queries
> ---
>
> Key: BEAM-8458
> URL: https://issues.apache.org/jira/browse/BEAM-8458
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Israel Herraiz
>Assignee: Israel Herraiz
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using {{fromQuery}}, BigQueryIO creates a temp dataset to store the 
> results of the query.
> Therefore, Beam requires permissions to create datasets just to be able to 
> run a query. In practice, this means that Beam requires the role 
> bigQuery.User just to run queries, whereas if you use {{from}} (to read from 
> a table), the role bigQuery.jobUser suffices.
> BigQueryIO.Read should have an option to set an existing dataset  to write 
> the temp results of
>  a query, so it would be enough with having the role bigQuery.jobUser.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to run queries

2019-10-22 Thread Israel Herraiz (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Israel Herraiz reassigned BEAM-8458:


Assignee: Israel Herraiz

> BigQueryIO.Read needs permissions to create datasets to run queries
> ---
>
> Key: BEAM-8458
> URL: https://issues.apache.org/jira/browse/BEAM-8458
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Israel Herraiz
>Assignee: Israel Herraiz
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using `fromQuery`, BigQueryIO creates a temp dataset to store the 
> results of the query.
> Therefore, Beam requires permissions to create datasets just to be able to 
> run a query. In practice, this means that Beam requires the role 
> bigQuery.User just to run queries, whereas if you use `from` (to read from a 
> table), the role bigQuery.jobUser suffices.
> BigQueryIO.Read should have an option to set an existing dataset  to write 
> the temp results of
> a query, so it would be enough with having the role bigQuery.jobUser.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to run queries

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8458?focusedWorklogId=332166=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332166
 ]

ASF GitHub Bot logged work on BEAM-8458:


Author: ASF GitHub Bot
Created on: 22/Oct/19 19:34
Start Date: 22/Oct/19 19:34
Worklog Time Spent: 10m 
  Work Description: iht commented on pull request #9852: [BEAM-8458] Add 
option to set temp dataset in BigQueryIO.Read
URL: https://github.com/apache/beam/pull/9852
 
 
   When using fromQuery, BigQueryIO creates a temp dataset to store the results 
of
   the query. Therefore, Beam requires permissions to create datasets just to be
   able to run a query. With this option, BigQueryIO can write the temp results 
of
   the query to a pre-existing dataset, and therefore it only needs permissions 
to
   run queries and create tables to be able to use from Query.
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [X ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [X ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [X] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 

[jira] [Work logged] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to run queries

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8458?focusedWorklogId=332167=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332167
 ]

ASF GitHub Bot logged work on BEAM-8458:


Author: ASF GitHub Bot
Created on: 22/Oct/19 19:35
Start Date: 22/Oct/19 19:35
Worklog Time Spent: 10m 
  Work Description: iht commented on issue #9852: [BEAM-8458] Add option to 
set temp dataset in BigQueryIO.Read
URL: https://github.com/apache/beam/pull/9852#issuecomment-545119172
 
 
   R: @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332167)
Time Spent: 20m  (was: 10m)

> BigQueryIO.Read needs permissions to create datasets to run queries
> ---
>
> Key: BEAM-8458
> URL: https://issues.apache.org/jira/browse/BEAM-8458
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Israel Herraiz
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using `fromQuery`, BigQueryIO creates a temp dataset to store the 
> results of the query.
> Therefore, Beam requires permissions to create datasets just to be able to 
> run a query. In practice, this means that Beam requires the role 
> bigQuery.User just to run queries, whereas if you use `from` (to read from a 
> table), the role bigQuery.jobUser suffices.
> BigQueryIO.Read should have an option to set an existing dataset  to write 
> the temp results of
> a query, so it would be enough with having the role bigQuery.jobUser.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to run queries

2019-10-22 Thread Israel Herraiz (Jira)
Israel Herraiz created BEAM-8458:


 Summary: BigQueryIO.Read needs permissions to create datasets to 
run queries
 Key: BEAM-8458
 URL: https://issues.apache.org/jira/browse/BEAM-8458
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Reporter: Israel Herraiz


When using `fromQuery`, BigQueryIO creates a temp dataset to store the results 
of the query.

Therefore, Beam requires permissions to create datasets just to be able to run 
a query. In practice, this means that Beam requires the role bigQuery.User just 
to run queries, whereas if you use `from` (to read from a table), the role 
bigQuery.jobUser suffices.

BigQueryIO.Read should have an option to set an existing dataset  to write the 
temp results of
a query, so it would be enough with having the role bigQuery.jobUser.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332162=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332162
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 19:24
Start Date: 22/Oct/19 19:24
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337705585
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
 
 Review comment:
   Ack, commented on the new protocol.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332162)
Time Spent: 8.5h  (was: 8h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332161=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332161
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 19:21
Start Date: 22/Oct/19 19:21
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337704421
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332161)
Time Spent: 8h 20m  (was: 8h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332160=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332160
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 19:20
Start Date: 22/Oct/19 19:20
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337703862
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then closes the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
+
+  // (Optional) if present, will start the stream at the specified timestamp.
+  google.protobuf.Timestamp start_at = 2;
 
 Review comment:
   Good point, removed it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332160)
Time Spent: 8h 10m  (was: 8h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332159=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332159
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 19:19
Start Date: 22/Oct/19 19:19
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337703571
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then closes the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
 
 Review comment:
   Proto3 doesn't allow for explicit default values :(
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332159)
Time Spent: 8h  (was: 7h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=332151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332151
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:54
Start Date: 22/Oct/19 18:54
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9268: [BEAM-7738] Add 
external transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-545103766
 
 
   > Are you thinking you'd use beam:coder:row:v1 as the interface for the 
external transform, and the Java ExternalTransform implementations would handle 
the conversion of Row to/from PubsubMessage? 
   
   There are two places that I see  beam:coder:row:v1 being useful:
   
   1. as a way to declare the construction interface of an external transform, 
and encode its values.  A schema coder would replace the `configuration` 
mapping in `pipeline.ExternalConfiguration.ExternalConfigurationPayload` proto.
   2. as a coder for structured elements that are exchanged between sdks
   
   Right now I'm mostly thinking about the latter, which is when 
`PubsubMessage` comes into play.
   
   > There's no trivial way to register a converter between Row and 
PubsubMessage since the latter isn't structured, 
   
   Maybe I'm thinking about this wrong, but I think the `PubsubMessage` _is_ 
structured:
   
   ```java
   public class PubsubMessage {
   
 private byte[] message;
 private Map attributes;
 private String messageId;
   
 /** Returns the main PubSub message. */
 public byte[] getPayload() {
   return message;
 }
   
 /** Returns the full map of attributes. This is an unmodifiable map. */
 public Map getAttributeMap() {
   return attributes;
 }
   
 /** Returns the messageId of the message populated by Cloud Pub/Sub. */
 @Nullable
 public String getMessageId() {
   return messageId;
 }
   ```
   
   I'm not a Java expert by any means, but this seems like a type that would 
work with AutoValue, we just need to rename `message` to `payload` and 
`attributes` to `attributeMap`.
   
   What are the requirements for registering a Row converter?
   
   > but of course on the Java side we could have code to serialize the Row to 
a variety of formats to put in the PubsubMessage payload: Avro, JSON, or the 
row serialization format itself (although I'm not sure we'd want to encourage 
using that outside of Beam), would be pretty simple to add. 
   
   I think the payload is not a concern when it comes to portability of 
external transforms:  it gets encoded/decoded by another transform, not 
PubsubRead/Write.  We can just assume that's a byte array.
   
   My grasp on the Java side is a bit tenuous, so I'd like for @mxm to confirm 
or deny what I've written here.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332151)
Time Spent: 7h 10m  (was: 7h)

> Support PubSubIO to be configured externally for use with other SDKs
> 
>
> Key: BEAM-7738
> URL: https://issues.apache.org/jira/browse/BEAM-7738
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp, runner-flink, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Now that KafkaIO is supported via the external transform API (BEAM-7029) we 
> should add support for PubSub.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332152=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332152
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:54
Start Date: 22/Oct/19 18:54
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337692597
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then closes the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
 
 Review comment:
   Should we just add [default = 1]?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332152)
Time Spent: 7h 50m  (was: 7h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332149=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332149
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:52
Start Date: 22/Oct/19 18:52
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337691444
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
+  rpc Pause (PauseRequest) returns (PauseResponse) {}
+
+  // Sends a single element to the EventsRequest then closes the stream.
+  rpc Step (StepRequest) returns (StepResponse) {}
+
+  // Responds with debugging and other cache-specific metadata.
+  rpc Status (StatusRequest) returns (StatusResponse) {}
+}
+
+message StartRequest {
+  // (Optional) How quickly the stream will be played back, e.g. if
+  // playback_speed == 2, then the stream will replay events twice as fast as
+  // they were recorded. If unspecified, this will default to 1.
+  double playback_speed = 1;
+
+  // (Optional) if present, will start the stream at the specified timestamp.
+  google.protobuf.Timestamp start_at = 2;
 
 Review comment:
   I'm not sure whether we should allow this feature since that means 
re-execution with start_at specified means different input data and it kinda 
defeats the purpose of replaying. Also, if we use Start method for also 
unpausing and changing playback speed, start_at either should be ignored or 
should return an error.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332149)
Time Spent: 7h 40m  (was: 7.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * 

[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=332148=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332148
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:49
Start Date: 22/Oct/19 18:49
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9190: [BEAM-7520] Fix timer 
firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#issuecomment-545101905
 
 
   I also plan to create the follow-up JIRAs after the merge and notify dev 
list.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332148)
Time Spent: 18.5h  (was: 18h 20m)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 18.5h
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - statful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager,extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=332146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332146
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:46
Start Date: 22/Oct/19 18:46
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9190: [BEAM-7520] Fix timer 
firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#issuecomment-545100575
 
 
   I'll definitely squash the history to some compact isolated commits before 
merging. Thanks for the approval!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332146)
Time Spent: 18h 20m  (was: 18h 10m)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - statful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager,extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332145=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332145
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:46
Start Date: 22/Oct/19 18:46
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337688439
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
 
 Review comment:
   I think it's okay for the Start method to also change the playback speed for 
a running replay. We just need to document that.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332145)
Time Spent: 7.5h  (was: 7h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332143=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332143
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:46
Start Date: 22/Oct/19 18:46
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337688439
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
 
 Review comment:
   [couldn't reply to an outdated comment]
   I think it's okay for the Start method to also change the playback speed for 
a running replay. We just need to document that.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332143)
Time Spent: 7h 20m  (was: 7h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332144=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332144
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:46
Start Date: 22/Oct/19 18:46
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337684609
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+import "google/protobuf/duration.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  // Starts the stream of events to the EventsRequest.
+  rpc Start (StartRequest) returns (StartResponse) {}
+
+  // Advances the stream to the specified offset then pauses the stream. This
+  // starts the stream if it is not RUNNING.
+  rpc Advance (AdvanceRequest) returns (AdvanceResponse) {}
+
+  // Stops and resets the stream to the beginning.
+  rpc Stop (StopRequest) returns (StopResponse) {}
+
+  // Pauses the stream of events to the EventsRequest. If there is already an
+  // outstanding EventsRequest streaming events, then the stream will pause
+  // after the EventsResponse is completed.
+  // To un-pause, send either a Start or Advance request.
 
 Review comment:
   If the playback is paused, then an Advance request is issued, does the 
playback resume at the previous speed after the Advance, or should it remain 
paused? I think the latter makes more sense.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332144)
Time Spent: 7h 20m  (was: 7h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332141=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332141
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:33
Start Date: 22/Oct/19 18:33
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337682581
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -94,6 +98,7 @@ message StatusResponse {
   // no recording.
   google.protobuf.Timestamp recording_time = 3;
 
+  // The set playback_speed from either the StartRequest or the AdvanceRequest.
 
 Review comment:
   Does it make sense for the  StartRequest to do that? I don't want to have 
too many methods on the API if possible. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332141)
Time Spent: 7h 10m  (was: 7h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332139=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332139
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:31
Start Date: 22/Oct/19 18:31
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337681814
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -81,7 +86,6 @@ message PauseResponse { }
 
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332139)
Time Spent: 7h  (was: 6h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-22 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957267#comment-16957267
 ] 

Robert Bradshaw commented on BEAM-8418:
---

Note that this is a Dataflow-only issue. The FnAPI doesn't have a way to 
specifying what element should be used for Impulse (and Dataflow should 
probably be updated accordingly). 

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Following pipeline fails on Dataflow runner unless we use beam_fn_api 
> experiment.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
> return element
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode Impluse payload using url-escaping in [1], while 
> Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF 
> runner expects URL escaping.
> We should fix or reconcile the encoding in non-FnAPI path, and add a 
> ValidatesRunner test that catches this error.   
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=332137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332137
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:26
Start Date: 22/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9268: [BEAM-7738] Add 
external transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-545092832
 
 
   > How close are we to having schema coders ready to use? IIUC, we should be 
able to register a converter between Row -> PubsubMessage, right? What's the 
mechanism for that? It would be great to put that to use here as soon as the 
schema coders are ready.
   
   @robertwb gave #9188 a LGTM, I just need to get CI passing and I think we 
can merge it. I'll work on that now.
   
   Are you thinking you'd use beam:coder:row:v1 as the interface for the 
external transform, and the Java ExternalTransform implementations would handle 
the conversion of Row to/from PubsubMessage? There's no trivial way to register 
a converter between Row and PubsubMessage since the latter isn't structured, 
but of course on the Java side we could have code to serialize the Row to a 
variety of formats to put in the PubsubMessage payload: Avro, JSON, or the row 
serialization format itself (although I'm not sure we'd want to encourage using 
that outside of Beam), would be pretty simple to add. Maybe the format to use 
could be part of the external transform payload.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332137)
Time Spent: 7h  (was: 6h 50m)

> Support PubSubIO to be configured externally for use with other SDKs
> 
>
> Key: BEAM-7738
> URL: https://issues.apache.org/jira/browse/BEAM-7738
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp, runner-flink, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Now that KafkaIO is supported via the external transform API (BEAM-7029) we 
> should add support for PubSub.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332136
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:24
Start Date: 22/Oct/19 18:24
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337678175
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream_test.py
 ##
 @@ -0,0 +1,118 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import unittest
+
+import grpc
+from google.protobuf import timestamp_pb2
+
+from apache_beam import coders
+from apache_beam.portability.api import beam_interactive_api_pb2 as 
interactive_api
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc as 
interactive_api_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.runners.interactive.caching.streaming_cache import 
StreamingCache
+from apache_beam.testing.interactive_stream import InteractiveStreamController
+
+
+def get_open_port():
+  import socket
+  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+  s.bind(('', 0))
+  s.listen(1)
+  port = s.getsockname()[1]
+  s.close()
+  return port
 
 Review comment:
   Awesome thanks for this. I didn't know about this functionality.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332136)
Time Spent: 6h 50m  (was: 6h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=332134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332134
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:21
Start Date: 22/Oct/19 18:21
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r337676820
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,258 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from pandas.io.json import json_normalize
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+from facets_overview.generic_feature_statistics_generator import 
GenericFeatureStatisticsGenerator
+from IPython.core.display import HTML
+from IPython.core.display import Javascript
+from IPython.core.display import display
+from IPython.core.display import display_javascript
+from IPython.core.display import update_display
+from timeloop import Timeloop
+
+# jsons doesn't support < Python 3.5. Work around with json for legacy tests.
 
 Review comment:
   I've added such warning message in the interactive_environment module to 
check for Python version and print out warnings.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332134)
Time Spent: 8h 20m  (was: 8h 10m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The use can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=332132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332132
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:18
Start Date: 22/Oct/19 18:18
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9190: [BEAM-7520] Fix 
timer firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#issuecomment-545089710
 
 
   My reading of the commit history is that there are many commits which are 
not meaningful by themselves. There may be some commits that stand alone. But I 
will squash these commits when I merge unless you would like to do your own 
custom squashing into some meaningful small set of commits.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332132)
Time Spent: 18h 10m  (was: 18h)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 18h 10m
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - statful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager,extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=332130=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332130
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 22/Oct/19 18:17
Start Date: 22/Oct/19 18:17
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9190: 
[BEAM-7520] Fix timer firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#discussion_r337674421
 
 

 ##
 File path: 
runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectRunnerTest.java
 ##
 @@ -616,6 +649,186 @@ public void processElement(ProcessContext c) {
 p.run();
   }
 
+  /**
+   * Test running of {@link Pipeline} which has two {@link POutput POutputs} 
and finishing the first
+   * one triggers data being fed into the second one.
+   */
+  @Test(timeout = 1)
+  public void testTwoPOutputsInPipelineWithCascade() throws 
InterruptedException {
 
 Review comment:
   Got it. Makes sense.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332130)
Time Spent: 18h  (was: 17h 50m)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - statful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager,extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332123=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332123
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 17:57
Start Date: 22/Oct/19 17:57
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337664333
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -94,6 +98,7 @@ message StatusResponse {
   // no recording.
   google.protobuf.Timestamp recording_time = 3;
 
+  // The set playback_speed from either the StartRequest or the AdvanceRequest.
 
 Review comment:
   playback speed is not set by AdvanceRequest. Playback speed is set by 
StartRequest, or if the stream_time is the current time and the recording is 
still happening, the playback speed is 1, else 0.
   
   Also, should we have a method that changes the playback speed without the 
user pausing and then starting?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332123)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332122=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332122
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 17:57
Start Date: 22/Oct/19 17:57
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] 
Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337664736
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -81,7 +86,6 @@ message PauseResponse { }
 
 
 Review comment:
   To un-pause, do we issue another StartRequest? If so, please document that.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332122)
Time Spent: 6h 40m  (was: 6.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-10-22 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri reopened BEAM-7594:
-

Found another instance of this:
{code}
05:35:15 ==
05:35:15 ERROR: test_read_from_text_with_file_name_file_pattern 
(apache_beam.io.textio_test.TextSourceTest)
05:35:15 --
05:35:15 Traceback (most recent call last):
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/io/textio_test.py",
 line 517, in test_read_from_text_with_file_name_file_pattern
05:35:15 pipeline.run()
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
 line 112, in run
05:35:15 else test_runner_api))
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/pipeline.py",
 line 407, in run
05:35:15 self._options).run(False)
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/pipeline.py",
 line 420, in run
05:35:15 return self.runner.run_pipeline(self, self._options)
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 129, in run_pipeline
05:35:15 return runner.run_pipeline(pipeline, options)
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 381, in run_pipeline
05:35:15 default_environment=self._default_environment))
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 388, in run_via_runner_api
05:35:15 return self.run_stages(stage_context, stages)
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 470, in run_stages
05:35:15 stage_context.safe_coders)
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 753, in _run_stage
05:35:15 result, splits = bundle_manager.process_bundle(data_input, 
data_output)
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1801, in process_bundle
05:35:15 part, expected_outputs), part_inputs):
05:35:15   File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in 
result_iterator
05:35:15 yield fs.pop().result()
05:35:15   File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in 
result
05:35:15 return self.__get_result()
05:35:15   File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in 
__get_result
05:35:15 raise self._exception
05:35:15   File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in 
run
05:35:15 result = self.fn(*self.args, **self.kwargs)
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1801, in 
05:35:15 part, expected_outputs), part_inputs):
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1737, in process_bundle
05:35:15 result_future = 
self._worker_handler.control_conn.push(process_bundle_req)
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1150, in push
05:35:15 response = self.worker.do_instruction(request)
05:35:15   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 360, in do_instruction
05:35:15 request.instruction_id)
05:35:15   File 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332117=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332117
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 17:49
Start Date: 22/Oct/19 17:49
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337661353
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,125 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import 
InteractiveServiceServicer
+
+STRING_TO_API_STATE = {
+'STOPPED': beam_interactive_api_pb2.StatusResponse.STOPPED,
+'PAUSED': beam_interactive_api_pb2.StatusResponse.PAUSED,
+'RUNNING': beam_interactive_api_pb2.StatusResponse.RUNNING,
+}
+
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, endpoint, streaming_cache):
+self._endpoint = endpoint
+self._server = grpc.server(ThreadPoolExecutor(max_workers=2))
+beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+self, self._server)
+self._server.add_insecure_port(self._endpoint)
+self._server.start()
+
+self._streaming_cache = streaming_cache
+self._state = 'STOPPED'
+self._playback_speed = 1.0
+
+  def Start(self, request, context):
+"""Requests that the Service starts emitting elements.
+"""
+
+self._next_state('RUNNING')
+self._playback_speed = request.playback_speed or 1.0
+self._playback_speed = 1.0 / max(min(self._playback_speed, 100.0), 0.1)
+return beam_interactive_api_pb2.StartResponse()
+
+  def Stop(self, request, context):
+"""Requests that the Service stop emitting elements.
+"""
+self._next_state('STOPPED')
+return beam_interactive_api_pb2.StartResponse()
+
+  def Pause(self, request, context):
+"""Requests that the Service pause emitting elements.
+"""
+self._next_state('PAUSED')
+return beam_interactive_api_pb2.PauseResponse()
+
+  def Step(self, request, context):
+"""Requests that the Service emit a single element from each cached source.
+"""
+self._next_state('STEP')
+return beam_interactive_api_pb2.StepResponse()
+
+  def Status(self, request, context):
+"""Returns the status of the service.
+"""
+resp = beam_interactive_api_pb2.StatusResponse()
+resp.stream_time.GetCurrentTime()
+resp.state = STRING_TO_API_STATE[self._state]
+return resp
+
+  def _reset_state(self):
+self._reader = None
+self._playback_speed = 1.0
+self._state = 'STOPPED'
+
+  def _next_state(self, state):
+if self._state == 'STOPPED':
+  if state == 'RUNNING' or state == 'STEP':
+self._reader = self._streaming_cache.reader()
+elif self._state == 'RUNNING':
+  if state == 'STOPPED':
+self._reset_state()
+self._state = state
+
+  def Events(self, request, context):
+# The TestStream will wait until the stream starts.
+while self._state != 'RUNNING' and self._state != 'STEP':
+  time.sleep(0.01)
+
+events = self._reader.read()
+if events:
+  for e in events:
+# Here we assume that the first event is the processing_time_event so
+# that we can sleep and then emit the element. Thereby, trying to
+# emulate the original stream.
+if e.HasField('processing_time_event'):
+  sleep_duration = (
+  e.processing_time_event.advance_duration * self._playback_speed
+  ) * 10**-6
+  time.sleep(sleep_duration)
+yield beam_interactive_api_pb2.EventsResponse(events=[e])
+else:
+  resp = 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332116=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332116
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 22/Oct/19 17:49
Start Date: 22/Oct/19 17:49
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r337661276
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,125 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import 
InteractiveServiceServicer
+
+STRING_TO_API_STATE = {
+'STOPPED': beam_interactive_api_pb2.StatusResponse.STOPPED,
+'PAUSED': beam_interactive_api_pb2.StatusResponse.PAUSED,
+'RUNNING': beam_interactive_api_pb2.StatusResponse.RUNNING,
+}
+
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, endpoint, streaming_cache):
+self._endpoint = endpoint
+self._server = grpc.server(ThreadPoolExecutor(max_workers=2))
+beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+self, self._server)
+self._server.add_insecure_port(self._endpoint)
+self._server.start()
+
+self._streaming_cache = streaming_cache
+self._state = 'STOPPED'
+self._playback_speed = 1.0
+
+  def Start(self, request, context):
+"""Requests that the Service starts emitting elements.
+"""
+
+self._next_state('RUNNING')
+self._playback_speed = request.playback_speed or 1.0
+self._playback_speed = 1.0 / max(min(self._playback_speed, 100.0), 0.1)
+return beam_interactive_api_pb2.StartResponse()
+
+  def Stop(self, request, context):
+"""Requests that the Service stop emitting elements.
+"""
+self._next_state('STOPPED')
+return beam_interactive_api_pb2.StartResponse()
+
+  def Pause(self, request, context):
+"""Requests that the Service pause emitting elements.
+"""
+self._next_state('PAUSED')
+return beam_interactive_api_pb2.PauseResponse()
+
+  def Step(self, request, context):
+"""Requests that the Service emit a single element from each cached source.
+"""
+self._next_state('STEP')
+return beam_interactive_api_pb2.StepResponse()
+
+  def Status(self, request, context):
+"""Returns the status of the service.
+"""
+resp = beam_interactive_api_pb2.StatusResponse()
+resp.stream_time.GetCurrentTime()
+resp.state = STRING_TO_API_STATE[self._state]
+return resp
+
+  def _reset_state(self):
+self._reader = None
+self._playback_speed = 1.0
+self._state = 'STOPPED'
+
+  def _next_state(self, state):
+if self._state == 'STOPPED':
+  if state == 'RUNNING' or state == 'STEP':
+self._reader = self._streaming_cache.reader()
+elif self._state == 'RUNNING':
+  if state == 'STOPPED':
+self._reset_state()
+self._state = state
+
+  def Events(self, request, context):
+# The TestStream will wait until the stream starts.
+while self._state != 'RUNNING' and self._state != 'STEP':
+  time.sleep(0.01)
 
 Review comment:
   Ack, changed to 0.25. I want to balance the feeling of "snappiness" and not 
wasting the user's CPU here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332116)
Time Spent: 6h 20m  (was: 6h 10m)

> Add streaming support to Interactive Beam
> 

  1   2   >