[jira] [Updated] (BEAM-8420) Beam Model sources (classifier) jar don't contain sources
[ https://issues.apache.org/jira/browse/BEAM-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Romain Manni-Bucau updated BEAM-8420: - Description: These packages are generated from proto files. The jars only contain META-INF/MANIFEST.MF. They should either not be distributed or should contain the generated java source files and optionally the proto files. (was: These packages are generated from proto files. The jars only contain META-INF/MANIFEST.MF. They should either not be distributed or should contain the source proto files.) > Beam Model sources (classifier) jar don't contain sources > - > > Key: BEAM-8420 > URL: https://issues.apache.org/jira/browse/BEAM-8420 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.16.0 >Reporter: Romain Manni-Bucau >Priority: Major > > These packages are generated from proto files. The jars only contain > META-INF/MANIFEST.MF. They should either not be distributed or should contain > the generated java source files and optionally the proto files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8374) PublishResult returned by SnsIO is missing sdkResponseMetadata and sdkHttpMetadata
[ https://issues.apache.org/jira/browse/BEAM-8374?focusedWorklogId=332400=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332400 ] ASF GitHub Bot logged work on BEAM-8374: Author: ASF GitHub Bot Created on: 23/Oct/19 04:24 Start Date: 23/Oct/19 04:24 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on pull request #9758: [BEAM-8374] Fixes bug in SnsIO PublishResultCoder URL: https://github.com/apache/beam/pull/9758#discussion_r337844279 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sns/PublishResultBuilder.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws.sns; + +import com.amazonaws.ResponseMetadata; +import com.amazonaws.http.HttpResponse; +import com.amazonaws.http.SdkHttpMetadata; +import com.amazonaws.services.sns.model.PublishResult; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; + +public class PublishResultBuilder { Review comment: Shouldn't we build this class using auto service (@AutoValue), so it can be consistent with Beam model? @iemejia and @lukecwik what do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332400) Time Spent: 2.5h (was: 2h 20m) > PublishResult returned by SnsIO is missing sdkResponseMetadata and > sdkHttpMetadata > -- > > Key: BEAM-8374 > URL: https://issues.apache.org/jira/browse/BEAM-8374 > Project: Beam > Issue Type: Bug > Components: io-java-aws >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently the PublishResultCoder in SnsIO only serializes the messageId field > so the PublishResult returned by Beam returns null for > getSdkResponseMetadata() and getSdkHttpMetadata(). This makes it impossible > to check the HTTP status for errors, which is necessary since this is not > handled in SnsIO. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8374) PublishResult returned by SnsIO is missing sdkResponseMetadata and sdkHttpMetadata
[ https://issues.apache.org/jira/browse/BEAM-8374?focusedWorklogId=332399=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332399 ] ASF GitHub Bot logged work on BEAM-8374: Author: ASF GitHub Bot Created on: 23/Oct/19 04:23 Start Date: 23/Oct/19 04:23 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on pull request #9758: [BEAM-8374] Fixes bug in SnsIO PublishResultCoder URL: https://github.com/apache/beam/pull/9758#discussion_r337844279 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sns/PublishResultBuilder.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws.sns; + +import com.amazonaws.ResponseMetadata; +import com.amazonaws.http.HttpResponse; +import com.amazonaws.http.SdkHttpMetadata; +import com.amazonaws.services.sns.model.PublishResult; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; + +public class PublishResultBuilder { Review comment: Shouldn't build this class using auto service (@AutoValue), so it can be consistent with Beam model? @iemejia and @lukecwik what do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332399) Time Spent: 2h 20m (was: 2h 10m) > PublishResult returned by SnsIO is missing sdkResponseMetadata and > sdkHttpMetadata > -- > > Key: BEAM-8374 > URL: https://issues.apache.org/jira/browse/BEAM-8374 > Project: Beam > Issue Type: Bug > Components: io-java-aws >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently the PublishResultCoder in SnsIO only serializes the messageId field > so the PublishResult returned by Beam returns null for > getSdkResponseMetadata() and getSdkHttpMetadata(). This makes it impossible > to check the HTTP status for errors, which is necessary since this is not > handled in SnsIO. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8374) PublishResult returned by SnsIO is missing sdkResponseMetadata and sdkHttpMetadata
[ https://issues.apache.org/jira/browse/BEAM-8374?focusedWorklogId=332372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332372 ] ASF GitHub Bot logged work on BEAM-8374: Author: ASF GitHub Bot Created on: 23/Oct/19 01:52 Start Date: 23/Oct/19 01:52 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9758: [BEAM-8374] Fixes bug in SnsIO PublishResultCoder URL: https://github.com/apache/beam/pull/9758#issuecomment-545227828 @iemejia @lukecwik How does this look? If you're happy with it I'll port it over to `amazon-web-services2`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332372) Time Spent: 2h 10m (was: 2h) > PublishResult returned by SnsIO is missing sdkResponseMetadata and > sdkHttpMetadata > -- > > Key: BEAM-8374 > URL: https://issues.apache.org/jira/browse/BEAM-8374 > Project: Beam > Issue Type: Bug > Components: io-java-aws >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently the PublishResultCoder in SnsIO only serializes the messageId field > so the PublishResult returned by Beam returns null for > getSdkResponseMetadata() and getSdkHttpMetadata(). This makes it impossible > to check the HTTP status for errors, which is necessary since this is not > handled in SnsIO. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-8457: Priority: Blocker (was: Major) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Blocker > Fix For: 2.17.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-8457: Fix Version/s: 2.17.0 > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332355 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 23/Oct/19 01:15 Start Date: 23/Oct/19 01:15 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r337813433 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -396,28 +405,46 @@ def replace_all(self, replacements): for override in replacements: self._check_replacement(override) - def run(self, test_runner_api=True): -"""Runs the pipeline. Returns whatever our runner returns after running.""" + def run(self, test_runner_api=True, runner=None, options=None): +"""Runs the pipeline. Returns whatever our runner returns after running. +If another runner instance and options are provided, that runner will +execute the pipeline with the given options. If either of them is not set, +the default runner will run the pipeline with the original options +assigned to the pipeline. The usage is similar to directly invoking +`runner.run_pipeline(pipeline, options)`. +""" +runner_in_use = self.runner +options_in_use = self._options +if runner and options: + runner_in_use = runner + options_in_use = options +elif not runner and options: + raise ValueError('Parameter runner is not given when parameter options ' + 'is given.') +elif not options and runner: + raise ValueError('Parameter options is not given when parameter runner ' + 'is given.') # When possible, invoke a round trip through the runner API. if test_runner_api and self._verify_runner_api_compatible(): return Pipeline.from_runner_api( self.to_runner_api(use_fake_coders=True), - self.runner, - self._options).run(False) + runner_in_use, + options_in_use, + interactive=self.interactive).run(False) Review comment: Did you find that this was necessary? I don't think we should change the signature of the `from_runner_api` call. The pipeline protobuf should contain all the necessary information... Though I'd defer to @robertwb on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332355) Time Spent: 1h 10m (was: 1h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO
[ https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=332346=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332346 ] ASF GitHub Bot logged work on BEAM-876: --- Author: ASF GitHub Bot Created on: 23/Oct/19 00:39 Start Date: 23/Oct/19 00:39 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9524: [BEAM-876] Support schemaUpdateOption in BigQueryIO URL: https://github.com/apache/beam/pull/9524#issuecomment-545213360 thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332346) Time Spent: 2h (was: 1h 50m) > Support schemaUpdateOption in BigQueryIO > > > Key: BEAM-876 > URL: https://issues.apache.org/jira/browse/BEAM-876 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Eugene Kirpichov >Assignee: canaan silberberg >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > BigQuery recently added support for updating the schema as a side effect of > the load job. > Here is the relevant API method in JobConfigurationLoad: > https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List) > BigQueryIO should support this too. See user request for this: > http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332345 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 23/Oct/19 00:38 Start Date: 23/Oct/19 00:38 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855#issuecomment-545213181 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332345) Time Spent: 1h 50m (was: 1h 40m) > apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types > is flaky > --- > > Key: BEAM-8446 > URL: https://issues.apache.org/jira/browse/BEAM-8446 > Project: Beam > Issue Type: New Feature > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > test_big_query_write_new_types appears to be flaky in > beam_PostCommit_Python37 test suite. > https://builds.apache.org/job/beam_PostCommit_Python37/733/ > https://builds.apache.org/job/beam_PostCommit_Python37/739/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO
[ https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=332342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332342 ] ASF GitHub Bot logged work on BEAM-876: --- Author: ASF GitHub Bot Created on: 23/Oct/19 00:34 Start Date: 23/Oct/19 00:34 Worklog Time Spent: 10m Work Description: ziel commented on issue #9524: [BEAM-876] Support schemaUpdateOption in BigQueryIO URL: https://github.com/apache/beam/pull/9524#issuecomment-545212481 I haven't had a chance to write the integration test yet... but was hoping to take a shot at it in the next week or so. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332342) Time Spent: 1h 50m (was: 1h 40m) > Support schemaUpdateOption in BigQueryIO > > > Key: BEAM-876 > URL: https://issues.apache.org/jira/browse/BEAM-876 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Eugene Kirpichov >Assignee: canaan silberberg >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > BigQuery recently added support for updating the schema as a side effect of > the load job. > Here is the relevant API method in JobConfigurationLoad: > https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List) > BigQueryIO should support this too. See user request for this: > http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=332340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332340 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 23/Oct/19 00:33 Start Date: 23/Oct/19 00:33 Worklog Time Spent: 10m Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-545212207 > LMK if you'd like me to take another look. oh, yeah please do. I don't have much more from my end other than renaming the class above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332340) Time Spent: 1.5h (was: 1h 20m) > Implement and use an Avro coder rather than the JSON one for intermediary > files to be loaded in BigQuery > > > Key: BEAM-2879 > URL: https://issues.apache.org/jira/browse/BEAM-2879 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Black Phoenix >Assignee: Steve Niemitz >Priority: Minor > Labels: starter > Time Spent: 1.5h > Remaining Estimate: 0h > > Before being loaded in BigQuery, temporary files are created and encoded in > JSON. Which is a costly solution compared to an Avro alternative -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=332337=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332337 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 23/Oct/19 00:31 Start Date: 23/Oct/19 00:31 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-545211844 LMK if you'd like me to take another look. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332337) Time Spent: 1h 20m (was: 1h 10m) > Implement and use an Avro coder rather than the JSON one for intermediary > files to be loaded in BigQuery > > > Key: BEAM-2879 > URL: https://issues.apache.org/jira/browse/BEAM-2879 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Black Phoenix >Assignee: Steve Niemitz >Priority: Minor > Labels: starter > Time Spent: 1h 20m > Remaining Estimate: 0h > > Before being loaded in BigQuery, temporary files are created and encoded in > JSON. Which is a costly solution compared to an Avro alternative -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO
[ https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=332336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332336 ] ASF GitHub Bot logged work on BEAM-876: --- Author: ASF GitHub Bot Created on: 23/Oct/19 00:29 Start Date: 23/Oct/19 00:29 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9524: [BEAM-876] Support schemaUpdateOption in BigQueryIO URL: https://github.com/apache/beam/pull/9524#issuecomment-545211563 Should I review once more? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332336) Time Spent: 1h 40m (was: 1.5h) > Support schemaUpdateOption in BigQueryIO > > > Key: BEAM-876 > URL: https://issues.apache.org/jira/browse/BEAM-876 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Eugene Kirpichov >Assignee: canaan silberberg >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > BigQuery recently added support for updating the schema as a side effect of > the load job. > Here is the relevant API method in JobConfigurationLoad: > https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List) > BigQueryIO should support this too. See user request for this: > http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run
[ https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=332335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332335 ] ASF GitHub Bot logged work on BEAM-7765: Author: ASF GitHub Bot Created on: 23/Oct/19 00:29 Start Date: 23/Oct/19 00:29 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9685: [BEAM-7765] - Add test for snippet accessing_valueprovider_info_after_run URL: https://github.com/apache/beam/pull/9685#issuecomment-545211504 Thanks for adding the snippets. There's only one more issue, with lints: ``` 01:02:12 > Task :sdks:python:test-suites:tox:py2:lintPy27 01:02:12 * Module apache_beam.examples.snippets.snippets_test 01:02:12 C:1281, 0: Line too long (97/80) (line-too-long) 01:02:12 C:1306, 0: Wrong hanging indentation (add 7 spaces). 01:02:12 LogValueProvidersFn(my_options.string_value))) 01:02:12 ^ | (bad-continuation) 01:02:12 C:1315, 0: Line too long (87/80) (line-too-long) 01:02:12 C:1317, 0: Line too long (82/80) (line-too-long) 01:02:12 C:1318, 0: Trailing whitespace (trailing-whitespace) 01:02:12 01:02:12 01:02:12 Your code has been rated at 10.00/10 (previous run: 10.00/10, -0.00) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332335) Time Spent: 2h 10m (was: 2h) > Add test for snippet accessing_valueprovider_info_after_run > --- > > Key: BEAM-7765 > URL: https://issues.apache.org/jira/browse/BEAM-7765 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: John Patoch >Priority: Major > Labels: easy > Time Spent: 2h 10m > Remaining Estimate: 0h > > This snippet needs a unit test. > It has bugs. For example: > - apache_beam.utils.value_provider doesn't exist > - beam.combiners.Sum doesn't exist > - unused import of: WriteToText > cc: [~pabloem] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332333=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332333 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 23/Oct/19 00:26 Start Date: 23/Oct/19 00:26 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855#issuecomment-545211008 The build passed, but then timed out: https://builds.apache.org/job/beam_PostCommit_Python37_PR/44/console This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332333) Time Spent: 1h 40m (was: 1.5h) > apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types > is flaky > --- > > Key: BEAM-8446 > URL: https://issues.apache.org/jira/browse/BEAM-8446 > Project: Beam > Issue Type: New Feature > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > test_big_query_write_new_types appears to be flaky in > beam_PostCommit_Python37 test suite. > https://builds.apache.org/job/beam_PostCommit_Python37/733/ > https://builds.apache.org/job/beam_PostCommit_Python37/739/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332328 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 23/Oct/19 00:16 Start Date: 23/Oct/19 00:16 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855#issuecomment-545208657 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332328) Time Spent: 1.5h (was: 1h 20m) > apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types > is flaky > --- > > Key: BEAM-8446 > URL: https://issues.apache.org/jira/browse/BEAM-8446 > Project: Beam > Issue Type: New Feature > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > test_big_query_write_new_types appears to be flaky in > beam_PostCommit_Python37 test suite. > https://builds.apache.org/job/beam_PostCommit_Python37/733/ > https://builds.apache.org/job/beam_PostCommit_Python37/739/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=332302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332302 ] ASF GitHub Bot logged work on BEAM-7738: Author: ASF GitHub Bot Created on: 22/Oct/19 23:59 Start Date: 22/Oct/19 23:59 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9268: [BEAM-7738] Add external transform support to PubsubIO URL: https://github.com/apache/beam/pull/9268#issuecomment-545204985 D'oh my bad, we _do_ have control over `PubsubMessage` :man_facepalming: I assumed it was part of the pubsub client library. Yeah I vote we use `DefaultSchema` with either `JavaBeanSchema` or `JavaFieldSchema`, whichever works with fewer changes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332302) Time Spent: 7h 40m (was: 7.5h) > Support PubSubIO to be configured externally for use with other SDKs > > > Key: BEAM-7738 > URL: https://issues.apache.org/jira/browse/BEAM-7738 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp, runner-flink, sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Labels: portability > Time Spent: 7h 40m > Remaining Estimate: 0h > > Now that KafkaIO is supported via the external transform API (BEAM-7029) we > should add support for PubSub. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev closed BEAM-8418. - Fix Version/s: 2.17.0 Assignee: Valentyn Tymofieiev (was: Robert Bradshaw) Resolution: Fixed > Fix handling of Impulse transform in Dataflow runner. > -- > > Key: BEAM-8418 > URL: https://issues.apache.org/jira/browse/BEAM-8418 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Following pipeline fails on Dataflow runner unless we use beam_fn_api > experiment. > {noformat} > class NoOpDoFn(beam.DoFn): > def process(self, element): > return element > p = beam.Pipeline(options=pipeline_options) > _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn()) > result = p.run() > {noformat} > The reason is that we encode Impluse payload using url-escaping in [1], while > Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF > runner expects URL escaping. > We should fix or reconcile the encoding in non-FnAPI path, and add a > ValidatesRunner test that catches this error. > [1] > https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957438#comment-16957438 ] Valentyn Tymofieiev commented on BEAM-8418: --- This issue was affecting non-Fn codepath only in Dataflow runner, and this codepath was fixed in PR #9822, I'll close this for now. Side note - if we are not running ValidatesRunner tests for Dataflow under FnAPI, we should add them. > Fix handling of Impulse transform in Dataflow runner. > -- > > Key: BEAM-8418 > URL: https://issues.apache.org/jira/browse/BEAM-8418 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Following pipeline fails on Dataflow runner unless we use beam_fn_api > experiment. > {noformat} > class NoOpDoFn(beam.DoFn): > def process(self, element): > return element > p = beam.Pipeline(options=pipeline_options) > _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn()) > result = p.run() > {noformat} > The reason is that we encode Impluse payload using url-escaping in [1], while > Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF > runner expects URL escaping. > We should fix or reconcile the encoding in non-FnAPI path, and add a > ValidatesRunner test that catches this error. > [1] > https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8196) Python 3.{5,7} post commit timed out at 100 minutes
[ https://issues.apache.org/jira/browse/BEAM-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-8196: Summary: Python 3.{5,7} post commit timed out at 100 minutes (was: Python 3.5 post commit timed out at 100 minutes) > Python 3.{5,7} post commit timed out at 100 minutes > --- > > Key: BEAM-8196 > URL: https://issues.apache.org/jira/browse/BEAM-8196 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Critical > Time Spent: 40m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_Python35/435/ > This post commit took 100 minutes and timedout. Should we increase the > timeout? We can also look into why this postcommit was slow. A later post > commit (https://builds.apache.org/job/beam_PostCommit_Python35/437/) > completed in 66 minutes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8196) Python 3.5 post commit timed out at 100 minutes
[ https://issues.apache.org/jira/browse/BEAM-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957432#comment-16957432 ] Udi Meiri commented on BEAM-8196: - This is still happening, very frequently now for 3.7 postcommits. I investigated 6 semmingly long-running jobs on the apache-beam-testing project, they all were running "Apache Beam Python 3.7 SDK 2.17.0.dev" and all were showing the "ModuleNotFoundError: No module named 'endpoints_pb2'". One of these is still running after 16 hours. The rest failed after 1-2 hours. Perhaps the 16 hour one did not abort because it is a streaming job? > Python 3.5 post commit timed out at 100 minutes > --- > > Key: BEAM-8196 > URL: https://issues.apache.org/jira/browse/BEAM-8196 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Critical > Time Spent: 40m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_Python35/435/ > This post commit took 100 minutes and timedout. Should we increase the > timeout? We can also look into why this postcommit was slow. A later post > commit (https://builds.apache.org/job/beam_PostCommit_Python35/437/) > completed in 66 minutes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3922) beam_PostCommit_Python_Verify is broken
[ https://issues.apache.org/jira/browse/BEAM-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957426#comment-16957426 ] Udi Meiri commented on BEAM-3922: - Should this issue be closed? > beam_PostCommit_Python_Verify is broken > --- > > Key: BEAM-3922 > URL: https://issues.apache.org/jira/browse/BEAM-3922 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Mark Liu >Priority: Major > > Jenkins > > [beam_PostCommit_Python_Verify|https://builds.apache.org/job/beam_PostCommit_Python_Verify/] > is broken since Mar 21. > From the [console > log|https://builds.apache.org/job/beam_PostCommit_Python_Verify/4490/consoleFull]: > {code} > == > ERROR: test_wordcount_it (apache_beam.examples.wordcount_it_test.WordCountIT) > -- > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py", > line 812, in run > test(orig) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py", > line 45, in __call__ > return self.run(*arg, **kwarg) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py", > line 133, in run > self.runTest(result) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py", > line 151, in runTest > test(result) > File "/usr/lib/python2.7/unittest/case.py", line 395, in __call__ > return self.run(*args, **kwds) > File "/usr/lib/python2.7/unittest/case.py", line 331, in run > testMethod() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py", > line 66, in test_wordcount_it > wordcount.run(test_pipeline.get_full_options_as_args(**extra_opts)) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount.py", > line 115, in run > result = p.run() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py", > line 389, in run > self.to_runner_api(), self.runner, self._options).run(False) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py", > line 402, in run > return self.runner.run_pipeline(self) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 57, in run_pipeline > self.result.wait_until_finish() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", > line 1071, in wait_until_finish > time.sleep(5.0) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py", > line 276, in signalhandler > raise TimedOutException() > TimedOutException: 'test_wordcount_it > (apache_beam.examples.wordcount_it_test.WordCountIT)' > {code} > Looks like wordcount pipeline didn't finish after 900s (set from command > --process-timeout=900) and test failed in timeout. Generally this test should > finish in 10min, so probably something wrong in the pipeline. > One failure pipeline link (found in console log): > https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-23_08_28_06-8460792149394878073?project=apache-beam-testing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=332291=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332291 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 22/Oct/19 23:34 Start Date: 22/Oct/19 23:34 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#issuecomment-545199834 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332291) Time Spent: 9h (was: 8h 50m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 9h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332279=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332279 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 23:05 Start Date: 22/Oct/19 23:05 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r337787464 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,15 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-notebook if pipeline is initiated from interactive runner. +from apache_beam.runners.interactive import interactive_runner +if isinstance(pipeline.runner, interactive_runner.InteractiveRunner): Review comment: I've missed the path where a new Pipeline is created and `run()` is invoked again. Yes, all of these would be possible. I've added an `interactive` parameter at the constructor level for `Pipeline` using default value `None`. `run()` and `from_runner_api()` will pass the `None` or `bool` value down no matter how the user chains the runners. I'm not very confident with the naming but the change should be backward compatible for Beam. Currently, I'm running into a problem when testing. Once I set `labels`, Dataflow job will fail immediately and throw `Error processing pipeline.` error. There will be no job graph, no worker started, no logs. Looks like when there is user label in the job request, Dataflow cannot convert the work item into internal representation. I'll do some investigation and figure out why. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332279) Time Spent: 1h (was: 50m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332274 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 22:58 Start Date: 22/Oct/19 22:58 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r337785938 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -396,28 +396,40 @@ def replace_all(self, replacements): for override in replacements: self._check_replacement(override) - def run(self, test_runner_api=True): -"""Runs the pipeline. Returns whatever our runner returns after running.""" + def run(self, test_runner_api=True, runner=None, options=None): +"""Runs the pipeline. Returns whatever our runner returns after running. + +If another runner instance and options are provided, that runner will +execute the pipeline with the given options. If either of them is not set, +the default runner will run the pipeline with the original options +assigned to the pipeline. The usage is similar to directly invoking +`runner.run_pipeline(pipeline, options)`. +""" +runner_in_use = self.runner +options_in_use = self._options +if runner and options: Review comment: You're right! This will surprise the user. I've changed it to throw error if either is not provided instead of ignoring the input by default. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332274) Time Spent: 50m (was: 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=332271=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332271 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 22/Oct/19 22:52 Start Date: 22/Oct/19 22:52 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9741: [BEAM-7926] Visualize PCollection URL: https://github.com/apache/beam/pull/9741#issuecomment-545189390 R: @aaltay The tests have passed and Sam has completed his review. Do you have any other comments for this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332271) Time Spent: 8h 50m (was: 8h 40m) > Visualize PCollection with Interactive Beam > --- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline defined as > p = create_pipeline() > pcoll = p | 'Transform' >> transform() > The use can call a single function and get auto-magical charting of the data > as materialized pcoll. > e.g., visualize(pcoll) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332269 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 22/Oct/19 22:38 Start Date: 22/Oct/19 22:38 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855#issuecomment-545170744 First instance of job running: https://builds.apache.org/job/beam_PostCommit_Python37_PR/42/ Next: https://builds.apache.org/job/beam_PostCommit_Python37_PR/43/ Next: https://builds.apache.org/job/beam_PostCommit_Python37_PR/44/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332269) Time Spent: 1h 20m (was: 1h 10m) > apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types > is flaky > --- > > Key: BEAM-8446 > URL: https://issues.apache.org/jira/browse/BEAM-8446 > Project: Beam > Issue Type: New Feature > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > test_big_query_write_new_types appears to be flaky in > beam_PostCommit_Python37 test suite. > https://builds.apache.org/job/beam_PostCommit_Python37/733/ > https://builds.apache.org/job/beam_PostCommit_Python37/739/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332268=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332268 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 22/Oct/19 22:38 Start Date: 22/Oct/19 22:38 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855#issuecomment-545185940 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332268) Time Spent: 1h 10m (was: 1h) > apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types > is flaky > --- > > Key: BEAM-8446 > URL: https://issues.apache.org/jira/browse/BEAM-8446 > Project: Beam > Issue Type: New Feature > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > test_big_query_write_new_types appears to be flaky in > beam_PostCommit_Python37 test suite. > https://builds.apache.org/job/beam_PostCommit_Python37/733/ > https://builds.apache.org/job/beam_PostCommit_Python37/739/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8456) BigQuery to Beam SQL timestamp has the wrong default: truncation makes the most sense
[ https://issues.apache.org/jira/browse/BEAM-8456?focusedWorklogId=332264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332264 ] ASF GitHub Bot logged work on BEAM-8456: Author: ASF GitHub Bot Created on: 22/Oct/19 22:27 Start Date: 22/Oct/19 22:27 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9849: [BEAM-8456] Add pipeline option to have Data Catalog truncate sub-millisecond precision URL: https://github.com/apache/beam/pull/9849#issuecomment-545182789 OK PTAL as-is. I experimented with inlining and I liked the result less. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332264) Time Spent: 1h (was: 50m) > BigQuery to Beam SQL timestamp has the wrong default: truncation makes the > most sense > - > > Key: BEAM-8456 > URL: https://issues.apache.org/jira/browse/BEAM-8456 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Most of the time, a user reading a timestamp from BigQuery with > higher-than-millisecond precision timestamps may not even realize that the > data source created these high precision timestamps. They are probably > timestamps on log entries generated by a system with higher precision. > If they are using it with Beam SQL, which only supports millisecond > precision, it makes sense to "just work" by default. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=332255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332255 ] ASF GitHub Bot logged work on BEAM-7738: Author: ASF GitHub Bot Created on: 22/Oct/19 21:54 Start Date: 22/Oct/19 21:54 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9268: [BEAM-7738] Add external transform support to PubsubIO URL: https://github.com/apache/beam/pull/9268#issuecomment-545172864 > If it were a class that we had control over we could use the DefaultSchema annotation, as long as one of the included SchemaProvider implementations would work (I think JavaBeanSchema is the closest but wouldn't work because PubsubMessage doesn't have setters). It may reduce our overall technical debt if we just implement the full set of setters and getters on `PubsubMessage` and use `DefaultSchema`. It doesn't seem like it would be a bad thing. Who would have an opinion on that? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332255) Time Spent: 7.5h (was: 7h 20m) > Support PubSubIO to be configured externally for use with other SDKs > > > Key: BEAM-7738 > URL: https://issues.apache.org/jira/browse/BEAM-7738 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp, runner-flink, sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Labels: portability > Time Spent: 7.5h > Remaining Estimate: 0h > > Now that KafkaIO is supported via the external transform API (BEAM-7029) we > should add support for PubSub. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8420) Beam Model sources (classifier) jar don't contain sources
[ https://issues.apache.org/jira/browse/BEAM-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-8420: -- Description: These packages are generated from proto files. The jars only contain META-INF/MANIFEST.MF. They should either not be distributed or should contain the source proto files. > Beam Model sources (classifier) jar don't contain sources > - > > Key: BEAM-8420 > URL: https://issues.apache.org/jira/browse/BEAM-8420 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.16.0 >Reporter: Romain Manni-Bucau >Priority: Major > > These packages are generated from proto files. The jars only contain > META-INF/MANIFEST.MF. They should either not be distributed or should contain > the source proto files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8420) Beam Model sources (classifier) jar don't contain sources
[ https://issues.apache.org/jira/browse/BEAM-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-8420: -- Status: Open (was: Triage Needed) > Beam Model sources (classifier) jar don't contain sources > - > > Key: BEAM-8420 > URL: https://issues.apache.org/jira/browse/BEAM-8420 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.16.0 >Reporter: Romain Manni-Bucau >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332254 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 22/Oct/19 21:48 Start Date: 22/Oct/19 21:48 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855#issuecomment-545170780 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332254) Time Spent: 1h (was: 50m) > apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types > is flaky > --- > > Key: BEAM-8446 > URL: https://issues.apache.org/jira/browse/BEAM-8446 > Project: Beam > Issue Type: New Feature > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > test_big_query_write_new_types appears to be flaky in > beam_PostCommit_Python37 test suite. > https://builds.apache.org/job/beam_PostCommit_Python37/733/ > https://builds.apache.org/job/beam_PostCommit_Python37/739/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332253=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332253 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 22/Oct/19 21:48 Start Date: 22/Oct/19 21:48 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855#issuecomment-545170744 First instance of job running: https://builds.apache.org/job/beam_PostCommit_Python37_PR/42/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332253) Time Spent: 50m (was: 40m) > apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types > is flaky > --- > > Key: BEAM-8446 > URL: https://issues.apache.org/jira/browse/BEAM-8446 > Project: Beam > Issue Type: New Feature > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > test_big_query_write_new_types appears to be flaky in > beam_PostCommit_Python37 test suite. > https://builds.apache.org/job/beam_PostCommit_Python37/733/ > https://builds.apache.org/job/beam_PostCommit_Python37/739/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments
[ https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=332249=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332249 ] ASF GitHub Bot logged work on BEAM-8402: Author: ASF GitHub Bot Created on: 22/Oct/19 21:42 Start Date: 22/Oct/19 21:42 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9811: [BEAM-8402] Create a class hierarchy to represent environments URL: https://github.com/apache/beam/pull/9811#issuecomment-545168831 > I think Robert's idea is to specify a set of required runtime dependencies, in the form of "Java with Beam 2.16.0 with libraries XY"; and then let Beam create an environment that fits these requirements. That's how I was reading it as well. While that's potentially more user friendly, it also seems quite challenging. If I request "Java with Beam 2.16.0", and the runner happens to be using Java, it seems unsafe to assume that an embedded process is a valid replacement for docker, not in the least because I may have my own custom docker container for Beam-Java. That said, it's still a very interesting idea. Is there a ticket for this idea yet? Since that idea seems like it could take awhile to be designed and executed, I'm hoping that we can move forward with this PR in the meantime. I think that the progress that we make on this and the subsequent work on the semantics of assigning environments to transforms (possibly including [BEAM-7850](https://issues.apache.org/jira/browse/BEAM-7850)), would be useful in any case. Any hey, we can always mark it as experimental :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332249) Time Spent: 1h 10m (was: 1h) > Create a class hierarchy to represent environments > -- > > Key: BEAM-8402 > URL: https://issues.apache.org/jira/browse/BEAM-8402 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > As a first step towards making it possible to assign different environments > to sections of a pipeline, we first need to expose environment classes to the > pipeline API. Unlike PTransforms, PCollections, Coders, and Windowings, > environments exists solely in the portability framework as protobuf objects. > By creating a hierarchy of "native" classes that represent the various > environment types -- external, docker, process, etc -- users will be able to > instantiate these and assign them to parts of the pipeline. The assignment > portion will be covered in a follow-up issue/PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332245 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 21:41 Start Date: 22/Oct/19 21:41 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337763913 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. If the stream is paused, this + // will unpause the stream at the specified playback speed. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING then pauses the stream when the + // offset is reached. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then pauses the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +// The state of the InteractiveService. The default state is STOPPED. +enum State { + // The InteractiveService is not replaying. Goes to RUNNING with a + // StartRequest. + STOPPED = 0; + + // The InteractiveService is replaying events. Goes to PAUSED with a + // PauseRequest. Goes to STOPPED with a StopRequest. + RUNNING = 1; + + // The InteractiveService is paused from replaying events. Goes to RUNNING + // with either a StartRequest or a StepRequest. Goes to STOPPED with a Review comment: Gotcha, I rewrote the state comments to be very explicit about the state machine and what happens at each state. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332245) Time Spent: 10.5h (was: 10h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 10.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332247=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332247 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 21:41 Start Date: 22/Oct/19 21:41 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337764017 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. If the stream is paused, this + // will unpause the stream at the specified playback speed. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING then pauses the stream when the + // offset is reached. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then pauses the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +// The state of the InteractiveService. The default state is STOPPED. +enum State { + // The InteractiveService is not replaying. Goes to RUNNING with a + // StartRequest. + STOPPED = 0; + + // The InteractiveService is replaying events. Goes to PAUSED with a + // PauseRequest. Goes to STOPPED with a StopRequest. + RUNNING = 1; + + // The InteractiveService is paused from replaying events. Goes to RUNNING + // with either a StartRequest or a StepRequest. Goes to STOPPED with a + // StopRequest. + PAUSED = 2; + + // The InteractiveService is stepping through a single event. Will move to + // PAUSED after quiescence. + STEPPING = 3; + + + // The InteractiveService is advancing until a specified duration is reached. + // Will move to PAUSED after the stream advances sufficiently. + ADVANCING = 4; +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; +} +message StartResponse { } + +message AdvanceRequest { + // (Required) Will advance the stream by replaying events as quickly as + // possible until the stream timestamp has advanced by the specified amount. + google.protobuf.Duration advance_by = 1; +} +message AdvanceResponse {} + +message StopRequest { } +message StopResponse { } + +message PauseRequest { } +message PauseResponse { + // The current timestamp of the replay stream. + google.protobuf.Timestamp stream_time = 1; + +
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332246 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 21:41 Start Date: 22/Oct/19 21:41 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337763937 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. If the stream is paused, this + // will unpause the stream at the specified playback speed. + rpc Start (StartRequest) returns (StartResponse) {} Review comment: done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332246) Time Spent: 10h 40m (was: 10.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 10h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=332238=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332238 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 22/Oct/19 21:31 Start Date: 22/Oct/19 21:31 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9771: [BEAM-7926] Update dependencies in Java Katas URL: https://github.com/apache/beam/pull/9771#issuecomment-545164821 Hi @leonardoam ! Thanks for the PR. Have to tested that this combination of dependencies works? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332238) Time Spent: 8h 40m (was: 8.5h) > Visualize PCollection with Interactive Beam > --- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 8h 40m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline defined as > p = create_pipeline() > pcoll = p | 'Transform' >> transform() > The use can call a single function and get auto-magical charting of the data > as materialized pcoll. > e.g., visualize(pcoll) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=332234=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332234 ] ASF GitHub Bot logged work on BEAM-1440: Author: ASF GitHub Bot Created on: 22/Oct/19 21:30 Start Date: 22/Oct/19 21:30 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9772: [BEAM-1440] Create a BigQuery source that implements iobase.BoundedSource for Python URL: https://github.com/apache/beam/pull/9772#issuecomment-545164472 taking a look This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332234) Time Spent: 4h 10m (was: 4h) > Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK > -- > > Key: BEAM-1440 > URL: https://issues.apache.org/jira/browse/BEAM-1440 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > Currently we have a BigQuery native source for Python SDK [1]. > This can only be used by Dataflow runner. > We should implement a Beam BigQuery source that implements > iobase.BoundedSource [2] interface so that other runners that try to use > Python SDK can read from BigQuery as well. Java SDK already has a Beam > BigQuery source [3]. > [1] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py > [2] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70 > [3] > https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=332232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332232 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 22/Oct/19 21:28 Start Date: 22/Oct/19 21:28 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#issuecomment-545163564 run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332232) Time Spent: 8h 50m (was: 8h 40m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8438) Update Python/Streaming IO Documentation
[ https://issues.apache.org/jira/browse/BEAM-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-8438: -- Status: Open (was: Triage Needed) > Update Python/Streaming IO Documentation > > > Key: BEAM-8438 > URL: https://issues.apache.org/jira/browse/BEAM-8438 > Project: Beam > Issue Type: Task > Components: website >Reporter: Brian Hulette >Priority: Major > > Built-in IO documentation states that Python/Streaming only supports pubsub > and BQ, which is out of date. > https://beam.apache.org/documentation/io/built-in/ > This came up on > [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p157141041000] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8447) Python: Expand Datastore IT
[ https://issues.apache.org/jira/browse/BEAM-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-8447: -- Status: Open (was: Triage Needed) > Python: Expand Datastore IT > --- > > Key: BEAM-8447 > URL: https://issues.apache.org/jira/browse/BEAM-8447 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Udi Meiri >Priority: Major > > The current datastore_write_it_test only counts entities without verifying > their expected contents. Like for PubSub and BigQuery, there should be a > datastore_matcher.py (with test) that lets us test things like embedded keys > and entities (https://github.com/apache/beam/pull/9805). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-8457: -- Status: Open (was: Triage Needed) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332231 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 21:23 Start Date: 22/Oct/19 21:23 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r337757120 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,15 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-notebook if pipeline is initiated from interactive runner. +from apache_beam.runners.interactive import interactive_runner +if isinstance(pipeline.runner, interactive_runner.InteractiveRunner): Review comment: This seems fine - but what if we go with the `runner_in_use` codepath? Would users do: `p.run(runner=InteractiveRunner(DataflowRunner()), options=...)`? Or would users create a pipeline with InteractiveRunner and then do `p.run(runner=DataflowRunner()...`? Is it poissible for users to do `p = beam.Pipeline()`, and then do `InteractiveRunner().run_pipeline(p)`/`InteractiveRunner(DataflowRunner()).run_pipeline(p)`? IIUC users would have to pass the interactive runner in `p = beam.Pipeline()` to activate the interactive mode, right? InteractiveRunner is not automatically selected? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332231) Time Spent: 40m (was: 0.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332230 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 21:23 Start Date: 22/Oct/19 21:23 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r337751395 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -396,28 +396,40 @@ def replace_all(self, replacements): for override in replacements: self._check_replacement(override) - def run(self, test_runner_api=True): -"""Runs the pipeline. Returns whatever our runner returns after running.""" + def run(self, test_runner_api=True, runner=None, options=None): +"""Runs the pipeline. Returns whatever our runner returns after running. + +If another runner instance and options are provided, that runner will +execute the pipeline with the given options. If either of them is not set, +the default runner will run the pipeline with the original options +assigned to the pipeline. The usage is similar to directly invoking +`runner.run_pipeline(pipeline, options)`. +""" +runner_in_use = self.runner +options_in_use = self._options +if runner and options: Review comment: What if either runner or options are not provided? Should that throw an error? Currently, if only one is provided, it'll be ignored - and that would be quite surprising for users. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332230) Time Spent: 0.5h (was: 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332226=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332226 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 21:18 Start Date: 22/Oct/19 21:18 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337753713 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. If the stream is paused, this + // will unpause the stream at the specified playback speed. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING then pauses the stream when the + // offset is reached. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then pauses the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +// The state of the InteractiveService. The default state is STOPPED. +enum State { + // The InteractiveService is not replaying. Goes to RUNNING with a + // StartRequest. + STOPPED = 0; + + // The InteractiveService is replaying events. Goes to PAUSED with a + // PauseRequest. Goes to STOPPED with a StopRequest. + RUNNING = 1; + + // The InteractiveService is paused from replaying events. Goes to RUNNING + // with either a StartRequest or a StepRequest. Goes to STOPPED with a Review comment: Does it go to STEPPING (instead of RUNNING) with a StepRequest? Also, does it go to ADVANCING with an AdvanceRequest? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332226) Time Spent: 10h (was: 9h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 10h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332227 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 21:18 Start Date: 22/Oct/19 21:18 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337754373 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. If the stream is paused, this + // will unpause the stream at the specified playback speed. + rpc Start (StartRequest) returns (StartResponse) {} Review comment: nit: remove space between method name and open parenthesis for all the rpc declarations? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332227) Time Spent: 10h 10m (was: 10h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 10h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332228=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332228 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 21:18 Start Date: 22/Oct/19 21:18 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337754649 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. If the stream is paused, this + // will unpause the stream at the specified playback speed. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING then pauses the stream when the + // offset is reached. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then pauses the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +// The state of the InteractiveService. The default state is STOPPED. +enum State { + // The InteractiveService is not replaying. Goes to RUNNING with a + // StartRequest. + STOPPED = 0; + + // The InteractiveService is replaying events. Goes to PAUSED with a + // PauseRequest. Goes to STOPPED with a StopRequest. + RUNNING = 1; + + // The InteractiveService is paused from replaying events. Goes to RUNNING + // with either a StartRequest or a StepRequest. Goes to STOPPED with a + // StopRequest. + PAUSED = 2; + + // The InteractiveService is stepping through a single event. Will move to + // PAUSED after quiescence. + STEPPING = 3; + + + // The InteractiveService is advancing until a specified duration is reached. + // Will move to PAUSED after the stream advances sufficiently. + ADVANCING = 4; +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; +} +message StartResponse { } + +message AdvanceRequest { + // (Required) Will advance the stream by replaying events as quickly as + // possible until the stream timestamp has advanced by the specified amount. + google.protobuf.Duration advance_by = 1; +} +message AdvanceResponse {} + +message StopRequest { } +message StopResponse { } + +message PauseRequest { } +message PauseResponse { + // The current timestamp of the replay stream. + google.protobuf.Timestamp stream_time = 1; + + //
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332225=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332225 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 22/Oct/19 21:10 Start Date: 22/Oct/19 21:10 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855#issuecomment-545157196 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332225) Time Spent: 40m (was: 0.5h) > apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types > is flaky > --- > > Key: BEAM-8446 > URL: https://issues.apache.org/jira/browse/BEAM-8446 > Project: Beam > Issue Type: New Feature > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > test_big_query_write_new_types appears to be flaky in > beam_PostCommit_Python37 test suite. > https://builds.apache.org/job/beam_PostCommit_Python37/733/ > https://builds.apache.org/job/beam_PostCommit_Python37/739/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332224=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332224 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 21:09 Start Date: 22/Oct/19 21:09 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337751597 ## File path: sdks/python/apache_beam/testing/interactive_stream.py ## @@ -0,0 +1,136 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import absolute_import + +import time +from concurrent.futures import ThreadPoolExecutor + +import grpc + +from apache_beam.portability.api import beam_interactive_api_pb2 +from apache_beam.portability.api import beam_interactive_api_pb2_grpc +from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer + +STRING_TO_API_STATE = { +'STOPPED': beam_interactive_api_pb2.StatusResponse.STOPPED, +'PAUSED': beam_interactive_api_pb2.StatusResponse.PAUSED, +'RUNNING': beam_interactive_api_pb2.StatusResponse.RUNNING, +} Review comment: Thanks, added a comment to the proto saying the initial state is STOPPED. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332224) Time Spent: 9h 50m (was: 9h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 9h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332217=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332217 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 22/Oct/19 20:52 Start Date: 22/Oct/19 20:52 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855#issuecomment-545149669 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332217) Time Spent: 0.5h (was: 20m) > apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types > is flaky > --- > > Key: BEAM-8446 > URL: https://issues.apache.org/jira/browse/BEAM-8446 > Project: Beam > Issue Type: New Feature > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > test_big_query_write_new_types appears to be flaky in > beam_PostCommit_Python37 test suite. > https://builds.apache.org/job/beam_PostCommit_Python37/733/ > https://builds.apache.org/job/beam_PostCommit_Python37/739/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332216 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 22/Oct/19 20:48 Start Date: 22/Oct/19 20:48 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855#issuecomment-545147710 Run Python PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332216) Time Spent: 20m (was: 10m) > apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types > is flaky > --- > > Key: BEAM-8446 > URL: https://issues.apache.org/jira/browse/BEAM-8446 > Project: Beam > Issue Type: New Feature > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Pablo Estrada >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > test_big_query_write_new_types appears to be flaky in > beam_PostCommit_Python37 test suite. > https://builds.apache.org/job/beam_PostCommit_Python37/733/ > https://builds.apache.org/job/beam_PostCommit_Python37/739/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky
[ https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=332215=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332215 ] ASF GitHub Bot logged work on BEAM-8446: Author: ASF GitHub Bot Created on: 22/Oct/19 20:47 Start Date: 22/Oct/19 20:47 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9855: [BEAM-8446] Retrying BQ query on timeouts URL: https://github.com/apache/beam/pull/9855 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332213=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332213 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 20:45 Start Date: 22/Oct/19 20:45 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854 1. Changed the pipeline.run() API to allow a runner and an option parameter so that a pipeline initially bundled w/ an interactive runner can be directly run by other runners from notebook. 2. Implicitly added the necessary source information through user labels when the user does p.run(runner=DataflowRunner(), options=options) or DataflowRunner().run_pipeline(p, options). **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332214=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332214 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 20:45 Start Date: 22/Oct/19 20:45 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#issuecomment-545146836 R: @pabloem PTAL, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332214) Time Spent: 20m (was: 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332194=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332194 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 20:37 Start Date: 22/Oct/19 20:37 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337737427 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. Review comment: Ack, I changed the Start comment to reflect that it can also unpause. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332194) Time Spent: 9h 40m (was: 9.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 9h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332193 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 20:37 Start Date: 22/Oct/19 20:37 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337737375 ## File path: sdks/python/apache_beam/testing/interactive_stream.py ## @@ -0,0 +1,136 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import absolute_import + +import time +from concurrent.futures import ThreadPoolExecutor + +import grpc + +from apache_beam.portability.api import beam_interactive_api_pb2 +from apache_beam.portability.api import beam_interactive_api_pb2_grpc +from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer + +STRING_TO_API_STATE = { +'STOPPED': beam_interactive_api_pb2.StatusResponse.STOPPED, +'PAUSED': beam_interactive_api_pb2.StatusResponse.PAUSED, +'RUNNING': beam_interactive_api_pb2.StatusResponse.RUNNING, +} Review comment: What is the initial state? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332193) Time Spent: 9.5h (was: 9h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 9.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332192=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332192 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 20:36 Start Date: 22/Oct/19 20:36 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337737214 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING then pauses the stream when the + // offset is reached. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then closes the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; +} +message StartResponse { } + +message AdvanceRequest { + // (Required) Will advance the stream by replaying events as quickly as + // possible until the stream timestamp has advanced by the specified amount. + google.protobuf.Duration advance_by = 1; +} +message AdvanceResponse {} + +message StopRequest { } +message StopResponse { } + +message PauseRequest { } +message PauseResponse { } + +message StatusRequest { } +message StatusResponse { + // The current timestamp of the replay stream. Is MIN_TIMESTAMP when state + // is STOPPED. + google.protobuf.Timestamp stream_time = 1; + + // The minimum watermark across all of the faked replayable unbounded sources. + // Is MIN_TIMESTAMP when state is STOPPED. + google.protobuf.Timestamp watermark = 2; + + // The latest timestamp of the recording stream. Is MIN_TIMESTAMP if there is + // no recording. + google.protobuf.Timestamp recording_time = 3; + + // The set playback_speed from the StartRequest. Playback speed is set by + // StartRequest, or if the stream_time is the current time and the recording + // is still happening, the playback speed is 1, else 0. + double playback_speed = 4; + + enum State { +// The InteractiveService is not replaying. Goes to RUNNING with a +// StartRequest. +STOPPED = 0; + +// The InteractiveService is replaying events. Goes to PAUSED with a +// PauseRequest. Goes to STOPPED with
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332191 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 20:35 Start Date: 22/Oct/19 20:35 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337736773 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING then pauses the stream when the + // offset is reached. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then closes the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; +} +message StartResponse { } + +message AdvanceRequest { + // (Required) Will advance the stream by replaying events as quickly as + // possible until the stream timestamp has advanced by the specified amount. + google.protobuf.Duration advance_by = 1; +} +message AdvanceResponse {} + +message StopRequest { } +message StopResponse { } + +message PauseRequest { } +message PauseResponse { } Review comment: That makes sense, I also added the watermark to the response. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332191) Time Spent: 9h 10m (was: 9h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 9h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce
[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=332190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332190 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 22/Oct/19 20:34 Start Date: 22/Oct/19 20:34 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9741: [BEAM-7926] Visualize PCollection URL: https://github.com/apache/beam/pull/9741#issuecomment-545142275 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332190) Time Spent: 8.5h (was: 8h 20m) > Visualize PCollection with Interactive Beam > --- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline defined as > p = create_pipeline() > pcoll = p | 'Transform' >> transform() > The use can call a single function and get auto-magical charting of the data > as materialized pcoll. > e.g., visualize(pcoll) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332187 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 20:31 Start Date: 22/Oct/19 20:31 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337733049 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then closes the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; Review comment: Ack. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332187) Time Spent: 9h (was: 8h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 9h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332186 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 20:31 Start Date: 22/Oct/19 20:31 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337731954 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. Review comment: Please also add unpause. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332186) Time Spent: 8h 50m (was: 8h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332184 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 20:31 Start Date: 22/Oct/19 20:31 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337732796 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING then pauses the stream when the + // offset is reached. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then closes the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; +} +message StartResponse { } + +message AdvanceRequest { + // (Required) Will advance the stream by replaying events as quickly as + // possible until the stream timestamp has advanced by the specified amount. + google.protobuf.Duration advance_by = 1; +} +message AdvanceResponse {} + +message StopRequest { } +message StopResponse { } + +message PauseRequest { } +message PauseResponse { } Review comment: Does it make sense to include the stream timestamp in PauseResponse? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332184) Time Spent: 8h 40m (was: 8.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 8h 40m > Remaining Estimate: 0h > > This issue tracks the work items to
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332183 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 20:31 Start Date: 22/Oct/19 20:31 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337734636 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. It is also allowed for + // setting the playback_speed while RUNNING. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING then pauses the stream when the + // offset is reached. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then closes the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; +} +message StartResponse { } + +message AdvanceRequest { + // (Required) Will advance the stream by replaying events as quickly as + // possible until the stream timestamp has advanced by the specified amount. + google.protobuf.Duration advance_by = 1; +} +message AdvanceResponse {} + +message StopRequest { } +message StopResponse { } + +message PauseRequest { } +message PauseResponse { } + +message StatusRequest { } +message StatusResponse { + // The current timestamp of the replay stream. Is MIN_TIMESTAMP when state + // is STOPPED. + google.protobuf.Timestamp stream_time = 1; + + // The minimum watermark across all of the faked replayable unbounded sources. + // Is MIN_TIMESTAMP when state is STOPPED. + google.protobuf.Timestamp watermark = 2; + + // The latest timestamp of the recording stream. Is MIN_TIMESTAMP if there is + // no recording. + google.protobuf.Timestamp recording_time = 3; + + // The set playback_speed from the StartRequest. Playback speed is set by + // StartRequest, or if the stream_time is the current time and the recording + // is still happening, the playback speed is 1, else 0. + double playback_speed = 4; + + enum State { +// The InteractiveService is not replaying. Goes to RUNNING with a +// StartRequest. +STOPPED = 0; + +// The InteractiveService is replaying events. Goes to PAUSED with a +// PauseRequest. Goes to STOPPED with
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332185 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 20:31 Start Date: 22/Oct/19 20:31 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337733498 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. Review comment: Thanks. I think we need to make the doc here better since Advance doesn't really "un-pause" since after Advance, the replay is still paused, but at a different timestamp. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332185) Time Spent: 8h 50m (was: 8h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=332181=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332181 ] ASF GitHub Bot logged work on BEAM-7738: Author: ASF GitHub Bot Created on: 22/Oct/19 20:19 Start Date: 22/Oct/19 20:19 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9268: [BEAM-7738] Add external transform support to PubsubIO URL: https://github.com/apache/beam/pull/9268#issuecomment-545135957 > Right now I'm mostly thinking about the latter Agreed, that's what I'm thinking about too. > Maybe I'm thinking about this wrong, but I think the PubsubMessage is structured: Ah ok, fair. I was referring specifically to the structure (or lack thereof) of the byte array payload, but you're right the (Python SDK) user can handle creating a byte array themselves, and the row coder can just encode `{byte[] payload, Map attributes, String messageId}` > What are the requirements for registering a Row converter? ### Java There are a variety of ways to do it. If it were a class that we had control over we could use the [`DefaultSchema`](https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/sdk/schemas/DefaultSchema.html) annotation, as long as one of the included SchemaProvider implementations would work (I think JavaBeanSchema is the closest but wouldn't work because PubsubMessage doesn't have setters). I think what we'd want to do here is just implement a [`SchemaProvider`](https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/sdk/schemas/SchemaProvider.html) and a [`SchemaProviderRegistrar`](https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/sdk/schemas/SchemaProviderRegistrar.html) for `PubsubMessage` and include it in Beam. @reuvenlax may have a better suggestion. ### Python With my PR I think it could look like: ```python # this is py3 syntax for clarity, but we'd probably # need to use the TypedDict('PubsubMessage', ...) version class PubsubMessage(TypedDict): message: ByteString attributes: Mapping[unicode, unicode] messageId: unicode coders.registry.register_coder(PubsubMessage, coders.RowCoder) pcoll | 'make some messages' >> beam.Map(makeMessage).with_output_types(PubsubMessage) | 'write to pubsub' >> beam.io.WriteToPubsub(project, topic) # or something ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332181) Time Spent: 7h 20m (was: 7h 10m) > Support PubSubIO to be configured externally for use with other SDKs > > > Key: BEAM-7738 > URL: https://issues.apache.org/jira/browse/BEAM-7738 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp, runner-flink, sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Labels: portability > Time Spent: 7h 20m > Remaining Estimate: 0h > > Now that KafkaIO is supported via the external transform API (BEAM-7029) we > should add support for PubSub. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=332179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332179 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 22/Oct/19 20:14 Start Date: 22/Oct/19 20:14 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9790: [BEAM-7389] Show code snippet outputs as stdout URL: https://github.com/apache/beam/pull/9790 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332179) Time Spent: 69h 50m (was: 69h 40m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 69h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments
[ https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=332177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332177 ] ASF GitHub Bot logged work on BEAM-8402: Author: ASF GitHub Bot Created on: 22/Oct/19 20:05 Start Date: 22/Oct/19 20:05 Worklog Time Spent: 10m Work Description: mxm commented on issue #9811: [BEAM-8402] Create a class hierarchy to represent environments URL: https://github.com/apache/beam/pull/9811#issuecomment-545130446 I think Robert's idea is to specify a set of required runtime dependencies, in the form of "Java with Beam 2.16.0 with libraries XY"; and then let Beam create an environment that fits these requirements. The question is whether we should really "bake in" the existing environments into the model itself. It does not have to be a contradiction because we could support "legacy" environments (like the existing) and eventually replace them by the "smart" dynamic environments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332177) Time Spent: 1h (was: 50m) > Create a class hierarchy to represent environments > -- > > Key: BEAM-8402 > URL: https://issues.apache.org/jira/browse/BEAM-8402 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > As a first step towards making it possible to assign different environments > to sections of a pipeline, we first need to expose environment classes to the > pipeline API. Unlike PTransforms, PCollections, Coders, and Windowings, > environments exists solely in the portability framework as protobuf objects. > By creating a hierarchy of "native" classes that represent the various > environment types -- external, docker, process, etc -- users will be able to > instantiate these and assign them to parts of the pipeline. The assignment > portion will be covered in a follow-up issue/PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to be able to run queries
[ https://issues.apache.org/jira/browse/BEAM-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Israel Herraiz updated BEAM-8458: - Summary: BigQueryIO.Read needs permissions to create datasets to be able to run queries (was: BigQueryIO.Read needs permissions to create datasets to run queries) > BigQueryIO.Read needs permissions to create datasets to be able to run queries > -- > > Key: BEAM-8458 > URL: https://issues.apache.org/jira/browse/BEAM-8458 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Israel Herraiz >Assignee: Israel Herraiz >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > When using {{fromQuery}}, BigQueryIO creates a temp dataset to store the > results of the query. > Therefore, Beam requires permissions to create datasets just to be able to > run a query. In practice, this means that Beam requires the role > bigQuery.User just to run queries, whereas if you use {{from}} (to read from > a table), the role bigQuery.jobUser suffices. > BigQueryIO.Read should have an option to set an existing dataset to write > the temp results of > a query, so it would be enough with having the role bigQuery.jobUser. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to run queries
[ https://issues.apache.org/jira/browse/BEAM-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Israel Herraiz updated BEAM-8458: - Description: When using {{fromQuery}}, BigQueryIO creates a temp dataset to store the results of the query. Therefore, Beam requires permissions to create datasets just to be able to run a query. In practice, this means that Beam requires the role bigQuery.User just to run queries, whereas if you use {{from}} (to read from a table), the role bigQuery.jobUser suffices. BigQueryIO.Read should have an option to set an existing dataset to write the temp results of a query, so it would be enough with having the role bigQuery.jobUser. was: When using `fromQuery`, BigQueryIO creates a temp dataset to store the results of the query. Therefore, Beam requires permissions to create datasets just to be able to run a query. In practice, this means that Beam requires the role bigQuery.User just to run queries, whereas if you use `from` (to read from a table), the role bigQuery.jobUser suffices. BigQueryIO.Read should have an option to set an existing dataset to write the temp results of a query, so it would be enough with having the role bigQuery.jobUser. > BigQueryIO.Read needs permissions to create datasets to run queries > --- > > Key: BEAM-8458 > URL: https://issues.apache.org/jira/browse/BEAM-8458 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Israel Herraiz >Assignee: Israel Herraiz >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > When using {{fromQuery}}, BigQueryIO creates a temp dataset to store the > results of the query. > Therefore, Beam requires permissions to create datasets just to be able to > run a query. In practice, this means that Beam requires the role > bigQuery.User just to run queries, whereas if you use {{from}} (to read from > a table), the role bigQuery.jobUser suffices. > BigQueryIO.Read should have an option to set an existing dataset to write > the temp results of > a query, so it would be enough with having the role bigQuery.jobUser. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to run queries
[ https://issues.apache.org/jira/browse/BEAM-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Israel Herraiz reassigned BEAM-8458: Assignee: Israel Herraiz > BigQueryIO.Read needs permissions to create datasets to run queries > --- > > Key: BEAM-8458 > URL: https://issues.apache.org/jira/browse/BEAM-8458 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Israel Herraiz >Assignee: Israel Herraiz >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > When using `fromQuery`, BigQueryIO creates a temp dataset to store the > results of the query. > Therefore, Beam requires permissions to create datasets just to be able to > run a query. In practice, this means that Beam requires the role > bigQuery.User just to run queries, whereas if you use `from` (to read from a > table), the role bigQuery.jobUser suffices. > BigQueryIO.Read should have an option to set an existing dataset to write > the temp results of > a query, so it would be enough with having the role bigQuery.jobUser. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to run queries
[ https://issues.apache.org/jira/browse/BEAM-8458?focusedWorklogId=332166=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332166 ] ASF GitHub Bot logged work on BEAM-8458: Author: ASF GitHub Bot Created on: 22/Oct/19 19:34 Start Date: 22/Oct/19 19:34 Worklog Time Spent: 10m Work Description: iht commented on pull request #9852: [BEAM-8458] Add option to set temp dataset in BigQueryIO.Read URL: https://github.com/apache/beam/pull/9852 When using fromQuery, BigQueryIO creates a temp dataset to store the results of the query. Therefore, Beam requires permissions to create datasets just to be able to run a query. With this option, BigQueryIO can write the temp results of the query to a pre-existing dataset, and therefore it only needs permissions to run queries and create tables to be able to use from Query. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [X ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [X ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [X] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build
[jira] [Work logged] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to run queries
[ https://issues.apache.org/jira/browse/BEAM-8458?focusedWorklogId=332167=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332167 ] ASF GitHub Bot logged work on BEAM-8458: Author: ASF GitHub Bot Created on: 22/Oct/19 19:35 Start Date: 22/Oct/19 19:35 Worklog Time Spent: 10m Work Description: iht commented on issue #9852: [BEAM-8458] Add option to set temp dataset in BigQueryIO.Read URL: https://github.com/apache/beam/pull/9852#issuecomment-545119172 R: @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332167) Time Spent: 20m (was: 10m) > BigQueryIO.Read needs permissions to create datasets to run queries > --- > > Key: BEAM-8458 > URL: https://issues.apache.org/jira/browse/BEAM-8458 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Israel Herraiz >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > When using `fromQuery`, BigQueryIO creates a temp dataset to store the > results of the query. > Therefore, Beam requires permissions to create datasets just to be able to > run a query. In practice, this means that Beam requires the role > bigQuery.User just to run queries, whereas if you use `from` (to read from a > table), the role bigQuery.jobUser suffices. > BigQueryIO.Read should have an option to set an existing dataset to write > the temp results of > a query, so it would be enough with having the role bigQuery.jobUser. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8458) BigQueryIO.Read needs permissions to create datasets to run queries
Israel Herraiz created BEAM-8458: Summary: BigQueryIO.Read needs permissions to create datasets to run queries Key: BEAM-8458 URL: https://issues.apache.org/jira/browse/BEAM-8458 Project: Beam Issue Type: Bug Components: io-java-gcp Reporter: Israel Herraiz When using `fromQuery`, BigQueryIO creates a temp dataset to store the results of the query. Therefore, Beam requires permissions to create datasets just to be able to run a query. In practice, this means that Beam requires the role bigQuery.User just to run queries, whereas if you use `from` (to read from a table), the role bigQuery.jobUser suffices. BigQueryIO.Read should have an option to set an existing dataset to write the temp results of a query, so it would be enough with having the role bigQuery.jobUser. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332162=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332162 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 19:24 Start Date: 22/Oct/19 19:24 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337705585 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. Review comment: Ack, commented on the new protocol. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332162) Time Spent: 8.5h (was: 8h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332161=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332161 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 19:21 Start Date: 22/Oct/19 19:21 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337704421 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332161) Time Spent: 8h 20m (was: 8h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 8h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332160=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332160 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 19:20 Start Date: 22/Oct/19 19:20 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337703862 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then closes the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; + + // (Optional) if present, will start the stream at the specified timestamp. + google.protobuf.Timestamp start_at = 2; Review comment: Good point, removed it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332160) Time Spent: 8h 10m (was: 8h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332159=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332159 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 19:19 Start Date: 22/Oct/19 19:19 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337703571 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then closes the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; Review comment: Proto3 doesn't allow for explicit default values :( This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332159) Time Spent: 8h (was: 7h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=332151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332151 ] ASF GitHub Bot logged work on BEAM-7738: Author: ASF GitHub Bot Created on: 22/Oct/19 18:54 Start Date: 22/Oct/19 18:54 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9268: [BEAM-7738] Add external transform support to PubsubIO URL: https://github.com/apache/beam/pull/9268#issuecomment-545103766 > Are you thinking you'd use beam:coder:row:v1 as the interface for the external transform, and the Java ExternalTransform implementations would handle the conversion of Row to/from PubsubMessage? There are two places that I see beam:coder:row:v1 being useful: 1. as a way to declare the construction interface of an external transform, and encode its values. A schema coder would replace the `configuration` mapping in `pipeline.ExternalConfiguration.ExternalConfigurationPayload` proto. 2. as a coder for structured elements that are exchanged between sdks Right now I'm mostly thinking about the latter, which is when `PubsubMessage` comes into play. > There's no trivial way to register a converter between Row and PubsubMessage since the latter isn't structured, Maybe I'm thinking about this wrong, but I think the `PubsubMessage` _is_ structured: ```java public class PubsubMessage { private byte[] message; private Map attributes; private String messageId; /** Returns the main PubSub message. */ public byte[] getPayload() { return message; } /** Returns the full map of attributes. This is an unmodifiable map. */ public Map getAttributeMap() { return attributes; } /** Returns the messageId of the message populated by Cloud Pub/Sub. */ @Nullable public String getMessageId() { return messageId; } ``` I'm not a Java expert by any means, but this seems like a type that would work with AutoValue, we just need to rename `message` to `payload` and `attributes` to `attributeMap`. What are the requirements for registering a Row converter? > but of course on the Java side we could have code to serialize the Row to a variety of formats to put in the PubsubMessage payload: Avro, JSON, or the row serialization format itself (although I'm not sure we'd want to encourage using that outside of Beam), would be pretty simple to add. I think the payload is not a concern when it comes to portability of external transforms: it gets encoded/decoded by another transform, not PubsubRead/Write. We can just assume that's a byte array. My grasp on the Java side is a bit tenuous, so I'd like for @mxm to confirm or deny what I've written here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332151) Time Spent: 7h 10m (was: 7h) > Support PubSubIO to be configured externally for use with other SDKs > > > Key: BEAM-7738 > URL: https://issues.apache.org/jira/browse/BEAM-7738 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp, runner-flink, sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Labels: portability > Time Spent: 7h 10m > Remaining Estimate: 0h > > Now that KafkaIO is supported via the external transform API (BEAM-7029) we > should add support for PubSub. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332152=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332152 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 18:54 Start Date: 22/Oct/19 18:54 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337692597 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then closes the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; Review comment: Should we just add [default = 1]? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332152) Time Spent: 7h 50m (was: 7h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 7h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332149=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332149 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 18:52 Start Date: 22/Oct/19 18:52 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337691444 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. + rpc Pause (PauseRequest) returns (PauseResponse) {} + + // Sends a single element to the EventsRequest then closes the stream. + rpc Step (StepRequest) returns (StepResponse) {} + + // Responds with debugging and other cache-specific metadata. + rpc Status (StatusRequest) returns (StatusResponse) {} +} + +message StartRequest { + // (Optional) How quickly the stream will be played back, e.g. if + // playback_speed == 2, then the stream will replay events twice as fast as + // they were recorded. If unspecified, this will default to 1. + double playback_speed = 1; + + // (Optional) if present, will start the stream at the specified timestamp. + google.protobuf.Timestamp start_at = 2; Review comment: I'm not sure whether we should allow this feature since that means re-execution with start_at specified means different input data and it kinda defeats the purpose of replaying. Also, if we use Start method for also unpausing and changing playback speed, start_at either should be ignored or should return an error. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332149) Time Spent: 7h 40m (was: 7.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 7h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > *
[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered
[ https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=332148=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332148 ] ASF GitHub Bot logged work on BEAM-7520: Author: ASF GitHub Bot Created on: 22/Oct/19 18:49 Start Date: 22/Oct/19 18:49 Worklog Time Spent: 10m Work Description: je-ik commented on issue #9190: [BEAM-7520] Fix timer firing order in DirectRunner URL: https://github.com/apache/beam/pull/9190#issuecomment-545101905 I also plan to create the follow-up JIRAs after the merge and notify dev list. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332148) Time Spent: 18.5h (was: 18h 20m) > DirectRunner timers are not strictly time ordered > - > > Key: BEAM-7520 > URL: https://issues.apache.org/jira/browse/BEAM-7520 > Project: Beam > Issue Type: Bug > Components: runner-direct >Affects Versions: 2.13.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 18.5h > Remaining Estimate: 0h > > Let's suppose we have the following situation: > - statful ParDo with two timers - timerA and timerB > - timerA is set for window.maxTimestamp() + 1 > - timerB is set anywhere between timerB.timestamp > - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE > Then the order of timers is as follows (correct): > - timerB > - timerA > But, if timerB sets another timer (say for timerB.timestamp + 1), then the > order of timers will be: > - timerB (timerB.timestamp) > - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE) > - timerB (timerB.timestamp + 1) > Which is not ordered by timestamp. The reason for this is that when the input > watermark update is evaluated, the WatermarkManager,extractFiredTimers() will > produce both timerA and timerB. That would be correct, but when timerB sets > another timer, that breaks this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered
[ https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=332146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332146 ] ASF GitHub Bot logged work on BEAM-7520: Author: ASF GitHub Bot Created on: 22/Oct/19 18:46 Start Date: 22/Oct/19 18:46 Worklog Time Spent: 10m Work Description: je-ik commented on issue #9190: [BEAM-7520] Fix timer firing order in DirectRunner URL: https://github.com/apache/beam/pull/9190#issuecomment-545100575 I'll definitely squash the history to some compact isolated commits before merging. Thanks for the approval! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332146) Time Spent: 18h 20m (was: 18h 10m) > DirectRunner timers are not strictly time ordered > - > > Key: BEAM-7520 > URL: https://issues.apache.org/jira/browse/BEAM-7520 > Project: Beam > Issue Type: Bug > Components: runner-direct >Affects Versions: 2.13.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 18h 20m > Remaining Estimate: 0h > > Let's suppose we have the following situation: > - statful ParDo with two timers - timerA and timerB > - timerA is set for window.maxTimestamp() + 1 > - timerB is set anywhere between timerB.timestamp > - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE > Then the order of timers is as follows (correct): > - timerB > - timerA > But, if timerB sets another timer (say for timerB.timestamp + 1), then the > order of timers will be: > - timerB (timerB.timestamp) > - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE) > - timerB (timerB.timestamp + 1) > Which is not ordered by timestamp. The reason for this is that when the input > watermark update is evaluated, the WatermarkManager,extractFiredTimers() will > produce both timerA and timerB. That would be correct, but when timerB sets > another timer, that breaks this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332145=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332145 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 18:46 Start Date: 22/Oct/19 18:46 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337688439 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} Review comment: I think it's okay for the Start method to also change the playback speed for a running replay. We just need to document that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332145) Time Spent: 7.5h (was: 7h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332143=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332143 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 18:46 Start Date: 22/Oct/19 18:46 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337688439 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} Review comment: [couldn't reply to an outdated comment] I think it's okay for the Start method to also change the playback speed for a running replay. We just need to document that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332143) Time Spent: 7h 20m (was: 7h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 7h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332144=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332144 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 18:46 Start Date: 22/Oct/19 18:46 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337684609 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * Protocol Buffers describing a service that can be used in conjunction with + * the TestStream class in order to control a pipeline remotely. + */ + +syntax = "proto3"; + +package org.apache.beam.model.interactive.v1; + +option go_package = "interactive_v1"; +option java_package = "org.apache.beam.model.interactive.v1"; +option java_outer_classname = "BeamInteractiveApi"; + +import "beam_runner_api.proto"; +import "google/protobuf/timestamp.proto"; +import "google/protobuf/duration.proto"; + + +service InteractiveService { + // A TestStream will request for events using this RPC. + rpc Events(EventsRequest) returns (stream EventsResponse) {} + + // Starts the stream of events to the EventsRequest. + rpc Start (StartRequest) returns (StartResponse) {} + + // Advances the stream to the specified offset then pauses the stream. This + // starts the stream if it is not RUNNING. + rpc Advance (AdvanceRequest) returns (AdvanceResponse) {} + + // Stops and resets the stream to the beginning. + rpc Stop (StopRequest) returns (StopResponse) {} + + // Pauses the stream of events to the EventsRequest. If there is already an + // outstanding EventsRequest streaming events, then the stream will pause + // after the EventsResponse is completed. + // To un-pause, send either a Start or Advance request. Review comment: If the playback is paused, then an Advance request is issued, does the playback resume at the previous speed after the Advance, or should it remain paused? I think the latter makes more sense. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332144) Time Spent: 7h 20m (was: 7h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 7h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332141=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332141 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 18:33 Start Date: 22/Oct/19 18:33 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337682581 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -94,6 +98,7 @@ message StatusResponse { // no recording. google.protobuf.Timestamp recording_time = 3; + // The set playback_speed from either the StartRequest or the AdvanceRequest. Review comment: Does it make sense for the StartRequest to do that? I don't want to have too many methods on the API if possible. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332141) Time Spent: 7h 10m (was: 7h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332139=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332139 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 18:31 Start Date: 22/Oct/19 18:31 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337681814 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -81,7 +86,6 @@ message PauseResponse { } Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332139) Time Spent: 7h (was: 6h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957267#comment-16957267 ] Robert Bradshaw commented on BEAM-8418: --- Note that this is a Dataflow-only issue. The FnAPI doesn't have a way to specifying what element should be used for Impulse (and Dataflow should probably be updated accordingly). > Fix handling of Impulse transform in Dataflow runner. > -- > > Key: BEAM-8418 > URL: https://issues.apache.org/jira/browse/BEAM-8418 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Following pipeline fails on Dataflow runner unless we use beam_fn_api > experiment. > {noformat} > class NoOpDoFn(beam.DoFn): > def process(self, element): > return element > p = beam.Pipeline(options=pipeline_options) > _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn()) > result = p.run() > {noformat} > The reason is that we encode Impluse payload using url-escaping in [1], while > Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF > runner expects URL escaping. > We should fix or reconcile the encoding in non-FnAPI path, and add a > ValidatesRunner test that catches this error. > [1] > https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=332137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332137 ] ASF GitHub Bot logged work on BEAM-7738: Author: ASF GitHub Bot Created on: 22/Oct/19 18:26 Start Date: 22/Oct/19 18:26 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9268: [BEAM-7738] Add external transform support to PubsubIO URL: https://github.com/apache/beam/pull/9268#issuecomment-545092832 > How close are we to having schema coders ready to use? IIUC, we should be able to register a converter between Row -> PubsubMessage, right? What's the mechanism for that? It would be great to put that to use here as soon as the schema coders are ready. @robertwb gave #9188 a LGTM, I just need to get CI passing and I think we can merge it. I'll work on that now. Are you thinking you'd use beam:coder:row:v1 as the interface for the external transform, and the Java ExternalTransform implementations would handle the conversion of Row to/from PubsubMessage? There's no trivial way to register a converter between Row and PubsubMessage since the latter isn't structured, but of course on the Java side we could have code to serialize the Row to a variety of formats to put in the PubsubMessage payload: Avro, JSON, or the row serialization format itself (although I'm not sure we'd want to encourage using that outside of Beam), would be pretty simple to add. Maybe the format to use could be part of the external transform payload. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332137) Time Spent: 7h (was: 6h 50m) > Support PubSubIO to be configured externally for use with other SDKs > > > Key: BEAM-7738 > URL: https://issues.apache.org/jira/browse/BEAM-7738 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp, runner-flink, sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Labels: portability > Time Spent: 7h > Remaining Estimate: 0h > > Now that KafkaIO is supported via the external transform API (BEAM-7029) we > should add support for PubSub. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332136 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 18:24 Start Date: 22/Oct/19 18:24 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337678175 ## File path: sdks/python/apache_beam/testing/interactive_stream_test.py ## @@ -0,0 +1,118 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import absolute_import + +import unittest + +import grpc +from google.protobuf import timestamp_pb2 + +from apache_beam import coders +from apache_beam.portability.api import beam_interactive_api_pb2 as interactive_api +from apache_beam.portability.api import beam_interactive_api_pb2_grpc as interactive_api_grpc +from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamHeader +from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord +from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.runners.interactive.caching.streaming_cache import StreamingCache +from apache_beam.testing.interactive_stream import InteractiveStreamController + + +def get_open_port(): + import socket + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) + s.bind(('', 0)) + s.listen(1) + port = s.getsockname()[1] + s.close() + return port Review comment: Awesome thanks for this. I didn't know about this functionality. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332136) Time Spent: 6h 50m (was: 6h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=332134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332134 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 22/Oct/19 18:21 Start Date: 22/Oct/19 18:21 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9741: [BEAM-7926] Visualize PCollection URL: https://github.com/apache/beam/pull/9741#discussion_r337676820 ## File path: sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py ## @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Module visualizes PCollection data. + +For internal use only; no backwards-compatibility guarantees. +Only works with Python 3.5+. +""" +from __future__ import absolute_import + +import base64 +import logging +from datetime import timedelta + +from pandas.io.json import json_normalize + +from apache_beam import pvalue +from apache_beam.runners.interactive import interactive_environment as ie +from apache_beam.runners.interactive import pipeline_instrument as instr +from facets_overview.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator +from IPython.core.display import HTML +from IPython.core.display import Javascript +from IPython.core.display import display +from IPython.core.display import display_javascript +from IPython.core.display import update_display +from timeloop import Timeloop + +# jsons doesn't support < Python 3.5. Work around with json for legacy tests. Review comment: I've added such warning message in the interactive_environment module to check for Python version and print out warnings. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332134) Time Spent: 8h 20m (was: 8h 10m) > Visualize PCollection with Interactive Beam > --- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 8h 20m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline defined as > p = create_pipeline() > pcoll = p | 'Transform' >> transform() > The use can call a single function and get auto-magical charting of the data > as materialized pcoll. > e.g., visualize(pcoll) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered
[ https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=332132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332132 ] ASF GitHub Bot logged work on BEAM-7520: Author: ASF GitHub Bot Created on: 22/Oct/19 18:18 Start Date: 22/Oct/19 18:18 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9190: [BEAM-7520] Fix timer firing order in DirectRunner URL: https://github.com/apache/beam/pull/9190#issuecomment-545089710 My reading of the commit history is that there are many commits which are not meaningful by themselves. There may be some commits that stand alone. But I will squash these commits when I merge unless you would like to do your own custom squashing into some meaningful small set of commits. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332132) Time Spent: 18h 10m (was: 18h) > DirectRunner timers are not strictly time ordered > - > > Key: BEAM-7520 > URL: https://issues.apache.org/jira/browse/BEAM-7520 > Project: Beam > Issue Type: Bug > Components: runner-direct >Affects Versions: 2.13.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 18h 10m > Remaining Estimate: 0h > > Let's suppose we have the following situation: > - statful ParDo with two timers - timerA and timerB > - timerA is set for window.maxTimestamp() + 1 > - timerB is set anywhere between timerB.timestamp > - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE > Then the order of timers is as follows (correct): > - timerB > - timerA > But, if timerB sets another timer (say for timerB.timestamp + 1), then the > order of timers will be: > - timerB (timerB.timestamp) > - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE) > - timerB (timerB.timestamp + 1) > Which is not ordered by timestamp. The reason for this is that when the input > watermark update is evaluated, the WatermarkManager,extractFiredTimers() will > produce both timerA and timerB. That would be correct, but when timerB sets > another timer, that breaks this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered
[ https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=332130=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332130 ] ASF GitHub Bot logged work on BEAM-7520: Author: ASF GitHub Bot Created on: 22/Oct/19 18:17 Start Date: 22/Oct/19 18:17 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9190: [BEAM-7520] Fix timer firing order in DirectRunner URL: https://github.com/apache/beam/pull/9190#discussion_r337674421 ## File path: runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectRunnerTest.java ## @@ -616,6 +649,186 @@ public void processElement(ProcessContext c) { p.run(); } + /** + * Test running of {@link Pipeline} which has two {@link POutput POutputs} and finishing the first + * one triggers data being fed into the second one. + */ + @Test(timeout = 1) + public void testTwoPOutputsInPipelineWithCascade() throws InterruptedException { Review comment: Got it. Makes sense. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332130) Time Spent: 18h (was: 17h 50m) > DirectRunner timers are not strictly time ordered > - > > Key: BEAM-7520 > URL: https://issues.apache.org/jira/browse/BEAM-7520 > Project: Beam > Issue Type: Bug > Components: runner-direct >Affects Versions: 2.13.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 18h > Remaining Estimate: 0h > > Let's suppose we have the following situation: > - statful ParDo with two timers - timerA and timerB > - timerA is set for window.maxTimestamp() + 1 > - timerB is set anywhere between timerB.timestamp > - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE > Then the order of timers is as follows (correct): > - timerB > - timerA > But, if timerB sets another timer (say for timerB.timestamp + 1), then the > order of timers will be: > - timerB (timerB.timestamp) > - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE) > - timerB (timerB.timestamp + 1) > Which is not ordered by timestamp. The reason for this is that when the input > watermark update is evaluated, the WatermarkManager,extractFiredTimers() will > produce both timerA and timerB. That would be correct, but when timerB sets > another timer, that breaks this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332123=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332123 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 17:57 Start Date: 22/Oct/19 17:57 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337664333 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -94,6 +98,7 @@ message StatusResponse { // no recording. google.protobuf.Timestamp recording_time = 3; + // The set playback_speed from either the StartRequest or the AdvanceRequest. Review comment: playback speed is not set by AdvanceRequest. Playback speed is set by StartRequest, or if the stream_time is the current time and the recording is still happening, the playback speed is 1, else 0. Also, should we have a method that changes the playback speed without the user pausing and then starting? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332123) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332122=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332122 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 17:57 Start Date: 22/Oct/19 17:57 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337664736 ## File path: model/interactive/src/main/proto/beam_interactive_api.proto ## @@ -81,7 +86,6 @@ message PauseResponse { } Review comment: To un-pause, do we issue another StartRequest? If so, please document that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332122) Time Spent: 6h 40m (was: 6.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky
[ https://issues.apache.org/jira/browse/BEAM-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reopened BEAM-7594: - Found another instance of this: {code} 05:35:15 == 05:35:15 ERROR: test_read_from_text_with_file_name_file_pattern (apache_beam.io.textio_test.TextSourceTest) 05:35:15 -- 05:35:15 Traceback (most recent call last): 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/io/textio_test.py", line 517, in test_read_from_text_with_file_name_file_pattern 05:35:15 pipeline.run() 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", line 112, in run 05:35:15 else test_runner_api)) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/pipeline.py", line 407, in run 05:35:15 self._options).run(False) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/pipeline.py", line 420, in run 05:35:15 return self.runner.run_pipeline(self, self._options) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", line 129, in run_pipeline 05:35:15 return runner.run_pipeline(pipeline, options) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 381, in run_pipeline 05:35:15 default_environment=self._default_environment)) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 388, in run_via_runner_api 05:35:15 return self.run_stages(stage_context, stages) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 470, in run_stages 05:35:15 stage_context.safe_coders) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 753, in _run_stage 05:35:15 result, splits = bundle_manager.process_bundle(data_input, data_output) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1801, in process_bundle 05:35:15 part, expected_outputs), part_inputs): 05:35:15 File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator 05:35:15 yield fs.pop().result() 05:35:15 File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result 05:35:15 return self.__get_result() 05:35:15 File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result 05:35:15 raise self._exception 05:35:15 File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run 05:35:15 result = self.fn(*self.args, **self.kwargs) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1801, in 05:35:15 part, expected_outputs), part_inputs): 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1737, in process_bundle 05:35:15 result_future = self._worker_handler.control_conn.push(process_bundle_req) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1150, in push 05:35:15 response = self.worker.do_instruction(request) 05:35:15 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 360, in do_instruction 05:35:15 request.instruction_id) 05:35:15 File
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332117=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332117 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 17:49 Start Date: 22/Oct/19 17:49 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337661353 ## File path: sdks/python/apache_beam/testing/interactive_stream.py ## @@ -0,0 +1,125 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import absolute_import + +import time +from concurrent.futures import ThreadPoolExecutor + +import grpc + +from apache_beam.portability.api import beam_interactive_api_pb2 +from apache_beam.portability.api import beam_interactive_api_pb2_grpc +from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer + +STRING_TO_API_STATE = { +'STOPPED': beam_interactive_api_pb2.StatusResponse.STOPPED, +'PAUSED': beam_interactive_api_pb2.StatusResponse.PAUSED, +'RUNNING': beam_interactive_api_pb2.StatusResponse.RUNNING, +} + + +class InteractiveStreamController(InteractiveServiceServicer): + def __init__(self, endpoint, streaming_cache): +self._endpoint = endpoint +self._server = grpc.server(ThreadPoolExecutor(max_workers=2)) +beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server( +self, self._server) +self._server.add_insecure_port(self._endpoint) +self._server.start() + +self._streaming_cache = streaming_cache +self._state = 'STOPPED' +self._playback_speed = 1.0 + + def Start(self, request, context): +"""Requests that the Service starts emitting elements. +""" + +self._next_state('RUNNING') +self._playback_speed = request.playback_speed or 1.0 +self._playback_speed = 1.0 / max(min(self._playback_speed, 100.0), 0.1) +return beam_interactive_api_pb2.StartResponse() + + def Stop(self, request, context): +"""Requests that the Service stop emitting elements. +""" +self._next_state('STOPPED') +return beam_interactive_api_pb2.StartResponse() + + def Pause(self, request, context): +"""Requests that the Service pause emitting elements. +""" +self._next_state('PAUSED') +return beam_interactive_api_pb2.PauseResponse() + + def Step(self, request, context): +"""Requests that the Service emit a single element from each cached source. +""" +self._next_state('STEP') +return beam_interactive_api_pb2.StepResponse() + + def Status(self, request, context): +"""Returns the status of the service. +""" +resp = beam_interactive_api_pb2.StatusResponse() +resp.stream_time.GetCurrentTime() +resp.state = STRING_TO_API_STATE[self._state] +return resp + + def _reset_state(self): +self._reader = None +self._playback_speed = 1.0 +self._state = 'STOPPED' + + def _next_state(self, state): +if self._state == 'STOPPED': + if state == 'RUNNING' or state == 'STEP': +self._reader = self._streaming_cache.reader() +elif self._state == 'RUNNING': + if state == 'STOPPED': +self._reset_state() +self._state = state + + def Events(self, request, context): +# The TestStream will wait until the stream starts. +while self._state != 'RUNNING' and self._state != 'STEP': + time.sleep(0.01) + +events = self._reader.read() +if events: + for e in events: +# Here we assume that the first event is the processing_time_event so +# that we can sleep and then emit the element. Thereby, trying to +# emulate the original stream. +if e.HasField('processing_time_event'): + sleep_duration = ( + e.processing_time_event.advance_duration * self._playback_speed + ) * 10**-6 + time.sleep(sleep_duration) +yield beam_interactive_api_pb2.EventsResponse(events=[e]) +else: + resp =
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=332116=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332116 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 22/Oct/19 17:49 Start Date: 22/Oct/19 17:49 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r337661276 ## File path: sdks/python/apache_beam/testing/interactive_stream.py ## @@ -0,0 +1,125 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import absolute_import + +import time +from concurrent.futures import ThreadPoolExecutor + +import grpc + +from apache_beam.portability.api import beam_interactive_api_pb2 +from apache_beam.portability.api import beam_interactive_api_pb2_grpc +from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer + +STRING_TO_API_STATE = { +'STOPPED': beam_interactive_api_pb2.StatusResponse.STOPPED, +'PAUSED': beam_interactive_api_pb2.StatusResponse.PAUSED, +'RUNNING': beam_interactive_api_pb2.StatusResponse.RUNNING, +} + + +class InteractiveStreamController(InteractiveServiceServicer): + def __init__(self, endpoint, streaming_cache): +self._endpoint = endpoint +self._server = grpc.server(ThreadPoolExecutor(max_workers=2)) +beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server( +self, self._server) +self._server.add_insecure_port(self._endpoint) +self._server.start() + +self._streaming_cache = streaming_cache +self._state = 'STOPPED' +self._playback_speed = 1.0 + + def Start(self, request, context): +"""Requests that the Service starts emitting elements. +""" + +self._next_state('RUNNING') +self._playback_speed = request.playback_speed or 1.0 +self._playback_speed = 1.0 / max(min(self._playback_speed, 100.0), 0.1) +return beam_interactive_api_pb2.StartResponse() + + def Stop(self, request, context): +"""Requests that the Service stop emitting elements. +""" +self._next_state('STOPPED') +return beam_interactive_api_pb2.StartResponse() + + def Pause(self, request, context): +"""Requests that the Service pause emitting elements. +""" +self._next_state('PAUSED') +return beam_interactive_api_pb2.PauseResponse() + + def Step(self, request, context): +"""Requests that the Service emit a single element from each cached source. +""" +self._next_state('STEP') +return beam_interactive_api_pb2.StepResponse() + + def Status(self, request, context): +"""Returns the status of the service. +""" +resp = beam_interactive_api_pb2.StatusResponse() +resp.stream_time.GetCurrentTime() +resp.state = STRING_TO_API_STATE[self._state] +return resp + + def _reset_state(self): +self._reader = None +self._playback_speed = 1.0 +self._state = 'STOPPED' + + def _next_state(self, state): +if self._state == 'STOPPED': + if state == 'RUNNING' or state == 'STEP': +self._reader = self._streaming_cache.reader() +elif self._state == 'RUNNING': + if state == 'STOPPED': +self._reset_state() +self._state = state + + def Events(self, request, context): +# The TestStream will wait until the stream starts. +while self._state != 'RUNNING' and self._state != 'STEP': + time.sleep(0.01) Review comment: Ack, changed to 0.25. I want to balance the feeling of "snappiness" and not wasting the user's CPU here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332116) Time Spent: 6h 20m (was: 6h 10m) > Add streaming support to Interactive Beam >